From 860a2e947158e26d210cec94a1f962adcae27ff5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Timoth=C3=A9e=20Floure?= <timothee.floure@posteo.net>
Date: Wed, 16 Oct 2019 18:26:47 +0200
Subject: [PATCH] Import instructions for forcomp assignment

---
 week5/00-homework5.md | 282 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 282 insertions(+)
 create mode 100644 week5/00-homework5.md

diff --git a/week5/00-homework5.md b/week5/00-homework5.md
new file mode 100644
index 0000000..5d6b96c
--- /dev/null
+++ b/week5/00-homework5.md
@@ -0,0 +1,282 @@
+# Assignment 5: For-comprehensions and Collections
+
+In this assignment, you will solve the combinatorial problem of finding all
+the anagrams of a sentence using the Scala Collections API and for-comprehensions.
+
+You are encouraged to consult the Scala API documentation while solving this
+exercise. It can be found here:
+
+[http://www.scala-lang.org/api/current/index.html](http://www.scala-lang.org/api/current/index.html)
+
+Note that Scala uses the `String` from Java, so the documentation
+for strings has to be looked up in the Javadoc API:
+
+[http://docs.oracle.com/javase/6/docs/api/java/lang/String.html](http://docs.oracle.com/javase/6/docs/api/java/lang/String.html)
+
+## Setup
+
+You can use the following commands to make a fresh clone of your repository:
+
+```shell
+git clone -b forcomp git@gitlab.epfl.ch:lamp/student-repositories-f19/cs210-GASPAR.git cs210-forcomp
+cd cs210-forcomp
+```
+
+You can always refer to:
+  * [the example assignment](https://gitlab.epfl.ch/lamp/cs-210-functional-programming-2019/blob/master/week1/01-example.md) on the development workflow.
+  * [this guide](https://gitlab.epfl.ch/lamp/cs-210-functional-programming-2019/blob/master/week1/02-grading-and-submission.md) for details on the submission system.
+    **Make sure to submit your assignment before the deadline written in [README.md](/README.md)**
+  * [The documentation of the Scala standard library](https://www.scala-lang.org/files/archive/api/2.13.1)
+  * [The documentation of the Java standard library](https://docs.oracle.com/en/java/javase/11/docs/api/index.html)
+
+## The problem
+
+An anagram of a word is a rearrangement of its letters such that a word with
+a different meaning is formed. For example, if we rearrange the letters of
+the word `Elvis` we can obtain the word `lives`, which is one of its anagrams.
+
+In a similar way, an anagram of a sentence is a rearrangement of all the
+characters in the sentence such that a new sentence is formed. The new
+sentence consists of meaningful words, the number of which may or may not
+correspond to the number of words in the original sentence. For example,
+the sentence:
+
+    I love you
+
+is an anagram of the sentence:
+
+    You olive
+
+In this exercise, we will consider permutations of the words to be anagrams of
+the sentence. In the above example:
+
+    You I love
+
+is considered a separate anagram.
+
+When producing anagrams, we will ignore character casing and
+punctuation characters.
+
+Your ultimate goal is to implement a method `sentenceAnagrams`, which,
+given a list of words representing a sentence, finds all the anagrams
+of that sentence. Note that we used the term _meaningful_ in defining
+what anagrams are. You will be given a dictionary, i.e. a list of words
+indicating words that have a meaning.
+
+Here is the general idea. We will transform the characters of the sentence
+into a list saying how often each character appears. We will call this
+list _the occurrence list_. To find anagrams of a word we will find all
+the words from the dictionary which have the same occurrence list.
+Finding an anagram of a sentence is slightly more difficult. We will
+transform the sentence into its occurrence list, then try to extract any
+subset of characters from it to see if we can form any meaningful words.
+From the remaining characters we will solve the problem recursively and
+then combine all the meaningful words we have found with the recursive
+solution.
+
+Let's apply this idea to our example, the sentence `You olive`. Let's
+represent this sentence as an occurrence list of characters `eiloouvy`. We start
+by subtracting some subset of the characters, say `i`. We are left with
+the characters `eloouvy`.
+
+Looking into the dictionary we see that `i` corresponds to the word `I` in
+the English language, so we found one meaningful word. We now solve the
+problem recursively for the rest of the characters `eloouvy` and obtain
+a list of solutions `List(List(love, you), List(you, love))`. We can combine
+`I` with that list to obtain sentences `I love you` and `I you love`,
+which are both valid anagrams.
+
+## Representation
+
+We represent the words of a sentence with the `String` data type:
+
+    type Word = String
+
+Words contain lowercase and uppercase characters, and no whitespace,
+punctuation or other special characters.
+
+Since we are ignoring the punctuation characters of the sentence
+as well as the whitespace characters, we will represent sentences
+as lists of words:
+
+    type Sentence = List[Word]
+
+We mentioned previously that we will transform words and sentences into
+occurrence lists. We represent the occurrence lists as sorted lists of
+character and integer pairs:
+
+    type Occurrences = List[(Char, Int)]
+
+The list should be sorted by the characters in ascending order.
+Since we ignore the character casing, all the characters in the occurrence
+list have to be lowercase.
+The integer in each pair denotes how often the character appears in a
+particular word or a sentence. This integer must be positive. Note that
+positive also means non-zero -- characters that do not appear in the
+sentence do not appear in the occurrence list either.
+
+Finally, the dictionary of all the meaningful English words is represented
+as a `List` of words:
+
+    val dictionary: List[Word] = loadDictionary
+
+The dictionary already exists for this exercise and is loaded for you using
+the `loadDictionary` utility method.
+
+## Computing Occurrence Lists
+
+The `groupBy` method takes a function mapping an element of a collection to a
+key of some other type, and produces a `Map` of keys and collections of
+elements that map to the same key. This method _groups_ the elements,
+hence its name.
+
+Here is one example:
+
+    List("Every", "student", "likes", "Scala").groupBy((element: String) => element.length)
+
+produces:
+
+    Map(
+      5 -> List("Every", "likes", "Scala"),
+      7 -> List("student")
+    )
+
+Above, the key is the `length` of the string and the type of the key is `Int`. Every
+`String` with the same `length` is grouped under the same key -- its `length`.
+
+Here is another example:
+
+    List(0, 1, 2, 1, 0).groupBy((element: Int) => element)
+
+produces:
+
+    Map(
+      0 -> List(0, 0),
+      1 -> List(1, 1),
+      2 -> List(2)
+    )
+
+`Map`s provide efficient lookup of all the values mapped to a certain key. Any collection
+of pairs can be transformed into a `Map` using the `toMap` method. Similarly, any `Map` can
+be transformed into a `List` of pairs using the `toList` method.
+
+In our case, the collection will be a `Word` (i.e. a `String`) and its elements are
+characters, so the `groupBy` method takes a function mapping characters into a desired
+key type.
+
+In the first part of this exercise, we will implement the method `wordOccurrences`
+which, given a word, produces its occurrence list. In one of the previous exercises,
+we produced the occurrence list by recursively traversing a list of characters.
+This time we will use the `groupBy` method from the Collections API (hint: you
+may additionally use other methods, such as `map` and `toList`).
+
+    def wordOccurrences(w: Word): Occurrences
+
+Next, we implement another version of the method for entire sentences.
+We can concatenate the words of the sentence into a single word and then reuse
+the method `wordOccurrences` that we already have.
+
+    def sentenceOccurrences(s: Sentence): Occurrences
+
+## Computing Anagrams of a Word
+
+To compute the anagrams of a word, we use the simple observation that all the anagrams
+of a word have the same occurrence list. To allow efficient lookup of all the words
+with the same occurrence list, we will have to _group_ the words of the dictionary
+according to their occurrence lists.
+
+    lazy val dictionaryByOccurrences: Map[Occurrences, List[Word]]
+
+We then implement the method `wordAnagrams` which returns the list of anagrams of
+a single word:
+
+    def wordAnagrams(word: Word): List[Word]
+
+## Computing Subsets of a Set
+
+To compute all the anagrams of a sentence, we will need a helper method which,
+given an occurrence list, produces all the subsets of that occurrence list.
+
+    def combinations(occurrences: Occurrences): List[Occurrences]
+
+The `combinations` method should return all possible ways in which we can pick
+a subset of characters from `occurrences`. For example, given the occurrence list:
+
+    List(('a', 2), ('b', 2))
+
+the list of all subsets is:
+
+    List(
+      List(),
+      List(('a', 1)),
+      List(('a', 2)),
+      List(('b', 1)),
+      List(('a', 1), ('b', 1)),
+      List(('a', 2), ('b', 1)),
+      List(('b', 2)),
+      List(('a', 1), ('b', 2)),
+      List(('a', 2), ('b', 2))
+    )
+
+The order in which you return the subsets does not matter as long as they are
+all included. Note that there is only one subset of an empty occurrence list,
+and that is the empty occurrence list itself.
+
+Hint: investigate how you can use for-comprehensions to implement parts of this method.
+
+## Computing Anagrams of a Sentence
+
+We now implement another helper method called `subtract` which, given two occurrence
+lists `x` and `y`, subtracts the frequencies of the occurrence list `y` from the
+frequencies of the occurrence list `x`:
+
+    def subtract(x: Occurrences, y: Occurrences): Occurrences
+
+For example, given two occurrence lists for words `lard` and `r`:
+
+    val x = List(('a', 1), ('d', 1), ('l', 1), ('r', 1))
+    val y = List(('r', 1))
+
+the result of `subtract(x, y)` is `List(('a', 1), ('d', 1), ('l', 1))`.
+
+The precondition for the `subtract` method is that the occurrence list `y` is
+a subset of the occurrence list `x` -- if the list `y` has some character then
+the frequency of that character in `x` must be greater than or equal to the
+frequency of that character in `y`.
+When implementing `subtract` you can assume that `y` is a subset of `x`.
+
+Hint: you can use `foldLeft`, and `-`, `apply` and `updated` operations on `Map`.
+
+Now we can finally implement our `sentenceAnagrams` method for sentences.
+
+    def sentenceAnagrams(sentence: Sentence): List[Sentence]
+
+Note that the anagram of the empty sentence is the empty sentence itself.
+
+Hint: First of all, think about the recursive structure of the problem: what
+is the base case, and how should the result of a recursive invocation be integrated
+in each iteration? Also, using for-comprehensions helps in finding an elegant
+implementation for this method.
+
+Test the `sentenceAnagrams` method on short sentences, no more than 10 characters.
+The space of combinations gets huge very quickly as your sentence gets longer,
+so the program may run for a very long time. However, for sentences such as
+`Linux rulez`, `I love you` or `Mickey Mouse` the program should end fairly
+quickly -- there are not many other ways to say these things.
+
+
+## Further Improvement (Optional)
+
+This part is optional: it is not part of the assignment and will not be graded.
+You may skip this part freely.
+
+The solution that enumerates all the combinations is concise, but not very efficient.
+The problem is that we recompute some anagrams more than once when recursively
+solving the problem.
+Think about a concrete example and a situation where you compute the anagrams of the same
+subset of an occurrence list multiple times.
+
+One way to improve the performance is to save the results obtained the first time
+you compute the anagrams for an occurrence list, and use the stored result if
+you need the same result a second time.
+Try to write a new method `sentenceAnagramsMemo` which does this.
-- 
GitLab