Import instructions for forcomp assignment

Commit 860a2e94 authored 5 years ago by Timothée Floure
--- a/week5/00-homework5.md
+++ b/week5/00-homework5.md
+# Assignment 5: For-comprehensions and Collections
+
+In this assignment, you will solve the combinatorial problem of finding all
+the anagrams of a sentence using the Scala Collections API and for-comprehensions.
+
+You are encouraged to look at the Scala API documentation while solving this
+exercise, which can be found here:
+
+[http://www.scala-lang.org/api/current/index.html](http://www.scala-lang.org/api/current/index.html)
+
+Note that Scala uses Java's `String` class, so the documentation
+for strings has to be looked up in the Javadoc API:
+
+[http://docs.oracle.com/javase/6/docs/api/java/lang/String.html](http://docs.oracle.com/javase/6/docs/api/java/lang/String.html)
+
+## Setup
+
+You can use the following commands to make a fresh clone of your repository:
+
+```shell
+git clone -b forcomp git@gitlab.epfl.ch:lamp/student-repositories-f19/cs210-GASPAR.git cs210-forcomp
+cd cs210-forcomp
+```
+
+You can always refer to:
+  * [the example assignment](https://gitlab.epfl.ch/lamp/cs-210-functional-programming-2019/blob/master/week1/01-example.md) on the development workflow.
+  * [this guide](https://gitlab.epfl.ch/lamp/cs-210-functional-programming-2019/blob/master/week1/02-grading-and-submission.md) for details on the submission system.
+    **Make sure to submit your assignment before the deadline written in [README.md](/README.md)**
+  * [The documentation of the Scala standard library](https://www.scala-lang.org/files/archive/api/2.13.1)
+  * [The documentation of the Java standard library](https://docs.oracle.com/en/java/javase/11/docs/api/index.html)
+
+## The problem
+
+An anagram of a word is a rearrangement of its letters such that a word with
+a different meaning is formed. For example, if we rearrange the letters of
+the word `Elvis` we can obtain the word `lives`, which is one of its anagrams.
+
+In a similar way, an anagram of a sentence is a rearrangement of all the
+characters in the sentence such that a new sentence is formed. The new
+sentence consists of meaningful words, the number of which may or may not
+correspond to the number of words in the original sentence. For example,
+the sentence:
+
+    I love you
+
+is an anagram of the sentence:
+
+    You olive
+
+In this exercise, we will consider permutations of words to be anagrams of
+the sentence. In the above example:
+
+    You I love
+
+is considered a separate anagram.
+
+When producing anagrams, we will ignore character casing and
+punctuation characters.
+
+Your ultimate goal is to implement a method `sentenceAnagrams`, which,
+given a list of words representing a sentence, finds all the anagrams
+of that sentence. Note that we used the term _meaningful_ in defining
+what anagrams are. You will be given a dictionary, i.e. a list of the
+words that have a meaning.
+
+Here is the general idea. We will transform the characters of the sentence
+into a list saying how often each character appears. We will call this
+list _the occurrence list_. To find anagrams of a word we will find all
+the words from the dictionary which have the same occurrence list.
+Finding an anagram of a sentence is slightly more difficult. We will
+transform the sentence into its occurrence list, then try to extract any
+subset of characters from it to see if we can form any meaningful words.
+From the remaining characters we will solve the problem recursively and
+then combine all the meaningful words we have found with the recursive
+solution.
+
+Let's apply this idea to our example, the sentence `You olive`. Let's
+represent this sentence as an occurrence list of characters `eiloouvy`. We start
+by subtracting some subset of the characters, say `i`. We are left with
+the characters `eloouvy`.
+
+Looking into the dictionary we see that `i` corresponds to the word `I` in
+the English language, so we found one meaningful word. We now solve the
+problem recursively for the rest of the characters `eloouvy` and obtain
+a list of solutions `List(List(love, you), List(you, love))`. We can combine
+`I` with that list to obtain sentences `I love you` and `I you love`,
+which are both valid anagrams.
+
+## Representation
+
+We represent the words of a sentence with the `String` data type:
+
+    type Word = String
+
+Words contain lowercase and uppercase characters, and no whitespace,
+punctuation or other special characters.
+
+Since we are ignoring the punctuation characters of the sentence
+as well as the whitespace characters, we will represent sentences
+as lists of words:
+
+    type Sentence = List[Word]
+
+We mentioned previously that we will transform words and sentences into
+occurrence lists. We represent the occurrence lists as sorted lists of
+character and integer pairs:
+
+    type Occurrences = List[(Char, Int)]
+
+The list should be sorted by the characters in an ascending order.
+Since we ignore the character casing, all the characters in the occurrence
+list have to be lowercase.
+The integer in each pair denotes how often the character appears in a
+particular word or a sentence. This integer must be positive. Note that
+positive also means non-zero -- characters that do not appear in the
+sentence do not appear in the occurrence list either.
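+
+As a small illustration of these rules, the sketch below writes out by hand the
+occurrence list of the example sentence `You olive` and checks the stated
+invariants. The object name `RepresentationExample` and the helper `isValid`
+are hypothetical and are not part of the assignment skeleton.
+
+```scala
+object RepresentationExample {
+  type Word = String
+  type Sentence = List[Word]
+  type Occurrences = List[(Char, Int)]
+
+  // The sentence "You olive" as a list of words (whitespace is dropped).
+  val sentence: Sentence = List("You", "olive")
+
+  // Its occurrence list, written by hand: lowercase characters,
+  // sorted in ascending order, with positive counts only.
+  val occurrences: Occurrences =
+    List(('e', 1), ('i', 1), ('l', 1), ('o', 2), ('u', 1), ('v', 1), ('y', 1))
+
+  // Hypothetical helper that checks the three invariants described above.
+  def isValid(occ: Occurrences): Boolean = {
+    val chars = occ.map(_._1)
+    chars == chars.sorted.distinct &&      // ascending, no duplicates
+    chars.forall(c => c == c.toLower) &&   // lowercase only
+    occ.forall(_._2 > 0)                   // positive counts only
+  }
+}
+```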
+
+Finally, the dictionary of all the meaningful English words is represented
+as a `List` of words:
+
+    val dictionary: List[Word] = loadDictionary
+
+The dictionary already exists for this exercise and is loaded for you using
+the `loadDictionary` utility method.
+
+## Computing Occurrence Lists
+
+The `groupBy` method takes a function mapping an element of a collection to a
+key of some other type, and produces a `Map` of keys and collections of
+elements which mapped to the same key. This method _groups_ the elements,
+hence its name.
+
+Here is one example:
+
+    List("Every", "student", "likes", "Scala").groupBy((element: String) => element.length)
+
+produces:
+
+    Map(
+      5 -> List("Every", "likes", "Scala"),
+      7 -> List("student")
+    )
+
+Above, the key is the `length` of the string and the type of the key is `Int`. Every
+`String` with the same `length` is grouped under the same key -- its `length`.
+
+Here is another example:
+
+    List(0, 1, 2, 1, 0).groupBy((element: Int) => element)
+
+produces:
+
+    Map(
+      0 -> List(0, 0),
+      1 -> List(1, 1),
+      2 -> List(2)
+    )
+
+`Map`s provide efficient lookup of all the values mapped to a certain key. Any collection
+of pairs can be transformed into a `Map` using the `toMap` method. Similarly, any `Map` can
+be transformed into a `List` of pairs using the `toList` method.
+
+In our case, the collection will be a `Word` (i.e. a `String`) and its elements are
+characters, so the `groupBy` method takes a function mapping characters into a desired
+key type.
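+
+To make this concrete, here is a minimal sketch of `groupBy` applied directly
+to a `String`, followed by the `toList` and `toMap` conversions. The key
+function `isVowel` and the object name are made up for this illustration.
+
+```scala
+object GroupByOnStringExample {
+  // Hypothetical key function: classify each character as a vowel or not.
+  def isVowel(c: Char): Boolean = "aeiou".contains(c.toLower)
+
+  // groupBy treats the String as a collection of Char. Each group holds the
+  // characters that share a key; here we keep only the size of each group.
+  val sizes: Map[Boolean, Int] =
+    "Scala".groupBy(isVowel).map { case (key, group) => (key, group.length) }
+
+  // A Map has no guaranteed iteration order, so sort after calling toList.
+  val asList: List[(Boolean, Int)] = sizes.toList.sortBy(_._1)
+
+  // A list of pairs converts back to a Map with toMap.
+  val asMap: Map[Boolean, Int] = asList.toMap
+}
+```
+
+Here the key type is `Boolean`; for occurrence lists, a key type of `Char` is
+the more natural choice.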
+
+In the first part of this exercise, we will implement the method `wordOccurrences`
+which, given a word, produces its occurrence list. In one of the previous exercises,
+we produced the occurrence list by recursively traversing a list of characters.
+This time we will use the `groupBy` method from the Collections API (hint: you
+may additionally use other methods, such as `map` and `toList`).
+
+    def wordOccurrences(w: Word): Occurrences
+
+Next, we implement another version of the method for entire sentences.
+We can concatenate the words of the sentence into a single word and then reuse
+the method `wordOccurrences` that we already have.
+
+    def sentenceOccurrences(s: Sentence): Occurrences
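+
+One possible shape for these two methods is sketched below. It is only an
+illustration of the `groupBy` approach described above, not the reference
+solution, and the object name `OccurrencesSketch` is an assumption.
+
+```scala
+object OccurrencesSketch {
+  type Word = String
+  type Sentence = List[Word]
+  type Occurrences = List[(Char, Int)]
+
+  // Lowercase the word, group equal characters together, count each group,
+  // and sort the resulting pairs by character.
+  def wordOccurrences(w: Word): Occurrences =
+    w.toLowerCase
+      .groupBy((c: Char) => c)
+      .map { case (c, group) => (c, group.length) }
+      .toList
+      .sortBy(_._1)
+
+  // Concatenate all words into one and reuse wordOccurrences.
+  def sentenceOccurrences(s: Sentence): Occurrences =
+    wordOccurrences(s.mkString)
+}
+```
+
+With these definitions, `wordOccurrences("Robert")` evaluates to
+`List(('b', 1), ('e', 1), ('o', 1), ('r', 2), ('t', 1))`.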
+
+## Computing Anagrams of a Word
+
+To compute the anagrams of a word, we use the simple observation that all the anagrams
+of a word have the same occurrence list. To allow efficient lookup of all the words
+with the same occurrence list, we will have to _group_ the words of the dictionary
+according to their occurrence lists.
+
+    lazy val dictionaryByOccurrences: Map[Occurrences, List[Word]]
+
+We then implement the method `wordAnagrams` which returns the list of anagrams of
+a single word:
+
+    def wordAnagrams(word: Word): List[Word]
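+
+The grouping idea can be sketched as follows. The five-word `dictionary` is a
+hypothetical stand-in for the one returned by `loadDictionary`, and the body
+of `wordOccurrences` is just one way to write that method.
+
+```scala
+object WordAnagramsSketch {
+  type Word = String
+  type Occurrences = List[(Char, Int)]
+
+  // Hypothetical miniature dictionary used only for this example.
+  val dictionary: List[Word] = List("ate", "eat", "tea", "pot", "top")
+
+  // One possible implementation of the occurrence list of a word.
+  def wordOccurrences(w: Word): Occurrences =
+    w.toLowerCase
+      .groupBy((c: Char) => c)
+      .map { case (c, group) => (c, group.length) }
+      .toList
+      .sortBy(_._1)
+
+  // Words with the same occurrence list end up under the same key.
+  lazy val dictionaryByOccurrences: Map[Occurrences, List[Word]] =
+    dictionary.groupBy(wordOccurrences)
+
+  // Look up the word's occurrence list; unknown lists have no anagrams.
+  def wordAnagrams(word: Word): List[Word] =
+    dictionaryByOccurrences.getOrElse(wordOccurrences(word), Nil)
+}
+```
+
+Using `getOrElse` rather than `apply` avoids an exception for words whose
+occurrence list does not appear in the dictionary.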
+
+## Computing Subsets of a Set
+
+To compute all the anagrams of a sentence, we will need a helper method which,
+given an occurrence list, produces all the subsets of that occurrence list.
+
+    def combinations(occurrences: Occurrences): List[Occurrences]
+
+The `combinations` method should return all possible ways in which we can pick
+a subset of characters from `occurrences`. For example, given the occurrence list:
+
+    List(('a', 2), ('b', 2))
+
+the list of all subsets is:
+
+    List(
+      List(),
+      List(('a', 1)),
+      List(('a', 2)),
+      List(('b', 1)),
+      List(('a', 1), ('b', 1)),
+      List(('a', 2), ('b', 1)),
+      List(('b', 2)),
+      List(('a', 1), ('b', 2)),
+      List(('a', 2), ('b', 2))
+    )
+
+The order in which you return the subsets does not matter as long as they are
+all included. Note that there is only one subset of an empty occurrence list,
+and that is the empty occurrence list itself.
+
+Hint: investigate how you can use for-comprehensions to implement parts of this method.
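+
+A minimal sketch of one way to follow this hint is shown below: for the first
+character, pick every count from zero up to its frequency, and combine each
+choice with every subset of the remaining characters. This is an illustrative
+approach rather than the reference solution, and the object name is an
+assumption.
+
+```scala
+object CombinationsSketch {
+  type Occurrences = List[(Char, Int)]
+
+  def combinations(occurrences: Occurrences): List[Occurrences] =
+    occurrences match {
+      // The only subset of the empty occurrence list is the empty list itself.
+      case Nil => List(Nil)
+      case (char, count) :: rest =>
+        for {
+          tail <- combinations(rest) // every subset of the remaining characters
+          n <- 0 to count            // how many copies of `char` to take
+        } yield if (n == 0) tail else (char, n) :: tail
+    }
+}
+```
+
+Because each character is prepended to subsets built from later characters,
+every returned subset stays sorted whenever the input is sorted.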
+
+## Computing Anagrams of a Sentence
+
+We now implement another helper method called `subtract` which, given two occurrence
+lists `x` and `y`, subtracts the frequencies of the occurrence list `y` from the
+frequencies of the occurrence list `x`:
+
+    def subtract(x: Occurrences, y: Occurrences): Occurrences
+
+For example, given two occurrence lists for the words `lard` and `r`:
+
+    val x = List(('a', 1), ('d', 1), ('l', 1), ('r', 1))
+    val y = List(('r', 1))
+
+the result of `subtract(x, y)` is `List(('a', 1), ('d', 1), ('l', 1))`.
+
+The precondition for the `subtract` method is that the occurrence list `y` is
+a subset of the occurrence list `x` -- if the list `y` has some character then
+the frequency of that character in `x` must be greater than or equal to the
+frequency of that character in `y`.
+When implementing `subtract` you can assume that `y` is a subset of `x`.
+
+Hint: you can use `foldLeft`, and `-`, `apply` and `updated` operations on `Map`.
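+
+The hint can be followed roughly as in the sketch below, which folds over `y`
+while updating a `Map` built from `x`. It relies on the precondition that `y`
+is a subset of `x` and is only one possible way to write the method.
+
+```scala
+object SubtractSketch {
+  type Occurrences = List[(Char, Int)]
+
+  def subtract(x: Occurrences, y: Occurrences): Occurrences = {
+    val remaining = y.foldLeft(x.toMap) { case (acc, (char, count)) =>
+      // `acc(char)` uses Map.apply; it is safe only under the precondition.
+      val left = acc(char) - count
+      // Drop characters whose count reaches zero, so counts stay positive.
+      if (left == 0) acc - char else acc.updated(char, left)
+    }
+    // Restore the sorted list representation.
+    remaining.toList.sortBy(_._1)
+  }
+}
+```
+
+Removing a character whose count reaches zero keeps the result a valid
+occurrence list.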
+
+Now we can finally implement our `sentenceAnagrams` method for sentences.
+
+    def sentenceAnagrams(sentence: Sentence): List[Sentence]
+
+Note that the anagram of the empty sentence is the empty sentence itself.
+
+Hint: First of all, think about the recursive structure of the problem: what
+is the base case, and how should the result of a recursive invocation be integrated
+in each iteration? Also, using for-comprehensions helps in finding an elegant
+implementation for this method.
+
+Test the `sentenceAnagrams` method on short sentences, no more than 10 characters.
+The space of combinations grows very quickly as your sentence gets longer,
+so the program may run for a very long time. However, for sentences such as
+`Linux rulez`, `I love you` or `Mickey Mouse` the program should end fairly
+quickly -- there are not many other ways to say these things.
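+
+Putting the pieces together, the recursive structure could look like the
+sketch below. It is self-contained, so it repeats simple versions of the
+helper methods, and it uses a hypothetical four-word dictionary instead of
+`loadDictionary`. Treat it as one possible arrangement, not as the reference
+solution.
+
+```scala
+object SentenceAnagramsSketch {
+  type Word = String
+  type Sentence = List[Word]
+  type Occurrences = List[(Char, Int)]
+
+  // Hypothetical miniature dictionary used only for this example.
+  val dictionary: List[Word] = List("I", "love", "olive", "you")
+
+  def wordOccurrences(w: Word): Occurrences =
+    w.toLowerCase.groupBy((c: Char) => c)
+      .map { case (c, group) => (c, group.length) }
+      .toList.sortBy(_._1)
+
+  lazy val dictionaryByOccurrences: Map[Occurrences, List[Word]] =
+    dictionary.groupBy(wordOccurrences)
+
+  def combinations(occ: Occurrences): List[Occurrences] = occ match {
+    case Nil => List(Nil)
+    case (char, count) :: rest =>
+      for (tail <- combinations(rest); n <- 0 to count)
+        yield if (n == 0) tail else (char, n) :: tail
+  }
+
+  def subtract(x: Occurrences, y: Occurrences): Occurrences = {
+    val remaining = y.foldLeft(x.toMap) { case (acc, (char, count)) =>
+      val left = acc(char) - count
+      if (left == 0) acc - char else acc.updated(char, left)
+    }
+    remaining.toList.sortBy(_._1)
+  }
+
+  def sentenceAnagrams(sentence: Sentence): List[Sentence] = {
+    def solve(occ: Occurrences): List[Sentence] =
+      // Base case: nothing left, so the only anagram is the empty sentence.
+      if (occ.isEmpty) List(Nil)
+      else
+        for {
+          subset <- combinations(occ)                            // pick some characters
+          word <- dictionaryByOccurrences.getOrElse(subset, Nil) // words they can form
+          rest <- solve(subtract(occ, subset))                   // anagrams of the remainder
+        } yield word :: rest
+    solve(wordOccurrences(sentence.mkString))
+  }
+}
+```
+
+With this small dictionary, `sentenceAnagrams(List("You", "olive"))` yields
+eight sentences: the six orderings of `I`, `love`, `you` and the two orderings
+of `you`, `olive`.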
+
+
+## Further Improvement (Optional)
+
+This part is optional: it is not part of the assignment and will not be graded.
+You may skip it freely.
+
+The solution that enumerates all the combinations is concise, but not very efficient.
+The problem is that we recompute some anagrams more than once when recursively
+solving the problem.
+Think about a concrete example and a situation where you compute the anagrams of the same
+subset of an occurrence list multiple times.
+
+One way to improve the performance is to save the results obtained the first time
+when you compute the anagrams for an occurrence list, and use the stored result if
+you need the same result a second time.
+Try to write a new method `sentenceAnagramsMemo` which does this.
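+
+A minimal sketch of this caching idea follows, assuming a mutable map keyed by
+occurrence list that lives for the duration of one call. The helper methods
+and the four-word dictionary are hypothetical stand-ins so that the example is
+self-contained. Other designs, such as a cache shared across calls, are
+equally possible.
+
+```scala
+import scala.collection.mutable
+
+object SentenceAnagramsMemoSketch {
+  type Word = String
+  type Sentence = List[Word]
+  type Occurrences = List[(Char, Int)]
+
+  // Hypothetical miniature dictionary used only for this example.
+  val dictionary: List[Word] = List("I", "love", "olive", "you")
+
+  def wordOccurrences(w: Word): Occurrences =
+    w.toLowerCase.groupBy((c: Char) => c)
+      .map { case (c, group) => (c, group.length) }
+      .toList.sortBy(_._1)
+
+  lazy val dictionaryByOccurrences: Map[Occurrences, List[Word]] =
+    dictionary.groupBy(wordOccurrences)
+
+  def combinations(occ: Occurrences): List[Occurrences] = occ match {
+    case Nil => List(Nil)
+    case (char, count) :: rest =>
+      for (tail <- combinations(rest); n <- 0 to count)
+        yield if (n == 0) tail else (char, n) :: tail
+  }
+
+  def subtract(x: Occurrences, y: Occurrences): Occurrences = {
+    val remaining = y.foldLeft(x.toMap) { case (acc, (char, count)) =>
+      val left = acc(char) - count
+      if (left == 0) acc - char else acc.updated(char, left)
+    }
+    remaining.toList.sortBy(_._1)
+  }
+
+  def sentenceAnagramsMemo(sentence: Sentence): List[Sentence] = {
+    // Results already computed, keyed by the occurrence list they belong to.
+    val cache = mutable.Map.empty[Occurrences, List[Sentence]]
+
+    def solve(occ: Occurrences): List[Sentence] =
+      cache.get(occ) match {
+        case Some(known) => known // reuse the stored result
+        case None =>
+          val result: List[Sentence] =
+            if (occ.isEmpty) List(Nil)
+            else
+              for {
+                subset <- combinations(occ)
+                word <- dictionaryByOccurrences.getOrElse(subset, Nil)
+                rest <- solve(subtract(occ, subset))
+              } yield word :: rest
+          cache(occ) = result // store it for later lookups
+          result
+      }
+
+    solve(wordOccurrences(sentence.mkString))
+  }
+}
+```
+
+The cache is keyed by the occurrence list rather than by the sentence, because
+different choices of words can leave the same remaining characters.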