Import handout for homework 4

90ce62f3 · Timothée Floure · f75eaf04 · 90ce62f3
Commit 90ce62f3 authored 5 years ago by Timothée Floure
--- a/week4/00-homework4.md
+++ b/week4/00-homework4.md
+# Assignment 4: Pattern Matching (Huffman)
+
+## Setup
+
+You can use the following commands to make a fresh clone of your repository:
+
+```shell
+git clone -b patmat git@gitlab.epfl.ch:lamp/student-repositories-f19/cs210-GASPAR.git cs210-patmat
+cd cs210-patmat
+```
+
+You can always refer to:
+  * [the example assignment](https://gitlab.epfl.ch/lamp/cs-210-functional-programming-2019/blob/master/week1/01-example.md) on the development workflow.
+  * [this guide](https://gitlab.epfl.ch/lamp/cs-210-functional-programming-2019/blob/master/week1/02-grading-and-submission.md) for details on the submission system.
+    **Make sure to submit your assignment before the deadline written in [README.md](/README.md)**
+  * [The documentation of the Scala standard library](https://www.scala-lang.org/files/archive/api/2.13.1)
+  * [The documentation of the Java standard library](https://docs.oracle.com/en/java/javase/11/docs/api/index.html)
+
+
+## Overview
+
+Huffman coding is a compression algorithm that can be used to
+compress lists of characters.
+
+In a normal, uncompressed text, each character is represented by the same number of bits (usually eight).
+In Huffman coding, each character can have a bit representation of a different length, depending on how common a character is: the characters that appear often in a text are represented by a shorter bit sequence than those being used more rarely.
+Every huffman code defines the specific bit sequences used to represent each character.
+
+A Huffman code can be represented by a binary tree whose leaves represent the characters that should be encoded.
+The code tree below can represent the characters `A` to `H`.
+
+The leaf nodes have associated with them a weight which denotes the frequency of appearance of that character.
+In the example below, the character `A` has the highest weight 8, while `F` for example has weight 1.
+
+Every branching node of the code tree can be thought of as a set containing the characters present in the leaves below it. The weight of a branching node is the total weight of the leaves below it: this information is necessary for the construction of the tree.
+
+![Example Huffman tree](http://spark-public.s3.amazonaws.com/progfun/images/huffman-table.png)
+
+Note that a given encoding is only optimal if the character frequencies in the encoded text match the weights in the code tree.
+
+Finally, observe the recursive structure of the coding tree: every sub-tree is itself a valid code tree for a smaller alphabet.
+
+### Encoding
+
+For a given Huffman tree, one can obtain the encoded
+representation of a character by traversing from the root of the
+tree to the leaf containing the character. Along the way, when a
+left branch is chosen, a `0` is added to the representation,
+and when a right branch is chosen, `1` is added to the representation.
+Thus, for the Huffman tree above, the character `D` is encoded as
+`1011`.
+
+### Decoding
+
+Decoding also starts at the root of the tree. Given a sequence of
+bits to decode, we successively read the bits, and for each 0, we
+choose the left branch, and for each 1 we choose the right branch.
+When we reach a leaf, we decode the corresponding character and then
+start again at the root of the tree. As an example, given the
+Huffman tree above, the sequence of bits, `10001010` corresponds to
+`BAC`.
+
+## Implementation 
+
+In Scala, a Huffman tree can be represented as follows:
+
+    abstract class CodeTree
+    case class Fork (left: CodeTree, right: CodeTree, chars: List[Char], weight: Int) extends CodeTree
+    case class Leaf(char: Char, weight: Int) extends CodeTree
+
+To begin, implement the following two (hint: very simple) functions using pattern matches on the code tree:
+
+1. `weight` which returns the total weight of a given Huffman tree.  
+`def weight(tree: CodeTree): Int = tree match ...`
+2. `chars` which returns the list of characters defined in a given Huffman tree.  
+`def chars(tree: CodeTree): List[Char] = tree match ...`
+
+Using these functions, it's possible to define `makeCodeTree`, a function
+which facilitates the creation of Huffman trees by automatically calculating
+the list of characters and the weight when creating a node.
+This function is already implemented in the handout template:
+
+    def makeCodeTree(left: CodeTree, right: CodeTree) =
+      Fork(left, right, chars(left) ::: chars(right), weight(left) + weight(right))
+
+
+Using `makeCodeTree`, code trees can be constructed manually in the following way:
+
+    val sampleTree = makeCodeTree(
+      makeCodeTree(Leaf('x', 1), Leaf('e', 1)),
+      Leaf('t', 2)
+    )
+
+## Constructing Huffman Trees
+
+Given a text, it's possible to calculate and build an optimal Huffman
+tree in the sense that the encoding of that text will be of the minimum
+possible length, meanwhile keeping all information (i.e., it is lossless).
+
+To obtain an optimal tree from a list of characters, you have to define a function `createCodeTree` with the following signature:
+
+    def createCodeTree(chars: List[Char]): CodeTree = ...
+
+Proceed with the following steps to break up this assignment into smaller parts (the handout template contains more detailed documentation):
+
+1. Begin by writing a function `times` which calculates the frequency
+of each character in the text:  
+`def times(chars: List[Char]): List[(Char, Int)] = ...`
+2. Then, write a function `makeLeafList` which generates a list containing
+all the leaves of the Huffman tree to be constructed (the case `Leaf` of
+the algebraic datatype `CodeTree`).
+The list should be ordered by ascending weights where the
+weight of a leaf is the number of times (or the frequency) it appears in
+the given text.  
+`def makeOrderedLeafList(freqs: List[(Char, Int)]): List[Leaf] = ...`
+3. Write a simple function `singleton` which checks whether a list of code trees contains only one single tree.  
+`def singleton(trees: List[CodeTree]): Boolean = ...`
+4. Write a function `combine` which (1) removes the two trees with the
+lowest weight from the list constructed in the previous step, and (2)
+merges them by creating a new node of type `Fork`. Add this new tree
+to the list - which is now one element shorter - while preserving the
+order (by weight).  
+`def combine(trees: List[CodeTree]): List[CodeTree] = ...`
+5. Write a function `until` which calls the two functions defined above
+until this list contains only a single tree. This tree is the optimal
+coding tree. The function `until` can be used in the following way:  
+`until(singleton, combine)(trees)`  
+where the argument `trees` is of the type `List[CodeTree]`.
+6. Finally, use the functions defined above to implement the function
+`createCodeTree` which respects the signature shown above.
+
+## Decoding
+
+Define the function `decode` which decodes a list of bits (which were
+already encoded using a Huffman tree), given the corresponding coding
+tree.
+
+    type Bit = Int
+    def decode(tree: CodeTree, bits: List[Bit]): List[Char] = ...
+
+Use this function and the `frenchCode` code tree to decode the bit sequence in `secret`.
+Store the resulting character sequence in `decodedSecret`.
+
+## Encoding
+
+This section deals with the Huffman encoding of a sequence of characters
+into a sequence of bits.
+
+### ...Using a Huffman Tree
+
+Define the function `encode` which encodes a list of characters using
+Huffman coding, given a code tree.
+
+    def encode(tree: CodeTree)(text: List[Char]): List[Bit] = ...
+
+Your implementation must traverse the coding tree for each character,
+a task that should be done using a helper function.
+
+### ...Using a Coding Table
+
+The previous function is simple, but very inefficient. You goal is now
+to define `quickEncode` which encodes an equivalent representation, but
+more efficiently.
+
+    def quickEncode(tree: CodeTree)(text: List[Char]): List[Bit] = ...
+
+Your implementation will build a coding table once which, for each possible
+character, gives the list of bits of its code. The simplest way - but not
+the most efficient - is to encode the table of characters as a list of pairs.
+
+    type CodeTable = List[(Char, List[Bit])]
+
+The encoding must then be done by accessing the table, via a function
+`codeBits`.
+
+    def codeBits(table: CodeTable)(char: Char): List[Bit] = ...
+
+The creation of the table is defined by `convert` which traverses the coding
+tree and constructs the character table.
+
+    def convert(t: CodeTree): CodeTable = ...
+
+Implement the function `convert` by using the function `mergeCodeTables` below:
+
+    def mergeCodeTables(a: CodeTable, b: CodeTable): CodeTable = ...