diff --git a/README.md b/README.md
index 4ce61ffd87e8b747e0d497b410b0cc343778da68..0d2260d06cb8ee46b31b8fb30d9a468f722664f7 100644
--- a/README.md
+++ b/README.md
@@ -68,7 +68,7 @@ My goal is for lectures to be self-contained. The following books contain overla
 | | Wed | 26.10.2022 | 08:15 | INM202 | Labs | [Parser Lab](labs/lab03/) |
 | | Thu | 27.10.2022 | 08:15 | INM202 | Exercises | [Exercises on Operational Semantics](exercises/ex5/) |
 | 7 | Mon | 31.10.2022 | 13:15 | INM200 | Lecture 8 | [Type Rules, Progress, Preservation](https://tube.switch.ch/videos/bdb5c902), [Type Inference](https://tube.switch.ch/videos/14facfc5) |
-| | Wed | 02.11.2022 | 08:15 | INM202 | Labs | Type Checking Lab |
+| | Wed | 02.11.2022 | 08:15 | INM202 | Labs | [Parser Lab](labs/lab03/), [Type Checking Lab](labs/lab04/) |
 | | Thu | 03.11.2022 | 08:15 | INM202 | Exercises | Exercises |
 | 8 | Mon | 07.11.2022 | 13:15 | INM200 | Lecture 9 | Finish [Type Inference](https://tube.switch.ch/videos/14facfc5). [Compilation to Web Assembly](https://tube.switch.ch/videos/fd21d42e) |
 | 9 | Mon | 14.11.2022 | **13:00** | [INM 200](https://plan.epfl.ch/?room==INM%20200) + [SG0211](https://plan.epfl.ch/?room==SG%200211) | **MIDTERM** | Materials of lectures 1-9 |
diff --git a/labs/lab04/lab04-description.md b/labs/lab04/lab04-description.md
new file mode 100644
index 0000000000000000000000000000000000000000..e0cfac3b950d56f8dc948a5a5953cc4d5beb38a7
--- /dev/null
+++ b/labs/lab04/lab04-description.md
@@ -0,0 +1,147 @@
+# Lab 04: Type Checker ([Slides](lab04-slides.pdf))
+
+Parsing concludes the syntactic analysis of Amy programs. Having
+successfully constructed an abstract syntax tree for an input program,
+compilers typically run one or more phases containing checks of a more
+semantic nature. Virtually all high-level programming languages perform
+some form of name analysis, whose purpose is to disambiguate symbol
+references throughout the program. Some languages go further and
+perform a series of additional checks whose goal is to rule out runtime
+errors statically (i.e., during compilation, or in other words, without
+executing the program). While the exact rules for those checks vary from
+language to language, this part of compilation is typically summarized
+as "type checking". Amy, being a statically typed language, requires
+both name and type analysis.
+
+## Prelude: From Nominal to Symbolic Trees
+
+Recall that during parsing we created (abstract syntax) trees of the
+*nominal* sort: names of variables, functions and data types were simply
+stored as strings. However, two names used in the program could be the
+same and yet not refer to one and the same "thing" at runtime. During
+name analysis we translate from nominal trees to symbolic ones, to make
+it clear whether two names refer to one and the same underlying entity.
+That is, we explicitly replace strings by fresh identifiers, which
+prevents us from mixing up definitions of the same name, or referring to
+things that have not been defined. Amy's name analyzer is provided to
+you as part of this lab's skeleton, but you should read the [dedicated
+name analyzer page](material/NameAnalysis.md) to understand how it works.
+
+## Introduction to Type Checking
+
+The purpose of this lab is to implement a type checker for Amy. Our type
+checking rules will prevent certain errors based on the kind or shape of
+the values that the program manipulates. For instance, we should prevent
+an integer from being added to a boolean value.
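+
+For example (a small illustration of our own, not a program from the lab
+skeleton; see the [Amy specification](/labs/amy-specification/amy-specification.pdf)
+for the precise rules), the type checker should reject the first
+expression below and accept the second:
+
+```scala
+1 + true                           // ill-typed: + expects two Int operands, but true is a Bool
+(if (1 < 2) { 1 } else { 2 }) + 3  // well-typed: both branches of the if-expression are Int
+```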
+
+Type checking is the last stage of the compiler frontend. Every program
+that reaches the end of this stage without an error is correct (as far
+as the compiler is concerned), and every program that does not is wrong.
+After type checking, we are finally ready to interpret the program or
+compile it to binary code!
+
+Typing rules for Amy are presented in detail in the
+[Amy specification](/labs/amy-specification/amy-specification.pdf). Make sure to check correct
+typing for all expressions and patterns.
+
+## Implementation
+
+The current assignment focuses on the file `TypeChecker.scala`. As
+usual, the skeleton and helper methods are given to you, and you will
+have to complete the missing parts. In particular, you will write a
+compiler phase that checks whether the expressions in a given program
+are well-typed and reports errors otherwise.
+
+To this end, you will implement a simplified form of the Hindley-Milner
+(HM) type-inference algorithm that you'll hear about during the
+lectures. Note that, while type inference is not advertised as a feature
+to users of Amy, we will perform it behind the scenes. It is usually
+straightforward to adapt an algorithm for type inference to type
+checking, since one can add the user-provided type annotations to the
+set of constraints. This is what you will do with HM in this lab.
+
+Compared to the presentation of HM type inference in class, your type
+checker can be simplified in another way: since Amy does not feature
+higher-order functions or polymorphic data types, types in Amy are
+always *simple* in the sense that they are not composed of arbitrary
+other types. That is, a type is either a base type (one of `Int`, `Bool`
+and `String`) or it is an ADT, which has a proper name (e.g. `List` or
+`Option` from the standard library). In the latter case, all the types
+in the constructor of the ADT are immediately known. For instance, the
+standard library's `List` is really a list of integers, so we know that
+the `Cons` constructor takes an `Int` and another `List`.
+
+As a result, your algorithm will never have to deal with complex
+constraints over type constructors (such as the function arrow
+`A => B`). Instead, your constraints will always be of the form
+`T1 = T2`, where `T1` and `T2` are either *simple* types or type
+variables. This matters most during unification, which otherwise
+would have to deal with complex types separately.
+
+Your task now is to a) complete the `genConstraints` method, which
+traverses a given expression and collects all the necessary typing
+constraints, and b) implement the *unification* algorithm as
+`solveConstraints`.
+
+Familiarize yourself with the `Constraint` and `TypeVariable` data
+structures in `TypeChecker.scala` and then start by implementing
+`genConstraints`. The structure of this method will in many cases be
+analogous to the AST traversal performed by the name analyzer. Note
+that `genConstraints` also takes an *expected type*. For instance, in
+the case of addition, the expected type of both operands should be
+`Int`. For other constructs, such as pattern `match`es, it is not
+immediately clear what the type of each `case` body should be. In such
+cases you can create and pass a fresh type variable.
+
+Once you have a working implementation of both `genConstraints` and
+`solveConstraints`, you can copy over your previous work on the
+interpreter and run the programs produced by your frontend!
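+
+To make constraint solving more concrete, here is a small, self-contained
+Scala sketch of unifying constraints of the form `T1 = T2` over *simple*
+types, as described above. The `Type`, `TypeVariable`, `Constraint` and
+`solveConstraints` definitions below are simplified stand-ins chosen for
+illustration, not the actual API of the skeleton's `TypeChecker.scala`:
+
+```scala
+object UnificationSketch {
+  sealed trait Type
+  case object IntType    extends Type
+  case object BoolType   extends Type
+  case object StringType extends Type
+  case class ClassType(name: String) extends Type // an ADT such as List or Option
+  case class TypeVariable(id: Int)   extends Type
+
+  object TypeVariable {
+    private var next = 0
+    def fresh(): TypeVariable = { next += 1; TypeVariable(next) }
+  }
+
+  // A constraint equates the type we found with the type we expected.
+  case class Constraint(found: Type, expected: Type)
+
+  // Because Amy types are simple, substitution never has to look inside a type.
+  private def subst(tpe: Type, from: Int, to: Type): Type = tpe match {
+    case TypeVariable(`from`) => to
+    case other                => other
+  }
+
+  // Unification: succeeds silently, or fails on the first unsatisfiable constraint.
+  def solveConstraints(constraints: List[Constraint]): Unit = constraints match {
+    case Nil => ()
+    case Constraint(found, expected) :: rest =>
+      (found, expected) match {
+        case (a, b) if a == b => solveConstraints(rest)
+        case (TypeVariable(i), other) =>
+          // Substitute the variable in all remaining constraints and continue.
+          solveConstraints(rest.map(c =>
+            Constraint(subst(c.found, i, other), subst(c.expected, i, other))))
+        case (other, tv: TypeVariable) =>
+          solveConstraints(Constraint(tv, other) :: rest) // orient and retry
+        case (a, b) =>
+          sys.error(s"Type error: expected $b, found $a")
+      }
+  }
+}
+```
+
+For an addition node, for instance, `genConstraints` would emit constraints
+forcing both operand types to be `Int` and equating `Int` with the expected
+type; `solveConstraints` then either discharges all constraints or reports
+the first mismatch.
+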
+Don't forget that, to debug your compiler's behavior, you can also run
+the reference compiler with the `--interpret` flag and compare the
+output.
+
+## Skeleton
+
+As usual, you can find the skeleton for this lab in a new branch of your
+group's repository. After merging it with your existing work, the
+structure of your project's `src` directory should be as follows:
+
+    src/amyc
+    ├── Main.scala (updated)
+    │
+    ├── analyzer (new)
+    │   ├── SymbolTable.scala
+    │   ├── NameAnalyzer.scala
+    │   └── TypeChecker.scala
+    │
+    ├── ast
+    │   ├── Identifier.scala
+    │   ├── Printer.scala
+    │   └── TreeModule.scala
+    │
+    ├── interpreter
+    │   └── Interpreter.scala
+    │
+    ├── lib
+    │   ├── scallion_3.0.6.jar
+    │   └── silex_3.0.6.jar
+    │
+    ├── parsing
+    │   ├── Parser.scala
+    │   ├── Lexer.scala
+    │   └── Tokens.scala
+    │
+    └── utils
+        ├── AmycFatalError.scala
+        ├── Context.scala
+        ├── Document.scala
+        ├── Pipeline.scala
+        ├── Position.scala
+        ├── Reporter.scala
+        └── UniqueCounter.scala
+
+## Deliverables
+Deadline: **Wednesday, November 16, at 11pm**.
+
+Submission: push the solved lab 4 to the branch `clplab4` that was created in your GitLab repository. Do not push the changes to other branches! Doing so may interfere with your previous submissions.
+You may want to copy the files you changed directly to the new branch, since the two branches don't share a history in git.
diff --git a/labs/lab04/lab04-slides.pdf b/labs/lab04/lab04-slides.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..b5e27f988fc4591a83322d59f0ae4a937771c661
Binary files /dev/null and b/labs/lab04/lab04-slides.pdf differ
diff --git a/labs/lab04/material/NameAnalysis.md b/labs/lab04/material/NameAnalysis.md
new file mode 100644
index 0000000000000000000000000000000000000000..4c23df11f0d2bbe92ec8c2c6e4afbd476835ae7c
--- /dev/null
+++ b/labs/lab04/material/NameAnalysis.md
@@ -0,0 +1,33 @@
+# Name Analysis
+
+In the following, we will briefly discuss the purpose and implementation of the name analyzer phase in Amy. Name analysis has three goals:
+ * To reject programs that do not follow the Amy naming rules.
+ * For correct programs, to assign a unique identifier to every name. Remember that trees coming out of the parser contain plain strings wherever a name is expected. This might lead to confusion as to what each name refers to. Therefore, during name analysis, we assign a unique identifier to each name at its definition. Later in the program, every reference to that name will use the same unique identifier.
+ * To populate the symbol table. The symbol table contains a mapping from identifiers to all the information that you might need in later phases for that identifier. For example, for each constructor, the symbol table contains an entry with the argument types, the parent type, and an index for this constructor.
+
+After name analysis, only name-correct programs should survive, and they should contain unique identifiers that correspond to the correct symbol in the program.
+
+You can always look at the expected output of name analysis for a given program by invoking the reference compiler with the `--printNames` option.
+
+## The Symbol Table
+The symbol table contains information for all kinds of entities in the program. In the first half of name analysis, we discover all definitions of symbols, assign each of them a fresh identifier, and store these identifier-definition entries in the symbol table.
+
+The `SymbolTable` API contains three kinds of methods:
+ * `addX` methods, which add a new entry to the symbol table.
+   Among other things, these methods turn the strings found in nominal trees into the fresh `Identifier`s we will use to construct symbolic trees.
+ * `getX` methods, which take an `Identifier` as an argument. This is what you will use to resolve symbols you find in the program, for example during type checking.
+ * `getX` methods, which take two strings as arguments. These are only useful during name analysis and should not be used later: because unique identifiers have not yet been assigned to every definition while name analysis is running, the compiler sometimes needs to look up a definition by its name and the name of its containing module. Of course, you should not use these methods once you already have an identifier (in particular, not during type checking).
+
+## The different tree modules
+It is time to talk in detail about the different tree modules in the `TreeModule.scala` file. As explained earlier, our goal is to define two very similar tree modules, with the only difference being how a (qualified) name is represented: in a *nominal* tree, i.e., one coming out of the parser, names are plain strings and qualified names are pairs of strings. In a *symbolic* tree, on the other hand, both kinds of names are unique identifiers.
+
+To represent either kind of tree, we define a single Scala trait called `TreeModule`, which declares two *abstract type fields*, `Name` and `QualifiedName`. This trait also defines all the types we need to represent Amy ASTs, and many of these types depend on the abstract types.
+
+These abstract types are filled in when we instantiate the trait. Further down in the same file, you can see that we define two objects, `NominalTreeModule` and `SymbolicTreeModule`, which instantiate the abstract types. In addition, all types within `TreeModule` are conceptually defined separately in each of the two implementations. As a result, there is a type called `NominalTreeModule.Ite` which is *different* from the type called `SymbolicTreeModule.Ite`.
+
+## The NameAnalyzer class
+The `NameAnalyzer` class implements Amy's naming rules (section 3.4 of the Amy specification). It takes a nominal program as input and produces a symbol table and a symbolic program.
+
+Name analysis is split into well-defined steps. The idea is the following: we first discover all definitions in the program in the correct order, i.e., modules, types, constructors and, finally, functions. We then rewrite function bodies and expressions to refer to the newly introduced identifiers.
+
+Notice how name analysis takes as input the `NominalTreeModule.Program` output by the Parser, and returns a `SymbolicTreeModule.Program` along with a populated symbol table. During the last step we therefore transform the program and each of its subtrees from `NominalTreeModule.X` into `SymbolicTreeModule.X`. For instance, a `NominalTreeModule.Program` is transformed into a `SymbolicTreeModule.Program`, a `NominalTreeModule.Ite` into a `SymbolicTreeModule.Ite`, and so forth. To save some typing, we have imported `NominalTreeModule` as `N` and `SymbolicTreeModule` as `S`. So, to refer to a `Plus` in the original (nominal) tree module we can simply write `N.Plus`, and to refer to one in the symbolic tree module we can write `S.Plus`.
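+
+To illustrate this design, here is a tiny, self-contained Scala sketch of the abstract-type-member idea and of the nominal-to-symbolic rewriting pattern. It is deliberately reduced to a single `Name` member and three expression shapes; the real `TreeModule.scala` and `NameAnalyzer.scala` are richer and differ in details (error reporting, scoping, qualified names, and so on):
+
+```scala
+object TreeModuleSketch {
+  final class Identifier(val name: String) // stand-in for amyc's Identifier
+
+  // The trait leaves the representation of names abstract.
+  trait TreeModule {
+    type Name // String in nominal trees, Identifier in symbolic trees
+    sealed trait Expr
+    case class Variable(name: Name)       extends Expr
+    case class IntLiteral(value: Int)     extends Expr
+    case class Plus(lhs: Expr, rhs: Expr) extends Expr
+  }
+
+  // Instantiating the abstract type yields two distinct families of trees.
+  object NominalTreeModule  extends TreeModule { type Name = String     }
+  object SymbolicTreeModule extends TreeModule { type Name = Identifier }
+
+  import NominalTreeModule as N
+  import SymbolicTreeModule as S
+
+  // Rewriting a nominal expression into a symbolic one: every plain string is
+  // replaced by the unique Identifier introduced at the name's definition.
+  def transformExpr(expr: N.Expr, locals: Map[String, Identifier]): S.Expr = expr match {
+    case N.IntLiteral(v)  => S.IntLiteral(v)
+    case N.Plus(l, r)     => S.Plus(transformExpr(l, locals), transformExpr(r, locals))
+    case N.Variable(name) =>
+      S.Variable(locals.getOrElse(name, sys.error(s"Undefined variable $name")))
+  }
+}
+```
+
+The benefit of this design is that the tree shapes are written only once: only the meaning of `Name` changes between the nominal and the symbolic world, while `N.Expr` and `S.Expr` remain distinct types that the compiler keeps apart for us.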