Commit 268e7ada authored by Noe Eric De Santo

Add labs 4-6 handouts

parent 45caae84
## Compiler Extension Presentation Instructions
Background presentations will take place in week 14. We strongly
recommend that you pre-record your presentation. **[You should upload
your talk on SwitchTube](https://tube.switch.ch/channels/c1d660a4)**
(the precise channel will be linked here soon). However, if you prefer,
you can also live stream your presentation, but in that case you are
responsible if the presentation does not reach your audience due to
network quality issues.
**The presentation should be 10 minutes long.**
A **Q&A session of 5-10 minutes** will follow right after the
presentation. Please make sure at least one of you is available for the
entire 20-minute slot.
**We would like each member of the group to be part of the
presentation.**
Shortly after, you will receive feedback from us regarding the content
of your presentation, as well as some general feedback on the form.
### Presentation content
Your presentation should summarize your project. In particular, we'd
expect to see:
- a basic overview of the features you added to the compiler/language
- some (short) programs highlighting the use of these features, with a
description of how your extended compiler behaves on them
- possibly some theoretical background you had to learn about to
implement the extension
- an overview of the changes you made to each compiler phase and/or
which phases you added
### Presentation style
Here are some useful resources on how to prepare and give talks:
- [How To Speak by Patrick
Winston](https://www.youtube.com/watch?v=Unzc731iCUY)
- [How to give a great research talk by Simon Peyton
Jones](https://www.microsoft.com/en-us/research/academic-program/give-great-research-talk/)
Please do not use Viktor's videos as a model for the presentation, but
instead incorporate as many points of the talk of [Patrick
Winston](https://en.wikipedia.org/wiki/Patrick_Winston) as you believe
apply to your presentation. It is an amazing and entertaining talk,
despite (or because of) being meta-circular: he does as he says. Note:
breaking physical objects or referring to supernatural beings in your
video is not required. Use your own judgement and strike a balance
between being comfortable with what you say and how you say it, and
trying out these pieces of advice.
### Instructions for video (recording or streaming)
We suggest that the speaker's video be shown when the speaker starts to
speak, so that the audience can relate to and identify the speaker.
Afterwards, the video can be turned off; it should come back on for
questions and answers. Optionally, a small video can stay on throughout
the presentation. The main content of the presentation should be a
window showing the material being presented, for example a PDF that you
can point to and/or annotate. If your hardware allows it, you can also
use a tablet to simulate a blackboard presentation where you write down
everything you present, or use a combination of simple slides and a
strategy for what you will write on them.
**Video upload:** [please upload your video to this
channel](https://tube.switch.ch/channels/c1d660a4) (login with EPFL
credentials)
### Viktor's recording setup
For your information, and not as a requirement, Viktor's lectures are
prepared using the following hardware and software setup on Ubuntu 20:
- slides prepared using the `beamer` LaTeX package
- slides annotated using the `xournal` PDF annotator in full-screen mode
  at a display resolution of 1920x1080
- recording using Zoom, with the following options:
  - screen sharing of the PDF annotator (`xournal`), **without** the
    option to optimize for full-screen viewing
  - local recording, with the option **Optimize for 3rd party video
    editor**
- a Wacom Cintiq Pro display as an external monitor for annotating PDFs
  with a pen
- video segments are cut and assembled using `ffmpeg`, which works very
  fast:
  - cut like this:

    ```
    ffmpeg -i zoom_0.mp4 -ss 00:00:00 -to 00:02:03.00 -c copy mysegment01.mp4
    ```

  - concatenate like this:

    ```
    ffmpeg -f concat -i segmentlist.txt -c copy mycombinedvideo.mp4
    ```

    where `segmentlist.txt` is a file containing one line per file to
    include:

    ```
    file 'mysegment01.mp4'
    file 'mysegment02.mp4'
    file 'mysegment03.mp4'
    ```
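Alternatively, you can use the open-source software OBS (`obs`). When
recording, under the advanced options, you may wish to choose a 1-second
keyframe interval: with `-c copy`, `ffmpeg` copies the streams without
re-encoding, so cuts are only clean at keyframes, and frequent keyframes
make the cut points more precise.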
# Lab 04: Type Checker
Parsing concludes the syntactic analysis of Amy programs. Having
successfully constructed an abstract syntax tree for an input program,
compilers typically run one or more phases that perform checks of a
more semantic nature. Virtually all high-level programming languages
enjoy some form of name analysis, whose purpose is to disambiguate
symbol references throughout the program. Some languages go further and
perform a series of additional checks whose goal is to rule out runtime
errors statically (i.e., during compilation, without executing the
program). While the exact rules for those checks vary from language to
language, this part of compilation is typically summarized as "type
checking". Amy, being a statically-typed language, requires both name
and type analysis.
## Prelude: From Nominal to Symbolic Trees
Recall that during parsing we created (abstract syntax) trees of the
*nominal* sort: names of variables, functions and data types were simply
stored as strings. However, two identical names used in the program do
not necessarily refer to one and the same "thing" at runtime. During
name analysis we translate from nominal trees to symbolic ones, to make
it clear whether two names refer to one and the same underlying entity.
That is, we explicitly replace strings by fresh identifiers, which
prevents us from mixing up different definitions of the same name or
referring to things that have not been defined. Amy's name analyzer is
provided to you as part of this lab's skeleton, but you should read the
[dedicated name analyzer page](name analyzer) to understand how it works.
## Introduction to Type Checking
The purpose of this lab is to implement a type checker for Amy. Our type
checking rules will prevent certain errors based on the kind or shape of
values that the program is manipulating. For instance, we should prevent
an integer from being added to a boolean value.
Type checking is the last stage of the compiler frontend. Every program
that reaches the end of this stage without an error is correct (as far
as the compiler is concerned), and every program that does not is wrong.
After type checking we are finally ready to interpret the program or
compile it to binary code!
Typing rules for Amy are presented in detail in the
[Amy specification](amy_specification.md). Make sure to check correct
typing for all expressions and patterns.
## Implementation
The current assignment focuses on the file `TypeChecker.scala`. As
usual, the skeleton and helper methods are given to you, and you will
have to complete the missing parts. In particular, you will write a
compiler phase that checks whether the expressions in a given program
are well-typed and report errors otherwise.
To this end you will implement a simplified form of the Hindley-Milner
(HM) type-inference algorithm that you'll hear about during the
lectures. Note that, although it is not advertised as a feature to users
of Amy, we will perform type inference behind the scenes. It is usually
straightforward to adapt an algorithm for type inference to type
checking, since one can add the user-provided type annotations to the
set of constraints. This is what you will do with HM in this lab.
Compared to the presentation of HM type inference in class, your type
checker can be simplified in another way: since Amy does not feature
higher-order functions or polymorphic data types, types in Amy are
always *simple* in the sense that they are not composed of arbitrary
other types. That is, a type is either a base type (one of `Int`, `Bool`
and `String`) or it is an ADT, which has a proper name (e.g. `List` or
`Option` from the standard library). In the latter case, all the types
in the constructors of the ADT are immediately known. For instance, the
standard library's `List` is really a list of integers, so we know that
the `Cons` constructor takes an `Int` and another `List`.
As a result, your algorithm will never have to deal with complex
constraints over type constructors (such as the function arrow
`A => B`). Instead, your constraints will always be of the form
`T1 = T2` where `T1` and `T2` are either *simple* types or type
variables. This is most important during unification, which otherwise
would have to deal with complex types separately.
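Since every constraint relates two simple types or type variables, the data a solver manipulates stays very small. The following Scala sketch shows one possible model; the names `IntType`, `BooleanType`, `StringType`, `ClassType`, `ConstrSig` and `TypesDemo` are hypothetical and will not match the skeleton's `TypeChecker.scala` exactly. It also illustrates how the fixed signature of `Cons` directly yields constraints on its arguments.

```scala
// Hypothetical model of Amy's simple types and equality constraints;
// the skeleton's actual definitions differ.
sealed trait Type
case object IntType extends Type
case object BooleanType extends Type
case object StringType extends Type
// An ADT is identified by its name alone, e.g. ClassType("List").
final case class ClassType(name: String) extends Type
// Placeholder for a type that is not known yet.
final case class TypeVariable(id: Int) extends Type

// A constraint `found = expected` between two simple types or variables.
final case class Constraint(found: Type, expected: Type)

// Constructor signatures are fully known up front: there is no polymorphism.
final case class ConstrSig(argTypes: List[Type], parent: ClassType)

object TypesDemo {
  // The standard library's Cons takes an Int and another List.
  val consSig: ConstrSig = ConstrSig(List(IntType, ClassType("List")), ClassType("List"))

  // Constraints for the arguments of `Cons(h, t)`, given the (possibly
  // still unknown) types of h and t.
  def consArgConstraints(hType: Type, tType: Type): List[Constraint] =
    List(hType, tType).zip(consSig.argTypes).map { case (f, e) => Constraint(f, e) }
}
```

No constraint in this model ever mentions a type built from other types (such as `A => B`), which is exactly what keeps unification simple.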
Your task now is to a) complete the `genConstraints` method which will
traverse a given expression and collect all the necessary typing
constraints, and b) implement the *unification* algorithm as
`solveConstraints`.
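As a rough illustration of the unification step, here is a minimal sketch of a solver for equality constraints. It assumes a hypothetical type model (`IntType`, `BooleanType`, `StringType`, `ClassType`, `TypeVariable`) that need not match the skeleton, and it is not the skeleton's `solveConstraints`: it returns either a substitution for the type variables or an error message, rather than reporting through the compiler's `Reporter`. Because Amy types are never composed of other types, a type variable can never occur inside a non-variable type, so the usual occurs check is unnecessary.

```scala
// Hypothetical, simplified unifier for constraints T1 = T2 over simple types.
sealed trait Type
case object IntType extends Type
case object BooleanType extends Type
case object StringType extends Type
final case class ClassType(name: String) extends Type
final case class TypeVariable(id: Int) extends Type

final case class Constraint(found: Type, expected: Type)

object Unifier {
  // Replace every occurrence of type variable `tv` by `by` in the constraints.
  private def subst(cs: List[Constraint], tv: TypeVariable, by: Type): List[Constraint] = {
    def s(t: Type): Type = if (t == tv) by else t
    cs.map(c => Constraint(s(c.found), s(c.expected)))
  }

  // Right(substitution) on success, Left(message) on the first mismatch.
  def solve(cs: List[Constraint]): Either[String, Map[TypeVariable, Type]] = {
    @annotation.tailrec
    def loop(cs: List[Constraint], sol: Map[TypeVariable, Type]): Either[String, Map[TypeVariable, Type]] =
      cs match {
        case Nil => Right(sol)
        case Constraint(a, b) :: rest if a == b =>
          loop(rest, sol) // trivially satisfied
        case Constraint(tv: TypeVariable, t) :: rest =>
          // Bind tv := t in the remaining constraints and in earlier bindings.
          val updated = sol.map { case (k, v) => k -> (if (v == tv) t else v) }
          loop(subst(rest, tv, t), updated + (tv -> t))
        case Constraint(t, tv: TypeVariable) :: rest =>
          loop(Constraint(tv, t) :: rest, sol) // put the variable on the left
        case Constraint(a, b) :: _ =>
          Left(s"Type error: found $a, expected $b")
      }
    loop(cs, Map.empty)
  }
}
```

For example, solving `a = b` and `b = Int` binds both `a` and `b` to `IntType`, while `Int = Boolean` yields a `Left`.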
Familiarize yourself with the `Constraint` and `TypeVariable` data
structures in `TypeChecker.scala` and then start by implementing
`genConstraints`. The structure of this method will in many cases be
analogous to the AST traversal performed by the (provided) name analyzer.
Note that `genConstraints` also takes an *expected type*. For instance,
in the case of addition, the expected type of both operands should be
`Int`. For other constructs, such as pattern `match`es, it is not
inherently clear what the type of each `case` body should be. In this
case you can create and pass a fresh type variable.
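To make the role of the expected type concrete, here is a hedged sketch of constraint generation for a tiny, hypothetical expression language. It is not Amy's `TreeModule` AST; the names `Expr`, `Equal`, `Let`, `ConstraintGen` and the plain `Map[String, Type]` environment are assumptions. Equality shows the fresh-variable idea: both operands must have the same type, but that type is not known in advance. The `Let` case shows how a user-written type annotation simply becomes the expected type of the bound value, which is how type checking falls out of type inference.

```scala
// Hypothetical types, constraints and expressions for illustration only.
sealed trait Type
case object IntType extends Type
case object BooleanType extends Type
final case class TypeVariable(id: Int) extends Type

final case class Constraint(found: Type, expected: Type)

sealed trait Expr
final case class IntLit(value: Int) extends Expr
final case class BoolLit(value: Boolean) extends Expr
final case class Var(name: String) extends Expr
final case class Plus(lhs: Expr, rhs: Expr) extends Expr
final case class Equal(lhs: Expr, rhs: Expr) extends Expr
// val name: tpe = value; body
final case class Let(name: String, tpe: Type, value: Expr, body: Expr) extends Expr

object ConstraintGen {
  private var counter = 0
  // A type variable that does not appear anywhere else yet.
  def fresh(): TypeVariable = { counter += 1; TypeVariable(counter) }

  // Constraints stating that `e` has type `expected` under environment `env`.
  def genConstraints(e: Expr, expected: Type, env: Map[String, Type]): List[Constraint] =
    e match {
      case IntLit(_)  => List(Constraint(IntType, expected))
      case BoolLit(_) => List(Constraint(BooleanType, expected))
      case Var(n)     => List(Constraint(env(n), expected)) // assumes name analysis succeeded
      case Plus(l, r) =>
        // Operands must be Int, and so is the result.
        Constraint(IntType, expected) ::
          (genConstraints(l, IntType, env) ++ genConstraints(r, IntType, env))
      case Equal(l, r) =>
        // Operands must share some unknown type: one fresh variable for both.
        val t = fresh()
        Constraint(BooleanType, expected) ::
          (genConstraints(l, t, env) ++ genConstraints(r, t, env))
      case Let(n, tpe, value, body) =>
        // The annotation is just the expected type of the bound value.
        genConstraints(value, tpe, env) ++ genConstraints(body, expected, env + (n -> tpe))
    }
}
```

For example, `Let("x", IntType, IntLit(1), Var("x"))` checked against `BooleanType` yields the constraints `Int = Int` and `Int = Boolean`, the second of which a unifier would reject.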
Once you have a working implementation of both `genConstraints` and
`solveConstraints` you can copy over your previous work on the
interpreter and run the programs produced by your frontend! Don't
forget that to debug your compiler's behavior you can also use the
reference compiler with the `--interpret` flag and then compare the
output.
## Skeleton
As usual, you can find the skeleton for this lab in a new branch of your
group's repository. After merging it with your existing work, the
structure of your project `src` directory should be as follows:

```
src/amyc
├── Main.scala (updated)
├── analyzer (new)
│ ├── SymbolTable.scala
│ ├── NameAnalyzer.scala
│ └── TypeChecker.scala
├── ast
│ ├── Identifier.scala
│ ├── Printer.scala
│ └── TreeModule.scala
├── interpreter
│ └── Interpreter.scala
├── lib
│ ├── scallion_3.0.6.jar
│ └── silex_3.0.6.jar
├── parsing
│ ├── Parser.scala
│ ├── Lexer.scala
│ └── Tokens.scala
└── utils
├── AmycFatalError.scala
├── Context.scala
├── Document.scala
├── Pipeline.scala
├── Position.scala
├── Reporter.scala
└── UniqueCounter.scala
```
## Deliverables
You are given **3 weeks** for this assignment.
Deadline: **TBD**.
Submission: one team member submits a zip file submission-groupNumber.zip to the [moodle submission page]().
# Lab 05: Code Generation
## Introduction
Welcome to the last common assignment for the Amy compiler. At this
point, we are finally done with the frontend: we have translated source
programs to ASTs and have checked that all correctness conditions hold
for our program. We are ready to generate code for our program. In our
case the target language will be *WebAssembly*.
WebAssembly is \"a new portable, size- and load-time-efficient format
suitable for compilation to the web\" (<http://webassembly.org>).
WebAssembly is designed to be called from JavaScript in browsers and
lends itself to highly-performant execution.
For simplicity, we will not use a browser, but execute the resulting
WebAssembly bytecode directly using `nodejs`, which is essentially a
standalone distribution of V8, the Chrome browser's JavaScript engine.
When you run your complete compiler (or the reference compiler) with no
options on program `p`, it will generate the following files under the
`wasmout` directory:
- `p.wat` is the wasm output of the compiler in text format. You can
use this representation to debug your generated code.
- `p.wasm` is the binary output of the compiler. This is what `nodejs`
will use. To translate to the binary format, we use the `wat2wasm`
tool provided by the WebAssembly developers. For your convenience we
have included it in the `bin` directory of the skeleton. Note that
this tool performs a purely mechanical translation and thus its
output (for instance, `p.wasm`) corresponds to a binary
representation of `p.wat`.
- `p.js` is a JavaScript wrapper, which we run with nodejs and which
  serves as an entry point into your generated binary.

To run the program, simply type `nodejs wasmout/p.js`.
### Installing nodejs
- You can find directions for your favorite operating system
[here](https://nodejs.org/en/). You should have nodejs 12 or later
(run `nodejs --version` to make sure).
- Once you have installed nodejs, run `npm install deasync` from the
directory you plan to run `amyc` in, i.e. the toplevel directory of
the compiler.
- Make sure the `wat2wasm` executable is visible, i.e. it is in the
system path or you are at the toplevel of the `amyc` directory.
## WebAssembly and Amy
Look at [this
presentation](http://lara.epfl.ch/~gschmid/clp20/codegen.pdf) for the
main concepts of how to translate Amy programs to WebAssembly.
You can find the annotated compiler output to the concat example
[here](http://lara.epfl.ch/~gschmid/clp20/concat.wat).
## The assignment code
### Overview
The code for the assignment is divided into two directories: `wasm` for
the modeling of the WebAssembly framework, and `codegen` for
Amy-specific code generation. There is a lot of code here, but your task
is only to implement code generation for Amy expressions within
`codegen/CodeGen.scala`.
- `wasm/Instructions.scala` provides types that describe a subset of
WebAssembly instructions. It also provides a type `Code` to describe
sequences of instructions. You can chain multiple instructions or
`Code` objects together to generate a longer `Code` with the `<:>`
operator.
- `wasm/Function.scala` describes a wasm function.
  - `LocalsHandler` is an object which will create fresh indexes for
    local variables as needed.
  - A `Function` contains a field called `isMain` which is used to
    denote a main function without a return value, which will be
    handled differently when printing, and will be exported to
    JavaScript.
  - The only way to create a `Function` is using `Function.apply`.
    Its last argument is a function from a `LocalsHandler` to
    `Code`. The reason for this unusual choice is to make sure the
    `Function` object is instantiated with the number of local
    variables that will be requested from the `LocalsHandler`. To see
    how it is used, you can look in `codegen/Utils.scala` (but you
    won't have to use it directly).
- `wasm/Module.scala` and `wasm/ModulePrinter.scala` describe a wasm
module, which you can think of as a set of functions and the
corresponding module headers.
- `codegen/Utils.scala` contains a few utility functions (which you
should use!) and implementations of the built-in functions of Amy.
Use the built-ins as examples.
- `codegen/CodeGen.scala` is the focus of the assignment. It contains
code to translate Amy modules, functions and expressions to wasm
code. It is a pipeline and returns a wasm Module.
- `codegen/CodePrinter.scala` is a Pipeline which will print output
files from the wasm module.
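The `<:>` chaining and the unusual `Function.apply` signature described above can be illustrated with a simplified, hypothetical model. The instruction names (`Const`, `Add`, `GetLocal`, `SetLocal`), the `getFreshLocal` method, the class name `WasmFunction` (a stand-in for the skeleton's `Function`) and its constructor arguments are assumptions for illustration; consult `wasm/Instructions.scala` and `wasm/Function.scala` for the real definitions. The point is the ordering inside `apply`: the code generator runs first, so the number of locals it requested is known before the function object is built.

```scala
// Simplified, hypothetical stand-ins for the skeleton's wasm classes.
sealed trait Instruction {
  // Chaining an instruction with another instruction or a Code yields a Code.
  def <:>(other: Instruction): Code = Code(List(this, other))
  def <:>(other: Code): Code = Code(this :: other.instrs)
}
final case class Const(value: Int) extends Instruction
case object Add extends Instruction
final case class GetLocal(index: Int) extends Instruction
final case class SetLocal(index: Int) extends Instruction

// A sequence of instructions that can be extended with `<:>`.
final case class Code(instrs: List[Instruction]) {
  def <:>(other: Instruction): Code = Code(instrs :+ other)
  def <:>(other: Code): Code = Code(instrs ++ other.instrs)
}

// Hands out fresh local indices; indices below `numArgs` are the parameters.
final class LocalsHandler(numArgs: Int) {
  private var next = numArgs
  def getFreshLocal(): Int = { val i = next; next += 1; i }
  def localsRequested: Int = next - numArgs
}

// The private constructor forces callers through WasmFunction.apply.
final class WasmFunction private (val name: String, val args: Int, val locals: Int, val code: Code)

object WasmFunction {
  def apply(name: String, args: Int)(codeGen: LocalsHandler => Code): WasmFunction = {
    val lh = new LocalsHandler(args)
    val code = codeGen(lh) // runs first, so lh knows how many locals were requested
    new WasmFunction(name, args, lh.localsRequested, code)
  }
}

object CodeDemo {
  // inc(x): store x + 1 in a temporary local, then push it again.
  val inc: WasmFunction = WasmFunction("inc", 1) { lh =>
    val tmp = lh.getFreshLocal()
    GetLocal(0) <:> Const(1) <:> Add <:> SetLocal(tmp) <:> GetLocal(tmp)
  }
}
```

Here `CodeDemo.inc` ends up with one extra local (index 1) and the instruction sequence `GetLocal(0), Const(1), Add, SetLocal(1), GetLocal(1)`.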
### The cgExpr function
The focus of this assignment is the `cgExpr` function, which takes an
expression and generates a `Code` object. It also takes two additional
arguments: (1) a `LocalsHandler`, which you can use to get a new slot for
a local when you encounter a local variable or need a temporary
variable for your computation; (2) a map `locals` from `Identifier`s to
local slots, i.e. indices, in the wasm world. For example, if `locals`
contains a pair `i -> 4`, we know that `get_local 4` in wasm will push
the value of `i` to the stack. Notice how `locals` is instantiated with
the function parameters in `cgFunction`.
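Here is a rough, hypothetical sketch of how `locals` and the `LocalsHandler` cooperate when compiling a local definition. To stay self-contained it uses its own tiny expression language, a plain `LocalsHandler` class with an assumed `getFreshLocal` method, string keys instead of `Identifier`s, and a `List[String]` of wasm text instructions instead of the skeleton's `Code`. It is not the expected shape of your `cgExpr`.

```scala
// Hypothetical sketch: a tiny expression language compiled to wasm text
// instructions (as strings), using the get_local/set_local spelling above.
sealed trait Expr
final case class IntLit(value: Int) extends Expr
final case class Var(name: String) extends Expr
final case class Plus(lhs: Expr, rhs: Expr) extends Expr
// val name = value; body
final case class Let(name: String, value: Expr, body: Expr) extends Expr

// Hands out fresh local slots, starting after the already-used ones.
final class LocalsHandler(firstFree: Int) {
  private var next = firstFree
  def getFreshLocal(): Int = { val i = next; next += 1; i }
}

object CodeGenSketch {
  def cgExpr(e: Expr, lh: LocalsHandler, locals: Map[String, Int]): List[String] = e match {
    case IntLit(v) => List(s"i32.const $v")
    case Var(n)    => List(s"get_local ${locals(n)}") // push the variable's value
    case Plus(l, r) =>
      // Evaluate both operands onto the stack, then add them.
      cgExpr(l, lh, locals) ++ cgExpr(r, lh, locals) :+ "i32.add"
    case Let(n, value, body) =>
      // Store the value in a fresh slot, then compile the body with `n` mapped to it.
      val slot = lh.getFreshLocal()
      cgExpr(value, lh, locals) ++ List(s"set_local $slot") ++
        cgExpr(body, lh, locals + (n -> slot))
  }
}
```

For `val y = x + 1; y + y` in a function whose single parameter `x` occupies slot 0, this produces `get_local 0`, `i32.const 1`, `i32.add`, `set_local 1`, `get_local 1`, `get_local 1`, `i32.add`.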
## Skeleton
As usual, you can find the skeleton for this lab in a new branch of your
group's repository. After merging it with your existing work, the
structure of your project `src` directory should be as follows:

```
src/amyc
├── Main.scala (updated)
├── analyzer
│ ├── SymbolTable.scala
│ ├── NameAnalyzer.scala
│ └── TypeChecker.scala
├── ast
│ ├── Identifier.scala
│ ├── Printer.scala
│ └── TreeModule.scala
├── bin
│ └── ...
├── codegen (new)
│ ├── CodeGen.scala
│ ├── CodePrinter.scala
│ └── Utils.scala
├── interpreter
│ └── Interpreter.scala
├── lib
│ ├── scallion_3.0.6.jar
│ └── silex_3.0.6.jar
├── parsing
│ ├── Parser.scala
│ ├── Lexer.scala
│ └── Tokens.scala
├── utils
│ ├── AmycFatalError.scala
│ ├── Context.scala
│ ├── Document.scala
│ ├── Pipeline.scala
│ ├── Position.scala
│ ├── Reporter.scala
│ └── UniqueCounter.scala
└── wasm (new)
├── Function.scala
├── Instructions.scala
├── ModulePrinter.scala
└── Module.scala
```
## Deliverables
You are given **4 weeks** for this assignment.
Deadline: **TBD**.
Submission: one team member submits a zip file submission-groupNumber.zip to the [moodle submission page]().
# Lab 06: Compiler Extension Project
You have now written a compiler for Amy, a simple functional language.
The final lab project is to design and implement new functionality of
your own choice on top of the compiler you have built so far. In
preparation for this, you should learn about the problem domain by
searching the appropriate literature. The project includes:
- designing and implementing the new functionality
- documenting the results in a written report document
This project has several deadlines, detailed below. Please note that the
first of them (choosing the topic) is already coming up on Sunday!
## Selecting a Project Topic
**Deadline: TBD**
In the following document, we list several project ideas, but you should
also feel free to submit your own by email. All groups will rank the
projects in order of preference, and we will then do our best to assign
the preferred projects to as many groups as possible. Because not all
projects are equally difficult, we annotated each of them with the
expected workload. The suggested projects cover a wide range of
complexity, and we will evaluate your submissions with that complexity
in mind. For instance, for a project marked with `(1)` (relatively low
complexity) we will be expecting a polished, well-tested and
well-documented extension, whereas projects on the other end (`(3)`) may
be more prototypical. For all submissions, however, we require that you
deliver code that compiles and a set of example input files that
demonstrate the new functionality.
[Project ideas](labs06_material/extensions.pdf)
To announce your preferences, [please fill out this form by Sunday at
the latest](). You'll have to provide the names of **exactly 5**
projects you would like to work on, in descending order of preference.
We will do our best to assign you the project you are most interested
in.
## Project Orientation
**Deadline: TBD**
We will try to inform you about the project assignment as soon as
possible. To give you a chance to validate your understanding of the
project and what's expected of you, we will offer dedicated slots
during the project sessions next week. Before you join, you should think
about the following questions:
- What are the features you will add to the compiler/language?
- What would be some (short) programs highlighting the use of these
features?
- What changes might be required in each compiler phase and/or what
new phases would you add? (Very roughly)
**TODO: define slots**
## Project Presentation
You will present your idea during the lab sessions in the last regular
week of the semester (Dec 16th/22nd/23rd). We'll announce the concrete
schedule of presentations at a later point. [Instructions on what and
how to present your project can be found here.](labs06_material/presentation.md)
## Project Implementation and Report
**Deadline: Jan 7th 2021 23h00**
Your implementation and a report are due on this date, and both will be
delivered using Git. You will develop your project on top of your
implementation of Amy. Please push all development to a new branch
`lab06`, ideally building on top of the codegen lab (branch `lab05`).
**TODO: define submission method**
Your repository should contain:
- Your implementation, which must, to be graded at all, compile and be
able to run non-trivial examples.
- A subdirectory `extension-examples/` which includes some examples
that demonstrate your compiler extension in action.
- A subdirectory `report/` which includes a PDF summarizing your
extension.
**If you did not manage to complete your planned features, or they are
partially implemented, make this clear in your report!**
You are encouraged to use the following (LaTeX) template for your
report:
- [LaTeX sources](labs06_material/report-template.tar.gz)
A PDF version of the template with the required sections is available
here:
- [PDF Example](labs06_material/report-template.pdf)
Although you are not required to use the above template, your report
must contain at least the sections described in it with the appropriate
information. Note that writing this report will take some time, and you
should not leave it to the last minute. The final report is an important
part of the compiler project. If you have questions about the template
or the contents of the report, make sure you ask them early.
A common question is "how long should the report be?". There's no
definitive answer to that. Considering that the report will contain code
examples and a technical description of your implementation, it would be
surprising if it were shorter than 3 pages. Please try to stay within 6
pages. A concise, but well-written report is preferable to a long, but
poorly-written one.