Commit 468dcbf0 authored by Fatih Yazici

Remove outdated extensions pdf

parent ebe3cd22
\section{Alternative frontends/backends}
This section contains projects that do not modify the language features of \langname,
but change the implementation of a part of the \langname compiler frontend or backend.
\subsection{Code formatter (1)}
Build a code formatter for the Amy language.
A straightforward way to accomplish this would be to add a special mode
(e.g.\ \lstinline{--format}) that the user can start the compiler in.
It would then only run the existing pipeline up to, say, parsing and subsequently go
through a special pretty-printing phase that outputs the program according to code style
rules configurable by the user.
%
A more sophisticated version could instead work on the token level, allowing your
formatter to be aware of whitespace, e.g., respecting newlines that a user inserted.
%
In any case, you will have to maintain comments (which are not part of the AST).
You can look at \lstinline{scalafmt} for some inspiration.
\subsection{Language Server (2)}
Implement a language server for Amy. VSCode and similar IDEs use the \href{https://en.wikipedia.org/wiki/Language_Server_Protocol}{language server protocol} to communicate with compilers and thereby provide deep integration with a variety of programming languages. Among other things, this enables features like showing type-checking errors within the editor, jumping to definitions and looking up all usages of a given definition. Your goal is to provide such functionality for Amy by implementing an additional mode in your compiler, in which it acts as a server for the language server protocol. Your implementation should be demonstrable with VSCode and include the aforementioned features. You can use an existing library, such as \href{https://github.com/eclipse/lsp4j}{LSP4J}, to simplify your task.
\subsection{Formalization of Amy (1)}
Develop an operational semantics for Amy and use your definitions along with Amy's typing rules to prove type safety. Note that this might require you to do some additional reading on type systems.
\subsection{JVM backend (2)}
Implement an alternative backend for \langname which outputs JVM bytecode.
You can use \href{https://github.com/psuter/cafebabe}{this library}.
You first have to think about how to represent \langname values in a class-based environment,
and then generate the respective bytecode from \langname ASTs.
\subsection{C backend (3)}
Implement an alternative backend for \langname which outputs C code.
You have to think about how to represent \langname values in C,
and then generate the respective C code from \langname ASTs.
pdflatex extensions
\newcommand{\CEGIS}{\textsf{CEGIS}}
\newcommand{\TerminalRule}{\textsf{Terminal}}
\newcommand{\Search}{\textsf{Search}}
\newcommand{\Verify}{\textsf{Verify}}
\newcommand{\Enumerate}{\textsf{Enumerate}}
\newcommand{\from}{\mathbin{\leftarrow}}
\newcommand{\union}{\mathbin{\cup}}
\newcommand{\expt}{\mathcal{E}}
\newcommand{\Expansions}{\mathcal{E}}
\newcommand{\prob}[1]{\operatorname{Pr}[#1]}
\newcommand{\R}{\mathbb{R}}
\newcommand{\Land}{\bigwedge}
\newcommand{\cost}{\operatorname{cost}}
\newcommand{\horizon}{h}
\newcommand{\score}{score}
\newtheorem{thm}{Theorem}[section]
\newcommand{\smartparagraph}[1]{\noindent\textbf{#1}}
\newcommand{\sparagraph}[1]{\noindent\textbf{#1}}
\newcommand{\TODO}[1]{\marginpar{\color{red}TODO}{\color{red}#1}\xspace}
% Name calling
\newcommand{\leon}{Leon\xspace}
\newcommand{\leonsyn}{LeonSyn\xspace}
\newcommand{\ourcegis}{STE\xspace} % DONE : find a non-ridiculous name
\newcommand{\ourca}{CA\xspace}
\newcommand{\andor}{\textsc{and/or}\xspace}
\newcommand{\insynth}{InSynth\xspace}
% General math
\newcommand{\ALL}[2]{\ensuremath{\forall #1 :~ #2}}
\newcommand{\EX}[2]{\ensuremath{\exists #1 :~ #2}}
\newcommand{\seq}[1]{\ensuremath{\bar{#1}}}
\newcommand{\seqa}{\seq{a}\xspace}
\newcommand{\seqx}{\seq{x}\xspace}
\newcommand{\seqt}{\seq{T}}
\newcommand{\seqg}{\seq{G}}
\newcommand{\seqr}{\seq{r}}
\newcommand{\varsof}[1]{\ensuremath{\text{vars}(#1)}}
\newcommand{\splus}{\ensuremath{\mathop{,}}} % separator in sequences
% Synthesis framework
\newcommand{\br}[4]{\ensuremath{\left\llbracket #1 \ \left\langle #2 \rhd #3 \right\rangle \ #4\right\rrbracket}}
\newcommand{\pg}[2]{\langle {#1} \mid {#2} \rangle}
\newcommand{\similar}[1]{\ensuremath{G(\textsf{#1})}}
\newcommand{\similarr}[2]{\ensuremath{G_{#2}(\textsf{#1})}}
\newcommand{\prename}{\ensuremath{P}\xspace}
\newcommand{\pcname}{\ensuremath{\Pi}\xspace}
\newcommand{\pgname}{\seqt}
\newcommand{\inputs}{\mathcal{I}}
\newcommand{\pgite}[3]{\ensuremath{\text{\textsf{if(}}#1\text{\textsf{) \{}}#2\text{\textsf{\} else \{}}#3\text{\textsf{\}}}}}
\newcommand{\pglet}[3]{\ensuremath{\text{\textsf{val}} \ #1 \colonequals #2 \text{\textsf{;}} \ #3}}
\newcommand{\match}[2]{\ensuremath{\text{#1\textsf{ match \{ }}#2\text{\textsf{\}}}}}
\newcommand{\mcase}[2]{\ensuremath{\text{\textsf{ case }}#1 \Rightarrow #2}}
\newcommand{\code}[1]{\text{\textsf{#1}}}
\newcommand{\guide}[1]{\ensuremath{\odot\mkern-4mu\left[#1\right]}}
\newcommand{\terminates}[1]{\ensuremath{\Downarrow\mkern-4mu\left[#1\right]}}
% Listing-like things.
\newcommand{\cl}[1]{\lstinline[mathescape]@#1@}
\newcommand{\clnoat}[1]{\lstinline[mathescape]!#1!}
\newcommand{\mcl}[1]{\ensuremath{\mathsf{#1}}}
% \newcommand{\mcl}[1]{\ensuremath{\text{\lstinline{#1}}}}
\newcommand{\choosesym}{\cl{choose}}
% Hoare triples
\newcommand{\HoareTriple}[3]{
\begin{displaymath}
\left\{\begin{array}{l}#1\end{array}\right\}
\begin{array}{l}#2\end{array}
\left\{\begin{array}{l}#3\end{array}\right\}
\end{displaymath}
}
\newcommand{\hoareTriple}[3]{\{$#1$\} $#2$ \{$#3$\}}
\newcommand{\btrue}{\mcl{true}}
\newcommand{\gpo}{::=}
\newcommand{\gnt}[1]{~#1~}
\newcommand{\gt}[1]{\text{\tt \textbf{~#1~}}}
\newcommand{\gtns}[1]{\text{\tt \textbf{#1}}}
\newcommand{\FIXME}[1]{ {\color{red} FIXME: #1}}
\newcommand\langname{Amy\xspace}
\section{Execution}
This section suggests projects that change how \langname code is executed.
\subsection{Memory deallocation (3)}
Allow explicit memory deallocation by the user.
\begin{lstlisting}
val x: List = Cons(1, Nil());
length(x); // OK
free(x);
length(x) // Wrong, might return garbage
\end{lstlisting}
When an object in linear memory is freed, the space it used to occupy
is considered free and can be allocated again.
Any further reference to the freed object is undefined behavior.
You need to change how memory allocation works in code generation
to maintain a list of free blocks,
which will no longer be one contiguous region at the end of the memory.
The list should not be external,
but rather implemented in the memory itself:
each free block needs to contain a pointer to the next one.
Each block will also need to record its size.
This means that free blocks have to be of size at least 2 words.
When you allocate an object, you need to look through the list
of blocks for one that fits and if none does,
the program should fail.
Make sure you always modify the free list in the simplest way possible,
i.e. the blocks in the list don't have to be in the same order as in memory.
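The free-list layout described above can be sketched as a small Scala simulation.
This is only an illustration under assumptions, not the code your compiler has to emit:
memory is modelled as an \lstinline{Array[Int]} of words, every block starts with a size word,
a free block stores the address of the next free block in its second word,
and the names \lstinline{FreeListDemo}, \lstinline{NoBlock}, \lstinline{alloc} and \lstinline{free} are hypothetical.
\begin{lstlisting}
object FreeListDemo {
  val NoBlock = -1                   // hypothetical "null" address
  val mem = new Array[Int](32)       // linear memory, one Int per word
  var freeHead = 0                   // address of the first free block

  // Initially the whole memory is a single free block: [size, next].
  mem(0) = mem.length
  mem(1) = NoBlock

  // First-fit allocation of n payload words (plus one header word for the size).
  def alloc(n: Int): Int = {
    val need = math.max(n + 1, 2)    // must be able to hold [size, next] once freed
    var prev = NoBlock
    var cur = freeHead
    while (cur != NoBlock && mem(cur) < need) { prev = cur; cur = mem(cur + 1) }
    if (cur == NoBlock) sys.error("out of memory")
    val size = mem(cur)
    val next = mem(cur + 1)
    // Split if the remainder can still form a free block; else hand out the whole block.
    val replacement =
      if (size - need >= 2) {
        val rest = cur + need
        mem(rest) = size - need
        mem(rest + 1) = next
        mem(cur) = need
        rest
      } else next
    if (prev == NoBlock) freeHead = replacement else mem(prev + 1) = replacement
    cur + 1                          // the payload starts after the size header
  }

  // Freeing pushes the block onto the front of the list (the simplest update).
  def free(payload: Int): Unit = {
    val block = payload - 1
    mem(block + 1) = freeHead
    freeHead = block
  }
}
\end{lstlisting}
Pushing freed blocks onto the front keeps \lstinline{free} constant-time, at the price of a list whose order no longer follows memory order.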
\subsection{Lazy evaluation (1-2)}
Change the evaluation strategy of Amy to lazy evaluation.
Only input and output are evaluated strictly.
\begin{lstlisting}
val x: Int = (Std.printInt(42); 0); // Nothing happens
val y: Int = x + 1; // Still nothing...
Std.printInt(y); // 42 and 1 are printed
val l: List = Cons(1, Cons(2, Cons(error("lazy"), Nil())));
// No error is thrown
l match {
case Nil() => () // At this point, we evaluate l just enough
// to know it is a Cons
case Cons(h, t) => Std.printInt(h) // Prints 1
case Cons(h1, Cons(h2, Cons(h3, _))) =>
// Still no error...
Std.printInt(h3)
// This forces evaluation of the third list element
// and an error is thrown!
}
// We can do neat things like define infinite lists, i.e. streams
def countFrom(start: Int): List = Cons(start, countFrom(start + 1))
Std.printString(L.listToString(
take(countFrom(0), 5)
)) // Will terminate and print `List(0, 1, 2, 3, 4)'
\end{lstlisting}
Each value is not evaluated until it is required.
Things that are not evaluated have to live in the runtime state as \emph{thunks},
or suspensions to be evaluated later.
A thunk is essentially a closure (see Section~\ref{closures}) with memoization:
it is either an already calculated value,
or an expression to be evaluated and an evaluation environment.
In turn, an evaluation environment is a mapping from identifiers to other thunks.
You have to make sure that pattern matching only evaluates expressions as much as needed.
Maybe \href{https://en.wikibooks.org/wiki/Haskell/Laziness#Thunks_and_Weak_head_normal_form}{this}
will help you understand the concept.
For simplicity, you can implement lazy evaluation directly in the interpreter, i.e., as an extension of the first lab (1 person).
If you implement lazy evaluation for the WebAssembly-backend, you can work on this project as a team of two. (Note that this variant might be significantly harder.)
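To make the notion of a memoizing thunk concrete, here is a minimal Scala sketch.
It is an assumption-laden illustration rather than a prescribed design:
the class \lstinline{Thunk} and the counter \lstinline{evaluations} are hypothetical names,
and the evaluation environment is captured implicitly by the Scala function value instead of being stored as an explicit mapping.
\begin{lstlisting}
object ThunkDemo {
  var evaluations = 0   // counts how often a suspended body actually runs

  // A thunk is either a suspended computation or its cached value.
  final class Thunk[A](body: () => A) {
    private var cached: Option[A] = None
    def force(): A = cached match {
      case Some(v) => v              // already evaluated: reuse the value
      case None =>
        val v = body()               // evaluate on first demand only
        cached = Some(v)
        v
    }
  }

  // Mirrors `val x = (...; 0)` and `val y = x + 1`: nothing runs yet.
  val x = new Thunk(() => { evaluations += 1; 0 })
  val y = new Thunk(() => x.force() + 1)
}
\end{lstlisting}
Forcing \lstinline{y} forces \lstinline{x} exactly once; forcing \lstinline{y} again reuses the cached result.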
\subsection{Final code optimizations (1+)}
Optimize the WebAssembly binary produced by your \langname compiler.
The simplest thing you can do is eliminate some obvious redundancies such as
\begin{minipage}{0.49\textwidth}
\begin{lstlisting}
i32.const 0
if (result i32)
e1
else
e2
end
// equivalent to e2
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.49\textwidth}
\begin{lstlisting}
if (result i32)
i32.const 1
else
i32.const 0
end
// completely redundant
\end{lstlisting}
\end{minipage}
Preferably, you can implement a control flow analysis and some abstract
interpretations to implement more advanced optimizations,
also involving local parameters. This would involve a larger group.
You can have a look at \href{https://cs420.epfl.ch/archive/18/s/acc18_07_optimizations.pdf}{these slides}
for some ideas on optimization.
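The two redundancies shown above could be removed by a peephole pass along the following lines.
This is a sketch under assumptions: the instruction type below is a hypothetical, simplified stand-in and not amyc's WebAssembly representation,
and the second rule is only sound if the condition on the stack is already known to be 0 or 1.
\begin{lstlisting}
object PeepholeDemo {
  sealed trait Instr
  final case class Const(v: Int) extends Instr
  final case class Other(text: String) extends Instr   // anything we do not inspect
  // Structured if: consumes the condition on top of the stack.
  final case class If(thenn: List[Instr], elze: List[Instr]) extends Instr

  def simplify(code: List[Instr]): List[Instr] = code match {
    // A constant condition selects one branch statically.
    case Const(c) :: If(t, e) :: rest =>
      simplify((if (c != 0) t else e) ::: rest)
    // `if 1 else 0` leaves a 0/1 condition unchanged (assumes a Boolean condition).
    case If(List(Const(1)), List(Const(0))) :: rest =>
      simplify(rest)
    case If(t, e) :: rest =>
      If(simplify(t), simplify(e)) :: simplify(rest)
    case i :: rest =>
      i :: simplify(rest)
    case Nil =>
      Nil
  }
}
\end{lstlisting}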
\subsection{Tail call optimization (1)}
Implement tail call optimization for \langname.
Tail-recursive functions should not create any additional stack frames,
i.e., they should not use the \lstinline{call} instruction for the recursive call.
A way to implement tail recursive functions is to do a source-to-source transformation
which transforms tail recursive functions to loops. You will need to define new ASTs.
When it comes to tail calls that are not tail recursion,
things are tougher. If you feel like also handling those cases,
look \href{https://cs420.epfl.ch/archive/18/s/acc18_10_tail-calls.pdf}{here} for ideas.
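For the tail-recursive case, the loop transformation can be illustrated in Scala as follows.
The function names are made up for this example; the point is only that parameters become mutable locals
and the tail call becomes a simultaneous reassignment followed by another iteration.
\begin{lstlisting}
object TailCallDemo {
  // Tail-recursive original: the recursive call is the last action.
  def sumTo(n: Int, acc: Int): Int =
    if (n == 0) acc else sumTo(n - 1, acc + n)

  // Loop form that a source-to-source transformation might produce.
  def sumToLoop(n0: Int, acc0: Int): Int = {
    var n = n0
    var acc = acc0
    while (n != 0) {
      // Compute all new arguments from the old values before assigning.
      val nextN = n - 1
      val nextAcc = acc + n
      n = nextN
      acc = nextAcc
    }
    acc
  }
}
\end{lstlisting}
Computing the new arguments into temporaries first matters whenever one argument expression mentions another parameter.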
\subsection{Foreign-function interface (FFI) to JavaScript (2)}
Design a cross-language interaction layer between \langname and JavaScript.
At a minimum you should support calling JavaScript functions with primitive parameter- and
result-types from Amy. You can also consider supporting calls from JavaScript into Amy.
You will have to decide how WebAssembly representations of Amy objects should map to
JavaScript objects.
To ensure that programs can be meaningfully type-checked, you should add syntax for
\lstinline{external} functions, e.g.
\begin{lstlisting}
object FS {
external def open(path: String): Int
external def read(fd: Int): String
// ...
}
object Example {
val f: Int = FS.open("/home/foo/hello.txt") // open file
Std.printString(FS.read(f)) // print contents of hello.txt
}
\end{lstlisting}
Conversely, Amy functions exposed to JavaScript could be annotated with an \lstinline{export}
keyword.
Ideally you will also demonstrate your FFI's capabilities by wrapping some NodeJS or browser
APIs and exposing them to Amy.
For instance, you might expose the file system API of NodeJS, thus allowing Amy programs to
read from and write to files.
Another idea is to adapt the HTML wrapper file that we provide with the compiler and use the
FFI to write an interactive browser application in Amy.
A more sophisticated version of this project (for three people) would also support foreign
functions involving case classes such as \lstinline{List}.
\subsection{REPL: Read-Eval-Print Loop (3)}
Implement a REPL for \langname.
It should support defining classes, functions and local variables, and evaluating expressions.
You don't have to support redefinitions. You can take a look at the Scala REPL for inspiration.
\subsection{Virtual machine (3)}
Develop your own VM to run WebAssembly code!
To simplify things, you will implement the VM in Scala.
Your VM should take as input
a wasm \lstinline{Module} from amyc's \lstinline{CodeGen} Pipeline
(so you don't need to implement a parser from wasm text or binary)
and execute the code contained within.
Despite using Scala, you still need to follow the VM execution model as much as possible:
translate labels to addresses,
use an array for the memory, a stack for execution etc.
You can set the VM parameters, such as memory size, any way you like,
and hard-code built-in functions that are not already implemented in WebAssembly.
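As a rough idea of the execution model, the following Scala sketch runs a tiny stack machine.
It rests on strong assumptions: the instruction set is a hypothetical toy (it is neither real WebAssembly nor amyc's \lstinline{Module}),
and branching is reduced to a single conditional jump to a label.
It does show the ingredients mentioned above: labels resolved to addresses, an operand stack and a program counter.
\begin{lstlisting}
object VmDemo {
  sealed trait Instr
  final case class Const(v: Int) extends Instr
  case object Add extends Instr
  case object Sub extends Instr
  final case class GetLocal(i: Int) extends Instr
  final case class SetLocal(i: Int) extends Instr
  final case class Label(name: String) extends Instr
  final case class BrIf(label: String) extends Instr  // pop; jump if non-zero

  def run(code: Vector[Instr], locals: Array[Int]): Int = {
    // Resolve every label to its address once, before execution.
    val addr = code.zipWithIndex.collect { case (Label(n), i) => n -> i }.toMap
    val stack = scala.collection.mutable.ArrayBuffer.empty[Int]
    def pop(): Int = stack.remove(stack.length - 1)
    var pc = 0
    while (pc < code.length) {
      code(pc) match {
        case Const(v)    => stack += v
        case Add         => val b = pop(); val a = pop(); stack += a + b
        case Sub         => val b = pop(); val a = pop(); stack += a - b
        case GetLocal(i) => stack += locals(i)
        case SetLocal(i) => locals(i) = pop()
        case Label(_)    => ()
        case BrIf(l)     => if (pop() != 0) pc = addr(l)
      }
      pc += 1
    }
    pop()   // the result is whatever remains on top of the stack
  }

  // Sums n + (n-1) + ... + 1, with n in local 0 and the accumulator in local 1.
  val sumProgram: Vector[Instr] = Vector(
    Label("loop"),
    GetLocal(1), GetLocal(0), Add, SetLocal(1),
    GetLocal(0), Const(1), Sub, SetLocal(0),
    GetLocal(0), BrIf("loop"),
    GetLocal(1)
  )
}
\end{lstlisting}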
File deleted
\documentclass[]{article}
%\settopmatter{printfolios=true}
% For final camera-ready submission
% \documentclass[acmlarge]{acmart}
% \settopmatter{}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{defs}
\usepackage{listings}
\usepackage{stmaryrd}
\usepackage{xcolor}
\usepackage{xspace}
\usepackage[colorlinks]{hyperref}
\hypersetup{urlcolor=cyan}
\usepackage{caption} % Link to beginning of figures
\usepackage{mathpartir}
%\usepackage{subcaption}
\input{scalalistings}
\title{Compiler Extensions for \langname}
\date{Computer Language Processing\\~\\LARA\\~\\Autumn 2019}
\begin{document}
\maketitle
\section{Introduction}
In this document you will find some compiler extension ideas
for the last assignment of the semester.
The ideas are grouped in sections based on the broader subject they cover.
Next to its title, every extension indicates the maximum size of the group
that is allowed to take it up.
Some assignments suggest additional features which allow the group to include
additional members.
\section{Your own idea!}
We will be very happy to discuss an idea you come up with yourselves.
\input{features.tex}
\input{types}
\input{alternatives}
\input{execution}
\end{document}
\ No newline at end of file
\section{Language features}
Projects in this section extend \langname by adding a new language feature.
To implement one of these projects, you will probably need to modify
every stage of the compiler, from lexer to code generation.
If the project is too hard, you might be allowed to skip the code generation
part and only implement the runtime of your project in the interpreter.
\subsection{Imperative features (2+)}
With the exception of input/output, \langname is a purely functional language:
none of its expressions allow side effects.
Your task for this project is to add imperative language features to Amy.
These should include:
\begin{itemize}
\item Mutable local variables.
\begin{lstlisting}
var i: Int;
var j: Int = 0;
i = j;
j = i + 1;
Std.printInt(i);
Std.printInt(j) // prints 0, 1
\end{lstlisting}
Make sure your name analysis prevents plain \lstinline{val}s from being mutated.
\item While loops.
\begin{lstlisting}
def fact(n: Int): Int = {
var res: Int = 1;
var j: Int = n;
while(1 < j) {
res = res * j;
j = j - 1
};
res
}
\end{lstlisting}
\item \emph{Bonus:} Arrays.
You should support at least array initialization,
indexing and extracting array length.
If you add this feature,
you can add an additional member to the group.
\end{itemize}
\subsection{Implicit parameters (1)}
Much like Scala, this feature allows functions to take implicit parameters.
\begin{lstlisting}
def foo(i: Int)(implicit b: Boolean): Int = {
if (i <= 0 && !b) { i }
else { foo(i - 1) + i } // good, implicit parameter in scope
}
foo(1)(true); // good, argument explicitly provided
foo(1); // bad, no implicit in scope
implicit val b: Boolean = true;
foo(1); // good, implicit in scope
// equivalent to foo(1)(b)
implicit val b2: Boolean = false;
foo(1) // Bad, two boolean implicits in scope.
\end{lstlisting}
When a function that takes an implicit parameter is called
and the implicit argument is not explicitly provided,
the compiler will look at the scope of the call for an implicit
variable/parameter definition of the same type.
If exactly one such definition is found, the compiler
will complete the call with the defined variable/parameter.
If more than one or no such definitions are found,
the compiler will fail the program with
``implicit parameter conflict'' or ``no implicit found''
errors respectively.
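The resolution rule can be sketched in a few lines of Scala.
This is only an illustration under assumptions: the scope is modelled as a list of (name, type) pairs with types as plain strings,
and \lstinline{resolve} is a hypothetical helper, not part of amyc.
\begin{lstlisting}
object ImplicitDemo {
  // scope: the implicit definitions visible at the call site, as (name, type).
  def resolve(tpe: String, scope: List[(String, String)]): Either[String, String] =
    scope.filter(_._2 == tpe) match {
      case List((name, _)) => Right(name)   // exactly one candidate: use it
      case Nil             => Left("no implicit found")
      case _               => Left("implicit parameter conflict")
    }
}
\end{lstlisting}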
\subsection{Implicit conversions (1)}
Much like Scala, this feature allows specified functions to act as implicit conversions.
\begin{lstlisting}
implicit def i2b(i: Int): Boolean = { !(i == 0) }
2 || false // Good, returns true
def foo(b: Boolean): List = { ... }
foo(42) // Also good
1 + true // Bad, no implicit in scope.
implicit def b2s(b: Boolean): String = { ... }
1 ++ "Hello" // Bad, we cannot apply two conversions
\end{lstlisting}
An implicit conversion is a function with the qualifier \lstinline{implicit}.
It must have a single parameter.
At any point in the program, when an expression \lstinline{e} of type \lstinline{T1} is found
but one of type \lstinline{T2} is expected,
the compiler searches the current module for an implicit conversion
of type \lstinline{(T1) => T2}.
If exactly one such conversion \lstinline{f} is found,
the compiler will replace \lstinline{e} with \lstinline{f(e)}
(and the program typechecks).
If multiple such conversions are found,
the compiler fails with an ambiguous implicit error.
If none is found, an ordinary type error is emitted.
Only a single conversion is allowed to apply to an expression.
For instance, in the example above, we cannot implicitly apply
\lstinline{i2b} and then \lstinline{b2s} to get a \lstinline{String}.
\subsection{Tuples (1)}
Add support for tuples in \langname. You should support
tuple types, literals, and patterns:
\begin{lstlisting}
def maybeNeg(v: (Int, Boolean)): (Int, Boolean) = { // Type
v match {
case (i, false) => // pattern
(i, false) // literal
case (i, true) =>
(-i, false)
}
}
\end{lstlisting}
There are two ways you could approach this problem:
\begin{itemize}
\item Treat tuples as built-in language features. In this case,
you need to support tuples of arbitrary size.
\item Desugar tuples into case classes. A phase after
parsing and before name analysis will transform all tuples
to specified library classes, e.g. \lstinline{Tuple2, Tuple3} etc.
In this case, you cannot support tuples of arbitrary size,
but you still need to support all sizes up to, say, 10.
With this approach, you don't have to modify any compiler phases
from the name analysis onwards,
except maybe to print error messages that make sense to the user.
\end{itemize}
\subsection{Improved string support (1+)}
Improve string support for \langname.
As a starting point, you can add functionality like substring, length, and replace,
which will require you to write auxiliary WebAssembly or JavaScript code.
To avoid adding additional trees,
you can represent these functions as built-in methods in \lstinline{Std}.
If you want a more elaborate project for a larger group,
you can add \lstinline{Char} as a built-in type,
which opens the door for additional functionality with Strings.
You can also implement
\href{https://docs.scala-lang.org/overviews/core/string-interpolation.html}{string interpolation}.
In general, look at Java/Scala strings for inspiration.
\subsection{Higher-order functions (2+, challenging)}
\label{closures}
Add support for higher-order functions to \langname.
You need to support function types and anonymous functions.
\begin{lstlisting}
def compose(f: Int => Int, g: Int => Int): Int => Int = {
(x: Int) => f(g(x))
}
compose((x: Int) => x + 1, (y: Int) => y * 2)(5) // returns 11
def map(f: Int => Int, l: List): List = {
l match {
case Nil() => Nil()
case Cons(h, t) => Cons(f(h), map(f, t))
}
}
map( (x: Int) => x + 1, Cons(1, Cons(2, Cons(3, Nil()))) )
// Returns List(2, 3, 4)
def foo(): Int => Int = {
val i: Int = 1;
val res: Int => Int = (x: Int) => x + i
// Problem! How do we access i from within res?
res
}
foo()(42) // Returns 43
\end{lstlisting}
You have to think about how to represent higher-order functions at runtime.
In a bytecode setting,
a first approach is to represent a higher-order function as a pointer to
a named function, which is then called indirectly.
You have to read about tables and indirect calls in WebAssembly.
This works fine for \lstinline{compose} or \lstinline{map} above,
but not for \lstinline{foo}.
The problem is that higher-order functions can refer to variables in their scope,
like \lstinline{res} above refers to \lstinline{i}.
The set of those variables is called the \emph{environment} of the function.
If its environment is empty, the function is called \emph{closed}.
Above, we have no way to refer to \lstinline{i} from within \lstinline{res}
at runtime:
\lstinline{i} is in the frame of \lstinline{foo} which is not accessible in \lstinline{res}.
In fact, by the time we need \lstinline{i},
\lstinline{foo} may have returned and its frame disappeared!
The way to solve this problem is a technique called \emph{closure conversion}.
The idea is the following:
At runtime, a function is represented as a \emph{closure},
i.e. a function pointer along with the environment it captures from its scope.
When we create a closure at runtime, we create a pair of values in memory,
one of which points to the code (which will be a function)
and the other to the environment,
which will be a list of the captured variables.
When we call the function,
we really call the function pointer in the closure.
We need to make sure to extract and somehow pass to the function pointer
its environment from the other pointer.
You can find a detailed explanation of closure conversion
\href{https://cs420.epfl.ch/s/acc17_05_closure-conversion.pdf}{here}.
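The representation described above can be mimicked in Scala as a rough sketch.
All names here (\lstinline{Closure}, \lstinline{table}, \lstinline{call}) are hypothetical,
the function table stands in for a WebAssembly table, and the environment is simplified to an array of integers.
\begin{lstlisting}
object ClosureDemo {
  // A closure pairs a code pointer (an index into the function table)
  // with the values captured from the defining scope.
  final case class Closure(code: Int, env: Array[Int])

  // Converted function bodies take the environment as an extra first argument.
  val table: Vector[(Array[Int], Int) => Int] = Vector(
    // body of `(x: Int) => x + i`, with i stored in env(0)
    (env: Array[Int], x: Int) => x + env(0)
  )

  // Converted `foo`: builds a closure that captures the current value of i.
  def foo(): Closure = {
    val i = 1
    Closure(0, Array(i))
  }

  // An indirect call looks up the code pointer and passes the environment along.
  def call(c: Closure, arg: Int): Int = table(c.code)(c.env, arg)
}
\end{lstlisting}
Because the captured value lives in the closure rather than in the frame of \lstinline{foo}, it is still available after \lstinline{foo} has returned.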
In the interpreter, things are simpler in both cases:
you can define a new value type \lstinline{FunctionValue}
which contains all necessary information.
In fact, you should probably start here as an exercise.
For your project, we recommend that you assume
all functions in the source code are closed,
but if you are motivated to implement closure conversion,
we will allow an additional group member.
\subsection{Custom operators (2)}
Allow the user to define operators.
\begin{lstlisting}
operator def :::(l1: List, l2: List): List = {
l1 match {
case Nil() => l2
case Cons(h, t) => Cons(h, t ::: l2)
}
}
Cons(1, Cons(2, Nil())) ::: Cons(3, Nil()) // returns List(1, 2, 3)
\end{lstlisting}
You can choose specific priorities for the operators based e.g. on their first character,
or you can allow the user to define them;
e.g. \lstinline{operator 55 def :::(...)} could signify
that \lstinline{:::} has a precedence between \lstinline{+} and \lstinline{*}
(with \lstinline{||} having 10, up to \lstinline{*} having 60).
You can also choose to have built-in binary operators of \langname
subsumed by this project. Of course, their implementation
will be left to be hard-coded by the compiler backend:
\begin{lstlisting}
operator 50 def +(i1: Int, i2: Int): Int = { error("+") }
\end{lstlisting}
In any case, your parser will be in no position to know
what operators are available in your program before actually parsing it.
Therefore, when you have more than one operator in a row,
your parser will just have to parse the tree as a flat sequence
of operand, operator, operand, \ldots,
and then fix the mess afterwards.
Of course other solutions are welcome.
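One possible way to fix up the flat sequence afterwards is precedence climbing, sketched below in Scala.
This is one option under assumptions, not the required solution: the tree types and the precedence table are hypothetical,
every operator is treated as left-associative, and the value 50 for \lstinline{+} is taken from the example above.
\begin{lstlisting}
object OperatorDemo {
  sealed trait Expr
  final case class Lit(v: Int) extends Expr
  final case class Op(name: String, lhs: Expr, rhs: Expr) extends Expr

  // Hypothetical table, as it might be collected from `operator` definitions.
  val prec: Map[String, Int] = Map("||" -> 10, "+" -> 50, ":::" -> 55, "*" -> 60)

  // Rebuilds a tree from the flat sequence `first op1 e1 op2 e2 ...`.
  def fix(first: Expr, ops: List[(String, Expr)]): Expr = {
    // Consumes operators of precedence >= minPrec; returns tree and leftover.
    def climb(lhs: Expr, rest: List[(String, Expr)], minPrec: Int)
        : (Expr, List[(String, Expr)]) =
      rest match {
        case (op, rhs) :: tail if prec(op) >= minPrec =>
          // Let tighter-binding operators to the right grab rhs first.
          val (rhs2, tail2) = climb(rhs, tail, prec(op) + 1)
          climb(Op(op, lhs, rhs2), tail2, minPrec)
        case _ => (lhs, rest)
      }
    climb(first, ops, 0)._1
  }
}
\end{lstlisting}
For the sequence \lstinline{1 + 2 * 3 + 4} this groups the multiplication first and then associates the additions to the left.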
\subsection{Improved Parameters (2)}
Add support for named and default parameters for functions and classes.
If a value for a parameter with a default value is not given,
the compiler fills in the default value.
One can choose to explicitly name parameters when calling a function/constructor,
which also allows reordering:
\begin{lstlisting}
def foo(i: Int, j: Int = 42): Int = { i + j }
foo(1) // OK, j has default value
foo(i = 5, j = 7) // OK
foo(j = 5, i = 7) // OK, can reorder named parameters
foo(i = 7) // OK
foo(j = 7) // Wrong, i has no default value
foo() // Wrong, i has no default value
def foo(i: Int = 5, j: Int): Int = { i + j }
// Wrong, default parameters have to be at the end
// Similarly for case classes
case class Foo(i: Int, j: Int = 42) extends Bar
\end{lstlisting}
Notice that names for case class parameters are currently not preserved in the AST,
which you will have to change.
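How a call with named arguments might be completed can be sketched as follows.
The sketch makes simplifying assumptions: arguments and defaults are plain integers rather than expressions,
all arguments are passed by name, and \lstinline{Param} and \lstinline{complete} are hypothetical names.
\begin{lstlisting}
object ParamsDemo {
  final case class Param(name: String, default: Option[Int])

  // Returns the arguments in declaration order, or an error message.
  def complete(params: List[Param], named: Map[String, Int]): Either[String, List[Int]] = {
    val unknown = named.keySet -- params.map(_.name)
    if (unknown.nonEmpty) Left("unknown parameter " + unknown.head)
    else params.foldRight[Either[String, List[Int]]](Right(Nil)) { (p, acc) =>
      // Prefer the explicit argument, then fall back to the default.
      named.get(p.name).orElse(p.default) match {
        case Some(v) => acc.map(v :: _)
        case None    => Left("no value for parameter " + p.name)
      }
    }
  }

  // Signature of `def foo(i: Int, j: Int = 42)` from the example.
  val fooParams = List(Param("i", None), Param("j", Some(42)))
}
\end{lstlisting}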
% \subsection{Regular expressions (1+)}
% Add support for regular expressions to \langname. You can use syntax similar to Java/Scala.
% The size of the group depends on the number of features you want to implement.
\subsection{List comprehensions (2)}
Extend Amy with list comprehensions, which allow programmers to succinctly express
transformations of \lstinline{List}s.
\begin{lstlisting}
val xs: L.List = L.Cons(1, L.Cons(2, L.Cons(3, L.Nil())));
val ys: L.List = [ 2*x for x in xs if x % 2 != 0 ];
Std.printString(L.toString(ys)) // [2, 6]
\end{lstlisting}
Your list-comprehension syntax should support enumerating elements from one or
multiple lists, filtering them with arbitrary conditions, and mapping them through arbitrary expressions.
It is up to you to decide whether to treat these comprehensions as primitives in your compiler.
If you do so, you will have a dedicated AST node for comprehensions in the entire compiler
pipeline and generate specific code or interpret them accordingly in the end.
Alternatively, you can \emph{desugar} list comprehensions earlier in the pipeline, e.g.\
right after (or during) parsing. You could, for instance, generate auxiliary functions that
compute the result of the list comprehension and are called in place of the comprehensions.
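For the comprehension in the example above, a desugaring phase might generate an auxiliary function of roughly this shape.
It is written in Scala for illustration and uses Scala's built-in lists; the name \lstinline{comp0} is hypothetical.
\begin{lstlisting}
object ComprehensionDemo {
  // Auxiliary function for `[ 2*x for x in xs if x % 2 != 0 ]`.
  def comp0(xs: List[Int]): List[Int] = xs match {
    case Nil => Nil
    case x :: rest =>
      if (x % 2 != 0) (2 * x) :: comp0(rest)   // filter holds: map the element
      else comp0(rest)                         // filter fails: skip it
  }
}
\end{lstlisting}
The comprehension itself would then be replaced by a call such as \lstinline{comp0(xs)}.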
\subsection{Inlining (1+)}
Implement inlining on the AST level, that is, allow users to force the compiler to inline certain
functions and perform optimizations on the resulting AST.
\begin{lstlisting}
inline def abs(n: Int): Int = { if (n < 0) { -n } else { n } }
abs(123); // inlined and constant-folded to `123'
abs(-456); // inlined and constant-folded to `456'
// inlined, not cf-ed; careful with side-effects!
abs(Std.readInt())
\end{lstlisting}
Inlining is effective when we can expect optimizations to make code significantly more
efficient given additional information on function arguments. At a minimum, you would add an
\lstinline{inline} qualifier for function definitions and perform \emph{constant folding} on
inlined function bodies.
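A minimal sketch of inlining followed by constant folding, in Scala, could look as follows.
The tiny AST is a hypothetical stand-in for amyc's trees,
and the substitution shown is only safe when the argument is a literal or a variable, i.e., free of side effects.
\begin{lstlisting}
object FoldDemo {
  sealed trait Expr
  final case class IntLit(v: Int) extends Expr
  final case class BoolLit(b: Boolean) extends Expr
  final case class Var(name: String) extends Expr
  final case class Neg(e: Expr) extends Expr
  final case class Lt(l: Expr, r: Expr) extends Expr
  final case class Ite(c: Expr, t: Expr, e: Expr) extends Expr

  // Inlining step: replace the parameter by the argument in the body.
  def subst(body: Expr, param: String, arg: Expr): Expr = body match {
    case Var(`param`) => arg
    case Neg(e)       => Neg(subst(e, param, arg))
    case Lt(l, r)     => Lt(subst(l, param, arg), subst(r, param, arg))
    case Ite(c, t, e) =>
      Ite(subst(c, param, arg), subst(t, param, arg), subst(e, param, arg))
    case other        => other
  }

  // Constant folding: evaluate whatever is known at compile time.
  def fold(e: Expr): Expr = e match {
    case Neg(x) => fold(x) match {
      case IntLit(v) => IntLit(-v)
      case fx        => Neg(fx)
    }
    case Lt(l, r) => (fold(l), fold(r)) match {
      case (IntLit(a), IntLit(b)) => BoolLit(a < b)
      case (fl, fr)               => Lt(fl, fr)
    }
    case Ite(c, t, el) => fold(c) match {
      case BoolLit(true)  => fold(t)
      case BoolLit(false) => fold(el)
      case fc             => Ite(fc, fold(t), fold(el))
    }
    case other => other
  }

  // Body of abs: if (n < 0) -n else n
  val absBody: Expr = Ite(Lt(Var("n"), IntLit(0)), Neg(Var("n")), Var("n"))
}
\end{lstlisting}
Folding the body after substituting a literal leaves a single literal, while a non-constant argument leaves the conditional in place.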
Inlining is particularly useful when applied to auxiliary functions that only exist for
clarity. While inlining can lead to \emph{code explosion} when applied too liberally, note that
inlining a non-recursive function that is only called in a single location will strictly reduce
code size and potentially lead to more efficient code.
This makes it very attractive to \emph{automatically} apply inlining to such functions:
\begin{lstlisting}
def foo(n: Int): Int = {
def plus1(n: Int): Int = { n + 1 }
inline def times2(n: Int): Int = { 2 * n }
plus1(times2(times2(n))) // inlined and cf-ed to `4 * n + 1'
}
def bar(): Int = {
def fib(n: Int): Int = {
if (n <= 2) { 1 }
else { fib(n-2) + fib(n-1) }
}
fib(10) // should *not* be automatically inlined
}
\end{lstlisting}
To incentivize the user to break functions down into the composition of many auxiliary functions
we can introduce \emph{local function definitions}. That is, the user may define a function within
a function. For this project it is sufficient to support local functions that only have access
to their own parameters and locals, but not to the surrounding function's parameters or locals.
This project is for two people if you choose to also implement local function definitions,
and for one person otherwise.
\ No newline at end of file
%% To import in the preamble
%\usepackage{listings}
\usepackage{letltxmacro}
\newcommand*{\SavedLstInline}{}
\LetLtxMacro\SavedLstInline\lstinline
\DeclareRobustCommand*{\lstinline}{%
\ifmmode
\let\SavedBGroup\bgroup
\def\bgroup{%
\let\bgroup\SavedBGroup
\hbox\bgroup
}%
\fi
\SavedLstInline
}
\lstdefinelanguage{ML}{
alsoletter={*},
morekeywords={datatype, of, if, *},
sensitive=true,
morecomment=[s]{/*}{*/},
morestring=[b]"
}
% "define" Scala
\lstdefinelanguage{scala}{
alsoletter={@,=,>},
morekeywords={abstract, Boolean, case, class, def,
else, error, extends, false, free, if, implicit, Int, match,
object, operator, String, true, Unit, val, var, while,
for, in, inline, array, external, export},
sensitive=true,
morecomment=[l]{//},
morecomment=[s]{/*}{*/},
morestring=[b]"
}
% \newcommand{\codestyle}{\tiny\sffamily}
\newcommand{\codestyle}{\ttfamily}
\newcommand{\SAND}{\mbox{\tt \&\&}\xspace}
\newcommand{\SOR}{\mbox{\tt ||}\xspace}
\newcommand{\MOD}{\mbox{\tt \%}\xspace}
\newcommand{\DIV}{\mbox{\tt /}\xspace}
\newcommand{\PP}{\mbox{\tt ++}\xspace}
\newcommand{\MM}{\mbox{\tt {-}{-}}\xspace}
\newcommand{\RA}{\Rightarrow}
\newcommand{\EQ}{\mbox{\tt ==}}
\newcommand{\NEQ}{\mbox{\tt !=}}
\newcommand{\SLE}{\ensuremath{\leq}}
\newcommand{\SGE}{\ensuremath{\geq}}
\newcommand{\SGT}{\mbox{\tt >}}
\newcommand{\SLT}{\mbox{\tt <}}
\newcommand{\rA}{\rightarrow}
\newcommand{\lA}{\leftarrow}
%============================
% To make it colorful uncomment \color in next 30 lines
%\makeatletter
%\newcommand*\idstyle{%
% \expandafter\id@style\the\lst@token\relax
%}
%\def\id@style#1#2\relax{%
% \ifcat#1\relax\else
% \ifnum`#1=\uccode`#1\color{blue!60!black}
% \fi
% \fi
%}
\makeatother
% Default settings for code listings
\lstset{
language=scala,
showstringspaces=false,
columns=fullflexible,
mathescape=true,
numbers=none,
% numberstyle=\tiny,
basicstyle=\codestyle,
keywordstyle=\bfseries\color{blue!60!black}
,
commentstyle=\itshape\color{red!60!black}
,
%identifierstyle=\idstyle,
tabsize=2%,
%aboveskip=0pt,
%belowskip=0pt
}
\section{Type systems}
\subsection{Polymorphic types (2)}
Allow polymorphic types for functions and classes.
\begin{lstlisting}
abstract class List[A]
case class Nil[A]() extends List[A]
case class Cons[A](h: A, t: List[A]) extends List[A]
def length[A](l: List[A]): Int = {
l match {
case Nil() => 0
case Cons(_, t) => 1 + length(t)
}
}
case class Cons2[A, B](h1: A, h2: B, t: List[A]) extends List[A]
// Wrong, type parameters don't match
\end{lstlisting}
You can assume that the sequence of type parameters of an extending class
is identical to that of the parent in the \lstinline{extends} clause
(see the example).
\subsection{Case class subtyping (2)}
Add subtyping support to \langname.
Case classes are now types of their own:
\begin{lstlisting}
val y: Some = Some(0); // Correct, Some is a type
val x: Option = None(); // Correct, because None <: Option
val z: Some = None(); // Wrong
y match {
case Some(i) => () // Correct
case None() => () // Wrong
}
\end{lstlisting}
Since case classes are types, you can declare a variable, parameter,
ADT field or function return type to be of a case class type,
like any other type.
Case class types are subtypes of their parent (abstract class) type.
This means you can assign case class values to variables
declared with the parent type.
Since we have subtyping, you can now optionally support the \lstinline{Nothing}
type in source code, which is a subtype of every type
and the type of \lstinline{error} expressions.
For this project you will probably rewrite the type checking phase in its entirety.
Rather than dealing with explicit constraints, the resulting phase could perform
more classical type-checking based on the minimal type satisfying all the local
subtyping constraints (the so-called \emph{least-upper bound}).
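For the two-level hierarchy of abstract classes and case classes, a least upper bound could be computed roughly as follows.
This Scala sketch is an assumption-laden illustration: types are plain strings,
the class table \lstinline{parent} is hypothetical, and \lstinline{Nothing} is treated as a subtype of everything.
\begin{lstlisting}
object LubDemo {
  // Hypothetical class table: each case class maps to its abstract parent.
  val parent: Map[String, String] = Map("Some" -> "Option", "None" -> "Option")

  def isSubtype(a: String, b: String): Boolean =
    a == b || a == "Nothing" || parent.get(a).contains(b)

  // Least upper bound of two types, if one exists.
  def lub(a: String, b: String): Option[String] =
    if (isSubtype(a, b)) Some(b)
    else if (isSubtype(b, a)) Some(a)
    else (parent.get(a), parent.get(b)) match {
      case (Some(pa), Some(pb)) if pa == pb => Some(pa)  // siblings share a parent
      case _                                => None       // unrelated: type error
    }
}
\end{lstlisting}
A match whose branches have types \lstinline{Some} and \lstinline{None} would thus be given the type \lstinline{Option}.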
\subsection{Arrays and range types}
In both of the following two projects you would add fixed-size arrays of integers as
a primitive language feature along with a type system that allows users to specify
the range of integers.
The information about an integer's range can then be used to make array accesses safe
by ensuring that indices are in-bounds.
The difference between the two projects lies in \emph{when} integer bounds are checked,
i.e., at compile-time (\emph{statically}) or at runtime (\emph{dynamically}).
In either case you will add two kinds of types:
First, a family of primitive types \lstinline{array[$n$]} that represent integer arrays of
size $n$.
Second, \emph{range types} that represent subsets of \lstinline{Int} and take the
following form:
\lstinline{[$i$ .. $j$]}, where $i$ and $j$ are integer constants.
The intended semantics is for \lstinline{[$i$ .. $j$]} to represent a signed 32-bit integer
$n$ such that $i \le n \le j$.
\subsubsection{Dynamically-checked range types (2)}
Your type system should allow users to specify \emph{concrete} ranges, e.g.,
\lstinline{[0 .. 7]} to denote integers $0 \le n \le 7$. Values of \lstinline{Int} and
any range types will be compatible during type-checking, but your system will have to be
able to detect when an integer might not fall within a given range at runtime.
During code generation your task will then be to emit \emph{runtime checks} to ensure
that, e.g., an \lstinline{Int} in fact falls within the range \lstinline{[0 .. 7]}.
\begin{lstlisting}
// initialize an array of size 8:
val arr: array[8] = [10, 20, 30, 40, 50, 60, 70, 80];
arr[0]; // okay, should not emit any runtime check
arr[arr.length-1]; // okay, same as above
// also okay, but should emit a runtime bounds check:
arr[Std.readInt()];
\end{lstlisting}
In effect, your system will ensure that array accesses are always in-bounds, i.e., that
they never under-run an array's first element or over-run its last.
Note that the resulting system should only emit the minimal number of runtime checks to
ensure such safety. For instance, consider the following program:
\begin{lstlisting}
def printBoth(arr1: array[8], arr2: array[8], i: [0 .. 7]): Unit = {
  Std.printInt(arr1[i]);
  Std.printInt(arr2[i])
}
val someInt: Int = 4;
printBoth([1,2,3,4,5,6,7,8], [8,7,6,5,4,3,2,1], someInt)
\end{lstlisting}
Here it is not necessary to perform any checks in the body of \lstinline{printBoth}, since
whatever values are passed as arguments for parameter \lstinline{i} should have previously
been checked to lie between 0 and 7.
In this concrete case, a runtime check should occur when \lstinline{someInt} is passed to
\lstinline{printBoth}.
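One possible way to decide where checks are required is sketched below in Scala.
The idea is to compare the static type of an expression with the type expected at
its use site and to emit a check only if the former is not contained in the latter.
The names \lstinline{IntKind}, \lstinline{AnyInt}, \lstinline{RangeType} and
\lstinline{needsCheck} are hypothetical, and the sketch assumes that an integer
literal $c$ is given the singleton range \lstinline{[$c$ .. $c$]}.
\begin{lstlisting}
// Hypothetical static types of integer expressions (illustrative names only).
sealed trait IntKind
case object AnyInt extends IntKind                            // plain Int
final case class RangeType(lo: Int, hi: Int) extends IntKind  // [lo .. hi]

object RuntimeChecks {
  // True if a value of static type actual must be checked at runtime
  // before it is used where a value of type expected is required.
  def needsCheck(actual: IntKind, expected: IntKind): Boolean =
    (actual, expected) match {
      case (_, AnyInt) => false                // every integer is an Int
      case (AnyInt, RangeType(_, _)) => true   // nothing is known statically
      case (RangeType(l1, h1), RangeType(l2, h2)) =>
        !(l2 <= l1 && h1 <= h2)                // check unless contained
    }
}
\end{lstlisting}
With this rule, passing \lstinline{someInt} to \lstinline{printBoth} requires a check,
whereas indexing an \lstinline{array[8]} with a parameter of type \lstinline{[0 .. 7]}
or with the literal \lstinline{0} does not.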
\subsubsection{Statically-checked range types (2+, challenging)}
Your type system should be strict and detect potential out-of-bounds array accesses
early. In particular, when your type checker cannot prove that an integer lies in the
required range it should produce a type error (and stop compilation as usual).
\begin{lstlisting}
val arr: array[8] = [10, 20, 30, 40, 50, 60, 70, 80];
arr[0]; // okay
arr[arr.length-1]; // okay
arr[arr.length]; // not okay, type error "Idx 8 is out-of-bounds"
val i: Int = Std.readInt();
arr[i]; // not okay, type error "Int may be out-of-bounds"
if (i >= 0 && i < 8) {
  arr[i] // okay, branch is only taken when i is in bounds
}
\end{lstlisting}
To allow as many programs as possible to be accepted, your type-checker will have to
employ precise typing rules for arithmetic expressions and if-expressions.
What you will implement are simple forms of \emph{path sensitivity} and
\emph{abstract interpretation}.
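To give a flavour of what such rules might compute, here is a minimal Scala sketch of
an interval abstraction with addition and with refinement by comparisons against
constants.
The class \lstinline{Interval} and its methods are hypothetical names chosen for this
illustration. The sketch stores bounds as 64-bit integers and ignores the wrap-around
behaviour of 32-bit arithmetic, which a real type checker would have to treat explicitly.
\begin{lstlisting}
// Hypothetical interval abstraction for integer expressions.
// Bounds are Longs so that adding two 32-bit bounds cannot overflow here.
final case class Interval(lo: Long, hi: Long) {
  require(lo <= hi, "empty interval")

  // Abstract addition: every sum of two members lies in the result.
  def +(that: Interval): Interval = Interval(lo + that.lo, hi + that.hi)

  def subsetOf(that: Interval): Boolean = that.lo <= lo && hi <= that.hi

  // Refinement for the then-branch of a test (x < k);
  // None means that the branch is unreachable.
  def assumeLessThan(k: Long): Option[Interval] =
    if (lo < k) Some(Interval(lo, math.min(hi, k - 1))) else None

  // Refinement for the then-branch of a test (x >= k).
  def assumeAtLeast(k: Long): Option[Interval] =
    if (hi >= k) Some(Interval(math.max(lo, k), hi)) else None
}

object Interval {
  // All values of a signed 32-bit integer.
  val int32: Interval = Interval(Int.MinValue.toLong, Int.MaxValue.toLong)
  def const(n: Long): Interval = Interval(n, n)
}
\end{lstlisting}
For a guard such as \lstinline{i >= 0 && i < 8} on an unconstrained \lstinline{Int},
applying \lstinline{assumeAtLeast(0)} and then \lstinline{assumeLessThan(8)} to
\lstinline{Interval.int32} yields \lstinline{Interval(0, 7)}, which is contained in the
index range of an \lstinline{array[8]}.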
\paragraph{Constant-bounds version (2)}
Implement statically-checked range types for arrays of fixed and statically-known sizes.
Range types will only involve constant bounds and in effect your type-checker will only
have to accept programs that operate on arrays whose sizes are known to you as concrete
integer constants.
The typing rules that you come up with should be sufficiently strong to prove safety of
simple array manipulations such as the following:
\begin{lstlisting}
def printArray(arr: array[4], i: [0 .. 4]): Unit = {
  if (i < arr.length) {
    Std.printInt(arr[i]);
    printArray(arr, i+1)
  }
}
printArray([1,2,3,4], 0)
\end{lstlisting}
\paragraph{Dependently-typed version (3)}
Rather than relying on the user to provide the exact sizes of arrays, also allow arrays
to be of a fixed, but not statically-known size. To enable your type system to accept
more programs, you should also extend the notion of range types to allow bounds
\emph{relative to} a given array's size.
The resulting types will extend the above ones in at least two ways:
In addition to \lstinline{array[$n$]} there is a special form \lstinline{array[*]}
which represents an array of arbitrary (but fixed) size.
For a range type \lstinline{[$i$ .. $j$]}, the bounds $i$ and $j$ need not be integer
constants; they may also be expressions of the form \lstinline{arr.length + $k$}, where
\lstinline{arr} is an array-typed variable in scope and $k$ is an integer constant.
Your system should then be able to abstract over concrete array sizes by referring to
some Array-typed binding's length like in the following example:
\begin{lstlisting}
def printArray(arr: array[*], i: [0 .. arr.length]): Unit = {
  if (i < arr.length) {
    Std.printInt(arr[i]);
    printArray(arr, i+1)
  }
}
printArray([1,2,3,4], 0)
printArray([1,2,3,4,5,6,7,8], 0)
\end{lstlisting}
Note that the resulting language will be \emph{dependently-typed}, meaning that
types can depend on terms. In the above example, for instance, the type of parameter
\lstinline{i} of function \lstinline{printArray} depends on parameter \lstinline{arr}.
% TODO: (2) Simple ownership system / affine types + in-place updates for ADTs?
% TODO: (1) Region-based ADT allocation + static tracking of provenance