Showing
with 0 additions and 1029 deletions
object ReadName
  Std.printString("What is your name?");
  val name: String = Std.readString();
  Std.printString("Hello " ++ name)
end ReadName
object L
  abstract class List
  case class Nil() extends List
  case class Cons(h: Int(32), t: List) extends List
end L
fn range(from: Int(32), to: Int(32)): List = {
  if (to < from) { Nil() }
  else {
    Cons(from, range(from + 1, to))
  }
}
fn length(l: List): Int(32) = { l match {
  case Nil() => 0
  case Cons(h, t) => 1 + length(t)
}}
fn head(l: List): Int(32) = {
  l match {
    case Cons(h, _) => h
    case Nil() => error("head(Nil)")
  }
}
File deleted
\documentclass[]{article}
%\settopmatter{printfolios=true}
% For final camera-ready submission
% \documentclass[acmlarge]{acmart}
% \settopmatter{}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{defs}
\usepackage{listings}
\usepackage{stmaryrd}
\usepackage{xcolor}
\usepackage{xspace}
\usepackage[colorlinks]{hyperref}
\hypersetup{urlcolor=cyan}
\usepackage{caption} % Link to beginning of figures
\usepackage{mathpartir}
%\usepackage{subcaption}
\input{scalalistings}
\title{Specification of the \langname language}
\date{Computer Language Processing\\~\\LARA\\~\\Autumn 2021}
\begin{document}
\maketitle
\input{introduction}
\input{syntax}
\input{semantics}
\input{informal}
\input{formal}
\input{types}
\input{moretypes}
\input{library}
\end{document}
pdflatex amy-specification
pdflatex amy-specification
\newcommand{\CEGIS}{\textsf{CEGIS}}
\newcommand{\TerminalRule}{\textsf{Terminal}}
\newcommand{\Search}{\textsf{Search}}
\newcommand{\Verify}{\textsf{Verify}}
\newcommand{\Enumerate}{\textsf{Enumerate}}
\newcommand{\from}{\mathbin{\leftarrow}}
\newcommand{\union}{\mathbin{\cup}}
\newcommand{\expt}{\mathcal{E}}
\newcommand{\Expansions}{\mathcal{E}}
\newcommand{\prob}[1]{\operatorname{Pr}[#1]}
\newcommand{\R}{\mathbb{R}}
\newcommand{\Land}{\bigwedge}
\newcommand{\cost}{\operatorname{cost}}
\newcommand{\horizon}{h}
\newcommand{\score}{score}
\newtheorem{thm}{Theorem}[section]
\newcommand{\smartparagraph}[1]{\noindent\textbf{#1}}
\newcommand{\sparagraph}[1]{\noindent\textbf{#1}}
\newcommand{\TODO}[1]{\marginpar{\color{red}TODO}{\color{red}#1}\xspace}
% Name calling
\newcommand{\leon}{Leon\xspace}
\newcommand{\leonsyn}{LeonSyn\xspace}
\newcommand{\ourcegis}{STE\xspace} % DONE : find a non-ridiculous name
\newcommand{\ourca}{CA\xspace}
\newcommand{\andor}{\textsc{and/or}\xspace}
\newcommand{\insynth}{InSynth\xspace}
% General math
\newcommand{\ALL}[2]{\ensuremath{\forall #1 :~ #2}}
\newcommand{\EX}[2]{\ensuremath{\exists #1 :~ #2}}
\newcommand{\seq}[1]{\ensuremath{\bar{#1}}}
\newcommand{\seqa}{\seq{a}\xspace}
\newcommand{\seqx}{\seq{x}\xspace}
\newcommand{\seqt}{\seq{T}}
\newcommand{\seqg}{\seq{G}}
\newcommand{\seqr}{\seq{r}}
\newcommand{\varsof}[1]{\ensuremath{\text{vars}(#1)}}
\newcommand{\splus}{\ensuremath{\mathop{,}}} % separator in sequences
% Synthesis framework
\newcommand{\br}[4]{\ensuremath{\left\llbracket #1 \ \left\langle #2 \rhd #3 \right\rangle \ #4\right\rrbracket}}
\newcommand{\pg}[2]{\langle {#1} \mid {#2} \rangle}
\newcommand{\similar}[1]{\ensuremath{G(\textsf{#1})}}
\newcommand{\similarr}[2]{\ensuremath{G_{#2}(\textsf{#1})}}
\newcommand{\prename}{\ensuremath{P}\xspace}
\newcommand{\pcname}{\ensuremath{\Pi}\xspace}
\newcommand{\pgname}{\seqt}
\newcommand{\inputs}{\mathcal{I}}
\newcommand{\pgite}[3]{\ensuremath{\text{\textsf{if(}}#1\text{\textsf{) \{}}#2\text{\textsf{\} else \{}}#3\text{\textsf{\}}}}}
\newcommand{\pglet}[3]{\ensuremath{\text{\textsf{val}} \ #1 \colonequals #2 \text{\textsf{;}} \ #3}}
\newcommand{\match}[2]{\ensuremath{\text{#1\textsf{ match \{ }}#2\text{\textsf{\}}}}}
\newcommand{\mcase}[2]{\ensuremath{\text{\textsf{ case }}#1 \Rightarrow #2}}
\newcommand{\code}[1]{\text{\textsf{#1}}}
\newcommand{\guide}[1]{\ensuremath{\odot\mkern-4mu\left[#1\right]}}
\newcommand{\terminates}[1]{\ensuremath{\Downarrow\mkern-4mu\left[#1\right]}}
% Listing-like things.
\newcommand{\cl}[1]{\lstinline[mathescape]@#1@}
\newcommand{\clnoat}[1]{\lstinline[mathescape]!#1!}
\newcommand{\mcl}[1]{\ensuremath{\mathsf{#1}}}
% \newcommand{\mcl}[1]{\ensuremath{\text{\lstinline{#1}}}}
\newcommand{\choosesym}{\cl{choose}}
% Hoare triples
\newcommand{\HoareTriple}[3]{
\begin{displaymath}
\left\{\begin{array}{l}#1\end{array}\right\}
\begin{array}{l}#2\end{array}
\left\{\begin{array}{l}#3\end{array}\right\}
\end{displaymath}
}
\newcommand{\hoareTriple}[3]{\{$#1$\} $#2$ \{$#3$\}}
\newcommand{\btrue}{\mcl{true}}
\newcommand{\gpo}{::=}
\newcommand{\gnt}[1]{~#1~}
\newcommand{\gt}[1]{\text{\tt \textbf{~#1~}}}
\newcommand{\gtns}[1]{\text{\tt \textbf{#1}}}
\newcommand{\FIXME}[1]{ {\color{red} FIXME: #1}}
\newcommand\langname{Amy\xspace}
\subsection{Formal discussion of types}
\newcommand{\hastype}[2]{\Gamma \vdash #1: #2}
In this section, we give a formal (i.e. mathematically robust) description of the Amy typing rules.
A typing rule will be given as
\begin{mathpar}
\inferrule[Rule Name]{P_1 \and \ldots \and P_n}{C}
\end{mathpar}
\noindent where $P_i$ are the rule \emph{premises} and $C$ is the rule \emph{conclusion}.
A typing rule means that the conclusion is true under the premises.
Conclusions and most premises will be \emph{type judgements} in an \emph{environment}.
A type judgement $\hastype{e}{T}$ means that an expression (or pattern) $e$ has type $T$
in environment $\Gamma$.
Environments $\Gamma$ are mappings from variables to types and will be written as
\hbox{$\Gamma = v_1: T_1, \ldots, v_n: T_n$}. We can add a new pair to an environment $\Gamma$
by writing $\Gamma, v_{n+1}: T_{n+1}$.
We will also sometimes write a type judgement of the form $\Gamma \vdash p$.
This means that $p$ typechecks, but we don't assign a type to it.
Type checking will try to typecheck a program under the \emph{initial environment},
and reject the program if it fails to do so.
The \emph{initial environment} $\Gamma_0(p)$ of a program $p$ is one
that contains the types of all functions and constructors in $p$,
where a constructor is treated as a function from its fields to its parent type
(see Section~\ref{sec:classes}).
The initial environment is used to kickstart typechecking at the function definition level.
Figure~\ref{figure:types} lists typing rules for expressions.
Figure~\ref{figure:moretypes} lists typing rules for patterns, functions and programs.
In the typing rule for pattern matching,
$bindings(p)$ refers to the variable bindings implied by a pattern as explained in Section~\ref{sec:expr}.
Rules for literal patterns are omitted because they are the same as those for literal expressions.
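As an illustration, here is one instance of the Case Class Pattern rule from Figure~\ref{figure:moretypes}.
It is an example added for clarity, and it assumes the $\mathit{List}$ type of Section~\ref{sec:classes},
so that $\Gamma$ contains the binding $\mathit{Cons}: (\Int,\ \mathit{List}) \RA \mathit{List}$.
For the pattern $\mathit{Cons}\gtns{(}h,\ \gtns{\_}\gtns{)}$ the rule reads:
\begin{mathpar}
\inferrule[Case Class Pattern]
{\hastype{h}{\Int} \and \hastype{\gtns{\_}}{\mathit{List}} \and \hastype{\mathit{Cons}}{(\Int,\ \mathit{List}) \RA \mathit{List}}}
{\hastype{\mathit{Cons}\gtns{(}h,\ \gtns{\_}\gtns{)}}{\mathit{List}}}
\end{mathpar}
\noindent The first two premises hold by the Identifier Pattern and Wildcard Pattern rules, which have no premises of their own.
The third premise holds whenever $\Gamma$ contains the constructor's type, as the initial environment does.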
\subsection{Typing Rules and Semantics of Expressions}
\label{sec:expr}
Each expression in \langname is associated with a \emph{typing rule},
which constrains and connects its type and the types of its subexpressions.
An \langname program is said to \emph{typecheck} if
(1) all its expressions obey their respective typing rules, and
(2) the body of each function has the function's declared return type.
A program that does not typecheck will be rejected by the compiler.
\newcommand\Int{\gtns{Int(32)}\xspace}
\newcommand\Boolean{\gtns{Boolean}\xspace}
\newcommand\String{\gtns{String}\xspace}
\newcommand\Unit{\gtns{Unit}\xspace}
\newcommand{\typeI}[2]{\ensuremath{( #1 ) \RA #2 }}
\newcommand{\typeII}[3]{\ensuremath{( #1 , #2 ) \RA #3 }}
\newcommand{\typeIII}[4]{\ensuremath{( #1 , #2 , #3 ) \RA #4 }}
In the following, we will informally give the typing rules
and explain the semantics (meaning) of each type of expression in \langname.
We will use function type notation for typing of the various operators.
For example, $(A, B) \Rightarrow C$ denotes that an operator takes arguments of types $A$ and $B$
and returns a value of type $C$.
When talking about the semantics of an expression we will refer to a \emph{context}.
A context is a mapping from variables to the values that have been assigned to them.
\begin{itemize}
\item Literals of \langname are expressions of the base types that are \emph{values},
i.e. they cannot be evaluated further.
The literals \lstinline{true} and \lstinline{false} have type \Boolean.
\lstinline{()}, the unit literal, has type \Unit.
String literals have type \String and integer literals have type \Int.
\item A variable has the type of the corresponding definition
(function parameter or local variable definition).
Its value is the value assigned to it in the current context.
\item \gtns{+}, \gtns{-}, \gtns{*}, \gtns{/} and \gtns{\%} have type \typeII{\Int}{\Int}{\Int},
and are the usual integer operators.
\item Unary \gtns{-} has type \typeI{\Int}{\Int} and is the integer negation.
\item \gtns{<} and \gtns{<=} have type \typeII{\Int}{\Int}{\Boolean} and are the usual arithmetic
comparison operators.
\item \gtns{\&\&} and \gtns{||} have type \typeII{\Boolean}{\Boolean}{\Boolean}
and are the boolean conjunction and disjunction. \emph{Notice that these operators
are short-circuiting}. This means that the second argument does not get evaluated
if the result is known after computing the first one.
For example, \lstinline{true || error("")} will yield \lstinline{true} and not result in an error,
whereas \lstinline{false || error("")} will result in an error in the program.
\item \gtns{!} has type \typeI{\Boolean}{\Boolean} and is the boolean negation.
\item \gtns{++} has type \typeII{\String}{\String}{\String} and is the string concatenation.
\item \gtns{==} is the equality operator. It has type \typeII{A}{A}{\Boolean} for every type $A$.
Equality for values of the \lstinline{Int(32)}, \lstinline{Boolean} and \lstinline{Unit} types
is defined as \emph{value equality}, i.e. two values are equal if they have the
same representation. E.g. \lstinline{0 == 0}, \lstinline{() == ()} and \lstinline{(1 + 2) == 3}.
Equality for the \emph{reference types} \lstinline{String} and all user-defined types
is defined as \emph{reference equality}, i.e. two values are equal only if they refer to the
same object.
For example, \hbox{\lstinline{""}\ \lstinline{ == ""}},
\hbox{\lstinline{"a"}\ \lstinline{++ "b"}\ \lstinline{ == "ab"}} and
\hbox{\lstinline{Nil() == Nil()}} all evaluate to \lstinline{false},
whereas \hbox{\lstinline{(val s: String = "Hello"; s == s)}} evaluates to \lstinline{true}.
\item \gtns{error()} has type \typeI{\String}{A} for any type $A$, i.e. \gtns{error} is always acceptable,
regardless of the expected type. When a program encounters \gtns{error}, it needs to print
something like \lstinline{Error: <msg>}, where \lstinline{<msg>} is its evaluated argument,
and then exit immediately.
\item \lstinline|if(..) {..} else {..}| has type \typeIII{\Boolean}{A}{A}{A} for any type $A$,
and has the following meaning:
First, evaluate the condition of \lstinline{if}. If it evaluates to \lstinline{true},
evaluate and return the then-branch; otherwise, evaluate and return the else-branch.
Notice that the branch that is not taken is not evaluated.
\item \lstinline{;} is the \emph{sequence} operator. It has type
\typeII{A}{B}{B} for any types $A$ and $B$.
Notice that the first expression has to be well typed, although its precise type does not matter.
\lstinline{;} evaluates and discards its first argument
(which we will usually invoke just for its side-effects)
and then evaluates and returns its second argument.
\item \lstinline{val n = e; b} defines a local variable with name \lstinline{n} and
adds it to the context, mapped to the value of \lstinline{e}.
It is visible in \lstinline{b} but not in \lstinline{e}.
\lstinline{n} has to obey the name restrictions described in Section~\ref{sec:names}.
\item An expression \lstinline{f(..)} or \lstinline{m.f(..)} denotes either a function call,
or an invocation of a type constructor.
\lstinline{f} has to be the name of a function/constructor defined in the program.
The types of the real arguments of the function/constructor invocation have to match
the corresponding types of the formal arguments in the definition of the function/constructor.
The type of a function/constructor call is the return type of the function,
or the parent type of the constructor respectively.
Evaluating a function call means evaluating its body in a new context,
containing the function's formal arguments mapped to the values of the real
arguments provided at the function call.
Evaluating a call to a constructor means generating and returning a fresh object
containing (a reference to) the constructor and the arguments passed to it.
Notice that an invocation of a type constructor \emph{on values} is itself a value,
i.e. cannot be evaluated further. It corresponds to literals of the other types.
\item \lstinline{match} is the pattern-matching construct of \langname.
It corresponds to Scala's pattern matching. Java programmers can think of
it as a generalized switch-statement.
\lstinline{match} is the only way to access the structure of a value
of a class type. It also happens to be the most complicated structure
of \langname.
\paragraph{Terminology:} To explain how the match-expression works, let us first establish some
terminology. A match expression has a \emph{scrutinee} (the first operand,
which gets pattern matched on), and a number of \emph{match cases}
(or simply cases). A case is introduced with the keyword \gt{case},
followed by the \emph{(case) pattern}, then the symbol \gt{=>} and finally
an expression, which we will call the \emph{case expression}.
As seen in Section~\ref{sec:syntax}, a pattern comes in four different forms,
which in the grammar are denoted as
(1) $Id\gtns{(}Patterns\gtns{)}$, (2) $Id$, (3) $Literal$ and \hbox{(4) $\gnt{\_}$}.
We will call those forms \emph{case class pattern}, \emph{identifier pattern},
\emph{literal pattern} and \emph{wildcard pattern} respectively.
The identifier at the beginning of a case class pattern is called the \emph{constructor} of the pattern,
and its arguments are called its \emph{subpatterns}.
\paragraph{Typing rules:} For the match-expression to typecheck, two conditions have to hold:
\begin{itemize}
\item All its case expressions have the same type,
which is also the type of the whole match expression.
\item All case patterns have to \emph{follow} the type of the scrutinee.
For a pattern to follow a type means the following, according to its form:
\begin{itemize}
\item Each literal pattern follows exactly the type of its literal.
\item Wildcard and identifier patterns follow any type.
\item A case class pattern follows only the resulting type of its constructor,
if and only if all its subpatterns follow the types of the respective
fields of the constructor.
For example, \lstinline|Nil() match { case Cons(_, t) => () }| typechecks,
whereas \lstinline|Nil() match { case 0 => () }| does not.
\end{itemize}
\end{itemize}
\paragraph{Semantics:} The semantics of pattern matching are as follows:
First, the scrutinee is evaluated, then cases are scanned one by one
until one is found whose pattern \emph{matches} the scrutinee value.
If such a case is found, its case expression is evaluated,
after adding to the context the variables bound in the case pattern (see below).
The value produced in this way is returned as the value of the match-expression.
If none is found, the program terminates with an error.
We say that a pattern \emph{matches} a value when the following holds:
\begin{itemize}
\item A wildcard pattern $\gtns{\_}$ or an identifier pattern
\lstinline{x} match any value. In the second case,
\lstinline{x} is bound to that value when evaluating
the case expression.
\item A literal pattern matches exactly the value of its literal.
Notice that string literals are compared by reference,
so they can never match.
\item A case class pattern \lstinline{case C(..)} matches a value $v$,
if and only if $v$ has been constructed with the same constructor \lstinline{C}
and every subpattern of the pattern matches the corresponding field of $v$.
Notice that we have to recursively bind identifiers in subpatterns.
\end{itemize}
\item Parentheses \lstinline{(e)} can be used freely around an expression \lstinline{e},
mainly to override operator precedence or to make the program more readable.
\lstinline{(e)} is equivalent to \lstinline{e}.
\item When evaluating an expression composed of sub-expressions (e.g. \lstinline{f(a,b)}),
the different sub-expressions are evaluated in left-to-right order (i.e. in the previous example, the sub-expressions would be evaluated in the following order: \lstinline{f} then \lstinline{a} then \lstinline{b}).
Function calls use the call-by-value strategy.
\end{itemize}
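\noindent The following sketch puts several of the rules above side by side.
It is an illustrative example rather than part of the reference material:
the module name \lstinline{Semantics} and the function \lstinline{headOr} are made up for this purpose,
and the \lstinline{Std} module is assumed to be compiled along with it.
\begin{lstlisting}
object Semantics
  abstract class List
  case class Nil() extends List
  case class Cons(h: Int(32), t: List) extends List

  // The first case binds h to the first field and ignores the tail.
  fn headOr(l: List, d: Int(32)): Int(32) = {
    l match {
      case Cons(h, _) => h
      case Nil() => d
    }
  }

  // Short-circuiting: the result is true and error is never evaluated.
  Std.printBoolean(true || error("not reached"));
  // Value equality on Int(32): the argument evaluates to true.
  Std.printBoolean((1 + 2) == 3);
  // Reference equality on class types: two fresh objects, so false.
  Std.printBoolean(Nil() == Nil());
  // The same reference on both sides: true.
  val s: String = "Hello";
  Std.printBoolean(s == s);
  // The scrutinee was built with Cons, so the first case matches: 1.
  Std.printInt(headOr(Cons(1, Nil()), 0))
end Semantics
\end{lstlisting}
\noindent Each call is sequenced with \lstinline{;}, so the values of all but the last expression are discarded
and only their side effects (the printing) remain.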
\section{Introduction}
Welcome to the \langname project! This semester you will learn how to compile a simple functional
Scala-like language
from source files down to executable code. When your compiler is complete,
it will be able to take \langname source (text) files as input and produce
\href{http://webassembly.org}{WebAssembly} bytecode files.
WebAssembly is a new format for portable bytecode which is meant to be run in browsers.
This document is the specification of \langname. Its purpose is to help you clearly
and unambiguously understand what an \langname program means,
and to be the \langname language reference,
along with the reference compiler.
It does not deal with how you will actually implement the compiler;
this will be described to you as assignments are released.
\subsection{Features of \langname}
Let us demonstrate the basic features of \langname through some examples:
\subsubsection{The factorial function}
\begin{figure}[h]
\lstinputlisting{Factorial.scala}
\end{figure}
Every program in \langname is contained in a module, also called \lstinline{object}.
A function is introduced with the keyword \lstinline{fn}, and all its parameters
and result type must be explicitly typed.
\langname supports conditional (or \lstinline{if}-) expressions with mandatory braces.
Notice that conditionals are not statements, but return a value,
in this case an \lstinline{Int(32)}.
In fact, there is no distinction between expressions
and statements in \langname. Even expressions that are called only for their
side-effects return a value of type \lstinline{Unit}.
The condition of an \lstinline{if}-expression must be of type \lstinline{Boolean}
and its branches must have the same type, which is also the type of the whole expression.
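As a further small sketch of a conditional used as a value,
consider the following module (the names \lstinline{Conditionals} and \lstinline{max} are hypothetical
and chosen only for this illustration; \lstinline{Std} is assumed to be available):
\begin{lstlisting}
object Conditionals
  // Both branches have type Int(32), so the whole if-expression does too.
  fn max(a: Int(32), b: Int(32)): Int(32) = {
    if (a < b) { b } else { a }
  }
  // The condition 3 < 7 is true, so the then-branch yields 7.
  Std.printInt(max(3, 7))
end Conditionals
\end{lstlisting}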
\subsubsection{Saying hello}
\begin{figure}[h]
\lstinputlisting{Hello1.scala}
\end{figure}
\langname supports compiling multiple modules together.
To refer to functions (or other definitions) in another module,
one must explicitly use a qualified name.
There is no import statement like in Scala.
In this example, we refer to the \lstinline{printString} function in the
\lstinline{Std} module, which contains some builtin functions to interact with the user.
The string we print is constructed by concatenating two smaller strings
with the \lstinline{++} operator.
\subsubsection{Input, local variables and sequencing expressions}
\begin{figure}[h]
\lstinputlisting{Hello2.scala}
\end{figure}
We can read input from the console with the \lstinline{readX} functions
provided in \lstinline{Std}.
We can define local variables with \lstinline{val},
which must always be typed explicitly.
The value of the variable is given after ``\lstinline{=}'',
followed by a semicolon.
We can sequence expressions with ``\lstinline{;}''.
The value of the first expression is discarded,
and the value of the second one is returned.
Note that ``\lstinline{;}'' is an \emph{operator}
and not a terminator: you are not allowed to put it
at the end of a sequence of expressions.
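The difference between an operator and a terminator can be sketched as follows
(the module name \lstinline{Sequencing} is made up for this illustration):
\begin{lstlisting}
object Sequencing
  // Legal: ";" sits between its two operands.
  Std.printString("first");
  Std.printString("second")
  // Not legal: writing another ";" after the last expression,
  // because ";" would then lack its second operand.
end Sequencing
\end{lstlisting}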
\subsubsection{Type definitions}
\begin{figure}[h]
\lstinputlisting{List1.scala}
\end{figure}
In addition to the basic types, a user can define their own types in \langname.
The user-definable types in \langname come from functional programming and
are called \emph{algebraic data types}.
In this case, we define a type, \lstinline{List},
and two constructors \lstinline{Nil} and \lstinline{Cons},
which we can call to construct values of type \lstinline{List}.
\subsubsection{Constructing ADT values}
\begin{figure}[h!]
\lstinputlisting{List2.scala}
\end{figure}
We can create a \lstinline{List} by calling one of its two constructors like a function,
as demonstrated in the \lstinline{range} function.
\subsubsection{Pattern matching}
\begin{figure}[h!]
\lstinputlisting{List3.scala}
\end{figure}
To use a list value in any meaningful way,
we have to break it down, according to the constructor used to construct it.
This is called \emph{pattern matching} and is a powerful feature of functional programming.
In \lstinline{length} we pattern match against the input value \lstinline{l}.
Pattern matching will check if its argument matches the pattern of the first case,
and if so will evaluate the corresponding expression.
Otherwise it will continue with the second case etc.
If no pattern matches, the program will exit with an error.
If the constructor has arguments, as does \lstinline{Cons} in this case,
we can bind their values to fresh variables in the pattern,
so we can use them in the case expression.
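As an additional sketch in the same style, a hypothetical function \lstinline{sum}
(not part of the examples above, and assuming the \lstinline{List} type defined earlier)
uses both variables bound by the \lstinline{Cons} pattern:
\begin{lstlisting}
fn sum(l: List): Int(32) = {
  l match {
    case Nil() => 0
    case Cons(h, t) => h + sum(t)
  }
}
// Evaluation of sum(Cons(1, Cons(2, Nil()))) proceeds as:
//   1 + sum(Cons(2, Nil()))
//   1 + (2 + sum(Nil()))
//   1 + (2 + 0)
//   3
\end{lstlisting}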
\subsubsection{Wildcard patterns and errors}
\begin{figure}[h]
\lstinputlisting{List4.scala}
\end{figure}
The \lstinline{error} keyword takes a string as argument,
prints \lstinline{Error: } and its argument on the screen,
then exits the program immediately with an error code.
In this function, we are trying to compute the head of a list,
which should fail if the list is empty.
Notice that in the first case,
we don't really care what the tail of the list is.
Therefore, we use a \emph{wildcard pattern} (\lstinline{_}),
which matches any value without binding it to a name.
\subsection{Relation to Scala}
\langname, with mild syntactic variations, is designed to be as close to a simple subset of Scala as possible.
However, it is not a perfect subset: you can easily come up with \langname programs
that are not legal in Scala.
Still, many ``reasonable'' programs will compile with \lstinline{scalac},
as long as you supply an implementation of the \langname standard library along with your code.
This should not be necessary, though, as we are providing a reference implementation of \langname.
\section{The standard library of \langname}
\langname comes with a library of predefined functions,
which are accessible in the \lstinline{Std} object.
Some of these functions implement functionalities
that are not expressible in \langname,
e.g. printing to the standard output.
These \emph{built-in functions} are implemented in JavaScript and \hbox{WebAssembly} in case of compilation,
and in Scala in the interpreter.
Built-in functions have stub implementations in the \langname \lstinline{Std} module
for purposes of name analysis and type checking.
The \langname compiler will not automatically add \lstinline{Std} to the input files.
If you want it included, you have to provide it manually.
The signature of the \lstinline{Std} module is shown in Figure~\ref{fig:std}.
\begin{figure}
\begin{lstlisting}
object Std
// Output
fn printString(s: String): Unit = ...
fn printInt(i: Int(32)): Unit = ...
fn printBoolean(b: Boolean): Unit = ...
// Input
fn readString(): String = ...
fn readInt(): Int(32) = ...
// Conversions
fn intToString(i: Int(32)): String = ...
fn digitToString(i: Int(32)): String = ...
fn booleanToString(b: Boolean): String = ...
end Std
\end{lstlisting}
\caption{The \lstinline{Std} module}
\label{fig:std}
\end{figure}
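A minimal usage sketch of these functions is given below.
The module name \lstinline{Twice} is hypothetical, and the example assumes that \lstinline{Std}
is passed to the compiler together with this module, as described above.
\begin{lstlisting}
object Twice
  Std.printString("Enter a number:");
  // readInt returns an Int(32), so the local variable is typed accordingly.
  val n: Int(32) = Std.readInt();
  // intToString converts the result so that it can be concatenated with ++.
  Std.printString("Twice that is " ++ Std.intToString(2 * n))
end Twice
\end{lstlisting}
\noindent Every reference to a library function is qualified with \lstinline{Std},
since \langname has no import statement.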
\begin{figure}
\begin{mathpar}
\inferrule[Wildcard Pattern]
{~}
{\hastype{\gtns{\_}}{T}}
\inferrule[Identifier Pattern]
{~}
{\hastype{v}{T}}
\inferrule[Case Class Pattern]
{\hastype{p_1}{T_1} \and \ldots \and \hastype{p_n}{T_n} \and \hastype{C}{(T_1,\ \ldots, T_n) \RA T}}
{\hastype{C\gtns{(}p_1,\ \ldots, p_n \gtns{)}}{T}}
\inferrule[Function Definition]
{\Gamma, v_1: T_1,\ \ldots, v_n: T_n \vdash e: T}
{\Gamma \vdash \gt{fn} f\gtns{(}v_1: T_1,\ \ldots, v_n: T_n\gtns{):}\ T \gt{= \{} e \gt{\}}}
\inferrule[Program]
{\forall f \in p.\ \Gamma_0(p) \vdash f}
{\vdash p}
\end{mathpar}
\caption{Typing rules for patterns, functions and programs}
\label{figure:moretypes}
\end{figure}
%% To import in the preamble
%\usepackage{listings}
\usepackage{letltxmacro}
\newcommand*{\SavedLstInline}{}
\LetLtxMacro\SavedLstInline\lstinline
\DeclareRobustCommand*{\lstinline}{%
\ifmmode
\let\SavedBGroup\bgroup
\def\bgroup{%
\let\bgroup\SavedBGroup
\hbox\bgroup
}%
\fi
\SavedLstInline
}
\lstdefinelanguage{ML}{
alsoletter={*},
morekeywords={datatype, of, if, *},
sensitive=true,
morecomment=[s]{/*}{*/},
morestring=[b]"
}
% "define" Scala
\lstdefinelanguage{scala}{
alsoletter={@,=,>},
morekeywords={abstract, Boolean, case, class, fn,
else, error, extends, false, if, Int, match,
object, String, true, Unit, val, end},
sensitive=true,
morecomment=[l]{//},
morecomment=[s]{/*}{*/},
morestring=[b]"
}
% \newcommand{\codestyle}{\tiny\sffamily}
\newcommand{\codestyle}{\ttfamily}
\newcommand{\SAND}{\mbox{\tt \&\&}\xspace}
\newcommand{\SOR}{\mbox{\tt ||}\xspace}
\newcommand{\MOD}{\mbox{\tt \%}\xspace}
\newcommand{\DIV}{\mbox{\tt /}\xspace}
\newcommand{\PP}{\mbox{\tt ++}\xspace}
\newcommand{\MM}{\mbox{\tt {-}{-}}\xspace}
\newcommand{\RA}{\Rightarrow}
\newcommand{\EQ}{\mbox{\tt ==}}
\newcommand{\NEQ}{\mbox{\tt !=}}
\newcommand{\SLE}{\ensuremath{\leq}}
\newcommand{\SGE}{\ensuremath{\geq}}
\newcommand{\SGT}{\mbox{\tt >}}
\newcommand{\SLT}{\mbox{\tt <}}
\newcommand{\rA}{\rightarrow}
\newcommand{\lA}{\leftarrow}
%============================
% To make it colorful uncomment \color in next 30 lines
%\makeatletter
%\newcommand*\idstyle{%
% \expandafter\id@style\the\lst@token\relax
%}
%\def\id@style#1#2\relax{%
% \ifcat#1\relax\else
% \ifnum`#1=\uccode`#1\color{blue!60!black}
% \fi
% \fi
%}
\makeatother
% Default settings for code listings
\lstset{
language=scala,
showstringspaces=false,
columns=fullflexible,
mathescape=true,
numbers=none,
% numberstyle=\tiny,
basicstyle=\codestyle,
keywordstyle=\bfseries\color{blue!60!black}
,
commentstyle=\itshape\color{red!60!black}
,
%identifierstyle=\idstyle,
tabsize=2,
aboveskip=0pt,
belowskip=0pt
}
\section{Semantics}
In this section we will give the semantics of \langname, i.e. we
will systematically explain what an \langname program represents,
as well as give the restrictions that a legal \langname program must obey.
The discussion will be informal, except for the typing rules
of \langname.
\subsection{Program Structure}
An \langname program consists of one or more source files.
Each file contains a single module (\gtns{object}),
which in turn consists of a series of type and function definitions,
optionally followed by an expression.
We will use the terms object and module interchangeably.
\subsection{Execution}
When an \langname program is executed,
the expression at the end of each module, if present, is evaluated.
The order of execution among modules is the order in which the user listed them
when compiling or interpreting the program.
Each module's definitions are visible within the module automatically,
and in all other modules provided a qualified name is used.
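The following sketch shows two modules that would be compiled together.
The module names \lstinline{A} and \lstinline{B} and the function \lstinline{twice} are made up for this illustration,
and \lstinline{Std} is assumed to be provided as well.
\begin{lstlisting}
// First source file
object A
  fn twice(x: Int(32)): Int(32) = { x + x }
  Std.printString("A runs first")
end A

// Second source file
object B
  // twice lives in another module, so it needs the qualified name A.twice.
  Std.printInt(A.twice(21))
end B
\end{lstlisting}
\noindent If the user lists \lstinline{A} before \lstinline{B}, the expression at the end of \lstinline{A}
is evaluated first, and then the one at the end of \lstinline{B}, which prints the value 42.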
\subsection{Naming rules}
In this section, we will give the restrictions that a legal \langname program
must obey with regard to naming or referring to entities defined in the program.
Any program not following these restrictions should be rejected by the \langname
name analyzer.
\label{sec:names}
\begin{itemize}
\item \langname is case-sensitive.
\item No two modules in the program can have the same name.
\item No two classes, constructors, and/or functions in the same module can have the same name.
\item No two parameters of the same function can have the same name.
\item No two local variables of the same function can have the same name if they are visible from
one another.
This includes binders in patterns of match-expressions.
Variables that are not mutually visible can have the same name.
E.g. the program \\
\lstinline{val x : Int(32) = 0; val x : Int(32) = 1; 2} is not legal, whereas \\
\lstinline{(val x : Int(32) = 0; 1); (val x : Int(32) = 1; 2)} is.
\item A local variable can have the same name as a parameter. In this case, the local
variable definition shadows the parameter definition.
\item Every variable encountered in an expression
has to refer to a function parameter or a local variable definition.
\item All case classes have to extend a class in the same module.
\item All function or constructor calls or type references have to refer to a function/constructor/type
defined in the same module, or another module provided a qualified name is given.
It is allowed to refer to a constructor/type/function before declaring it.
\item All calls to constructors and functions have to have the same number of arguments
as the respective constructor/function definition.
\end{itemize}
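\noindent Two of these rules, shadowing and use before declaration, can be sketched as follows
(the names \lstinline{Names}, \lstinline{f} and \lstinline{g} are hypothetical):
\begin{lstlisting}
object Names
  fn f(x: Int(32)): Int(32) = {
    // Legal: the local variable x shadows the parameter x.
    val x: Int(32) = 1;
    // Legal: g is called before its definition appears in the module.
    g(x)
  }
  fn g(y: Int(32)): Int(32) = { y + 1 }
end Names
\end{lstlisting}
\noindent By contrast, adding a second function named \lstinline{f} or a class named \lstinline{g}
to this module would violate the rule that names within a module are unique.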
\subsection{Types and Classes}
\label{sec:classes}
Every expression, function parameter, and class parameter in \langname has a \emph{type}.
Types catch some common programming errors by introducing \emph{typing restrictions}.
Programs that do not obey these restrictions are illegal and will be rejected by
the \langname type checker.
The built-in types of \langname are \gtns{Int(32)}, \gtns{String}, \gtns{Boolean} and \gtns{Unit}.
\gtns{Int(32)} represents 32-bit signed integers.
\gtns{String} is a sequence of characters. Strings have poor support in \langname:
the only operations defined on them are concatenation and conversion to integer.
In fact, not even equality is ``properly'' supported (see Section~\ref{sec:expr}).
\gtns{Boolean} values can take the values \gtns{true} and \gtns{false}.
\gtns{Unit} represents a type with a single value, \gtns{()}.
It is usually used as the result of a computation which is invoked for its side-effects only,
for example, printing some output to the user.
It corresponds to Java's \gtns{void}.
In addition to the built-in types, the programmer can define their own types.
The sort of types that are definable in \langname are called
\href{https://en.wikipedia.org/wiki/Algebraic_data_type}{Algebraic Data Types} (ADTs)
and come from the functional programming world,
but they have also been successfully adopted in Scala.
An ADT is a \emph{type} along with several \emph{constructors} that can create
values of that type. For example, an ADT defining a list of integers
in pseudo syntax may look like this:
\lstinline{type List = Nil() | Cons(Int(32), List)},
which states that a \lstinline{List} is either \lstinline{Nil} (the empty list),
or a \lstinline{Cons} of an integer and another list.
We will say that \lstinline{Cons} has two \emph{fields} of types \lstinline{Int(32)} and \lstinline{List},
whereas \lstinline{Nil} has no fields.
Inside the program, the only way to construct values of the \lstinline{List} type
is to call one of these constructors, e.g. \lstinline{Nil()} or \lstinline{Cons(1, Cons(2, Nil()))}.
You can think of them as functions from their field types to the \lstinline{List} type.
Notice that in the above syntax, \lstinline{Nil} and \lstinline{Cons}
are \textbf{not} types. More specifically, they are not subtypes of \lstinline{List}:
in fact, there is no subtyping in \langname.
Only \lstinline{List} is a type, and values such as \lstinline{Nil()} or \lstinline{Cons(1, Cons(2, Nil()))}
have the type \lstinline{List}.
In \langname, we use Scala syntax to define ADTs.
A type is defined with an abstract class and the constructors with case classes.
The above definition in \langname would be:
\begin{figure}[h]
\begin{lstlisting}
abstract class List
case class Nil() extends List
case class Cons(h: Int(32), t: List) extends List
\end{lstlisting}
\end{figure}
Notice that the names of the fields have no practical meaning,
and we only use them to stay close to Scala.
We will sometimes use the term abstract class for a type
and case class for a type constructor.
The main programming structure to manipulate class types
is \emph{pattern matching}. In Section~\ref{sec:expr} we define how pattern matching works.
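As a small illustration, the following sketch sums the elements of a \lstinline{List} by pattern matching on its two constructors.
The function name \lstinline{sum} is only illustrative, and the definition is assumed to live in the same module as \lstinline{List}.
\begin{lstlisting}
// Sums all elements of a list; the empty list sums to 0.
fn sum(l: List): Int(32) = {
  l match {
    case Nil() => 0
    case Cons(h, t) => h + sum(t)
  }
}
\end{lstlisting}
With this definition, \lstinline{sum(Cons(1, Cons(2, Nil())))} evaluates to \lstinline{3}.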
% Cont. in informal.tex
\section{Syntax}
\label{sec:syntax}
\def\alt{~~|~~}
\def\Expr{\gnt{Expr}}
\def\Id{\gnt{Id}}
\def\ID{\gnt{ID}}
\def\({\gt{(}}
\def\){\gt{)}}
\begin{figure}
\begin{equation*}
\begin{array}{rl}
\gnt{Program} \gpo & \gnt{Module^*} \\
\gnt{Module} \gpo & \gt{object} \Id \gnt{Definition^*} \gnt{Expr?} \gt{end} \Id \\
\gnt{Definition} \gpo & \gnt{AbstractClassDef} \alt \gnt{CaseClassDef} \alt \gnt{FunDef} \\
\gnt{AbstractClassDef} \gpo & \gt{abstract} \gt{class} \Id \\
\gnt{CaseClassDef} \gpo & \gt{case} \gt{class} \Id \( \gnt{Params} \) \gt{extends} \Id \\
\gnt{FunDef} \gpo & \gt{fn} \gnt{Id} \( \gnt{Params} \) \gt{:} \gnt{Type} \gt{=} \gt{\{} \Expr \gt{\}} \\
\gnt{Params} \gpo & \ \, \epsilon \alt \gnt{ParamDef} [\gt{,} \gnt{ParamDef}]^* \\
\gnt{ParamDef} \gpo & \gnt{Id} \gt{:} \gnt{Type} \\
\gnt{Type} \gpo & \gt{Int} \( \gnt{32} \) \alt \gt{String} \alt \gt{Boolean} \alt \gt{Unit} \alt [\Id \gnt{.}]? \Id \\
\gnt{Expr} \gpo & \Id \\
\alt & \gnt{Literal} \\
\alt & \Expr\ \ \gnt{BinOp}\ \ \Expr \\
\alt & \gnt{UnaryOp} \Expr \\
\alt & \; [\Id \gnt{.}]? \Id \( \gnt{Args} \) \\
\alt & \Expr \gt{;} \Expr \\
\alt & \gt{val} \gnt{ParamDef} \gt{=} \gnt{Expr} \gt{;} \gnt{Expr} \\
\alt & \gt{if} \( \Expr \) \gt{\{} \gnt{Expr} \gt{\}} \gt{else} \gt{\{} \gnt{Expr} \gt{\}} \\
\alt & \Expr \gt{match} \gt{\{} \gnt{MatchCase^+} \gt{\}} \\
\alt & \gt{error} \gt{(} \Expr \gt{)} \\
\alt & \( \Expr \) \\
\gnt{Literal} \gpo & \gt{true} \alt \gt{false} \alt \( \) \\
\alt & \gnt{IntLiteral} \alt \gnt{StringLiteral} \\
\gnt{BinOp} \gpo & \gt{+}\alt \gt{-} \alt \gt{*} \alt \gt{/} \alt \gt{\%} \alt \gt{<} \alt \gt{<=} \\
\alt & \gt{\&\&} \alt \gt{||} \alt \gt{==} \alt \gt{++} \\
\gnt{UnaryOp} \gpo & \gt{-} \alt \gt{!} \\
\gnt{MatchCase} \gpo & \gt{case} \gnt{Pattern} \gt{=>} \Expr \\
\gnt{Pattern} \gpo & \; [\Id \gnt{.}]? \Id \( \gnt{Patterns} \) \alt \Id \alt \gnt{Literal} \alt \gt{\_} \\
\gnt{Patterns} \gpo & \ \; \epsilon \alt \gnt{Pattern} [\gt{,} \gnt{Pattern}]^* \\
\gnt{Args} \gpo & \ \, \epsilon \alt \Expr [\gt{,} \Expr]^* \\
\end{array}
\end{equation*}
\caption{Syntax of \langname}
\label{figure:syntax}
\end{figure}
\begin{figure}
\begin{equation*}
\begin{array}{rl}
\gnt{IntLiteral} \gpo & \gnt{Digit^+}\\
\gnt{Id} \gpo & \gnt{Alpha} \gnt{AlphaNum^*} \text{(and not a reserved word)}\\
\gnt{AlphaNum} \gpo & \gnt{Alpha} \alt \gnt{Digit}\ \alt \gtns{\_} \\
\gnt{Alpha} \gpo & ~[\gtns{a}-\gtns{z}]\ \alt [\gtns{A}-\gtns{Z}] \\
\gnt{Digit} \gpo & ~[\gtns{0}-\gtns{9}] \\
\gnt{StringLiteral} \gpo & ~\gtns{"} \gnt{StringChar^*} \gtns{"}\\
\gnt{StringChar} \gpo & ~\text{Any character except newline and $\gtns{"}$}
\end{array}
\end{equation*}
\caption{Lexical rules for \langname}
\label{figure:lexing}
\end{figure}
The syntax of \langname is given formally by the context-free grammar of Figure~\ref{figure:syntax}.
Everything spelled in \textit{italic} is a nonterminal symbol of the grammar,
whereas the terminal symbols are spelled in \gtns{monospace} font.
$s^*$ stands for zero or more repetitions of $s$ (the Kleene star), $s^+$ stands for one or more repetitions of $s$,
and $s?$ stands for optional presence of $s$ (zero or one repetitions).
The square brackets $[]$ are not symbols of the grammar;
they merely group symbols together.
Before parsing an \langname program, the Amy \emph{lexer} generates a sequence of terminal symbols
(\emph{tokens}) from the source files.
Some non-terminal symbols mentioned, but not specified, in Figure~\ref{figure:syntax}
are also represented as a single token by the lexer.
They are lexed according to the rules in Figure~\ref{figure:lexing}.
In Figure~\ref{figure:lexing}, we denote the range between characters $\alpha$ and $\beta$ (inclusive)
with $[\alpha - \beta]$.
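As a rough sketch of what the lexer produces, consider the expression below and the token sequence shown in the comment underneath it.
The token notation (\lstinline{Id(...)}, \lstinline{StringLiteral(...)}) is hypothetical and only meant for illustration; we also assume that whitespace between tokens is skipped.
\begin{lstlisting}
Std.printString("Hello " ++ name)
// Id(Std)  .  Id(printString)  (  StringLiteral("Hello ")  ++  Id(name)  )
\end{lstlisting}
Here \lstinline{Std}, \lstinline{printString} and \lstinline{name} each match the \gnt{Id} rule of Figure~\ref{figure:lexing},
and the quoted text matches \gnt{StringLiteral}; the remaining tokens are terminal symbols of Figure~\ref{figure:syntax}.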
The syntax in Figure~\ref{figure:syntax} is an \emph{overapproximation} of the real syntax of \langname.
This means that it allows some programs that should not be allowed in Amy.
To get the real syntax of Amy, there are some additional restrictions presented (among other things)
in the following notes:
\begin{itemize}
\item The reserved words of \langname are the following:
\gtns{abstract}, \gtns{Boolean}, \gtns{case}, \gtns{class}, \gtns{fn},
\gtns{else}, \gtns{error}, \gtns{extends}, \gtns{false}, \gtns{if}, \gtns{Int}, \gtns{match},
\gtns{object},
\gtns{end},
\gtns{String}, \gtns{true}, \gtns{Unit}, \gtns{val}, \gtns{\_} (the wildcard pattern).
Identifiers are not allowed to coincide with a reserved word.
\item The operators and language constructs of \langname
have the following precedence, starting from the \emph{lowest}:
(1) \lstinline{val}, \lstinline{;}
(2) \lstinline{if}, \lstinline{match} (3) \lstinline{||}
(4) \lstinline{&&} (5) \lstinline{==}
(6) \lstinline{<}, \lstinline{<=} (7) \lstinline{+}, \lstinline{-}, \lstinline{++}
(8) \lstinline{*}, \lstinline{/}, \lstinline{%}
(9) Unary \lstinline{-}, \lstinline{!}
(10) \lstinline{error}, calls, variables, literals, parenthesized expressions.
For example,\\
\lstinline{1 + 2 * 3} means \lstinline{1 + (2 * 3)} and \\
\lstinline|1 + 2 match {...}| means \lstinline|(1 + 2) match {...}|.
A little more complicated is the interaction between \lstinline{;} and \lstinline{val}:
the definition part of the \lstinline{val} extends only up to the first semicolon,
but the variable defined is then visible through any number of subsequent semicolons.
Thus
\lstinline{(val x: Int(32) = y; z; x)} means \lstinline{(val x: Int(32) = y; (z; x))}
\sloppy{and not \lstinline{(val x: Int(32) = (y; z); x)} or \lstinline{((val x: Int(32) = y; z); x)}}
(i.e. \lstinline{x} takes the value of \lstinline{y} and is visible until the end of the expression).
All operators are left-associative. That means that within
the same precedence category, the leftmost application of an operator takes precedence.
An exception is the sequence operator, which for ease of the implementation
(you will understand during parsing)
can be considered right-associative (it is an associative operator so it does not really
matter).
\item A \lstinline{val} definition is not allowed directly in the value assigned
by an enclosing \lstinline{val} definition.
E.g. \lstinline{(val x: Int(32) = val y: Int(32) = 0; 1; 2)} is not allowed.
On the other hand, \lstinline{(val x: Int(32) = 0; val y: Int(32) = 1; 2)} is allowed.
\item It is not allowed to use a \lstinline{val} as a (second) operand to an operator.
E.g. \lstinline{(1 + val x: Int(32) = 2; x)} is not allowed.
\item A unary operator is not allowed as a direct argument of another unary operator.
E.g. \lstinline{--x} is not allowed.
\item It is not allowed to use \lstinline{match} as a first operand of any binary operator,
except \lstinline{;}. E.g. \lstinline|(x match { ... } + 1)| is not allowed.
On the other hand \lstinline|(x match { ... }; x)| is allowed.
\item The syntax $[\Id \gnt{.}]? \Id$ refers to an optionally qualified name,
for example either \lstinline{MyModule.foo} or \lstinline{foo}.
If the qualifier is included, the qualified name refers to a definition
\lstinline{foo} in another module \lstinline{MyModule};
otherwise, \lstinline{foo} should be defined in the current module.
Since \langname does not have the import statement of Scala or Java,
this is the only way to refer to definitions in other modules.
\item One-line comments are introduced with ``\lstinline{//}'': \lstinline{// This is a comment}.
Everything until the end of the line is a comment and should be ignored by the lexer.
\item Multiline comments can be used as follows: \lstinline{/* This is a comment */}.
Everything between the delimiters is a comment, notably including newline characters
and \lstinline{/*}. Nested comments are not allowed.
\item Escaped characters are not recognised inside string literals.
I.e. \lstinline{"\n"} stands for a string literal which contains
a backslash and an ``n''.
\end{itemize}
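The following sketch puts several of the notes above together.
It assumes a module \lstinline{L} that defines \lstinline{List}, \lstinline{Nil}, \lstinline{Cons} and a function \lstinline{length};
the module \lstinline{Notes} and the function \lstinline{demo} are hypothetical names chosen for this example.
\begin{lstlisting}
object Notes
  // One-line comment: ignored up to the end of the line.
  /* Multiline comment: may span
     several lines, but cannot be nested. */
  fn demo(l: L.List): Int(32) = {
    // Qualified names refer to definitions in module L.
    val n: Int(32) = L.length(l);
    // Parsed as 1 + (n * 2), since * binds tighter than +.
    val m: Int(32) = 1 + n * 2;
    // A match used as the first operand of + must be parenthesized.
    (l match {
      case L.Nil() => 0
      case L.Cons(h, _) => h
    }) + m
  }
end Notes
\end{lstlisting}
Both \lstinline{n} and \lstinline{m} remain visible until the end of the function body,
because each \lstinline{val} scopes over everything that follows its semicolon.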
\newcommand{\typingI}[5]{\inferrule[#1]{\Gamma \vdash #2: #3}{\Gamma \vdash #4: #5}}
\newcommand{\typingII}[7]{\inferrule[#1]{\Gamma \vdash #2: #3 \and \Gamma \vdash #4: #5}{\Gamma \vdash #6: #7}}
\begin{figure}
\begin{mathpar}
\inferrule[Variable]{v: T \in \Gamma}{\hastype{v}{T}}
\inferrule[Int Literal]{i \text{ is an integer literal}}{\hastype{i}{\Int}}
\inferrule[String Literal]{s \text { is a string literal}}{\hastype{s}{\String}}
\inferrule[Unit]{~}{\hastype{\gtns{()}}{\Unit}}
\inferrule[Boolean Literal]{b \in \{\gtns{true}, \gtns{false}\}}{\hastype{b}{\Boolean}}
\inferrule[Arith. Bin. Operators]
{\hastype{e_1}{\Int} \and \hastype{e_2}{\Int} \and op \in \{\gtns{+}, \gtns{-}, \gtns{*}, \gtns{/}, \gtns{\%} \} }
{\hastype{e_1\ op\ e_2}{\Int}}
\inferrule[Arith. Comp. Operators]
{\hastype{e_1}{\Int} \and \hastype{e_2}{\Int} \and op \in \{\gtns{<}, \gtns{<=} \} }
{\hastype{e_1\ op\ e_2}{\Boolean}}
\inferrule[Arith. Negation]{\hastype{e}{\Int}}{\hastype{\gtns{-}e}{\Int}}
\inferrule[Boolean Bin. Operators]
{\hastype{e_1}{\Boolean} \hskip -5pt \and \hastype{e_2}{\Boolean} \hskip -5pt \and op \in \{\gtns{\&\&}, \gtns{||}\} }
{\hastype{e_1\ op\ e_2}{\Boolean}}
\inferrule[Boolean Negation]{\hastype{e}{\Boolean}}{\hastype{\gtns{!}e}{\Boolean}}
\inferrule[String Concatenation]
{\hastype{e_1}{\String} \and \hastype{e_2}{\String} }
{\hastype{e_1 \gt{++} e_2}{\String}}
\inferrule[Equality]
{\hastype{e_1}{T} \and \hastype{e_2}{T}}
{\hastype{e_1 \gt{==} e_2}{\Boolean}}
\inferrule[Sequence]
{\hastype{e_1}{T_1} \and \hastype{e_2}{T_2}}
{\hastype{e_1 \gt{;} e_2}{T_2}}
\inferrule[Local Variable Definition]
{\hastype{e_1}{T_1} \and \Gamma, n: T_1 \vdash e_2: T_2}
{\hastype{\gtns{val}~ n \gt{:} T_1 \gt{=} e_1\gt{;} e_2}{T_2}}
\inferrule[Function/Class Constructor Invocation]
{\hastype{e_1}{T_1} \and \ldots \and \hastype{e_n}{T_n} \and \hastype{f}{(T_1, \ldots, T_n) \RA T}}
{\hastype{f(e_1, \ldots, e_n)}{T}}
\inferrule[If-Then-Else]
{\hastype{e_1}{\Boolean} \and \hastype{e_2}{T} \and \hastype{e_3}{T}}
{\hastype{\gtns{if (}e_1 \gtns{) \{} e_2 \gtns{\} else \{}e_3\gtns{\}}}{T}}
\inferrule[Error]{\hastype{e}{\String}}{\hastype{\gtns{error(}e\gtns{)}}{T}}
\inferrule[Pattern Matching]
{\hastype{e}{T_s} \and \forall i \in [1,n].\ \hastype{p_i}{T_s} \and \forall i \in [1,n].\ \Gamma, bindings(p_i) \vdash e_i : T_c}
{\hastype{e \gt{match \{ case} p_1 \gt{=>} e_1\ \ldots \gt{case} p_n \gt{=>} e_n\ \gtns{\}}}{T_c}}
\end{mathpar}
\caption{Typing rules for expressions}
\label{figure:types}
\end{figure}
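As a worked example of how the rules of Figure~\ref{figure:types} compose, consider the expression \lstinline{1 + 2 < 3}.
Since \lstinline{+} binds tighter than \lstinline{<}, it is read as \lstinline{(1 + 2) < 3}, and one possible derivation is sketched below.
The leaves follow from the \textsc{Int Literal} rule, and the side conditions on $op$ are omitted for brevity.
\begin{mathpar}
\inferrule[Arith. Comp. Operators]
{\inferrule[Arith. Bin. Operators]
  {\hastype{\gtns{1}}{\Int} \and \hastype{\gtns{2}}{\Int}}
  {\hastype{\gtns{1} \gt{+} \gtns{2}}{\Int}}
 \and \hastype{\gtns{3}}{\Int}}
{\hastype{\gtns{1} \gt{+} \gtns{2} \gt{<} \gtns{3}}{\Boolean}}
\end{mathpar}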
\section{Alternative frontends/backends}
This section contains projects that do not modify the language features of \langname,
but change the implementation of a part of the \langname compiler frontend or backend.
\subsection{Code formatter (1)}
Build a code formatter for the Amy language.
A straightforward way to accomplish this would be to add a special mode
(e.g.\ \lstinline{--format}) that the user can start the compiler in.
It would then only run the existing pipeline up to, say, parsing and subsequently go
through a special pretty-printing phase that outputs the program according to code style
rules configurable by the user.
%
A more sophisticated version could instead work at the token level, allowing your
formatter to be aware of whitespace, e.g., respecting newlines that a user inserted.
%
In any case, you will have to maintain comments (which are not part of the AST).
You can look at \lstinline{scalafmt} for some inspiration.
\subsection{Language Server (2)}
Implement a language server for Amy. VSCode and similar IDEs use the \href{https://en.wikipedia.org/wiki/Language_Server_Protocol}{language server protocol} to communicate with compilers and thereby provide deep integration with a variety of programming languages. Among other things, this enables features like showing type-checking errors within the editor, jumping to definitions and looking up all usages of a given definition. Your goal is to provide such functionality for Amy by implementing an additional mode in your compiler, in which it acts as a server for the language server protocol. Your implementation should be demonstrable with VSCode and include the aforementioned features. You can use an existing library, such as \href{https://github.com/eclipse/lsp4j}{LSP4J}, to simplify your task.
\subsection{Formalization of Amy (1)}
Develop an operational semantics for Amy and use your definitions along with Amy's typing rules to prove type safety. Note that this might require you to do some additional reading on type systems.
\subsection{JVM backend (2)}
Implement an alternative backend for \langname which outputs JVM bytecode.
You can use \href{https://github.com/psuter/cafebabe}{this library}.
You first have to think about how to represent \langname values in a class-based environment,
and then generate the respective bytecode from \langname ASTs.
\subsection{C backend (3)}
Implement an alternative backend for \langname which outputs C code.
You have to think about how to represent \langname values in C,
and then generate the respective C code from \langname ASTs.
pdflatex extensions
pdflatex extensions