Skip to content
Snippets Groups Projects
AmySpec.md 29.75 KiB

Specification of the Amy Language

Computer Language Processing
LARA
Spring 2025

1. Introduction

Welcome to the Amy project! This semester you will learn how to compile a simple functional Scala-like language from source files down to executable code. When your compiler is complete, it will be able to take Amy source (text) files as input and produce WebAssembly bytecode files. WebAssembly is a new format for portable bytecode which is meant to be run in browsers.

This document is the specification of Amy. Its purpose is to help you clearly and unambiguously understand what an Amy program means, and to be the Amy language reference, along with the reference compiler. It does not deal with how you will actually implement the compiler; this will be described to you as assignments are released.

Note: The language might change along the way, so check this specification before starting each lab, to make sure you have the latest version in mind.

1.1 Features of Amy

Let us demonstrate the basic features of Amy through some examples:

1.1.1 The factorial function

object Factorial

def fact(i: Int(32)): Int(32) = {
  if (i < 2) { 1 }
  else { i * fact(i-1) }
}

end Factorial

Every program in Amy is contained in a module, also called object. A function is introduced with the keyword def, and all its parameters and result type must be explicitly typed. Amy supports conditional (or if-) expressions with obligatory brackets. Notice that conditionals are not statements, but return a value, in this case an Int(32).

In fact, there is no distinction between expressions and statements in Amy. Even expressions that are called only for their side-effects return a value of type Unit.

The condition of an if-expression must be of type Boolean and its branches must have the same type, which is also the type of the whole expression.

1.1.2 Saying hello

object Hello
  Std.printString("Hello " ++ "world!")
end Hello

Amy supports compiling multiple modules together. To refer to functions (or other definitions) in another module, one must explicitly use a qualified name. There is no import statement like in Scala.

In this example, we refer to the printString function in the Std module, which contains some builtin functions to interact with the user. The string we print is constructed by concatenating two smaller strings with the ++ operator.

1.1.3 Input, local variables and sequencing expressions

object ReadName
  Std.printString("What is your name?");
  val name: String = Std.readString();
  Std.printString("Hello " ++ name)
end ReadName

We can read input from the console with the readX functions provided in Std.

We can define local variables with val, which must always be typed explicitly. The value of the variable is given after =, followed by a semicolon.

We can sequence expressions with ;. The value of the first expression is discarded, and the value of the second one is returned. Note that ; is an operator and not a terminator: you are not allowed to put it at the end of a sequence of expressions.

1.1.4 Type definitions

Except for the basic types, a user can define their own types in Amy. The user-definable types in Amy come from functional programming and are called algebraic data types. In this case, we define a type, List, and two constructors Nil and Cons, which we can call to construct values of type List.

object L
  abstract class List
  case class Nil() extends List
  case class Cons(h: Int(32), t: List) extends List
end L

1.1.5 Constructing ADT values

def range(from: Int(32), to: Int(32)): List = {
  if (to < from) { Nil() }
  else {
    Cons(from, range(from + 1, to))
  }
}

We can create a List by calling one of its two constructors like a function, as demonstrated in the range function.

1.1.6 Pattern matching

def length(l: List): Int(32) = {
  l match {
    case Nil() => 0
    case Cons(h, t) => 1 + length(t)
  }
}

To use a list value in any meaningful way, we have to break it down, according to the constructor used to construct it. This is called pattern matching and is a powerful feature of functional programming.

In length we pattern match against the input value l. Pattern matching will check if its argument matches the pattern of the first case, and if so will evaluate the corresponding expression. Otherwise it will continue with the second case etc. If no pattern matches, the program will exit with an error. If the constructor has arguments, as does Cons in this case, we can bind their values to fresh variables in the pattern, so we can use them in the case expression.

1.1.7 Wildcard patterns and errors

The error keyword takes a string as argument, prints Error: and its argument on the screen, then exits the program immediately with an error code. In this function, we are trying to compute the head of a list, which should fail if the list is empty.

Notice that in the second case, we don’t really care what the tail of the list is. Therefore, we use a wildcard pattern (_), which matches any value without binding it to a name.

def head(l: List): Int(32) = {
  l match {
    case Cons(h, _) => h
    case Nil() => error("head(Nil)")
  }
}

1.2 Relation to Scala

Amy, with mild syntactic variations, is designed to be as close to a simple subset of Scala as possible. However, it is not a perfect subset. You can easily come up with Amy programs that are not legal in Scala. However, many “reasonable” programs will be compilable with scalac, provided you provide an implementation of the Amy standard library along with your code. This should not be required however, as we are providing a reference implementation of Amy.


2. Syntax

The syntax of Amy is given formally by the context-free grammar of Figure 1. Everything spelled in italic is a nonterminal symbol of the grammar, whereas the terminal symbols are spelled in monospace font. * is the Kleene star, s+ stands for one or more repetitions of s, and ? stands for optional presence of a symbol (zero or one repetitions). The square brackets [] are not symbols of the grammar; they merely group symbols together. Please note that the square brackets [] are still tokenized as they are reserved for future use.

Before parsing an Amy program, the Amy lexer generates a sequence of terminal symbols (tokens) from the source files. Some non-terminal symbols mentioned, but not specified, in Figure 1 are also represented as a single token by the lexer. They are lexed according to the rules in Figure 2. In Figure 2, we denote the range between characters α and β (included) with [α-β].

The syntax in Figure 1 is an overapproximation of the real syntax of Amy. This means that it allows some programs that should not be allowed in Amy. To get the real syntax of Amy, there are some additional restrictions presented (among other things) in the following notes:

  • The reserved words of Amy are the following:
    abstract, Boolean, case, class, def, else, error, extends, false, if, Int, match, object, end, String, true, Unit, val, _ (the wildcard pattern).

    Identifiers are not allowed to coincide with a reserved word.

  • The operators and language constructs of Amy have the following precedence, starting from the lowest:

    1. val, ;
    2. if, match
    3. ||
    4. &&
    5. ==
    6. <, <=
    7. +, -, ++
    8. *, /, %
    9. Unary -, !
    10. error, calls, variables, literals, parenthesized expressions.

    For example,
    1 + 2 * 3 means 1 + (2 * 3) and
    1 + 2 match {...} means (1 + 2) match {...}.

    A little more complicated is the interaction between ; and val:
    the definition part of the val extends only as little as the first semicolon, but then the variable defined is visible through any number of semicolons. Thus (val x: Int(32) = y; z; x) means (val x: Int(32) = y; (z; x)) and not (val x: Int(32) = (y; z); x) or ((val x: Int(32) = y; z); x) (i.e. x takes the value of y and is visible until the end of the expression).

    All operators are left-associative. That means that within the same precedence category, the leftmost application of an operator takes precedence. An exception is the sequence operator, which for ease of the implementation (you will understand during parsing) can be considered right-associative (it is an associative operator so it does not really matter).

  • A val definition is not allowed directly in the value assigned by an enclosing val definition. E.g.
    (val x: Int(32) = val y: Int(32) = 0; 1; 2) is not allowed.
    On the other hand,
    (val x: Int(32) = 0; val y: Int(32) = 1; 2) is allowed.

  • It is not allowed to use a val as a (second) operand to an operator. E.g.
    (1 + val x: Int(32) = 2; x) is not allowed.

  • A unary operator is not allowed as a direct argument of another unary operator. E.g. --x is not allowed.

  • It is not allowed to use match as a first operand of any binary operator, except ;. E.g. (x match { ... } + 1) is not allowed. On the other hand (x match { ... }; x) is allowed.

  • The syntax [ Id . ]? Id refers to an optionally qualified name, for example either MyModule.foo or foo. If the qualifier is included, the qualified name refers to a definition foo in another module MyModule; otherwise, foo should be defined in the current module. Since Amy does not have the import statement of Scala or Java, this is the only way to refer to definitions in other modules.

  • One line comments are introduced with //: //This is a comment. Everything until the end of the line is a comment and should be ignored by the lexer.

  • Multiline comments can be used as follows: /*This is a comment */. Everything between the delimiters is a comment, notably including newline characters and /*. Nested comments are not allowed.

  • Escaped characters are not recognised inside string literals. I.e. "\n" stands for a string literal which contains a backslash and an n.


Figure 1: Syntax of Amy

Program ::= Module∗

Module ::= object Id Definition∗ Expr? end Id
Definition ::= AbstractClassDef | CaseClassDef | FunDef

AbstractClassDef ::= abstract class Id
CaseClassDef ::= case class Id ( Params ) extends Id

FunDef ::= def Id ( Params ) : Type = { Expr }
Params ::= ϵ | ParamDef [ , ParamDef ]∗

ParamDef ::= Id : Type
Type ::= Int ( 32 ) | String | Boolean | Unit | [ Id . ]? Id
Expr ::= Id
       | Literal
       | Expr BinOp Expr
       | UnaryOp Expr
       | [ Id . ]? Id ( Args )
       | Expr ; Expr
       | val ParamDef = Expr ; Expr
       | if ( Expr ) { Expr } else { Expr }
       | Expr match { MatchCase+ }
       | error ( Expr )
       | ( Expr )

Literal ::= true | false | ( )
          | IntLiteral | StringLiteral

BinOp ::= + | - | * | / | % | < | <=
        | && | || | == | ++

UnaryOp ::= - | !

MatchCase ::= case Pattern => Expr
Pattern ::= [ Id . ]? Id ( Patterns ) | Id | Literal |
Patterns ::= ϵ | Pattern [ , Pattern ]∗

Args ::= ϵ | Expr [ , Expr ]∗
Figure 2: Lexical rules for Amy

IntLiteral ::= Digit+

Id ::= Alpha AlphaNum∗ (and not a reserved word)
AlphaNum ::= Alpha | Digit |
Alpha ::= [a-z] | [A-Z]
Digit ::= [0-9]

StringLiteral ::= " StringChar∗ "
StringChar ::= Any character except newline and "

3. Semantics

In this section we will give the semantics of Amy, i.e. we will systematically explain what an Amy program represents, as well as give the restrictions that a legal Amy program must obey. The discussion will be informal, except for the typing rules of Amy.