Showing
with 1546 additions and 0 deletions
object HelloInt
Std.printString("What is your name?");
val name: String = Std.readString();
Std.printString("Hello " ++ name ++ "! And how old are you?");
val age: Int(32) = Std.readInt();
Std.printString(Std.intToString(age) ++ " years old then.")
end HelloInt
object Printing
Std.printInt(0); Std.printInt(-222); Std.printInt(42);
Std.printBoolean(true); Std.printBoolean(false);
Std.printString(Std.digitToString(0));
Std.printString(Std.digitToString(5));
Std.printString(Std.digitToString(9));
Std.printString(Std.intToString(0));
Std.printString(Std.intToString(-111));
Std.printString(Std.intToString(22));
Std.printString("Hello " ++ "world!");
Std.printString("" ++ "")
end Printing
object TestLists
val l: L.List = L.Cons(5, L.Cons(-5, L.Cons(-1, L.Cons(0, L.Cons(10, L.Nil())))));
Std.printString(L.toString(L.concat(L.Cons(1, L.Cons(2, L.Nil())), L.Cons(3, L.Nil()))));
Std.printInt(L.sum(l));
Std.printString(L.toString(L.mergeSort(l)))
end TestLists
# Lab 03: Parser
## Preamble
Please make sure to refer to the [Amy specification](../amy-specification/AmySpec.md) as the language might be updated.
## Introduction
Starting from this week you will work on the second stage of the Amy
compiler, the parser. The task of the parser is to take a sequence of
tokens produced by the lexer and transform it into an Abstract Syntax
Tree (AST).
For this purpose you will write a grammar for Amy programs in a Domain
Specific Language (DSL) that can be embedded in Scala. Similarly to what
you have seen in the Lexer lab, each grammar rule will also be
associated with a transformation function that maps the parse result to
an AST. The overall grammar will then be used to automatically parse
sequences of tokens into Amy ASTs, while abstracting away extraneous
syntactical details, such as commas and parentheses.
As you have seen (and will see) in the lectures, there are various
algorithms for parsing inputs according to a context-free grammar.
Any context-free grammar (after some normalization) can be parsed using
the CYK algorithm. However, this algorithm is rather slow: its
complexity is in `O(n^3 * g)`, where n is the size of the program and g
the size of the grammar. On the other hand, inputs of a more restricted
LL(1) grammar can be parsed in linear time. Thus, the goal of this lab will
be to develop an LL(1) version of the Amy grammar.
### The Parser Combinator DSL
In the previous lab you already started working with **Silex**, which
was the library we used to tokenize program inputs based on a
prioritized list of regular expressions. In this lab we will start using
its companion library, **Scallion**: Once an input string has been
tokenized, Scallion allows us to parse the token stream using the rules
of an LL(1) grammar and translate to a target data structure, such as an
AST.
To familiarize yourself with the parsing functionality of Scallion,
please make sure you read the [Introduction to (Scallion) Parser
Combinators](material/scallion.md). In it, you will learn how to describe grammars
in Scallion's parser combinator DSL and how to ensure that your grammar
lies in LL(1) (which Scallion requires to function correctly).
Once you understand parser combinators, you can get to work on your own
implementation of an Amy parser in `Parser.scala`. Note that in this lab
you will essentially operate on two data structures: Your parser will
consume a sequence of `Token`s (defined in `Tokens.scala`) and produce
an AST (as defined by `NominalTreeModule` in `TreeModule.scala`). To
accomplish this, you will have to define appropriate parsing rules and
translation functions for Scallion.
In `Parser.scala` you will already find a number of parsing rules given
to you, including the starting non-terminal `program`. Others, such as
`expr` are stubs (marked by `???`) that you will have to complete
yourself. Make sure to take advantage of Scallion's various helpers
such as the `operators` method that simplifies defining operators of
different precedence and associativity.
### An LL(1) grammar for Amy
As usual, the [Amy specification](../amy-specification/AmySpec.md) will guide you when it comes to deciding what exactly should be accepted by your parser.
Carefully read the *Syntax* section.
Note that the EBNF grammar in the specification merely represents an
over-approximation of Amy's true grammar -- it is too imprecise to be
useful for parsing. Firstly, this grammar is ambiguous. That
is, it allows multiple ways to parse an expression. For example, `x + y * z`
could be parsed as either `(x + y) * z` or as `x + (y * z)`. In other
words, the grammar doesn't enforce either operator precedence or
associativity correctly. Additionally, the restrictions mentioned
throughout the *Syntax* section of the specification are not followed.
Your task is thus to come up with appropriate rules that encode Amy's
true grammar. Furthermore, this grammar should be LL(1) for reasons of
efficiency. Scallion will read your grammar, examine if it is indeed an LL(1) grammar, and, if so, parse input programs. If Scallion determines that the
grammar is not an LL(1) grammar, it will report an error. You can also instruct
Scallion to generate some counter-examples for you (see the `checkLL1`
function).
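To see why the ambiguity matters, the two readings of an expression such as `1 + 2 * 3` can be written out as trees and evaluated. The following is a minimal sketch in plain Scala; the `Expr` classes and the `eval` function are made up for illustration and are not part of Amy's AST or of Scallion.

```scala
// Hypothetical miniature AST, for illustration only (not Amy's TreeModule).
sealed trait Expr
case class Num(value: Int) extends Expr
case class Plus(lhs: Expr, rhs: Expr) extends Expr
case class Times(lhs: Expr, rhs: Expr) extends Expr

// Evaluates a tree bottom-up.
def eval(e: Expr): Int = e match {
  case Num(v)      => v
  case Plus(l, r)  => eval(l) + eval(r)
  case Times(l, r) => eval(l) * eval(r)
}

// The two trees an ambiguous grammar allows for `1 + 2 * 3`.
val plusFirst: Expr  = Times(Plus(Num(1), Num(2)), Num(3)) // (1 + 2) * 3
val timesFirst: Expr = Plus(Num(1), Times(Num(2), Num(3))) // 1 + (2 * 3)
```

The two trees evaluate to different values, so the parser has to commit to one of them. Which one is correct is decided by the precedence rules in the specification, not by this sketch.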
### Translating to ASTs
Scallion will parse a sequence of tokens according to the grammar you
provide. However, without additional help, it does not know how to build
Amy ASTs. For instance, a (nonsensical) grammar that only accepts
sequences of identifier tokens, e.g.

    many(elem(IdentifierKind)): Syntax[Seq[Token]]

will be useful in deciding whether the input matches the expected form,
but will simply return the tokens unchanged when parsing succeeds.
Scallion does allow you to map parse results from one type to another,
however. For instance, in the above example we might want to provide a
function `f(idTokens: Seq[Token]): Seq[Variable]` that transforms the
identifier tokens into (Amy-AST) variables of those names.
For more information on how to use Scallion's `Syntax#map` method
please refer to the [Scallion introduction](material/scallion.md).
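As a rough illustration of such a transformation function, here is a sketch of `f` in plain Scala. The `Token`, `IdentifierToken`, `OtherToken` and `Variable` definitions below are simplified stand-ins invented for this example; the real ones live in `Tokens.scala` and `TreeModule.scala` and carry more information (positions, for instance).

```scala
// Simplified, hypothetical stand-ins for the real Token and AST types.
sealed trait Token
case class IdentifierToken(name: String) extends Token
case class OtherToken(text: String) extends Token

case class Variable(name: String)

// Turns identifier tokens into variables of the same name.
// Tokens that are not identifiers are dropped in this sketch.
def f(idTokens: Seq[Token]): Seq[Variable] =
  idTokens.collect { case IdentifierToken(name) => Variable(name) }
```

A function of this shape is what one would hand to `map` on the syntax above, so that a successful parse yields `Seq[Variable]` instead of the raw tokens.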
## Notes
### Understanding the AST: Nominal vs. Symbolic Trees
If you check the TreeModule file containing the ASTs, you will notice it
is structured in an unusual way: there is a `TreeModule` trait extended
by `NominalTreeModule` and `SymbolicTreeModule`. The reason for this
design is that we need two very similar ASTs, but with different types
representing names in each case. Just after parsing (this assignment),
all names are just *Strings* and qualified names are essentially pairs of
*Strings*. We call ASTs that only use such String-based names `Nominal`
-- the variant we will be using in this lab. Later, during name
analysis, these names will be resolved to unique identifiers, e.g. two
variables that refer to different definitions will be distinct, even if
they have the same name. For now you can just look at the TreeModule and
substitute the types that are not defined there (`Name` and
`QualifiedName`) with their definitions inside `NominalTreeModule`.
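The idea of sharing one tree definition between two name representations can be sketched with abstract type members. This is a heavily reduced illustration: the object names `NominalTrees` and `SymbolicTrees`, the two node types, and the `Identifier` class are assumptions made for the example, and the real `TreeModule.scala` defines many more node types.

```scala
// Simplified sketch; not the real TreeModule.
trait TreeModule {
  type Name            // left abstract: each variant picks its representation
  type QualifiedName

  case class Variable(name: Name)
  case class Call(qname: QualifiedName, args: List[Variable])
}

// After parsing: names are plain strings, qualified names are pairs of strings.
object NominalTrees extends TreeModule {
  type Name = String
  case class QualifiedName(module: Option[String], name: String)
}

// Hypothetical unique identifier: no equals override, so it is compared by reference.
final class Identifier(val name: String)

// After name analysis: names are unique identifiers.
object SymbolicTrees extends TreeModule {
  type Name = Identifier
  type QualifiedName = Identifier
}
```

With this layout, two nominal variables called `x` are indistinguishable, while two symbolic variables built from different `Identifier` objects are distinct even when both are named `x`.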
### Positions
As you will notice in the code we provide, all generated ASTs have their
position set. The position of each node of the AST is defined as its
starting position, i.e., the position of its first character in the text file. It is important that you set the positions in all the
trees that you create for better error reporting later. Although our
testing infrastructure cannot directly check for presence of positions,
we will check it manually.
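The following sketch shows one way a tree node can carry its starting position. The `Position` case class, the `Positioned` trait and its members are simplified assumptions for illustration; the versions provided in the skeleton (`Position.scala`) may differ in names and detail.

```scala
// Hypothetical, simplified position: line and column of the first character.
case class Position(line: Int, col: Int)

// Hypothetical mix-in that lets a node remember where it started.
trait Positioned {
  private var pos_ : Option[Position] = None
  def position: Option[Position] = pos_
  // Returns this.type so that the call can be chained right after construction.
  def setPos(p: Position): this.type = { pos_ = Some(p); this }
}

case class Variable(name: String) extends Positioned

// A node for an identifier that starts at line 3, column 7.
val v: Variable = Variable("x").setPos(Position(3, 7))
```

Because `setPos` returns the node itself, the position can be attached in the same expression that builds the tree, which is the pattern used inside `map` in the provided rules.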
### Pretty Printing
Along with the stubs, we provide a printer for Amy ASTs. It will print
parentheses around all expressions so you can clearly see how your
parser interprets precedence and associativity. You can use it to test
your parser, and it will also be used during our testing to compare the
output of your parser with the reference parser.
### Running your code
To debug your parser, you will want to run your code to see the produced trees, or to see the counter-examples produced by Scallion if your grammar is not LL(1). To do so, you can run the following command:
```bash
sbt "run --printTrees <path-to-file>"
```
This will run the compiler pipeline up to the parser and print the nominal trees produced by your parser. This output is the same as the one in the test resources for the test cases.
## Skeleton
As usual, you can find the skeleton in the git repository. This lab
builds on your previous work, so -- given your implementation of the
lexer -- you will only unpack two files from the skeleton.
The structure of your project `src` directory should be as follows:
    lib
     └── scallion-assembly-0.6.1.jar
    library
     ├── ...
     └── ...
    examples
     ├── ...
     └── ...
    src
     ├── amyc
     │    ├── Main.scala                (updated)
     │    │
     │    ├── ast                       (new)
     │    │    ├── Identifier.scala
     │    │    ├── Printer.scala
     │    │    └── TreeModule.scala
     │    │
     │    ├── parsing
     │    │    ├── Parser.scala         (new)
     │    │    ├── Lexer.scala
     │    │    └── Tokens.scala
     │    │
     │    └── utils
     │         ├── AmycFatalError.scala
     │         ├── Context.scala
     │         ├── Document.scala
     │         ├── Pipeline.scala
     │         ├── Position.scala
     │         ├── Reporter.scala
     │         └── UniqueCounter.scala
     └── test
          ├── scala
          │    └── amyc
          │         └── test
          │              ├── CompilerTest.scala
          │              ├── LexerTests.scala
          │              ├── ParserTests.scala   (new)
          │              ├── TestSuite.scala
          │              └── TestUtils.scala
          └── resources
               ├── lexer
               │    └── ...
               └── parser               (new)
                    └── ...
## Deliverables
Deadline: **28.03.2025 23:59:59**
You should submit your files to Moodle in the corresponding assignment.
You should submit the following files:
- `Parser.scala`: Your implementation of the Amy parser.
object L
abstract class List
case class Nil() extends List
case class Cons(h: Int(32), t: List) extends List
def isEmpty(l : List): Boolean = { l match {
case Nil() => true
case _ => false
}}
def length(l: List): Int(32) = { l match {
case Nil() => 0
case Cons(_, t) => 1 + length(t)
}}
def head(l: List): Int(32) = {
l match {
case Cons(h, _) => h
case Nil() => error("head(Nil)")
}
}
def headOption(l: List): O.Option = {
l match {
case Cons(h, _) => O.Some(h)
case Nil() => O.None()
}
}
def reverse(l: List): List = {
reverseAcc(l, Nil())
}
def reverseAcc(l: List, acc: List): List = {
l match {
case Nil() => acc
case Cons(h, t) => reverseAcc(t, Cons(h, acc))
}
}
def indexOf(l: List, i: Int(32)): Int(32) = {
l match {
case Nil() => -1
case Cons(h, t) =>
if (h == i) { 0 }
else {
val rec: Int(32) = indexOf(t, i);
if (0 <= rec) { rec + 1 }
else { -1 }
}
}
}
def range(from: Int(32), to: Int(32)): List = {
if (to < from) { Nil() }
else {
Cons(from, range(from + 1, to))
}
}
def sum(l: List): Int(32) = { l match {
case Nil() => 0
case Cons(h, t) => h + sum(t)
}}
def concat(l1: List, l2: List): List = {
l1 match {
case Nil() => l2
case Cons(h, t) => Cons(h, concat(t, l2))
}
}
def contains(l: List, elem: Int(32)): Boolean = { l match {
case Nil() =>
false
case Cons(h, t) =>
h == elem || contains(t, elem)
}}
abstract class LPair
case class LP(l1: List, l2: List) extends LPair
def merge(l1: List, l2: List): List = {
l1 match {
case Nil() => l2
case Cons(h1, t1) =>
l2 match {
case Nil() => l1
case Cons(h2, t2) =>
if (h1 <= h2) {
Cons(h1, merge(t1, l2))
} else {
Cons(h2, merge(l1, t2))
}
}
}
}
def split(l: List): LPair = {
l match {
case Cons(h1, Cons(h2, t)) =>
val rec: LPair = split(t);
rec match {
case LP(rec1, rec2) =>
LP(Cons(h1, rec1), Cons(h2, rec2))
}
case _ =>
LP(l, Nil())
}
}
def mergeSort(l: List): List = {
l match {
case Nil() => l
case Cons(h, Nil()) => l
case xs =>
split(xs) match {
case LP(l1, l2) =>
merge(mergeSort(l1), mergeSort(l2))
}
}
}
def toString(l: List): String = { l match {
case Nil() => "List()"
case more => "List(" ++ toString1(more) ++ ")"
}}
def toString1(l : List): String = { l match {
case Cons(h, Nil()) => Std.intToString(h)
case Cons(h, t) => Std.intToString(h) ++ ", " ++ toString1(t)
}}
def take(l: List, n: Int(32)): List = {
if (n <= 0) { Nil() }
else {
l match {
case Nil() => Nil()
case Cons(h, t) =>
Cons(h, take(t, n-1))
}
}
}
end L
object O
abstract class Option
case class None() extends Option
case class Some(v: Int(32)) extends Option
def isdefined(o: Option): Boolean = {
o match {
case None() => false
case _ => true
}
}
def get(o: Option): Int(32) = {
o match {
case Some(i) => i
case None() => error("get(None)")
}
}
def getOrElse(o: Option, i: Int(32)): Int(32) = {
o match {
case None() => i
case Some(oo) => oo
}
}
def orElse(o1: Option, o2: Option): Option = {
o1 match {
case Some(_) => o1
case None() => o2
}
}
def toList(o: Option): L.List = {
o match {
case Some(i) => L.Cons(i, L.Nil())
case None() => L.Nil()
}
}
end O
/** This module contains basic functionality for Amy,
* including stub implementations for some built-in functions
* (implemented in WASM or JavaScript)
*/
object Std
def printInt(i: Int(32)): Unit = {
error("") // Stub implementation
}
def printString(s: String): Unit = {
error("") // Stub implementation
}
def printBoolean(b: Boolean): Unit = {
printString(booleanToString(b))
}
def readString(): String = {
error("") // Stub implementation
}
def readInt(): Int(32) = {
error("") // Stub implementation
}
def intToString(i: Int(32)): String = {
if (i < 0) {
"-" ++ intToString(-i)
} else {
val rem: Int(32) = i % 10;
val div: Int(32) = i / 10;
if (div == 0) { digitToString(rem) }
else { intToString(div) ++ digitToString(rem) }
}
}
def digitToString(i: Int(32)): String = {
error("") // Stub implementation
}
def booleanToString(b: Boolean): String = {
if (b) { "true" } else { "false" }
}
end Std
scalaVersion := "3.5.2"
version := "1.0.0"
organization := "ch.epfl.lara"
organizationName := "LARA"
name := "calculator"
libraryDependencies ++= Seq("org.scalatest" %% "scalatest" % "3.2.10" % "test")
sbt.version=1.10.7
/* Copyright 2020 EPFL, Lausanne
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package calculator
import scallion.*
import silex.*
sealed trait Token
case class NumberToken(value: Int) extends Token
case class OperatorToken(operator: Char) extends Token
case class ParenthesisToken(isOpen: Boolean) extends Token
case object SpaceToken extends Token
case class UnknownToken(content: String) extends Token
object CalcLexer extends Lexers with CharLexers {
type Position = Unit
type Token = calculator.Token
val lexer = Lexer(
// Operators
oneOf("-+/*!")
|> { cs => OperatorToken(cs.head) },
// Parentheses
elem('(') |> ParenthesisToken(true),
elem(')') |> ParenthesisToken(false),
// Spaces
many1(whiteSpace) |> SpaceToken,
// Numbers
{
elem('0') |
nonZero ~ many(digit)
}
|> { cs => NumberToken(cs.mkString.toInt) }
) onError {
(cs, _) => UnknownToken(cs.mkString)
}
def apply(it: String): Iterator[Token] = {
val source = Source.fromString(it, NoPositioner)
val tokens = lexer(source)
tokens.filter((token: Token) => token != SpaceToken)
}
}
sealed abstract class TokenKind(text: String) {
override def toString = text
}
case object NumberClass extends TokenKind("<number>")
case class OperatorClass(op: Char) extends TokenKind(op.toString)
case class ParenthesisClass(isOpen: Boolean) extends TokenKind(if (isOpen) "(" else ")")
case object OtherClass extends TokenKind("?")
sealed abstract class Expr
case class LitExpr(value: Int) extends Expr
case class BinaryExpr(op: Char, left: Expr, right: Expr) extends Expr
case class UnaryExpr(op: Char, inner: Expr) extends Expr
object CalcParser extends Parsers {
type Token = calculator.Token
type Kind = calculator.TokenKind
import Implicits._
override def getKind(token: Token): TokenKind = token match {
case NumberToken(_) => NumberClass
case OperatorToken(c) => OperatorClass(c)
case ParenthesisToken(o) => ParenthesisClass(o)
case _ => OtherClass
}
val number: Syntax[Expr] = accept(NumberClass) {
case NumberToken(n) => LitExpr(n)
}
def binOp(char: Char): Syntax[Char] = accept(OperatorClass(char)) {
case _ => char
}
val plus = binOp('+')
val minus = binOp('-')
val times = binOp('*')
val div = binOp('/')
val fac: Syntax[Char] = accept(OperatorClass('!')) {
case _ => '!'
}
def parens(isOpen: Boolean) = elem(ParenthesisClass(isOpen))
val open = parens(true)
val close = parens(false)
lazy val expr: Syntax[Expr] = recursive {
(term ~ moreTerms).map {
case first ~ opNexts => opNexts.foldLeft(first) {
case (acc, op ~ next) => BinaryExpr(op, acc, next)
}
}
}
lazy val term: Syntax[Expr] = (factor ~ moreFactors).map {
case first ~ opNexts => opNexts.foldLeft(first) {
case (acc, op ~ next) => BinaryExpr(op, acc, next)
}
}
lazy val moreTerms: Syntax[Seq[Char ~ Expr]] = recursive {
epsilon(Seq.empty[Char ~ Expr]) |
((plus | minus) ~ term ~ moreTerms).map {
case op ~ t ~ ots => (op ~ t) +: ots
}
}
lazy val factor: Syntax[Expr] = (basic ~ fac.opt).map {
case e ~ None => e
case e ~ Some(op) => UnaryExpr(op, e)
}
lazy val moreFactors: Syntax[Seq[Char ~ Expr]] = recursive {
epsilon(Seq.empty[Char ~ Expr]) |
((times | div) ~ factor ~ moreFactors).map {
case op ~ t ~ ots => (op ~ t) +: ots
}
}
lazy val basic: Syntax[Expr] = number | open.skip ~ expr ~ close.skip
// Or, using operators...
//
// lazy val expr: Syntax[Expr] = recursive {
// operators(factor)(
// (times | div).is(LeftAssociative),
// (plus | minus).is(LeftAssociative)
// ) {
// case (l, op, r) => BinaryExpr(op, l, r)
// }
// }
//
// Then, you can get rid of term, moreTerms, and moreFactors.
def apply(tokens: Iterator[Token]): Option[Expr] = Parser(expr)(tokens).getValue
}
object Main {
def main(args: Array[String]): Unit = {
if (!CalcParser.expr.isLL1) {
CalcParser.debug(CalcParser.expr, false)
return
}
println("Welcome to the awesome calculator expression parser.")
while (true) {
print("Enter an expression: ")
val line = scala.io.StdIn.readLine()
if (line.isEmpty) {
return
}
CalcParser(CalcLexer(line)) match {
case None => println("Could not parse your line...")
case Some(parsed) => println("Syntax tree: " + parsed)
}
}
}
}
/* Copyright 2019 EPFL, Lausanne
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package calculator
import org.scalatest._
import flatspec._
class Tests extends AnyFlatSpec with Inside {
"Parser" should "be LL(1)" in {
assert(CalcParser.expr.isLL1)
}
it should "be able to parse some strings" in {
val result = CalcParser(CalcLexer("1 + 3 * (5! / 7) + 42"))
assert(result.nonEmpty)
val parsed = result.get
inside(parsed) {
case BinaryExpr('+', BinaryExpr('+', one, mult), fortytwo) => {
assert(one == LitExpr(1))
assert(fortytwo == LitExpr(42))
inside(mult) {
case BinaryExpr('*', three, BinaryExpr('/', UnaryExpr('!', five), seven)) => {
assert(three == LitExpr(3))
assert(five == LitExpr(5))
assert(seven == LitExpr(7))
}
}
}
}
}
}
**For a brief overview of Scallion and its purpose, you can watch [this
video](https://mediaspace.epfl.ch/media/0_lypn7l0x).** What follows below is
a slightly more detailed description, and an example project you can use
to familiarize yourself with Scallion.
## Introduction to Parser Combinators
The next part of the compiler you will be working on is the parser. The
goal of the parser is to convert the sequence of tokens generated by the
lexer into an Amy *abstract syntax tree* (AST).
There are many approaches to writing parsers, such as:
- Writing the parser by hand directly in the compiler's language using
mutually recursive functions, or
- Writing the parser in a *domain specific language* (DSL) and using a
parser generator (such as Bison) to produce the parser.
Another approach, which we will be using, is *parser combinators*. The
idea behind the approach is very simple:
- Have a set of simple primitive parsers, and
- Have ways to combine them together into more and more complex
parsers. Hence the name *parser combinators*.
Usually, those primitive parsers and combinators are provided as a
library directly in the language used by the compiler. In our case, we
will be working with **Scallion**, a Scala parser combinators library
developed by *LARA*.
Parser combinators have many advantages -- the main one being that they
are easy to write, read and maintain.
## Scallion Parser Combinators
### Documentation
In this document, we will introduce parser combinators in Scallion and
showcase how to use them. This document is not intended to be a complete
reference to Scallion. Fortunately, the library comes with a
[comprehensive
API](https://epfl-lara.github.io/scallion) which
fulfills that role. Feel free to refer to it while working on your
project!
### Playground Project
We have set up [an example project](scallion-playground) that
implements a lexer and parser for a simple expression language using
Scallion. Feel free to experiment and play with it. The project
showcases the API of Scallion and some of the more advanced combinators.
### Setup
In Scallion, parsers are defined within a trait called `Syntaxes`. This
trait is parameterized by two types:
- The type of tokens,
- The type of *token kinds*. Token kinds represent groups of tokens.
  They abstract away all the details found in the actual tokens, such
  as positions or identifier names. Each token has a
  unique kind.
In our case, the tokens will be of type `Token` that we introduced and
used in the previous project. The token kinds will be `TokenKind`, which
we have already defined for you.

    object Parser extends Pipeline[Iterator[Token], Program]
                     with Parsers {

      type Token = myproject.Token
      type Kind = myproject.TokenKind

      // Indicates the kind of the various tokens.
      override def getKind(token: Token): TokenKind = TokenKind.of(token)

      // Your parser implementation goes here.
    }

The `Parsers` trait (mixed into the `Parser` object above) comes from
Scallion and provides all functions and types you will use to define
your grammar and AST translation.
### Writing Parsers
When writing a parser using parser combinators, one defines many smaller
parsers and combines them together into more and more complex parsers.
The top-level, most complex of those parsers then defines the entire
syntax for the language. In our case, that top-level parser will be
called `program`.
All those parsers are objects of the type `Syntax[A]`. The type
parameter `A` indicates the type of values produced by the parser. For
instance, a parser of type `Syntax[Int]` produces `Int`s and a parser of
type `Syntax[Expr]` produces `Expr`s. Our top-level parser has the
following signature:

    lazy val program: Syntax[Program] = ...

Contrary to the types of tokens and token kinds, which are fixed, the
type of values produced is a type parameter of the various `Syntax`s.
This allows your different parsers to produce different types of values.
The various parsers are stored as `val` members of the `Parser` object.
In the case of mutually dependent parsers, we use `lazy val` instead.

    lazy val definition: Syntax[ClassOrFunDef] =
      functionDefinition | abstractClassDefinition | caseClassDefinition

    lazy val functionDefinition: Syntax[ClassOrFunDef] = ...
    lazy val abstractClassDefinition: Syntax[ClassOrFunDef] = ...
    lazy val caseClassDefinition: Syntax[ClassOrFunDef] = ...

### Running Parsers
Parsers of type `Syntax[A]` can be converted to objects of type
`Parser[A]`, which have an `apply` method which takes as parameter an
iterator of tokens and returns a value of type `ParseResult[A]`, which
can be one of three things:
- A `Parsed(value, rest)`, which indicates that the parser was
successful and produced the value `value`. The entirety of the input
iterator was consumed by the parser.
- An `UnexpectedToken(token, rest)`, which indicates that the parser
encountered an unexpected token `token`. The input iterator was
consumed up to the erroneous token.
- An `UnexpectedEnd(rest)`, which indicates that the end of the
iterator was reached and the parser could not finish at this point.
The input iterator was completely consumed.
In each case, the additional value `rest` is itself some sort of a
`Parser[A]`. That parser represents the parser after the successful
parse or at the point of error. This parser could be used to provide
useful error messages or even to resume parsing.

    override def run(ctx: Context)(tokens: Iterator[Token]): Program = {
      import ctx.reporter._

      val parser = Parser(program)

      parser(tokens) match {
        case Parsed(result, rest) => result
        case UnexpectedEnd(rest) => fatal("Unexpected end of input.")
        case UnexpectedToken(token, rest) => fatal("Unexpected token: " + token)
      }
    }

### Parsers and Grammars
As you will see, parsers built using parser combinators will look a lot
like grammars. However, unlike grammars, parsers not only describe the
syntax of your language, but also directly specify how to turn this
syntax into a value. Also, as we will see, parser combinators have a
richer vocabulary than your usual *BNF* grammars.
Interestingly, a lot of concepts that you have seen on grammars, such as
`FIRST` sets and nullability can be straightforwardly transposed to
parsers.
#### FIRST set
In Scallion, parsers offer a `first` method which returns the set of
token kinds that are accepted as a first token.

    definition.first === Set(def, abstract, case)

#### Nullability
Parsers have a `nullable` method which checks for nullability of a
parser. The method returns `Some(value)` if the parser would produce
`value` given an empty input token sequence, and `None` if the parser
would not accept the empty sequence.
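For intuition, the textbook definitions of nullability and FIRST sets can be computed over a tiny, non-recursive syntax description as follows. This is a sketch in plain Scala, not Scallion's implementation: the `Syn` type and its cases are invented for the example, token kinds are plain strings, and `nullable` returns a `Boolean` here rather than the optional value described above.

```scala
// Hypothetical miniature syntax ADT over token kinds given as strings.
sealed trait Syn
case class Elem(kind: String) extends Syn    // exactly one token of that kind
case object Eps extends Syn                   // the empty sequence
case class Or(l: Syn, r: Syn) extends Syn     // disjunction
case class Then(l: Syn, r: Syn) extends Syn   // sequencing

// A syntax is nullable if it accepts the empty token sequence.
def nullable(s: Syn): Boolean = s match {
  case Elem(_)    => false
  case Eps        => true
  case Or(l, r)   => nullable(l) || nullable(r)
  case Then(l, r) => nullable(l) && nullable(r)
}

// Token kinds that can start a sequence accepted by the syntax.
def first(s: Syn): Set[String] = s match {
  case Elem(k)    => Set(k)
  case Eps        => Set.empty
  case Or(l, r)   => first(l) ++ first(r)
  case Then(l, r) => if (nullable(l)) first(l) ++ first(r) else first(l)
}

// Rough analogue of the `definition` syntax shown earlier.
val definition: Syn =
  Or(Elem("def"),
    Or(Then(Elem("abstract"), Elem("class")), Then(Elem("case"), Elem("class"))))
```

Note how the sequencing case only looks at the right-hand side when the left-hand side is nullable; the same observation is behind the restriction on sequencing discussed later.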
### Basic Parsers
We can now finally have a look at the toolbox we have at our disposal
to build parsers, starting from the basic parsers. Each parser that you
will write, however complex, is a combination of these basic parsers.
The basic parsers play the same role as terminal symbols do in grammars.
#### Elem
The first of the basic parsers is `elem(kind)`. The function `elem`
takes as argument the kind of tokens to be accepted by the parser. The
value produced by the parser is the token that was matched. For
instance, here is how to match against the *end-of-file* token.

    val eof: Syntax[Token] = elem(EOFKind)

#### Accept
The function `accept` is a variant of `elem` which directly applies a
transformation to the matched token when it is produced.

    val identifier: Syntax[String] = accept(IdentifierKind) {
      case IdentifierToken(name) => name
    }

#### Epsilon
The parser `epsilon(value)` is a parser that produces the `value`
without consuming any input. It corresponds to the *𝛆* found in
grammars.
### Parser Combinators
In this section, we will see how to combine parsers together to create
more complex parsers.
#### Disjunction
The first combinator we have is disjunction, that we write, for parsers
`p1` and `p2`, simply `p1 | p2`. When both `p1` and `p2` are of type
`Syntax[A]`, the disjunction `p1 | p2` is also of type `Syntax[A]`. The
disjunction operator is associative and commutative.
Disjunction works just as you think it does. If either of the parsers
`p1` or `p2` would accept the sequence of tokens, then the disjunction
also accepts the tokens. The value produced is the one produced by
either `p1` or `p2`.
Note that `p1` and `p2` must have disjoint `first` sets. This
restriction ensures that no ambiguities can arise and that parsing can
be done efficiently.[^1] We will see later how to automatically detect
when this is not the case and how to fix the issue.
#### Sequencing
The second combinator we have is sequencing. We write, for parsers `p1`
and `p2`, the sequence of `p1` and `p2` as `p1 ~ p2`. When `p1` is of
type `Syntax[A]` and `p2` of type `Syntax[B]`, their sequence is of type
`Syntax[A ~ B]`, where `A ~ B` is simply a pair of an `A` and a `B`.
If the parser `p1` accepts the prefix of a sequence of tokens and `p2`
accepts the postfix, the parser `p1 ~ p2` accepts the entire sequence
and produces the pair of values produced by `p1` and `p2`.
Note that the `first` set of `p2` should be disjoint from the `first`
set of all sub-parsers in `p1` that are *nullable* and in trailing
position (available via the `followLast` method). This restriction
ensures that the combinator does not introduce ambiguities.
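The pair type written `A ~ B` can be mimicked in plain Scala to see how values produced by longer sequences nest and how they are taken apart by pattern matching. The `~` case class below is a re-creation for illustration only; it is assumed, not guaranteed, to mirror the one that ships with Scallion.

```scala
// Hypothetical re-creation of a pair type that can be written infix.
case class ~[+A, +B](_1: A, _2: B)

// Infix types associate to the left: this is (String ~ String) ~ Int.
val parsed: String ~ String ~ Int = new ~(new ~("abstract", "class"), 3)

// Infix patterns nest the same way, so three components can be matched directly.
val described: String = parsed match {
  case kw ~ _ ~ n => s"$kw/$n"
}
```

This left-nested shape is why a pattern such as `case a ~ b ~ c` lines up with a syntax built as `p1 ~ p2 ~ p3`.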
#### Transforming Values
The method `map` makes it possible to apply a transformation to the
values produced by a parser. Using `map` does not influence the sequence
of tokens accepted or rejected by the parser, it merely modifies the
value produced. Generally, you will use `map` on a sequence of parsers,
as in:

    lazy val abstractClassDefinition: Syntax[ClassOrFunDef] =
      (kw("abstract") ~ kw("class") ~ identifier).map {
        case kw ~ _ ~ id => AbstractClassDef(id).setPos(kw)
      }

The above parser accepts abstract class definitions in Amy syntax. It
does so by accepting the sequence of keywords `abstract` and `class`,
followed by any identifier. The method `map` is used to convert the
produced values into an `AbstractClassDef`. The position of the keyword
`abstract` is used as the position of the definition.
#### Recursive Parsers
It is highly likely that some of your parsers will need to invoke
themselves recursively. In this case, you should indicate that
the parser is recursive using the `recursive` combinator:

    lazy val expr: Syntax[Expr] = recursive {
      ...
    }

If you were to omit it, a `StackOverflowError` would be thrown
during the initialisation of your `Parser` object.
The `recursive` combinator in itself does not change the behaviour of
the underlying parser. It is there to *tie the knot*[^2].
In practice, it is only required in very few places. In order to avoid
stack overflows during initialisation, you should make sure
that no recursive parser (stored in a `lazy val`) is able to
reenter itself without going through a `recursive` combinator
somewhere along the way.
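The effect of delaying a definition can be illustrated without Scallion. In the sketch below, a hypothetical `Recursive` node takes its body by name and only evaluates it on first use, so the `lazy val` holding the grammar finishes initialising before its own name is looked up again. All types here are invented for the example and say nothing about how Scallion implements `recursive` internally.

```scala
// Hypothetical miniature syntax ADT.
sealed trait Syn
case class Elem(kind: String) extends Syn
case class Or(l: Syn, r: Syn) extends Syn
case class Then(l: Syn, r: Syn) extends Syn

// The body is passed by name and evaluated lazily, on first access to `inner`.
final class Recursive(body: => Syn) extends Syn {
  lazy val inner: Syn = body
}

object Grammar {
  // expr ::= num | "(" expr ")"
  lazy val expr: Syn = new Recursive(
    Or(Elem("num"), Then(Elem("("), Then(expr, Elem(")"))))
  )
}

// Forcing the body is safe: by now `Grammar.expr` is fully initialised.
val unfolded: Syn = Grammar.expr.asInstanceOf[Recursive].inner
```

After unfolding, the inner occurrence of `expr` is the very same object as the outer one, which is the cyclic structure the combinator is there to set up.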
#### Other Combinators
So far, many of the combinators that we have seen, such as disjunction
and sequencing, directly correspond to constructs found in `BNF`
grammars. Some of the combinators that we will see now are more
expressive and implement useful patterns.
##### Optional parsers using opt
The combinator `opt` makes a parser optional. The value produced by the
parser is wrapped in `Some` if the parser accepts the input sequence and
in `None` otherwise.

    opt(p) === p.map(Some(_)) | epsilon(None)

##### Repetitions using many and many1
The combinator `many` returns a parser that accepts any number of
repetitions of its argument parser, including 0. The variant `many1`
forces the parser to match at least once.
##### Repetitions with separators repsep and rep1sep
The combinator `repsep` returns a parser that accepts any number of
repetitions of its argument parser, separated by another parser,
including 0. The variant `rep1sep` forces the parser to match at least
once.
The separator parser is restricted to the type `Syntax[Unit]` to ensure
that important values do not get ignored. You may use `unit()` on a
parser to turn its value to `Unit` if you explicitly want to ignore the
values a parser produces.
##### Binary operators with operators
Scallion also contains combinators to easily build parsers for infix
binary operators, with different associativities and priority levels.
This combinator is defined in an additional trait called `Operators`,
which you should mix into `Parsers` if you want to use the combinator.
By default, it should already be mixed-in.
val times: Syntax[String] =
  accept(OperatorKind("*")) {
    case _ => "*"
  }
...
lazy val operation: Syntax[Expr] =
  operators(number)(
    // Defines the different operators, by decreasing priority.
    times | div is LeftAssociative,
    plus | minus is LeftAssociative,
    ...
  ) {
    // Defines how to apply the various operators.
    case (lhs, "*", rhs) => Times(lhs, rhs).setPos(lhs)
    ...
  }
Documentation for `operators` is [available on this
page](https://epfl-lara.github.io/scallion/scallion/Operators.html).
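To see what priority levels and left associativity mean for the resulting tree, here is a small sketch in plain Scala. It is not how Scallion implements `operators`: the names `Num`, `Bin`, `levels`, `reduceLevel` and `build` are invented for the illustration. The input is an operand followed by a list of (operator, operand) pairs, and each level is folded from left to right, starting with the highest priority.

```scala
object PrecedenceDemo {
  sealed trait Expr
  case class Num(n: Int) extends Expr
  case class Bin(lhs: Expr, op: String, rhs: Expr) extends Expr

  // Operator levels by decreasing priority, all left-associative here.
  val levels: List[Set[String]] = List(Set("*", "/"), Set("+", "-"))

  // Merges every operator of one level, left to right; other operators
  // are kept (in order) for the lower-priority levels.
  def reduceLevel(ops: Set[String])(
      first: Expr, rest: List[(String, Expr)]): (Expr, List[(String, Expr)]) = {
    val (head, revPending) =
      rest.foldLeft((first, List.empty[(String, Expr)])) {
        case ((h, Nil), (op, e)) if ops(op) =>
          (Bin(h, op, e), Nil)
        case ((h, (o, cur) :: more), (op, e)) if ops(op) =>
          (h, (o, Bin(cur, op, e)) :: more)
        case ((h, pending), (op, e)) =>
          (h, (op, e) :: pending)
      }
    (head, revPending.reverse)
  }

  // Assumes every operator in `rest` belongs to one of the levels.
  def build(first: Expr, rest: List[(String, Expr)]): Expr = {
    val result = levels.foldLeft((first, rest)) {
      case ((f, r), ops) => reduceLevel(ops)(f, r)
    }
    result._1
  }
}
```

For the sequence `1 + 2 * 3 - 4`, `build` groups `2 * 3` first and then associates `+` and `-` to the left, giving the tree for `(1 + (2 * 3)) - 4`.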
##### Upcasting
In Scallion, the type `Syntax[A]` is invariant in `A`, meaning that,
even when `A` is a (strict) subtype of some type `B`, `Syntax[A]` is
*not* a subtype of `Syntax[B]`. To upcast a `Syntax[A]` to a
`Syntax[B]` (when `A` is a subtype of `B`), you should use the
`.up[B]` method.
For instance, you may need to upcast a syntax of type
`Syntax[Literal[_]]` to a `Syntax[Expr]` in your assignment. To do so,
simply use `.up[Expr]`.
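The effect of invariance can be reproduced with a few lines of plain Scala. In the sketch below, `Box` is a hypothetical stand-in for `Syntax`, and its `up` method only mimics the role of `.up[B]`; the actual signature in Scallion may differ.

```scala
object VarianceDemo {
  sealed trait Expr
  case class IntLit(value: Int) extends Expr

  // Invariant in A: there is no + or - annotation on the type parameter.
  final class Box[A](val run: String => Option[A]) {
    // Explicit upcast to any supertype B of A.
    def up[B >: A]: Box[B] = new Box[B](run)
  }

  val lit: Box[IntLit] = new Box[IntLit](s => s.toIntOption.map(IntLit(_)))

  // val bad: Box[Expr] = lit   // rejected by the compiler: Box is invariant
  val expr: Box[Expr] = lit.up[Expr]
}
```

The upcast changes only the static type. The underlying function is reused as is, so `expr.run("42")` still returns `Some(IntLit(42))`.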
### LL(1) Checking
In Scallion, non-LL(1) parsers can be written, but the result of
applying such a parser is not specified. In practice, we therefore
restrict ourselves only to LL(1) parsers. The reason behind this is that
LL(1) parsers are unambiguous and can be run in time linear in the input
size.
Writing LL(1) parsers is non-trivial. However, some of the higher-level
combinators of Scallion already alleviate part of this pain. In
addition, LL(1) violations can be detected before the parser is run.
Syntaxes have an `isLL1` method which returns `true` if the parser is
LL(1) and `false` otherwise, without needing to see any tokens of
input.
#### Conflict Witnesses
In case your parser is not LL(1), the method `conflicts` of the parser
will return the set of all `LL1Conflict`s. The various conflicts are:
- `NullableConflict`, which indicates that two branches of a
disjunction are nullable.
- `FirstConflict`, which indicates that the `first` sets of two
branches of a disjunction are not disjoint.
- `FollowConflict`, which indicates that the `first` set of a nullable
parser is not disjoint from the `first` set of a parser that
directly follows it.
The `LL1Conflict` objects contain fields which can help you pinpoint
the exact location of conflicts in your parser and, hopefully, fix
them.
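The first two kinds of conflict can be illustrated on a toy grammar representation in plain Scala. This is only a sketch under simplifying assumptions: it handles non-recursive syntaxes, the names `Syn`, `nullable`, `first`, `firstConflict` and `nullableConflict` are invented here, and Scallion computes these properties differently.

```scala
object FirstSetDemo {
  // Toy, non-recursive syntax descriptions (not Scallion's Syntax type).
  sealed trait Syn
  case object Eps extends Syn
  case class Elem(kind: String) extends Syn
  case class Seq2(l: Syn, r: Syn) extends Syn
  case class Or(l: Syn, r: Syn) extends Syn

  // A syntax is nullable if it accepts the empty sequence of tokens.
  def nullable(s: Syn): Boolean = s match {
    case Eps => true
    case Elem(_) => false
    case Seq2(l, r) => nullable(l) && nullable(r)
    case Or(l, r) => nullable(l) || nullable(r)
  }

  // The token kinds that can start a sequence accepted by the syntax.
  def first(s: Syn): Set[String] = s match {
    case Eps => Set.empty
    case Elem(k) => Set(k)
    case Seq2(l, r) => if (nullable(l)) first(l) ++ first(r) else first(l)
    case Or(l, r) => first(l) ++ first(r)
  }

  // Token kinds on which one token of lookahead cannot choose a branch.
  def firstConflict(l: Syn, r: Syn): Set[String] = first(l) & first(r)

  // Both branches accept the empty sequence.
  def nullableConflict(l: Syn, r: Syn): Boolean = nullable(l) && nullable(r)
}
```

For example, two branches that both start with an `if` token, such as `Seq2(Elem("if"), Elem("id"))` and `Seq2(Elem("if"), Seq2(Elem("id"), Elem("else")))`, have the conflicting set `Set("if")`. The usual fix is to factor the common prefix out of the disjunction.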
The helper method `debug` prints a summary of the LL(1) conflicts of a
parser. We added code to the handout skeleton so that, by default, a
report is printed in case of conflicts when you initialise your
parser.
[^1]: Scallion is not the only parser combinator library to exist, far
from it! Many of those libraries do not have this restriction. Those
libraries generally need to backtrack to try the different
alternatives when a branch fails.
[^2]: See [a good explanation of what tying the knot means in the
context of lazy
languages.](https://stackoverflow.com/questions/357956/explanation-of-tying-the-knot)
sbt.version=1.10.7
addSbtPlugin("com.lightbend.sbt" % "sbt-proguard" % "0.3.0")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")
package amyc
import ast._
import utils._
import parsing._
import java.io.File
object Main extends MainHelpers {
private def parseArgs(args: Array[String]): Context = {
var ctx = Context(new Reporter, Nil)
args foreach {
case "--printTokens" => ctx = ctx.copy(printTokens = true)
case "--printTrees" => ctx = ctx.copy(printTrees = true)
case "--interpret" => ctx = ctx.copy(interpret = true)
case "--help" => ctx = ctx.copy(help = true)
case file => ctx = ctx.copy(files = ctx.files :+ file)
}
ctx
}
def main(args: Array[String]): Unit = {
val ctx = parseArgs(args)
if (ctx.help) {
val helpMsg = {
"""Welcome to the Amy reference compiler, v.1.5
|
|Options:
| --printTokens Print lexer tokens (with positions) after lexing and exit
| --printTrees Print trees after parsing and exit
| --interpret Interpret the program instead of compiling
| --help Print this message
""".stripMargin
}
println(helpMsg)
sys.exit(0)
}
val pipeline =
AmyLexer.andThen(
if (ctx.printTokens) DisplayTokens
else Parser.andThen(
treePrinterN("Trees after parsing")))
val files = ctx.files.map(new File(_))
try {
if (files.isEmpty) {
ctx.reporter.fatal("No input files")
}
if (ctx.interpret) {
ctx.reporter.fatal("Unsupported actions for now")
}
files.find(!_.exists()).foreach { f =>
ctx.reporter.fatal(s"File not found: ${f.getName}")
}
pipeline.run(ctx)(files)
ctx.reporter.terminateIfErrors()
} catch {
case AmycFatalError(_) =>
sys.exit(1)
}
}
}
trait MainHelpers {
import SymbolicTreeModule.{Program => SP}
import NominalTreeModule.{Program => NP}
def treePrinterN(title: String): Pipeline[NP, Unit] = {
new Pipeline[NP, Unit] {
def run(ctx: Context)(v: NP) = {
println(title)
println(NominalPrinter(v))
}
}
}
}
package amyc.ast
object Identifier {
private val counter = new amyc.utils.UniqueCounter[String]
def fresh(name: String): Identifier = new Identifier(name)
}
// Denotes a unique identifier in an Amy program
// Notice that we rely on reference equality to compare Identifiers.
// The numeric id will be generated lazily,
// so the Identifiers are numbered in order when we print the program.
final class Identifier private(val name: String) {
private lazy val id = Identifier.counter.next(name)
def fullName = s"${name}_$id"
override def toString: String = name
}
package amyc.ast
import scala.language.implicitConversions
import amyc.utils._
// A printer for Amy trees
trait Printer {
val treeModule: TreeModule
import treeModule._
implicit def printName(name: Name)(implicit printUniqueIds: Boolean): Document
implicit def printQName(name: QualifiedName)(implicit printUniqueIds: Boolean): Document
protected implicit def stringToDoc(s: String): Raw = Raw(s)
def apply(t: Tree)(implicit printUniqueIDs: Boolean = false): String = {
def binOp(e1: Expr, op: String, e2: Expr) = "(" <:> rec(e1) <:> " " + op + " " <:> rec(e2) <:> ")"
def rec(t: Tree, parens: Boolean = true): Document = t match {
/* Definitions */
case Program(modules) =>
Stacked(modules map (rec(_)), emptyLines = true)
case ModuleDef(name, defs, optExpr) =>
Stacked(
"object " <:> name,
"",
Indented(Stacked(defs ++ optExpr.toList map (rec(_, false)), emptyLines = true)),
"end " <:> name,
""
)
case AbstractClassDef(name) =>
"abstract class " <:> printName(name)
case CaseClassDef(name, fields, parent) =>
def printField(f: TypeTree) = "v: " <:> rec(f)
"case class " <:> name <:> "(" <:> Lined(fields map printField, ", ") <:> ") extends " <:> parent
case FunDef(name, params, retType, body) =>
Stacked(
"def " <:> name <:> "(" <:> Lined(params map (rec(_)), ", ") <:> "): " <:> rec(retType) <:> " = {",
Indented(rec(body, false)),
"}"
)
case ParamDef(name, tpe) =>
name <:> ": " <:> rec(tpe)
/* Expressions */
case Variable(name) =>
name
case IntLiteral(value) =>
value.toString
case BooleanLiteral(value) =>
value.toString
case StringLiteral(value) =>
"\"" + value + '"'
case UnitLiteral() =>
"()"
case Plus(lhs, rhs) =>
binOp(lhs, "+", rhs)
case Minus(lhs, rhs) =>
binOp(lhs, "-", rhs)
case Times(lhs, rhs) =>
binOp(lhs, "*", rhs)
case Div(lhs, rhs) =>
binOp(lhs, "/", rhs)
case Mod(lhs, rhs) =>
binOp(lhs, "%", rhs)
case LessThan(lhs, rhs) =>
binOp(lhs, "<", rhs)
case LessEquals(lhs, rhs) =>
binOp(lhs, "<=", rhs)
case And(lhs, rhs) =>
binOp(lhs, "&&", rhs)
case Or(lhs, rhs) =>
binOp(lhs, "||", rhs)
case Equals(lhs, rhs) =>
binOp(lhs, "==", rhs)
case Concat(lhs, rhs) =>
binOp(lhs, "++", rhs)
case Not(e) =>
"!(" <:> rec(e) <:> ")"
case Neg(e) =>
"-(" <:> rec(e) <:> ")"
case Call(name, args) =>
name <:> "(" <:> Lined(args map (rec(_)), ", ") <:> ")"
case Sequence(lhs, rhs) =>
val main = Stacked(
rec(lhs, false) <:> ";",
rec(rhs, false),
)
if (parens) {
Stacked(
"(",
Indented(main),
")"
)
} else {
main
}
case Let(df, value, body) =>
val main = Stacked(
"val " <:> rec(df) <:> " =",
Indented(rec(value)) <:> ";",
rec(body, false) // For demonstration purposes, the scope of df is indented
)
if (parens) {
Stacked(
"(",
Indented(main),
")"
)
} else {
main
}
case Ite(cond, thenn, elze) =>
Stacked(
"(if(" <:> rec(cond) <:> ") {",
Indented(rec(thenn)),
"} else {",
Indented(rec(elze)),
"})"
)
case Match(scrut, cases) =>
Stacked(
rec(scrut) <:> " match {",
Indented(Stacked(cases map (rec(_)))),
"}"
)
case Error(msg) =>
"error(" <:> rec(msg) <:> ")"
/* cases and patterns */
case MatchCase(pat, expr) =>
Stacked(
"case " <:> rec(pat) <:> " =>",
Indented(rec(expr))
)
case WildcardPattern() =>
"_"
case IdPattern(name) =>
name
case LiteralPattern(lit) =>
rec(lit)
case CaseClassPattern(name, args) =>
name <:> "(" <:> Lined(args map (rec(_)), ", ") <:> ")"
/* Types */
case TypeTree(tp) =>
tp match {
case IntType => "Int(32)"
case BooleanType => "Boolean"
case StringType => "String"
case UnitType => "Unit"
case ClassType(name) => name
}
}
rec(t).print
}
}
object NominalPrinter extends Printer {
val treeModule: NominalTreeModule.type = NominalTreeModule
import NominalTreeModule._
implicit def printName(name: Name)(implicit printUniqueIds: Boolean): Document = Raw(name)
implicit def printQName(name: QualifiedName)(implicit printUniqueIds: Boolean): Document = {
Raw(name match {
case QualifiedName(Some(module), name) =>
s"$module.$name"
case QualifiedName(None, name) =>
name
})
}
}
object SymbolicPrinter extends SymbolicPrinter
trait SymbolicPrinter extends Printer {
val treeModule: SymbolicTreeModule.type = SymbolicTreeModule
import SymbolicTreeModule._
implicit def printName(name: Name)(implicit printUniqueIds: Boolean): Document = {
if (printUniqueIds) {
name.fullName
} else {
name.name
}
}
@inline implicit def printQName(name: QualifiedName)(implicit printUniqueIds: Boolean): Document = {
printName(name)
}
}
package amyc.ast
import amyc.utils.Positioned
/* A polymorphic module containing definitions of Amy trees.
*
* This trait represents either nominal trees (where names have not been resolved)
* or symbolic trees (where names/qualified names have been resolved to unique identifiers).
* This is done by having two type fields within the module,
* which will be instantiated differently by the two different modules.
*
*/
trait TreeModule { self =>
/* Represents the type for the name for this tree module.
* (It will be either a plain string, or a unique symbol)
*/
type Name
// Represents a name within a module
type QualifiedName
// A printer that knows how to print trees in this module.
// The modules will instantiate it as appropriate
val printer: Printer { val treeModule: self.type }
// Common ancestor for all trees
trait Tree extends Positioned {
override def toString: String = printer(this)
}
// Expressions
trait Expr extends Tree
// Variables
case class Variable(name: Name) extends Expr
// Literals
trait Literal[+T] extends Expr { val value: T }
case class IntLiteral(value: Int) extends Literal[Int]
case class BooleanLiteral(value: Boolean) extends Literal[Boolean]
case class StringLiteral(value: String) extends Literal[String]
case class UnitLiteral() extends Literal[Unit] { val value: Unit = () }
// Binary operators
case class Plus(lhs: Expr, rhs: Expr) extends Expr
case class Minus(lhs: Expr, rhs: Expr) extends Expr
case class Times(lhs: Expr, rhs: Expr) extends Expr
case class Div(lhs: Expr, rhs: Expr) extends Expr
case class Mod(lhs: Expr, rhs: Expr) extends Expr
case class LessThan(lhs: Expr, rhs: Expr) extends Expr
case class LessEquals(lhs: Expr, rhs: Expr) extends Expr
case class And(lhs: Expr, rhs: Expr) extends Expr
case class Or(lhs: Expr, rhs: Expr) extends Expr
case class Equals(lhs: Expr, rhs: Expr) extends Expr
case class Concat(lhs: Expr, rhs: Expr) extends Expr
// Unary operators
case class Not(e: Expr) extends Expr
case class Neg(e: Expr) extends Expr
// Function/constructor call
case class Call(qname: QualifiedName, args: List[Expr]) extends Expr
// The ; operator
case class Sequence(e1: Expr, e2: Expr) extends Expr
// Local variable definition
case class Let(df: ParamDef, value: Expr, body: Expr) extends Expr
// If-then-else
case class Ite(cond: Expr, thenn: Expr, elze: Expr) extends Expr
// Pattern matching
case class Match(scrut: Expr, cases: List[MatchCase]) extends Expr {
require(cases.nonEmpty)
}
// Represents a computational error; prints its message, then exits
case class Error(msg: Expr) extends Expr
// Cases and patterns for Match expressions
case class MatchCase(pat: Pattern, expr: Expr) extends Tree
abstract class Pattern extends Tree
case class WildcardPattern() extends Pattern // _
case class IdPattern(name: Name) extends Pattern // x
case class LiteralPattern[+T](lit: Literal[T]) extends Pattern // 42, true
case class CaseClassPattern(constr: QualifiedName, args: List[Pattern]) extends Pattern // C(arg1, arg2)
// Definitions
trait Definition extends Tree { val name: Name }
case class ModuleDef(name: Name, defs: List[ClassOrFunDef], optExpr: Option[Expr]) extends Definition
trait ClassOrFunDef extends Definition
case class FunDef(name: Name, params: List[ParamDef], retType: TypeTree, body: Expr) extends ClassOrFunDef {
def paramNames = params.map(_.name)
}
case class AbstractClassDef(name: Name) extends ClassOrFunDef
case class CaseClassDef(name: Name, fields: List[TypeTree], parent: Name) extends ClassOrFunDef
case class ParamDef(name: Name, tt: TypeTree) extends Definition
// Types
trait Type
case object IntType extends Type {
override def toString: String = "Int"
}
case object BooleanType extends Type {
override def toString: String = "Boolean"
}
case object StringType extends Type {
override def toString: String = "String"
}
case object UnitType extends Type {
override def toString: String = "Unit"
}
case class ClassType(qname: QualifiedName) extends Type {
override def toString: String = printer.printQName(qname)(false).print
}
// A wrapper for types that is also a Tree (i.e. has a position)
case class TypeTree(tpe: Type) extends Tree
// All is wrapped in a program
case class Program(modules: List[ModuleDef]) extends Tree
}
/* A module containing trees where the names have not been resolved.
* Instantiates Name to String and QualifiedName to a pair of Strings
* representing (module, name) (where module is optional)
*/
object NominalTreeModule extends TreeModule {
type Name = String
case class QualifiedName(module: Option[String], name: String) {
override def toString: String = printer.printQName(this)(false).print
}
val printer = NominalPrinter
}
/* A module containing trees where the names have been resolved to unique identifiers.
* Both Name and QualifiedName are instantiated to Identifier.
*/
object SymbolicTreeModule extends TreeModule {
type Name = Identifier
type QualifiedName = Identifier
val printer = SymbolicPrinter
}