Commit 25ef763f authored by Sankalp Gambhir

Merge branch 'ex-03' into 'main'

Add exercise set 3

See merge request lara/cs320!9
parents 1f6d21e3 3d3fa1b2
@@ -33,7 +33,7 @@ The grade is based on a midterm (30%) as well as team project work (70%). Please
 | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
 | | .23... | Fri | 14.03.2025 | 13:15 | ELA 2 | Lecture 6 | [Name Analysis](https://mediaspace.epfl.ch/media/06-01%2C+Name+Analysis/0_1b9t1hz8) [(PDF)](info/lectures/lec06-name-analysis.pdf), [Type Systems as Inductive Relations](https://mediaspace.epfl.ch/media/07-01%2C+Introduction+to+Types+and+Inductive+Relations/0_3hxblocu) [(PDF)](info/lectures/lec06-inductive.pdf) . [Operational Semantics](https://mediaspace.epfl.ch/media/07-02%2C+Operational+Semantics/0_3ru05nbo) [(PDF)](info/lectures/lec06-operational.pdf) |
 | | .23... | Fri | 14.03.2025 | 15:15 | ELA 2 | Lab 3 | Parser lab |
-| 5 | ..3... | Wed | 19.03.2025 | 13:15 | BC 01 | Exercises 3 | LL(1) Grammars |
+| 5 | ..3... | Wed | 19.03.2025 | 13:15 | BC 01 | Exercises 3 | [LL(1) Grammars](info/exercises/ex-03.pdf) [(solutions)](info/exercises/ex-03-sol.pdf) |
 | | ..3... | Fri | 21.03.2025 | 13:15 | ELA 2 | Lecture 7 | Type Checking |
 | | ..34.. | Fri | 21.03.2025 | 15:15 | ELA 2 | Lab 4 | Typer lab release |
 | 6 | ..34.. | Wed | 26.03.2025 | 13:15 | BC 01 | Exercises 4 | Parsing. Type checking |
File added
File added
% Compiler Design 3.9
\begin{exercise}{}
Compute \(\nullable\), \(\first\), and \(\follow\) for the non-terminals \(A\)
and \(B\) in the following grammar:
%
\begin{align*}
A &::= BAa \\
A &::= \\
B &::= bBc \\
B &::= AA
\end{align*}
Remember to extend the grammar with an extra start production for the
computation of \(\follow\).
\begin{solution}
\begin{enumerate}
\item \(\nullable\): we get the constraints
\begin{gather*}
\nullable(A) = \nullable(BAa) \lor \nullable(\epsilon) \\
\nullable(B) = \nullable(bBc) \lor \nullable(AA)
\end{gather*}
We can solve these to get \(\nullable(A) = \nullable(B) = true\).
\item \(\first\): we get the constraints (given that both \(A\) and \(B\)
are nullable):
\begin{align*}
\first(A) &= \first(BAa) \cup \first(\epsilon) \\
&= \first(B) \cup \first(A) \cup \{a\} \cup \emptyset \\
&= \first(B) \cup \first(A) \cup \{a\} \\
\first(B) &= \first(bBc) \cup \first(AA) \\
&= \{b\} \cup \first(A) \cup \first(A) \\
&= \{b\} \cup \first(A)
\end{align*}
Starting from \(\first(A) = \first(B) = \emptyset\), we iteratively
compute the fixpoint to get \(\first(A) = \first(B) = \{a, b\}\).
\item \(\follow\): we add a production \(A' ::= A~\mathbf{EOF}\), and get
the constraints (in order of productions):
\begin{gather*}
\{\mathbf{EOF}\} \subseteq \follow(A) \\
\\
\first(A) \subseteq \follow(B) \\
\{a\} \subseteq \follow(B) \\
\{a\} \subseteq \follow(A) \\
\\
\{c\} \subseteq \follow(B) \\
\\
\first(A) \subseteq \follow(A) \\
\follow(B) \subseteq \follow(A)
\end{gather*}
Substituting the computed \(\first\) sets, and computing a fixpoint, we
get \(\follow(A) = \{a, b, c,\mathbf{EOF}\}\) and \(\follow(B) = \{a, b,
c\}\).
\end{enumerate}
\end{solution}
\end{exercise}
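The fixpoint computations above can be mechanised. The following is a minimal Python sketch, not part of the course material: the encoding of a grammar as a dictionary from non-terminals to lists of right-hand sides, and the names \texttt{analyze} and \texttt{EOF}, are assumptions made for illustration. It iterates the \(\nullable\), \(\first\), and \(\follow\) constraints until no set changes; as in the solution, the extra start production \(A' ::= A~\mathbf{EOF}\) is added explicitly.

```python
EOF = "EOF"


def analyze(grammar):
    """Compute nullable, first and follow by naive fixpoint iteration.

    `grammar` maps each non-terminal to a list of right-hand sides, each a
    list of symbols; every symbol that is not a key is a terminal. The start
    production (e.g. A' ::= A EOF) must be included explicitly.
    """
    nullable = {nt: False for nt in grammar}
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}

    def seq_nullable(seq):
        # nullable iff every symbol is a nullable non-terminal
        return all(s in grammar and nullable[s] for s in seq)

    def seq_first(seq):
        # union of first sets up to and including the first non-nullable symbol
        out = set()
        for s in seq:
            if s not in grammar:  # terminal: it is the first symbol, stop here
                out.add(s)
                return out
            out |= first[s]
            if not nullable[s]:
                return out
        return out

    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                if not nullable[lhs] and seq_nullable(rhs):
                    nullable[lhs] = True
                    changed = True
                new_first = seq_first(rhs)
                if not new_first <= first[lhs]:
                    first[lhs] |= new_first
                    changed = True
                # each non-terminal occurrence gets first(rest of rhs),
                # plus follow(lhs) when the rest is nullable
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    rest = rhs[i + 1:]
                    new_follow = seq_first(rest)
                    if seq_nullable(rest):
                        new_follow |= follow[lhs]
                    if not new_follow <= follow[sym]:
                        follow[sym] |= new_follow
                        changed = True
    return nullable, first, follow


# The grammar of this exercise, extended with A' ::= A EOF.
grammar = {
    "A'": [["A", EOF]],
    "A": [["B", "A", "a"], []],
    "B": [["b", "B", "c"], ["A", "A"]],
}
nullable, first, follow = analyze(grammar)
print(sorted(first["A"]), sorted(follow["A"]), sorted(follow["B"]))
# prints ['a', 'b'] ['EOF', 'a', 'b', 'c'] ['a', 'b', 'c']
```

The same function can be applied to the arithmetic grammar of the next exercise by writing its rules in the same dictionary form.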
% Compiler design 3.11
\begin{exercise}{}
Given the following grammar for arithmetic expressions:
\begin{align*}
S &::= Exp~\mathbf{EOF} \\
Exp &::= Exp_2~ Exp_* \\
Exp_* &::= +~ Exp_2~ Exp_* \\
Exp_* &::= -~ Exp_2~ Exp_* \\
Exp_* &::= \\
Exp_2 &::= Exp_3~ Exp_{2*} \\
Exp_{2*} &::= *~ Exp_3~ Exp_{2*} \\
Exp_{2*} &::= /~ Exp_3~ Exp_{2*} \\
Exp_{2*} &::= \\
Exp_3 &::= \mathbf{num} \\
Exp_3 &::= (Exp)
\end{align*}
\begin{enumerate}
\item Compute \(\nullable\), \(\first\), \(\follow\) for each of the
non-terminals in the grammar.
\item Check if the grammar is LL(1). If not, modify the grammar to make it
so.
\item Build the LL(1) parsing table for the grammar.
\item Using your parsing table, parse (or attempt to parse, up to the first
error) the following strings, assuming that \(\mathbf{num}\) matches any
natural number:
\begin{enumerate}
\item \((3 + 4) * 5 ~\mathbf{EOF}\)
\item \(2 + + ~\mathbf{EOF}\)
\item \(2 ~\mathbf{EOF}\)
\item \(2 * 3 + 4 ~\mathbf{EOF}\)
\item \(2 + 3 * 4 ~\mathbf{EOF}\)
\end{enumerate}
\end{enumerate}
\begin{solution}
\begin{enumerate}
\item We can compute the \(\nullable\), \(\first\), and \(\follow\) sets as:
\begin{enumerate}
\item \(\nullable\):
%
\begin{align*}
\nullable(Exp) &= false \\
\nullable(Exp_*) &= true \\
\nullable(Exp_2) &= false \\
\nullable(Exp_{2*}) &= true \\
\nullable(Exp_3) &= false
\end{align*}
\item \(\first\): we have constraints:
%
\begin{align*}
\first(Exp) &= \first(Exp_2) \\
\first(Exp_*) &= \{+\} \cup \{-\} \cup \emptyset \\
\first(Exp_2) &= \first(Exp_3) \\
\first(Exp_{2*}) &= \{*\} \cup \{/\} \cup \emptyset \\
\first(Exp_3) &= \{\mathbf{num}\} \cup \{(\}
\end{align*}
%
which can be solved to get:
%
\begin{align*}
\first(Exp) &= \{\mathbf{num}, (\} \\
\first(Exp_*) &= \{+, -\} \\
\first(Exp_2) &= \{\mathbf{num}, (\} \\
\first(Exp_{2*}) &= \{*, /\} \\
\first(Exp_3) &= \{\mathbf{num}, (\}
\end{align*}
\item \(\follow\): we have the following constraints, grouped by rule in
the order of the grammar (\(\epsilon\)-rules and \(Exp_3 ::= \mathbf{num}\)
contribute none):
\begin{multicols}{2}
\allowdisplaybreaks
\begin{align*}
\{\mathbf{EOF}\} &\subseteq \follow(Exp) \\
&\\
\first(Exp_*) &\subseteq \follow(Exp_2) \\
\follow(Exp) &\subseteq \follow(Exp_2) \\
\follow(Exp) &\subseteq \follow(Exp_*) \\
&\\
\first(Exp_*) &\subseteq \follow(Exp_2) \\
\follow(Exp_*) &\subseteq \follow(Exp_2) \\
&\\
\first(Exp_*) &\subseteq \follow(Exp_2) \\
\follow(Exp_*) &\subseteq \follow(Exp_2) \\
&\\
\first(Exp_{2*}) &\subseteq \follow(Exp_3) \\
\follow(Exp_2) &\subseteq \follow(Exp_3) \\
\follow(Exp_2) &\subseteq \follow(Exp_{2*}) \\
&\\
\first(Exp_{2*}) &\subseteq \follow(Exp_3) \\
\follow(Exp_{2*}) &\subseteq \follow(Exp_3) \\
&\\
\first(Exp_{2*}) &\subseteq \follow(Exp_3) \\
\follow(Exp_{2*}) &\subseteq \follow(Exp_3) \\
&\\
\{)\} &\subseteq \follow(Exp)
\end{align*}
\end{multicols}
The fixpoint can again be computed to get:
\begin{align*}
\follow(S) &= \{\} \\
\follow(Exp) &= \{), \mathbf{EOF}\} \\
\follow(Exp_*) &= \{), \mathbf{EOF}\} \\
\follow(Exp_2) &= \{+, -, ), \mathbf{EOF}\} \\
\follow(Exp_{2*}) &= \{+, -, ), \mathbf{EOF}\} \\
\follow(Exp_3) &= \{+, -, *, /, ), \mathbf{EOF}\}
\end{align*}
\end{enumerate}
\item The grammar is LL(1): no cell of the parsing table below contains
more than one rule, so there are no conflicts.
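\item LL(1) parsing table, where each entry is the index of the alternative
to apply for that non-terminal (alternatives numbered in the order they
appear in the grammar; empty cells are errors):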
\begin{center}
\begin{tabular}{c|c|c|c|c|c|c|c|c}
& \(\mathbf{num}\) & \(+\) & \(-\) & \(*\) & \(/\) & \((\) & \()\) & \(\mathbf{EOF}\) \\
\hline
\(S\) & 1 & & & & & 1 & &\\
\(Exp\) & 1 & & & & & 1 & &\\
\(Exp_*\) & & 1 & 2 & & & & 3 & 3 \\
\(Exp_2\) & 1 & & & & & 1 & & \\
\(Exp_{2*}\) & & 3 & 3 & 1 & 2 & & 3 & 3 \\
\(Exp_3\) & 1 & & & & & 2 & & \\
\end{tabular}
\end{center}
\item Parsing the strings:
\begin{enumerate}
\item \((3 + 4) * 5 ~\mathbf{EOF}\) \checkmark
\item \(2 + + ~\mathbf{EOF}\) --- fails on the second \(+\). The
corresponding error cell in the parsing table is \((Exp_2, +)\).
\item \(2 ~\mathbf{EOF}\) \checkmark
\item \(2 * 3 + 4 ~\mathbf{EOF}\) \checkmark
\item \(2 + 3 * 4 ~\mathbf{EOF}\) \checkmark
\end{enumerate}
Step-by-step LL(1) parser states for \(2 * 3 + 4 ~\mathbf{EOF}\) (the top of
the stack is the leftmost symbol):
\begin{center}
\begin{tabular}{c c}
Lookahead token & Stack \\
\hline
\(2\) & \(S\) \\
\(2\) & \(Exp ~ \mathbf{EOF}\) \\
\(2\) & \(Exp_2 ~ Exp_* ~ \mathbf{EOF}\) \\
\(2\) & \(Exp_3 ~ Exp_{2*} ~ Exp_* ~ \mathbf{EOF}\) \\
\(2\) & \(\mathbf{num} ~ Exp_{2*} ~ Exp_* ~ \mathbf{EOF}\) \\
\(*\) & \(Exp_{2*} ~ Exp_* ~ \mathbf{EOF}\) \\
\(*\) & \(* ~Exp_3 ~ Exp_{2*} ~ Exp_* ~ \mathbf{EOF}\) \\
\(3\) & \(Exp_3 ~ Exp_{2*} ~ Exp_* ~ \mathbf{EOF}\) \\
\(3\) & \(\mathbf{num} ~ Exp_{2*} ~ Exp_* ~ \mathbf{EOF}\) \\
\(+\) & \(Exp_{2*} ~ Exp_* ~ \mathbf{EOF}\) \\
\(+\) & \(Exp_* ~ \mathbf{EOF}\) \\
\(+\) & \(+ ~Exp_2 ~Exp_* ~ \mathbf{EOF}\) \\
\(4\) & \(Exp_2 ~Exp_* ~ \mathbf{EOF}\) \\
\(4\) & \(Exp_3 ~Exp_{2*} ~Exp_* ~ \mathbf{EOF}\) \\
\(4\) & \(\mathbf{num} ~Exp_{2*} ~Exp_* ~ \mathbf{EOF}\) \\
\(\mathbf{EOF}\) & \(Exp_{2*} ~Exp_* ~ \mathbf{EOF}\) \\
\(\mathbf{EOF}\) & \(Exp_* ~ \mathbf{EOF}\) \\
\(\mathbf{EOF}\) & \(\mathbf{EOF}\) \\
\end{tabular}
\end{center}
\end{enumerate}
\end{solution}
\end{exercise}
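To experiment with the table, here is a minimal table-driven LL(1) recogniser in Python. It is a sketch under stated assumptions: the table is transcribed by hand from the solution above (rule indices replaced by the corresponding right-hand sides), non-terminals are plain strings such as \texttt{"Exp2*"}, and \texttt{tokenize} and \texttt{ll1\_parse} are hypothetical helper names; the tokenizer maps every natural number to \(\mathbf{num}\).

```python
# LL(1) table transcribed from the solution: each (non-terminal, lookahead)
# cell holds the right-hand side of the chosen rule; [] is the empty rule.
TABLE = {
    ("S", "num"): ["Exp", "EOF"], ("S", "("): ["Exp", "EOF"],
    ("Exp", "num"): ["Exp2", "Exp*"], ("Exp", "("): ["Exp2", "Exp*"],
    ("Exp*", "+"): ["+", "Exp2", "Exp*"], ("Exp*", "-"): ["-", "Exp2", "Exp*"],
    ("Exp*", ")"): [], ("Exp*", "EOF"): [],
    ("Exp2", "num"): ["Exp3", "Exp2*"], ("Exp2", "("): ["Exp3", "Exp2*"],
    ("Exp2*", "*"): ["*", "Exp3", "Exp2*"], ("Exp2*", "/"): ["/", "Exp3", "Exp2*"],
    ("Exp2*", "+"): [], ("Exp2*", "-"): [], ("Exp2*", ")"): [], ("Exp2*", "EOF"): [],
    ("Exp3", "num"): ["num"], ("Exp3", "("): ["(", "Exp", ")"],
}
NONTERMINALS = {nt for nt, _ in TABLE}


def tokenize(text):
    """Split e.g. '(3 + 4) * 5' into token kinds; every integer becomes 'num'."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append("num")
        elif ch in "+-*/()":
            tokens.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens + ["EOF"]


def ll1_parse(tokens):
    """Return None on success, or the (stack top, lookahead) pair where parsing fails."""
    stack = ["S"]  # the top of the stack is the end of the list
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                return (top, look)  # empty table cell: syntax error
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1  # match a terminal
        else:
            return (top, look)  # terminal mismatch
    return None


for text in ["(3 + 4) * 5", "2 + +", "2", "2 * 3 + 4", "2 + 3 * 4"]:
    print(text, "->", ll1_parse(tokenize(text)))
```

For a rejected input, the returned pair names the failing table cell, which can be compared with the error cells given in the answers.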
\begin{exercise}{}
If \(L\) is a regular language, then the set of prefixes of words in \(L\) is
also a regular language. Given this fact, from a regular expression for \(L\),
we should be able to obtain a regular expression for the set of all prefixes
of words in \(L\) as well.
We want to do this with a function \(\prefixes\) that is recursive over the
structure of the regular expression for \(L\), i.e. of the form:
%
\begin{align*}
\prefixes(\epsilon) &= \epsilon \\
\prefixes(a) &= a \mid \epsilon \\
\prefixes(r \mid s) &= \prefixes(r) \mid \prefixes(s) \\
\prefixes(r \cdot s) &= \ldots \\
\prefixes(r^*) &= \ldots \\
\prefixes(r^+) &= \ldots
\end{align*}
\begin{enumerate}
\item Complete the definition of \(\prefixes\) above by filling in the
missing cases.
\item Use this definition to find:
\begin{enumerate}
\item \(\prefixes(ab^*c)\)
\item \(\prefixes((a \mid bc)^*)\)
\end{enumerate}
\end{enumerate}
\begin{solution}
The computation for \(\prefixes(\cdot)\) is similar to the computation of
\(\first(\cdot)\) for grammars.
\begin{enumerate}
\item The missing cases:
\begin{enumerate}
\item \(\prefixes(r \cdot s) = \prefixes(r) \mid r \cdot \prefixes(s)\).
Either we have read only part of \(r\), or we have read all of \(r\)
followed by part of \(s\).
\item \(\prefixes(r^*) = r^*\cdot\prefixes(r)\). We can
consider \(r^* = \epsilon \mid r \mid rr \mid \ldots\), and apply the
rules for union and concatenation. Intuitively, if the word has \(n \ge
1\) instances of \(r\), we can read \(m < n\) complete instances of \(r\),
and then a prefix of the next instance of \(r\).
\item \(\prefixes(r^+) = r^* \cdot \prefixes(r)\). Same reasoning as
for \(r^*\). Why does the empty case still appear?
\end{enumerate}
\item The prefix computations are:
\begin{enumerate}
\item \(\prefixes(ab^*c) = \epsilon \mid a \mid ab^*(b \mid c \mid \epsilon)\). Computation:
\begin{align*}
\prefixes(ab^*c) &= \prefixes(a) \mid a\cdot\prefixes(b^*c) & [\text{concatenation}]\\
&= (a \mid \epsilon) \mid a\cdot\prefixes(b^*c) &[a]\\
&= (a \mid \epsilon) \mid a\cdot(\prefixes(b^*) \mid b^*\prefixes(c)) &[\text{concatenation}]\\
&= (a \mid \epsilon) \mid a\cdot(\prefixes(b^*) \mid b^*(c \mid \epsilon)) &[c]\\
&= (a \mid \epsilon) \mid a\cdot(b^*\prefixes(b) \mid b^*(c \mid \epsilon)) &[\text{star}]\\
&= (a \mid \epsilon) \mid a\cdot(b^*(b \mid \epsilon) \mid b^*(c \mid \epsilon)) &[b]\\
&= (a \mid \epsilon) \mid a\cdot(b^*(b \mid c \mid \epsilon)) &[\text{rewrite}]\\
&= \epsilon \mid a \mid a\cdot(b^*(b \mid c \mid \epsilon)) & [\text{rewrite}]
\end{align*}
\item \(\prefixes((a \mid bc)^*) = (a \mid bc)^*(\epsilon \mid a \mid b \mid bc)\).
\end{enumerate}
\end{enumerate}
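The second result follows in the same way as the first, for instance:
\begin{align*}
\prefixes((a \mid bc)^*) &= (a \mid bc)^* \cdot \prefixes(a \mid bc) & [\text{star}]\\
&= (a \mid bc)^* (\prefixes(a) \mid \prefixes(bc)) & [\text{union}]\\
&= (a \mid bc)^* ((a \mid \epsilon) \mid \prefixes(b) \mid b \cdot \prefixes(c)) & [a, \text{concatenation}]\\
&= (a \mid bc)^* ((a \mid \epsilon) \mid (b \mid \epsilon) \mid b(c \mid \epsilon)) & [b, c]\\
&= (a \mid bc)^* (\epsilon \mid a \mid b \mid bc) & [\text{rewrite}]
\end{align*}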
\end{solution}
\end{exercise}
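The recursive definition of \(\prefixes\) translates directly into code. Below is a minimal Python sketch that assumes a tuple encoding of regular expressions with no empty-set expression, so every expression denotes a non-empty language. The helper \texttt{lang}, which enumerates all words up to a given length by brute force, is a hypothetical testing aid and not part of the exercise.

```python
# Regular expressions as tuples: ("eps",), ("chr", c), ("alt", r, s),
# ("cat", r, s), ("star", r), ("plus", r). There is no empty-set
# expression, so every expression denotes a non-empty language.
EPS = ("eps",)
def char(c): return ("chr", c)
def alt(r, s): return ("alt", r, s)
def cat(r, s): return ("cat", r, s)
def star(r): return ("star", r)
def plus(r): return ("plus", r)


def prefixes(r):
    """Regular expression for the prefixes of the words of r."""
    tag = r[0]
    if tag == "eps":
        return EPS
    if tag == "chr":
        return alt(r, EPS)
    if tag == "alt":
        return alt(prefixes(r[1]), prefixes(r[2]))
    if tag == "cat":
        # a prefix of r.s: a prefix of r, or all of r then a prefix of s
        return alt(prefixes(r[1]), cat(r[1], prefixes(r[2])))
    if tag in ("star", "plus"):
        # some complete copies of r, then a prefix of the next copy
        return cat(star(r[1]), prefixes(r[1]))
    raise ValueError(f"unknown regular expression {r!r}")


def lang(r, n):
    """All words of length <= n denoted by r (brute force, for testing only)."""
    def concat(xs, ys):
        return {x + y for x in xs for y in ys if len(x) + len(y) <= n}

    def closure(base):
        # zero or more words of `base`, bounded by length n
        result = {""}
        while True:
            bigger = result | concat(result, base)
            if bigger == result:
                return result
            result = bigger

    tag = r[0]
    if tag == "eps":
        return {""}
    if tag == "chr":
        return {r[1]} if n >= 1 else set()
    if tag == "alt":
        return lang(r[1], n) | lang(r[2], n)
    if tag == "cat":
        return concat(lang(r[1], n), lang(r[2], n))
    if tag == "star":
        return closure(lang(r[1], n))
    if tag == "plus":
        base = lang(r[1], n)
        return concat(base, closure(base))
    raise ValueError(f"unknown regular expression {r!r}")


def prefix_closure(words, n):
    """All prefixes of length <= n of the given words."""
    return {w[:i] for w in words for i in range(min(len(w), n) + 1)}


ab_star_c = cat(char("a"), cat(star(char("b")), char("c")))
print(sorted(lang(prefixes(ab_star_c), 3), key=lambda w: (len(w), w)))
# prints ['', 'a', 'ab', 'ac', 'abb', 'abc']
```

Comparing \texttt{lang(prefixes(r), n)} with \texttt{prefix\_closure(lang(r, m), n)} for a somewhat larger \texttt{m} gives a quick sanity check on small examples; \texttt{m} must be large enough that every short prefix can be completed to a word of length at most \texttt{m}.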
% this language is not LL 1 actually, I think
% \begin{exercise}{}
% Consider the following grammar of \(\mathbf{if}-\mathbf{then}-\mathbf{else}\) expressions with assignments:
% %
% \begin{align*}
% stmt &::= \mathbf{if} ~id = id~ \mathbf{then} ~stmt ~optStmt \\
% &::= \{ stmt^* \} \\
% &::= id = id; \\
% optStmt &::= \epsilon \mid \mathbf{else} ~stmt \\
% \end{align*}
% \begin{enumerate}
% \item Show that the grammar is ambiguous.
% \item Is the grammar LL(1)?
% \end{enumerate}
% \end{exercise}
\begin{exercise}{}
Argue that the following grammar is \emph{not} LL(1). Produce an equivalent
LL(1) grammar.
\begin{equation*}
E ::= \mathbf{num} + E \mid \mathbf{num} - E
\end{equation*}
\begin{solution}
The grammar is not LL(1): both alternatives start with \(\mathbf{num}\), so on
seeing a token \(\mathbf{num}\) we cannot decide whether to continue parsing
it as \(\mathbf{num} + E\) or \(\mathbf{num} - E\).
The problem is the common prefix \(\mathbf{num}\) of the two alternatives. We
can factor it out by introducing a new non-terminal \(T\). This
transformation is known as \emph{left factorization}.
\begin{align*}
E &::= \mathbf{num} ~T \\
T &::= + E \mid - E
\end{align*}
% without changing the terms or the overall "structure" of the grammar, we
% have logically partitioned it to fit within our parsing schema.
\end{solution}
\end{exercise}
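Left factorization can also be automated. The following Python sketch is an illustration rather than a prescribed algorithm; the grammar encoding (a dictionary from non-terminals to lists of right-hand sides, with \texttt{[]} for \(\epsilon\)) and the convention of naming fresh non-terminals by appending primes are assumptions. It repeatedly groups the alternatives of each non-terminal by their first symbol and factors out the longest common prefix of each group.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of symbol lists."""
    prefix = []
    for column in zip(*seqs):
        if any(sym != column[0] for sym in column):
            break
        prefix.append(column[0])
    return prefix


def left_factor(grammar):
    """Repeatedly factor out common prefixes of alternatives.

    `grammar` maps non-terminals to lists of right-hand sides (lists of
    symbols, [] for epsilon). Fresh non-terminals are named by appending
    primes, an illustrative convention.
    """
    grammar = {nt: [list(rhs) for rhs in rhss] for nt, rhss in grammar.items()}
    changed = True
    while changed:
        changed = False
        for nt in list(grammar):
            # group alternatives by their first symbol (None for epsilon)
            groups = {}
            for rhs in grammar[nt]:
                groups.setdefault(rhs[0] if rhs else None, []).append(rhs)
            new_rhss = []
            for first_sym, group in groups.items():
                if first_sym is None or len(group) == 1:
                    new_rhss.extend(group)
                    continue
                prefix = common_prefix(group)
                fresh = nt + "'"
                while fresh in grammar:
                    fresh += "'"
                grammar[fresh] = [rhs[len(prefix):] for rhs in group]
                new_rhss.append(prefix + [fresh])
                changed = True
            grammar[nt] = new_rhss
    return grammar


print(left_factor({"E": [["num", "+", "E"], ["num", "-", "E"]]}))
# prints {'E': [['num', "E'"]], "E'": [['+', 'E'], ['-', 'E']]}
```

Here the fresh non-terminal \(E'\) plays the role of \(T\) in the solution.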
\begin{exercise}{}
Consider the following grammar:
\begin{equation*}
S ::= S(S) \mid S[S] \mid () \mid [\;]
\end{equation*}
Check whether the same transformation as the previous case can be applied to
produce an LL(1) grammar. If not, argue why, and suggest a different
transformation.
\begin{solution}
Left factorizing the two left-recursive alternatives, we get:
\begin{align*}
S &::= S ~T \mid () \mid [\;] \\
T &::= (S) \mid [S]
\end{align*}
This is still not LL(1): on reading a token ``\((\)'', we cannot decide
whether to apply the base case \(S ::= ()\) or the rule \(S ::= S~T\), since
\(\first(S~T) = \first(S)\) also contains \((\).
The underlying problem is that the grammar, before and after factorization,
is left-recursive. A recursive-descent parser for it would loop forever on
the rule \(S ::= S~T\), calling itself without consuming any input, because
our parsers work top-down, left to right. We can fix this by \emph{moving}
the recursion to the right. This is generally called \emph{left recursion
elimination}.
Applying left recursion elimination to the original grammar (the general
transformation is explained below), with left-recursive alternatives
\(S(S)\), \(S[S]\) and base cases \(()\), \([\;]\), we get:
\begin{align*}
S &::= ()S' \mid [\;]S' \\
S' &::= (S)S' \mid [S]S' \mid \epsilon
\end{align*}
This grammar is LL(1): the two alternatives of \(S\) start with different
tokens, and for \(S'\), taking \(S\) as the start symbol followed by
\(\mathbf{EOF}\), the sets \(\first((S)S') = \{(\}\), \(\first([S]S') =
\{[\}\), and \(\follow(S') = \{), ], \mathbf{EOF}\}\) (the lookaheads for the
\(\epsilon\) alternative) are pairwise disjoint.
To eliminate left-recursion in general, consider a non-terminal \(A ::=
A\alpha \mid \beta\), where \(\beta\) does not start with \(A\) (not
left-recursive). We can remove the left recursion by introducing a new
non-terminal, \(A'\), such that:
\begin{align*}
A &::= \beta A' \\
A' &::= \alpha A' \mid \epsilon
\end{align*}
i.e., for the left-recursive rule \(A\alpha\), we instead attempt to parse
an \(\alpha\) followed by the rest. In exchange, the base case \(\beta\) now
expects an \(A'\) to follow it. With several alternatives, as in the grammar
above, every base case \(\beta_j\) becomes \(\beta_j A'\) and every
left-recursive alternative \(A\alpha_i\) becomes \(\alpha_i A'\).
%
Note that \(\beta\) can be empty as well, in which case \(A ::= A'\).
Intuitively, we are shifting the direction in which we look for instances of
\(A\). Consider parsing \(\beta \alpha \alpha \alpha\). The original version
of the grammar would produce the parse tree:
\begin{center}
\begin{forest}
[\(A\)
[\(A\)
[\(A\)
[\(A\)
[\(\beta\)]
]
[\(\alpha\)]
]
[\(\alpha\)]
]
[\(\alpha\)]
]
\end{forest}
\end{center}
but with the new grammar, we parse it as:
\begin{center}
\begin{forest}
[\(A\)
[\(\beta\)]
[\(A'\)
[\(\alpha\)]
[\(A'\)
[\(\alpha\)]
[\(A'\)
[\(\alpha\)]
[\(A'\)
[\(\epsilon\)]
]
]
]
]
]
\end{forest}
\end{center}
There are two main pitfalls to remember with left-recursion elimination:
\begin{enumerate}
\item it may need to be applied several times until the grammar no longer
changes, as the first transformation may introduce new (indirect)
recursive rules (check \(A ::= AA\alpha \mid \epsilon\)).
\item it may require \emph{inlining} some non-terminals, when the left
recursion is \emph{indirect}. For example, consider \(A ::= B\alpha, B ::=
A\beta\): there is no immediate left recursion to eliminate, but inlining
\(B\) gives \(A ::= A\beta\alpha\), to which the elimination can be applied.
\end{enumerate}
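The transformation can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: it handles only \emph{immediate} left recursion, uses the textbook form \(A ::= \beta A'\), \(A' ::= \alpha A' \mid \epsilon\) (the shape of the second parse tree above), encodes right-hand sides as lists of symbols, and assumes the primed name is fresh. The second call reproduces the first pitfall: the new rule for \(A'\) starts with \(A\), and \(A ::= A'\), so the left recursion is still there, only indirectly.

```python
def eliminate_left_recursion(nt, rhss):
    """Remove immediate left recursion from the alternatives of `nt`.

    A ::= A a1 | ... | A an | b1 | ... | bm   becomes
    A  ::= b1 A' | ... | bm A'
    A' ::= a1 A' | ... | an A' | epsilon
    Right-hand sides are lists of symbols; [] stands for epsilon.
    """
    alphas = [rhs[1:] for rhs in rhss if rhs and rhs[0] == nt]
    betas = [rhs for rhs in rhss if not rhs or rhs[0] != nt]
    if not alphas:
        return {nt: rhss}  # no immediate left recursion: nothing to do
    fresh = nt + "'"  # assumed not to clash with existing names
    return {
        nt: [beta + [fresh] for beta in betas],
        fresh: [alpha + [fresh] for alpha in alphas] + [[]],
    }


# S ::= S(S) | S[S] | () | []
print(eliminate_left_recursion(
    "S", [["S", "(", "S", ")"], ["S", "[", "S", "]"], ["(", ")"], ["[", "]"]]))
# A ::= A A alpha | epsilon: the result A' ::= A alpha A' is still
# (indirectly) left-recursive through A ::= A'
print(eliminate_left_recursion("A", [["A", "A", "alpha"], []]))
```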
\end{solution}
\end{exercise}
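As a sanity check, here is a minimal Python sketch of a recursive-descent recogniser, with one token of lookahead, for the grammar obtained by the textbook elimination, \(S ::= ()S' \mid [\;]S'\) and \(S' ::= (S)S' \mid [S]S' \mid \epsilon\). It is paired with a brute-force comparison against the original grammar \(S ::= S(S) \mid S[S] \mid () \mid [\;]\) on all short strings. The function names are hypothetical, each character is treated as a token, and the check only covers strings up to the given length.

```python
from itertools import product


def parse(text):
    """Recognise text with S ::= () S' | [] S',  S' ::= (S) S' | [S] S' | epsilon."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else "EOF"

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at {pos}, found {peek()!r}")
        pos += 1

    def parse_s():
        # S: the lookahead selects () S' or [] S'
        if peek() == "(":
            expect("(")
            expect(")")
        elif peek() == "[":
            expect("[")
            expect("]")
        else:
            raise SyntaxError(f"unexpected {peek()!r} at {pos}")
        parse_s_prime()

    def parse_s_prime():
        # S': on ( or [ parse (S) or [S] and continue (the tail call to S'
        # is written as a loop); on any other lookahead take epsilon
        while peek() in ("(", "["):
            close = ")" if peek() == "(" else "]"
            expect(peek())
            parse_s()
            expect(close)

    try:
        parse_s()
    except SyntaxError:
        return False
    return pos == len(text)


def original_words(n):
    """Words of length <= n of S ::= S(S) | S[S] | () | [], by bottom-up saturation."""
    words = {"()", "[]"} if n >= 2 else set()
    while True:
        new = set(words)
        for x in words:
            for y in words:
                for open_, close in ("()", "[]"):
                    w = x + open_ + y + close
                    if len(w) <= n:
                        new.add(w)
        if new == words:
            return words
        words = new


def mismatches(n):
    """Strings over ()[] of length <= n on which parse() and the original grammar disagree."""
    words = original_words(n)
    return [w for k in range(n + 1)
            for w in map("".join, product("()[]", repeat=k))
            if parse(w) != (w in words)]


print(mismatches(6))
```

Because the grammar is LL(1), the loop in \texttt{parse\_s\_prime} can commit to an alternative by looking only at the next character, without backtracking.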
\documentclass[a4paper]{article}
\input{../macro}
\ifdefined\ANSWERS
\if\ANSWERS1
\printanswers
\fi
\fi
\DeclareMathOperator{\prefixes}{prefixes}
\DeclareMathOperator{\first}{first}
\DeclareMathOperator{\nullable}{nullable}
\DeclareMathOperator{\follow}{follow}
\title{CS 320 \\ Computer Language Processing\\Exercises: Week 4}
\author{}
\date{March 19, 2025}
\begin{document}
\maketitle
% prefixes of regular expressions
\input{ex/prefix}
% compute nullable follow first for CFGs
\input{ex/compute}
% build ll1 parsing table, parse or attempt to parse some strings
\input{ex/table}
\end{document}
@@ -6,6 +6,7 @@
 \usepackage{xspace}
 \usepackage[colorlinks]{hyperref}
 \usepackage{tabularx}
+\usepackage{multicol}
 % for drawing
 \usepackage{tikz}