CSE411

CSE411: Introduction to Compilers

Parsing Algorithms

Jooyong Yi (UNIST)

2024 Spring
CSE411

Parsing

        <id, 1> <=> <id, 2> <+> <id, 3> <*> <int, 60>
                           ↓
        |------------------------------------|
        |      Syntax Analyzer (Parser)      |
        |------------------------------------|
                           ↓
                       =
                      / \
                <id,1>   +
                        / \
                   <id,2>  *
                          / \
                    <id,3>   <int,60>                   
2024 Spring
CSE411

Synthesizing a Parser

        Specification for the parser written in CFG
                           ↓
                    |-------------|    
                    |    Parser   |        
                    | Synthesizer |
                    |-------------|
                           |
                      synthesizes   
                           ↓
                       |--------|            
a sequence of tokens → | Parser | → parse tree
                       |--------|                          
2024 Spring
CSE411

Potential Approach

CFG NPDA DPDA

2024 Spring
CSE411

Potential Approach

CFG NPDA DPDA

  • NPDA DPDA
2024 Spring
CSE411

Potential Approach

CFG NPDA DPDA

  • NPDA DPDA
  • Why not use NPDA?
2024 Spring
CSE411

Our Focus

        Specification for the parser written in CFG
                           ↓
                    |-------------|    
            Focus:  |    Parser   |        
                    | Synthesizer |
                    |-------------|
                           |
                      synthesizes   
                           ↓
                       |--------|            
a sequence of tokens → | Parser | → parse tree
                       |--------|                          

Q: How can we turn a CFG into a parse tree?

2024 Spring
CSE411

Context-Free Grammar (CFG)

  • CFG
    • : a finite set of non-terminals (e.g., exp).
    • : a finite set of terminals (e.g., '(', ')', PLUS)
    • : a start non-terminal (e.g., start)
    • : a finite set of productions.
      • Each production has the form
        • where and
start : exp EOF
exp: INT | ID | exp PLUS exp | '(' exp ')'
2024 Spring
CSE411

Parse Tree

  • Root: a start non-terminal
  • Leaf: a terminal or
  • Internal node: a non-terminal
  • Edge: defined by a production rule
A -> X Y Z
 
        A
       /|\                       
      X Y Z
2024 Spring
CSE411

Example

// Grammar G
E : E '+' E | int
  • What is the parse tree for int '+' int?
2024 Spring
CSE411

Deterministic Parsing Algorithm?

// Grammar G
E : E '+' E | int
input: int '+' int

E → .... →      E
               /|\                       
              E + E
              |   |
            int   int
2024 Spring
CSE411

Derivation

  • Assume CFG where
  • Suppose you have a string where
  • Then, derives and we write
2024 Spring
CSE411

Leftmost Derivation

  • Assume CFG where
  • Suppose you have a string where
  • Then, derives and we write
2024 Spring
CSE411

Rightmost Derivation

The rightmost nonterminal is replaced.

  • Assume CFG where
  • Suppose you have a string where
  • Then, derives and we write
2024 Spring
CSE411

Applying Leftmost Derivation

// Grammar G
E : E '+' E | int
input: int '+' int
2024 Spring
CSE411

Applying Leftmost Derivation

// Grammar G
E : E '+' E | int
input: int '+' int

E  -lm->  E '+' E  -lm->  int '+' E  -lm->  int '+' int

E →     E   →      E    →      E
       /|\        /|\         /|\
      E + E      E + E       E + E           
                 |           |   |
                int         int  int
2024 Spring
CSE411

Applying Rightmost Derivation

// Grammar G
E : E '+' E | int
input: int '+' int

E  -rm->  E '+' E  -rm->  E '+' int  -rm->  int '+' int

E →     E   →      E    →      E
       /|\        /|\         /|\
      E + E      E + E       E + E
                     |       |   |
                    int    int   int
2024 Spring
CSE411

Frontend Architecture Options

          |--------|                          |--------|
input  →  | Lexer  |  → sequence of tokens →  | Parser | → parse tree
          |--------|                          |--------|
2024 Spring
CSE411

Frontend Architecture Options

          |--------|                     |--------|
input  →  | Lexer  |  → current token →  | Parser |
          |--------|                     |--------|
            ↑                                 ↓
            |                           intermediate
            ← request the next token  ←  incomplete
                                         parse tree
2024 Spring
CSE411

Which One to Use?

          |--------|                          |--------|
input  →  | Lexer  |  → sequence of tokens →  | Parser | → parse tree
          |--------|                          |--------|
          |--------|                     |--------|
input  →  | Lexer  |  → current token →  | Parser |
          |--------|                     |--------|
            ↑                                 ↓
            |                           intermediate
            ← request the next token  ←  incomplete
                                         parse tree
2024 Spring
CSE411

Applying Leftmost Derivation (Revisited)

// Grammar G
E : E '+' E | int
input: int '+' int
        ↑

E →     E  
       /|\ 
      E + E
2024 Spring
CSE411

Applying Leftmost Derivation (Revisited)

// Grammar G
E : E '+' E | int
input: int '+' int
                ↑

E →     E   →      E   
       /|\        /|\  
      E + E      E + E 
                 |
                int
2024 Spring
CSE411

Applying Leftmost Derivation (Revisited)

// Grammar G
E : E '+' E | int
input: int '+' int
                   ↑

E →     E   →      E    →      E
       /|\        /|\         /|\
      E + E      E + E       E + E
                 |           |   |
                int        int   int
2024 Spring
CSE411

Applying Rightmost Derivation (Revisited)

// Grammar G
E : E '+' E | int
input: int '+' int
        ↑

E →     E   →      E    
       /|\        /|\   
      E + E      E + E
                     |
                     int
  • We have not read the second int yet!
2024 Spring
CSE411

Applying Rightmost Derivation (Revisited)

// Grammar G
E : E '+' E | int
input: int '+' int
                                       E
                                    /  |  \
  E               E      E         E   |   E
  |               |      |         |   |   |
int '+' int  →  int '+' int   →   int '+'  int
  ↑                       ↑                  ↑
2024 Spring
CSE411

Applying Rightmost Derivation (Revisited)

// Grammar G
E : E '+' E | int
input: int '+' int
                                        E
                                    //  ||  \\
  E               E      E         E    ||   E
 ||               |      ||        |    ||   |
int '+' int  →  int '+' int   →   int  '+'  int
  ↑                        ↑                  ↑
E  -rm->  E '+' E  -rm->  E '+' int  -rm->  int '+' int
2024 Spring
CSE411

Top-Down Parsing vs. Bottom-Up Parsing

// Grammar G
E : E '+' E | int
Top-Down parsing uses the leftmost derivation.

E →     E   →      E    →      E
       /|\        /|\         /|\
      E + E      E + E       E + E
                 |           |   |
                int         int  int
Bottom-Up parsing uses the rightmost derivation.

                                       E
                                    /  |  \
 E               E       E         E   |   E
 |               |       |         |   |   |
int '+' int  →  int '+' int   →   int '+' int
2024 Spring