CSE411

CSE411: Introduction to Compilers

Introduction to Parsing

Jooyong Yi (UNIST)

2024 Spring
CSE411

Lexing and Parsing

        position  =  initial  +  rate   *   60
                           ↓
        |------------------------------------|
        |      Lexical Analyzer (Lexer)      |
        |------------------------------------|
                           ↓
        <id, 1> <=> <id, 2> <+> <id, 3> <*> <int, 60>
                           ↓
        |------------------------------------|
        |      Syntax Analyzer (Parser)      |
        |------------------------------------|
                           ↓
                       =
                      / \
                <id,1>   +
                        / \
                   <id,2>  *
                          / \
                    <id,3>   <int,60>                   
2024 Spring
CSE411

Why a Tree?

  • Consider: 1 + 2 * 3
2024 Spring
CSE411

Why a Tree?

  • Consider: 2 * 3 + 4
         *                    +
        / \                  / \
       2   +                *   4 
          / \              / \
         3   4            2   3
2024 Spring
CSE411

How to write a parser?

2024 Spring
CSE411

Synthesizing a Parser

              Specification for the parser
                           ↓
                    |-------------|    
                    |    Parser   |        
                    | Synthesizer |
                    |-------------|
                           |
                      synthesizes   
                           ↓
                       |--------|            
a sequence of tokens → | Parser | → parse tree
                       |--------|                          
2024 Spring
CSE411

Parser Synthesizer Examples

  • Yacc (Yet Another Compiler Compiler)
> man yacc

YACC(1)                                                                User Commands                                                                YACC(1)

NAME
       yacc - GNU Project parser generator

SYNOPSIS
       yacc [OPTION]... FILE

DESCRIPTION
       Yacc (Yet Another Compiler Compiler) is a parser generator.  This version is a simple wrapper around bison(1).  It passes option -y, --yacc to activate 
       the upward compatibility mode. See bison(1) for more information.
2024 Spring
CSE411

What to Give as the Specification for Parser?

2024 Spring
CSE411

What to Give as the Specification for Parser?

start : exp EOF
exp: INT | ID | exp PLUS exp | '(' exp ')'

INT : '0' | ('1'..'9') ('0'..'9')* 
ID : ( 'a'..'z' | 'A'..'Z' | '_' | '$' ) ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' | '$' )* 
PLUS : '+' 
2024 Spring
CSE411

What to Give as the Specification for Parser?

start : exp EOF
exp: INT | ID | exp PLUS exp | '(' exp ')'

INT : '0' | ('1'..'9') ('0'..'9')* 
ID : ( 'a'..'z' | 'A'..'Z' | '_' | '$' ) ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' | '$' )* 
PLUS : '+' 

Is it a regular expression?

2024 Spring
CSE411

What to Give as the Specification for Parser?

start : exp EOF
exp: INT | ID | exp PLUS exp | '(' exp ')'

INT : '0' | ('1'..'9') ('0'..'9')* 
ID : ( 'a'..'z' | 'A'..'Z' | '_' | '$' ) ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' | '$' )* 
PLUS : '+' 

Is it a regular expression?

  - a: An ordinary character stands for itself.
  - "ab": Quotation, a string in quotes stands for itself literally. E.g., "if"
  - ε: The empty string
  - M | N: Alternation, choosing from M or N.
  - [0-9]: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  - M N: Concatenation, an M followed by an N.
  - M*: Repetition (zero or more times)
  - M+: Repetition (one or more times)
  - M?: Optional, zero or one occurrences of M.
2024 Spring
CSE411

What to Give as the Specification for Parser?

start : exp EOF
exp: INT | ID | exp PLUS exp | '(' exp ')'

INT : '0' | ('1'..'9') ('0'..'9')* 
ID : ( 'a'..'z' | 'A'..'Z' | '_' | '$' ) ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' | '$' )* 
PLUS : '+' 

Is it a regular expression?

  • The language is a prototypical example of a language that cannot be described by a regular expression.
2024 Spring
CSE411

Which spec language can we use?

start : exp EOF
exp: INT | ID | exp PLUS exp | '(' exp ')'

INT : '0' | ('1'..'9') ('0'..'9')* 
ID : ( 'a'..'z' | 'A'..'Z' | '_' | '$' ) ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' | '$' )* 
PLUS : '+' 
2024 Spring
CSE411

Context-Free Grammar (CFG)

start : exp EOF
exp: INT | ID | exp PLUS exp | '(' exp ')'
  • CFG
    • : a finite set of non-terminals (e.g., exp).
    • : a finite set of terminal symbols (e.g., '(', ')', PLUS)
    • : a start non-terminal (e.g., start)
    • : a finite set of productions.
      • Each production has the form
        • where and
2024 Spring
CSE411

Context-Free Grammar (CFG)

  • A grammar to describe a formal (i.e., non-ambiguous) language such as a programming language.
  • Invented by Noam Chomsky.
2024 Spring
CSE411

Context-Free Grammar vs. Context-Sensitive Grammar

2024 Spring
CSE411

Context-Free Grammar vs. Context-Sensitive Grammar

  • CFG , CSG
    • : a finite set of non-terminals (e.g., exp).
    • : a finite set of terminals (e.g., '(', ')', PLUS)
    • : a start non-terminal (e.g., start)
    • : a finite set of productions.
      • CFG
        • Each production has the form
          • where and
      • CSG
        • Each production has the form
          • where , and
2024 Spring
CSE411

CFG as a Parser Specification Language

                  CFG
                   ↓
            |-------------|    
            |    Parser   |        
            | Synthesizer |
            |-------------|
                   |
              synthesizes   
                   ↓
              |--------|            
    program → | Parser | → parse tree
              |--------|                          
2024 Spring
CSE411

CFG and Pushdown Automata (PDA)

Theory: For any context-free grammar , there exists an non-deterministic pushdown automata (NPDA) accepting .

2024 Spring
CSE411

Non-deterministic PDA (NPDA)

Input
â–¡â–¡â–¡â–¡â–¡â–¡â–¡â–¡
  ↓
|---------|    Stack
| Control | ⇔  |__|
|  Unit   |    |__|
|---------|    |__|
2024 Spring
CSE411

Non-deterministic PDA (NPDA)

  • NPDA
    • : a finite set of internal states of the control unit
    • : the input alphabet
    • : the stack alphabet
    • : the transition function
    • : the initial state of the control unit
    • : the stack start symbol
    • : the set of final states
2024 Spring
CSE411

Acceptance Condition of NPDA

  • Suppose
  • if and
    1. When all symbols of the input are consumed, the final state is reached.
    2. The state of the stack is irrelevant to the acceptance.
2024 Spring
CSE411

NPDA Example

  • NPDA

  • Which input does accept?

2024 Spring
CSE411

NPDA Example

  • NPDA

  • Which input does accept?

2024 Spring
CSE411

Deterministic PDA (DPDA)

  • DPDA
    • : a finite set of internal states of the control unit
    • : the input alphabet
    • : the stack alphabet
    • : the transition function
      • If is defined, then should not be defined for every .
    • : the initial state of the control unit
    • : the stack start symbol
    • : the set of final states
2024 Spring
CSE411

CFG NPDA DPDA

  • This does not work. Why?
2024 Spring
CSE411

CFG NPDA DPDA

  • This does not work. Why?

  • NPDA -X-> DPDA

    • The expressive power of NPDA and DPDA are not equivalent.
    • NPDA is more expressive.
2024 Spring
CSE411

NPDA -X-> DPDA

  • Assume

    • DPDA_1 accepts
    • DPDA_2 accepts
  • We can have an NPDA that accepts .

  • How about a DPDA that accepts ? How do we know which one to use between DPDA_1 and DPDA_2?

2024 Spring
CSE411

Next Week

  • We will see various ways to perform parsing.
  • Ultimately, we will land on a parsing algorithm that is quite similar to DPDA.
2024 Spring
CSE411

Science and Engineering

  • Tony Hoare (Turing Award Laureate, 1980) once mentioned something like this:
    • Engineers seek the 'it works' moment, and scientists seek the 'Aha' moment.
2024 Spring