CSE411: Introduction to Compilers

Abstract Syntax Tree

Jooyong Yi (UNIST)

Where We Are

         |-------|            |--------|        
Source → | Lexer | → tokens → | Parser | → parse
Program  |       |            |        |   tree 
         |-------|            |--------|        
                                                        

So far, we know that ...

         |-------|            |--------|        
Source → | Lexer | → tokens → | Parser | → parse
Program  |       |            |        |   tree 
         |-------|            |--------|        
                                                        

So far, we know that ...

         |-------|            |--------|        
Source → | Lexer | → tokens → | Parser | → parse
Program  |       |            |        |   tree 
         |-------|            |--------|        
                                                        
  • The source program is grammatically correct.

We still do not know ...

         |-------|            |--------|        
Source → | Lexer | → tokens → | Parser | → parse
Program  |       |            |        |   tree 
         |-------|            |--------|        
                                                        
  • How to generate the target program.
  • How to optimize the program.
  • Whether the program satisfies desired semantic properties such as the lack of type errors.

How to attach semantic information to the parse tree?

Review: Recursive Descent Parsing

                                                         E : T E'
                                                         E': + T E' | ε
E: E + E | E * E | (E) | int          ==>                T : F T'
                                                         T': * F T' | ε
                                                         F: int | (E)                                                                    
val tok = ref (getToken())
fun advance() = tok := getToken()
fun eat(t) = if t = !tok then tok := advance() else error()

fun E() = (T(); E'())
and E'() = case !tok (* lookahead token *)
           of PLUS => (eat(PLUS); T(); E'())
            | ? => () (* do nothing *)
and T() = (F(); T'())
and T'() = case !tok (* lookahead token *)
           of TIMES => (eat(TIMES); F(); T'())
            | ? => () (* do nothing *)
and F() = case !tok (* lookahead token *)
          of INT => eat(INT)
           | LPAREN => (eat(LPAREN); E(); eat(RPAREN))                                                             

Attaching Semantic Information

                                                         E : T E'
                                                         E': + T E' | ε
E: E + E | E * E | (E) | int          ==>                T : F T'
                                                         T': * F T' | ε
                                                         F: int | (E)                                                                    
datatype token = INT of int | ...

val tok = ref (getToken())                                         
fun advance() = tok := getToken()
fun eat(t) = if t = !tok then tok := advance() else error()

fun E() = (E'(T()))
and E'(a) = case !tok (* lookahead token *)
            of PLUS => (eat(PLUS); E'(a + T()))
             | ? => a 
and T() = (T'(F())) 
and T'(a) = case !tok (* lookahead token *)
            of TIMES => (eat(TIMES); T'(a * F())) 
             | ? => a (* MODIFIED *)
and F() = case !tok (* lookahead token *)
          of INT i => eat(INT); i
           | LPAREN => (eat(LPAREN); 
                        let i = E()
                        in eat(RPAREN);
                           i
                        end)                                                                                                  

Bottom-Up Parsing

(* Description written in ML-Yacc *)
exp : INT                 (INT)
    | exp PLUS exp        (exp1 + exp2)
    | exp TIMES exp       (exp1 * exp2)                                                        

Bottom-Up Parsing

(* Description written in ML-Yacc *)
exp : INT                 (INT)
    | exp PLUS exp        (exp1 + exp2)
    | exp TIMES exp       (exp1 * exp2)

I want a compiler, not an interpreter!

(* Description written in ML-Yacc *)
exp : INT                 (INT)
    | exp PLUS exp        (exp1 + exp2)
    | exp TIMES exp       (exp1 * exp2)

I want a compiler, not an interpreter!

exp : INT                 (exp.code := INT.num)
    | exp PLUS exp        (exp.code := exp1.code ++ exp2.code ++ "ADD")
    | exp TIMES exp       (exp.code := exp1.code ++ exp2.code ++ "MUL")                          

Food for Thought

  • The approaches we have seen so far perform what we want (e.g., code generation) as the input program is being parsed.
  • What would be the advantages and disadvantages of these on-the-fly approaches?

More Modular Approach

                                            E                |---------------------------|
                                        /   |   \            |  Perform code generation  |
3 - 2 + 1  == syntax check ==>         E    +   T     ====>  |  via tree traversal       |
                                     / | \      |            |---------------------------|  
                                    E  -  T    int
                                    |     |
                                    T    int
                                    |
                                   int
  • A complete parse tree is built first.
  • Then, the tree is traversed to generate the target code.

Abstract Syntax Tree

                                            E                                       BinExp(+) 
                                        /   |   \                                  /         \
3 - 2 + 1  == syntax check ==>         E    +   T     == abstraction ==>      BinExp(-)      Int(1)
                                     / | \      |                               /   \
                                    E  -  T    int                          Int(3)  Int(2)
                                    |     |
                                    T    int
                                    |
                                   int

How to Build an AST?

E : E + T    { BinExp('+', $1, $3) }
  | E - T    { BinExp('-', $1, $3) }
  | T        { $1 }
T : int      { Int($1) }                                                                                     
Input: 3 - 2 + 1
       ↑ 

  T: N1                                    N1: Int(3)
  |
 int                                                                                                    

How to Build an AST?

E : E + T    { BinExp('+', $1, $3) }
  | E - T    { BinExp('-', $1, $3) }
  | T        { $1 }
T : int      { Int($1) }                                                                                     
Input: 3 - 2 + 1
       ↑ 

  E: N1
  |
  T: N1                                    N1: Int(3)
  |
 int                                                                                                    

How to Build an AST?

E : E + T    { BinExp('+', $1, $3) }
  | E - T    { BinExp('-', $1, $3) }
  | T        { $1 }
T : int      { Int($1) }                                                                                     
Input: 3 - 2 + 1
           ↑ 

       E: N3                                         N3: BinExp(-)
   /   |    \                                          /         \
  E    |    T: N2                               N1: Int(3)     N2: Int(2)
  |    |    |
  T    |    |                                  
  |    |    |
 int   -   int                                                                                                    

How to Build an AST?

E : E + T    { BinExp('+', $1, $3) }
  | E - T    { BinExp('-', $1, $3) }
  | T        { $1 }
T : int      { Int($1) }                                                                                     
Input: 3 - 2 + 1
               ↑ 

                 E: N5                                     N5: BinExp(+)
            /    |   \                                       /           \
       E: N3     |   T: N4                           N3: BinExp(-)     N4: Int(1)
   /   |    \    |   |                                   /         \
  E    |    T    |   |                           N1: Int(3)       N2: Int(2)
  |    |    |    |   |
  T    |    |    |   |                          
  |    |    |    |   |
 int   -   int   +  int                                                                                                       

How to Build an AST with the Top-Down Parsing?

E  : T E'     { $0.node := $2.syn; $2.inh := $1.node } // $0: E, $1: T, $2: E'
E' : + T E'   { $3.inh := new BinExp('+', $0.inh, $2.node); $0.syn := $3.syn }
   | - T E'   { $3.inh := new BinExp('-', $0.inh, $2.node); $0.syn := $3.syn }
   | ε        { $0.syn := $0.inh }
T : int       { $0.node := new Int($1) }                                                                                     
Input: 3 - 2 + 1
       ↑

E → T E' → int E'                        


             E       
         /       \
        T: N1    E'                 N1: Int(3)
        |
       int                                                                                                                          

How to Build an AST with the Top-Down Parsing?

E  : T E'     { $0.node := $2.syn; $2.inh := $1.node } // $0: E, $1: T, $2: E'
E' : + T E'   { $3.inh := new BinExp('+', $0.inh, $2.node); $0.syn := $3.syn }
   | - T E'   { $3.inh := new BinExp('-', $0.inh, $2.node); $0.syn := $3.syn }
   | ε        { $0.syn := $0.inh }
T : int       { $0.node := new Int($1) }                                                                                     
Input: 3 - 2 + 1
           ↑

E → T E' → int E' → int - T E' → int - int E'


          ___ E ___       
         /         \
        T: N1       E'______                 
        |         / |       \
       int       -  T: N2    E'        N1: Int(3)   N2: Int(2)                                                            
                    |
                   int
                                                               

How to Build an AST with the Top-Down Parsing?

E  : T E'     { $0.node := $2.syn; $2.inh := $1.node } // $0: E, $1: T, $2: E'
E' : + T E'   { $3.inh := new BinExp('+', $0.inh, $2.node); $0.syn := $3.syn }
   | - T E'   { $3.inh := new BinExp('-', $0.inh, $2.node); $0.syn := $3.syn }
   | ε        { $0.syn := $0.inh }
T : int       { $0.node := new Int($1) }                                                                                     
Input: 3 - 2 + 1
               ↑

E → T E' → int E' → int - T E' → int - int E' →  int - int + int


          ___ E ___       
         /         \
        T: N1       E'______                 
        |         / |       \
       int       -  T: N2    E'_______                                                          
                    |       / |       \
                   int     +  T: N3    E'
                              |        |                   N1: Int(3)   N2: Int(2)  N3: Int(1)                               
                             int       ε                          

How to Build an AST with the Top-Down Parsing?

E  : T E'     { $0.node := $2.syn; $2.inh := $1.node } // $0: E, $1: T, $2: E'
E' : + T E'   { $3.inh := new BinExp('+', $0.inh, $2.node); $0.syn := $3.syn }
   | - T E'   { $3.inh := new BinExp('-', $0.inh, $2.node); $0.syn := $3.syn }
   | ε        { $0.syn := $0.inh }
T : int       { $0.node := new Int($1) }                                                                                     
Input: 3 - 2 + 1
               ↑

E → T E' → int E' → int - T E' → int - int E' →  int - int + int

               _____________________________________________________________________
              |                                                                    |
          ___ E ___                                                                |
         /         \                                                               |
        T: N1       E1'_____                                                       |
        |         / |       \                                              N4: BinExp('+', E2'.inh, N3)    
       int       -  T: N2    E2'______                                          /                 \       
                    |       / |       \                       N5: BinExp('-', E1'.inh, N2)     N3: Int(1)                   
                   int     +  T: N3    E3'                          /       \                
                              |        |                   N1: Int(3)    N2: Int(2)                                         
                             int       ε                          

E3' : ε              E3'.syn = E3'.inh
E2' : + T E3'        E3'.inh = N4, E2'.syn = E3'.syn
E1' : - T E2'        E2'.inh = N5, E1'.syn = E2'.syn
  E : T E1'          E.node = E1'.syn, E1'inh = N1


E.node = E1'.syn = E2'.syn = E3'.syn = E3'.inh = N4