CSE411

CSE411: Introduction to Compilers

Overview

Jooyong Yi (UNIST)

2024 Spring
CSE411

Interpreter vs Compiler

How is a compiler different from an interpreter in terms of the input-output interface?

2024 Spring
CSE411

Usage of an Interpreter

Consider the following Python program.

# hello-name.py
import sys 

print(f"Hello {sys.argv[1]}") # sys.argv[1]: the first command-line argument
2024 Spring
CSE411

Usage of an Interpreter

Consider the following Python program.

# hello-name.py
import sys 

print(f"Hello {sys.argv[1]}") # sys.argv[1]: the first command-line argument

We typically run a Python program as follows:

> python hello-name.py Uni
Hello Uni
2024 Spring
CSE411

Usage of a Compiler

Consider the following C program.

// hello-name.c
#include <stdio.h>

int main(int argc, char** argv) {
    printf("Hello %s\n", argv[1]);
    return 0;
}
2024 Spring
CSE411

Usage of a Compiler

Consider the following C program.

// hello-name.c
#include <stdio.h>

int main(int argc, char** argv) {
    printf("Hello %s\n", argv[1]);
    return 0;
}

We typically run a C program as follows:

> clang hello-name.c -o hello-name
> ./hello-name Uni
Hello Uni
2024 Spring
CSE411

Interpreter vs Compiler

  • Python Interpreter
                    Uni
                     ⬇
hello-name.py ➡ [ python ]
                     ⬇
                  Hello Uni   
  • Clang Compiler
                             Uni
                              ⬇
hello-name.c ➡ [ clang ] ➡ hello-name
                              ⬇
                          Hello Uni   
2024 Spring
CSE411

Advantages and Disadvantages of Compilers

Which one is faster between a compiler an an interpreter and why?

2024 Spring
CSE411

What a typical interpreter looks like

Consider a very simple programming language the performs basic arithmetic computation.

Examples:

(2 4 add) returns 6
(5 2 sub) returns 3
2024 Spring
CSE411

What a typical interpreter looks like

A textbook interpreter would look like the following:

def execute(cmd_seq, stack):
    cmd = cmd_seq[0]
    if isinstance(cmd, int):
        stack.insert(0, cmd)
        cmd_seq.pop(0)
    elif cmd in 'add sub'.split():
        if not len(stack) >= 2:
            raise Irreducible
        n1 = stack.pop(0)
          if not isinstance(n1, int):
            raise Irreducible
        n2 = stack.pop(0)
        ans = int(operator.__dict__[cmd](n2, n1))
2024 Spring
CSE411

An Interpreter is Slow Because ...

2024 Spring
CSE411

An Interpreter is Slow Because ...

  1. Computation is performed indirectly using software instructions (e.g., the Python arithmetic functions)
|------------------------|
|       Program          |
|------------------------|
|      Interpreter       |
|------------------------|
|       Hardware         |
|------------------------|
2024 Spring
CSE411

An Interpreter is Slow Because ...

  1. An interpreter typically handles each line of input one by one, it loses an opportunity for code optimization.
def foo():
    return 1234 * 1234

print(foo() + foo())
2024 Spring
CSE411

Compiled Code is Typically Fast Because ...

  1. Compiled Code (target program) is run directly over hardware using hardware instructions.
  2. A compiler typically performs code optimization before generating code written in the target language.
2024 Spring
CSE411

What is a Disadvantage of a Compiler?

2024 Spring
CSE411

What is a Disadvantage of a Compiler?

  • Compiled code typically runs only for the target architecture.
2024 Spring
CSE411

An Interpreter is Typically Portable

  • Your source code can be distributed for multiple platforms.
  • Portability is provided by the virtual machine.
|-----------------------------------------|
|       Program                           |
|-----------------------------------------|
|      Interpreter  (Virtual Machine)     |
|-----------------------------------------|
|       Hardware                          |
|-----------------------------------------|
2024 Spring
CSE411

Taking the Best of the Both Worlds

  • Source program is compiled (translated) into an intermediate representation (e.g., Java/Python Bytecode)
  • Part of an intermediate program (e.g., Java methods that are called frequently) is compiled at runtime. This techinque is called Just-In-Time Compilation (JIT).
                  |-----------------------------------------|
Source Program →  |       Intermediate Representation (IR)  |
                  |-------------------------------⬇--------|
                  |      Virtual Machine  | Compiled Code   |
                  |-----------------------------------------|
                  |       Hardware                          |
                  |-----------------------------------------|
2024 Spring
CSE411

High-Level Steps of Compilation

                 |-----------|        |----------|
Source Program → | Front End | → IR → | Back End | → Target Program
                 |-----------|        |----------|
2024 Spring
CSE411

Front End

         |-------|            |--------|           |----------|        |-----------|
Source → | Lexer | → tokens → | Parser | → parse → | Semantic |→ AST → |   IR      | → IR
Program  |       |            |        |   tree    | Analyzer |        | Generator |
         |-------|            |--------|           |----------|        |-----------|
  • Parser performs syntactic analysis, i.e., checks if the program respects the language grammar.
  • Semantic analyzer detects semantic errors (e.g., type errors).
  • AST stands for Abstract Syntax Tree.
2024 Spring
CSE411

Lexer (Lexical Analyzer)

  • Given a source program, the lexer extracts the sequence of tokens.
        position  =  initial  +  rate   *   60
                           ↓
        |------------------------------------|
        |      Lexical Analyzer (Lexer)      |
        |------------------------------------|
                           ↓
        <id, 1> <=> <id, 2> <+> <id, 3> <*> <int, 60>                     
2024 Spring
CSE411

Parser (Syntax Analyzer)

  • Given a sequence of tokens, the parser extracts the parse tree.
        <id, 1> <=> <id, 2> <+> <id, 3> <*> <int, 60>                     
                           ↓
        |------------------------------------|
        |      Syntax Analyzer (Parser)      |
        |------------------------------------|
                           ↓
                       =
                      / \
                <id,1>   +
                        / \
                   <id,2>  *
                          / \
                    <id,3>   <int,60>           
2024 Spring
CSE411

Semantic Analyzer

  • Given a parse tree, the semantic analyzer detects semantic errors such as type errors.
                       =
                      / \
                <id,1>   +
                        / \
                   <id,2>  *
                          / \
                    <id,3>   <int,60>           
                           ↓                    
        |------------------------------------|
        |  Semantic Analyzer (Type Checking) |
        |------------------------------------|
                           ↓                            
                       =
                      / \
                <id,1>   +
                  int   int
                        / \
                   <id,2>  *
                     int   int
                          / \
                    <id,3>   <int,60>
                     int       int

2024 Spring
CSE411

IR (Intermediate Representation) Generator

  • At this phase, we do not restrict ourselves to a specific hardware architecture.
                       =
                      / \
                <id,1>   +
                  int   int
                        / \
                   <id,2>  *
                     int   int
                          / \
                    <id,3>   <int,60>
                     int       int
                           ↓                            
        |------------------------------------|
        |            IR Generator            |
        |------------------------------------|
                           ↓                            
                       t1 = 60
                       t2 = id3 * t1
                       t3 = id2 + t2
                       id1 = t3
2024 Spring
CSE411

Back End

  • Typically, multiple code optimization takes place.
     |-------------|         |-------------|          |-------------|
IR → |    Code     | → IR' → |    Code     | → IR'' → |    Code     | → Target
     | Optimizer 1 |         | Optimizer 2 |          |  Generator  |   Program
     |-------------|         |-------------|          |-------------| 
2024 Spring
CSE411

Code Optimization

                       t1 = 60
                       t2 = id3 * t1
                       t3 = id2 + t2
                       id1 = t3
                           ↓                            
        |------------------------------------|
        |        Code Optimization           |
        |------------------------------------|
                           ↓   
                      t2 = id3 * 60
                      id1 = id2 + t2                         
2024 Spring
CSE411

Code Generation

                      t2 = id3 * 60
                      id1 = id2 + t2                         
                           ↓                            
        |----------------------------------|
        |        Code Generation           |
        |----------------------------------|
                           ↓   
                     LD  R2, id3
                     MUL R2, R2, 60
                     LD  R1, id2
                     ADD R1, R1, R2
                     ST  id1, R1
2024 Spring

![width:600px](img/overview/lexer.png)

![width:600px](img/overview/parser.png)

![width:400px](img/overview/semantic-analyzer.png)

![width:400px](img/overview/IR-gen.png)

![width:400px](img/overview/code-opt.png)

![width:400px](img/overview/code-gen.png)