CSE411

CSE411: Introduction to Compilers

Overview

Jooyong Yi (UNIST)

2023 Spring
CSE411

Interpreter vs Compiler

How is a compiler different from an interpreter in terms of the input-output interface?

2023 Spring
CSE411

Usage of an Interpreter

Consider the following Python program.

# hello-name.py
import sys 

print(f"Hello {sys.argv[1]}") # sys.argv[1]: the first command-line argument
2023 Spring
CSE411

Usage of an Interpreter

Consider the following Python program.

# hello-name.py
import sys 

print(f"Hello {sys.argv[1]}") # sys.argv[1]: the first command-line argument

We typically run a Python program as follows:

> python hello-name.py Uni
Hello Uni
2023 Spring
CSE411

Usage of a Compiler

Consider the following C program.

// hello-name.c
#include <stdio.h>

int main(int argc, char** argv) {
    printf("Hello %s\n", argv[1]);
    return 0;
}
2023 Spring
CSE411

Usage of a Compiler

Consider the following C program.

// hello-name.c
#include <stdio.h>

int main(int argc, char** argv) {
    printf("Hello %s\n", argv[1]);
    return 0;
}

We typically run a C program as follows:

> clang hello-name.c -o hello-name
> ./hello-name Uni
Hello Uni
2023 Spring
CSE411

Interpreter vs Compiler

  • Python Interpreter
                    Uni
                     ⬇
hello-name.py ➡ [ python ]
                     ⬇
                  Hello Uni   
  • Clang Compiler
                             Uni
                              ⬇
hello-name.c ➡ [ clang ] ➡ hello-name
                              ⬇
                          Hello Uni   
2023 Spring
CSE411

Advantages and Disadvantages of Compilers

Which one is faster between a compiler an an interpreter and why?

2023 Spring
CSE411

What a typical interpreter looks like

Consider a very simple programming language the performs basic arithmetic computation.

Examples:

(2 4 add) returns 6
(5 2 sub) returns 3
2023 Spring
CSE411

What a typical interpreter looks like

A textbook interpreter would look like the following:

def execute(cmd_seq, stack):
    cmd = cmd_seq[0]
    if isinstance(cmd, int):
        stack.insert(0, cmd)
        cmd_seq.pop(0)
    elif cmd in 'add sub'.split():
        if not len(stack) >= 2:
            raise Irreducible
        n1 = stack.pop(0)
          if not isinstance(n1, int):
            raise Irreducible
        n2 = stack.pop(0)
        ans = int(operator.__dict__[cmd](n2, n1))
2023 Spring
CSE411

An Interpreter is Slow Because ...

2023 Spring
CSE411

An Interpreter is Slow Because ...

  1. Computation is performed indirectly using software instructions (e.g., the Python arithmetic functions)
|------------------------|
|       Program          |
|------------------------|
|      Interpreter       |
|------------------------|
|       Hardware         |
|------------------------|
2023 Spring
CSE411

An Interpreter is Slow Because ...

  1. An interpreter typically handles each line of input one by one, it loses an opportunity for code optimization.
def foo():
    return 1234 * 1234

print(foo() + foo())
2023 Spring
CSE411

Compiled Code is Typically Fast Because ...

  1. Compiled Code (target program) is run directly over hardware using hardware instructions.
  2. A compiler typically performs code optimization before generating code written in the target language.
2023 Spring
CSE411

What is a Disadvantage of a Compiler?

2023 Spring
CSE411

What is a Disadvantage of a Compiler?

  • Compiled code typically runs only for the target architecture.
2023 Spring
CSE411

An Interpreter is Typically Portable

  • Your source code can be distributed for multiple platforms.
  • Portability is provided by the virtual machine.
|-----------------------------------------|
|       Program                           |
|-----------------------------------------|
|      Interpreter  (Virtual Machine)     |
|-----------------------------------------|
|       Hardware                          |
|-----------------------------------------|
2023 Spring
CSE411

Taking the Best of the Both Worlds

  • Source program is compiled (translated) into an intermediate representation (e.g., Java/Python Bytecode)
  • Part of an intermediate program (e.g., Java methods that are called frequently) is compiled at runtime. This techinque is called Just-In-Time Compilation (JIT).
                  |-----------------------------------------|
Source Program →  |       Intermediate Representation (IR)  |
                  |-------------------------------⬇--------|
                  |      Virtual Machine  | Compiled Code   |
                  |-----------------------------------------|
                  |       Hardware                          |
                  |-----------------------------------------|
2023 Spring
CSE411

High-Level Steps of Compilation

                 |-----------|        |----------|
Source Program → | Front End | → IR → | Back End | → Target Program
                 |-----------|        |----------|
2023 Spring
CSE411

Front End

         |-------|            |--------|           |----------|        |-----------|
Source → | Lexer | → tokens → | Parser | → parse → | Semantic |→ AST → |   IR      | → IR
Program  |       |            |        |   tree    | Analyzer |        | Generator |
         |-------|            |--------|           |----------|        |-----------|
  • Parser performs syntactic analysis, i.e., checks if the program respects the language grammar.
  • Semantic analyzer detects semantic errors (e.g., type errors).
  • AST stands for Abstract Syntax Tree.
2023 Spring
CSE411

Lexer (Lexical Analyzer)

  • Given a source program, the lexer extracts the sequence of tokens.

2023 Spring
CSE411

Parser (Syntax Analyzer)

  • Given a sequence of tokens, the parser extracts the parse tree.

2023 Spring
CSE411

Semantic Analyzer

  • Given a parse tree, the semantic analyzer detects semantic errors such as type errors.
  • The semantic analyzer can also modify the parse tree. In the example, the inttofloat operator is added to convert an integer 60 to the floating-point number.

2023 Spring
CSE411

IR Generator

2023 Spring
CSE411

Back End

  • Typically, multiple code optimization takes place.
     |-------------|         |-------------|          |-------------|
IR → |    Code     | → IR' → |    Code     | → IR'' → |    Code     | → Target
     | Optimizer 1 |         | Optimizer 2 |          |  Generator  |   Program
     |-------------|         |-------------|          |-------------| 
2023 Spring
CSE411

Code Optimization

2023 Spring
CSE411

Code Generation

2023 Spring