A Bug’s Life: From Its Detection through Patching to Verification

Jooyong Yi

LOFT (Lab of Software), UNIST

Contents

  • Part 1: Bug Hunting
  • Part 2: Patch Hunting
  • Part 3: Patch Verification

Part 1: Bug Hunting

White, Grey, and Black-box Fuzzing

Grey-box Fuzzing is Not Enough

Our New Approach

Key Intuition

  • Fuzzing generates numerous inputs.
  • We can infer approximate path conditions from these inputs.
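To illustrate the intuition, observed fuzzing inputs can be treated as labeled samples, and a path condition can be approximated by keeping only the candidate predicates that agree with every sample. The following is a minimal C++ sketch of that idea; the names (`Observation`, `consistentPredicates`) and the predicate templates are mine, not taken from the actual tool:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one fuzzing input: the tensor ranks it used,
// and whether it actually took the branch of interest.
struct Observation {
    int i_rank;   // input.dim()
    int w_rank;   // weight.dim()
    bool taken;   // did this input pass the check?
};

using Predicate = std::function<bool(const Observation&)>;

// Keep only the candidate predicates consistent with every observation:
// a kept predicate holds on all branch-taking inputs and fails on the rest.
std::vector<std::string> consistentPredicates(
        const std::vector<std::pair<std::string, Predicate>>& candidates,
        const std::vector<Observation>& obs) {
    std::vector<std::string> kept;
    for (const auto& [name, pred] : candidates) {
        bool ok = true;
        for (const auto& o : obs)
            if (pred(o) != o.taken) { ok = false; break; }
        if (ok) kept.push_back(name);
    }
    return kept;
}
```

The more inputs fuzzing generates, the more candidate predicates get filtered out, which is why an approximate PC can be inferred from the fuzzing corpus alone.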

Is an Approximate PC Useful?

void conv2d(Tensor input, Tensor weight, int* padding, int* dilation) {
  TORCH_CHECK(input.dim() == 4);
  TORCH_CHECK(input.dim() == weight.dim());
  bool kh_correct = input.size(2) + 2*padding[0] >= dilation[0] * (weight.size(2) - 1) + 1;
  bool kw_correct = input.size(3) + 2*padding[1] >= dilation[1] * (weight.size(3) - 1) + 1;
  if (kh_correct && kw_correct) {
→   compute_conv2d(input, weight, padding, dilation);
    ...
| Exact PC | Approximate PC |
| --- | --- |
| i_rank == 4 ∧ i_rank == w_rank | i_rank == 4 ∧ w_rank == 4 |
| ∧ i2 + 2*p0 >= d0 * (w2 - 1) + 1 | ∧ True |
| ∧ i3 + 2*p1 >= d1 * (w3 - 1) + 1 | ∧ i3 >= w3 |

Path Exploration Using Approximate PCs

Preview of the Results

(Coverage plots comparing against white-, grey-, and black-box baselines)

How to Infer Approximate PCs?

void conv2d(Tensor input, Tensor weight, int* padding, int* dilation) {
  TORCH_CHECK(input.dim() == 4); // b1
    ...

What If the Inferred PC Is Incorrect?

void conv2d(Tensor input, Tensor weight, int* padding, int* dilation) {
  TORCH_CHECK(input.dim() == 4); // b1
    ...

Counter-Example-Guided Condition Refinement

void conv2d(Tensor input, Tensor weight, int* padding, int* dilation) {
  TORCH_CHECK(input.dim() == 4); // b1
    ...

Counter-Example-Guided Condition Refinement

  • As the exploration proceeds, the path conditions become more precise.
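A counter-example-guided loop of this kind can be sketched as follows: keep the current hypothesis until an executed input contradicts it, then fall back to the next candidate condition that explains all observations so far. This is a CEGIS-style toy in C++, with a hypothetical hypothesis list; it is not the tool's actual synthesis procedure:

```cpp
#include <cstddef>
#include <functional>
#include <tuple>
#include <vector>

// A candidate condition over (i_rank, w_rank), predicting the branch outcome.
using Cond = std::function<bool(int, int)>;

struct Refiner {
    std::vector<Cond> candidates;                    // hypothesis space, coarse to precise
    std::vector<std::tuple<int, int, bool>> obs;     // (i_rank, w_rank, actual outcome)
    std::size_t cur = 0;                             // index of the current hypothesis

    // Record one executed input; skip past hypotheses it refutes.
    void observe(int i, int w, bool actual) {
        obs.emplace_back(i, w, actual);
        while (cur < candidates.size()) {
            bool ok = true;
            for (auto& [oi, ow, oa] : obs)
                if (candidates[cur](oi, ow) != oa) { ok = false; break; }
            if (ok) break;
            ++cur;  // counterexample found: refine to the next hypothesis
        }
    }

    bool predict(int i, int w) const {
        return cur < candidates.size() && candidates[cur](i, w);
    }
};
```

Each counterexample can only move the index forward, which matches the slide's point: as exploration proceeds, the inferred condition becomes monotonically more precise.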

Tensorflow Results

Bug Finding Results

| DL Library | Total | Confirmed | Fixed |
| --- | --- | --- | --- |
| PyTorch | 43 | 41 | 23 |
| TensorFlow | 18 | 18 | 9 |
| Total | 61 | 59 | 32 |

Why DL Libraries?

  • Our approach is general.
  • Input space of DL libraries is manageable.

Part 2: Patch Hunting

Automated Program Repair from Fuzzing Perspective

  • ISSTA 2023
    • YoungJae Kim, Seungheon Han, Askar Yeltayuly Khamit, Jooyong Yi

Fuzzing

process of searching for interesting inputs

  • The location of a bug-revealing input is unknown a priori.
  • As more interesting inputs are found, some of them may reveal a new bug.

Fuzzing

process of following footprints of interesting inputs in pursuit of a bug-revealing input

APR

process of searching for plausible patches

  • The location of a correct patch is unknown a priori.
  • As more plausible patches are found, some of them may be correct.

APR

process of following footprints of plausible patches in pursuit of a correct patch

Standard Patch Scheduling Algorithm

Evaluation Results of Our Approach

Patch Space

Multi-Armed Bandit Problem

  • At each layer, we need to choose one "arm" to pull.
  • A reward is given if an "interesting" patch is found.
    • A patch is considered interesting when the program patched with it passes one of the tests that previously failed.
  • Our goal is to maximize the total reward over time.

Bernoulli Bandit Problem

Thompson Sampling Algorithm

  1. Sampling: for each arm $k$, sample $\theta_k$ from $Beta(\alpha_k, \beta_k)$
  2. Selection: select the arm with the highest sampled $\theta_k$
  3. Update: update $(\alpha_k, \beta_k)$ of the selected arm
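The three steps can be sketched as a generic Bernoulli Thompson sampler in C++. This is my own minimal sketch, not the tool's implementation; a $Beta(a, b)$ draw is obtained from two Gamma draws, since the standard library has no Beta distribution:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Thompson sampling over Bernoulli arms with Beta(alpha_k, beta_k) posteriors.
struct ThompsonSampler {
    std::vector<double> alpha, beta;
    std::mt19937 rng{12345};

    explicit ThompsonSampler(std::size_t arms)
        : alpha(arms, 1.0), beta(arms, 1.0) {}  // uniform prior Beta(1, 1)

    // Beta(a, b) = X / (X + Y) with X ~ Gamma(a, 1), Y ~ Gamma(b, 1).
    double sampleBeta(double a, double b) {
        std::gamma_distribution<double> ga(a, 1.0), gb(b, 1.0);
        double x = ga(rng), y = gb(rng);
        return x / (x + y);
    }

    // Steps 1-2: sample theta_k for each arm, select the argmax.
    std::size_t select() {
        std::size_t best = 0;
        double bestTheta = -1.0;
        for (std::size_t k = 0; k < alpha.size(); ++k) {
            double t = sampleBeta(alpha[k], beta[k]);
            if (t > bestTheta) { bestTheta = t; best = k; }
        }
        return best;
    }

    // Step 3: reward 1 if the pulled arm yielded an interesting patch, else 0.
    void update(std::size_t k, bool reward) {
        (reward ? alpha : beta)[k] += 1.0;
    }
};
```

Because selection is by sampling rather than by posterior mean, a currently worse-looking arm still gets pulled occasionally, which is the exploration behavior the next slides rely on.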

Updating

Does Our Approach Fix More Bugs Correctly?

  1. Catch plausible patches
  2. Rank them using a patch ranking technique

Results on Recalling Correct Patches

(Charts: recall at Top 1 and Top 5)

Reflection

  • Update the distribution when an interesting patch is found: a black-box approach
  • Can we invent a grey-box approach that performs better than the black-box approach?

Enhancing the Efficiency of Automated Program Repair via Greybox Analysis

  • ASE 2024
    • YoungJae Kim, Yechan Park, Seungheon Han, Jooyong Yi

Two Key Questions

  1. What to observe?
  2. How to guide the search based on the observation?

What to Observe?

  • Critical branch: a branch whose hit count changes before and after an interesting patch is applied
    • Positive critical branch: a critical branch whose hit count increases
    • Negative critical branch: a critical branch whose hit count decreases
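Under this definition, critical branches can be computed by diffing per-branch hit counts collected before and after applying a patch. A minimal C++ sketch (the `classify` helper and branch-id strings are mine, for illustration):

```cpp
#include <map>
#include <string>
#include <vector>

// Per-branch hit counts collected from one test execution.
using HitCounts = std::map<std::string, long>;

struct CriticalBranches {
    std::vector<std::string> positive;  // hit count increased after the patch
    std::vector<std::string> negative;  // hit count decreased after the patch
};

// A branch is critical iff its hit count changes across the patch.
CriticalBranches classify(const HitCounts& before, const HitCounts& after) {
    CriticalBranches cb;
    for (const auto& [branch, n] : after) {
        long old = before.count(branch) ? before.at(branch) : 0;
        if (n > old) cb.positive.push_back(branch);
        else if (n < old) cb.negative.push_back(branch);
    }
    // Branches no longer hit at all are also negative critical branches.
    for (const auto& [branch, n] : before)
        if (!after.count(branch) && n > 0) cb.negative.push_back(branch);
    return cb;
}
```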

How to Guide the Search?

  • We again traverse the patch-space tree using the multi-armed bandit model.
  • We choose an edge that is more likely to lead to a patch candidate that behaves similarly to the interesting patches found earlier during the repair process.

Count-based Similarity of Patch Behavior

  • Suppose that an interesting patch $p$ is found earlier during the repair process.
  • Assume $p$ involves a positive critical branch $b$.
  • A patch $p'$ is considered similar to the interesting patch $p$ if
    • the hit count of $b$ increases after applying $p'$

Our Greybox Guidance Policy

  • We choose an edge that is more likely to lead to a patch candidate that shows count-based similarity to the interesting patches found earlier during the repair process.

Blackbox vs Greybox

Evaluation (Defects4J v1.2; 10 repetitions)

Results on Recalling Correct Patches

(Charts: recall at Top 1 and Top 5)

Part 3: Patch Verification

Cast a Wide Net

APR

process of searching for plausible patches
+
select a correct patch

SymRadar: PoC-Centered Bounded Verification for Vulnerability Repair

  • ICSE 2026
    • Seungheon Han, YoungJae Kim, Yeseung Lee, Jooyong Yi

Automated Vulnerability Repair

Our Approach

  1. Bounded verification via symbolic execution
  2. Function-level verification to avoid the reachability problem
  3. PoC-centered verification

Under-Constrained Symbolic Execution

struct node {
  int data;
  struct node* next;
};

int listSum(struct node* n) {
  int sum = 0;
  while (n) {
    sum += n->data;
    n = n->next;
  }
  return sum;
}
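To make the "bounded" part concrete, here is a toy bounded-verification harness for `listSum` (my sketch, not the paper's machinery): it enumerates every list of length up to a bound, with data drawn from a small domain, and checks the function's result against a reference sum on each of them.

```cpp
#include <vector>

struct node {
    int data;
    struct node* next;
};

int listSum(struct node* n) {
    int sum = 0;
    while (n) { sum += n->data; n = n->next; }
    return sum;
}

// Exhaustively check listSum on all lists of length <= maxLen whose
// data values come from `domain`; returns false on the first mismatch.
bool boundedCheck(int maxLen, const std::vector<int>& domain) {
    for (int len = 0; len <= maxLen; ++len) {
        std::vector<int> idx(len, 0);  // odometer over domain^len
        while (true) {
            std::vector<node> nodes(len);
            int expected = 0;
            for (int i = 0; i < len; ++i) {
                nodes[i].data = domain[idx[i]];
                nodes[i].next = (i + 1 < len) ? &nodes[i + 1] : nullptr;
                expected += nodes[i].data;
            }
            if (listSum(len ? &nodes[0] : nullptr) != expected) return false;
            // advance the odometer; stop when every position has wrapped
            int i = 0;
            for (; i < len; ++i) {
                if (++idx[i] < (int)domain.size()) break;
                idx[i] = 0;
            }
            if (i == len) break;
        }
    }
    return true;
}
```

Symbolic execution plays the same role as this enumeration but covers all integer data values at once; the bound on list length is what makes the verification "bounded" in both cases.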

Under-Constrained Symbolic Execution

Limitation of UC-SE

int listSum(struct node* n) {
  int sum = 0;
  int i = 1; // added
  while (n) {
    sum += n->data;
    n = n->next;
    i *= 2; // added
  }
  g = arr[i]; // added
  return sum;
}

PoC-Centered Bounded Patch Verification

UC-SE vs. SymRadar

Patch Verification

Patch Classification Rubric

Key Requirements for Patch Verification

  1. Detecting as many incorrect plausible patches as possible (high specificity)
  2. Preserving as many correct patches as possible (high recall)
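For reference, the balanced accuracy reported in the tables that follow is, by the standard definition, the arithmetic mean of the two requirements:

$$\text{Balanced Accuracy} = \frac{\text{Recall} + \text{Specificity}}{2}$$

e.g., for SymRadar, $(100\% + 78\%)/2 = 89\%$.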

Evaluation (3,3037 patches generated from CPR)

| Tool | Recall | Specificity | Balanced Accuracy |
| --- | --- | --- | --- |
| SymRadar (Ours) | 100% | 78% | 89% |
| CPR | 96% | 8% | 52% |
| UC-KLEE | 88% | 57% | 73% |
| Spider | 77% | 59% | 67% |
| VulnFix | 62% | 66% | 64% |

Evaluation (90 patches generated from San2Patch)

| Tool | Recall | Specificity | Balanced Accuracy |
| --- | --- | --- | --- |
| SymRadar (Ours) | 100% | 74% | 87% |
| UC-KLEE | 100% | 52% | 76% |
| Spider | 48% | 83% | 65% |
| VulnFix | 63% | 28% | 45% |

![width:800px](./img/debugging-ladybug.jpg)

---

![height:700px](./img/bugs_life_lady_bug.webp)

---

Using black-box fuzzing is like firing a machine gun while blindfolded.

White-box fuzzing is like using a sniper rifle. Each shot is slow, but it hits a new path every time.

Let me first show you a snippet of the results. We applied our light-grey-box fuzzer, named PathFinder, to a well-known deep learning library, PyTorch. These plots show how branch coverage increases over time. Clearly, our approach overwhelmingly outperforms the existing SOTA tools that use various approaches.

Moreover, as the exploration proceeds, the path conditions can become more precise since more data points for synthesis are available.

# Bug Hunting and Patch Hunting

![width:1000px](./img/hunting.png)

---

Our situation can be modeled specifically as a Bernoulli bandit problem. We need to estimate the probability of success of each arm, and each arm can have a different probability of success.

The Bernoulli bandit problem can be solved by the Thompson sampling algorithm, which works in the following three steps. First, for each arm $k$, we sample $\theta_k$ from its distribution. Let's say we are about to choose between method 1 and method 2. Assume that the left arm is associated with this Beta distribution and the right arm with that Beta distribution. It is likely that a higher value is sampled from the right arm, in which case we choose the right arm. However, note that Thompson sampling still allows the left arm to be chosen, with a smaller probability.

- Distribution of $\theta_k$: Beta distribution $(\alpha_k, \beta_k)$

| $Beta(\alpha=2,\beta=2)$ | $Beta(\alpha=3,\beta=2)$ | $Beta(\alpha=5,\beta=2)$ |
| --- | --- | --- |
| ![width:290px](./img/beta-2-2.png) | ![width:290px](./img/beta-3-2.png) | ![width:290px](./img/beta-5-2.png) |

What if we find an interesting patch? Then, we update the distributions of the corresponding edges. For example, if this was the distribution of this edge before the update, the one on the right-hand side shows the distribution after the update. Notice that the distribution after the update is more left-skewed, indicating that selecting this edge looks more promising than before.

Then the natural question that arises is: Can we invent a grey-box approach that performs better than the black-box approach?

---

# Blackbox Guidance Policy

- While traversing the patch-space tree, this policy gives higher priority to edges that are more likely to lead to the discovery of interesting patches.
- Note that the only runtime information used in this policy is whether a test passes or fails after applying a patch.

Each edge is associated with critical branches. These critical branches are obtained after executing interesting patches observed in the corresponding subtree. Unlike in the black-box approach, we assign a beta distribution to each critical branch.

# PoC-Centered Bounded Patch Verification

1. Concrete Snapshot Extraction
2. Abstract Snapshot Construction
3. Patch Verification

---

# Abstract Snapshot Construction

![width:1000px](./img/abstraction-example.png)

---