“We would accomplish many more things if we did not think of them as impossible.”

— Vince Lombardi

I think I should practice to be more “formal” for posts under Academia.

Prologue

Although having quite a lot limitations which makes it practically not feasible, symbolic execution still stands as an important approach in program analysis. Instead of running the actual program, it simulates the execution with program semantics and symbolic states, and enumerates program paths with constraints that enable them. Therefore, it is very useful to formally verify if certain properties hold during the program execution.

Symbolic execution proceeds by identifying possible program paths, and the constraints required to be satisfied to exercise the path. The constraint is expressed in first order logic formulas over the input, which is then solved by SMT (Satisfiability Modulo Theory) solvers to decide their satisfiability.

The drawback of symbolic execution comes from the undecidable nature of the program. As we know, there are three typical structures of a program: sequence, branch and loop, if goto is excluded. Every time symbolic execution encounters a branch, it split the path to cover both cases. And for loops, symbolic execution unrolls it with the iteration specified by the loop condition, or with a fixed or heuristic iteration if the loop is unbounded. As a result, when the program becomes complex, there will be exponentially increasing paths, making it impossible to maintain and solve the constraints.

Nevertheless, symbolic execution still get some attention with its ability to provide and solve path constraints. Started by Driller (Stephens et al., 2016), many fuzzing techniques integrate symbolic execution (or its concolic sibling) to enhance the fuzzer by solving constraints that is hard to met by random fuzzing. And with the emergence of Large Language Model (LLM), it is becoming possible for LLM-based agents to simulate or replace symbolic execution to some extent (Meng et al., 2025).

Although we have the reason to discard this approach, let’s give a last glance at it. There are many symbolic execution engines (KLEE, SymCC) and backend constraint solvers (STP, Z3) available. In this post, I will use the combination of KLEE + Z3.


Getting Started

Installation

Alright, everything starts with installation.

KLEE is a symbolic execution engine built against the LLVM tool chain, and reasons the program state over the LLVM IR.

The simplest way to get started with KLEE is via the docker image it provided. You may check the available versions and docker images at its release page. If you just want to get muddy, see this to build it from source. In this case, remember to also build uClibc to get supports for library functions.

Basic usage

Here we can use the example from the documentation of KLEE. To perform symbolic execution, it is often required to write a small wrapper of the target program or function. Here, we use main to wrap the function get_sign which is to be tested.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
#include <klee/klee.h>

int get_sign(int x) {
if (x == 0)
return 0;

if (x < 0)
return -1;
else
return 1;
}

int main() {
int a;
klee_make_symbolic(&a, sizeof(a), "a");
return get_sign(a);
}

Note that, we need to call klee_make_symbol to symbolize the variable we want. Here it is the most generic form, which turns a memory chunk symbolic. You may also use klee_int if you look into klee.h. However, KLEE still treats everything as symbolized bytes.

Since KLEE uses LLVM, you should compile it with clang to produce byte code, i.e. get_sign.bc.

1
clang -emit-llvm -c -g -O0 -Xclang -disable-O0-optnone get_sign.c

If you use KLEE’s docker image, you may not need to specify the include directory. You should probably add -I path/to/include in other cases.

To run symbolic execution on the byte code, simply call klee. You may want to specify a backend if you have multiple engine installed.

1
klee --solver-backend=z3 get_sign.bc

You may then proceed following the documentation to use ktest-tool to view the generated test case or to replay it. And it is always a good choice to read the help information via klee --help.

More options

Then there is something not covered by the documentation. By default KLEE only output path constraint if there is an exception. If you want to view the path constraint discovered all the time, you can add --write-kqueries for Z3 solver.

You can also simulate assertion using klee_assert(0) to crash the program at any location.

If you used library functions, specify --posix-runtime to link with the runtime.

Use --max-time to specify maximum time to explore the input space.


Epilogue

This is merely a simple summary, not something with quality. However, it can be the beginning of “formal” posts, where I should write critical literature review on some topics. ᓚᘏᗢ