MASTER Assignment
Hybrid Binary Function Similarity via IR, Symbolic Execution and Locality-Sensitive Hashing
Type : Master M-CS
Period: October 2025 - March 2026
Student: Molenaar, K. (Koen, Student M-CS)
Date Final project: March 19, 2026
Supervisors:
Abstract:
This research explores a hybrid approach to function-level binary code similarity analysis (BCSA). Existing techniques often rely exclusively on either static or dynamic analysis, each with its own strengths and weaknesses. This research proposes a hybrid workflow that combines the two. The static analysis lifts binary functions from assembly to LLVM IR and collects n-grams of the normalized IR. The dynamic analysis collects semantic runtime information via function-level symbolic execution. Locality-sensitive hashing (LSH) is then applied to the collected features to produce static and dynamic signatures. The approach is evaluated on function-level similarity calculation across different architectures, compilers, and optimization settings. Additionally, the fusion of the static and dynamic similarity results is evaluated. Lastly, the design and implementation decisions needed to integrate LSH into a hybrid BCSA approach are described.
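
To make the static side of the workflow concrete, the sketch below shows one way n-grams of normalized IR tokens could be turned into a compact signature and compared. It is an illustration only, not the project's implementation: the abstract does not say which LSH family is used, so MinHash is assumed here, and the token sequences, the n-gram length, the number of hash functions, and all function names are hypothetical.

```python
import hashlib
from typing import List, Set, Tuple

Gram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int = 3) -> Set[Gram]:
    """Return the set of contiguous n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def _hash(seed: int, gram: Gram) -> int:
    """Seeded 64-bit hash of one n-gram (one 'hash function' per seed)."""
    data = f"{seed}:" + "\x1f".join(gram)
    digest = hashlib.blake2b(data.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(grams: Set[Gram], num_hashes: int = 64) -> List[int]:
    """MinHash signature: the minimum hash value per seeded hash function."""
    if not grams:
        raise ValueError("cannot build a signature from an empty n-gram set")
    return [min(_hash(seed, g) for g in grams) for seed in range(num_hashes)]


def estimated_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching positions, which estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical normalized IR opcode sequences for two builds of one function
# and for one unrelated function (assumed data, not taken from the project).
func_a = ["load", "load", "add", "icmp", "br", "store", "ret"]
func_b = ["load", "load", "add", "icmp", "br", "store", "store", "ret"]
func_c = ["call", "mul", "sub", "xor", "shl", "ret"]

sig_a = minhash_signature(ngrams(func_a))
sig_b = minhash_signature(ngrams(func_b))
sig_c = minhash_signature(ngrams(func_c))

print("a vs b:", estimated_similarity(sig_a, sig_b))
print("a vs c:", estimated_similarity(sig_a, sig_c))
```

Because signatures have a fixed length regardless of function size, they can be compared or bucketed cheaply. A signature for the dynamic side could in principle be built the same way from features gathered during symbolic execution, but how those features are encoded is not specified in the abstract.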

