View on GitHub

# Recent Papers Related To Fuzzing

Remark: This website is only used for collecting and grouping the related paper. If there are any paper need to be updated, you can contribute PR.

# All Papers (Classification according to Publication)

• Survey/Review
• ASE 2022
• [Effectively Generating Vulnerable Transaction Sequences in Smart Contracts with Reinforcement Learning-guided Fuzzing]
• [Efficient Greybox Fuzzing to Detect Memory Errors]
• [FuzzerAid: Grouping Fuzzed Crashes Based On Fault Signatures]
• [Fuzzle: Making a Puzzle for Fuzzers]
• [Griffin: Grammar-Free DBMS Fuzzing]
• [HTFuzz: Heap Operation Sequence Sensitive Fuzzing]
• [LawBreaker: An Approach for Specifying Traffic Laws and Fuzzing Autonomous Vehicles]
• [QATest: A Uniform Fuzzing Framework for Question Answering Systems]
• [So Many Fuzzers, So Little Time - Experience from Evaluating Fuzzers on the Contiki-NG Network (Hay)StackVirt]
• ISSTA 2022
• [Almost Correct Invariants: Synthesizing Inductive Invariants by Fuzzing Proofs]
• ocTer: Documentation-Guided Fuzzing for Testing Deep Learning API Functions
• [Efficient Greybox Fuzzing of Applications in Linux-based IoT Devices via Enhanced User-mode Emulation]
• [MDPFuzz: Testing Models Solving Markov Decision Processes]
• [PrIntFuzz: Fuzzing Linux Drivers via Automated Virtual Device Simulation]
• [SLIME: Program-sensitive Energy Allocation for Fuzzing]
• [SnapFuzz: High-Throughput Fuzzing of Network Applications]
• [TensileFuzz: Facilitating Seed Input Generation in Fuzzing via String Constraint Solving]
• ESEC/FSE 2022
• S&P 2022
• NDSS 2022
• USENIX SEC 2022
• ICSE 2022
• SANER 2022
• Others 2022
• ASE 2021
• ACSAC 2021
• SOSP 2021
• S&P 2021
• ISSTA 2021
• ESEC/FSE 2021
• CCS 2021
• OOPSLA 2021
• PLDI 2021
• ICSE 2021
• USENIX ATC 2021
• USENIX Security 2021
• NDSS 2021
• ICST 2021
• SANER 2021
• JFLA 2022
• Others 2021
• CCS 2020
• ASE 2020
• ISSTA 2020
• ESEC/FSE 2020
• ACSAC 2020
• PLDI 2020
• USENIX Security 2020
• ICSE 2020
• NDSS 2020
• S&P 2020
• SANER 2020
• ICST 2020
• ASIACCS 2020
• Others 2020
• ACSAC 2019
• OOPSLA 2019
• TSE 2019
• Access 2019
• CCS 2019
• S&P 2019
• USENIX Security 2019
• ASE 2019
• NDSS 2019
• ICSE 2019
• ESEC/FSE 2019
• ISSTA 2019
• PLDI 2019
• ASIACCS 2019
• ICST 2019
• ISSRE 2019
• Other 2019
• S&P 2018
• USENIX Security 2018
• CCS 2018
• NDSS 2018
• ICSE 2018
• ESEC/FSE 2018
• ASE 2018
• ISSTA 2018
• CAV 2018
• ACSAC 2018
• Other 2018
• S&P 2017
• USENIX Security 2017
• CCS 2017
• NDSS 2017
• ESEC/FSE 2017
• ASE 2017
• ISSRE 2017
• Other 2017
• CCS 2016
• NDSS 2016
• PLDI 2016
• S&P 2015
• USENIX Security 2015
• Others

# Survey/Review

### Fuzzing: Challenges and Reflections

Abstract: Fuzzing is a method to discover software bugs and vulnerabilities by automatic test input generation which has found tremendous recent interest in both academia and industry. Fuzzing comes in the form of several techniques. On one hand, we have symbolic execution, which enables a particularly effective approach to fuzzing by systematically enumerating the paths of a program. On the other hand, we have random input generation, which generates large amounts of inputs per second with none or minimal program analysis overhead. In this article, we summarize the open challenges and opportunities for fuzzing and symbolic execution as they emerged in discussions among researchers and practitioners in a Shonan Meeting, and were validated in a subsequent survey. We take a forward-looking view of the software vulnerability discovery technologies and provide concrete directions for future research.

### SoK: The Progress, Challenges, and Perspectives of Directed Greybox Fuzzing

Abstract: Greybox fuzzing has been the most scalable and practical approach to software testing. Most greybox fuzzing tools are coverage guided as code coverage is strongly correlated with bug coverage. However, since most covered codes may not containbugs, blindly extending code coverage is less efficient, especially for corner cases. Unlike coverage-based fuzzers who extend the code coverage in an undirected manner, a directed fuzzer spends most of its time budget on reaching specific target locations (e.g.,the bug-prone zone) without wasting resources stressing unrelated parts. Thus, directed greybox fuzzing is particularly suitable for scenarios such as patch testing, bug reproduction, and special bug hunting. In this paper, we conduct the first in-depth study of directed greybox fuzzing. We investigate 28 state-of-the-artfuzzers (82% are published after 2019) closely related to DGF, which have various directed types and optimization techniques. Based on the feature of DGF, we extract 15 metrics to conducta thorough assessment of the collected tools and systemize the knowledge of this field. Finally, we summarize the challenges and provide perspectives of this field, aiming to facilitate and boost future research on this topic.

### Fuzzing: Hack, Art, and Science (CACM 2020)

Abstract: Fuzzing, or fuzz testing, is the process of finding security vulnerabilities in input-parsing code by repeatedly testing the parser with modified, or fuzzed, inputs.35 Since the early 2000s, fuzzing has become a mainstream practice in assessing software security. Thousands of security vulnerabilities have been found while fuzzing all kinds of software applications for processing documents, images, sounds, videos, network packets, Web pages, among others. These applications must deal with untrusted inputs encoded in complex data formats. For example, the Microsoft Windows operating system supports over 360 file formats and includes millions of lines of code just to handle all of these.

### Survey of Directed Fuzzy Technology

Abstract: The fuzzy testing technology can effectively detect vulnerabilities. Based on Directed Symbolic Execution (DSE) fuzzing and Directed Grey Box Fuzzing (DGF), which can reach the specified target locations and scan the vulnerability quickly and efficiently. This paper introduces the theoretical knowledge of directed fuzzy testing technology, and several state-of-the-art fuzzy testing tools to elaborate the principle, advantages, disadvantages and the prospect of directed fuzzy technology.

### A Review of Machine Learning Applications in Fuzzing

Abstract: Fuzzing has played an important role in improving software development and testing over the course of several decades. Recent research in fuzzing has focused on applications of machine learning (ML), offering useful tools to overcome challenges in the fuzzing process. This review surveys the current research in applying ML to fuzzing. Specifically, this review discusses successful applications of ML to fuzzing, briefly explores challenges encountered, and motivates future research to address fuzzing bottlenecks.

### A systematic review of fuzzing based on machine learning techniques

Abstract: Security vulnerabilities play a vital role in network security system. Fuzzing technology is widely used as a vulnerability discovery technology to reduce damage in advance. However, traditional fuzzing techniques have many challenges, such as how to mutate input seed files, how to increase code coverage, and how to effectively bypass verification. Machine learning technology has been introduced as a new method into fuzzing test to alleviate these challenges. This paper reviews the research progress of using machine learning technology for fuzzing test in recent years, analyzes how machine learning improve the fuzz process and results, and sheds light on future work in fuzzing. Firstly, this paper discusses the reasons why machine learning techniques can be used for fuzzing scenarios and identifies six different stages in which machine learning have been used. Then this paper systematically study the machine learning based fuzzing models from selection of machine learning algorithm, pre-processing methods, datasets, evaluation metrics, and hyperparameters setting. Next, this paper assesses the performance of the machine learning models based on the frequently used evaluation metrics. The results of the evaluation prove that machine learning technology has an acceptable capability of categorize predictive for fuzzing. Finally, the comparison on capability of discovering vulnerabilities between traditional fuzzing tools and machine learning based fuzzing tools is analyzed. The results depict that the introduction of machine learning technology can improve the performance of fuzzing. However, there are still some limitations, such as unbalanced training samples and difficult to extract the characteristics related to vulnerabilities.

### The Art, Science, and Engineering of Fuzzing: A Survey

Abstract: Among the many software testing techniques available today, fuzzing has remained highly popular due to its conceptual simplicity, its low barrier to deployment, and its vast amount of empirical evidence in discovering real-world software vulnerabilities. At a high level, fuzzing refers to a process of repeatedly running a program with generated inputs that may be syntactically or semantically malformed. While researchers and practitioners alike have invested a large and diverse effort towards improving fuzzing in recent years, this surge of work has also made it difficult to gain a comprehensive and coherent view of fuzzing. To help preserve and bring coherence to the vast literature of fuzzing, this paper presents a unified, general-purpose model of fuzzing together with a taxonomy of the current fuzzing literature. We methodically explore the design decisions at every stage of our model fuzzer by surveying the related literature and innovations in the art, science, and engineering that make modern-day fuzzers effective.

### Fuzzing: Art, Science, and Engineering

Abstract: Among the many software vulnerability discovery techniques available today, fuzzing has remained highly popular due to its conceptual simplicity, its low barrier to deployment, and its vast amount of empirical evidence in discovering real-world software vulnerabilities. At a high level, fuzzing refers to a process of repeatedly running a program with generated inputs that may be syntactically or semantically malformed. While researchers and practitioners alike have invested a large and diverse effort towards improving fuzzing in recent years, this surge of work has also made it difficult to gain a comprehensive and coherent view of fuzzing. To help preserve and bring coherence to the vast literature of fuzzing, this paper presents a unified, general-purpose model of fuzzing together with a taxonomy of the current fuzzing literature. We methodically explore the design decisions at every stage of our model fuzzer by surveying the related literature and innovations in the art, science, and engineering that make modern-day fuzzers effective.

### Fuzzing: a survey

Abstract: Security vulnerability is one of the root causes of cyber-security threats. To discover vulnerabilities and fix them in advance, researchers have proposed several techniques, among which fuzzing is the most widely used one. In recent years, fuzzing solutions, like AFL, have made great improvements in vulnerability discovery. This paper presents a summary of the recent advances, analyzes how they improve the fuzzing process, and sheds light on future work in fuzzing. Firstly, we discuss the reason why fuzzing is popular, by comparing different commonly used vulnerability discovery techniques. Then we present an overview of fuzzing solutions, and discuss in detail one of the most popular type of fuzzing, i.e., coverage-based fuzzing. Then we present other techniques that could make fuzzing process smarter and more efficient. Finally, we show some applications of fuzzing, and discuss new trends of fuzzing and potential future directions.

### Fuzzing: State of the art

Abstract: As one of the most popular software testing techniques, fuzzing can find a variety of weaknesses in a program, such as software bugs and vulnerabilities, by generating numerous test inputs. Due to its effectiveness, fuzzing is regarded as a valuable bug hunting method. In this paper, we present an overview of fuzzing that concentrates on its general process, as well as classifications, followed by detailed discussion of the key obstacles and some state-of-the-art technologies which aim to overcome or mitigate these obstacles. We further investigate and classify several widely used fuzzing tools. Our primary goal is to equip the stakeholder with a better understanding of fuzzing and the potential solutions for improving fuzzing methods in the spectrum of software testing and security. To inspire future research, we also predict some future directions with regard to fuzzing.

### Fuzzing: A Survey for Roadmap

Abstract: Fuzz testing (fuzzing) has witnessed its prosperity in detecting security flaws recently. It generates a large number of test cases and monitors the executions for defects. Fuzzing has detected thousands of bugs and vulnerabilities in various applications. Although effective, there lacks systematic analysis of gaps faced by fuzzing. As a technique of defect detection, fuzzing is required to narrow down the gaps between the entire input space and the defect space. Without limitation on the generated inputs, the input space is infinite. However, defects are sparse in an application, which indicates that the defect space is much smaller than the entire input space. Besides, because fuzzing generates numerous test cases to repeatedly examine targets, it requires fuzzing to perform in an automatic manner. Due to the complexity of applications and defects, it is challenging to automatize the execution of diverse applications. In this paper, we systematically review and analyze the gaps as well as their solutions, considering both breadth and depth. This survey can be a roadmap for both beginners and advanced developers to better understand fuzzing.

### Survey of Software Fuzzing Techniques

Abstract: As cybersecurity becomes more than an afterthought and receives the attention it deserves, it is critical to use tools that examine and understand the flaws within programs. One such tool is fuzzing, a testing process which subjects a system or program to a stream of input data, where the goal of the test is to check for exploitable edge cases. This testing is vital, since a hacker may bombard a system with a variety of inputs and scan the system for weaknesses after causing the system to fail. As the security landscape is constantly shifting, there is a need for an up to date review of the state of literature so readers can make informed decisions about the current state of their systems and software. This survey does that by summarizing current state-of-the art fuzzing approaches, classifying these approaches, and highlighting key insights into the current state of research. The paper also identifies current challenges and suggests future research directions in this area.

### A Review of Fuzzing Tools and Methods

Abstract: This paper reviewed some of the most noteworthy academic literature and practical work that has been produced in the field of fuzzing. We first examined how vulnerabilities come to exist in software and how security researchers find them. After a brief overview of common vulnerability types and methods of static analysis, we looked in depth at the field of fuzzing. Competing approaches to fuzzing were examined, from simple random inputs all the way to using genetic algorithms and taint analysis. The importance of measuring code coverage to evaluate the completeness of a fuzzing campaign was examined. Finally, the focus was placed on the fuzz testing of web browsers and the specific tools and techniques related to that.

# Differential Fuzzing

### T-Reqs: HTTP Request Smuggling with Differential Fuzzing (CCS 2021)

Abstract: HTTP Request Smuggling (HRS) is an attack that exploits the HTTP processing discrepancies between two servers deployed in a proxy-origin configuration, allowing attackers to smuggle hidden requests through the proxy. While this idea is not new, HRS is soaring in popularity due to recently revealed novel exploitation techniques and real-life abuse scenarios. In this work, we step back from the highly-specific exploits hogging the spotlight, and present the first work that systematically explores HRS within a scientific framework. We design an experiment infrastructure powered by a novel grammar-based differential fuzzer, test 10 popular server/proxy/CDN technologies in combinations, identify pairs that result in processing discrepancies, and discover exploits that lead to HRS. Our experiment reveals previously unknown ways to manipulate HTTP requests for exploitation, and for the first time documents the server pairs prone to HRS.

### CatchBackdoor: Backdoor Testing by Critical Trojan Neural Path Identification via Differential Fuzzing (2021)

Abstract: The success of deep neural networks (DNNs) in real-world applications has benefited from abundant pre-trained models. However, the backdoored pre-trained models can pose a significant trojan threat to the deployment of downstream DNNs. Existing DNN testing methods are mainly designed to find incorrect corner case behaviors in adversarial settings but fail to discover the backdoors crafted by strong trojan attacks. Observing the trojan network behaviors shows that they are not just reflected by a single compromised neuron as proposed by previous work but attributed to the critical neural paths in the activation intensity and frequency of multiple neurons. This work formulates the DNN backdoor testing and proposes the CatchBackdoor framework. Via differential fuzzing of critical neurons from a small number of benign examples, we identify the trojan paths and particularly the critical ones, and generate backdoor testing examples by simulating the critical neurons in the identified paths. Extensive experiments demonstrate the superiority of CatchBackdoor, with higher detection performance than existing methods. CatchBackdoor works better on detecting backdoors by stealthy blending and adaptive attacks, which existing methods fail to detect. Moreover, our experiments show that CatchBackdoor may reveal the potential backdoors of models in Model Zoo.

### Duo: Differential Fuzzing for Deep Learning Operators (IEEE Transactions on Reliability 2021)

Abstract: Deep learning (DL) libraries reduce the barriers to the DL model construction. In DL libraries, various building blocks are DL operators with different functionality, responsible for processing high-dimensional tensors during training and inference. Thus, the quality of operators could directly impact the quality of models. However, existing DL testing techniques mainly focus on robustness testing of trained neural network models and cannot locate DL operators’ defects. The insufficient test input and undetermined test output in operator testing have become challenging for DL library developers. In this article, we propose an approach, namely Duo, which combines fuzzing techniques and differential testing techniques to generate input and evaluate corresponding output. It implements mutation-based fuzzing to produce tensor inputs by employing nine mutation operators derived from genetic algorithms and differential testing to evaluate outputs’ correctness from multiple operator instances. Duo is implemented in a tool and used to evaluate seven operators from TensorFlow, PyTorch, MNN, and MXNet in an experiment. The result shows that Duo can expose defects of DL operators and realize multidimension evaluation for DL operators from different DL libraries.

### DiFuzzRTL: Differential Fuzz Testing to Find CPU Bug (S&P 2021)

Abstract: DifuzzRTL is a differential fuzz testing approach for CPU verification. We introduce new coverage metric, register-coverage, which comprehensively captures the states of an RTL design and correctly guides the input generation. DifuzzRTL automatically instruments register-coverage, randomly generates and mutates instructions defined in ISA, then cross-check against an ISA simulator to detect bugs.

### DPIFuzz: A Differential Fuzzing Framework to Detect DPI Elusion Strategies for QUIC (ACSAC 2020)

Abstract: QUIC is an emerging transport protocol that has the potential to replace TCP in the near future. As such, QUIC will become an important target for Deep Packet Inspection (DPI). Reliable DPI is essential, e.g., for corporate environments, to monitor traffic entering and leaving their networks. However, elusion strategies threaten the validity of DPI systems, as they allow attackers to carefully design traffic to fool and thus evade on-path DPI systems. While such elusion strategies for TCP are well documented, it is unclear if attackers will be able to elude QUIC-based DPI systems. In this paper, we systematically explore elusion methodologies for QUIC. To this end, we present DPIFuzz: a differential fuzzing framework which can automatically detect strategies to elude stateful DPI systems for QUIC. We use DPIFuzz to generate and mutate QUIC streams in order to compare (and find differences in) the server-side interpretations of five popular open-source QUIC implementations. We show that DPIFuzz successfully reveals DPI elusion strategies, such as using packets with duplicate packet numbers or exploiting the diverging handling of overlapping stream offsets by QUIC implementations. DPIFuzz additionally finds four security-critical vulnerabilities in these QUIC implementations.

### DifFuzz: Differential Fuzzing for Side-Channel Analysis (ICSE 2019)

Abstract: Side-channel attacks allow an adversary to uncover secret program data by observing the behavior of a program with respect to a resource, such as execution time, consumed memory or response size. Side-channel vulnerabilities are difficult to reason about as they involve analyzing the correlations between resource usage over multiple program paths. We present DifFuzz, a fuzzing-based approach for detecting side-channel vulnerabilities related to time and space. DifFuzz automatically detects these vulnerabilities by analyzing two versions of the program and using resource-guided heuristics to find inputs that maximize the difference in resource consumption between secret-dependent paths. The methodology of DifFuzz is general and can be applied to programs written in any language. For this paper, we present an implementation that targets analysis of Java programs, and uses and extends the Kelinci and AFL fuzzers. We evaluate DifFuzz on a large number of Java programs and demonstrate that it can reveal unknown side-channel vulnerabilities in popular applications. We also show that DifFuzz compares favorably against Blazer and Themis, two state-of-the-art analysis tools for finding side-channels in Java programs.

### Deep Differential Testing of JVM Implementations (ICSE 2019)

Abstract: The Java Virtual Machine (JVM) is the cornerstone of the widely-used Java platform. Thus, it is critical to ensure the reliability and robustness of popular JVM implementations. However, little research exists on validating production JVMs. One notable effort is classfuzz, which mutates Java bytecode syntactically to stress-test different JVMs. It is shown that classfuzz mainly produces illegal bytecode files and uncovers defects in JVMs’ startup processes. It remains a challenge to effectively test JVMs’ bytecode verifiers and execution engines to expose deeper bugs.

This paper tackles this challenge by introducing classming, a novel, effective approach to performing deep, differential JVM testing. The key of classming is a technique, live bytecode mutation, to generate, from a seed bytecode file f, likely valid, executable (live) bytecode files: (1) capture the seed f’s live bytecode, the sequence of its executed bytecode instructions; (2) repeatedly manipulate the control- and data-flow in f’s live bytecode to generate semantically different mutants; and (3) selectively accept the generated mutants to steer the mutation process toward live, diverse mutants. The generated mutants are then employed to differentially test JVMs.

We have evaluated classming on mainstream JVM implementations, including OpenJDK’s HotSpot and IBM’s J9, by mutating the DaCapo benchmarks. Our results show that classming is very effective in uncovering deep JVM differences. More than 1,800 of the generated classes exposed JVM differences, and more than 30 triggered JVM crashes. We analyzed and reported the JVM runtime differences and crashes, of which 14 have already been confirmed/fixed, including a highly critical security vulnerability in J9 that allowed untrusted code to disable the security manager and elevate its privileges (CVE-2017-1376).

### Hunting for bugs in code coverage tools via randomized differential testing (ICSE 2019)

Abstract: Reliable code coverage tools are critically important as it is heavily used to facilitate many quality assurance activities, such as software testing, fuzzing, and debugging. However, little attention has been devoted to assessing the reliability of code coverage tools. In this study, we propose a randomized differential testing approach to hunting for bugs in the most widely used C code coverage tools. Specifically, by generating random input programs, our approach seeks for inconsistencies in code coverage reports produced by different code coverage tools, and then identifies inconsistencies as potential code coverage bugs. To effectively report code coverage bugs, we addressed three specific challenges: (1) How to filter out duplicate test programs as many of them triggering the same bugs in code coverage tools; (2) how to automatically reduce large test programs to much smaller ones that have the same properties; and (3) how to determine which code coverage tools have bugs? The extensive evaluations validate the effectiveness of our approach, resulting in 42 and 28 confirmed/fixed bugs for gcov and llvm-cov, respectively. This case study indicates that code coverage tools are not as reliable as it might have been envisaged. It not only demonstrates the effectiveness of our approach, but also highlights the need to continue improving the reliability of code coverage tools. This work opens up a new direction in code coverage validation which calls for more attention in this area.

### Different is Good: Detecting the Use of Uninitialized Variables through Differential Replay (CCS 2019)

Abstract: The use of uninitialized variables is a common issue. It could cause kernel information leak, which defeats the widely deployed security defense, i.e., kernel address space layout randomization (KASLR). Though a recent system called Bochspwn Reloaded reported multiple memory leaks in Windows kernels, how to effectively detect this issue is still largely behind.

In this paper, we propose a new technique, i.e., differential replay, that could effectively detect the use of uninitialized variables. Specifically, it records and replays a program’s execution in multiple instances. One instance is with the vanilla memory, the other one changes (or poisons) values of variables allocated from the stack and the heap. Then it compares program states to find references to uninitialized variables. The idea is that if a variable is properly initialized, it will overwrite the poisoned value and program states in two running instances should be the same. After detecting the differences, our system leverages the symbolic taint analysis to further identify the location where the variable was allocated. This helps us to identify the root cause and facilitate the development of real exploits. We have implemented a prototype called TimePlayer. After applying it to both Windows 7 and Windows 10 kernels (x86/x64), it successfully identified 34 new issues and another 85 ones that had been patched (some of them were publicly unknown.) Among 34 new issues, 17 of them have been confirmed as zero-day vulnerabilities by Microsoft.

### Differential Program Analysis with Fuzzing and Symbolic Execution (ASE 2018)

Abstract: Differential program analysis means to identify the behavioral divergences in one or multiple programs, and it can be classified into two categories: identify the behavioral divergences (1) between two program versions for the same input (aka regression analysis), and (2) for the same program with two different inputs (e.g, side-channel analysis). Most of the existent approaches for both subproblems try to solve it with single techniques, which suffer from its weaknesses like scalability issues or imprecision. This research proposes to combine two very strong techniques, namely fuzzing and symbolic execution to tackle these problems and provide scalable solutions for real-world applications. The proposed approaches will be implemented on top of state-of-the-art tools like AFL and Symbolic PathFinder to evaluate them against existent work.

### NEZHA: Efficient Domain-Independent Differential Testing (S&P 2017)

Abstract: Differential testing uses similar programs as cross-referencing oracles to find semantic bugs that do not exhibit explicit erroneous behaviors like crashes or assertion failures. Unfortunately, existing differential testing tools are domain-specific and inefficient, requiring large numbers of test inputs to find a single bug. In this paper, we address these issues by designing and implementing NEZHA, an efficient input-format-agnostic differential testing framework. The key insight behind NEZHA’s design is that current tools generate inputs by simply borrowing techniques designed for finding crash or memory corruption bugs in individual programs (e.g., maximizing code coverage). By contrast, NEZHA exploits the behavioral asymmetries between multiple test programs to focus on inputs that are more likely to trigger semantic bugs. We introduce the notion of δ-diversity, which summarizes the observed asymmetries between the behaviors of multiple test applications. Based on δ-diversity, we design two efficient domain-independent input generation mechanisms for differential testing, one gray-box and one black-box. We demonstrate that both of these input generation schemes are significantly more efficient than existing tools at finding semantic bugs in real-world, complex software.

### Coverage-Directed Differential Testing of JVM Implementations (PLDI 2016)

Java virtual machine (JVM) is a core technology, whose reliability is critical. Testing JVM implementations requires painstaking effort in designing test classfiles (*.class) along with their test oracles. An alternative is to employ binary fuzzing to differentially test JVMs by blindly mutating seeding classfiles and executing the resulting mutants on different JVMs for revealing inconsistent behaviors. However, this blind approach is not cost effective in practice because (1) most of the mutants are invalid and redundant, and (2) the discovered JVM discrepancies, if any, mostly only concern compatibility issues, rather than actual defects. This paper tackles this challenge by introducing classfuzz, a coverage-directed fuzzing approach that focuses on representative classfiles for differential JVM testing. Our core insight is to (1) mutate seeding classfiles using a set of predefined mutation operators and employ Markov Chain Monte Carlo (MCMC) sampling to guide mutator selection, and (2) execute the mutants on a reference JVM implementation and use coverage uniqueness as a discipline for accepting representative ones. The accepted classfiles are used as inputs to differentially test JVMs and find defects. We have implemented classfuzz and conducted an extensive evaluation of it against existing fuzz testing algorithms. Our evaluation results show that classfuzz can enhance the ratio of discrepancy-triggering classfiles from 1.7% to 11.9%. We have also reported 62 defect-indicative discrepancies, along with the test classfiles, to JVM developers. A number of our reported issues have already been confirmed as JVM defects, and some even match recent clarifications and changes to the JVM specification, Java SE 8 Edition.

# Evaluate Fuzzing

### FIXREVERTER: A Realistic Bug Injection Methodology for Benchmarking Fuzz Testing (USENIX Security2022)

Abstract: Fuzz testing is an active area of research with proposed improvements published at a rapid pace. Such proposals are assessed empirically: Can they be shown to perform better than the status quo? Such an assessment requires a benchmark of target programs with well-identified, realistic bugs. To ease the construction of such a benchmark, this paper presents FIXREVERTER, a tool that automatically injects realistic bugs in a program. FIXREVERTER takes as input a bugfix pattern which contains both code syntax and semantic conditions. Any code site that matches the specified syntax is undone if the semantic conditions are satisfied, as checked by static analysis, thus (re)introducing a likely bug. This paper focuses on three bugfix patterns, which we call conditional-abort, conditional-execute, and conditional-assign, based on a study of fixes in a corpus of Common Vulnerabilities and Exposures (CVEs). Using FIXREVERTER we have built REVBUGBENCH, which consists of 10 programs into which we have injected nearly 8,000 bugs; the programs are taken from FuzzBench and Binutils, and represent common targets of fuzzing evaluations. We have integrated REVBUGBENCH into the FuzzBench service, and used it to evaluate five fuzzers. Fuzzing performance varies by fuzzer and program, as desired/expected. Overall, 219 unique bugs were reported, 19% of which were detected by just one fuzzer.

### On the Reliability of Coverage-Based Fuzzer Benchmarking (ICSE 2022)

Abstract: In one of the largest studies of measures of fuzzer effectiveness, involving over 13 million lines of code, 10 fuzzers, and 24 CPU years worth of fuzzing campaigns, we identify a \emph{very strong correlation} between the coverage achieved and the number of bugs found by a fuzzer: A fuzzer that covers more code also finds more bugs. Because bug-based benchmarking is expensive and subject to several threats to validity, it might seem reasonable to compare fuzzers in terms of the coverage achieved, and from that derive empirical claims about a fuzzer’s superiority at finding bugs.

Curiously enough, however, we find \emph{no strong agreement} on which fuzzer is superior if we compared multiple fuzzers in terms of coverage achieved instead of the number of bugs found. The fuzzer best at achieving coverage, may \emph{not} be best at finding bugs.

### Mutation Analysis: Answering the Fuzzing Challenge (2022)

Abstract: Fuzzing is one of the fastest growing fields in software testing. The idea behind fuzzing is to check the behavior of software against a large number of randomly generated inputs, trying to cover all interesting parts of the input space, while observing the tested software for anomalous behaviour. One of the biggest challenges facing fuzzer users is how to validate software behavior, and how to improve the quality of oracles used. While mutation analysis is the premier technique for evaluating the quality of software test oracles, mutation score is rarely used as a metric for evaluating fuzzer quality. Unless mutation analysis researchers can solve multiple problems that make applying mutation analysis to fuzzing challenging, mutation analysis may be permanently sidelined in one of the most important areas of testing and security research. This paper attempts to understand the main challenges in applying mutation analysis for evaluating fuzzers, so that researchers can focus on solving these challenges.

### Evaluating Code Coverage for Kernel Fuzzers via Function Call Graph (Access 2021)

Abstract: The OS kernel, which has full system privileges, is an attractive attack surface. A kernel fuzzer that targets system calls in fuzzing is a popular tool for discovering kernel bugs that can induce kernel privilege escalation attacks. To the best of our knowledge, the relevance of code coverage, which is obtained by fuzzing, to the system call has not been studied yet. For instance, modern coverage-guided kernel fuzzers, such as Syzkaller, estimate code coverage by comparing the entire set of executed basic blocks (or edges) regardless of the system call relevancy. Our insight is that the system call relevancy could be an essential performance indicator for realizing kernel fuzzing. In this regard, this study aims to assess the system call-related code coverage of kernel fuzzers. For this purpose, we have developed a practical assessment system that leverages the Intel PT and KCOV and assessed the Linux kernel fuzzers, such as Syzkaller, Trinity, and ext4 fuzzer. The experiments on different kernel versions demonstrated that approximately 32,000–47,000 functions are implemented in the Linux kernel, and approximately 9.7–15.2% are related to the system call. Our finding is that fuzzers that achieve higher code coverage in conventional metrics do not execute more basic blocks related to system calls. Thus, we recommend that kernel fuzzers use both system call-related functions and regular basic blocks in coverage metrics to assess fuzzing performance or to improve coverage feedback.

### FuzzBench: An Open Fuzzer Benchmarking Platform and Service (FSE 2021)

Abstract: Fuzzing is a key tool used to reduce bugs in production software. At Google, fuzzing has uncovered tens of thousands of bugs. Fuzzing is also a popular subject of academic research. In 2020 alone, over 120 papers were published on the topic of improving, developing, and evaluating fuzzers and fuzzing techniques. Yet, proper evaluation of fuzzing techniques remains elusive. The community has struggled to converge on methodology and standard tools for fuzzer evaluation.

To address this problem, we introduce FuzzBench as an opensource turnkey platform and free service for evaluating fuzzers. It aims to be easy to use, fast, reliable, and provides reproducible experiments. Since its release in March 2020, FuzzBench has been widely used both in industry and academia, carrying out more than 150 experiments for external users. It has been used by several published and in-the-work papers from academic groups, and has had real impact on the most widely used fuzzing tools in industry. The presented case studies suggest that FuzzBench is on its way to becoming a standard fuzzer benchmarking platform.

### An Empirical Study of OSS-Fuzz Bugs (MSR 2021)

Abstract: Continuous fuzzing is an increasingly popular technique for automated quality and security assurance. Google maintains OSS-Fuzz: a continuous fuzzing service for open source software. We conduct the first empirical study of OSS-Fuzz, analyzing 23,907 bugs found in 316 projects. We examine the characteristics of fuzzer-found faults, the lifecycles of such faults, and the evolution of fuzzing campaigns over time. We find that OSS-Fuzz is often effective at quickly finding bugs, and developers are often quick to patch them. However, flaky bugs, timeouts, and out of memory errors are problematic, people rarely file CVEs for security vulnerabilities, and fuzzing campaigns often exhibit punctuated equilibria, where developers might be surprised by large spikes in bugs found. Our findings have implications on future fuzzing research and practice.

### Industrial Oriented Evaluation of Fuzzing Techniques (ICST 2021)

Abstract: Fuzzing is a promising method for discovering vulnerabilities. Recently, various techniques are developed to improve the efficiency of fuzzing, and impressive gains are observed in evaluation results. However, evaluation is complex, as many factors affect the results, for example, test suites, baseline and metrics. Even more, most experiment setups are lab-oriented, lacking industrial settings such as large code-base and parallel runs. The correlation between the academic evaluation results and the bug-finding ability in real industrial settings has not been sufficiently studied.

In this paper, we test representative fuzzing techniques to reveal their efficiency in industrial settings. First, we apply typical fuzzers on academic widely used small projects from LAVAM suite. We also apply the same fuzzers on large practical projects from Google’s fuzzer-test-suite, which is rarely used in academic settings. Both experiments are performed in both single and parallel run. By analyzing the results, we found that most optimizations working well on LAVA-M suite fail to achieve satisfying results on Google’s fuzzer-test-suite (e.g. compared to AFL, QSYM detects 82x more synthesized bugs in LAVA-M, but only detects 26% real bugs in Google’s fuzzer-test-suite), and the original AFL even outperforms most academic optimization variants in industry widely used parallel runs (e.g. AFL covers 13% more paths than AFLFast). Then, we summarize common pitfalls of those optimizations, analyze the corresponding root causes, and propose potential directions such as orchestrations and synchronization to overcome the problems. For example, when running in parallel on those large practical projects, the proposed horizontal orchestration could cover 36%-82% more paths, and discover 46%-150% more unique crashes or bugs, compared to fuzzers such as AFL, FairFuzz and QSYM.

### UNIFUZZ: A Holistic and Pragmatic Metrics-Driven Platform for Evaluating Fuzzers (USENIX Security2021)

Abstract: A flurry of fuzzing tools (fuzzers) have been proposed in the literature, aiming at detecting software vulnerabilities effectively and efficiently. To date, it is however still challenging to compare fuzzers due to the inconsistency of the benchmarks, performance metrics, and/or environments for evaluation, which buries the useful insights and thus impedes the discovery of promising fuzzing primitives. In this paper, we design and develop UNIFUZZ, an open-source and metrics-driven platform for assessing fuzzers in a comprehensive and quantitative manner. Specifically, UNIFUZZ to date has incorporated 35 usable fuzzers, a benchmark of 20 real-world programs, and six categories of performance metrics. We first systematically study the usability of existing fuzzers, find and fix a number of flaws, and integrate them into UNIFUZZ. Based on the study, we propose a collection of pragmatic performance metrics to evaluate fuzzers from six complementary perspectives. Using UNIFUZZ, we conduct in-depth evaluations of several prominent fuzzers including AFL [1], AFLFast [2], Angora [3], Honggfuzz [4], MOPT [5], QSYM [6], T-Fuzz [7] and VUzzer64 [8]. We find that none of them outperforms the others across all the target programs, and that using a single metric to assess the performance of a fuzzer may lead to unilateral conclusions, which demonstrates the significance of comprehensive metrics. Moreover, we identify and investigate previously overlooked factors that may significantly affect a fuzzer’s performance, including instrumentation methods and crash analysis tools. Our empirical results show that they are critical to the evaluation of a fuzzer. We hope that our findings can shed light on reliable fuzzing evaluation, so that we can discover promising fuzzing primitives to effectively facilitate fuzzer designs in the future.

### My Fuzzer Beats Them All! Developing a Framework for Fair Evaluation and Comparison of Fuzzers (2021)

Abstract: Fuzzing has become one of the most popular techniques to identify bugs in software. To improve the fuzzing process, a plethora of techniques have recently appeared in academic literature. However, evaluating and comparing these techniques is challenging as fuzzers depend on randomness when generating test inputs. Commonly, existing evaluations only partially follow best practices for fuzzing evaluations. We argue that the reason for this are twofold. First, it is unclear if the proposed guidelines are necessary due to the lack of comprehensive empirical data in the case of fuzz testing. Second, there does not yet exist a framework that integrates statistical evaluation techniques to enable fair comparison of fuzzers. To address these limitations, we introduce a novel fuzzing evaluation framework called SENF (Statistical EvaluatioN of Fuzzers). We demonstrate the practical applicability of our framework by utilizing the most wide-spread fuzzer AFL as our baseline fuzzer and exploring the impact of different evaluation parameters (e.g., the number of repetitions or run-time), compilers, seeds, and fuzzing strategies. Using our evaluation framework, we show that supposedly small changes of the parameters can have a major influence on the measured performance of a fuzzer.

### A Quantitative Comparison of Covera (AST 2020)

Abstract: In recent years, many tools have been developed for fuzz testing that generates and executes test cases repeatedly. However, many studies use different fuzzing targets and evaluation criteria and then it is difficult to compare the performance of the existing tools for fuzz testing. Therefore, we prepared a unified collection of fuzzing targets and then compared 8 fuzzers with the benchmark. In comparison, we compared the fuzzers based on the number of execution paths and branch coverage. The result shows that the number of execution paths is significantly different between the fuzzers. On the other hand, the statistical difference is not confirmed between the branch converges of the fuzzers.

### Fuzzing: On the Exponential Cost of Vulnerability Discovery (FSE 2020)

Abstract: We present counterintuitive results for the scalability of fuzzing. Given the same non-deterministic fuzzer, finding the same bugs linearly faster requires linearly more machines. Yet, finding linearly more bugs in the same time requires exponentially more machines. Similarly, with exponentially more machines, we can cover the same code exponentially faster, but uncovered code only linearly faster. In other words, re-discovering the same vulnerabilities (or achieving the same coverage) is cheap but finding new vulnerabilities (or achieving more coverage) is expensive. This holds even under the simplifying assumption of no parallelization overhead.

We derive these observations from over four CPU years worth of fuzzing campaigns involving almost three hundred open source programs, two state-of-the-art greybox fuzzers, four measures of code coverage, and two measures of vulnerability discovery. We provide a probabilistic analysis and conduct simulation experiments to explain this phenomenon.

### A Feature-Oriented Corpus for understanding, Evaluating and Improving Fuzz Testing (ASIACCS 2019)

Abstract: Fuzzing is a promising technique for detecting security vulnerabilities. Newly developed fuzzers are typically evaluated in terms of the number of bugs found on vulnerable programs/binaries. However,existing corpora usually do not capture the features that prevent fuzzers from finding bugs, leading to ambiguous conclusions on the pros and cons of the fuzzers evaluated. A typical example is that Driller detects more bugs than AFL, but its evaluation cannot establish if the advancement of Driller stems from the concolic execution or not, since, for example, its ability in resolving a datasets magic values is unclear. In this paper, we propose to address the above problem by generating corpora based on search-hampering features. As a proof-of-concept, we have designed FEData, a prototype corpus that currently focuses on four search-hampering features to generate vulnerable programs for fuzz testing. Unlike existing corpora that can only answer “how”, FEData can also further answer “why” by exposing (or understanding) the reasons for the identified weaknesses in a fuzzer. The “why” information serves as the key to the improvement of fuzzers.

### Be Sensitive and Collaborative: Analyzing Impact of Coverage Metrics in Greybox Fuzzing (RAID 2019)

Abstract: Coverage-guided greybox fuzzing has become one of the most common techniques for finding software bugs. Coverage metric, which decides how a fuzzer selects new seeds, is an essential parameter of fuzzing and can significantly affect the results. While there are many existing works on the effectiveness of different coverage metrics on software testing, little is known about how different coverage metrics could actually affect the fuzzing results in practice. More importantly, it is unclear whether there exists one coverage metric that is superior to all the other metrics. In this paper, we report the first systematic study on the impact of different coverage metrics in fuzzing. To this end, we formally define and discuss the concept of sensitivity, which can be used to theoretically compare different coverage metrics. We then present several coverage metrics with their variants. We conduct a study on these metrics with the DARPA CGC dataset, the LAVA-M dataset, and a set of real-world applications (a total of 221 binaries). We find that because each fuzzing instance has limited resources (time and computation power), (1) each metric has its unique merit in terms of flipping certain types of branches (thus vulnerability finding) and (2) there is no grand slam coverage metric that defeats all the others. We also explore combining different coverage metrics through cross-seeding, and the result is very encouraging: this pure fuzzing based approach can crash at least the same numbers of binaries in the CGC dataset as a previous approach (Driller) that combines fuzzing and concolic execution. At the same time, our approach uses fewer computing resources.

### Study and Comparison of General Purpose Fuzzers

Abstract: Fuzz testing is a widely used technique for the detection of vulnerabilities whose popularity has led to the development of various tools that do fuzz testing. General-purpose fuzzers work in all domains while some other fuzzers are targeted towards some specific domain. Evaluation of these tools is not an easy task since different fuzzing tools excel in di erent domains. In this paper, we evaluate 3 such general-purpose fuzzing tools namely libFuzzer, American Fuzzy Lop(AFL) and honggfuzz on 2 metrics, i.e. their bug finding capability and their code coverage. We use the google fuzzer-test-suite which has 24 applications spanning several domains. libFuzzer performs best out of the three in finding memory leaks and out-of-memory related bugs but for other kinds of bugs, all three perform at par. honggfuzz seems to be the best in terms of coverage, though libFuzzer is not far behind, which we believe is because of our runs being of short duration.

### Evaluating Fuzz Testing (CCS 2018)

Abstract: Fuzz testing has enjoyed great success at discovering security critical bugs in real software. Recently, researchers have devoted significant effort to devising new fuzzing techniques, strategies, and algorithms. Such new ideas are primarily evaluated experimentally so an important question is: What experimental setup is needed to produce trustworthy results? We surveyed the recent research literature and assessed the experimental evaluations carried out by 32 fuzzing papers. We found problems in every evaluation we considered. We then performed our own extensive experimental evaluation using an existing fuzzer. Our results showed that the general problems we found in existing experimental evaluations can indeed translate to actual wrong or misleading assessments. We conclude with some guidelines that we hope will help improve experimental evaluations of fuzz testing algorithms, making reported results more robust.

# Instrumentation

### InstruGuard: Find and Fix Instrumentation Errors for Coverage-based Greybox Fuzzing (ASE 2021)

Abstract: As one of the most successful methods at vulnerability discovery, coverage-based greybox fuzzing relies on the lightweight compiler-level instrumentation to achieve the finegrained coverage feedback of the target program. Researchers improve it by optimizing the coverage metrics without questioning the correctness of the instrumentation. However, instrumentation errors, including missed instrumentation locations and redundant instrumentation locations, harm the ability of fuzzers. According to our experiments, it is a common and severe problem in various coverage-based greybox fuzzers and at different compiler optimization levels.

In this paper, we design and implement InstruGuard, an open-source and pragmatic platform to find and fix instrumentation errors. It detects instrumentation errors by static analysis on target binaries, and fixes them with a general solution based on binary rewriting. To study the impact of instrumentation errors and test our solutions, we built a dataset of 15 real-world rograms and selected 6 representative fuzzers as targets. We used InstruGuard to check and repair the instrumented binaries with different fuzzers and different compiler optimization options. To evaluate the effectiveness of the repair, we ran the fuzzers with original instrumented programs and the repaired ones, and compared the fuzzing results from aspects of execution paths, line coverage, and real bug findings. The results showed that InstruGuard had corrected the instrumentation errors of different fuzzers and helped to find more bugs in the dataset. Moreover, we discovered one new zero-day vulnerability missed by other fuzzers with fixed instrumentation but without any changes to the fuzzers.

### RIFF: Reduced Instruction Footprint for Coverage-Guided Fuzzing (USENIX ATC 2021)

Abstract: Coverage-guided fuzzers use program coverage measurements to explore different program paths efficiently. The coverage pipeline consists of runtime collection and post-execution processing procedures. First, the target program executes instrumentation code to collect coverage information. Then the fuzzer performs an expensive analysis on the collected data, yet most program executions lead to no increases in coverage. Inefficient implementations of these steps significantly reduce the fuzzer’s overall throughput.

In this paper, we propose RIFF, a highly efficient program coverage measurement mechanism to reduce fuzzing overhead. For the target program, RIFF moves computations originally done at runtime to instrumentation-time through static program analysis, thus reducing instrumentation code to a bare minimum. For the fuzzer, RIFF processes coverage with different levels of granularity and utilizes vector instructions to improve throughput.

We implement RIFF in state-of-the-art fuzzers such as AFL and MOpt and evaluate its performance on real-world programs in Google’s FuzzBench and fuzzer-test-suite. The results show that RIFF improves coverage measurement efficiency of fuzzers by 23× and 6× during runtime collection and post-execution processing, respectively. As a result, the fuzzers complete 147% more executions, and use only 6.53 hours to reach the 24-hour coverage of baseline fuzzers on average.

### Hashing Fuzzing: Introducing Input Diversity to Improve Crash Detection (TSE 2021)

Abstract: The utility of a test set of program inputs is strongly influenced by its diversity and its size. Syntax coverage has become a standard proxy for diversity. Although more sophisticated measures exist, such as proximity of a sample to a uniform distribution, methods to use them tend to be type dependent. We use r-wise hash functions to create a novel, semantics preserving, testability transformation for C programs that we call HashFuzz. Use of HashFuzz improves the diversity of test sets produced by instrumentation-based fuzzers. We evaluate the effect of the HashFuzz transformation on eight programs from the Google Fuzzer Test Suite using four state-of-the-art fuzzers that have been widely used in previous research. We demonstrate pronounced improvements in the performance of the test sets for the transformed programs across all the fuzzers that we used. These include strong improvements in diversity in every case, maintenance or small improvement in branch coverage – up to 4.8% improvement in the best case, and significant improvement in unique crash detection numbers – between 28% to 97% increases compared to test sets for untransformed programs.

### RetroWrite: Statically Instrumenting COTS Binaries for Fuzzing and Sanitization (S&P 2020)

Abstract: Analyzing the security of closed source binaries is currently impractical for end-users, or even developers who rely on third-party libraries. Such analysis relies on automatic vulnerability discovery techniques, most notably fuzzing with sanitizers enabled. The current state of the art for applying fuzzing or sanitization to binaries is a dynamic binary translation, which has a prohibitive performance overhead. The alternate technique, static binary rewriting, cannot fully recover symbolization information and hence has difficulty modifying binaries to track code coverage for fuzzing or to add security checks for sanitizers.

The ideal solution for binary security analysis would be a static rewriter that can intelligently add the required instrumentation as if it were inserted at compile time. Such instrumentation requires an analysis to statically disambiguate between references and scalars, a problem known to be undecidable in the general case. We show that recovering this information is possible in practice for the most common class of software and libraries: 64-bit, position-independent code. Based on this observation, we develop RetroWrite, binary-rewriting instrumentation to support American Fuzzy Lop (AFL) and Address Sanitizer (ASan), and show that it can achieve compiler-level performance while retaining precision. Binaries rewritten for coverage guided fuzzing using RetroWrite are identical in performance to compiler-instrumented binaries and outperform the default QEMU-based instrumentation by 4.5x while triggering more bugs. Our implementation of binary-only Address Sanitizer is 3x faster than Valgrind’s memcheck, the state-of-the-art binary-only memory checker, and detects 80% more bugs in our evaluation.

### INSTRCR: Lightweight instrumentation optimization based on coverage-guided fuzz testing (CCET 2019)

Abstract: In Fuzzing facing binary coverage, the main role of instrumentation is feedback code coverage (in the case of Fuzz for binary, instrumentation can provide coverage information, which plays an important role in guiding the operation of seeds in Fuzz) . The current instrumentation optimization technique mainly relies on the control flow graph (CFG) to select key basic blocks at the basic block level, but the accuracy of this method is not high enough. Considering that the actual path in the actual operation of the binary may be different from the CFG generated in advance, this paper is based on the indirect jump that cannot be accurately analyzed in the CFG, and some of the basic blocks that can be optimized for high-frequency interpolation. According to the algorithm proposed in this paper, The combination of static analysis and dynamic analysis is used to continuously adjust and select key basic block nodes for instrumentation. It is verified by experiments that this kind of instrumentation method can effectively improve the coverage rate and reduce the overhead, and provide effective guidance for Fuzzing, which can effectively reduce the Fuzzer’s false negatives.

### Full-speed Fuzzing: Reducing Fuzzing Overhead through Coverage-guided Tracing (S&P 2019)

Abstract: Coverage-guided fuzzing is one of the most successful approaches for discovering software bugs and security vulnerabilities. Of its three main components: (1) test case generation, (2) code coverage tracing, and (3) crash triage, code coverage tracing is a dominant source of overhead. Coverage-guided fuzzers trace every test case’s code coverage through either static or dynamic binary instrumentation, or more recently, using hardware support. Unfortunately, tracing all test cases incurs significant performance penalties–even when the overwhelming majority of test cases and their coverage information are discarded because they do not increase code coverage. To eliminate needless tracing by coverage-guided fuzzers, we introduce the notion of coverage-guided tracing. Coverage-guided tracing leverages two observations: (1) only a fraction of generated test cases increase coverage, and thus require tracing; and (2) coverage-increasing test cases become less frequent over time. Coverage-guided tracing encodes the current frontier of coverage in the target binary so that it self-reports when a test case produces new coverage–without tracing. This acts as a filter for tracing; restricting the expense of tracing to only coverage-increasing test cases. Thus, coverage-guided tracing trades increased time handling coverage-increasing test cases for decreased time handling non-coverage-increasing test cases. To show the potential of coverage-guided tracing, we create an implementation based on the static binary instrumentor Dyninst called UnTracer. We evaluate UnTracer using eight real-world binaries commonly used by the fuzzing community. Experiments show that after only an hour of fuzzing, UnTracer’s average overhead is below 1%, and after 24-hours of fuzzing, UnTracer approaches 0% overhead, while tracing every test case with popular white- and black-box-binary tracers AFL-Clang, AFL-QEMU, and AFL-Dyninst incurs overheads of 36%, 612%, and 518%, respectively. We further integrate UnTracer with the state-of-the-art hybrid fuzzer QSYM and show that in 24-hours of fuzzing, QSYM-UnTracer executes 79% and 616% more test cases than QSYM-Clang and QSYM-QEMU, respectively.

### INSTRIM Lightweight Instrumentation for Coverage-guided Fuzzing (NDSS 2018 workshop)

Abstract: Empowered by instrumentation, coverage-guided fuzzing monitors the program execution path taken by an input, and prioritizes inputs based on their contribution to code coverage. Although instrumenting every basic block ensures full visibility, it slows down the fuzzer and thus the speed of vulnerability discovery. This paper shows that thanks to common program structures (e.g., directed acyclic subgraphs and simple loops) and compiler optimization (e.g., knowledge of incoming edges), it is possible to accurately reconstruct coverage information by instrumenting only a small fraction of basic blocks. Specifically, we formulate the problem as a path differentiation problem on the control flow graph, and propose an efficient algorithm to select basic blocks that need to be instrumented so that different execution paths remain differentiable. We extend AFL to support such CFG-aware instrumentation. Our experiment results confirm that, compared with full instrumentation, our CFG-aware instrumentation only needs to instrument about 20% of basic blocks while offering 1.04–1.78x speedup during fuzzing. Finally, we highlight several technical challenges and promising research directions to further improve instrumentation for fuzzing.

### SyzGen: Automated Generation of Syscall Specification of Closed-Source macOS Drivers (CCS 2021)

Abstract: Kernel drivers are a critical part of the attack surface since they constitute a large fraction of kernel codebase and oftentimes lack proper vetting, especially for those closed-source ones. Unfortunately, the complex input structure and unknown relationships/dependencies among interfaces make them very challenging to understand. Thus, security analysts primarily rely on manual audit for interface recovery to generate meaningful fuzzing test cases. In this paper, we present SyzGen, a first attempt to automate the generation ofbsyscall specifications for closed-source macOS drivers and facilitate interface-aware fuzzing. We leverage two insights to overcome the challenges of binary analysis: (1) iterative refinement of syscall knowledge and (2) extraction and extrapolation of ependencies from a small number of execution traces. We evaluated our approach on 25 targets. The results show that SyzGen can effectively produce high-quality specifications, leading to 34 bugs, including one that attackers can exploit to escalate privilege, and 2 CVEs to date.

# IoT or protocols fuzzing

### FuzzUSB: Hybrid Stateful Fuzzing of USB Gadget Stacks (FSE 2022)

Abstract: Universal Serial Bus (USB) is the de facto protocol supported by peripherals and mobile devices, such as USB thumb drives and smart phones. For many devices, USB Type-C ports are the primary interface for charging, file transfer, audio, video, etc. Accordingly, attackers have exploited different vulnerabilities within USB stacks, compromising host machines via BadUSB attacks or jailbreaking iPhones from USB connections. While there exist fuzzing frameworks dedicated to USB vulnerability discovery, all of them focus on USB host stacks and ignore USB gadget stacks, which enable all the features within modern peripherals and smart devices. In this paper, we propose FuzzUSB, the first fuzzing framework for the USB gadget stack within commodity OS kernels, leveraging static analysis, symbolic execution, and stateful fuzzing. FuzzUSB combines static analysis and symbolic execution to extract internal state machines from USB gadget drivers, and uses them to achieve state-guided fuzzing through multi-channel inputs. We have implemented FuzzUSB upon the syzkaller kernel fuzzer and applied it to the most recent mainline Linux, Android, and FreeBSD kernels. As a result, we have found 34 previously unknown bugs within the Linux and Android kernels, and opened 8 CVEs. Furthermore, compared to the baseline, FuzzUSB has also demonstrated different improvements, including 3× higher code coverage, 50× improved bug-finding efficiency for Linux USB gadget stacks, 2× higher code coverage for FreeBSD USB gadget stacks, and reproducing known bugs that could not be detected by the baseline fuzzers. We believe FuzzUSB provides developers a powerful tool to thwart USB-related vulnerabilities within modern devices and complete the current USB fuzzing scope.

### SNPSFuzzer: A Fast Greybox Fuzzer for Stateful Network Protocols using Snapshots (2022)

Abstract: Greybox fuzzing has been widely used in stateless programs and has achieved great success. However, most state-of-the-art greybox fuzzers generally have the problems of slow speed and shallow state depth coverage in the process of fuzzing stateful network protocol programs which are able to remember and store details of the interactions. The existing greybox fuzzers for network protocol programs send a series of well-defined prefix sequences of input messages first and then send mutated messages to test the target state of a stateful network protocol. The process mentioned above causes a high time cost. In this paper, we propose SNPSFuzzer, a fast greybox fuzzer for stateful network protocol using snapshots. SNPSFuzzer dumps the context information when the network protocol program is under a specific state and restores it when the state needs to be fuzzed. Furthermore, we design a message chain analysis algorithm to explore more and deeper network protocol states. Our evaluation shows that, compared with the state-of-the-art network protocol greybox fuzzer AFLNET, SNPSFuzzer increases the speed of network protocol fuzzing by 112.0%-168.9% and improves path coverage by 21.4%-27.5% within 24 hours. Moreover, SNPSFuzzer exposes a previously unreported vulnerability in program Tinydtls.

### SnapFuzz: An Efficient Fuzzing Framework for Network Applications (2022)

Abstract: In recent years, fuzz testing has benefited from increased computational power and important algorithmic advances, leading to systems that have discovered many critical bugs and vulnerabilities in production software. Despite these successes, not all applications can be fuzzed efficiently. In particular, stateful applications such as network protocol implementations are constrained by their low fuzzing throughput and the need to develop fuzzing harnesses that reset their state and isolate their side effects. In this paper, we present SnapFuzz, a novel fuzzing framework for network applications. SnapFuzz offers a robust architecture that transforms slow asynchronous network communication into fast synchronous communication based on UNIX domain sockets, speeds up all file operations by redirecting them to an in-memory filesystem, and removes the need for many fragile modifications, such as configuring time delays or writing cleanup scripts, together with several other improvements. Using SnapFuzz, we fuzzed five popular networking applications: LightFTP, Dnsmasq, LIVE555, TinyDTLS and Dcmqrscp. We report impressive performance speedups of 72.4x, 49.7x, 24.8x, 23.9x, and 8.5x, respectively, with significantly simpler fuzzing harnesses in all cases. Through its performance advantage, SnapFuzz has also found 12 previously-unknown crashes in these applications.

### State Selection Algorithms and Their Impact on The Performance of Stateful Network Protocol Fuzzing (2021)

Abstract: The statefulness property of network protocol implementations poses a unique challenge for testing and verification techniques, including Fuzzing. Stateful fuzzers tackle this challenge by leveraging state models to partition the state space and assist the test generation process. Since not all states are equally important and fuzzing campaigns have time limits, fuzzers need effective state selection algorithms to prioritize progressive states over others. Several state selection algorithms have been proposed but they were implemented and evaluated separately on different platforms, making it hard to achieve conclusive findings. In this work, we evaluate an extensive set of state selection algorithms on the same fuzzing platform that is AFLNet, a state-of-the-art fuzzer for network servers. The algorithm set includes existing ones supported by AFLNet and our novel and principled algorithm called AFLNetLegion. The experimental results on the ProFuzzBench benchmark show that (i) the existing state selection algorithms of AFLNet achieve very similar code coverage, (ii) AFLNetLegion clearly outperforms these algorithms in selected case studies, but (iii) the overall improvement appears insignificant. These are unexpected yet interesting findings. We identify problems and share insights that could open opportunities for future research on this topic.

### ICS3Fuzzer: A Framework for Discovering Protocol Implementation Bugs in ICS Supervisory Software by Fuzzing (ACSAC 2021)

Abstract: The supervisory software is widely used in industrial control systems (ICSs) to manage field devices such as PLC controllers. Once compromised, it could be misused to control or manipulate these physical devices maliciously, endangering manufacturing process or even human lives. Therefore, extensive security testing of supervisory software is crucial for the safe operation of ICS. However, fuzzing ICS supervisory software is challenging due to the prevalent use of proprietary protocols. Without the knowledge of the program states and packet formats, it is difficult to enter the deep states for effective fuzzing.

In this work, we present a fuzzing framework to automatically discover implementation bugs residing in the communication protocols between the supervisory software and the field devices. To avoid heavy human efforts in reverse-engineering the proprietary protocols, the proposed approach constructs a state-book based on the readily-available execution trace of the supervisory software and the corresponding inputs. Then, we propose a state selection algorithm to find the protocol states that are more likely to have bugs. Our fuzzer distributes more budget on those interesting states. To quickly reach the interesting states, traditional snapshot-based method does not work since the communication protocols are time sensitive. We address this issue by synchronously managing external events (GUI operations and network traffic) during the fuzzing loop. We have implemented a prototype and used it to fuzz the supervisory software of four popular ICS platforms. We have found 13 bugs and received 3 CVEs, 2 are classified as critical (CVSS3.x score CRITICAL 9.8) and affected 40 different products.

### Westworld: Fuzzing-Assisted Remote Dynamic Symbolic Execution of Smart Apps on IoT Cloud Platforms (ACSAC 2021)

Abstract: Existing symbolic execution typically assumes the analyzer can control the I/O environment and/or access the library code, which, however, is not the case when programs run on a remote proprietary execution environment managed by another party. For example, SmartThings, one of the most popular IoT platforms, is such a cloud-based execution environment. For programmers who write automation applications to be deployed on IoT cloud platforms, it raises significant challenges when they want to systematically test their code and find bugs. We propose fuzzing-assisted remote dynamic symbolic execution, which uses dynamic symbolic execution as backbone and utilizes fuzzing when necessary to automatically test programs running in a remote proprietary execution environment over which the analyzer has little control. As a case study, we enable it for analyzing smart apps running on SmartThings. We have developed a prototype and the evaluation shows that it is effective in testing smart apps and finding bugs.

### ProFuzzBench - A Benchmark for Stateful Protocol Fuzzing (ISSTA 2021)

Abstract: We present a new benchmark (ProFuzzBench) for stateful fuzzing of network protocols. The benchmark includes a suite of representative open-source network servers for popular protocols, and tools to automate experimentation. We discuss challenges and potential directions for future research based on this benchmark.

### TCP-Fuzz: Detecting Memory and Semantic Bugs in TCP Stacks with Fuzzing (USENIX ATC 2021)

Abstract: TCP stacks provide reliable data transmission in network, and thus they should be correctly implemented and well tested to ensure reliability and security. However, testing TCP stacks is difficult. First, a TCP stack accepts packets and system calls that have dependencies between each other, and thus generating effective test cases is challenging. Second, a TCP stack has various complex state transitions, but existing testing approaches target covering states instead of covering state transitions, and thus their testing coverage is limited. Finally, our study of TCP stack commits shows that 87% of bug-fixing commits are related to semantic bugs (such as RFC violations), but existing bug sanitizers can detect only memory bugs not semantic bugs.

In this paper, we design a novel fuzzing framework named TCP-Fuzz, to effectively test TCP stacks and detect bugs. TCP-Fuzz consists of three key techniques: (1) a dependency-based strategy that considers dependencies between packets and system calls, to generate effective test cases; (2) a transition-guided fuzzing approach that uses a new coverage metric named branch transition as program feedback, to improve the coverage of state transitions; (3) a differential checker that compares the outputs of multiple TCP stacks for the same inputs, to detect semantic bugs. We have evaluated TCP-Fuzz on five widely-used TCP stacks (TLDK, F-Stack, mTCP, FreeBSD TCP and Linux TCP), and find 56 real bugs (including 8 memory bugs and 48 semantic bugs). 40 of these bugs have been confirmed by related developers.

### ICPFuzzer: proprietary communication protocol fuzzing by using machine learning and feedback strategies (Cybersecurity 2021)

Abstract: The fuzzing test is able to discover various vulnerabilities and has more chances to hit the zero-day targets. And ICS(Industrial control system) is currently facing huge security threats and requires security standards, like ISO 62443, to ensure the quality of the device. However, some industrial proprietary communication protocols can be customized and have complicated structures, the fuzzing system cannot quickly generate test data that adapt to various protocols. It also struggles to define the mutation field without having prior knowledge of the protocols. Therefore, we propose a fuzzing system named ICPFuzzer that uses LSTM(Long short-term memory) to learn the features of a protocol and generates mutated test data automatically. We also use the responses of testing and adjust the weight strategies to further test the device under testing (DUT) to find more data that cause unusual connection status. We verified the effectiveness of the approach by comparing with the open-source and commercial fuzzers. Furthermore, in a real case, we experimented with the DLMS/COSEM for a smart meter and found that the test data can cause a unusual response. In summary, ICPFuzzer is a black-box fuzzing system that can automatically execute the testing process and reveal vulnerabilities that interrupt and crash industrial control communication. Not only improves the quality of ICS but also improves safety.

### Fuzzing With Optimized Grammar-Aware Mutation Strategies (Access 2021)

Abstract: Fuzzing is a widely used technique to discover vulnerabilities in software. However, for programs requiring highly structured inputs, the byte-based mutation strategies in existing fuzzers have difficulties in generating valid inputs. To resolve this challenge, Grammar-Based Fuzzing (GBF) utilizes existing grammar specifications to generate new inputs. Some GBFs perform mutation based on Abstract Syntax Trees (ASTs), which can generate inputs conforming to grammars. However, the existing GBFs neglect using feedback to optimize mutation strategies, and blindly generate inputs without considering the effectiveness of those inputs. In this paper, we use the power schedule and the subtree pool to optimize mutation strategies. Specifically, we first translate input files into ASTs, and extract subtrees from ASTs into a subtree pool. Then, we optimize the power schedule on AST nodes based on a probabilistic model. That is, we adaptively determine the time budget for mutating an AST node. Finally, we replace AST nodes along with their subtrees using the ones we select from the subtree pool. We implement a fuzzing tool to demonstrate our strategies. The experiment results show that our method outperforms the state-of-the-art methods in fuzzing efficiency.

### FIRM-COV: High-Coverage Greybox Fuzzing for IoT Firmware via Optimized Process Emulation (Access 2021)

Abstract: With the growing prevalence of the Internet of Things (IoT), related security threats have kept pace. The need to dynamically detect vulnerabilities in IoT devices cannot be overstated. In this work, we present FIRM-COV, the first high coverage-oriented greybox fuzzer for IoT firmware. FIRM-COV leverages newly optimized process emulation by targeting IoT programs and mining real-world vulnerabilities. FIRM-COV focuses on solving problems of IoT fuzzing based on empirical analyses, using the required structured input, the inaccuracy and instability of emulation, and the required high code coverage. By optimizing the existing emulation technique, FIRM-COV always maintains a stable state and achieves high accuracy when detecting vulnerabilities. We also implement a dictionary generation algorithm to provide structured input values and synergy scheduling to achieve high coverage and throughput. We compare FIRM-COV with other IoT fuzzing frameworks for eight real-world IoT devices. As a result, FIRM-COV achieves the highest coverage and throughput, finding the fastest and most 1-day vulnerabilities with almost no false-positives. It also found two 0-day vulnerabilities in real-world IoT devices within 24 h.

### DIANE: Identifying Fuzzing Triggers in Apps to Generate Under-constrained Inputs for IoT Devices (S&P 2020)

Abstract: Internet of Things (IoT) devices have rooted themselves in the everyday life of billions of people. Thus, researchers have applied automated bug finding techniques to improve their overall security. However, due to the difficulties in extracting and emulating custom firmware, black-box fuzzing is often the only viable analysis option. Unfortunately, this solution mostly produces invalid inputs, which are quickly discarded by the targeted IoT device and do not penetrate its code. Another proposed approach is to leverage the companion app (i.e., the mobile app typically used to control an IoT device) to generate well-structured fuzzing inputs. Unfortunately, the existing solutions produce fuzzing inputs that are constrained by app-side validation code, thus significantly limiting the range of discovered vulnerabilities. In this paper, we propose a novel approach that overcomes these limitations. Our key observation is that there exist functions inside the companion app that can be used to generate optimal (i.e., valid yet under-constrained) fuzzing inputs. Such functions, which we call fuzzing triggers, are executed before any data-transforming functions (e.g., network serialization), but after the input validation code. Consequently, they generate inputs that are not constrained by app-side sanitization code, and, at the same time, are not discarded by the analyzed IoT device due to their invalid format. We design and develop Diane, a tool that combines static and dynamic analysis to find fuzzing triggers in Android companion apps, and then uses them to fuzz IoT devices automatically. We use Diane to analyze 11 popular IoT devices, and identify 11 bugs, 9 of which are zero days. Our results also show that without using fuzzing triggers, it is not possible to generate bug-triggering inputs for many devices.

### Snipuzz: Black-box Fuzzing of IoT Firmware via Message Snippet Inference (CCS 2021)

Abstract: The proliferation of Internet of Things (IoT) devices has made people’s lives more convenient, but it has also raised many security concerns. Due to the difficulty of obtaining and emulating IoT firmware, the black-box fuzzing of IoT devices has become a viable option. However, existing black-box fuzzers cannot form effective mutation optimization mechanisms to guide their testing processes, mainly due to the lack of feedback. It is difficult or even impossible to apply existing grammar-based fuzzing strategies. Therefore, an efficient fuzzing approach with syntax inference is required in the IoT fuzzing domain. To address these critical problems, we propose a novel automatic black-box fuzzing for IoT firmware, termed Snipuzz. Snipuzz runs as a client communicating with the devices and infers message snippets for mutation based on the responses. Each snippet refers to a block of consecutive bytes that reflect the approximate code coverage in fuzzing. This mutation strategy based on message snippets considerably narrows down the search space to change the probing messages. We compared Snipuzz with four state-of-the-art IoT fuzzing approaches, i.e., IoTFuzzer, BooFuzz, Doona, and Nemesys. Snipuzz not only inherits the advantages of app-based fuzzing (e.g., IoTFuzzer, but also utilizes communication responses to perform efficient mutation. Furthermore, Snipuzz is lightweight as its execution does not rely on any prerequisite operations, such as reverse engineering of apps. We also evaluated Snipuzz on 20 popular real-world IoT devices. Our results show that Snipuzz could identify 5 zero-day vulnerabilities, and 3 of them could be exposed only by Snipuzz. All the newly discovered vulnerabilities have been confirmed by their vendors.

### Learning-Based Fuzzing of IoT Message Brokers (ICST 2021)

Abstract: The number of devices in the Internet of Things (IoT) immensely grew in recent years. A frequent challenge in the assurance of the dependability of IoT systems is that components of the system appear as a black box. This paper presents a semi-automatic testing methodology for black-box systems that combines automata learning and fuzz testing. Our testing technique uses stateful fuzzing based on a model that is automatically inferred by automata learning. Applying this technique, we can simultaneously test multiple implementations for unexpected behavior and possible security vulnerabilities.We show the effectiveness of our learning-based fuzzing technique in a case study on the MQTT protocol. MQTT is a widely used publish/subscribe protocol in the IoT. Our case study reveals several inconsistencies between five different MQTT brokers. The found inconsistencies expose possible security vulnerabilities and violations of the MQTT specification.

### RiverFuzzRL - an open-source tool to experiment with reinforcement learning for fuzzing (ICST 2021)

Abstract: Combining fuzzing techniques and reinforcement learning could be an important direction in software testing. However, there is a gap in support for experimentation in this field, as there are no open-source tools to let academia and industry to perform experiments easily. The purpose of this paper is to fill this gap by introducing a new framework, named RiverFuzzRL, on top of our already mature framework for AI-guided fuzzing, River. We provide out-of-the-box implementations for users to choose from or customize for their test target. The work presented here is performed on testing binaries and does not require access to the source code, but it can be easily adapted to other types of software testing as well. We also discuss the challenges faced, opportunities, and factors that are important for performance, as seen in the evaluation.

### Vulnerability Detection in SIoT Applications: A Fuzzing Method on their Binaries (IEEE Transactions on Network Science and Engineering 2020)

Abstract: SIoT enables devices to communicate with each other automatically, which is not reliable when applications in SIoT are vulnerable. To improve the security of SIoT, different techniques have been employed so far, mainly to detect vulnerabilities in SIoT applications. Among the detection techniques, fuzzing is one of the most effective ones that can significantly improve the security of SIoT applications. However, the existing fuzzing methods have three problems. First of all, the schemes to instrument target binaries cause high memory overhead because they instrument at all edges to obtain the coverage information. Moreover, they introduce a severe problem called edge collision, i.e., two different edges are deemed the same during fuzzing. Thirdly, none of the existing fuzzers conduct fuzzing using path coverage because path coverage has high memory overhead. In this paper, we propose BECFuzz to resolve the above three problems. BECFuzz instruments at specific edges, and conducts fuzzing based on both edge coverage and path coverage, which greatly improves its effectiveness. We implement our BECFuzz based on two typical fuzzers which are widely recognised as baselines, AFL and AFLFast, and run experiments on 18 real-world programs. The results demonstrate that our method suppresses the state-of-art fuzzers in performance.

### Analysis of DTLS Implementations Using Protocol State Fuzzing (USENIX Security2020)

Abstract: Recent years have witnessed an increasing number of protocols relying on UDP. Compared to TCP, UDP offers performance advantages such as simplicity and lower latency. This has motivated its adoption in Voice over IP, tunneling technologies, IoT, and novel Web protocols. To protect sensitive data exchange in these scenarios, the DTLS protocol has been developed as a cryptographic variation of TLS. DTLS’s main challenge is to support the stateless and unreliable transport of UDP. This has forced protocol designers to make choices that affect the complexity of DTLS, and to incorporate features that need not be addressed in the numerous TLS analyses.

We present the first comprehensive analysis of DTLS implementations using protocol state fuzzing. To that end, we extend TLS-Attacker, an open source framework for analyzing TLS implementations, with support for DTLS tailored to the stateless and unreliable nature of the underlying UDP layer. We build a framework for applying protocol state fuzzing on DTLS servers, and use it to learn state machine models for thirteen DTLS implementations. Analysis of the learned state models reveals four serious security vulnerabilities, including a full client authentication bypass in the latest JSSE version, as well as several functional bugs and non-conformance issues. It also uncovers considerable differences between the models, confirming the complexity of DTLS state machines.

### Frankenstein: Advanced Wireless Fuzzing to Exploit New Bluetooth Escalation Targets (USENIX Security2020)

Abstract: Wireless communication standards and implementations have a troubled history regarding security. Since most implementations and firmwares are closed-source, fuzzing remains one of the main methods to uncover Remote Code Execution (RCE) vulnerabilities in deployed systems. Generic over-the-air fuzzing suffers from several shortcomings, such as constrained speed, limited repeatability, and restricted ability to debug. In this paper, we present Frankenstein, a fuzzing framework based on advanced firmware emulation, which addresses these shortcomings. Frankenstein brings firmware dumps “back to life”, and provides fuzzed input to the chip’s virtual modem. The speed-up of our new fuzzing method is sufficient to maintain interoperability with the attached operating system, hence triggering realistic full-stack behavior. We demonstrate the potential of Frankenstein by finding three zero-click vulnerabilities in the Broadcom and Cypress Bluetooth stack, which is used in most Apple devices, many Samsung smartphones, the Raspberry Pis, and many others. Given RCE on a Bluetooth chip, attackers may escalate their privileges beyond the chip’s boundary. We uncover a Wi-Fi/Bluetooth coexistence issue that crashes multiple operating system kernels and a design flaw in the Bluetooth 5.2 specification that allows link key extraction from the host. Turning off Bluetooth will not fully disable the chip, making it hard to defend against RCE attacks. Moreover, when testing our chip-based vulnerabilities on those devices, we find BlueFrag, a chip-independent Android RCE.

### A deep convolution generative adversarial networks based fuzzing framework for industry control protocols

Abstract: A growing awareness is brought that the safety and security of industrial control systems cannot be dealt with in isolation, and the safety and security of industrial control protocols (ICPs) should be considered jointly. Fuzz testing (fuzzing) for the ICP is a common way to discover whether the ICP itself is designed and implemented with flaws and network security vulnerability. Traditional fuzzing methods promote the safety and security testing of ICPs, and many of them have practical applications. However, most traditional fuzzing methods rely heavily on the specification of ICPs, which makes the test process a costly, time-consuming, troublesome and boring task. And the task is hard to repeat if the specification does not exist. In this study, we propose a smart and automated protocol fuzzing methodology based on improved deep convolution generative adversarial network and give a series of performance metrics. An automated and intelligent fuzzing framework BLSTM-DCNNFuzz for application is designed. Several typical ICPs, including Modbus and EtherCAT, are applied to test the effectiveness and efficiency of our framework. Experiment results show that our methodology outperforms the existing ones like General Purpose Fuzzer and other deep learning based fuzzing methods in convenience, effectiveness, and efficiency.

### ICS Protocol Fuzzing: Coverage Guided Packet Crack and Generation (DAC 2020)

Abstract: Industrial Control System (ICS) protocols play an essential role in building communications among system components. Recently, many severe vulnerabilities, such as Stuxnet and DragonFly, exposed in ICS protocols have affected a wide distribution of devices. Therefore, it is of vital importance to ensure their correctness. However, the vulnerability detection efficiency of traditional techniques such as fuzzing is challenged by the complexity and diversity of the protocols.

In this paper, we propose to equip the traditional protocol fuzzing with coverage-guided packet crack and generation. We collect the coverage information during testing procedure, save those valuable packets that trigger new path coverage and crack them into pieces, based on which, we can construct higher quality new packets for further testing. For evaluation, we build Peach* on top of Peach, which is one of the most widely used protocol fuzzers, and conduct experiments on several ICS protocols such as Modbus and DNP3. Results show that, compared with the original Peach, Peach* achieves the same code coverage and bug detection numbers at the speed of 1.2X-25X. It also gains final increase with 8.35%-36.84% more paths within 24 hours, and has exposed 9 previously unknown vulnerabilities.

### AFLNET: A Greybox Fuzzer for Network Protocols (ICST 2020)

Abstract: Server fuzzing is difficult. Unlike simple command-line tools, servers feature a massive state space that can be traversed effectively only with well-defined sequences of input messages. Valid sequences are specified in a protocol. In this paper, we present AFLNET, the first grey-box fuzzer for protocol implementations. Unlike existing protocol fuzzers, AFLNET takes a mutational approach and uses state-feedback to guide the fuzzing process. AFLNET is seeded with a corpus of recorded message exchanges between the server and an actual client. No protocol specification or message grammars are required. AFLNET acts as a client and replays variations of the original sequence of messages sent to the server and retains those variations that were effective at increasing the coverage of the code or state space. To identify the server states that are exercised by a message sequence, AFLNET uses the server’s response codes. From this feedback, AFLNET identifies progressive regions in the state space, and systematically steers towards such regions. The case studies with AFLNET on two popular protocol implementations demonstrate a substantial performance boost over the state-of-the-art. AFLNET discovered two new CVEs that are classified as critical (CVSS score CRITICAL 9.8).

### Finding Security Vulnerabilities in Network Protocol Implementations (Arxiv 2020)

Abstract: Implementations of network protocols are often prone to vulnerabilities caused by developers’ mistakes when accessing memory regions and dealing with arithmetic operations. Finding practical approaches for checking the security of network protocol implementations has proven to be a challenging problem. The main reason is that the protocol software state-space is too large to be explored. Here we propose a novel verification approach that combines fuzzing with symbolic execution to verify intricate properties in network protocol implementations. We use fuzzing for an initial exploration of the network protocol, while symbolic execution explores both the program paths and protocol states, which were uncovered by fuzzing. From this combination, we automatically generate high-coverage test input packets for a network protocol implementation.We surveyed various approaches based on fuzzing and symbolic execution to understand how these techniques can be effectively combined and then choose a suitable tool to develop further our model on top of it. In our preliminary evaluation, we used ESBMC, Map2Check, and KLEE as software verifiers and SPIKE as fuzzer to check their suitability to verify our network protocol implementations. Our experimental results show that ESBMC can be further developed within our verification framework called FuSeBMC, to efficiently and effectively detect intricate security vulnerabilities in network protocol implementations.

### Smart seed selection-based effective black box fuzzing for IIoT protocol (2020)

Abstract: Connections of cyber-physical system (CPS) components are gradually increasing owing to the introduction of the Industrial Internet of Things (IIoT). IIoT vulnerability analysis has become a major issue because complex skillful cyber-attacks on CPS systems exploit their zero-day vulnerabilities. However, current white box techniques for vulnerability analysis are difficult to use in real heterogeneous environments, where devices supplied by various manufacturers and diverse firmware versions are used. Therefore, we herein propose a novel protocol fuzzing test technique that can be applied in a heterogeneous environment. As seed configuration can significantly influence the test result in a black box test, we update the seed pool using test cases that travel different program paths compared to the seed. The input, output, and Delta times are used to determine if a new program area has been searched in the black box environment. We experimentally verified the effectiveness of the proposed.

### Fw‐fuzz: A code coverage‐guided fuzzing framework for network protocols on firmware (2020)

Abstract: Fuzzing is an effective approach to detect software vulnerabilities utilizing changeable generated inputs. However, fuzzing the network protocol on the firmware of IoT devices is limited by inefficiency of test case generation, cross‐architecture instrumentation, and fault detection. In this article, we propose the Fw‐fuzz, a coverage‐guided and crossplatform framework for fuzzing network services running in the context of firmware on embedded architectures, which can generate more valuable test cases by introspecting program runtime information and using a genetic algorithm model. Specifically, we propose novel dynamic instrumentation in Fw‐fuzz to collect the running state of the firmware program. Then Fw‐fuzz adopts a genetic algorithm model to guide the generation of inputs with high code coverage. We fully implement the prototype system of Fw‐fuzz and conduct evaluations on network service programs of various architectures in MIPS, ARM, and PPC. By comparing with the protocol fuzzers Boofuzz and Peach in metrics of edge coverage, our prototype system achieves an average growth of 33.7% and 38.4%, respectively. We further verify six known vulnerabilities and discover 5 0‐day vulnerabilities with the Fw‐fuzz, which prove the validity and utility of our framework. The overhead of our system expressed as an additional 5% of memory growth.

### BaseSAFE: Baseband SAnitized Fuzzing through Emulation (WiSec 2020)

Abstract: Rogue base stations are an effective attack vector. Cellular basebands represent a critical part of the smartphone’s security: they parse large amounts of data even before authentication. They can, therefore, grant an attacker a very stealthy way to gather information about calls placed and even to escalate to the main operating system, over-the-air. In this paper, we discuss a novel cellular fuzzing framework that aims to help security researchers find critical bugs in cellular basebands and similar embedded systems. BaseSAFE allows partial rehosting of cellular basebands for fast instrumented fuzzing off-device, even for closed-source firmware blobs. BaseSAFE’s sanitizing drop-in allocator, enables spotting heap-based buffer-overflows quickly. Using our proof-of-concept harness, we fuzzed various parsers of the Nucleus RTOS-based MediaTek cellular baseband that are accessible from rogue base stations. The emulator instrumentation is highly optimized, reaching hundreds of executions per second on each core for our complex test case, around 15k test-cases per second in total. Furthermore, we discuss attack vectors for baseband modems. To the best of our knowledge, this is the first use of emulation-based fuzzing for security testing of commercial cellular basebands. Most of the tooling and approaches of BaseSAFE are also applicable for other low-level kernels and firmware. Using BaseSAFE, we were able to find memory corruptions including heap out-of-bounds writes using our proof-of-concept fuzzing harness in the MediaTek cellular baseband. BaseSAFE, the harness, and a large collection of LTE signaling message test cases will be released open-source upon publication of this paper.

### Poster: Fuzzing IoT Firmware via Multi-stage Message Generation (CCS 2019)

Abstract: In this work, we present IoTHunter, the first grey-box fuzzer for fuzzing stateful protocols in IoT firmware. IoTHunter addresses the state scheduling problem based on a multi-stage message generation mechanism on runtime monitoring of IoT firmware. We evaluate IoTHunter with a set of real-world programs, and the result shows that IoTHunter outperforms black-box fuzzer boofuzz, which has a 2.2x, 2.0x, and 2.5x increase for function coverage, block coverage, and edge coverage, respectively. IoTHunter also found five new vulnerabilities in the firmware of home router Mikrotik, which have been reported to the vendor.

### SeqFuzzer: An Industrial Protocol Fuzzing Framework in Deep Learning Perspective (ICST 2019)

Abstract: Industrial networks are the cornerstone of modern industrial control systems. Performing security checks of industrial communication processes help detect unknown risks and vulnerabilities. Fuzz testing is a widely used method for performing security checks that takes advantage of automation. However, there is a big challenge to carry out security checks on the industrial networks due to the increasing variety and complexity of industrial communication protocols. In this case, existing approaches usually take a long time to model the protocol for generating test cases, which is labor-intensive and timeconsuming. This becomes even worse when the target protocol is stateful. To help in addressing this problem, we employed a deep learning model to learn the structures of protocol frames and deal with the temporal features of stateful protocols. We propose a fuzzing framework named SeqFuzzer which automatically learns the protocol frame structures from communication traffic and generates fake but plausible messages as test cases. For proving the usability of our approach, we applied SeqFuzzer to widelyused Ethernet for Control Automation Technology (EtherCAT) devices and successfully detected several security vulnerabilities

### SPFuzz: A Hierarchical Scheduling Framework for Stateful Network Protocol Fuzzing (IEEE Access 2019)

Abstract: In recent years, the fuzzing technology is widely used to detect the software vulnerabilities owing to the coverage improvement in the target program and the easiness of use. However, it is less efficient to fuzz the stateful protocols due to the difficulties like maintaining states and dependencies of messages. To address these challenges, we present SPFuzz, a framework for building flexible, coverage guided stateful protocol fuzzing. We define a language in SPFuzz to describe the protocol specifications, protocol states transitions and dependencies for generating valuable test cases, maintaining correct messages in session states and handling protocol dependencies by updating message data in time. The SPFuzz adopts a three-level mutation strategy, namely head, content, and sequence mutation strategy to drive the fuzzing process to cover more paths, in conjunction with the method to randomly assign weights to messages and strategies. We use the following metrics to evaluate the performance of SPFuzz and other frameworks upon three protocol implementations, i.e., Proftpd, Oftpd, and OpenSSL, which are three-granularity coverages specifically function, basic block, and edge. In experiments, the SPFuzz framework outperforms the existing stateful protocol fuzzing tool Boofuzz by an average of 69.12% in three granularities coverage tests. This demonstrates that the SPFuzz has the ability to explore more and deeper paths of the target program. We further triggered CVE-2015-0291 in OpenSSL 1.0.2 with the SPFuzz, which proves the validity and utility of our framework.

### HFuzz: Towards automatic fuzzing testing of NB-IoT core network protocols implementations (FGCS 2019)

Abstract: Narrowband Internet of Things (NB-loT) is widely deployed in the cellular network of operators, yet implementations of its core network protocols are suffering from bugs. Due to the complexity of the frame structure of NB-IoT core network protocols, testing the protocols in this field is notoriously difficult. In this paper, we propose a novel fuzzing framework, named HFuzz, to generate a great many high-quality test inputs automatically. HFuzz is an automatic hierarchy-aware fuzzing framework and can allocate computing resources efficiently. We put forward the concept of Message Structure Tree to transform the seed file and generate mutated data of the tested protocols and optimize the resource allocation for each hierarchy of the transformed structure by a novel scheduling algorithm. Therefore HFuzz can get a balance between breadth and depth in finding new paths. Compared to traditional fuzzing tools, HFuzz can easily pass the early verification and induce a better coverage of the target implementations by taking full advantage of format information of NB-IoT core network protocols. Our framework applies to various protocols, and we evaluate the performance of HFuzz on GPRS Tunnelling Protocol version 2(GTPv2) in this paper and conduct experiments with two protocol implementations, Open Air Interface (OAI) and B*(a development system). The experimental results show HFuzz yields higher coverage than American Fuzzy Lop (AFL) and Peach, and we further find a real implementation bug in OAI.

### FIRM-AFL: High-Throughput Greybox Fuzzing of IoT Firmware via Augmented Process Emulation (USENIX Security2019)

Abstract: Cyber attacks against IoT devices are a severe threat. These attacks exploit software vulnerabilities in IoT firmware. Fuzzing is an effective software testing technique for vulnerability discovery. In this work, we present FIRM-AFL, the first high-throughput grey box fuzzer for IoT firmware. FIRMAFL addresses two fundamental problems in IoT fuzzing. First, it addresses compatibility issues by enabling fuzzing for POSIX-compatible firmware that can be emulated in a system emulator. Second, it addresses the performance bottleneck caused by system-mode emulation with a novel technique called augmented process emulation. By combining system mode emulation and user-mode emulation in a novel way, augmented process emulation provides high compatibility as system-mode emulation and high throughput as user-mode emulation. Our evaluation results show that (1) FIRM-AFL is fully functional and capable of finding real-world vulnerabilities in IoT programs; (2) the throughput of FIRM-AFL is on average 8.2 times higher than system-mode emulation based fuzzing; and (3) FIRM-AFL is able to find 1-day vulnerabilities much faster than system-mode emulation based fuzzing, and is able to find 0-day vulnerabilities.

### Exploring Effective Fuzzing Strategies to Analyze Communication Protocols (FEAST 2019)

Abstract: In recent years, coverage-based greybox fuzzing has become popular forvulnerability detection due to its simplicity and efficiency. However, it is less powerful when applied directly to protocol fuzzing due to the unique challenges involved in fuzzing communication protocols. In particular, the communication among multiple ends contains more than one packet, which are not necessarily dependent upon each other, i.e., fuzzing single (usually the first) packet can only achieve extremely limited code coverage. In this paper, we study such challenges and demonstrate the limitation of current non-stateful greybox fuzzer. In order to achieve higher code coverage, we design stateful protocol fuzzing strategies for communication protocols to explore the code related to different protocol states. Our approach contains a state switching engine, together with a multi-state forkserver to consistently and flexibly fuzz different states of an compiler-instrumented protocol program. Our experimental results on OpenSSL show that our approach achieves an improvement of 73% more code coverage and 2× unique crashes when comparing against fuzzing the first packet during a protocol handshake.

### Leveraging Textual Specifications for Grammar-Based Fuzzing of Network Protocols (AAAI 2019)

Abstract: Grammar-based fuzzing is a technique used to find software vulnerabilities by injecting well-formed inputs generated following rules that encode application semantics. Most grammar-based fuzzers for network protocols rely on human experts to manually specify these rules. In this work we study automated learning of protocol rules from textual specifications (i.e. RFCs). We evaluate the automatically extracted protocol rules by applying them to a state-of-the-art fuzzer for transport protocols and show that it leads to a smaller number of test cases while finding the same attacks as the system that uses manually specified rules.

### MTF-Storm: a high performance fuzzer for Modbus-TCP (ETFA 2018)

Abstract: MTF-Storm is a highly effective fuzzer for industrial systems employing Modbus/TCP connectivity. It achieves high fault coverage, while offering high performance and quick testing of the System-Under-Test (SUT). Analogously to its predecessor MTF, MTF-Storm operates in 3 phases: a) reconnaissance b) fuzz testing and failure detection. Reconnaissance identifies the memory organization of the SUT and the supported functionality, enabling selection and synthesis of fuzz testing sequences that are effective for the specific SUT. MTF-Storm develops its test sequences systematically, starting with single field tests and proceeding with combined field tests, adopting techniques for automated combinatorial software testing and reducing the test space through partitioning field value ranges. MTF-Storm has been used to evaluate 9 different Modbus/TCP implementations and has identified issues with all of them, ranging from out-of-spec responses to successful denial-of-service attacks and crashes.

### Advancing Protocol Fuzzing for Industrial Automation and Control Systems (ICISSP 2018)

Abstract: Testing for security vulnerabilities is playing an important role in the changing domain of industrial automation and control systems. These systems are increasingly connected to each other via networking technology and are faced with new cyber threats. To improve the security properties of such systems, their robustness must be ensured. Security testing frameworks aim at enabling the assurance of robustness even at the time of development and can play a key role in bringing security into the industrial domain.

Fuzzing describes a technique to discover vulnerabilities in technical systems and is best known from its usage in IT security testing. It uses randomly altered data to provoke unexpected behaviour and can be used in combination with regular unit testing. Combined with the power of fuzzing, the effectiveness of security testing frameworks can be increased.

In this work, different fuzzing tools were evaluated for their properties and then compared with the requirements for an application in the industrial domain. As no fuzzer was fully satisfying these requirements, a new fuzzer, combining the strength of different others, was designed and implemented, and then evaluated. The evaluation includes a real-world application where multiple vulnerabilities in industrial automation components could be identified.

### IoTFuzzer: Discovering Memory Corruptions in IoT Through App-based Fuzzing (NDSS 2018)

Abstract: With more IoT devices entering the consumer market, it becomes imperative to detect their security vulnerabilities before an attacker does. Existing binary analysis based approaches only work on firmware, which is less accessible except for those equipped with special tools for extracting the code from the device. To address this challenge in IoT security analysis, we present in this paper a novel automatic fuzzing framework, called IOTFUZZER, which aims at finding memory corruption vulnerabilities in IoT devices without access to their firmware images. The key idea is based upon the observation that most IoT devices are controlled through their official mobile apps, and such an app often contains rich information about the protocol it uses to communicate with its device. Therefore, by identifying and reusing program-specific logic (e.g., encryption) to mutate the test case (particularly message fields), we are able to effectively probe IoT targets without relying on any knowledge about its protocol specifications. In our research, we implemented IOTFUZZER and evaluated 17 real-world IoT devices running on different protocols, and our approach successfully identified 15 memory corruption vulnerabilities (including 8 previously unknown ones).

### Bbuzz: A Bit-aware Fuzzing Framework for Network Protocol Systematic Reverse Engineering and Analysis (MCC 2017)

Abstract: Fuzzing is a critical part of secure software development life-cycle, for finding vulnerabilities, developing exploits, and reverse engineering. This relies on appropriate approaches, tools and frameworks. File and protocol fuzzing is well covered, multiple approaches and implementations exist. Unfortunately, assessed tools do not posses the required capabilities for working with protocols, where constructing bit groups are not byte aligned. In this paper, a systematic approach is proposed and tool prototype developed for the cyber red teaming purposes. In a case study, the developed Bbuzz tool is used to reverse engineer a proprietary NATO Link-1 network protocol allowing to inject rogue airplane tracks into air operations command and control system.

### Test Data Generation for Stateful Network Protocol Fuzzing Using a Rule-Based State Machine (2016)

Abstract: To improve the efficiency and coverage of stateful network protocol fuzzing, this paper proposes a new method, using a rule-based state machine and a stateful rule tree to guide the generation of fuzz testing data. The method first builds a rule-based state machine model as a formal description of the states of a network protocol. This removes safety paths, to cut down the scale of the state space. Then it uses a stateful rule tree to describe the relationship between states and messages, and then remove useless items from it. According to the message sequence obtained by the analysis of paths using the stateful rule tree and the protocol specification, an abstract data model of test case generation is defined. The fuzz testing data is produced by various generation algorithms through filling data in the fields of the data model. Using the rule-based state machine and the stateful rule tree, the quantity of test data can be reduced. Experimental results indicate that our method can discover the same vulnerabilities as traditional approaches, using less test data, while optimizing test data generation and improving test efficiency.

### Protocol State Fuzzing of TLS Implementations (USENIX Security2015)

Abstract: We describe a largely automated and systematic analysis of TLS implementations by what we call ‘protocol state fuzzing’: we use state machine learning to infer state machines from protocol implementations, using only blackbox testing, and then inspect the inferred state machines to look for spurious behaviour which might be an indication of flaws in the program logic. For detecting the presence of spurious behaviour the approach is almost fully automatic: we automatically obtain state machines and any spurious behaviour is then trivial to see. Detecting whether the spurious behaviour introduces exploitable security weaknesses does require manual investigation. Still, we take the point of view that any spurious functionality in a security protocol implementation is dangerous and should be removed.

We analysed both server- and client-side implementations with a test harness that supports several key exchange algorithms and the option of client certificate authentication. We show that this approach can catch an interesting class of implementation flaws that is apparently common in security protocol implementations: in three of the TLS implementations analysed new security flaws were found (in GnuTLS, the Java Secure Socket Extension, and OpenSSL). This shows that protocol state fuzzing is a useful technique to systematically analyse security protocol implementations. As our analysis of different TLS implementations resulted in different and unique state machines for each one, the technique can also be used for fingerprinting TLS implementations.

### A Modbus-TCP Fuzzer for testing internetworked industrial systems (ETFA 2015)

Abstract: Modbus/TCP is a network protocol for industrial communications encapsulated in TCP/IP network packets. There is an increasing need to test existing Modbus protocol implementations for security vulnerabilities, as devices become accessible even from the Internet. Fuzz testing can be used to discover implementation bugs in a fast and economical way. We present the design and implementation of MTF, a Modbus/TCP Fuzzer. The MTF incorporates a reconnaissance phase in the testing procedure so as to assist mapping the capabilities of the tested device and to adjust the attack vectors towards a more guided and informed testing rather than plain random testing. The MTF was used to test eight implementations of the Modbus protocol and revealed bugs and vulnerabilities that crash the execution, effectively resulting in denial of service attacks using only a few network packets.

### PULSAR: Stateful Black-Box Fuzzing of Proprietary Network Protocols (Springer, Cham, 2015)

Abstract: The security of network services and their protocols critically depends on minimizing their attack surface. A single flaw in an implementation can suffice to compromise a service and expose sensitive data to an attacker. The discovery of vulnerabilities in protocol implementations, however, is a challenging task: While for standard protocols this process can be conducted with regular techniques for auditing, the situation becomes difficult for proprietary protocols if neither the program code nor the specification of the protocol are easily accessible. As a result, vulnerabilities in closed-source implementations can often remain undiscovered for a longer period of time. In this paper, we present PULSAR, a method for stateful black-box fuzzing of proprietary network protocols. Our method combines concepts from fuzz testing with techniques for automatic protocol reverse engineering and simulation. It proceeds by observing the traffic of a proprietary protocol and inferring a generative model for message formats and protocol states that can not only analyze but also simulate communication. During fuzzing this simulation can effectively explore the protocol state space and thereby enables uncovering vulnerabilities deep inside the protocol implementation. We demonstrate the efficacy of PULSAR in two case studies, where it identifies known as well as unknown vulnerabilities.

### SECFUZZ: Fuzz-testing Security Protocols (AST 2012)

Abstract: We propose a light-weight, yet effective, technique for fuzz-testing security protocols. Our technique is modular, it exercises (stateful) protocol implementations in depth, and handles encrypted traffic. We use a concrete implementation of the protocol to generate valid inputs, and mutate the inputs using a set of fuzz operators. A dynamic memory analysis tool monitors the execution as an oracle to detect he vulnerabilities exposed by fuzz-testing. We provide the fuzzer with the necessary keys and cryptographic algorithms in order to properly mutate encrypted messages. We present a case study on two widely used, mature implementations of the Internet Key Exchange (IKE) protocol and report on two new vulnerabilities discovered by our fuzz-testing tool. We also compare the effectiveness of our technique to two existing model-based fuzz-testing tools for IKE.

### Extension of SPIKE for Encrypted Protocol Fuzzing (2011)

Abstract: A fuzzer is a program that attempts to find security vulnerabilities in an application by sending random or semi-random input. Fuzzers have been widely used to find vulnerabilities in protocol implementations. The implementations may conform to the design of the protocol, but most of the times some glitches might remain. As a result vulnerabilities might remain unnoticed. Consequently, different implementations of the same protocol may be vulnerable to different kind of attacks. Fuzzers help us discover such implementation flaws. Among the currently available and popular ones, SPIKE is one recognized open-source fuzzing framework. However, SPIKE has a limitation of fuzzing only non-encrypted protocols. This paper presents the extension of SPIKE, called ESPIKE, for fuzzing of encrypted protocols. ESPIKE will facilitate testing implementations of SSL encrypted protocols. As a proof of concept for efficiency of ESPIKE we demonstrate its usage on sftp and https protocol.

### AutoFuzz: Automated Network Protocol Fuzzing Framework (IJCSNS 2010)

Abstract: Assessing software security involves steps such as code review, risk analysis, penetration testing and fuzzing. During the fuzzing phase, the tester‟s goal is to find flaws in software by sending unexpected input to the target application and monitoring its behavior. In this paper we introduce the AutoFuzz [1] - extendable, open source framework used for testing network protocol implementations. AutoFuzz is a „smart‟, man-in-the-middle, semi deterministic network protocol fuzzing framework. AutoFuzz learns a protocol implementation by constructing a Finite State Automaton (FSA) which captures the observed communications between a client and a server [5]. In addition, AutoFuzz learns individual message syntax, including fields and probable types, by applying the bioinformatics techniques of [2]. Finally, AutoFuzz can fuzz client or server protocol implementations by intelligently modifying the communication sessions between them using the FSA as a guide. AutoFuzz was applied to a variety of File Transfer Protocol (FTP) server implementations, confirming old and discovering new vulnerabilities.

# SMT Fuzzing

### Alt-Ergo-Fuzz: A fuzzer for the Alt-Ergo SMT solver (JFLA 2022)

Alt-Ergo is an open source Satisfiability Modulo Theories (SMT) solver programmed in OCaml. It was designed for program verification and it’s used as a back end by other software verification tools such as Frama-C, SPARK, Why3, Atelier-B and Caveat, the reliability of which depends on the soundness of Alt-Ergo’s answers and the absence of bugs in it. Fuzzing is an efficient technique to test programs and find bugs. It works by quickly and automatically generating input data with which to test the software. American Fuzzy Lop (AFL) is one of the most well-known and most used fuzzers in both academia and the industry. It has managed to find many bugs in various programs thanks to its grey box fuzzing technique that uses genetic algorithms and program instrumentation to generate test data that maximizes code and execution path coverage in the targeted software. In this paper we present Alt-Ergo-Fuzz, a fuzzer for Alt-Ergo that we developed with the aim of finding faults and unsoundness bugs to solve and improve its reliability. By using AFL as a back end, the Crowbar OCaml library for test case generation and the CVC5 SMT solver as a reference solver of which the answers will be used to determine whether or not Alt-Ergo’s answers are correct, we managed to develop Alt-Ergo-Fuzz, which even as a work in progress and in only twenty days of testing managed to find four never found before bugs in Alt-Ergo.

### BanditFuzz: Fuzzing SMT Solvers with Multi-agent Reinforcement Learning (FM 2021)

Abstract: We present BanditFuzz, a multi-agent reinforcement learning (RL) guided performance fuzzer for state-of-the-art Satisfiability Modulo Theories (SMT) solvers. BanditFuzz constructs inputs that expose performance issues in a set of target solvers relative to a set of reference solvers, and is the first performance fuzzer that supports the entirety of the theories in the SMT-LIB initiative. Another useful feature of BanditFuzz is that users can specify the size of inputs they want, thus enabling developers to construct very small inputs that zero-in on a performance problem in their SMT solver relative to other competitive solvers. We evaluate BanditFuzz across 52 logics from SMT-COMP ’20 targeting competition-winning solvers against runner-ups. We baseline BanditFuzz against random fuzzing and a single agent algorithm and observe a significant improvement, with up to a 82.6% improvement in the margin of PAR-2 scores across baselines on their respective benchmarks. Furthermore, we reached out to developers and contributors of the CVC4, Z3, and Bitwuzla solvers and provide case studies of how BanditFuzz was able to expose surprising performance deficiencies in each of these tools.

### Skeletal Approximation Enumeration for SMT Solver Testing (FSE 2021)

Abstract: Ensuring the equality of SMT solvers is critical due to its broad spectrum of applications in academia and industry, such as symbolic execution and program verification. Existing approaches to testing SMT solvers are either too costly or find difficulties generalizing to different solvers and theories, due to the test oracle problem. To complement existing approaches and overcome their weaknesses, this paper introduces skeletal approximation enumeration (SAE), a novel lightweight and general testing technique for all first-order theories. To demonstrate its practical utility, we have applied the SAE technique to test Z3 and CVC4, two comprehensively tested, state-of-the-art SMT solvers. By the time of writing, our approach had found 71 confirmed bugs in Z3 and CVC4,55 of which had already been fixed.

### Fuzzing SMT Solvers via Two-Dimensional Input Space Exploration (ISSTA 2021)

Abstract: Satisfiability Modulo Theories (SMT) solvers serve as the core engine of many techniques, such as symbolic execution. Therefore, ensuring the robustness and correctness of SMT solvers is critical. While fuzzing is an efficient and effective method for validating the quality of SMT solvers, we observe that prior fuzzing work only focused on generating various first-order formulas as the inputs but neglected the algorithmic configuration space of an SMT solver, which leads to under-reporting many deeply-hidden bugs. In this paper, we present Falcon, a fuzzing technique that explores both the formula space and the configuration space. Combining the two spaces significantly enlarges the search space and makes it challenging to detect bugs efficiently. We solve this problem by utilizing the correlations between the two spaces to reduce the search space, and introducing an adaptive mutation strategy to boost the search efficiency. During six months of extensive testing, Falcon finds 518 confirmed bugs in CVC4 and Z3, two state-of-the-art SMT solvers, 469 of which have already been fixed. Compared to two state-of-the-art fuzzers, Falcon detects 38 and 44 more bugs and improves the coverage by a large margin in 24 hours of testing.

### Detecting Critical Bugs in SMT Solvers Using Blackbox Mutational Fuzzing (FSE 2020)

Abstract: Formal methods use SMT solvers extensively for deciding formula satisfiability, for instance, in software verification, systematic test generation, and program synthesis. However, due to their complex implementations, solvers may contain critical bugs that lead to unsound results. Given the wide applicability of solvers in software reliability, relying on such unsound results may have detrimental consequences. In this paper, we present STORM, a novel blackbox mutational fuzzing technique for detecting critical bugs in SMT solvers. We run our fuzzer on seven mature solvers and find 29 previously unknown critical bugs. STORM is already being used in testing new features of popular solvers before deployment.

### On the Unusual Effectiveness of Type-aware Mutations for Testing SMT Solvers

Abstract: We propose type-aware operator mutation, a simple, but unusually effective approach for testing SMT solvers. The key idea is to mutate operators of conforming types within the seed formulas to generate well-typed mutant formulas. These mutant formulas are then used as the test cases for SMT solvers. We realized typeaware operator mutation within the OpFuzz tool and used it to stress-test Z3 and CVC4, two state-of-the-art SMT solvers. Type-aware operator mutations are unusually effective: During nine months of extensive testing with OpFuzz, we reported 909 bugs in Z3 and CVC4,1 out of which 632 bugs were confirmed and 531 of the confirmed bugs were fixed by the developers. The detected bugs are highly diverse — we found bugs of many different types (soundness bugs, invalid model bugs, crashes, etc.), logics and solver configurations. We have further conducted an in-depth study on the bugs found by OpFuzz. The study results show that the bugs found by OpFuzz are of high quality. Many of them affect core components of the SMT solvers’ codebases, and some required major changes for the developers to fix. Among the 909 bugs found by OpFuzz, 130 were soundness bugs, the most critical bugs in SMT solvers, and 501 were in the default modes of the solvers. Notably, OpFuzz found 16 critical soundness bugs in CVC4, which has proved to be a very stable SMT solver

### BanditFuzz: Fuzzing SMT Solvers with Reinforcement Learning (2020)

Satisfiability Modulo Theories (SMT) solvers are fundamental tools in the broad context of software engineering and security research. If SMT solvers are to continue to have an impact, it is imperative we develop efficient and systematic testing methods for them. To this end, we present a reinforcement learning driven fuzzing system BanditFuzz that zeroes in on the grammatical constructs of well-formed solver inputs that are the root cause of performance or correctness issues in solvers-under-test. To the best of our knowledge, BanditFuzz is the first machine-learning based fuzzer for SMT solvers. BanditFuzz takes as input a grammar G describing the well-formed inputs to a set of distinct solvers (say, P_1 and P_2) that implement the same specification and a fuzzing objective (e.g., maximize the relative performance difference between P_1 and P_2), and outputs a ranked list of grammatical constructs that are likely to maximize performance differences between P_1 and P_2 or are root causes of errors in these solvers. Typically, mutation fuzzing is implemented as a set of random mutations applied to a given input. By contrast, the key innovation behind BanditFuzz is the modeling of a grammar-preserving fuzzing mutator as a reinforcement learning (RL) agent that, via blackbox interactions with programs-under-test, learns which grammatical constructs are most likely the cause of an error or performance issue. Using BanditFuzz, we discovered 1700 syntactically unique inputs resulting in inconsistent answers across state-of-the-art SMT solvers Z3, CVC4, Colibri, MathSAT, and Z3str3 over the floating-point and string SMT theories. Further, using BanditFuzz, we constructed two benchmark suites (with 400 floating-point and 110 string instances) that expose performance issues in all considered solvers. We also performed a comparison of BanditFuzz against random, mutation, and evolutionary fuzzing methods. We observed up to a 31% improvement in performance fuzzing and up to 81% improvement in the number of bugs found by BanditFuzz relative to these other methods for the same amount of time provided to all methods.

### Validating SMT Solvers via Semantic Fusion (PLDI 2020)

Abstract: We introduce Semantic Fusion, a general, effective methodology for validating Satisfiability Modulo Theory (SMT) solvers. Our key idea is to fuse two existing equisatisfiable (i.e., both satisfiable or unsatisfiable) formulas into a new formula that combines the structures of its ancestors in a novel manner and preserves the satisfiability by construction. This fused formula is then used for validating SMT solvers. We realized Semantic Fusion as YinYang, a practical SMT solver testing tool. During four months of extensive testing, YinYang has found 45 confirmed, unique bugs in the default arithmetic and string solvers of Z3 and CVC4, the two state-of-the-art SMT solvers. Among these, 41 have already been fixed by the developers. The majority (29/45) of these bugs expose critical soundness issues. Our bug reports and testing effort have been well-appreciated by SMT solver developers.

### Automatically Testing String Solvers (ICSE 2020)

Abstract: SMT solvers are at the basis of many applications, such as program verification, program synthesis, and test case generation. For all these applications to provide reliable results, SMT solvers must answer queries correctly. However, since they are complex, highly-optimized software systems, ensuring their correctness is challenging. In particular, state-of-the-art testing techniques do not reliably detect when an SMT solver is unsound.

In this paper, we present an automatic approach for generating test cases that reveal soundness errors in the implementations of string solvers, as well as potential completeness and performance issues. We synthesize input formulas that are satisfiable or unsatisfiable by construction and use this ground truth as test oracle. We automatically apply satisfiability-preserving transformations to generate increasingly-complex formulas, which allows us to detect many errors with simple inputs and, thus, facilitates debugging.

The experimental evaluation shows that our technique effectively reveals bugs in the implementation of widely-used SMT solvers and applies also to other types of solvers, such as automata-based solvers. We focus on strings here, but our approach carries over to other theories and their combinations.

### StringFuzz: A fuzzer for string solvers (CAV 2018)

Abstract: In this paper, we introduce StringFuzz: a modular SMT-LIB problem instance transformer and generator for string solvers. We supply a repository of instances generated by StringFuzz in SMT-LIB 2.0/2.5 format. We systematically compare Z3str3, CVC4, Z3str2, and Norn on groups of such instances, and identify those that are particularly challenging for some solvers. We briefly explain our observations and show how StringFuzz helped discover causes of performance degradations in Z3str3.

# Anti Fuzzing

### Antifuzz: impeding fuzzing audits of binary executables (USENIX Security2019)

Abstract: A general defense strategy in computer security is to increase the cost of successful attacks in both computational resources as well as human time. In the area of binary security, this is commonly done by using obfuscation methods to hinder reverse engineering and the search for software vulnerabilities. However, recent trends in automated bug finding changed the modus operandi. Nowadays it is very common for bugs to be found by various fuzzing tools. Due to ever-increasing amounts of automation and research on better fuzzing strategies, large-scale, dragnet-style fuzzing of many hundreds of targets becomes viable. As we show, current obfuscation techniques are aimed at increasing the cost of human understanding and do little to slow down fuzzing. In this paper, we introduce several techniques to protect a binary executable against an analysis with automated bug finding approaches that are based on fuzzing, symbolic/concolic execution, and taint-assisted fuzzing (commonly known as hybrid fuzzing). More specifically, we perform a systematic analysis of the fundamental assumptions of bug finding tools and develop general countermeasures for each assumption. Note that these techniques are not designed to target specific implementations of fuzzing tools, but address general assumptions that bug finding tools necessarily depend on. Our evaluation demonstrates that these techniques effectively impede fuzzing audits, while introducing a negligible performance overhead. Just as obfuscation techniques increase the amount of human labor needed to find a vulnerability, our techniques render automated fuzzing-based approaches futile.

### FUZZIFICATION: Anti-Fuzzing Technique (USENIX Security2019)

Abstract: Fuzzing is a software testing technique that quickly and automatically explores the input space of a program without knowing its internals. Therefore, developers commonly use fuzzing as part of test integration throughout the software development process. Unfortunately, it also means that such a blackbox and the automatic natures of fuzzing are appealing to adversaries who are looking for zero-day vulnerabilities. To solve this problem, we propose a new mitigation approach, called Fuzzification , that helps developers protect the released, binary-only software from attackers who are capable of applying state-of-the-art fuzzing techniques. Given a performance budget, this approach aims to hinder the fuzzing process from adversaries as much as possible. We propose three Fuzzification techniques: 1) SpeedBump, which amplifies the slowdown in normal executions by hundreds of times to the fuzzed execution, 2) BranchTrap, interfering with feedback logic by hiding paths and polluting coverage maps, and 3) AntiHybrid, hindering taint-analysis and symbolic execution. Each technique is designed with best-effort, defensive measures that attempt to hinder adversaries from bypassing Fuzzification. Our evaluation on popular fuzzers and real-world applications shows that Fuzzification effectively reduces the number of discovered paths by 70.3% and decreases the number of identified crashes by 93.0% from real-world binaries, and decreases the number of detected bugs by 67.5% from LAVA-M dataset while under user-specified overheads for common workloads. We discuss the robustness of Fuzzification techniques against adversarial analysis techniques. We open-source our Fuzzification system to foster future research.

# Kernel Fuzzing

### Demystifying the Dependency Challenge in Kernel Fuzzing (ICSE 2022)

Abstract: Fuzz testing operating system kernels remains a daunting task to date. One known challenge is that much of the kernel code is locked under specific kernel states and current kernel fuzzers are not effective in exploring such an enormous state space. We refer to this problem as the dependency challenge. Though there are some efforts trying to address the dependency challenge, the prevalence and categorization of dependencies have never been studied. Most prior work simply attempted to recover dependencies opportunistically whenever they are relatively easy to recognize. In this paper, we undertake a substantial measurement study to systematically understand the real challenge behind dependencies. To our surprise, we show that even for well-fuzzed kernel modules, unresolved dependencies still account for 59% - 88% of the uncovered branches. Furthermore, we show that the dependency challenge is only a symptom rather than the root cause of failing to achieve more coverage. By distilling and summarizing our findings, we believe the research provides valuable guidance to future research in kernel fuzzing. Finally, we propose a number of novel research directions directly based on the insights gained from the measurement study.

### Semantic-Informed Driver Fuzzing Without Both the Hardware Devices and the Emulators (ICSE 2022)

Abstract: Device drivers are security-critical. In monolithic kernels like Linux, there are hundreds of thousands of drivers which run in the same privilege as the core kernel. Consequently, a bug in a driver can compromise the whole system. More critically, drivers are particularly buggy. First, drivers receive complex and untrusted inputs from not only the user space but also the hardware. Second, the driver code can be developed by less-experienced third parties, and is less tested because running a driver requires the corresponding hardware device or the emulator. Therefore, existing studies show that drivers tend to have a higher bug density and have become a major security threat. Existing testing techniques have to focus the fuzzing on a limited number of drivers that have the corresponding devices or the emulators, thus cannot scale.

In this paper, we propose a device-free driver fuzzing system, DR. FUZZ, that does not require hardware devices to fuzztest drivers. The core of DR. FUZZ is a semantic-informed mechanism that efficiently generates inputs to properly construct relevant data structures to pass the “validation chain” in driving initialization, which enables subsequent device-free driver fuzzing. The elimination of the needs for the hardware devices and the emulators removes the bottleneck in driver testing. The semanticinformed mechanism incorporates multiple new techniques to make device-free driver fuzzing practical: inferring valid input values for passing the validation chain in initialization, inferring the temporal usage order of input bytes to minimize mutation space, and employing error states as a feedback to direct the fuzzing going through the validation chain. Moreover, the semantic-informed mechanism is generic; we can also instruct it to generate semi-malformed inputs for a higher code coverage. We evaluate DR. FUZZ on 214 Linux drivers. With an only 24-hour time budget, DR. FUZZ can successfully initialize and enable most of the drivers without the corresponding devices, whereas existing fuzzers like syzkaller cannot succeed in any case. DR. FUZZ also significantly outperforms existing driver fuzzers that are even equipped with the device or emulator in other aspects: it increases the code coverage by 70% and the throughput by 18%. With DR. FUZZ, we also find 46 new bugs in these Linux drivers.

### SyzScope: Revealing High-Risk Security Impacts of Fuzzer-Exposed Bugs in Linux kernel (USENIX SEC’22)

Abstract: Fuzzing has become one of the most effective bug finding approach for software. In recent years, 24*7 continuous fuzzing platforms have emerged to test critical pieces of software, e.g., Linux kernel. Though capable of discovering many bugs and providing reproducers (e.g., proof-of-concepts), a major problem is that they neglect a critical function that should have been built-in, i.e., evaluation of a bug’s security impact. It is well-known that the lack of understanding of security impact can lead to delayed bug fixes as well as patch propagation. In this paper, we develop SyzScope, a system that can automatically uncover new “high-risk” impacts given a bug with seemingly “low-risk” impacts. From analyzing over a thousand low-risk bugs on syzbot, SyzScope successfully determined that 183 low-risk bugs (more than 15%) in fact contain high-risk impacts, e.g., control flow hijack and arbitrary memory write, some of which still do not have patches available yet.

### HEALER: Relation Learning Guided Kernel Fuzzing (SOSP 2021)

Abstract: Modern operating system kernels are too complex to be free of bugs. Fuzzing is a promising approach for vulnerability detection and has been applied to kernel testing. However, existing work does not consider the influence relations between system calls when generating and mutating inputs, resulting in difficulties when trying to reach into the kernel’s deeper logic effectively.

In this paper, we propose HEALER, a kernel fuzzer that improves fuzzing’s effectiveness by utilizing system call relation learning. HEALER learns the influence relations between system calls by dynamically analyzing minimized test cases. Then, HEALER utilizes the learned relations to guide input generation and mutation, which improves the quality of test cases and the effectiveness of fuzzing. We implemented HEALER and evaluated its performance on recent versions of the Linux kernel. Compared to state-of-the-art kernel fuzzers such as Syzkaller and Moonshine, HEALER improves branch coverage by 28% and 21%, while achieving a speedup of 2.2× and 1.8×, respectively. In addition, HEALER detected 218 vulnerabilities, 33 of which are previously unknown and have been confirmed by the corresponding kernel maintainers.

### Combinatorial Testing of RESTful APIs (ICSE 2022)

Abstract: This paper presents RestCT, a systematic and fully automatic approach that adopts Combinatorial Testing (CT) to test RESTful APIs. RestCT is systematic in that it covers and tests not only the interactions of a certain number of operations in RESTful APIs, but also the interactions of particular input-parameters in every single operation. This is realised by a novel two-phase test case generation approach, which first generates a constrained sequence covering array to determine the execution orders of available operations, and then applies an adaptive strategy to generate and refine several constrained covering arrays to concretise input-parameters of each operation. RestCT is also automatic in that its application relies on only a given Swagger specification of RESTful APIs. The creation of CT test models (especially, the inferring of dependency relationships in operation and input-parameter levels), and the generation and execution of test cases are performed without any human intervention. Experimental results on 11 real-world RESTful APIs demonstrate the effectiveness and efficiency of RestCT. In particular, RestCT can find eight new bugs, where only one of them can be triggered by the state-of-the-art testing tool of RESTful APIs.

### GraphFuzz: Library API Fuzzing with Lifetime-aware Dataflow Graphs (ICSE 2022)

Abstract: We present the design and implementation of GraphFuzz, a new structure-, coverage- and object lifetime-aware fuzzer capable of automatically testing low-level Library APIs. Unlike other fuzzers, GraphFuzz models sequences of executed functions as a dataflow graph, thus enabling it to perform graph-based mutations both at the data and at the execution trace level. GraphFuzz comes with an automated specification generator to minimize the developer integration effort.

We use GraphFuzz to analyze Skia—the rigorously tested Google Chrome graphics library—and benchmark GraphFuzz-generated fuzzing harnesses against hand-optimized, painstakingly written libFuzzer harnesses. We find that GraphFuzz generates test cases that achieve 2-3x more code coverage on average with minimal development effort, and also uncovered previous unknown defects in the process. We demonstrate GraphFuzz’s applicability on low-level APIs by analyzing four additional open-source libraries and finding dozens of previously unknown defects. All security relevant findings have already been reported and fixed by the developers.

Last, we open-source GraphFuzz under a permissible license and provide code to reproduce all results in this paper.

### MOREST: Model-based RESTful API Testing with Execution Feedback (ICSE 2022)

Abstract: RESTful APIs are arguably the most popular endpoints for accessing Web services. Blackbox testing is one of the emerging techniques for ensuring the reliability of RESTful APIs. The major challenge in testing RESTful APIs is the need for correct sequences of API operation calls for in-depth testing. To build meaningful operation call sequences, researchers have proposed techniques to learn and utilize the API dependencies based on OpenAPI specifications. However, these techniques either lack the overall awareness of how all the APIs are connected or the flexibility of adaptively fixing the learned knowledge.

In this paper, we propose MOREST, a model-based RESTful API testing technique that builds and maintains a dynamically updating RESTful-service Property Graph (RPG) to model the behaviors of RESTful-services and guide the call sequence generation. We empirically evaluated MOREST and the results demonstrate that \textsc{Morest} can successfully request an average of 152.66%-232.45% more API operations, cover 26.16%-103.24% more lines of code, and detect 40.64%-215.94% more bugs than state-of-the-art techniques.

In total, we applied MOREST to 6 real-world projects and found 44 bugs (13 of them cannot be detected by existing approaches). Specifically, 2 of the confirmed bugs are from Bitbucket, a famous code management service with more than 6 million users.

### Automated Testing of Software that Uses Machine Learning APIs (ICSE 2022)

Abstract: An increasing number of software applications incorporate machine learning (ML) solutions for cognitive tasks that statistically mimic human behaviors. To test such software, tremendous human effort is needed to design image/text/audio inputs that are relevant to the software, and to judge whether the software is processing these inputs as most human beings do. Even when misbehavior is exposed, it is often unclear whether the culprit is inside the cognitive ML API or the code using the API.

This paper presents Keeper, a new testing tool for software that uses cognitive ML APIs. Keeper designs a pseudo-inverse function for each ML API that reverses the corresponding cognitive task in an empirical way (e.g., an image search engine pseudo-reverses the image-classification API), and incorporates these pseudo-inverse functions into a symbolic execution engine to automatically generate relevant image/text/audio inputs and judge output correctness. Once misbehavior is exposed, Keeper attempts to change how ML APIs are used in software to alleviate the misbehavior. Our evaluation on a variety of open-source applications shows that Keeper greatly improves the branch coverage, while identifying many previously unknown bugs. Our user study with Python developers and application users further demonstrates the efficacy of Keeper.

### Nessie: Automatically Testing JavaScript APIs with Asynchronous Callbacks (ICSE 2022)

Abstract: Previous algorithms for feedback-directed unit test generation iteratively create sequences of API calls by executing partial tests and by adding new API calls at the end of the test. These algorithms are challenged by a popular class of APIs: higher-order functions that receive callback arguments, which often are invoked asynchronously. Existing test generators cannot effectively test such APIs because they only sequence API calls, but do not nest one call into the callback function of another. This paper presents Nessie, the first feedback-directed unit test generator that supports nesting of API calls and that tests asynchronous callbacks. Nesting API calls enables a test to use values produced by an API that are available only once a callback has been invoked, and is often necessary to ensure that methods are invoked in a specific order. The core contributions of our approach are a tree-based representation of unit tests with callbacks and a novel algorithm to iteratively generate such tests in a feedback-directed manner. We evaluate our approach on ten popular JavaScript libraries with both asynchronous and synchronous callbacks. The results show that, in a comparison with LambdaTester, a state of the art test generation technique that only considers sequencing of method calls, Nessie finds more behavioral differences and achieves slightly higher coverage. Notably, Nessie needs to generate significantly fewer tests to achieve and exceed the coverage achieved by the state of the art.

### REST API Fuzzing by Coverage Level Guided Blackbox Testing (2021)

Abstract: With the growth of web applications, REST APIs have become the primary communication method between services. In order to ensure system reliability and security, software quality can be assured by effective testing methods. Black box fuzz testing is one of the effective methods to perform tests on a large scale. However, conventional black box fuzz testing generates random data without judging the quality of the input. We implement a black box fuzz testing method for REST APIs. It resolves the issues of blind mutations without knowing the effectiveness by Test Coverage Level feedback. We also enhance the mutation strategies by reducing the testing complexity for REST APIs, generating more appropriate test cases to cover possible paths. We evaluate our method by testing two large open-source projects and 89 bugs are reported and confirmed. In addition, we find 351 bugs from 64 remote API services in APIs.guru.

### FuzzGen: Automatic Fuzzer Generation (USENIX Security2020)

Abstract: Fuzzing is a testing technique to discover unknown vulnerabilities in software. When applying fuzzing to libraries, the core idea of supplying random input remains unchanged, yet it is non-trivial to achieve good code coverage. Libraries cannot run as standalone programs, but instead are invoked through another application. Triggering code deep in a library remains challenging as specific sequences of API calls are required to build up the necessary state. Libraries are diverse and have unique interfaces that require unique fuzzers, so far written by a human analyst.

To address this issue, we present FuzzGen, a tool for automatically synthesizing fuzzers for complex libraries in a given environment. FuzzGen leverages a whole system analysis to infer the library’s interface and synthesizes fuzzers specifically for that library. FuzzGen requires no human interaction and can be applied to a wide range of libraries. Furthermore, the generated fuzzers leverage LibFuzzer to achieve better code coverage and expose bugs that reside deep in the library.

FuzzGen was evaluated on Debian and the Android Open Source Project (AOSP) selecting 7 libraries to generate fuzzers. So far, we have found 17 previously unpatched vulnerabilities with 6 assigned CVEs. The generated fuzzers achieve an average of 54.94% code coverage; an improvement of 6.94% when compared to manually written fuzzers, demonstrating the effectiveness and generality of FuzzGen.

### Intelligent REST API Data Fuzzing (FSE 2020)

Abstract: The cloud runs on REST APIs. In this paper, we study how to intelligently generate data payloads embedded in REST API requests in order to find data-processing bugs in cloud services. We discuss how to leverage REST API specifications, which, by definition, contain data schemas for API request bodies. We then propose and evaluate a range of data fuzzing techniques, including structural schema fuzzing rules, various rule combinations, search heuristics, extracting data values from examples included in REST API specifications, and learning data values on-the-fly from previous service responses. After evaluating these techniques, we identify the top-performing combination and use this algorithm to fuzz several Microsoft Azure cloud services. During our experiments, we found 100s of “Internal Server Error” service crashes, which we triaged into 17 unique bugs and reported to Azure developers. All these bugs are reproducible, confirmed, and fixed or in the process of being fixed.

### FUDGE: Fuzz Driver Generation at Scale (FSE 2019)

Abstract: At Google we have found tens of thousands of security and robustness bugs by fuzzing C and C++ libraries. To fuzz a library, a fuzzer requires a fuzz driver—which exercises some library code—to which it can pass inputs. Unfortunately, writing fuzz drivers remains a primarily manual exercise, a major hindrance to the widespread adoption of fuzzing. In this paper, we address this major hindrance by introducing the Fudge system for automated fuzz driver generation. Fudge automatically generates fuzz driver candidates for libraries based on existing client code. We have used Fudge to generate thousands of new drivers for a wide variety of libraries. Each generated driver includes a synthesized C/C++ program and a corresponding build script, and is automatically analyzed for quality. Developers have integrated over 200 of these generated drivers into continuous fuzzing services and have committed to address reported security bugs. Further, several of these fuzz drivers have been upstreamed to open source projects and integrated into the OSS-Fuzz fuzzing infrastructure. Running these fuzz drivers has resulted in over 150 bug fixes, including the elimination of numerous exploitable security vulnerabilities.

### REST-ler: Stateful REST API Fuzzing (ICSE 2019)

Abstract: This paper introduces REST-ler, the first stateful REST API fuzzer. REST-ler analyzes the API specification of a cloud service and generates sequences of requests that automatically test the service through its API. REST-ler generates test sequences by (1) inferring producer-consumer dependencies among request types declared in the specification (eg inferring that “a request B should be executed after request A” because B takes as an input a resource-id x produced by A) and by (2) analyzing dynamic feedback from responses observed during prior test executions in order to generate new tests (eg learning that “a request C after a request sequence A;B is refused by the service” and therefore avoiding this combination in the future). We present experimental results showing that these two techniques are necessary to thoroughly exercise a service under test while pruning the large search space of possible request sequences. We used REST-ler to test GitLab, a large open-source self-hosted Git service, as well as several Microsoft Azure and Office365 cloud services. REST-ler found 28 bugs in Gitlab and several bugs in each of the Azure and Office365 cloud services tested so far. These bugs have been confirmed by the service owners, and are either in the process of being fixed or have already been fixed.

### Fuzzing JavaScript Environment APIs with Interdependent Function Calls (IFM 2019)

Abstract: The prevalence of the JavaScript programming language makes the correctness and security of its execution environments highly important. The most exposed and vulnerable parts of these environments are the APIs published to the executed untrusted JavaScript programs. This paper revisits the fuzzing technique that generates JavaScript environment API calls using random walks on so-called prototype graphs to uncover potentially security-related failures. We show the limits of generating independent call expressions, the approach of prior work, and give an extension to enable the generation of interdependent API calls that re-use each other’s results. We demonstrate with an experiment that this enhancement allows our approach to exercise JavaScript environment APIs in ways that were not possible with the previous approach, and that it can also trigger more issues in a real target.

### Systematic Fuzzing and Testing of TLS Libraries (CCS 2016)

Abstract: We present TLS-Attacker, an open source framework for evaluating the security of TLS libraries. TLS-Attacker allows security engineers to create custom TLS message flows and arbitrarily modify message contents using a simple interface in order to test the behavior of their libraries. Based on TLS-Attacker, we present a two-stage fuzzing approach to evaluate TLS server behavior. Our approach automatically searches for cryptographic failures and boundary violation vulnerabilities. It allowed us to find unusual padding oracle vulnerabilities and overflows/overreads in widely used TLS libraries, including OpenSSL, Botan, and MatrixSSL.

Our findings motivate developers to create comprehensive test suites, including positive as well as negative tests, for the evaluation of TLS libraries. We use TLS-Attacker to create such a test suite framework which finds further problems in Botan.

# Hardware Fuzzing

### CPU Fuzzing for Discovering Hardware-caused Information Leakage (2022)

Abstract: In the last years, many microarchitectural attacks, such as cache attacks, Spectre, or Meltdown, have been discovered in CPUs. However, discovering microarchitectural attacks is still a tedious manual process in many cases. Hence, CPUs are still full of vulnerabilities and side channels.

This talk presents our research on the automated discovery of microarchitectural attack vectors. We discuss how we developed Osiris, a fuzzing-based framework that discovers timing side channels in the CPU microarchitecture. In addition to side channels, we present Transynther, an automated approach for finding Meltdown-type transient-execution attacks and generating effective exploitation code for these attacks. Starting from a model of the Meltdown mechanism, Transynther uses a fuzzing-based approach for finding new variants based on known Meltdown-type attacks. With this approach, we did not only re-discover new ways to exploit known attacks but also discovered a new ZombieLoad variant named “Medusa”. Our tools are available as open-source software and can be used to check any x86 CPU for potentially new vulnerabilities. We present different case studies based on our findings to demonstrate that the discovered vulnerabilities are indeed relevant.

### Fuzzing Hardware Like Software (USENIX SEC 2022)

Abstract: Hardware flaws are permanent and potent: hardware cannot be patched once fabricated, and any flaws may undermine even formally verified software executing on top. Consequently, verification time dominates implementation time. The gold standard in hardware Design Verification (DV) is concentrated at two extremes: random dynamic verification and formal verification. Both techniques struggle to root out the subtle flaws in complex hardware that often manifest as security vulnerabilities. The root problem with random verification is its undirected nature, making it inefficient, while formal verification is constrained by the state-space explosion problem, making it infeasible to apply to complex designs. What is needed is a solution that is directed, yet under-constrained.

Instead of making incremental improvements to existing hardware verification approaches, we leverage the observation that existing software fuzzers already provide such a solution; we adapt it for hardware verification, thus leveraging existing—more advanced—software verification tools. Specifically, we translate RTL hardware to a software model and fuzz that model. The central challenge we address is how best to mitigate the differences between the hardware execution model and software execution model. This includes: 1) how to represent test cases, 2) what is the hardware equivalent of a crash, 3) what is an appropriate coverage metric, and 4) how to create a general-purpose fuzzing harness for hardware.

To evaluate our approach, we design, implement, and opensource a Hardware Fuzzing Pipeline that enables fuzzing hardware at scale, using only open-source tools. Using our pipeline, we fuzz four IP blocks from Google’s OpenTitan Root-of-Trust chip. Our experiments reveal a two orders-of-magnitude reduction in run time to achieve Finite State Machine (FSM) coverage over traditional dynamic verification schemes. Moreover, with our design-agnostic harness, we achieve over 88% HDL line coverage in three out of four of our designs—even without any initial seeds.

### PeriScope: An Effective Probing and Fuzzing Framework for the Hardware-OS Boundary (NDSS2019)

Abstract: The OS kernel is an attractive target for remote attackers. If compromised, the kernel gives adversaries full system access, including the ability to install rootkits, extract sensitive information, and perform other malicious actions, all while evading detection. Most of the kernel’s attack surface is situated along the system call boundary. Ongoing kernel protection efforts have focused primarily on securing this boundary; several capable analysis and fuzzing frameworks have been developed for this purpose. However, there are additional paths to kernel compromise that do not involve system calls, as demonstrated by several recent exploits. For example, by compromising the firmware of a peripheral device such as a Wi-Fi chipset and subsequently sending malicious inputs from the Wi-Fi chipset to the Wi-Fi driver, adversaries have been able to gain control over the kernel without invoking a single system call. Unfortunately, there are currently no practical probing and fuzzing frameworks that can help developers find and fix such vulnerabilities occurring along the hardware-OS boundary. We present PeriScope, a Linux kernel based probing framework that enables fine-grained analysis of device-driver interactions. PeriScope hooks into the kernel’s page fault handling mechanism to either passively monitor and log traffic between device drivers and their corresponding hardware, or mutate the data stream on-the-fly using a fuzzing component, PeriFuzz, thus mimicking an active adversarial attack. PeriFuzz accurately models the capabilities of an attacker on peripheral devices, to expose different classes of bugs including, but not limited to, memory corruption bugs and double-fetch bugs. To demonstrate the risk that peripheral devices pose, as well as the value of our framework, we have evaluated PeriFuzz on the Wi-Fi drivers of two popular chipset vendors, where we discovered 15 unique vulnerabilities, 9 of which were previously unknown.

# Other Interesting Fuzzing

### μAFL: Non-intrusive Feedback-driven Fuzzing for Microcontroller Firmware (ICSE 2022)

Abstract: Fuzzing is one of the most effective approaches to find software flaws. However, applying it to microcontroller firmware incurs many challenges. For example, rehosting-based solutions cannot accurately model peripheral behaviors and thus cannot be used to fuzz the corresponding driver code. In this work, we present μAFL, a hardware-assisted approach to fuzzing firmware on real devices. It leverages debugging tools in existing embedded system development to construct an AFL-compatible fuzzing framework. Specifically, we use the debug dongle to bridge the fuzzing environment on the PC and the target firmware on the microcontroller device. To collect code coverage information without costly code instrumentation, μAFL relies on the ARM ETM hardware debugging feature, which transparently collects the instruction trace and streams the results to the PC. However, the raw ETM data is obscure and needs enormous computing resources to recover the actual instruction flow. We therefore propose an alternative representation of code coverage, which retains the same path sensitivity as the original AFL algorithm, but can directly work on the raw ETM data without matching them with disassembled instructions. To further reduce the workload, we use the DWT hardware feature to selectively collect runtime information of interest. We evaluated μAFL on two real evaluation boards from two major vendors: NXP and STMicroelectronics. With our prototype, we discovered ten 0-day bugs in the driver code shipped with the SDK of STMicroelectronics and three 0-day bugs in the SDK of NXP. Considering the wide adoption of chip-vendor-provided SDK code in real products, our results are alarming

### CorbFuzz: Checking Browser Security Policies with Fuzzing (ASE 2021)

Abstract: Browsers use security policies to block malicious behaviors. Cross-Origin Read Blocking (CORB) is a browser security policy for preventing side-channel attacks such as Spectre. We propose a web browser security policy fuzzer called CorbFuzz for checking CORB and similar policies. In implementing a security policy, the browser only has access to HTTP requests and responses, and takes policy actions based solely on those interactions. In checking the browser security policies, CorbFuzz uses a policy oracle that tracks the web application behavior and infers the desired policy action based on the web application state. By comparing the policy oracle with the browser behavior, CorbFuzz detects weaknesses in browser security policies. CorbFuzz checks the web browser policy by fuzzing a set of web applications where the state-related queries are symbolically evaluated for increased coverage and automation. CorbFuzz collects type information from database queries and branch conditions in order to prevent the generation of inconsistent data values during fuzzing. We evaluated CorbFuzz on CORB implementations of Chromium and Webkit, and Opaque Response Blocking (ORB) policy implementation of Firefox using web applications collected from GitHub. We found three classes of weaknesses in Chromium’s implementation of CORB.

### Fully Automated Functional Fuzzing of Android Apps for Detecting Non-crashing Logic Bugs (OOPSLA 2021)

Abstract: Android apps are GUI-based event-driven software and have become ubiquitous in recent years. Obviously, functional correctness is critical for an app’s success. However, in addition to crash bugs, non-crashing functional bugs (in short as “non-crashing bugs” in this work) like inadvertent function failures, silent user data lost and incorrect display information are prevalent, even in popular, well-tested apps. These non-crashing functional bugs are usually caused by program logic errors and manifest themselves on the graphic user interfaces (GUIs). In practice, such bugs pose significant challenges in effectively detecting them because (1) current practices heavily rely on expensive, small-scale manual validation (the lack of automation); and (2) modern fully automated testing has been limited to crash bugs (the lack of test oracles).

This paper fills this gap by introducing independent view fuzzing, a novel, fully automated approach for detecting non-crashing functional bugs in Android apps. Inspired by metamorphic testing, our key insight is to leverage the commonly-held independent view property of Android apps to manufacture property-preserving mutant tests from a set of seed tests that validate certain app properties. The mutated tests help exercise the tested apps under additional, adverse conditions. Any property violations indicate likely functional bugs for further manual confirmation. We have realized our approach as an automated, end-to-end functional fuzzing tool, Genie. Given an app, (1) Genie automatically detects non-crashing bugs without requiring human-provided tests and oracles (thus fully automated); and (2) the detected non-crashing bugs are diverse (thus general and not limited to specific functional properties), which set Genie apart from prior work.

We have evaluated Genie on 12 real-world Android apps and successfully uncovered 34 previously unknown non-crashing bugs in their latest releases — all have been confirmed, and 22 have already been fixed. Most of the detected bugs are nontrivial and have escaped developer (and user) testing for at least one year and affected many app releases, thus clearly demonstrating Genie’s effectiveness. According to our analysis, Genie achieves a reasonable true positive rate of 40.9%, while these 34 non-crashing bugs could not be detected by prior fully automated GUI testing tools (as our evaluation confirms). Thus, our work complements and enhances existing manual testing and fully automated testing for crash bugs.

### webFuzz: Grey-Box Fuzzing for Web Applications (ESORICS 2021)

Abstract: Fuzzing is significantly evolved in analysing native code, but web applications, invariably, have received limited attention until now. This paper designs, implements and evaluates webFuzz, a gray-box fuzzing prototype for discovering vulnerabilities in web applications.

webFuzz is successful in leveraging instrumentation for detecting cross-site scripting (XSS) vulnerabilities, as well as covering more code faster than black-box fuzzers. In particular, webFuzz has discovered one zero-day vulnerability in WordPress, a leading CMS platform, and five in an online commerce application named CE-Phoenix.

Moreover, in order to systematically evaluate webFuzz, and similar tools, we provide the first attempt for automatically synthesizing reflective cross-site scripting (RXSS) vulnerabilities in vanilla web applications.

### Vulnerability-Oriented Fuzz Testing for Connected Autonomous Vehicle Systems ( IEEE Transactions on Reliability 2021)

Abstract: In an era of connectivity and automation, the vehicle industry is adopting numerous technologies to transform driver-centric vehicles into intelligent mechanical devices driven by software components. Software integration and network connectivity inherit numerous security issues that open the door for malicious attacks. Software security testing is a scalable and practical approach to identify systems’ weaknesses and vulnerabilities at an early stage and throughout their life-cycle. Security specialists recommend fuzz testing to identify vulnerabilities within vehicle software systems. Nevertheless, the randomness and blindness of fuzzing hinder it from becoming a reliable security tool. This article presents a vulnerability-oriented fuzz (VulFuzz) testing framework that utilizes security vulnerability metrics designed particularly for connected and autonomous vehicles to direct and prioritize the fuzz testing toward the most vulnerable components. While most gray-box fuzzing techniques aim solely to expand code coverage, the proposed approach assigns weights to ensure a thorough examination of the most vulnerable components. Moreover, we employ an input structure-aware mutation technique that can bypass vehicle software systems’ input formats to boost test performance and avoid dropped test cases. Such a testing technique will contribute to the quality assurance of vehicle software engineering. We implemented the proposed approach on OpenPilot, a driver assistance system, and compared our results to American fuzzy lop (AFL) and an unguided mutation-based fuzzer. Within 16.8 h, VulFuzz exposed 335 crashes, 41 times more than AFL and two times more than an unguided mutation-based fuzzer. VulFuzz is explicitly efficient for automotive systems, reaching the same code coverage as AFL but with more exposed crashes and fewer dropped messages.

### Hardware Support to Improve Fuzzing Performance and Precision (CCS 2021)

Abstract: Coverage-guided fuzzing is considered one of the most efficient bug-finding techniques, given its number of bugs reported. However, coverage tracing provided by existing software-based approaches, such as source instrumentation and dynamic binary translation, can incur large overhead. Hindered by the significantly lowered execution speed, it also becomes less beneficial to improve coverage feedback by incorporating additional execution states. In this paper, we propose SNAP, a customized hardware platform that implements hardware primitives to enhance the performance and precision of coverage-guided fuzzing. By sitting at the bottom of the computer stack, SNAP leverages the existing CPU pipeline and micro-architectural features to provide coverage tracing and rich execution semantics with near-zero cost regardless of source code availability. Prototyped as a synthesized RISC-V BOOM processor on FPGA, SNAP incurs a barely 3.1% tracing overhead on the SPEC benchmarks while achieving a 228× higher fuzzing throughput than the existing software-based solution. Posing only a 4.8% area and 6.5% power overhead, SNAP is highly practical and can be adopted by existing CPU architectures with minimal changes.

### Fuzzing: Cyberphysical System Testing for Security and Dependability (Computer 2021)

Abstract: Fuzz testing enables automated remote testing of cyberphysical and Internet of Things systems. The selection of effective test vector sequences is critical and a natural domain for the employment of artificial intelligence and machine learning techniques that incorporate successful selection criteria.

### ESRFuzzer: an enhanced fuzzing framework for physical SOHO router devices to discover multi-Type vulnerabilities (Cybersecurity 2021)

Abstract: SOHO (small office/home office) routers provide services for end devices to connect to the Internet, playing an important role in cyberspace. Unfortunately, security vulnerabilities pervasively exist in these routers, especially in the web server modules, greatly endangering end users. To discover these vulnerabilities, fuzzing web server modules of SOHO routers is the most popular solution. However, its effectiveness is limited due to the lack of input specification, lack of routers’ internal running states, and lack of testing environment recovery mechanisms. Moreover, existing works for device fuzzing are more likely to detect memory corruption vulnerabilities.In this paper, we propose a solution ESRFuzzer to address these issues. It is a fully automated fuzzing framework for testing physical SOHO devices. It continuously and effectively generates test cases by leveraging two input semantic models, i.e., KEY-VALUE data model and CONF-READ communication model, and automatically recovers the testing environment with power management. It also coordinates diversified mutation rules with multiple monitoring mechanisms to trigger multi-type vulnerabilities. With the guidance of the two semantic models, ESRFuzzer can work in two ways: general mode fuzzing and D-CONF mode fuzzing. General mode fuzzing can discover both issues which occur in the CONF and READ operation, while D-CONF mode fuzzing focus on the READ-op issues especially missed by general mode fuzzing.We ran ESRFuzzer on 10 popular routers across five vendors. In total, it discovered 136 unique issues, 120 of which have been confirmed as 0-day vulnerabilities we found. As an improvement of SRFuzzer, ESRFuzzer have discovered 35 previous undiscovered READ-op issues that belong to three vulnerability types, and 23 of them have been confirmed as 0-day vulnerabilities by vendors. The experimental results show that ESRFuzzer outperforms state-of-the-art solutions in terms of types and number of vulnerabilities found.

### Estimating Residual Risk in Greybox Fuzzing (FSE 2021)

Abstract: For any errorless fuzzing campaign, no matter how long, there is always some residual risk that a software error would be discovered if only the campaign was run for just a bit longer. Recently, greybox fuzzing tools have found widespread adoption. Yet, practitioners can only guess when the residual risk of a greybox fuzzing campaign falls below a specific, maximum allowable threshold.

In this paper, we explain why residual risk cannot be directly estimated for greybox campaigns, argue that the discovery probability (i.e., the probability that the next generated input increases code coverage) provides an excellent upper bound, and explore sound statistical methods to estimate the discovery probability in an ongoing greybox campaign. We find that estimators for blackbox fuzzing systematically and substantially \emph{under}-estimate the true risk. An engineer—who stops the campaign when the estimators purport a risk below the maximum allowable risk—is vastly misled. She might need execute a campaign that is orders of magnitude longer to achieve the allowable risk. Hence, the \emph{key challenge} we address in this paper is \emph{adaptive bias}: The probability to discover a specific error actually increases over time. We provide the first probabilistic analysis of adaptive bias, and introduce two novel classes of estimators that tackle adaptive bias. With our estimators, the engineer can decide with confidence when to abort the campaign.

### HeteroFuzz: Fuzz Testing to Detect Platform Dependent Divergence for Heterogeneous Applications (FSE 2021)

Abstract: As specialized hardware accelerators like FPGAs become a prominent part of the current computing landscape, software applications are increasingly constructed to leverage heterogeneous architectures. Such a trend is already happening in the domain of machine learning and Internet-of-Things (IoT) systems built on edge devices. Yet, debugging and testing methods for heterogeneous applications are currently lacking. These applications may look similar to regular C/C++ code but include hardware synthesis details in terms of preprocessor directives. Therefore, their behavior under heterogeneous architectures may diverge significantly from CPU due to hardware synthesis details. Further, the compilation and hardware simulation cycle takes an enormous amount of time, prohibiting frequent invocations required for fuzz testing.

We propose a novel fuzz testing technique, called HeteroFuzz, designed to specifically target heterogeneous applications and to detect platform-dependent divergence. The key essence of HeteroFuzz is that it uses a three-pronged approach to reduce the long latency of repetitively invoking a hardware simulator on a heterogeneous application. First, in addition to monitoring code coverage as a fuzzing guidance mechanism, we analyze synthesis pragmas in kernel code and monitor accelerator-relevant value spectra. Second, we design dynamic probabilistic mutations to increase the chance of hitting divergent behavior under different platforms. Third, we memorize the boundaries of seen kernel inputs and skip HLS simulator invocation if it can expose only redundant divergent behavior. We evaluate HeteroFuzz on seven real-world heterogeneous applications with FPGA kernels. HeteroFuzz is 754X faster in exposing the same set of distinct divergence symptoms than naive fuzzing. Probabilistic mutations contribute to 17.5X speed up than the one without. Selective invocation of HLS simulation contributes to 8.8X speed up than the one without.

### DirectFuzz: Automated Test Generation for RTL Designs using Directed Graybox Fuzzing (DAC 2021)

Abstract: Abstract— A critical challenge in RTL verification is to generate effective test inputs. Recently, RFUZZ proposed to use an automated software testing technique, namely Graybox Fuzzing, to effectively generate test inputs to maximize the coverage of the whole hardware design. For a scenario where a tiny fraction of a large hardware design needs to be tested, the RFUZZ approach is extremely time consuming. In this work, we present DirectFuzz, a directed test generation mechanism. DirectFuzz uses Directed Graybox Fuzzing to generate test inputs targeted towards a module instance, which enables targeted testing. Our experimental results show that DirectFuzz covers the target sites up to 17.5× faster (2.23×on average) than RFUZZ on a variety of RTL designs.

### PMFuzz: Test Case Generation for Persistent Memory Programs (ASPLOS 2021)

Abstract: This work provides PMFuzz, a fuzzer that automatically generates high-value test cases to detect crash consistency bugs in PM programs. The source of PMFuzz is publicly available at: https://github.com/Systems-ShiftLab/PMFuzz. We implement PMFuzz on top of a well-known fuzzer, AFL++ by incorporating our key insights, and evaluate PMFuzz in a real PM system with common PM programs, including keyvalue stores and databases.

### PGFUZZ: Policy-Guided Fuzzing for Robotic Vehicles (NDSS 2021)

Abstract: Robotic vehicles (RVs) are becoming essential tools of modern systems, including autonomous delivery services, public transportation, and environment monitoring. Despite their diverse deployment, safety and security issues with RVs limit their wide adoption. Most attempts to date in RV security aim to propose defenses that harden their control program against syntactic bugs, input validation bugs, and external sensor spoofing attacks. In this paper, we introduce PGFUZZ, a policy-guided fuzzing framework, which validates whether an RV adheres to identified safety and functional policies that cover user commands, configuration parameters, and physical states. PGFUZZ expresses desired policies through temporal logic formulas with time constraints as a guide to fuzz the analyzed system. Specifically, it generates fuzzing inputs that minimize a distance metric measuring how close'' the RV current state is to a policy violation. In addition, it uses static and dynamic analysis to focus the fuzzing effort only on those commands, parameters, and environmental factors that influence the`truth value’’ of any of the exercised policies. The combination of these two techniques allows PGFUZZ to increase the efficiency of the fuzzing process significantly. We validate PGFUZZ on three RV control programs, ArduPilot, PX4, and Paparazzi, with 56 unique policies. PGFUZZ discovered 156 previously unknown bugs, 106 of which have been acknowledged by their developers.

### IntelliGen: Automatic Driver Synthesis for Fuzz Testing (ICSE 2021 SEIP)

Abstract: Fuzzing is a technique widely used in vulnerability detection. The process usually involves writing effective fuzz driver programs, which, when done manually, can be extremely labor intensive. Previous attempts at automation leave much to be desired, in either degree of automation or quality of output. In this paper, we propose IntelliGen, a framework that constructs valid fuzz drivers automatically. First, IntelliGen determines a set of entry functions and evaluates their respective chance of exhibiting a vulnerability. Then, IntelliGen generates fuzz drivers for the entry functions through hierarchical parameter replacement and type inference. We implemented IntelliGen and evaluated its effectiveness on real-world programs selected from the Android Open-Source Project, Google’s fuzzer-testsuite and industrial collaborators. IntelliGen covered on average 1.08×-2.03× more basic blocks and 1.36×-2.06× more paths over state-of-the-art fuzz driver synthesizers FUDGE and FuzzGen. IntelliGen performed on par with manually written drivers and found 10 more bugs.

### WINNIE: Fuzzing Windows Applications with Harness Synthesis and Fast Cloning (NDSS 2021)

Abstract: Fuzzing is an emerging technique to automatically validate programs and uncover bugs. It has been widely used to test many programs and has found thousands of security vulnerabilities. However, existing fuzzing efforts are mainly centered around Unix-like systems, as Windows imposes unique challenges for fuzzing: a closed-source ecosystem, the heavy use of graphical interfaces and the lack of fast process cloning machinery.

In this paper, we propose two solutions to address the challenges Windows fuzzing faces. Our system, WINNIE, first tries to synthesize a harness for the application, a simple program that directly invokes target functions, based on sample executions. It then tests the harness, instead of the original complicated program, using an efficient implementation of fork on Windows. Using these techniques, WINNIE can bypass irrelevant GUI code to test logic deep within the application. We used WINNIE to fuzz 59 closed-source Windows binaries, and it successfully generated valid fuzzing harnesses for all of them. In our evaluation, WINNIE can support 2.2× more programs than existing Windows fuzzers could, and identified 3.9× more program states and achieved 26.6× faster execution. In total, WINNIE found 61 unique bugs in 32 Windows binaries.

### Industry Practice of Coverage-Guided Enterprise-Level DBMS Fuzzing (ICSE 2021)

Abstract: As an infrastructure for data persistence and analysis, Database Management Systems (DBMSs) are the cornerstones of modern enterprise software. To improve their correctness, the industry has been applying blackbox fuzzing for decades. Recently, the research community achieved impressive fuzzing gains using coverage guidance. However, due to the complexity and distributed nature of enterprise-level DBMSs, seldom are these researches applied to the industry.

In this paper, we apply coverage-guided fuzzing to enterprise-level DBMSs from Huawei and Bloomberg LP. In our practice of testing GaussDB and Comdb2, we found major challenges in all three testing stages. The challenges are collecting precise coverage, optimizing fuzzing performance, and analyzing root causes. In search of a general method to overcome these challenges, we propose RATEL, a coverage-guided fuzzer for enterprise-level DBMSs. With its industry-oriented design, RATEL improves the feedback precision, enhances the robustness of input generation, and performs an on-line investigation on the root cause of bugs. As a result, RATEL outperformed other fuzzers in terms of coverage and bugs. Compared to industrial black box fuzzers SQLsmith and SQLancer, as well as coverage-guided academic fuzzer Squirrel, RATEL covered 38.38%, 106.14%, 583.05% more basic blocks than the best results of other three fuzzers in GaussDB, PostgreSQL, and Comdb2, respectively. More importantly, RATEL has discovered 32, 42, and 5 unknown bugs in GaussDB, Comdb2, and PostgreSQL.

### Refined Grey-Box Fuzzing with Sivo (arXiv 2021)

Abstract: We design and implement from scratch a new fuzzer called Sivo that refines multiple stages of grey-box fuzzing. First, Sivo refines dataflow fuzzing in two ways: (a) it provides a new taint inference engine that requires only logarithmic in the input size number of tests to infer the dependency of all program branches on the input bytes, and (b) it deploys a novel method for inverting branches by solving directly and efficiently systems of inequalities. Second, our fuzzer refines accurate tracking and detection of code coverage with simple and easily implementable methods. Finally, Sivo refines selection of parameters and strategies by parameterizing all stages of fuzzing and then dynamically selecting optimal values during fuzzing. Thus the fuzzer can easily adapt to a target program and rapidly increase coverage. We compare our fuzzer to 11 other state-of-the-art greybox fuzzers on 27 popular benchmarks. Our evaluation shows that Sivo scores the highest both in terms of code coverage and in terms of number of found vulnerabilities.

### Fuzzing Technique in Web Applications and Beyond (MCTE 2020)

Abstract: Web security is an old yet important topic. With the emergence of new applications and new threats, we need new techniques to fight against these threats. Fuzzing technique has been proven as an effective tool and has the potential of evolving with the new threats. In this paper, we discuss the applications of fuzzing technique in both web and other areas. We also present a case study by implementing a fuzzer in this study.

### Nyx: Greybox Hypervisor Fuzzing using Fast Snapshots and Affine Types (USENIX Security2021)

Abstract: A hypervisor (also know as virtual machine monitor, VMM) enforces the security boundaries between different virtual machines (VMs) running on the same physical machine. A malicious user who is able to run her own kernel on a cloud VM can interact with a large variety of attack surfaces. Exploiting a software fault in any of these surfaces leads to full access to all other VMs that are co-located on the same host. Hence, the efficient detection of hypervisor vulnerabilities is crucial for the security of the modern cloud infrastructure. Recent work showed that blind fuzzing is the most efficient approach to identify security issues in hypervisors, mainly due to an outstandingly high test throughput.

In this paper we present the design and implementation of NYX, a highly optimized, coverage-guided hypervisor fuzzer. We show how a fast snapshot restoration mechanism that allows us to reload the system under test thousands of times per second is key to performance. Furthermore, we introduce a novel mutation engine based on custom bytecode programs, encoded as directed acyclic graphs (DAG), and affine types, that enables the required flexibility to express complex interactions. Our evaluation shows that, while NYX has a lower throughput than the state-of-the-art hypervisor fuzzer, it performs competitively on simple targets: NYX typically requires only a few minutes longer to achieve the same test coverage. On complex devices, however, our approach is able to significantly outperform existing works. Moreover, we are able to uncover substantially more bugs: in total, we uncovered 44 new bugs with 22 CVEs requested. Our results demonstrate that coverage guidance is highly valuable, even if a blind fuzzer can be significantly faster.

### The Use of Likely Invariants as Feedback for Fuzzers (USENIX Security2021)

Abstract: While fuzz testing proved to be a very effective technique to find software bugs, open challenges still exist. One of the its main limitations is the fact that popular coverage-guided designs are optimized to reach different parts of the program under test, but struggle when reachability alone is insufficient to trigger a vulnerability. In reality, many bugs require a specific program state that involve not only the control flow, but also the values of some of the program variables. Unfortunately, alternative exploration strategies that have been proposed in the past to capture the program state are of little help in practice, as they immediately result in a state explosion. In this paper, we propose a new feedback mechanism that augments code coverage by taking into account the usual values and relationships among program variables. For this purpose, we learn likely invariants over variables at the basicblock level, and partition the program state space accordingly. Our feedback can distinguish when an input violates one or more invariants and reward it, thus refining the program state approximation that code coverage normally offers. We implemented our technique in a prototype called INVSCOV, developed on top of LLVM and AFL++. Our experiments show that our approach can find more, and different, bugs with respect to fuzzers that use a pure code-coverage feedback. Furthermore, they led to the discovery of two vulnerabilities in a library tested daily on OSS-Fuzz, and still present at the time in its latest version.

### FREEDOM: Engineering a State-of-the-Art DOM Fuzzer (CCS 2020)

Abstract: The DOM engine of a web browser is a popular attack surface and has been thoroughly fuzzed during its development. A common approach adopted by the latest DOM fuzzers is to generate new inputs based on context-free grammars. However, such a generative approach fails to capture the data dependencies in the inputs of a DOM engine, namely, HTML documents. Meanwhile, it is unclear whether or not coverage-guided mutation, which is well-known to be effective in fuzzing numerous software, still remains to be effective against DOM engines. Worse yet, existing DOM fuzzers cannot adopt a coverage-guided approach because they are unable to fully support HTML mutation and suffer from low browser throughput. To scientifically understand the effectiveness and limitations of the two approaches, we propose FreeDom, a full-fledged cluster-friendly DOM fuzzer that works with both generative and coverage-guided modes. FreeDom relies on a context-aware intermediate representation to describe HTML documents with proper data dependencies. FreeDom also exhibits up to 3.74x higher throughput through browser self-termination. FreeDom has found 24 previously unknown bugs in commodity browsers including Safari, Firefox, and Chrome, and 10 CVEs has been assigned so far. With the context-aware generation, FreeDom finds 3x more unique crashes in WebKit than the state-of-the-art DOM fuzzer, Domato. FreeDom guided by coverage is more effective in revealing new code blocks (2.62%) and finds three complex bugs that its generative approach fails to find. However, coverage-guided mutation that bootstraps with an empty corpus triggers 3.8x fewer unique crashes than the generative approach. The newly revealed coverage, more often than not, negatively affects the effectiveness of DOM fuzzers in bug finding. Therefore, we consider context-aware generation the best practice to find more DOM engine bugs and expect further improvement on coverage-guided DOM fuzzing facilitated by FreeDom.

### SQUIRREL: Testing Database Management Systems with Language Validity and Coverage Feedback (CCS 2020)

Abstract: Fuzzing is an increasingly popular technique for verifying software functionalities and finding security vulnerabilities. However, current mutation-based fuzzers cannot effectively test database management systems (DBMSs), which strictly check inputs for valid syntax and semantics. Generation-based testing can guarantee the syntax correctness of the inputs, but it does not utilize any feedback, like code coverage, to guide the path exploration.

In this paper, we develop Squirrel, a novel fuzzing framework that considers both language validity and coverage feedback to test DBMSs. We design an intermediate representation (IR) to maintain SQL queries in a structural and informative manner. To generate syntactically correct queries, we perform type-based mutations on IR, including statement insertion, deletion and replacement. To mitigate semantic errors, we analyze each IR to identify the logical dependencies between arguments, and generate queries that satisfy these dependencies. We evaluated Squirrel on four popular DBMSs: SQLite, MySQL, PostgreSQL and MariaDB. Squirrel found 51 bugs in SQLite, 7 in MySQL and 5 in MariaDB. 52 of the bugs are fixed with 12 CVEs assigned. In our experiment, Squirrel achieves 2.4x-243.9x higher semantic correctness than state-of-the-art fuzzers, and explores 2.0x-10.9x more new edges than mutation-based tools. These results show that Squirrel is effective in finding memory errors of database management systems.

### BigFuzz: Efficient Fuzz Testing for Data Analytics using Framework Abstraction (ASE 2020)

Abstract: As big data analytics become increasingly popular, data-intensive scalable computing (DISC) systems help address the scalability issue of handling large data. However, there exists a lack of automated testing techniques to test such data-centric applications, because data is often incomplete, continuously evolving, and hard to know a priori. Fuzz testing has been proven to be highly effective in other domains such as security; however, it is nontrivial to apply such traditional fuzzing to big data analytics directly for three reasons: (1) the long latency of DISC systems prohibits the applicability of fuzzing: naïve fuzzing would spend 98% of the time in setting up a test environment; (2) conventional branch coverage is unlikely to scale to DISC applications because most binary code comes from the framework implementation such as Apache Spark; and (3) random bit or byte-level mutations can hardly generate meaningful data, which fails to reveal real-world application bugs.

We propose a novel coverage-guided fuzz testing tool for big data analytics, called BigFuzz. The key essence of our approach is that: (a) we focus on exercising application logic as opposed to increasing framework code coverage by abstracting the DISC frame-work using specifications. BigFuzz performs automated source to source transformations to construct an equivalent DISC application suitable for fast test generation, and (b) we design schema-aware data mutation operators based on our in-depth study of DISC application error types. BigFuzz speeds up the fuzzing time by 78X-1477X compared to random fuzzing, improves application code coverage by 20%-271%, and achieves 33%-157% improvement in detecting application errors. When compared to the state of the art that uses symbolic execution to test big data analytics, BigFuzz is applicable to twice more programs and can find 80.6% more bugs.

### MoFuzz: A Fuzzer Suite for Testing Model-Driven Software Engineering Tools (ASE 2020)

Abstract: Fuzzing or fuzz testing is an established technique that aims to discover unexpected program behavior (e.g. bugs, security vulnerabilities, or crashes) by feeding automatically generated data into a program under test. However, the application of fuzzing to test Model-Driven Software Engineering (MDSE) tools is still limited because of the difficulty of existing fuzzers to provide structured, well-typed inputs, namely models that conform to typing and consistency constraints induced by a given meta-model and underlying modeling framework. By drawing from recent advances on both fuzz testing and automated model generation, we present three different approaches for fuzzing MDSE tools: A graph grammar-based fuzzer and two variants of a coverage-guided mutation-based fuzzer working with different sets of model mutation operators. We have evaluated our fuzzing approaches on a set of real-world MDSE tools. Our experimental results show that all approaches can outperform both standard fuzzers and model generators w.r.t.\ their fuzzing capabilities. Moreover, we found that each of our approaches comes with its own strengths and weaknesses in terms of fault finding capabilities and the ability to cover different aspects of the system under test. Thus the approaches complement each other, forming a fuzzer suite for testing MDSE tools.

### AFL++: Combining Incremental Steps of Fuzzing Research (USENIX Woot2020)

Abstract: In this paper, we present AFL++, a community-driven opensource tool that incorporates state-of-the-art fuzzing research, to make the research comparable, reproducible, combinable, and — most importantly – useable. It offers a variety of novel features, for example, its Custom Mutator API, able to extend the fuzzing process at many stages. With it, mutators for specific targets can also be written by experienced security testers. We hope for AFL++ to become a new baseline tool not only for current, but also for future research, as it allows us to test new techniques quickly, and evaluate not only the effectiveness of the single technique versus the state-of-the art, but also in combination with other techniques. The paper gives an evaluation of hand-picked fuzzing technologies —shining light on the fact that while each novel fuzzing method can increase performance in some targets — it decreases performance for other targets. This is an insight future fuzzing research should consider in their evaluations.

### Active Fuzzing for Testing and Securing Cyber-Physical Systems (ISSTA 2020)

Abstract: Cyber-physical systems (CPSs) in critical infrastructure face a pervasive threat from attackers, motivating research into a variety of countermeasures for securing them. Assessing the effectiveness of these countermeasures is challenging, however, as realistic benchmarks of attacks are difficult to manually construct, blindly testing is ineffective due to the enormous search spaces and resource requirements, and intelligent fuzzing approaches require impractical amounts of data and network access. In this work, we propose active fuzzing, an automatic approach for finding test suites of packet-level CPS network attacks, targeting scenarios in which attackers can observe sensors and manipulate packets, but have no existing knowledge about the payload encodings. Our approach learns regression models for predicting sensor values that will result from sampled network packets, and uses these predictions to guide a search for payload manipulations (i.e. bit flips) most likely to drive the CPS into an unsafe state. Key to our solution is the use of online active learning, which iteratively updates the models by sampling payloads that are estimated to maximally improve them. We evaluate the efficacy of active fuzzing by implementing it for a water purification plant testbed, finding it can automatically discover a test suite of flow, pressure, and over/underflow attacks, all with substantially less time, data, and network access than the most comparable approach. Finally, we demonstrate that our prediction models can also be utilised as countermeasures themselves, implementing them as anomaly detectors and early warning systems.

### CrFuzz: Fuzzing Multi-purpose Programs through Input Validation (FSE 2020)

Abstract: Fuzz testing has been proved its effectiveness in discovering software vulnerabilities. Empowered its randomness nature along with a coverage-guiding feature, fuzzing has been identified a vast number of vulnerabilities in real-world programs. This paper begins with an observation that the design of the current state-of-the-art fuzzers is not well suited for a particular (but yet important) set of software programs. Specifically, current fuzzers have limitations in fuzzing programs serving multiple purposes, where each purpose is controlled by extra options. This paper proposes CrFuzz, which overcomes this limitation. CrFuzz designs a clustering analysis to automatically predict if a newly given input would be accepted or not by a target program. Exploiting this prediction capability, CrFuzz is designed to efficiently explore the programs with multiple purposes. We employed CrFuzz for three state-of-the-art fuzzers, AFL, QSYM, and MOpt, and CrFuzz-augmented versions have shown 19.3% and 5.68% better path and edge coverage on average. More importantly, during two weeks of long-running experiments, CrFuzz discovered 277 previously unknown vulnerabilities where 212 of those are already confirmed and fixed by the respected vendors. We would like to emphasize that many of these vulnerabilities were discoverd from FFMpeg, ImageMagick, and Graphicsmagick, all of which are targets of Google’s OSS-Fuzz project and thus heavily fuzzed for last three years by far. Nevertheless, CrFuzz identified a remarkable number of vulnerabilities, demonstrating its effectiveness of vulnerability finding capability.

### SpecFuzz: Bringing Spectre-type vulnerabilities to the surface (USENIX Security2020)

Abstract: SpecFuzz is the first tool that enables dynamic testing for speculative execution vulnerabilities (e.g., Spectre). The key is a novel concept of speculation exposure: The program is instrumented to simulate speculative execution in software by forcefully executing the code paths that could be triggered due to mispredictions, thereby making the speculative memory accesses visible to integrity checkers (e.g., AddressSanitizer). Combined with the conventional fuzzing techniques, speculation exposure enables more precise identification of potential vulnerabilities compared to state-of-the-art static analyzers. Our prototype for detecting Spectre V1 vulnerabilities successfully identifies all known variations of Spectre V1 and decreases the mitigation overheads across the evaluated applications, reducing the amount of instrumented branches by up to 77% given a sufficient test coverage.

### USBFuzz: A Framework for Fuzzing USB Drivers by Device Emulation (USENIX Security2020)

Abstract: The Universal Serial Bus (USB) connects external devices to a host. This interface exposes the OS kernels and device drivers to attacks by malicious devices. Unfortunately, kernels and drivers were developed under a security model that implicitly trusts connected devices. Drivers expect faulty hardware but not malicious attacks. Similarly, security testing drivers are challenging as input must cross the hardware/software barrier. Fuzzing, the most widely used bug-finding technique, relies on providing random data to programs. However, fuzzing device drivers is challenging due to the difficulty in crossing the hardware/software barrier and providing random device data to the driver under test.

We present USBFuzz, a portable, flexible, and modular framework for fuzz testing USB drivers. At its core, USBFuzz uses a software-emulated USB device to provide random device data to drivers (when they perform IO operations). As the emulated USB device works at the device level, porting it to other platforms is straight-forward. Using the USBFuzz framework, we apply (i) coverage-guided fuzzing to a broad range of USB drivers in the Linux kernel; (ii) dumb fuzzing in FreeBSD, MacOS, and Windows through cross-pollination seeded by the Linux inputs; and (iii) focused fuzzing of a USB webcam driver. USBFuzz discovered a total of 26 new bugs, including 16 memory bugs of high-security impact in various Linux subsystems (USB core, USB sound, and network), one bug in FreeBSD, three in MacOS (two resulting in an unplanned reboot and one freezing the system), and four in Windows 8 and Windows 10 (resulting in Blue Screens of Death), and one bug in the Linux USB host controller driver and another one in a USB camera driver. From the Linux bugs, we have fixed and upstreamed 11 bugs and received 10 CVEs.

### Boosting Fuzzer Efficiency: An Information Theoretic Perspective (FSE 2020)

Abstract: In this paper, we take the fundamental perspective of fuzzing as a learning process. Suppose before fuzzing, we know nothing about the behaviors of a program P: What does it do? Executing the first test input, we learn how P behaves for this input. Executing the next input, we either observe the same or discover a new behavior. As such, each execution reveals “some amount” of information about P’s behaviors. A classic measure of information is Shannon’s entropy. Measuring entropy allows us to quantify how much is learned from each generated test input about the behaviors of the program. Within a probabilistic model of fuzzing, we show how entropy also measures fuzzer efficiency. Specifically, it measures the general rate at which the fuzzer discovers new behaviors. Intuitively, efficient fuzzers maximize information.

From this information theoretic perspective, we develop Entropic, an entropy-based power schedule for greybox fuzzing which assigns more energy to seeds that maximize information. We implemented Entropic into the popular greybox fuzzer LibFuzzer. Our experiments with more than 250 open-source programs (60 million LoC) demonstrate a substantially improved efficiency and confirm our hypothesis that an efficient fuzzer maximizes information. Entropic has been independently evaluated and invited for integration into main-line LibFuzzer. Entropic will run on more than 25,000 machines fuzzing hundreds of security-critical software systems simultaneously and continuously.

### Fuzzing Error Handling Code using Context-Sensitive Software Fault Injection (USENIX Security2020)

Abstract: Error handling code is often critical but difficult to test in reality. As a result, many hard-to-find bugs exist in error handling code and may cause serious security problems once triggered. Fuzzing has become a widely used technique for finding software bugs nowadays. Fuzzing approaches mutate and/or generate various inputs to cover infrequently-executed code. However, existing fuzzing approaches are very limited in testing error handling code, because some of this code can be only triggered by occasional errors (such as insufficient memory and network-connection failures), but not specific inputs. Therefore, existing fuzzing approaches in general cannot effectively test such error handling code.

In this paper, we propose a new fuzzing framework named FIFUZZ, to effectively test error handling code and detect bugs. The core of FIFUZZ is a context-sensitive software fault injection (SFI) approach, which can effectively cover error handling code in different calling contexts to find deep bugs hidden in error handling code with complicated contexts. We have implemented FIFUZZ and evaluated it on 9 widely-used C programs. It reports 317 alerts which are caused by 50 unique bugs in terms of the root causes. 32 of these bugs have been confirmed by related developers. We also compare FIFUZZ to existing fuzzing tools (including AFL, AFLFast, AFLSmart and FairFuzz), and find that FIFUZZ finds many bugs missed by these tools. We believe that FIFUZZ can effectively augment existing fuzzing approaches to find many real bugs that have been otherwise missed.

### FANS: Fuzzing Android Native System Services via Automated Interface Analysis (USENIX Security2020)

Abstract: Android native system services provide essential supports and fundamental functionalities for user apps. Finding vulnerabilities in them is crucial for Android security. Fuzzing is one of the most popular vulnerability discovery solutions, yet faces several challenges when applied to Android native system services. First, such services are invoked via a special interprocess communication (IPC) mechanism, namely binder, via service-specific interfaces. Thus, the fuzzer has to recognize all interfaces and generate interface-specific test cases automatically. Second, effective test cases should satisfy the interface model of each interface. Third, the test cases should also satisfy the semantic requirements, including variable dependencies and interface dependencies.

In this paper, we propose an automated generation-based fuzzing solution FANS to find vulnerabilities in Android native system services. It first collects all interfaces in target services and uncovers deep nested multi-level interfaces to test. Then, it automatically extracts interface models, including feasible transaction code, variable names and types in the transaction data, from the abstract syntax tree (AST) of target interfaces. Further, it infers variable dependencies in transactions via the variable name and type knowledge, and infers interface dependencies via the generation and use relationship. Finally, it employs the interface models and dependency knowledge to generate sequences of transactions, which have valid formats and semantics, to test interfaces of target services. We implemented a prototype of FANS from scratch and evaluated it on six smartphones equipped with a recent version of Android, i.e., android-9.0.0_r46 , and found 30 unique vulnerabilities deduplicated from thousands of crashes, of which 20 have been confirmed by Google. Surprisingly, we also discovered 138 unique Java exceptions during fuzzing.

### Fuzzing IPC with Knowledge Inference (SRDS 2019)

Abstract: Sandboxing provides a strong security guarantee for applications, by isolating untrusted code into separated compartments. Untrusted code could only use IPC (inter-process communication) to launch sensitive actions, which are implemented in trusted (and maybe privileged) code. IPC-related security bugs in trusted code could facilitate jailbreaks of sandboxing, and thus are becoming high-value targets. However, finding vulnerabilities that could be triggered by IPC is challenging, due to the fact that IPC communication is stateful and format-sensitive.

In this paper, we propose a new fuzzing solution to discover IPC bugs in IPC services without source code, by combining static analysis and dynamic analysis. We use static analysis to recognize format checks and help construct IPC messages of valid formats. We then use dynamic analysis to infer the constraints between IPC messages, and model the stateful logic with a probability matrix. Therefore, we are able to generate highquality IPC messages to test IPC services, and discover deep and complex IPC bugs. Without loss of generality, we implemented a prototype MACHFUZZER, for a specific complicated and crucial IPC service, i.e., WindowServer in macOS. This prototype helps us find 12 previously unknown vulnerabilities in WindowServer in 48 hours. Among them, three vulnerabilities are confirmed exploitable, and could be exploited to escape the sandbox and gain root privilege.

### HYPER-CUBE: High-Dimensional Hypervisor Fuzzing (NDSS 2020)

Abstract: Applying modern fuzzers to novel targets is often a very lucrative venture. Hypervisors are part of a very critical code base: compromising them could allow an attacker to compromise the whole cloud infrastructure of any cloud provider. In this paper, we build a novel fuzzer that aims explicitly at testing modern hypervisors. Our high throughput fuzzer design for long running interactive targets allows us to fuzz a large number of hypervisors, both open source, and proprietary. In contrast to one-dimensional fuzzers such as AFL, HYPER-CUBE can interact with any number of interfaces in any order.

Our evaluation shows that we can find more bugs (over 2x) and coverage (as much as 2x) than state of the art hypervisor fuzzers. Additionally, in most cases, we were able to do so using multiple orders of magnitude less time than comparable fuzzers. HYPER-CUBE was also able to rediscover a set of well-known vulnerabilities for hypervisors, such as VENOM, in less than five minutes. In total, HYPER-CUBE found 54 novel bugs, and so far we obtained 37 CVEs. Our evaluation results demonstrates that next generation coverage-guided fuzzers should incorporate a higher-throughput design for long running targets such as hypervisors.

### Reproducible Crashes: Fuzzing Pharo by Mutating the Test Methods (IWST20 2020)

Abstract: Fuzzing (or Fuzz Testing) is a technique to verify the robustness of a program-under-test. Valid input is replaced by random values with the goal to force the program-under-test into unresponsive states. In this position paper, we propose a white box Fuzzing approach by transforming (mutating) existing test methods. We adopt the mechanisms used for test amplification to generate crash inducing tests, which developers can reproduce later. We provide anecdotal evidence that our approach towards Fuzzing reveals crashing issues in the Pharo environment.

### Opening Pandora’s Box through ATFuzzer: Dynamic Analysis of AT Interface for Android Smartphones (ACSAC 2019)

Abstract: This paper focuses on checking the correctness and robustness ofthe AT command interface exposed by the cellular baseband processor through Bluetooth and USB. A device’s application processor uses this interface for issuing high-level commands (or, AT commands) to the baseband processor for performing cellular network operations (e.g., placing a phone call). Vulnerabilities in this interface can be leveraged by malicious Bluetooth peripherals to launch pernicious attacks including DoS and privacy attacks. To identify such vulnerabilities, we propose ATFuzzer that uses a grammar-guided evolutionary fuzzing approach which mutates production rules of the AT command grammar instead of concrete AT commands. Empirical evaluation with ATFuzzer on 10 Android smartphones from 6 vendors revealed 4 invalid AT command grammars over Bluetooth and 13 over USB with implications ranging from DoS, downgrade of cellular protocol version (e.g., from 4G to 3G/2G) to severe privacy leaks. The vulnerabilities along with the invalid AT command grammars were responsibly disclosed to affected vendors and two of the reported vulnerabilities have been already assigned CVEs (CVE-2019-16400 and CVE-2019-16401).

### FuzzFactory: Domain-Specific Fuzzing with Waypoints (OOPSLA 2019)

Abstract: Coverage-guided fuzz testing has gained prominence as a highly effective method of finding security vulnerabilities such as buffer overflows in programs that parse binary data. Recently, researchers have introduced various specializations to the coverage-guided fuzzing algorithm for different domain-specific testing goals, such as finding performance bottlenecks, generating valid inputs, handling magic-byte comparisons, etc. Each such solution can require non-trivial implementation effort and produces a distinct variant of a fuzzing tool. We observe that many of these domain-specific solutions follow a common solution pattern.

In this paper, we present FuzzFactory, a framework for developing domain-specific fuzzing applications without requiring changes to mutation and search heuristics. FuzzFactory allows users to specify the collection of dynamic domain-specific feedback during test execution, as well as how such feedback should be aggregated. FuzzFactory uses this information to selectively save intermediate inputs, called waypoints, to augment coverage-guided fuzzing. Such waypoints always make progress towards domain-specific multi-dimensional objectives. We instantiate six domain-specific fuzzing applications using FuzzFactory: three re-implementations of prior work and three novel solutions, and evaluate their effectiveness on benchmarks from Google’s fuzzer test suite. We also show how multiple domains can be composed to perform better than the sum of their parts. For example, we combine domain-specific feedback about strict equality comparisons and dynamic memory allocations, to enable the automatic generation of LZ4 bombs and PNG bombs.

### Compiler Fuzzing: How Much Does It Matter (OOPSLA2019)

Abstract: Despite much recent interest in randomised testing (fuzzing) of compilers, the practical impact of fuzzer-found compiler bugs on real-world applications has barely been assessed. We present the first quantitative and qualitative study of the tangible impact of miscompilation bugs in a mature compiler. We follow a rigorous methodology where the bug impact over the compiled application is evaluated based on (1) whether the bug appears to trigger during compilation; (2) the extent to which generated assembly code changes syntactically due to triggering of the bug; and (3) whether such changes cause regression test suite failures, or whether we can manually find application inputs that trigger execution divergence due to such changes. The study is conducted with respect to the compilation of more than 10 million lines of C/C++ code from 309 Debian packages, using 12% of the historical and now fixed miscompilation bugs found by four state-of-the-art fuzzers in the Clang/LLVM compiler, as well as 18 bugs found by human users compiling real code or as a by-product of formal verification efforts. The results show that almost half of the fuzzer-found bugs propagate to the generated binaries for at least one package, in which case only a very small part of the binary is typically affected, yet causing two failures when running the test suites of all the impacted packages. User-reported and formal verification bugs do not exhibit a higher impact, with a lower rate of triggered bugs and one test failure. The manual analysis of a selection of the syntactic changes caused by some of our bugs (fuzzer-found and non fuzzer-found) in package assembly code, shows that either these changes have no semantic impact or that they would require very specific runtime circumstances to trigger execution divergence.

### RVFuzzer: Finding Input Validation Bugs in Robotic Vehicles through Control-Guided Random Testing (USENIX Security2019)

Abstract: Robotic vehicles (RVs) are being adopted in a variety of application domains. Despite their increasing deployment, many security issues with RVs have emerged, limiting their wider deployment. In this paper, we address a new type of vulnerability in RV control programs, called input validation bugs, which involve missing or incorrect validation checks on control parameter inputs. Such bugs can be exploited to cause physical disruptions to RVs which may result in mission failures and vehicle damages or crashes. Furthermore, attacks exploiting such bugs have a very small footprint: just one innocent-looking ground control command, requiring no code injection, control flow hijacking or sensor spoofing. To prevent such attacks, we propose RVFuzzer, a vetting system for finding input validation bugs in RV control programs through control-guided input mutation. The key insight behind RVFuzzer is that the RV control model, which is the generic theoretical model for a broad range of RVs, provides helpful semantic guidance to improve bug-discovery accuracy and efficiency. Specifically, RVFuzzer involves a control instability detector that detects control program misbehavior, by observing (simulated) physical operations of the RV based on the control model. In addition, RVFuzzer steers the input generation for finding input validation bugs more efficiently, by leveraging results from the control instability detector as feedback. In our evaluation of RVFuzzer on two popular RV control programs, a total of 89 input validation bugs are found, with 87 of them being zero-day bugs.

### Engineering a Better Fuzzer with Synergically Integrated Optimizations (ISSRE 2019)

Abstract: State-of-the-art fuzzers implement various optimizations to enhance their performance. As the optimizations reside in different stages such as input seed selection and mutation, it is tempting to combine the optimizations in different stages. However, our initial attempts demonstrate that naive combination actually worsens the performance, which explains that most optimizations are still isolated by stages and metrics. In this paper, we present InteFuzz, the first framework that synergically integrates multiple fuzzing optimizations. We analyze the root cause for performance degradation in naive combination, and discover optimizations conflict in coverage criteria and optimization granularity. To resolve the conflicts, we propose a novel priority-based scheduling mechanism. The dynamic integration considers both branch-based and block-based coverage feedbacks that are used by most fuzzing optimizations. In our evaluation, we extract four optimizations from popular fuzzers such as AFLFast and FairFuzz and compare InteFuzz against naive combinations. The evaluation results show that InteFuzz outperforms the naive combination by 29% and 26% in path-and branch-coverage. Additionally, InteFuzz triggers 222 more unique crashes, and discovers 33 zero-day vulnerabilities in real-world projects with 12 registered as CVEs.

### Fuzzing Error Handling Code in Device Drivers Based on Software Fault Injection (ISSRE 2019)

Abstract: Device drivers remain a main source of runtime failures in operating systems. To detect bugs in device drivers, fuzzing has been commonly used in practice. However, a main limitation of existing fuzzing approaches is that they cannot effectively test error handling code. Indeed, these fuzzing approaches require effective inputs to cover target code, but much error handling code in drivers is triggered by occasional errors (such as insufficient memory and hardware malfunctions) that are not related to inputs. In this paper, based on software fault injection, we propose a new fuzzing approach named FIZZER, to test error handling code in device drivers. At compile time, FIZZER uses static analysis to recommend possible error sites that can trigger error handling code. During driver execution, by analyzing runtime information, it automatically fuzzes error-site sequences for fault injection to improve code coverage. We evaluate FIZZER on 18 device drivers in Linux 4.19, and in total find 22 real bugs. The code coverage is increased by over 15% compared to normal execution without fuzzing.

### What You Corrupt Is Not What You Crash: Challenges in Fuzzing Embedded Devices (NDSS 2018)

Abstract: As networked embedded systems are becoming more ubiquitous, their security is becoming critical to our daily life. While manual or automated large scale analysis of those systems regularly uncover new vulnerabilities, the way those systems are analyzed follows often the same approaches used on desktop systems. More specifically, traditional testing approaches relies on observable crashes of a program, and binary instrumentation techniques are used to improve the detection of those faulty states. In this paper, we demonstrate that memory corruptions, a common class of security vulnerabilities, often result in different behavior on embedded devices than on desktop systems. In particular, on embedded devices, effects of memory corruption are often less visible. This reduces significantly the effectiveness of traditional dynamic testing techniques in general, and fuzzing in particular. Additionally, we analyze those differences in several categories of embedded devices and show the resulting impact on firmware analysis. We further describe and evaluate relatively simple heuristics which can be applied at run time (on an execution trace or in an emulator), during the analysis of an embedded device to detect previously undetected memory corruptions.

### FOT: A Versatile, Configurable, Extensible Fuzzing Framework (FSE 2018)

Abstract: Greybox fuzzing is one of the most effective approaches for detecting software vulnerabilities. Various new techniques have been continuously emerging to enhance the effectiveness and/or efficiency by incorporating novel ideas into different components of a greybox fuzzer. However, there lacks a modularized fuzzing framework that can easily plugin new techniques and hence facilitate the reuse, integration and comparison of different techniques. To address this problem, we propose a fuzzing framework, namely Fuzzing Orchestration Toolkit (FOT). FOT is designed to be versatile, configurable and extensible. With FOT and its extensions, we have found 111 new bugs from 11 projects. Among these bugs, 18 CVEs have been assigned. Video link: https://youtu.be/O6Qu7BJ8RP0

### Designing New Operating Primitives to Improve Fuzzing Performance (CCS 2017)

Abstract: Fuzzing is a software testing technique that finds bugs by repeatedly injecting mutated inputs to a target program. Known to be a highly practical approach, fuzzing is gaining more popularity than ever before. Current research on fuzzing has focused on producing an input that is more likely to trigger a vulnerability. In this paper, we tackle another way to improve the performance of fuzzing, which is to shorten the execution time of each iteration. We observe that AFL, a state-of-the-art fuzzer, slows down by 24x because of file system contention and the scalability of fork() system call when it runs on 120 cores in parallel. Other fuzzers are expected to suffer from the same scalability bottlenecks in that they follow a similar design pattern. To improve the fuzzing performance, we design and implement three new operating primitives specialized for fuzzing that solve these performance bottlenecks and achieve scalable performance on multi-core machines. Our experiment shows that the proposed primitives speed up AFL and LibFuzzer by 6.1 to 28.9x and 1.1 to 735.7x, respectively, on the overall number of executions per second when targeting Google’s fuzzer test suite with 120 cores. In addition, the primitives improve AFL’s throughput up to 7.7x with 30 cores, which is a more common setting in data centers. Our fuzzer-agnostic primitives can be easily applied to any fuzzer with fundamental performance improvement and directly benefit large-scale fuzzing and cloud-based fuzzing services.

### Chizpurfle: A Gray-Box Android Fuzzer for Vendor Service Customizations (ISSRE 2017)

Abstract: Android has become the most popular mobile OS, as it enables device manufacturers to introduce customizations to compete with value-added services. However, customizations make the OS less dependable and secure, since they can introduce software flaws. Such flaws can be found by using fuzzing, a popular testing technique among security researchers.This paper presents Chizpurfle, a novel “gray-box” fuzzing tool for vendor-specific Android services. Testing these services is challenging for existing tools, since vendors do not provide source code and the services cannot be run on a device emulator. Chizpurfle has been designed to run on an unmodified Android OS on an actual device. The tool automatically discovers, fuzzes, and profiles proprietary services. This work evaluates the applicability and performance of Chizpurfle on the Samsung Galaxy S6 Edge, and discusses software bugs found in privileged vendor services

### Fuzzware: Using Precise MMIO Modeling for Effective Firmware Fuzzing (USENIX Security2022)

Abstract: As embedded devices are becoming more pervasive in our everyday lives, they turn into an attractive target for adversaries. Despite their high value and large attack surface, applying automated testing techniques such as fuzzing is not straightforward for such devices. As fuzz testing firmware on constrained embedded devices is inefficient, state-of-the-art approaches instead opt to run the firmware in an emulator (through a process called re-hosting). However, existing approaches either use coarse-grained static models of hardware behavior or require manual effort to re-host the firmware.

We propose a novel combination of lightweight program analysis, re-hosting, and fuzz testing to tackle these challenges. We present the design and implementation of Fuzzware, a software-only system to fuzz test unmodified monolithic firmware in a scalable way. By determining how hardware-generated values are actually used by the firmware logic, Fuzzware can automatically generate models that help focusing the fuzzing process on mutating the inputs that matter, which drastically improves its effectiveness.

We evaluate our approach on synthetic and real-world targets comprising a total of 19 hardware platforms and 77 firmware images. Compared to state-of-the-art work, Fuzzware achieves up to 3.25 times the code coverage and our modeling approach reduces the size of the input space by up to 95.5%. The synthetic samples contain 66 unit tests for various hardware interactions, and we find that our approach is the first generic re-hosting solution to automatically pass all of them. Fuzzware discovered 15 completely new bugs including bugs in targets which were previously analyzed by other works; a total of 12 CVEs were assigned.