Complete list of publications

Below, readers may find all the publications related to the EVI project. They can be accessed by clicking the corresponding title.

TitleAuthorsYearAbstract

Active Learning of Mealy Machines with Timers

Véronique Bruyère, Bharat Garhewal, Guillermo A. Pérez, Gaëtan Staquet, Frits W. Vaandrager2025We present the first algorithm for query learning of a class of Mealy machines with timers in a black-box context. Our algorithm is an extension of the L# algorithm of Vaandrager et al. to a timed setting. We rely on symbolic queries which empower us to reason on untimed executions while learning. Similarly to the algorithm for learning timed automata of Waga, these symbolic queries can be implemented using finitely many concrete queries. Experiments with a prototype implementation, written in Rust, show that our algorithm is able to efficiently learn realistic benchmarks.

New Fault Domains for Conformance Testing of Finite State Machines

Frits Vaandrager, Ivo Melse2025A fault domain reflects a tester's assumptions about faults that may occur in an implementation and that need to be detected during testing. A fault domain that has been widely studied in the literature on black-box conformance testing is the class of finite state machines (FSMs) with at most states. Numerous strategies for generating test suites have been proposed that guarantee fault coverage for this class. These so-called -complete test suites grow exponentially in , where is the number of states of the specification, so one can only run them for small values of . But the assumption that is small is not realistic in practice. In his seminal paper from 1964, Hennie raised the challenge to design checking experiments in which the number of states may increase appreciably. In order to solve this long-standing open problem, we propose (much larger) fault domains that capture the assumption that all states in an implementation can be reached by first performing a sequence from some set (typically a state cover for the specification), followed by arbitrary inputs, for some small . The number of states of FSMs in these fault domains grows exponentially in . We present a sufficient condition for --completeness of test suites with respect to these fault domains. Our condition implies --completeness of two prominent -complete test suite generation strategies, the Wp and HSI methods. Thus these strategies are complete for much larger fault domains than those for which they were originally designed, and thereby solve Hennie's challenge. We show that three other prominent -complete methods (H, SPY and SPYH) do not always generate --complete test suites.

Systematic Evaluation of Black-Box Checking for Fast Bug Detection

Bram Pellen, María Belén Rodríguez, Frits Vaandrager, Petra van den Bos2025Combinations of active automata learning, model-based testing and model checking have been successfully used in numerous applications, e.g., for spotting bugs in implementations of major network protocols and to support refactoring of embedded controllers. However, in the large majority of these applications, model checking is only used at the very end, when no counterexample can be found anymore for the latest hypothesis model. This contrasts with the original proposal of black-box checking (BBC) by Peled, Vardi & Yannakakis, which applies model checking for all hypotheses, also the intermediate ones. In this article, we present the first systematic evaluation of the ability of BBC to find bugs quickly, based on 77 benchmarks models from real protocol implementations and controllers for which specifications of safety properties are available. Our main finding are: (a) In cases where the full model can be learned, BBC detects violations of the specifications with just 3.4% of the queries needed by an approach in which model checking is only used for the full model. (b) Even when the full model cannot be learned, BBC is still able to detect many violations of the specification. In particular, BBC manages to detect 94% of the safety properties violations in the challenging RERS 2019 industrial LTL benchmarks. (c) Our results also confirm that BBC is way more effective than existing MBT algorithms in finding deep bugs in implementations.