Below, readers may find all the publications related to the EVI project. They can be accessed by clicking the corresponding title.
| Title | Authors | Year | Abstract |
|---|---|---|---|
| Véronique Bruyère, Bharat Garhewal, Guillermo A. Pérez, Gaëtan Staquet, Frits W. Vaandrager | 2025 | We present the first algorithm for query learning of a class of Mealy machines with timers in a black-box context. Our algorithm is an extension of the L# algorithm of Vaandrager et al. to a timed setting. We rely on symbolic queries which empower us to reason on untimed executions while learning. Similarly to the algorithm for learning timed automata of Waga, these symbolic queries can be implemented using finitely many concrete queries. Experiments with a prototype implementation, written in Rust, show that our algorithm is able to efficiently learn realistic benchmarks. | |
An L# Based Algorithm for Active Learning of Minimal Separating Automata | Jasper Laumen, Leonne Snel, Frits Vaandrager | 2026 | A DFA separates two disjoint languages L1 and L2 if it accepts every word in L1 and rejects every word in L2. Algorithms for active learning of small separating DFAs have many applications, e.g., for learning network invariants, learning contextual assumptions in compositional verification, learning state machines from large amounts of log data, and learning bug pattern descriptions. We propose a simple active learning algorithm, inspired by L#, that learns a minimal separating DFA for disjoint languages L1 and L2 if one exists. Experiments show that our algorithm significantly outperforms existing active learning algorithms on both randomly generated and industrial benchmarks. |
New Fault Domains for Conformance Testing of Finite State Machines | Frits Vaandrager, Ivo Melse | 2025 | A fault domain reflects a tester's assumptions about faults that may occur in an implementation and that need to be detected during testing. A fault domain that has been widely studied in the literature on black-box conformance testing is the class of finite state machines (FSMs) with at most states. Numerous strategies for generating test suites have been proposed that guarantee fault coverage for this class. These so-called -complete test suites grow exponentially in , where is the number of states of the specification, so one can only run them for small values of . But the assumption that is small is not realistic in practice. In his seminal paper from 1964, Hennie raised the challenge to design checking experiments in which the number of states may increase appreciably. In order to solve this long-standing open problem, we propose (much larger) fault domains that capture the assumption that all states in an implementation can be reached by first performing a sequence from some set (typically a state cover for the specification), followed by arbitrary inputs, for some small . The number of states of FSMs in these fault domains grows exponentially in . We present a sufficient condition for --completeness of test suites with respect to these fault domains. Our condition implies --completeness of two prominent -complete test suite generation strategies, the Wp and HSI methods. Thus these strategies are complete for much larger fault domains than those for which they were originally designed, and thereby solve Hennie's challenge. We show that three other prominent -complete methods (H, SPY and SPYH) do not always generate --complete test suites. |
PICKLES: a Natural Language Framework for Requirement Specification and Model-Based Testing | María Belén Rodríguez, Petra van den Bos | 2026 | This paper combines methods from the fields of Model-Based Testing (MBT) and Behaviour-Driven Development (BDD) to define a testing approach with human-readable specifications and test cases, as in BDD, while using the modelling techniques and automatic test generation algorithms from MBT. We introduce PICKLES, a Precise Input and Control-flow Keyword-based Language for tEst Scenarios; an extension of Gherkin-style BDD scenarios, specified in structured natural language. We provide a bi-directional translation from Pickles scenarios to formal models, where both specifications and tests are human-readable, and a method to obtain a so-called master model combining all translated scenarios. Standard MBT algorithms can then be applied to automatically derive test cases from it. We implement a prototype of the translation and composition steps, which we use on an industrial case study: a software component from a traffic management system. With it, we illustrate the pipeline and show how Pickles can achieve significantly higher coverage with respect to BDD from the same set of scenarios. |
Systematic Evaluation of Black-Box Checking for Fast Bug Detection | Bram Pellen, María Belén Rodríguez, Frits Vaandrager, Petra van den Bos | 2025 | Combinations of active automata learning, model-based testing and model checking have been successfully used in numerous applications, e.g., for spotting bugs in implementations of major network protocols and to support refactoring of embedded controllers. However, in the large majority of these applications, model checking is only used at the very end, when no counterexample can be found anymore for the latest hypothesis model. This contrasts with the original proposal of black-box checking (BBC) by Peled, Vardi & Yannakakis, which applies model checking for all hypotheses, also the intermediate ones. In this article, we present the first systematic evaluation of the ability of BBC to find bugs quickly, based on 77 benchmarks models from real protocol implementations and controllers for which specifications of safety properties are available. Our main finding are: (a) In cases where the full model can be learned, BBC detects violations of the specifications with just 3.4% of the queries needed by an approach in which model checking is only used for the full model. (b) Even when the full model cannot be learned, BBC is still able to detect many violations of the specification. In particular, BBC manages to detect 94% of the safety properties violations in the challenging RERS 2019 industrial LTL benchmarks. (c) Our results also confirm that BBC is way more effective than existing MBT algorithms in finding deep bugs in implementations. |

