MASTER Assignment
Improving the Failover Time of Distributed Mission-Critical Systems
Type: Master M-CS
Period: November, 2024 - April, 2025
Student: Siderova, A. (Aleksandra, Student CS)
Date Final project: April 14, 2025
Supervisors:
Abstract:
This thesis explores the problem of fast failover in a distributed mission-critical system that uses Kubernetes. It aims to provide comprehensive background on the problem and to compare potential solutions to the challenges at hand. The technologies considered are Serf, Consul and ZooKeeper, with Serf selected as a suitable candidate for comparison in an experimental environment that simulates the conditions of a distributed mission-critical system. The findings indicate that Serf offers an improved failure detection time compared to the use of Kubernetes alone. Furthermore, a range for the expected failure detection time is established, although the results may not be sufficiently predictable for the stringent requirements of mission-critical systems.