PDEng Defence of Jorryt-Jan Dijkstra (ING Group) on 'Zero-Downtime Schema Changes at ING'
Wednesday 1 September 2021

We are very glad to announce that Jorryt-Jan Dijkstra successfully defended his PDEng thesis on Wednesday 1 September 2021. The project is the first outcome of a collaboration with ING Group.

Supervisors: Arend Rensink and Maurice van Keulen
Company supervisor: Joost Bosman
External assessors: Jan Braaksma and Dik Schippers

Abstract: 
Banking has evolved into a digital commodity. With the Payment Services Directive (PSD2), banks are opening up their digital services to third-party providers. Competition is growing as big tech companies step into banking and fintechs quickly gain ground. As competition grows and customer expectations shift, it is vital to respond rapidly and adapt software. ING’s strategy is to embrace this shift and adapt, such that changes are delivered faster and more often.

One of these expectations is 24/7 availability of digital services. To adapt faster to customers’ needs and gather feedback more frequently, it is key to release new software versions more often. Releasing new software versions whilst staying available is challenging. Availability can only be guaranteed if two software versions (old and new) briefly run side by side whenever a new release takes place. Without this overlap, there would be a time window in which customers would face downtime.

The goal of this project is to mitigate downtime for customers, such that availability expectations can be met whilst releasing often. The project took place in the Digital Channels department, which offers a portal and a mobile application that provide several banking products to large corporate clients. The project started by identifying technical challenges that affect availability at the time of a software release.

The challenge that this project addresses concerns database schema changes, which might block queries or break the schema that the existing application relies on. The department has spent little effort on overcoming this challenge, and the industry does not have a clear-cut solution for it. Therefore, the focus of the project has been eliminating downtime that is caused (or required) by schema changes.

Criteria and requirements for the solution were gathered. These were combined with identified historical schema changes, so that the solution space could be explored. Primarily, this led to a conceptual design that addresses how to propagate schema changes without downtime and how to test them for correctness and performance. Secondarily, the design has been implemented as a solution that is directly applicable in the department.

The implementation consists of a plugin that works on top of the adopted schema management tool (Liquibase) and a repository of patterns that provide schema compatibility between two application versions. The plugin provides new schema change types that use non-blocking equivalents of schema changes that would otherwise block the queries of existing clients.

The repository of patterns is designed using two approaches to deal with schema compatibility. Both of them adopt the Expand and Contract pattern, which entails postponing breaking schema changes until the schema objects to be deleted are no longer in use by existing software versions. In addition, the pattern requires synchronization between old and new schema objects, such that data remains consistent between software versions. The first approach handles both the expansion and the synchronization in the table itself, close to the original data (referred to as in-place). The second approach creates a full copy of the table to be changed and atomically swaps it with the original. The first approach requires less space and no manual intervention, and fits the department tooling best. The second approach requires more space and manual intervention, and might cause a short perceived downtime during the atomic swap. As the first approach works in-place and the second approach operates on a full table copy, their risks are also different.
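
To make the in-place variant of Expand and Contract more concrete, the following is a minimal sketch and not the thesis implementation. It uses Python's built-in sqlite3 module instead of Liquibase, and the table `customer`, the columns `name` and `full_name`, and all trigger names are hypothetical. It assumes the change is a column rename: the old application version uses `name`, the new version uses `full_name`, and triggers keep both columns synchronized while the two versions run side by side.

```python
import sqlite3

# Hypothetical scenario: the old application version reads and writes
# customer.name, the new version reads and writes customer.full_name.

# Expand: add the new column, backfill it, and synchronize both columns.
EXPAND = """
ALTER TABLE customer ADD COLUMN full_name TEXT;
UPDATE customer SET full_name = name;

-- An insert by either version fills in the column that version ignores.
CREATE TRIGGER customer_ins_old AFTER INSERT ON customer
WHEN NEW.full_name IS NULL
BEGIN UPDATE customer SET full_name = NEW.name WHERE id = NEW.id; END;

CREATE TRIGGER customer_ins_new AFTER INSERT ON customer
WHEN NEW.name IS NULL
BEGIN UPDATE customer SET name = NEW.full_name WHERE id = NEW.id; END;

-- Updates are mirrored. The WHEN guards stop the mirroring once both
-- columns agree, so the two triggers do not keep firing each other.
CREATE TRIGGER customer_upd_old AFTER UPDATE OF name ON customer
WHEN NEW.full_name IS NOT NEW.name
BEGIN UPDATE customer SET full_name = NEW.name WHERE id = NEW.id; END;

CREATE TRIGGER customer_upd_new AFTER UPDATE OF full_name ON customer
WHEN NEW.name IS NOT NEW.full_name
BEGIN UPDATE customer SET name = NEW.full_name WHERE id = NEW.id; END;
"""

# Contract: run only once no old application version is live any more.
# ALTER TABLE ... DROP COLUMN requires SQLite 3.35 or newer.
CONTRACT = """
DROP TRIGGER customer_ins_old;
DROP TRIGGER customer_ins_new;
DROP TRIGGER customer_upd_old;
DROP TRIGGER customer_upd_new;
ALTER TABLE customer DROP COLUMN name;
"""


def expand(conn: sqlite3.Connection) -> None:
    """Apply the non-breaking part of the change."""
    conn.executescript(EXPAND)


def contract(conn: sqlite3.Connection) -> None:
    """Apply the postponed, breaking part of the change."""
    conn.executescript(CONTRACT)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customer (name) VALUES ('Alice')")
    conn.commit()

    expand(conn)
    # Old and new versions now write concurrently through different columns.
    conn.execute("INSERT INTO customer (name) VALUES ('Bob')")
    conn.execute("UPDATE customer SET full_name = 'Alicia' WHERE id = 1")
    conn.commit()
    print(conn.execute("SELECT id, name, full_name FROM customer").fetchall())
```

In this sketch the breaking step (dropping `name`) is isolated in `contract`, which mirrors the idea of postponing breaking changes until no existing software version uses the old schema object. A production setup would additionally have to consider how long the backfill locks the table, which this example ignores.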

Recommendations are presented so that the department can adopt the design and the example implementations. Finally, information is included on how to extend and test both the design and the implementation to support new types of changes.