
[M] Virtual symphony: matching off-tempo audio from different instruments

Master Assignment


Type: Master CS

Student: Unassigned

Duration: TBD

If you are interested, please contact:

Background:

From studio producers to orchestra conductors, harmony is an essential part of music; however, the temporal alignment between instruments that it requires is often difficult to achieve.

For example, Berlioz's Requiem requires an orchestra of roughly 400 musicians to stay in sync.

Thus, an automated post-production tool that can mix and match sounds from different instruments would be useful to both professional and amateur editors.

Recent deep-learning methods based on self-supervision have shown great promise for both predicting temporal progress [1,2,3] and matching instances [4]. This project aims to use audio-processing transformer models [5,6] to align melodies, i.e. sounds from individual instruments, to create harmonies, i.e. combined sounds. A classical baseline for this kind of alignment is sketched below.
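For orientation, here is a minimal sketch of such an alignment, assuming two out-of-sync recordings of the same piece. It uses dynamic time warping (DTW) over handcrafted chroma features as a stand-in for the learned embeddings a transformer such as BEATs or AST [5,6] would provide; the file names and hop length are illustrative assumptions, not part of the assignment.

```python
import numpy as np
import librosa

# Placeholder file names: any two out-of-sync recordings of the same piece.
y_a, sr = librosa.load("violin.wav", sr=22050)
y_b, _ = librosa.load("cello.wav", sr=22050)

hop = 512  # frame hop in samples (an assumed setting)

# Chroma features serve here as a stand-in for learned transformer embeddings.
chroma_a = librosa.feature.chroma_cqt(y=y_a, sr=sr, hop_length=hop)
chroma_b = librosa.feature.chroma_cqt(y=y_b, sr=sr, hop_length=hop)

# DTW yields a warping path mapping frames of one melody onto the other.
_, wp = librosa.sequence.dtw(X=chroma_a, Y=chroma_b, metric="cosine")
wp = np.flip(wp, axis=0)  # reorder path start-to-end: (frame_a, frame_b) pairs

# Convert matched frame indices to seconds; these anchor points could drive
# time-stretching of one part onto the other.
times = librosa.frames_to_time(wp, sr=sr, hop_length=hop)
print(times[:5])
```

A learned model would replace the chroma features with embeddings that are robust to timbre differences across instruments, which is exactly where such handcrafted baselines tend to break down.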

Objectives:

You will develop a model that, given out-of-sync recordings of individual instruments/sound sources, synthesizes a coherent harmony.
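As a hedged illustration of a simple non-learned baseline for this objective, the sketch below assumes the misalignment is a constant global offset (no tempo drift): it estimates the lag between two sources by cross-correlating their onset-strength envelopes, shifts one source, and mixes. The file names are placeholders; the model you develop would handle the non-rigid, tempo-varying case this baseline cannot.

```python
import numpy as np
import librosa
import soundfile as sf

# Placeholder inputs: two recordings of the same passage, out of sync.
y_a, sr = librosa.load("flute.wav", sr=22050)
y_b, _ = librosa.load("oboe.wav", sr=22050)

# Onset-strength envelopes are more robust to timbre than raw waveforms.
hop = 512
env_a = librosa.onset.onset_strength(y=y_a, sr=sr, hop_length=hop)
env_b = librosa.onset.onset_strength(y=y_b, sr=sr, hop_length=hop)

# Cross-correlate the envelopes to find the global lag of b relative to a.
corr = np.correlate(env_a, env_b, mode="full")
lag_frames = int(np.argmax(corr)) - (len(env_b) - 1)
lag_samples = lag_frames * hop

# Shift source b by the estimated lag, then mix the two parts.
if lag_samples > 0:
    y_b = np.concatenate([np.zeros(lag_samples), y_b])
else:
    y_b = y_b[-lag_samples:]
n = min(len(y_a), len(y_b))
mix = 0.5 * (y_a[:n] + y_b[:n])
sf.write("mix.wav", mix, sr)
```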

Your profile:

You are a graduate student who is interested in audio processing and has prior experience with deep-learning frameworks (e.g., PyTorch). You are also enthusiastic about researching new directions and about applying, testing, and analyzing the outcomes of your ideas.

Related works:

[1] Shen, Y. and Elhamifar, E., 2024. Progress-aware online action segmentation for egocentric procedural task videos. CVPR.

[2] Donahue, G. and Elhamifar, E., 2024. Learning to predict activity progress by self-supervised video alignment. CVPR.

[3] Liu, W., Tekin, B., Coskun, H., Vineet, V., Fua, P. and Pollefeys, M., 2022. Learning to align sequential actions in the wild. CVPR.

[4] Somayazulu, A., Chen, C. and Grauman, K., 2024. Self-supervised visual acoustic matching. NeurIPS.

[5] Chen, S., Wu, Y., Wang, C., Liu, S., Tompkins, D., Chen, Z. and Wei, F., 2022. BEATs: Audio pre-training with acoustic tokenizers. arXiv preprint arXiv:2212.09058.

[6] Gong, Y., Chung, Y.A. and Glass, J., 2021. AST: Audio spectrogram transformer. arXiv preprint arXiv:2104.01778.