Deep Learning for Audio

The course covers various applications of Deep Learning for Audio, as well as the basics of MLOps.

Materials and recordings are available on GitHub.

The course was originally developed by my colleague, Alexander Markovich. In 2023, I joined the teaching staff, and I am now one of the course's main contributors and organizers. The course has two goals:

  1. Provide students with knowledge of the audio-processing tasks, including those involving the human voice, that industrial companies are interested in: Automatic Speech Recognition (ASR), Text-To-Speech (TTS), Voice Conversion (VC), Voice Biometrics, and Source Separation. As deep learning techniques tend to be similar across modalities, the acquired skills transfer to other deep learning areas, such as Computer Vision or NLP.

  2. Teach best practices for DL engineers and develop the essential skills of R&D engineers: the basics of MLOps (proper code development, experiment tracking, configuration, etc.) and reading and writing academic papers.

Throughout the course, students:

  1. Write structured, industry-level code with configuration support (see the sketch after this list).
  2. Replicate state-of-the-art models from different years (e.g., DeepSpeech, Conformer, FastSpeech 2, HiFi-GAN, RTFS-Net, AASIST).
  3. Read a lot of papers and write their own reports.
  4. Learn how to work in teams and how to write academic papers.
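As an illustration of the "configuration support" mentioned above, here is a minimal sketch of a configurable training entry point using Hydra, a common choice for experiment configuration. The config path, config name, and commented-out helpers are hypothetical and do not come from the course's actual template.

```python
# train.py: a minimal sketch of a configurable training entry point.
# Assumes Hydra is installed (pip install hydra-core); the config layout
# and helper functions referenced below are hypothetical.
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(version_base=None, config_path="configs", config_name="train")
def main(cfg: DictConfig) -> None:
    # Every hyperparameter lives in configs/train.yaml and can be
    # overridden from the command line, e.g.:
    #   python train.py optimizer.lr=3e-4 trainer.epochs=50
    print(OmegaConf.to_yaml(cfg))  # log the resolved config for reproducibility
    # model = build_model(cfg.model)          # hypothetical helper
    # trainer = Trainer(cfg.trainer, model)   # hypothetical helper
    # trainer.fit()


if __name__ == "__main__":
    main()
```

Keeping all hyperparameters in versioned config files rather than hard-coding them is what makes experiments reproducible and easy to track.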

We aim to enhance the course each year and to supplement the materials with guest lectures from companies and universities such as SberDevices, AIRI, and QMUL.

The course is considered one of the best deep learning courses at HSE and has been awarded the «Best in terms of gained knowledge novelty» distinction.