MVA Material for the course on Audio Signal Processing

 

 

Registration for the course:

Course outline

WARNING: The first introductory lecture is highly audio-based, with many examples, and introduces important concepts.

The slides for each lecture might be updated before the lecture.

Reading the course slides alone is not enough to properly understand the material.

Course Validation

For the validation of the course, you must complete a project:

There will be a 25-minute oral presentation:

Projects

All projects involve deep learning components. Their purpose is to place classical signal processing methods (e.g., filtering, spectral analysis, linear prediction, parametric synthesis) in perspective by directly comparing them with modern learning-based approaches addressing the same tasks.

Project 1 - SEGAN: Speech Enhancement Generative Adversarial Network

Paper: Pascual et al., SEGAN: Speech Enhancement Generative Adversarial Network
https://arxiv.org/abs/1703.09452

Objective
The goal of this project is to study speech denoising directly in the time domain using a convolutional neural network. The work focuses on understanding what a learned filter can achieve compared to a classical Wiener filter under controlled assumptions.
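
As a point of reference for the classical side of this comparison, here is a minimal sketch of an STFT-domain Wiener filter in NumPy. It is an illustration under stated assumptions, not the baseline prescribed for the project: the noise power spectrum `noise_psd` is assumed to be known (for instance, averaged over a noise-only recording), the speech power is estimated by simple power subtraction, and the function names, frame size (512) and hop (128) are arbitrary choices made for this example.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, n_fft=512, hop=128):
    """Weighted overlap-add inverse of stft (same window used for synthesis)."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(X, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(frames) - 1)
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    return y / np.maximum(norm, 1e-8)

def wiener_denoise(noisy, noise_psd, n_fft=512, hop=128):
    """Apply the gain G = S / (S + N) per time-frequency bin.

    noise_psd: assumed known, shape (n_fft // 2 + 1,), same scaling as |stft|^2.
    """
    # Zero-pad so that every input sample is covered by a full set of frames.
    padded = np.concatenate([np.zeros(n_fft), noisy, np.zeros(n_fft + hop)])
    Y = stft(padded, n_fft, hop)
    noisy_psd = np.abs(Y) ** 2
    # Speech power estimate by power subtraction, floored at zero.
    speech_psd = np.maximum(noisy_psd - noise_psd, 0.0)
    gain = speech_psd / (speech_psd + noise_psd + 1e-12)
    enhanced = istft(gain * Y, n_fft, hop)
    return enhanced[n_fft : n_fft + len(noisy)]
```

A typical use would be `noise_psd = np.mean(np.abs(stft(noise_only)) ** 2, axis=0)` followed by `wiener_denoise(noisy, noise_psd)`, where `noise_only` is a hypothetical noise-only excerpt. The gain here is real-valued and fixed by the stationarity assumption on the noise, which is exactly the kind of constraint a learned time-domain filter does not have.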

Expected work


Project 2 - DCCRN: Deep Complex Convolution Recurrent Network

Paper: Hu et al., DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
https://arxiv.org/abs/2008.00264

Objective
This project investigates the role of phase information in speech denoising. The goal is to compare magnitude-only approaches (Wiener filtering or deep learning) with a complex-valued deep learning approach.
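
To make the role of phase concrete, the sketch below compares two oracle estimates on a single STFT frame: the best a magnitude-only method can do (true clean magnitude combined with the noisy phase) and a complex ratio mask, which corrects magnitude and phase together. The function names and the synthetic test signal are assumptions made for this illustration; they are not taken from the paper or from the course material.

```python
import numpy as np

def magnitude_only_oracle(clean_spec, noisy_spec):
    """Upper bound for magnitude-only methods: clean magnitude, noisy phase."""
    return np.abs(clean_spec) * np.exp(1j * np.angle(noisy_spec))

def complex_ratio_mask(clean_spec, noisy_spec, eps=1e-12):
    """Complex mask M such that M * noisy_spec is approximately clean_spec."""
    return clean_spec * np.conj(noisy_spec) / (np.abs(noisy_spec) ** 2 + eps)

# Synthetic example: one windowed frame of a sinusoid in white noise.
rng = np.random.default_rng(0)
n = np.arange(512)
window = np.hanning(512)
clean = np.sin(2 * np.pi * 0.05 * n)
noisy = clean + 0.3 * rng.standard_normal(512)

S = np.fft.rfft(clean * window)
Y = np.fft.rfft(noisy * window)

err_magnitude = np.linalg.norm(magnitude_only_oracle(S, Y) - S)
err_complex = np.linalg.norm(complex_ratio_mask(S, Y) * Y - S)
print(f"magnitude-only oracle error: {err_magnitude:.3f}")
print(f"complex mask oracle error:   {err_complex:.3e}")
```

Even with a perfect magnitude estimate, the magnitude-only reconstruction keeps a residual error caused by the noisy phase, whereas the complex mask can in principle recover the clean spectrum exactly.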

Expected work


Project 3 - MetricGAN: Optimizing Perceptual Metrics

Paper: Fu et al., MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement
https://arxiv.org/abs/1905.04874

Objective
This project studies the impact of the loss function in deep learning-based speech denoising. The architecture is fixed and simple; only the training objective is modified.
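
The following sketch illustrates why the choice of training objective matters, using two simple signal-level criteria. Both are assumptions chosen for illustration: a waveform mean squared error and the scale-invariant signal-to-distortion ratio (SI-SDR). Perceptual metrics such as PESQ or STOI, which are the kind of black-box scores the paper title refers to, require external implementations and are not reproduced here.

```python
import numpy as np

def waveform_mse(estimate, reference):
    """Mean squared error between two waveforms (sensitive to gain errors)."""
    return float(np.mean((estimate - reference) ** 2))

def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB (unchanged when the estimate is rescaled)."""
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)
    # Project the estimate onto the reference to find the target component.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    distortion = estimate - target
    return float(10 * np.log10((np.sum(target ** 2) + eps) / (np.sum(distortion ** 2) + eps)))

rng = np.random.default_rng(0)
reference = rng.standard_normal(1000)
estimate = reference + 0.1 * rng.standard_normal(1000)

for gain in (1.0, 2.0):
    scaled = gain * estimate
    print(f"gain={gain}: MSE={waveform_mse(scaled, reference):.3f}, "
          f"SI-SDR={si_sdr(scaled, reference):.2f} dB")
```

Doubling the gain of the estimate degrades the MSE considerably while leaving SI-SDR unchanged, so a network trained on one criterion is pushed toward a different solution than a network trained on the other, even with an identical architecture.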

Expected work


Project 4 - DDSP: Differentiable Digital Signal Processing

Paper: Engel et al., DDSP: Differentiable Digital Signal Processing
https://arxiv.org/abs/2001.04643

Objective
The goal of this project is to compare a standard analytical DSP sound synthesis pipeline with learned parameter estimation. Students study how neural networks can be used to predict the parameters of a classical harmonic-plus-noise synthesizer.
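
For orientation, a harmonic-plus-noise synthesizer can be sketched in a few lines of NumPy. This is a simplified illustration, not the DDSP implementation: the function name and its parameters are hypothetical, the controls are given per sample, and the noise branch is plain white noise scaled by a gain rather than a time-varying filtered noise.

```python
import numpy as np

def harmonic_plus_noise(f0, harmonic_amps, noise_gain, sample_rate=16000, seed=0):
    """Sum of harmonics of f0 plus scaled white noise.

    f0:            fundamental frequency in Hz, shape (n_samples,)
    harmonic_amps: amplitude of each harmonic, shape (n_samples, n_harmonics)
    noise_gain:    scalar or array of shape (n_samples,)
    """
    f0 = np.asarray(f0, dtype=float)
    amps = np.asarray(harmonic_amps, dtype=float)
    k = np.arange(1, amps.shape[1] + 1)  # harmonic numbers 1..K
    # Instantaneous phase of the fundamental: integrate frequency over time.
    phase = 2 * np.pi * np.cumsum(f0) / sample_rate
    # Silence harmonics at or above the Nyquist frequency to avoid aliasing.
    amps = np.where(f0[:, None] * k < sample_rate / 2, amps, 0.0)
    harmonic = np.sum(amps * np.sin(phase[:, None] * k), axis=1)
    noise = noise_gain * np.random.default_rng(seed).standard_normal(len(f0))
    return harmonic + noise

# One second of a 220 Hz tone with three harmonics and a little noise.
sr = 16000
f0 = np.full(sr, 220.0)
amps = np.tile([1.0, 0.5, 0.25], (sr, 1))
audio = harmonic_plus_noise(f0, amps, noise_gain=0.01, sample_rate=sr)
```

In the analytical pipeline, `f0`, `harmonic_amps` and `noise_gain` would come from signal analysis (pitch tracking, spectral peak picking); in the learned pipeline, a network predicts them and the synthesizer stays the same.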

Expected work


Project 5 - LPCNet: Neural Excitation for LPC Vocoding

Paper: Valin & Skoglund, LPCNet: Improving Neural Speech Synthesis through Linear Prediction
https://arxiv.org/abs/1810.11846

Objective
This project analyzes what neural networks learn when combined with a classical LPC vocoder. The focus is exclusively on excitation modeling rather than spectral envelope estimation.
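
The split between spectral envelope and excitation can be illustrated with a minimal LPC analysis in NumPy. The sketch below uses the autocorrelation method and solves the normal equations directly; it is an assumption made for clarity, not the analysis used in the paper (a Levinson-Durbin recursion would be the usual efficient choice). The function names are hypothetical.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    # Symmetric Toeplitz matrix built from lags 0..order-1.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])

def lpc_residual(frame, a):
    """Excitation estimate: e[n] = x[n] - sum_k a[k] * x[n - k]."""
    inverse_filter = np.concatenate([[1.0], -a])
    return np.convolve(frame, inverse_filter)[: len(frame)]

# Synthetic check on a second-order autoregressive signal.
rng = np.random.default_rng(0)
excitation = rng.standard_normal(4000)
x = np.zeros(4000)
for n in range(2, 4000):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + excitation[n]

a = lpc_coefficients(x, order=2)
residual = lpc_residual(x, a)
print("estimated coefficients:", np.round(a, 2))
print("signal variance:", round(float(np.var(x)), 2),
      "residual variance:", round(float(np.var(residual)), 2))
```

The coefficients `a` describe the spectral envelope, while `residual` is the excitation that remains once the envelope is removed. That residual is the part of the signal a neural excitation model has to generate.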

Expected work


Project 6 - CREPE: Deep Learning for Pitch Estimation

Paper: Kim et al., CREPE: A Convolutional Representation for Pitch Estimation
https://arxiv.org/abs/1802.06182

Objective
This project compares classical DSP pitch estimation methods with a deep learning-based approach. The emphasis is on experimental evaluation, robustness, and voiced/unvoiced decision making.
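
As one example of a classical method, the sketch below estimates pitch on a single frame from the peak of the normalized autocorrelation and uses the height of that peak for the voiced/unvoiced decision. The choice of autocorrelation (rather than, say, YIN), the function name, the search range of 60 to 500 Hz and the threshold of 0.5 are all assumptions made for this illustration.

```python
import numpy as np

def autocorr_pitch(frame, sample_rate, fmin=60.0, fmax=500.0, voicing_threshold=0.5):
    """Return (f0_hz, voiced) for one frame; f0_hz is 0.0 when unvoiced."""
    frame = frame - np.mean(frame)
    energy = float(np.dot(frame, frame))
    if energy == 0.0:
        return 0.0, False
    # Autocorrelation at non-negative lags, normalized so that acf[0] == 1.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1 :] / energy
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    lag = lag_min + int(np.argmax(acf[lag_min : lag_max + 1]))
    # A strong peak indicates periodicity; a weak one is treated as unvoiced.
    voiced = bool(acf[lag] > voicing_threshold)
    return (sample_rate / lag if voiced else 0.0), voiced

sr = 16000
n = np.arange(1024)
tone = np.sin(2 * np.pi * 200.0 * n / sr)
noise = np.random.default_rng(0).standard_normal(1024)
print("tone: ", autocorr_pitch(tone, sr))
print("noise:", autocorr_pitch(noise, sr))
```

The frequency resolution of this estimator is limited by the integer lag, and the fixed threshold makes the voicing decision fragile in noise. Both are natural axes along which to compare it with a learned estimator.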

Expected work