PhD - Machine learning / structured prediction for speaker diarization

LIMSI (http://www.limsi.fr) seeks qualified candidates for one fully funded PhD position in the field of automatic speaker recognition. The research will be conducted in the framework of the ANR-funded project ODESSA (Online Diarization Enhanced by recent Speaker identification and Structured prediction Approaches) in partnership with EURECOM (France) and IDIAP (Switzerland).

Broadly, the goal of an automatic speaker recognition system is to authenticate or identify a person from their speech signal. Speaker diarization is an unsupervised process that aims at identifying each speaker within an audio stream and determining the intervals during which each speaker is active.

The overall goal of the position is to advance the state of the art in speaker recognition and diarization. Specifically, the research will explore the use of structured prediction techniques for speaker diarization.

Conversations between several speakers are usually highly structured, and the speech turns of a given person are not uniformly distributed over time. Hence, knowing that someone is speaking at a particular time t tells us a lot about the probability that (s)he is also going to speak a few seconds later. However, state-of-the-art approaches seldom take this intrinsic structure into account. The goal of this task is to demonstrate that structured prediction techniques (such as graphical models or SVMstruct) can be applied to speaker diarization.

The proposed research is a collaboration between EURECOM, IDIAP and LIMSI. The research will rely on previous knowledge and software developed at LIMSI. Reproducible research is a cornerstone of the project. Hence, a strong involvement in data collection and open source libraries is expected.

The ideal candidate should hold a Master's degree in computer science, electrical engineering or a related field. She or he should have a background in statistics or applied mathematics, optimization, linear algebra and signal processing. The applicant should also have strong programming skills and be familiar with Python, various scripting languages and the Linux environment. Knowledge of speech processing and machine learning is an asset.

The starting date is as early as possible, and no later than October 2016.

LIMSI is a CNRS laboratory with 250 people and 120 permanent members. The Spoken Language Processing group involved in the project is composed of 41 people, including 17 permanent members. The group is internationally recognized for its work on spoken language processing, and in particular for its developments in automatic speech recognition. The research carried out in the Spoken Language Processing group aims at understanding speech communication processes and developing models for use in automatic speech processing. This research area is inherently multidisciplinary. The topics addressed include speech recognition, speaker recognition, corpus linguistics, error analysis, spoken language dialogue, question answering in spoken data, multimodal indexing of audio and video documents, and machine translation of both spoken and written language.

Contact: Hervé Bredin (bredin@limsi.fr) and Claude Barras (barras@limsi.fr)