MSc Internship - Machine learning for speaker diarization
Research internship - Machine learning / structured prediction for speaker diarization
LIMSI (http://www.limsi.fr) seeks qualified candidates for one six-month research internship in the field of automatic speaker recognition.
Subject to satisfactory progress, this internship may lead to a PhD position.
Broadly, the goal of an automatic speaker recognition system is to authenticate or identify a person from their speech signal. Speaker diarization is an unsupervised process that aims at identifying each speaker within an audio stream and determining the intervals during which each speaker is active.
The overall goal of the internship is to advance the state of the art in speaker recognition and diarization. Specifically, the research will explore the use of structured prediction techniques for speaker diarization.
Conversations between several speakers are usually highly structured, and the speech turns of a given person are not uniformly distributed over time. Hence, knowing that someone is speaking at a particular time t tells us a lot about the probability that (s)he will still be speaking a few seconds later. However, state-of-the-art approaches seldom take this intrinsic structure into account. The aim of this internship is to demonstrate that structured prediction techniques (such as graphical models or SVMstruct) can be applied to speaker diarization.
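As an illustration of the temporal structure described above, here is a minimal sketch of one of the simplest graphical models that exploits it (it is not LIMSI's method and not the approach the internship will necessarily take): a hidden Markov model whose transition matrix favours staying with the current speaker, decoded with the Viterbi algorithm. The function name `sticky_viterbi`, the per-frame speaker scores and the `p_stay` value are hypothetical; in a real system the scores would come from some per-frame speaker model.

```python
import math
from typing import List, Sequence


def sticky_viterbi(frame_scores: Sequence[Sequence[float]], p_stay: float = 0.95) -> List[int]:
    """Return the most likely speaker index for each frame.

    frame_scores[t][k] is a hypothetical log-likelihood of speaker k at frame t.
    Assumed transition model: stay with the same speaker with probability p_stay,
    otherwise switch uniformly to one of the other speakers.
    """
    n_speakers = len(frame_scores[0])
    if n_speakers == 1:
        return [0] * len(frame_scores)
    log_stay = math.log(p_stay)
    log_switch = math.log((1.0 - p_stay) / (n_speakers - 1))

    # delta[k]: best log-score of any speaker sequence ending with speaker k
    # at the current frame (uniform prior over the first speaker).
    delta = [-math.log(n_speakers) + s for s in frame_scores[0]]
    backpointers = []
    for scores in frame_scores[1:]:
        new_delta, pointers = [], []
        for k in range(n_speakers):
            # Best previous speaker j, accounting for the stay/switch cost.
            best_prev, best_val = max(
                ((j, delta[j] + (log_stay if j == k else log_switch)) for j in range(n_speakers)),
                key=lambda pair: pair[1],
            )
            new_delta.append(best_val + scores[k])
            pointers.append(best_prev)
        delta = new_delta
        backpointers.append(pointers)

    # Backtrack from the best final speaker.
    state = max(range(n_speakers), key=lambda k: delta[k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Two speakers; frame 5 is an isolated "glitch" that slightly favours speaker 1,
# while frames 10-15 are a genuine turn by speaker 1.
scores = [[0.0, -1.0]] * 5 + [[-1.0, 0.0]] + [[0.0, -1.0]] * 4 + [[-1.0, 0.0]] * 6
print(sticky_viterbi(scores))
```

A frame-by-frame argmax would assign frame 5 to speaker 1. With the sticky transitions, two extra speaker changes cost more than the single frame of evidence gains, so the glitch is smoothed away while the genuine turn at frame 10 is kept.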
The ideal candidate should have a background in statistics or applied mathematics, optimization, linear algebra and signal processing. Knowledge of speech processing, machine learning and the Python programming language would be an asset.
The internship should start as soon as possible, and no later than April 2016. The monthly internship stipend is 546 €.
LIMSI is a CNRS laboratory with 250 people, 120 of whom are permanent members. The Spoken Language Processing group involved in the project is composed of 41 people, including 17 permanent members. The group is internationally recognized for its work on spoken language processing, and in particular for its developments in automatic speech recognition. The research carried out in the Spoken Language Processing group aims at understanding speech communication processes and developing models for use in automatic speech processing. This research area is inherently multidisciplinary, and the topics addressed include speech recognition, speaker recognition, corpus linguistics, error analysis, spoken language dialogue, question answering in spoken data, multimodal indexing of audio and video documents, and machine translation of both spoken and written language.
Contact: Hervé Bredin (bredin__at__limsi.fr)