Speech to Speech Translation


  • Natural Language Processing


Mentors :

  • Swapnoneel Kayal

Mentees :

  • 3-4


Speech to Speech Translation involves translating speech in one language into speech in another language. The most naive way to go about this is to chain three models: automatic speech recognition, text-to-text machine translation and text-to-speech synthesis. However, the intention of this project is to come up with a model that does not rely on an intermediate text representation, since this offers advantages such as faster inference and avoiding the errors that compound between recognition and translation.

Here is a blog post about Translatotron → https://ai.googleblog.com/2019/05/introducing-translatotron-end-to-end.html

Here is some reading material for you to gain context → https://arxiv.org/pdf/2107.05604.pdf
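To make the naive approach concrete, here is a minimal sketch of the three-stage cascade in Python. It only shows how the stages are wired together: the `CascadeS2ST` class, the `Audio` alias and the `toy_*` functions are hypothetical placeholders invented for illustration, not real models or library APIs. In the actual project, each stage would wrap a trained ASR, MT or TTS model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: a waveform is represented as a plain list of samples.
Audio = List[float]


@dataclass
class CascadeS2ST:
    """Naive speech-to-speech translation: ASR, then MT, then TTS."""
    asr: Callable[[Audio], str]   # source speech -> source text
    mt: Callable[[str], str]      # source text  -> target text
    tts: Callable[[str], Audio]   # target text  -> target speech

    def translate(self, audio: Audio) -> Audio:
        source_text = self.asr(audio)
        # Any mistake made by the ASR stage is passed on to the MT stage,
        # which is how errors compound in a cascade.
        target_text = self.mt(source_text)
        return self.tts(target_text)


# Toy stand-ins (hypothetical) so the sketch runs without any ML library.
def toy_asr(audio: Audio) -> str:
    return "hello"


def toy_mt(text: str) -> str:
    return {"hello": "hola"}.get(text, text)


def toy_tts(text: str) -> Audio:
    # Encode each character as one "sample"; a real TTS returns a waveform.
    return [float(ord(ch)) for ch in text]


engine = CascadeS2ST(asr=toy_asr, mt=toy_mt, tts=toy_tts)
output = engine.translate([0.0, 0.1, 0.0])
print(len(output), "samples produced")
```

A direct model would replace all three callables with a single speech-to-speech function, which removes the intermediate text hand-offs shown in `translate`.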

Prerequisites :

Hard prerequisites for the Mentees :

  • CS101 Fundamentals
  • Python Programming Proficiency (Beginner to Intermediate)
  • Must know how to work with the Terminal (or Command Prompt) and GitHub

Soft prerequisites for the Mentees :

  • Having worked with PyTorch, Keras, TensorFlow or any other equivalent ML library
  • Patience to go through multiple research papers

Tentative Timeline :

Week Work
Week 1 Brush up on Python and basic terminal commands, and go through an interactive Git Immersion tutorial or a git crash course video.
Week 2 Perform an extensive literature review and watch some YouTube videos for a better understanding of the problem at hand.
Week 3 Acquisition of data → Play around with the LibriSpeech corpus and think about ways of generating a dataset for languages for which no corpus is available.
Week 4 Coding → Implementation of the naive Speech to Speech Translation engine.
Week 5 Coding → Implementation of a direct Speech to Speech Translation engine.
Week 6 Buffer Week
Week 7 Comparison and analysis of the 2 models/engines devised. If time permits, we will also experiment with the engines in order to build a more efficient real-time translation pipeline.
Week 8 Wrapping up → This also includes making a good report and a (video) presentation, since it helps me and other people value your hard work even more.
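
For the Week 7 comparison, one simple starting point is to time both engines on the same audio clips. The sketch below is only an illustration under assumptions: `mean_latency`, `compare`, `cascade_stub` and `direct_stub` are hypothetical names, and the stubs stand in for the engines built in Weeks 4 and 5. Latency is just one axis; translation quality would need its own metric.

```python
import time
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Assumption: a waveform is represented as a plain list of samples.
Audio = List[float]
Engine = Callable[[Audio], Audio]


def mean_latency(engine: Engine, clips: Sequence[Audio], repeats: int = 3) -> float:
    """Average wall-clock seconds per clip over `repeats` passes."""
    timings = []
    for _ in range(repeats):
        for clip in clips:
            start = time.perf_counter()
            engine(clip)
            timings.append(time.perf_counter() - start)
    return mean(timings)


def compare(engines: Dict[str, Engine], clips: Sequence[Audio]) -> Dict[str, float]:
    """Run every engine on the same clips and report its mean latency."""
    return {name: mean_latency(fn, clips) for name, fn in engines.items()}


# Hypothetical stand-ins for the two engines.
def cascade_stub(audio: Audio) -> Audio:
    return list(reversed(audio))


def direct_stub(audio: Audio) -> Audio:
    return audio[:]


clips = [[0.0] * 16000, [0.1] * 8000]
report = compare({"cascade": cascade_stub, "direct": direct_stub}, clips)
for name, seconds in report.items():
    print(f"{name}: {seconds * 1000:.3f} ms per clip")
```

Using the same clips and the same number of repeats for both engines keeps the comparison fair; the absolute numbers will depend on the machine.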