Do you want to change the way the world interacts with computers? Do you want to be part of a team that pushes the Natural User Experience to the next level? Do you dream that one day our world will be populated by robots that help us do our jobs? Do you want to challenge yourself by innovating in an area that's new to Microsoft yet is an important strategic bet? Do you want to make Microsoft products not only accessible, but highly functional for all users in the world? As both computational horsepower and storage capacity reach unprecedented levels, humans are getting closer and closer to that dream of the natural user interface. Each day we are stepping closer toward being able to interact with computers the same way we interact with another human being.
The Azure text to speech team's mission is to empower every person and every organization on the planet to have human-like, diverse and delightful AI voices! The TTS platform (runtime, model and services) we built has been widely used by many Microsoft products and Azure customers in voice assistant, read aloud and accessibility scenarios. We are looking for a motivated, self-driven software development engineer / applied scientist to drive neural text to speech language development for key speech customers.
Responsibilities
Advance the state of the art of speech technology through end-to-end modelling.
Improve the speech synthesis quality and performance in terms of naturalness, expressiveness, and accuracy for production.
Debug voice quality issues in new languages.
Build high-quality neural TTS using low-resource data and joint learning with ASR.
Collaborate with remote teams to deliver high-quality products.
Qualifications
3+ years of experience in speech synthesis or speech recognition. (required)
PhD/MS degree in speech synthesis, or equivalent experience (list what's equivalent). (required)
Experience in end-to-end speech modelling (transformer speech, etc.). (required)
Ability to write deep learning code and implement ideas from state-of-the-art papers. (preferred)
Understanding of neural acoustic modelling or vocoder. (preferred)
Experience in speech recognition. (preferred)