Google AI recently announced Translatotron, an experimental new direct speech-to-speech translation tool that Google says is capable of “faster inference speed, naturally avoiding compounding errors between recognition and translation… [and retaining] the voice of the original speaker after translation…”
Google AI Translatotron “is based on a sequence-to-sequence network which takes source spectrograms as input and generates spectrograms of the translated content in the target language,” the development team says.
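To make "spectrograms as input" concrete, here is a minimal NumPy sketch of how raw audio can be turned into a magnitude spectrogram. It is not Translatotron's actual preprocessing: the function name `spectrogram`, the frame length, the hop size and the synthetic 440 Hz tone are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform (illustrative only)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each one with the window.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One row per time frame, one column per frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))

# Assumption: one second of a 440 Hz tone at 16 kHz stands in for source speech.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
source_audio = np.sin(2 * np.pi * 440 * t)

source_spec = spectrogram(source_audio)
print(source_spec.shape)  # (time frames, frequency bins)
```

A sequence-to-sequence model of the kind described would consume an array like `source_spec` and emit a second array of the same kind for the target language, which is then converted back into audio.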
How Google AI Translatotron Works: Simplified
Here is a visual recreation — taken from Google AI’s announcement — of how the technology works:
Preserving the Sound of the Original Speaker
“By incorporating a speaker encoder network, Translatotron is also able to retain the original speaker’s vocal characteristics in the translated speech, which makes the translated speech sound more natural and less jarring… The speaker encoder is pretrained on the speaker verification task, learning to encode speaker characteristics from a short example utterance. Conditioning the spectrogram decoder on this encoding makes it possible to synthesize speech with similar speaker characteristics, even though the content is in a different language.”
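The quote does not say how the decoder is conditioned on the speaker encoding. One common approach, assumed here purely for illustration, is to append the same fixed-size speaker vector to every time step of the decoder's input. In the sketch below, the function names, the dimensions and the random, untrained weights are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def speaker_embedding(reference_spec, proj):
    """Toy speaker encoder: average frames over time, project, then L2-normalize."""
    pooled = reference_spec.mean(axis=0)
    emb = proj @ pooled
    return emb / np.linalg.norm(emb)

def condition_on_speaker(encoder_states, emb):
    """Append the same speaker vector to every time step (assumed conditioning scheme)."""
    tiled = np.tile(emb, (encoder_states.shape[0], 1))
    return np.concatenate([encoder_states, tiled], axis=1)

# Hypothetical sizes; a real system would use trained networks, not random weights.
n_bins, emb_dim, hidden_dim = 129, 16, 32
proj = rng.normal(size=(emb_dim, n_bins))             # untrained projection
reference_spec = rng.random(size=(50, n_bins))        # stand-in for a short example utterance
encoder_states = rng.normal(size=(124, hidden_dim))   # stand-in for encoded source speech

emb = speaker_embedding(reference_spec, proj)
decoder_input = condition_on_speaker(encoder_states, emb)
print(decoder_input.shape)  # (124, 48)
```

Because the speaker vector is computed separately from the content being translated, the same decoder could in principle be steered toward different voices by swapping in a different reference utterance.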
What does this mean in practice? Let’s listen to find out.
The audio clips below, taken from the Google AI announcement, show the Google AI Translatotron transferring the original Spanish speaker’s voice into a translation in English.
Reference Translation in English:
Google AI Translatotron Translation in Original Speaker’s Voice:
What this Means for Collaboration
Google AI says Translatotron may be the first end-to-end model that can translate speech in one language directly into similar-sounding speech in a different language.
If the technology is developed further, it could break down language barriers more quickly and seamlessly for teams working across cultural or international borders. It could also speed up communication with clients and reduce the cost of translation services.