At Adobe’s annual MAX event in San Diego, the company revealed a couple of projects it’s working on. One is called Project VoCo, and it’s probably the most awesome and, at the same time, potentially disastrous thing ever conceived.
It’s a simple enough concept that has the potential to save studios and engineers a huge amount of time and money. Project VoCo has the ability to edit and replace the spoken word. Just as Photoshop lets you take some raw material and transform it into something else, VoCo can rearrange words, replace them or add new ones, completely realistically. This gives broadcasters the chance to edit out or correct recorded dialogue.
Developer Zeyu Jin demonstrated (see video below) taking a clip of speech and adding new words simply by typing them in a text box. The words then appeared in the audio file in exactly the same voice. Apparently you need about 20 minutes of recorded speech for the engine to accurately add new words.
So, to extend the Photoshop analogy: just as you can take an image and place it in a new location without having to reshoot or pay the model/photographer, with VoCo you can take a speech and change it without having to re-record the audio and pay the actor/director again. It’s an awesome money- and time-saving technology for broadcast, podcasting, audio books and voice overs.
Trusting your ears
But wait a minute. With 20 minutes of recorded speech, Project VoCo can make that person say… anything. So you could take, say, Donald Trump and make him say how he eats children, or make Hillary Clinton say that she’s actually a robot from Uranus. And the voice would be believably accurate? This raises all sorts of ethical dilemmas. We already can’t really believe what we see because of Photoshop, and now we won’t be able to believe anything anyone says (although is that actually new?).
One interesting point that Zeyu makes towards the end of the presentation is that while they are working on Project VoCo they are also working on how to watermark audio – to make the edits detectable in some form so that audio can be trusted. It will of course be unbreakable!
Ethical issues aside, what interests me is the whole new market for dead celebrity sound packs. Do you want audio books read by Vincent Price? No problem. Bedtime stories read by Richard Burton? Just 99 dollars for the sound pack. You could plunder all recorded speech everywhere to generate SatNav voices and computer assistants with the tones of your favourite celebrity. It could also be useful for your own voice. You could potentially carry out phone conversations with someone by selecting from a library of phrases. The ultimate in lazy social interaction.
Where will it all end? Check out the video below to witness the potential of Project VoCo. Information may become available on Adobe’s website.