Among the many innovative audio editing techniques to emerge in recent years is, that’s right, text-based audio editing. A start-up called Descript has the technology to generate a text transcription from an audio file and let you change the audio by editing that text, just as you would edit a document in a word processor.
The app uses machine learning to match audio samples with their text versions. A time code is assigned to each word, so changes to the text stay synchronized with the audio file. At the moment, the app is suitable for podcasters, journalists, and musicians. The idea is to make editing audio as easy for them as editing text.
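To make the time code idea concrete, here is a minimal Python sketch of how deleting words from a transcript could translate into audio cuts. It is not Descript's actual implementation: the `Word` class, the `segments_to_keep` function, and the sample timings are all hypothetical, and the sketch assumes a transcription step has already supplied a start and end time for every word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word with its (hypothetical) time codes in seconds."""
    text: str
    start: float
    end: float


def segments_to_keep(words, deleted_indices):
    """Return merged (start, end) spans of audio that survive a text edit.

    `deleted_indices` holds the positions of the words removed from the text.
    Runs of consecutive kept words are merged into a single span.
    """
    deleted = set(deleted_indices)
    spans = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if i > 0 and (i - 1) not in deleted:
            # The previous word was kept too, so extend the current span.
            spans[-1] = (spans[-1][0], word.end)
        else:
            # First word, or the first kept word after a cut: open a new span.
            spans.append((word.start, word.end))
    return spans


# Made-up transcript: the speaker says "So um welcome back".
transcript = [
    Word("So", 0.0, 0.3),
    Word("um", 0.3, 0.7),
    Word("welcome", 0.7, 1.2),
    Word("back", 1.2, 1.6),
]

# Deleting "um" in the text leaves two audio spans to splice together.
kept = segments_to_keep(transcript, deleted_indices=[1])
print(kept)
```

An audio engine would then splice the returned spans back to back. A real tool would presumably also smooth each cut, for example with short crossfades, which this sketch leaves out.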
Descript founder Andrew Mason acknowledges the product is currently meant for simple tasks like transcribing and editing speech. But now that technology like this exists, and as it gets refined further, we imagine audio software companies will be very interested in unveiling their own products based on it. Audio developers like iZotope and Apple’s Logic team are already using artificial intelligence and machine learning in audio recognition algorithms that analyze sound and generate relevant content such as presets or drum tracks. It’s one of the current trends and a way forward for studio technology.
Given its unusual and experimental nature, Descript is a somewhat expensive service. The standard version will cost $20 a month, while the free version doesn’t include the text-to-audio tool and is meant purely as a transcription service. That’s the price paid for ongoing research and development. The app is still of limited use to musicians, but what really matters is not the implementation. It’s the technology that drives it and the possibilities it opens. And text-to-audio editing is a genuinely interesting proposition…