Learning AI for Audio

There are plenty of resources for machine learning in the image space. The recently launched Dall-E takes text prompts and turns them into images. A lot of AI is about analyzing images: think of Tesla and its self-driving cars, or security cameras and their identification algorithms. Just look at this:

But you hear less about AI in audio tech. I’m not sure why yet, but as we build Boomering from a consumer messaging platform into something bigger, we will have to layer in artificial intelligence to improve the product. Audio is a murky area for AI right now. From what I can tell, the main applications in audio are:

  1. Classification
  2. Recognition
  3. Verification
  4. Denoising
  5. Audio upsampling

I can see right away that the first four would have tangible benefits for a social audio platform. I am not so sure about audio upsampling, but I don’t know much about it, so I’ll withhold judgement.

How could a social audio platform benefit from classification? Well, it would make messages searchable, which addresses a limitation of audio messages. If my wife sends me a message to pick up 5 different things at the grocery store, it’s much easier for me to read it than to listen to it. Humans can consume text much faster than audio. But if an audio message can be transcribed into text and its contents classified, then perhaps an intelligent program could pull out the 5 different things and list them for me in text, or even create a shorter voice note that lists only those 5 things. A message that might contain a greeting and a few “ums” and “ahhs” would be shortened to just the 5 items.

Well, isn’t that just a text? Why wouldn’t she send a text? There are situational reasons: maybe she was holding the baby or driving. But yes, I understand audio messages are not a panacea for communication. Communication is an astoundingly wide world, and audio is one important part of it. Part of what we are building will need to bridge, or build solutions for, the parts of communication that audio isn’t well suited to.
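To make the grocery example a little more concrete, here is a rough sketch in Python of what that "intelligent program" might do once a message has already been turned into text. Everything in it is my own assumption for illustration: the function names, the filler-word list, and the idea of keying on the phrase "pick up" are all hypothetical, and the speech-to-text step itself is not shown. A real product would need something far more robust than a regular expression.

```python
import re

# Hypothetical filler words to drop; a real system would need a richer list.
FILLERS = {"um", "umm", "uh", "ah", "ahh", "er"}


def strip_fillers(transcript: str) -> str:
    """Remove filler words from an already-transcribed message."""
    words = transcript.split()
    # Compare each word without trailing punctuation, case-insensitively.
    kept = [w for w in words if w.strip(",.!?").lower() not in FILLERS]
    return " ".join(kept)


def extract_items(transcript: str) -> list[str]:
    """Naively pull a shopping list out of a sentence such as
    'pick up milk, eggs, bread and butter'."""
    cleaned = strip_fillers(transcript)
    # Assumption: the list follows the words "pick up" and ends at the
    # next sentence-ending punctuation mark (or the end of the message).
    match = re.search(r"pick up (.+?)(?:[.!?]|$)", cleaned, flags=re.IGNORECASE)
    if not match:
        return []
    # Split on commas (optionally followed by "and") or on a bare "and".
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group(1))
    return [p.strip() for p in parts if p.strip()]


message = (
    "Hi honey, um, can you pick up milk, eggs, ahh, bread, "
    "diapers and coffee. Thanks!"
)
for item in extract_items(message):
    print("-", item)
```

The greeting, the fillers, and the sign-off all fall away, and what is left is the list. The hard parts (accurate transcription, and understanding requests that are not phrased so neatly) are exactly where the AI would have to do the real work.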

This has been a bit of a rambling post, but what I’m writing about is what I am learning about. That process will be meandering, and then direct as we apply it to our product. I’ll write more about artificial intelligence another time. Now I must publish and get back to building.
