as a linguist™ i can confirm Amazon Halo's Voice Tone analysis is nonsense and awful. i can't believe it's still a part of the product, let alone *advertised in its own commercial*
"Well, Does It Work?" no, of course not. it literally can't.

we don't have well-developed theories of how acoustic details map onto emotions. so a system like this will try to abstract over a lot of training data (speakers, contexts) and find "patterns" in it
this means that the performance of the system will depend on the quality of the data... which, in this case, is almost certainly bad.
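to make that concrete, here's a minimal sketch (not Halo's actual pipeline — the features, labels, and noise model are all invented for illustration): an "emotion" classifier is just a pattern-finder over acoustic features, so its test accuracy is bounded by the quality of its training labels. corrupting a biased slice of the labels (like systematically biased annotators would) reliably degrades the model:

```python
# Toy illustration (NOT Amazon's system): train the same classifier on
# clean vs. biased labels and compare held-out accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n):
    X = rng.normal(size=(n, 8))               # stand-ins for pitch, energy, etc.
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # hypothetical "true" emotion label
    return X, y

X_train, y_train = make_data(2000)
X_test, y_test = make_data(1000)

def accuracy_with_biased_labels(flip_rate):
    """Mislabel a fraction of one class, mimicking biased annotators."""
    y_noisy = y_train.copy()
    flip = (y_noisy == 1) & (rng.random(len(y_noisy)) < flip_rate)
    y_noisy[flip] = 0
    model = LogisticRegression().fit(X_train, y_noisy)
    return model.score(X_test, y_test)

clean = accuracy_with_biased_labels(0.0)
noisy = accuracy_with_biased_labels(0.5)
print(clean, noisy)  # biased labels degrade held-out accuracy
```

the point isn't the numbers — it's that nothing in the learning machinery can recover emotion information that the labels never contained correctly in the first place.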

there are two ways that the labelled training data here was probably generated, and both are terrible.
1. recordings labelled with emotions by human annotators. this would mean the training data contains all the regular sexist/racist biases that humans have when judging each other's voices. also... think of all the times we get this wrong, especially for people we don't know!
2. the training labels are generated by running sentiment analysis on transcripts of the utterances, which means that you've just built a wonky speech recognition system with a small number of "emotion" outputs
(leaving aside the issue of whether the training set is representative of the users, which would be even worse if they did something weirder like "human annotators annotating their own speech" or "actors asked to speak the same text with different emotions each time")
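here's what approach #2 amounts to, as a sketch (hypothetical code, not Amazon's; the lexicon and function names are made up): the acoustic signal only enters through the transcript, so any "tone" not preserved in the words themselves is thrown away.

```python
# Sketch of pipeline #2: "tone" = text sentiment run on an ASR transcript.
# Invented toy lexicon; a real sentiment model would be fancier but the
# structural point is the same.
POSITIVE = {"great", "happy", "love"}
NEGATIVE = {"awful", "angry", "hate"}

def transcribe(audio):
    # placeholder for a real ASR system; here "audio" is already text
    return audio

def text_sentiment(transcript):
    words = transcript.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def tone_label(audio):
    return text_sentiment(transcribe(audio))

# the same words said angrily or cheerfully get the same label:
print(tone_label("i just love this"))  # -> positive, regardless of prosody
```

notice that prosody, pitch, and voice quality never appear anywhere in the pipeline — which is exactly why calling the output "voice tone" is misleading.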
anyway, the takeaway for me is: "we need to develop better linguistic theories, or AI will get ahead of us... badly." as this example shows, we're now in a world where market pressures can make that happen.
You can follow @nimirea_.