as a linguist™ i can confirm Amazon Halo's Voice Tone analysis is nonsense and awful. i can't believe it's still a part of the product, let alone *advertised in its own commercial*
"Well, Does It Work?" no, of course not. it literally can't.

we don't have well-developed theories of how acoustic details map onto emotions. so a system like this will try to abstract over a lot of training data (speakers, contexts) and find "patterns" in it
this means that the performance of the system will depend on the quality of the data... which, in this case, is almost certainly bad.
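to make that concrete, here's a minimal sketch (not Halo's actual pipeline — the features, labels, and noise model are all invented for illustration): an "emotion" classifier is just a pattern-finder over acoustic features, so its test accuracy is bounded by the quality of its training labels. corrupting a biased slice of the labels (like systematically biased annotators would) reliably degrades the model:

```python
# Toy illustration (NOT Amazon's system): train the same classifier on
# clean vs. biased labels and compare held-out accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n):
    X = rng.normal(size=(n, 8))               # stand-ins for pitch, energy, etc.
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # hypothetical "true" emotion label
    return X, y

X_train, y_train = make_data(2000)
X_test, y_test = make_data(1000)

def accuracy_with_biased_labels(flip_rate):
    """Mislabel a fraction of one class, mimicking biased annotators."""
    y_noisy = y_train.copy()
    flip = (y_noisy == 1) & (rng.random(len(y_noisy)) < flip_rate)
    y_noisy[flip] = 0
    model = LogisticRegression().fit(X_train, y_noisy)
    return model.score(X_test, y_test)

clean = accuracy_with_biased_labels(0.0)
noisy = accuracy_with_biased_labels(0.5)
print(clean, noisy)  # biased labels degrade held-out accuracy
```

the point isn't the numbers — it's that nothing in the learning machinery can recover emotion information that the labels never contained correctly in the first place.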

there are two ways that the labelled training data here was probably generated, and both are terrible.
1. recordings labelled with emotions by human annotators. this would mean the training data contains all the regular sexist/racist biases that humans have when judging each other's voices. also... think of all the times we get this wrong, especially for people we don't know!
2. the training labels are generated by running sentiment analysis on transcripts of the utterances, which means that you've just built a wonky speech recognition system with a small number of "emotion" outputs
(leaving aside the issue of whether the training set is representative of the users, which would be even worse if they did something weirder like "human annotators annotating their own speech" or "actors asked to speak the same text with different emotions each time")
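here's what approach #2 amounts to, as a sketch (hypothetical code, not Amazon's; the lexicon and function names are made up): the acoustic signal only enters through the transcript, so any "tone" not preserved in the words themselves is thrown away.

```python
# Sketch of pipeline #2: "tone" = text sentiment run on an ASR transcript.
# Invented toy lexicon; a real sentiment model would be fancier but the
# structural point is the same.
POSITIVE = {"great", "happy", "love"}
NEGATIVE = {"awful", "angry", "hate"}

def transcribe(audio):
    # placeholder for a real ASR system; here "audio" is already text
    return audio

def text_sentiment(transcript):
    words = transcript.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def tone_label(audio):
    return text_sentiment(transcribe(audio))

# the same words said angrily or cheerfully get the same label:
print(tone_label("i just love this"))  # -> positive, regardless of prosody
```

notice that prosody, pitch, and voice quality never appear anywhere in the pipeline — which is exactly why calling the output "voice tone" is misleading.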
anyway, the takeaway for me is: "we need to develop better linguistic theories, or AI will get ahead of us... badly." as this example shows, we're now in a world where market pressures can make that happen.
You can follow @nimirea_.