as a linguist™ i can confirm Amazon Halo's Voice Tone analysis is nonsense and awful. i can't believe it's still a part of the product, let alone *advertised in its own commercial*
this "feature" records what you say and labels your utterances with an emotion using
AI
... a.k.a., sentiment analysis with a neural network that uses acoustic slices as input, and maybe also some pace info through ASR, according to a skim of this blog: https://www.myhealthyapple.com/amazon-halo-voice-tone-sentiment-analysis/
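(to make that concrete, here's a minimal sketch of the *kind* of pipeline that description implies. the label set, feature size, and classifier below are my guesses for illustration, not anything Amazon has published.)

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# hypothetical label set and feature size -- the real product's are not public
EMOTIONS = ["calm", "happy", "irritated", "sad"]
N_FEATURES = 40                                   # e.g. summary stats over one acoustic "slice"

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, N_FEATURES))      # stand-in for acoustic feature vectors
y_train = rng.integers(len(EMOTIONS), size=500)   # stand-in for (dubious) emotion labels

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)

new_slice = rng.normal(size=(1, N_FEATURES))      # a few seconds of someone's voice, featurized
print(EMOTIONS[clf.predict(new_slice)[0]])        # -> "your tone sounded irritated", or whatever
```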


"Well, Does It Work?" no, of course not. it literally can't.
we don't have well-developed theories of how acoustic details map onto emotions. so a system like this will try to abstract over a lot of data (speakers, contexts) that it's been trained on and find "patterns" in it
this means that the performance of the system will depend on the quality of the data... which, in this case, is almost certainly bad.
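(a toy demonstration of what a "pattern" can end up being: everything below is synthetic data of my own construction, but it shows how a classifier happily latches onto a speaker-level confound like average pitch whenever the labels happen to co-vary with it.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 4000

# two speaker groups with different typical pitch (rough Hz values)
group = rng.integers(2, size=n)
pitch = np.where(group == 0, rng.normal(120, 15, n), rng.normal(210, 20, n))
other = rng.normal(size=(n, 10))                  # acoustic features unrelated to the labels

# the labels happen to co-vary with pitch (biased annotation, skewed sampling, whatever)
p_label = 1 / (1 + np.exp(-(pitch - 165) / 15))
y = (rng.random(n) < p_label).astype(int)         # 1 = "tense", 0 = "calm", say

X = np.column_stack([pitch, other])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def accuracy(cols):
    model = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
    return model.score(X_te[:, cols], y_te)

print("all features:   ", accuracy(slice(None)))        # ~0.95
print("pitch only:     ", accuracy([0]))                # ~0.95 -- that's the whole "pattern"
print("everything else:", accuracy(list(range(1, 11)))) # ~0.5, i.e. chance
```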
there are two ways that the labelled training data here was probably generated, and both are terrible.
1. recordings labelled with emotions by human annotators. this would mean the training data contains all the regular sexist/racist biases that humans have when judging each other's voices. also... think of all the times we get this wrong, especially for people we don't know!
2. the training set is generated from sentiment-analyzed text of the utterance somehow, which means you've just built a wonky speech recognition system with a small number of "emotion" outputs (a rough sketch of this setup follows below)
(leaving aside the issue of whether the training set is representative of the users, which would be even worse if they did something weirder like "human annotators annotating their own speech" or "actors asked to speak the same text with different emotions each time")
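(to make option 2 concrete, here's a rough sketch; the stub functions below stand in for whatever off-the-shelf ASR and text-sentiment components such a pipeline would actually use, so treat it as an illustration of the setup, not an implementation of anyone's system.)

```python
# everything here is a stand-in: run_asr and text_sentiment are stubs for whatever
# off-the-shelf components a pipeline like this might actually use
def run_asr(audio_clip: str) -> str:
    # a real system would transcribe the audio; we pretend it already did
    transcripts = {"clip_001.wav": "I told you this would never work"}
    return transcripts.get(audio_clip, "")

def text_sentiment(transcript: str) -> str:
    # a real system would use a trained text classifier; a keyword list makes the point
    negative_cues = {"never", "can't", "awful"}
    return "negative" if negative_cues & set(transcript.lower().split()) else "neutral"

def make_training_pair(audio_clip: str) -> tuple[str, str]:
    # note what's happening: the acoustic model's label comes from the *words*,
    # not from anything about the speaker's tone at all
    return audio_clip, text_sentiment(run_asr(audio_clip))

print(make_training_pair("clip_001.wav"))   # ('clip_001.wav', 'negative')
```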
but EVEN IF it did "work", it wouldn't be a good idea for a product—who wants to be nagged to "smile more" in their everyday life?
as usual, the badness is gendered: https://www.washingtonpost.com/technology/2020/12/10/amazon-halo-band-review/ (if the news outlet THAT YOU OWN pans the product... maybe rethink??)
anyway, the takeaway for me is: "we need to develop better linguistic theories, or AI will get ahead of us... badly." as this example shows, we're now in a world where market pressures can push these systems into products long before the theory is there.