Alexa, How do I make you sound less robotic?
Drawn To Digital is brainstorming ways to improve Alexa's voice abilities. Speech Synthesis Markup Language (SSML) to the rescue!
A common complaint Alexa developers hear is about Alexa's voice.

But is there really much we can do without replacing everything she says with voiceovers?
Don't despair, fellow developers: help is on the way!
Despite her best efforts, Alexa doesn't sound human. The process of making a machine sound more like a human is a fascinating bit of machine learning, but the technology still has a long way to go.
So what is a low-budget developer of Alexa apps to do if we're not gifted with a "voice for radio"?
The Alexa Skills Kit offers but one option for customizing Alexa's voice: Speech Synthesis Markup Language (SSML). See the Alexa Blog's post about it.
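A skill hands SSML to Alexa inside its JSON response. The sketch below is written in Python, although skills can be built in other languages. It wraps a <speak> string in a response body. The outputSpeech type "SSML" and the "ssml" field follow the Alexa Skills Kit response format. The name build_ssml_response is only a hypothetical helper for illustration.

```python
import json


def build_ssml_response(ssml: str, end_session: bool = True) -> dict:
    """Wrap an SSML string in a custom-skill response body.

    outputSpeech.type = "SSML" and the "ssml" field follow the Alexa Skills Kit
    response format; this helper's name is hypothetical.
    """
    # Alexa expects SSML output to have a single <speak> root element.
    if not (ssml.startswith("<speak>") and ssml.endswith("</speak>")):
        raise ValueError("SSML output must be wrapped in <speak>...</speak>")
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": end_session,
        },
    }


if __name__ == "__main__":
    body = build_ssml_response("<speak>Take a deep breath.</speak>")
    print(json.dumps(body, indent=2))
```

The check on the <speak> wrapper is there because forgetting it is an easy mistake when SSML strings are assembled by hand.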
For the Deep Breath Alexa skill, I wanted to see what could be done to make Alexa's voice more "relaxing". The skill relies on Alexa's voice to guide the user through a deep breathing and grounding exercise, and her default voice is just too fast and authoritative for that.
Can we make Alexa's voice more "relaxing"?
Let's explore some of the options provided in SSML:
<amazon:effect name="whispered">
Amazon is kind enough to provide us with one alternate voice for Alexa.
<speak>
I want to tell you a secret.
<amazon:effect name="whispered">I am not a real human.
</amazon:effect>
Can you believe it?
</speak>
This example comes straight out of the SSML documentation. Unfortunately, yes, Alexa, we do believe that you're not a real human. I would have loved to discover an <amazon:effect name="relaxing">, but no such luck.
Having Alexa whisper a secret to you is clever, but it isn't much help for our purposes. A whispered voice certainly isn't relaxing: it makes her sound like the creepiest therapist you've ever had. I would love to see Amazon release more <amazon:effect> options in the future! Let's try something else...
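If your skill builds its responses in code, a small helper can keep the effect tags balanced and escape characters like "&" that would otherwise break the markup. The sketch below is in Python and rebuilds the documentation example. The names whisper and speak are hypothetical helpers and not part of any Amazon library.

```python
from xml.sax.saxutils import escape


def whisper(text: str) -> str:
    """Wrap plain text in Alexa's whispered voice effect, escaping &, < and >."""
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def speak(*parts: str) -> str:
    """Join already-safe SSML fragments with spaces inside one <speak> root."""
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = speak("I want to tell you a secret.", whisper("I am not a real human."), "Can you believe it?")
print(ssml)
```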
<break>
Take some breaks along the way.
<speak>
There is a three second pause here <break time="3s"/>
then the speech continues.
</speak>
Inserting pauses into Alexa's speech can slow down the pace or change the rhythm at which she delivers your content, and moments of silence can help bring peace to a relaxation session. A single break is limited to 10 seconds, <break time="10s"/> (or "10000ms"), although you might be able to chain several together for something longer.
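The chaining idea can be sketched in Python by splitting a longer pause into consecutive breaks that each stay at or under the 10-second limit. Whether Alexa actually plays back-to-back breaks as one continuous silence is an assumption you would need to test on a device. long_pause is a hypothetical helper name.

```python
MAX_BREAK_MS = 10_000  # the 10-second ceiling for a single <break>


def long_pause(total_ms: int) -> str:
    """Return consecutive <break/> tags adding up to total_ms.

    Each tag stays within MAX_BREAK_MS. Assumption: Alexa plays adjacent
    breaks back to back; verify this on a real device.
    """
    if total_ms < 0:
        raise ValueError("pause length must be non-negative")
    full, remainder = divmod(total_ms, MAX_BREAK_MS)
    tags = [f'<break time="{MAX_BREAK_MS}ms"/>'] * full
    if remainder:
        tags.append(f'<break time="{remainder}ms"/>')
    return "".join(tags)


print(f"<speak>Breathe in. {long_pause(25_000)} Now breathe out.</speak>")
```

A 25-second pause becomes two 10-second breaks followed by one 5-second break.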
<prosody>
We can control the volume, pitch, and rate of speech. Perfect!
A "relaxing" voice is very subjective, but I think most people would agree that a soft, slow voice is more relaxing than Alexa's norm.
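The sketch below is in Python and generates <prosody> tags with only the named values for rate, pitch, and volume. This catches typos such as rate="glacial" before Alexa ever sees them. Alexa also accepts percentage and decibel forms, which are deliberately left out to keep the sketch short. prosody and ALLOWED are hypothetical names.

```python
from xml.sax.saxutils import escape, quoteattr

# Named <prosody> values only; percentage and decibel forms are omitted here.
ALLOWED = {
    "rate": {"x-slow", "slow", "medium", "fast", "x-fast"},
    "pitch": {"x-low", "low", "medium", "high", "x-high"},
    "volume": {"silent", "x-soft", "soft", "medium", "loud", "x-loud"},
}


def prosody(text: str, **attrs: str) -> str:
    """Wrap text in <prosody>, rejecting attribute names or values outside ALLOWED."""
    for name, value in attrs.items():
        if value not in ALLOWED.get(name, set()):
            raise ValueError(f"unsupported prosody {name}={value!r}")
    # quoteattr adds the surrounding double quotes and escapes the value.
    rendered = "".join(f" {name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<prosody{rendered}>{escape(text)}</prosody>"


print(prosody("Let your shoulders drop.", volume="soft", rate="slow"))
```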
Putting it all together
<speak>
Which sounds more relaxing?
This, <break time="1s"/>
<prosody volume="soft" rate="slow">
or this?
</prosody>
</speak>
You be the judge! Check out Alexa's relaxing voice in the Deep Breath Alexa skill, and please send me some feedback on how you think the effect turned out!
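When writing SSML by hand, it is easy to leave off the trailing slash on a <break>, which produces malformed markup. The sketch below is in Python and rebuilds the comparison above, then checks that it parses as XML using the standard library. Passing this check only means the markup is well-formed. It does not prove Alexa will accept every tag. comparison_ssml and is_well_formed are hypothetical names.

```python
import xml.etree.ElementTree as ET


def comparison_ssml(pause: str = "1s") -> str:
    """Rebuild the 'Which sounds more relaxing?' comparison as one string."""
    return (
        "<speak>"
        "Which sounds more relaxing? "
        f'This, <break time="{pause}"/> '
        '<prosody volume="soft" rate="slow">or this?</prosody>'
        "</speak>"
    )


def is_well_formed(ssml: str) -> bool:
    """Return True if the string parses as XML (well-formed, not necessarily valid SSML)."""
    try:
        ET.fromstring(ssml)
    except ET.ParseError:
        return False
    return True


good = comparison_ssml()
bad = good.replace("/>", ">")  # drop the self-closing slash on <break>
print(is_well_formed(good), is_well_formed(bad))
```

Running this prints True False. Once the slash is dropped, <break> is left open and the parser rejects the closing </speak>.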

Enough from me! What have you all done to customize Alexa's voice? Have you had any success in making her sound more like a human being and less like a robot, or are voiceovers the only way to go? Let me know on Twitter! @DrawnToDigital