The new Google Speech solution is the perfect example of why Google had to do its own silicon.
Doing speech at 16k samples a second through a neural network while keeping the cost reasonable is really, really difficult.
The old way was far more power efficient. If you are going to use this new technique, which gets you a far better result, and do it at a reasonable cost, you have to go all the way down into the silicon.
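To put a rough number on that claim, here is a back-of-envelope sketch assuming a WaveNet-style model that runs a full forward pass per audio sample. The layer count and channel width are made-up illustrative values, not Google's actual model sizes:

```python
# Back-of-envelope cost of sample-by-sample neural speech synthesis.
# All model sizes below are illustrative assumptions, not Google's numbers.

SAMPLE_RATE = 16_000  # audio samples per second

# Assume a WaveNet-style stack: 30 dilated conv layers, 256 channels,
# kernel size 2; each layer does kernel * channels * channels multiply-adds
# per sample, and each multiply-add counts as 2 FLOPs.
layers, channels, kernel = 30, 256, 2
flops_per_sample = layers * kernel * channels * channels * 2

flops_per_second_of_audio = flops_per_sample * SAMPLE_RATE
print(f"{flops_per_second_of_audio / 1e9:.1f} GFLOPs per second of audio")
```

Under these assumed sizes that is roughly 126 GFLOPs for every second of audio, before counting the memory traffic each per-sample pass incurs — which is why the old frame-based pipelines were so much cheaper to run.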
Can't wait to see the cost difference using Google TPUs and this technique versus traditional approaches.
Plus this approach inherently supports multiple cores. How would you ever do a tree search across multiple cores?
Ultimately to get the new applications we need Google and others doing the silicon. We are getting to extremes where the entire stack has to be tuned together.
I think Google's vision for Lens is going to be a similar situation.
This somewhat blows my mind. Yes, it is impressive. However, the work that Nuance and similar companies used to do is still competitive, just not getting nearly the money and exposure.
I remember that over a decade ago they even had mood analysis they could apply while listening to people. Far from new. Is it truly more effective or efficient nowadays? Or is it just getting marketed by companies you've heard of?
"Nuance and similar companies used to do are still competitive"
Surprised. Curious if you can compare the inferences per joule against Google's 1st-gen TPUs? Google shared a paper, the numbers are pretty impressive, and I was not aware of anyone else close to the gen-1 TPUs.
Here is the paper that you can use for the TPU side. I would love to see someone else in the ballpark; we really do not want just one company, but competition.
This seems to be comparing them on their own terms. I'm more curious about features. Dragon NaturallySpeaking and some other products have been really impressive for years now, far beyond what my phone is capable of.
Not to say that the likes of the Echo and others aren't impressive, just that the speech recognition is the least of those products. Fully transcribed voicemail was available for years with Google Voice (even before it was Google Voice), yet that seems to happen less now than it did when I first got the product.
Did the old methods use neural networks? I wouldn't be surprised if they did, but I would be surprised if they were networks as deep as what people use today.
That is, I am interested in comparing them on speed of transcription, speech synthesis, error rates, etc. Not on speed of network execution.
It is truly better. Objective metrics (such as word error rate) don't lie. You can argue whether it makes sense to use, say, 100x the compute to get half the error, but that's a different argument; I don't think anyone is really disputing the improved quality.
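For reference, word error rate is just word-level edit distance divided by the reference length, so it is easy to compute yourself. A minimal sketch:

```python
# Minimal word error rate (WER): Levenshtein distance over words,
# normalized by the number of words in the reference transcript.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion over 6 words -> ~0.167
```

Published WER numbers also depend on text normalization (casing, punctuation, number spelling), so cross-vendor comparisons need matched scoring rules, not just this formula.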
Do you have a good comparison point? And not, hopefully, comparing to what they could do a decade ago. I'm assuming they didn't sit still. Did they?
I question whether it is just 100x the compute. It feels like more, since NaturallySpeaking and friends didn't hog the machine. Again, that was over a full decade ago.
What's more, the resources that Google has to throw at training are ridiculous. Well over 100x what was used to build the old models.
None of this is to say we should pack up and go back to a decade ago. I just worry that we do the opposite, where we ignore progress that was made a decade ago in favor of the new tricks alone.
The thing is, it is not simply the training; the inference side would have required an incredible amount of compute compared to the old way of doing it.
Hope Google will do a paper like they did with the gen-1 TPUs. Would love to see the difference in terms of joules per word spoken.
Here, listen to the results.
https://cloudplatform.googleblog.com/2018/03/introducing-Clo...
Now I am curious about the cost difference Google was able to achieve. It is still going to be more than the old way, but how close did Google come?
But my favorite new thing with these chips is the Jeff Dean paper.
https://www.arxiv-vanity.com/papers/1712.01208v1/
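The core idea in that paper is that a model can predict a key's position in a sorted array, replacing the upper levels of a B-tree-style index, with a small local search correcting the prediction error. A toy sketch of the idea, using a linear fit in place of the paper's learned models (the key set and the error window below are illustrative assumptions):

```python
# Toy "learned index": predict a key's position in a sorted array with a
# linear model, then correct with a bounded local search. A linear fit
# stands in here for the paper's learned models; it is an illustration,
# not the paper's actual architecture.
import bisect

keys = sorted(range(0, 10_000, 7))  # sorted keys with a roughly linear CDF
n = len(keys)

# "Train": fit position ~= slope * key + intercept on (key, index) pairs.
slope = (n - 1) / (keys[-1] - keys[0])
intercept = -slope * keys[0]

def lookup(key):
    guess = int(slope * key + intercept)          # model prediction
    guess = max(0, min(n - 1, guess))
    # Bounded local search around the prediction (the paper tracks the
    # model's worst-case error to size this window; +/-2 is assumed here).
    lo, hi = max(0, guess - 2), min(n, guess + 3)
    i = bisect.bisect_left(keys, key, lo, hi)
    if i < hi and i < n and keys[i] == key:
        return i
    # Fall back to a full binary search if the window missed.
    i = bisect.bisect_left(keys, key)
    return i if i < n and keys[i] == key else None
```

Because these keys grow linearly, the fit is near-exact and the window almost never misses; the paper's contribution is handling real, less regular key distributions with staged models.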