[clug] Speech-to-text on Ubuntu

Kathy Reid kathy at kathyreid.id.au
Fri Jan 7 00:18:20 UTC 2022


OK, that's super useful.


Some guidance here:

- PyAudio is for recording and playing audio; it has no in-built STT 
capabilities

- CMU Sphinx is no longer supported and is difficult to get running

- Kaldi is better supported, but again difficult to get running, and 
requires a bunch of setup scripts - it's not "sudo apt install kaldi".

- If you are comfortable in Python, DeepSpeech has pre-trained models, 
but is no longer supported.

- The new kid on the block is Coqui - pre-trained models available

- None of these provide an easy to use interface - you will need to have 
some sort of pipeline for recording audio, segmenting it into 5-15 
second chunks, and then running these through an STT engine.
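As a rough sketch of the segmenting step only, assuming the recording is already saved as an uncompressed WAV file: the Python standard library's wave module can cut it into fixed-length pieces. The function name split_wav, the output file naming, and the 10-second default are illustrative choices, not part of any of the packages above.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, chunk_seconds=10):
    """Split a WAV file into fixed-length chunks and return the chunk paths.

    split_wav is a hypothetical helper for illustration; the last chunk
    may be shorter than chunk_seconds.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = reader.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # end of the recording
            path = out_dir / f"chunk_{index:04d}.wav"
            with wave.open(str(path), "wb") as writer:
                # Copy channels, sample width and sample rate from the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            paths.append(path)
            index += 1
    return paths
```

Each chunk can then be passed to whichever STT engine you settle on. Cutting at fixed intervals can split a word in half, so a real pipeline would more likely cut at pauses in the speech.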


More broadly, open source speech to text / automatic speech recognition 
is challenging at the moment - many projects are abandoned, and none 
provide a simple interface where you can just drop in a recording 
and get a transcript back. You can expect maybe 90-92% word accuracy from 
the above, i.e. a word error rate of around 8-10% (they don't handle 
Australian accents well), so expect to spend a fair bit of time 
correcting the transcripts that are generated. Setting this up will 
take some effort.
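For reference, word error rate is the number of word substitutions, deletions and insertions needed to turn the engine's output into the correct transcript, divided by the number of words in the correct transcript. A minimal sketch of that calculation, in case you want to measure an engine against your own voice (the function name word_error_rate is just an illustrative choice, and punctuation is not stripped):

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Two of five words are wrong, so the error rate is 0.4 (40%).
print(word_error_rate("use a forked straining yoke",
                      "use a fork straining yolk"))
```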


Best, Kathy


On 7/1/22 11:02 am, jhock at iinet.net.au wrote:
> I want to say many things and I want what I say to be converted into text sentences. I will then edit it in a text editor or LibreOffice. For example:
>
> "When fencing, it's best to use a forked straining yoke to prevent small indentations on the wire that would be caused by a mechanical, gripping fence strainer."
>
> I'm less likely to do the typing because it's more work for me. Hence the speech to text enquiry. :-)
>
> On 7 January 2022 8:41:43 am AEDT, Kathy Reid via linux <linux at lists.samba.org> wrote:
>> What's the use case - as different packages have different strengths and
>> weaknesses?
>>
>> Best, Kathy
>>
>> On 6/1/22 5:17 pm, jhock--- via linux wrote:
>>> Does anyone know any speech-to-text software that I can install onto Ubuntu 20.04 using 'sudo apt install' or similar commands?
>>>
>>> I've seen the Python package python3-pyaudio. Is that the best to try?
>>>


