Voice Recognition in Ubuntu

Someone asked me about voice recognition the other day, so I thought it sounded like a fun little project to master.  Here’s my go at it.

Unfortunately, I didn’t have much luck with it.  Please post a comment if you get more working than I did.

Which Package to Install

I did a little research, and found that Wikipedia has a nice list of open source speech recognition programs.  While it’s not huge, it was a good place to start.  I chose Julius because it looked the most promising.

Installing Julius

Since Julius is in the repositories, installing it was easy!  I just installed it from the Software Center.

(Screenshot: Installing Julius with the Software Center)

I went a step further and installed the VoxForge acoustic files from the “More Info” screen.

From what I can tell, there is no GUI for Julius (although the Simon project might be a front end for it).  You should find it installed on the command line, though:

$ which julius
/usr/bin/julius
$ julius -help

Running Julius

Looking at the options, my first attempt was:

$ julius -input mic
ERROR: m_chkparam: you should specify at least one LM to run Julius!

The next thing I found was the VoxForge quickstart.  I downloaded the tarball, and extracted it:

$ tar -xzvf julius-3.5.2-quickstart-linux.tgz
$ cd julius-3.5.2-quickstart-linux/
$ julius -input mic -C julian.jconf

That was closer, but it gave me this message at the end of all the output:

------
### read waveform input
Stat: adin_oss: device name = /dev/dsp (application default)
Error: adin_oss: failed to open /dev/dsp
failed to begin input stream

Adding padsp in front of the command fixed that problem:

$ padsp julius -input mic -C julian.jconf
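As far as I can tell, this works because this Julius build reads the microphone through the old OSS device /dev/dsp, which PulseAudio-based systems usually don't provide, and padsp emulates that device. Here is a minimal sketch of a wrapper that only adds padsp when the device is missing. The function names `oss_wrapper` and `julius_mic` are made up for illustration, and the default device path is simply the one from the error message above.

```shell
# Print "padsp" when the OSS device is missing, otherwise print nothing.
# The default path /dev/dsp comes from the Julius error message.
oss_wrapper() {
    dsp="${1:-/dev/dsp}"
    if [ -c "$dsp" ]; then
        printf ''
    else
        printf 'padsp\n'
    fi
}

# Hypothetical helper: run Julius on the microphone, wrapped only if needed.
julius_mic() {
    # $1 is the jconf file, e.g. julian.jconf from the quickstart.
    # The unquoted expansion is intentional: an empty wrapper simply disappears.
    $(oss_wrapper) julius -input mic -C "$1"
}
```

With this in place, `julius_mic julian.jconf` should behave like the padsp command above on a system without /dev/dsp.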

I still got warnings though…

### read waveform input
Stat: adin_oss: device name = /dev/dsp (application default)
Stat: adin_oss: sampling rate = 16000Hz
Stat: adin_oss: going to set latency to 50 msec
Stat: adin_oss: audio I/O Latency = 32 msec (fragment size = 512 samples)
STAT: AD-in thread created
<<< please speak >>>Warning: adin_oss: no data fragment after 300 msec?
Warning: adin_oss: no data fragment after 300 msec?
Warning: adin_oss: no data fragment after 300 msec?

If you open the Sound Settings, the warnings go away.  I thought that was kind of flaky, but it worked.  Unfortunately, the output was a little cryptic and didn’t give me the feedback that I needed.  This is what I got when I said, “Hello”:

pass1_best: <s> DIAL EIGHT
pass1_best_wordseq: 0 3 5
pass1_best_phonemeseq: sil | d ay ax l | ey t
pass1_best_score: -3177.784424
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 13 generated, 13 pushed, 5 nodes popped in 109
sentence1: <s> DIAL OH </s>
wseq1: 0 3 5 1
phseq1: sil | d ay ax l | ow | sil
cmscore1: 1.000 1.000 0.997 1.000
score1: -3393.694580
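The final result of each utterance is on the `sentence1:` line; the other lines are first-pass guesses, word IDs, phonemes and scores. Judging only from this output, the quickstart grammar seems to allow phrases like DIAL plus a digit, which would explain why "Hello" came back as the closest allowed phrase (that is a guess, not something confirmed from the grammar files). Here is a small sketch of a filter that keeps just the recognized sentences; the function name `julius_sentences` is made up for illustration.

```shell
# Read Julius output on stdin and print only the final recognized sentences,
# with the <s> and </s> silence markers removed.
julius_sentences() {
    sed -n 's/^sentence[0-9][0-9]*: *//p' |
        sed -e 's/<\/*s>//g' -e 's/^ *//' -e 's/ *$//'
}
```

It could be used on saved output, for example `julius_sentences < julius.log`, where `julius.log` is a hypothetical file holding output like the block above.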

Running from a Recording

I probably could have used Audacity much more easily, but since I was already on the command line, I decided to stay there and use the arecord program.  I used this line to record:

$ arecord -r 16000 > test.wav

I played it back with mplayer.  It sounded kind of rough, but I decided to try it anyway:

$ mplayer test.wav

Next, I ran it through julius:

$ ls test.wav > test.txt
$ julius -input rawfile -filelist test.txt -C julian.jconf

Unfortunately, mplayer could play the file, but julius could not open it for some reason.

### read waveform input
Error: adin_file: bytes per second != 32000 (16000)
Error: adin_file: error in parsing wav header at test.wav
Error: adin_file: failed to read speech data: "test.wav"
0 files processed
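The numbers in that error are consistent with a sample-format mismatch. The byte rate of uncompressed audio is sample rate × channels × bytes per sample, so Julius wanted 16000 Hz × 1 channel × 2 bytes = 32000, while the file reported 16000, which is what 8-bit samples would give. arecord records unsigned 8-bit (U8) audio when no `-f` option is given, so that is the likely cause. The sketch below shows the arithmetic and a way to read the byte rate from a file. The function names are made up, and the header reader assumes the plain 44-byte WAV header layout.

```shell
# Byte rate = sample rate * channels * (bits per sample / 8).
expected_byte_rate() {
    # $1 = sample rate in Hz, $2 = channels, $3 = bits per sample
    echo $(( $1 * $2 * $3 / 8 ))
}

# Read the byte-rate field of a WAV file.
# Assumption: canonical header, so the field sits at bytes 28-31, little-endian.
wav_byte_rate() {
    set -- $(od -An -t u1 -j 28 -N 4 "$1")
    echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}

# A possible re-recording command with an explicit 16-bit format (untested here):
#   arecord -f S16_LE -r 16000 -c 1 test.wav
```

Here `expected_byte_rate 16000 1 16` gives the 32000 that Julius asked for, and `expected_byte_rate 16000 1 8` gives the 16000 it found.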

So, I found an example that used sox to convert it.  I had to install sox with apt-get …

$ sudo apt-get install sox

Then, I converted the file and ran it like this:

$ sox test.wav -r 16000 -b 32 -c 1 test.s32
$ ls test.s32 > test.txt
$ julius -input rawfile -filelist test.txt -C julian.jconf

Still, this is the only output that I got:

### Recognition: 1st pass (LR beam)
...........................................................................................................................pass1_best: <s>
pass1_best_wordseq: 0
pass1_best_phonemeseq: sil
pass1_best_score: -2712.263916
### Recognition: 2nd pass (RL heuristic best-first)
WARNING: IW-triphone for word head "l-ow+t" not found, fallback to pseudo {ow+t}
WARNING: IW-triphone for word head "ow-ow+t" not found, fallback to pseudo {ow+t}
WARNING: IW-triphone for word head "t-ow+t" not found, fallback to pseudo {ow+t}
WARNING: IW-triphone for word head "uw-ow+t" not found, fallback to pseudo {ow+t}
WARNING: 00 _default: hypothesis stack exhausted, terminate search now
STAT: 00 _default: 0 sentences have been found
WARNING: 00 _default: got no candidates, search failed
STAT: 00 _default: 147 generated, 147 pushed, 147 nodes popped in 123
<search failed>
------
### read waveform input
1 files processed
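One thing that may have worked against this attempt is the `-b 32` option, which writes 32-bit samples. As far as I know, Julius's file input expects 16-bit mono audio at the acoustic model's sample rate, and the earlier "bytes per second != 32000" error points the same way. Here is a sketch of converting to a 16-bit WAV instead. The function name `to_julius_wav` and the `SOX` override variable are made up for illustration.

```shell
# Convert any sox-readable file to 16 kHz, 16-bit, mono WAV.
# SOX can be overridden (e.g. SOX=echo) to preview the command without running it.
to_julius_wav() {
    # $1 = input file, $2 = output file (should end in .wav)
    "${SOX:-sox}" "$1" -r 16000 -b 16 -c 1 "$2"
}
```

For example, `to_julius_wav test.wav test16.wav` followed by `ls test16.wav > test.txt` would feed the converted file to the same julius command (`test16.wav` is just an example name).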

I used Audacity to clean up the file.  The Noise Removal effect improved it somewhat, but it still wasn’t good quality.  Here’s the output after that:

### read waveform input
Stat: adin_file: input speechfile: test.wav
STAT: 30000 samples (1.88 sec.)
STAT: ### speech analysis (waveform -> MFCC)
### Recognition: 1st pass (LR beam)
..........................................................................................................................................................................................pass1_best: <s> DIAL OH </s>
pass1_best_wordseq: 0 3 5 1
pass1_best_phonemeseq: sil | d ay ax l | ow | sil
pass1_best_score: -5237.150391
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 27 generated, 27 pushed, 5 nodes popped in 186
sentence1: <s> DIAL OH </s>
wseq1: 0 3 5 1
phseq1: sil | d ay ax l | ow | sil
cmscore1: 1.000 0.978 0.987 1.000
score1: -5225.757324
------
### read waveform input
1 files processed

I also tried creating a file from scratch in Audacity, and Julius still didn’t recognize it correctly:

### read waveform input
Stat: adin_file: input speechfile: test.wav
STAT: 21176 samples (1.32 sec.)
STAT: ### speech analysis (waveform -> MFCC)
### Recognition: 1st pass (LR beam)
..................................................................................................................................pass1_best: <s> DIAL OH
pass1_best_wordseq: 0 3 5
pass1_best_phonemeseq: sil | d ay ax l | ow
pass1_best_score: -3417.226318
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 23 generated, 23 pushed, 5 nodes popped in 130
sentence1: <s> DIAL OH </s>
wseq1: 0 3 5 1
phseq1: sil | d ay ax l | ow | sil
cmscore1: 1.000 0.911 1.000 1.000
score1: -3453.692871
------
### read waveform input
1 files processed

Running on YouTube Videos

Next, I wanted to try recognizing speech from a good recording, so I looked for a YouTube video to run through Julius.

I tried clive, but it failed for some reason:

$ sudo apt-get install clive

$ clive -cnrf best http://www.youtube.com/watch?v=dePLd9HAYjQ
fetch http://www.youtube.com/watch?v=dePLd9HAYjQ ...done.
error: no match: `(?-xism:url_encoded_fmt_stream_map=(.*?)&)'

So, I went back to my tried and true Video Downloader Firefox extension.  Here is the first video that I tried:

For God So Loved The World (song and hymn history) 

I converted the flv file to a wav like this:

$ ffmpeg -i youtube.flv -vn -acodec pcm_s16le -ar 16000 -ac 1 -f wav test.wav

And, I ran it through Julius like this:

$ ls test.wav > test.txt
$ julius -input rawfile -filelist test.txt -C julian.jconf

The end result was a segmentation fault!

I tried another one: Psalm 119 King James Holy Bible 

This one also gave me a segmentation fault.

Another: Job 41 (King James Holy Bible) 

This one gave me this message:

....trace_backptr: sentence length exceeded ( > 150)
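The "sentence length exceeded" message suggests that Julius treated several minutes of audio as one long utterance. One thing to try, as an untested guess rather than a confirmed fix for the segmentation faults, would be to cut the recording into short chunks first. The sketch below uses ffmpeg's segment muxer, which needs a reasonably recent ffmpeg. The function name `split_wav`, the `FFMPEG` override variable, and the chunk length are all assumptions. Even with short chunks, the quickstart grammar would still only output phrases it knows.

```shell
# Split a WAV file into fixed-length chunks named PREFIX_000.wav, PREFIX_001.wav, ...
# FFMPEG can be overridden (e.g. FFMPEG=echo) to preview the command.
split_wav() {
    # $1 = input WAV, $2 = chunk length in seconds, $3 = output prefix
    "${FFMPEG:-ffmpeg}" -i "$1" -f segment -segment_time "$2" -c copy "${3}_%03d.wav"
}
```

For example, `split_wav test.wav 5 chunk` followed by `ls chunk_*.wav > test.txt` would build a file list of five-second pieces for the same julius command.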

VoxForge Example

If you want to play with the VoxForge addon package, you can look at the readme file that should be located here:

/usr/share/doc/julius-voxforge/examples/README

Here are all the files installed with it:

$ dpkg -L julius-voxforge
/.
/usr
/usr/share
/usr/share/doc
/usr/share/doc/julius-voxforge
/usr/share/doc/julius-voxforge/copyright
/usr/share/doc/julius-voxforge/examples
/usr/share/doc/julius-voxforge/examples/controlapp
/usr/share/doc/julius-voxforge/examples/controlapp/mediaplayer.grammar
/usr/share/doc/julius-voxforge/examples/controlapp/command.py
/usr/share/doc/julius-voxforge/examples/controlapp/mediaplayer.voca
/usr/share/doc/julius-voxforge/examples/controlapp/README.controlapp
/usr/share/doc/julius-voxforge/examples/README
/usr/share/doc/julius-voxforge/examples/sample.grammar
/usr/share/doc/julius-voxforge/examples/sample.voca
/usr/share/doc/julius-voxforge/examples/julian.jconf.gz
/usr/share/doc/julius-voxforge/dict.gz
/usr/share/doc/julius-voxforge/changelog.Debian.gz
/usr/share/julius-voxforge
/usr/share/julius-voxforge/acoustic
/usr/share/julius-voxforge/acoustic/hmmdefs
/usr/share/julius-voxforge/acoustic/macros
/usr/share/julius-voxforge/acoustic/tiedlist
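Several of those files are gzipped and live in a system doc directory, so one way to experiment is to copy the examples into a scratch directory and decompress them there. This is only a sketch. The function name `copy_voxforge_examples` and the default destination are made up, and the packaged README remains the real guide, since the jconf may point at files elsewhere (such as the dict or the acoustic model).

```shell
# Copy the packaged examples into a working directory and unpack any .gz files.
copy_voxforge_examples() {
    # $1 = source examples directory, $2 = destination directory
    src="${1:-/usr/share/doc/julius-voxforge/examples}"
    dest="${2:-$HOME/julius-voxforge-examples}"
    mkdir -p "$dest" &&
        cp -R "$src"/. "$dest"/ &&
        find "$dest" -name '*.gz' -exec gunzip -f {} +
}
```

Running `copy_voxforge_examples` with no arguments would leave an unpacked julian.jconf next to sample.grammar and sample.voca in the destination directory.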

Resources