Someone asked me about voice recognition the other day, and it sounded like a fun little project to try. Here’s my go at it.
Unfortunately, I didn’t have much luck with it. If you get further than I did, please post a comment.
Which Package to Install
I did a little research and found that Wikipedia has a nice list of open source speech recognition programs. It’s not huge, but it was a good place to start. I chose Julius because it looked the most promising.
Installing Julius
Since Julius is in the repositories, installing it was easy! I just installed it from the Software Center.
I went a step further and installed the VoxForge acoustic files from the “More Info” screen.
From what I can tell, there is no GUI for Julius (although the Simon project might be a frontend for it). It is available on the command line, though:
$ which julius
/usr/bin/julius
$ julius -help
Running Julius
Looking at the options, my first attempt was:
$ julius -input mic
ERROR: m_chkparam: you should specify at least one LM to run Julius!
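In other words, Julius has to be told which language model (LM), such as a grammar, to use. The next thing I found was the VoxForge quickstart, which comes with a ready-made configuration. I downloaded the tarball and extracted it:
$ tar -xzvf julius-3.5.2-quickstart-linux.tgz
$ cd julius-3.5.2-quickstart-linux/
$ julius -input mic -C julian.jconf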
That was closer, but it gave me this message at the end of all the output:
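------
### read waveform input
Stat: adin_oss: device name = /dev/dsp (application default)
Error: adin_oss: failed to open /dev/dsp
failed to begin input stream
Adding padsp in front of the command fixed that problem. padsp is PulseAudio’s OSS wrapper: it lets programs that expect the old /dev/dsp device talk to PulseAudio instead.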
$ padsp julius -input mic -C julian.jconf
I still got warnings though…
### read waveform input
Stat: adin_oss: device name = /dev/dsp (application default)
Stat: adin_oss: sampling rate = 16000Hz
Stat: adin_oss: going to set latency to 50 msec
Stat: adin_oss: audio I/O Latency = 32 msec (fragment size = 512 samples)
STAT: AD-in thread created
<<< please speak >>>
Warning: adin_oss: no data fragment after 300 msec?
Warning: adin_oss: no data fragment after 300 msec?
Warning: adin_oss: no data fragment after 300 msec?
If you open the Sound Settings, the warnings go away. I thought that was kind of flaky, but it worked. Unfortunately, the output was a little cryptic and didn’t give me the feedback I needed. This is what I got when I said “Hello”:
pass1_best: <s> DIAL EIGHT
pass1_best_wordseq: 0 3 5
pass1_best_phonemeseq: sil | d ay ax l | ey t
pass1_best_score: -3177.784424
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 13 generated, 13 pushed, 5 nodes popped in 109
sentence1: <s> DIAL OH </s>
wseq1: 0 3 5 1
phseq1: sil | d ay ax l | ow | sil
cmscore1: 1.000 1.000 0.997 1.000
score1: -3393.694580
In hindsight, the words DIAL, EIGHT and OH are a clue: the quickstart’s julian.jconf appears to load a small sample grammar built around dialing commands, and Julius can only answer with the closest phrase that grammar allows. Since “Hello” isn’t in it, a correct result wasn’t possible with this setup.
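Since the raw output is hard to read at a glance, here is a small Python sketch, not part of Julius, that pulls out the final hypothesis and its per-word confidence scores. It relies only on the sentence1:, cmscore1: and score1: lines visible in the output above and assumes Julius prints them one per line; other versions or settings may format them differently.

```python
# Sketch: summarize the recognition result from Julius's text output.
# Assumes the "sentence1:", "cmscore1:" and "score1:" lines shown above;
# other Julius versions or options may print different fields.
import sys

SAMPLE_OUTPUT = """\
sentence1: <s> DIAL OH </s>
wseq1: 0 3 5 1
phseq1: sil | d ay ax l | ow | sil
cmscore1: 1.000 1.000 0.997 1.000
score1: -3393.694580
"""


def parse_julius_result(output):
    """Return ([(word, confidence), ...], total_score) for the best hypothesis."""
    tokens, confidences, total = [], [], None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("sentence1:"):
            tokens = line.split(":", 1)[1].split()
        elif line.startswith("cmscore1:"):
            confidences = [float(x) for x in line.split(":", 1)[1].split()]
        elif line.startswith("score1:"):
            total = float(line.split(":", 1)[1])
    # cmscore1 has one value per token, including the <s> and </s> markers,
    # so pair them up first and then drop the markers.
    words = [(w, c) for w, c in zip(tokens, confidences) if w not in ("<s>", "</s>")]
    return words, total


if __name__ == "__main__":
    # Read a saved Julius log if a file name is given, else use the sample.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            text = f.read()
    else:
        text = SAMPLE_OUTPUT
    words, total = parse_julius_result(text)
    for word, confidence in words:
        print(f"{word}\t{confidence:.3f}")
    print("total score:", total)
```

To try it on real output, one option is to redirect Julius to a file (for example `julius -input rawfile -filelist test.txt -C julian.jconf > out.txt`) and pass out.txt as the first argument; the script file name is up to you.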
Running from a Recording
I probably could have done this more easily in Audacity, but since I was already on the command line, I decided to stay there and use the arecord program. I used this line to record:
$ arecord -r 16000 > test.wav
I played it back like this; it sounded kind of rough, but I decided to go ahead anyway:
$ mplayer test.wav
Next, I ran it through Julius:
$ ls test.wav > test.txt
$ julius -input rawfile -filelist test.txt -C julian.jconf
Unfortunately, although mplayer could play the file, Julius could not open it:
### read waveform input
Error: adin_file: bytes per second != 32000 (16000)
Error: adin_file: error in parsing wav header at test.wav
Error: adin_file: failed to read speech data: "test.wav"
0 files processed
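The first error line is the useful one: Julius expected 32000 bytes per second of audio but got 16000. 16-bit mono audio at 16 kHz works out to 16000 × 2 × 1 = 32000 bytes per second. arecord records 8-bit unsigned samples unless you pass a format with -f, so 16000 samples per second at 1 byte each gives exactly the 16000 in the message. Here is a small Python sketch, not part of Julius, that reads a WAV header with the standard wave module and reports what doesn’t match; the expected values are inferred from that error message.

```python
# Sketch: check whether a WAV file has the format Julius asked for above.
# The expected values are inferred from the error
# "bytes per second != 32000 (16000)": 16000 Hz x 2 bytes x 1 channel.
import sys
import wave

JULIUS_RATE = 16000
JULIUS_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit
JULIUS_CHANNELS = 1


def format_problems(rate, sample_width, channels):
    """Return the reasons a WAV with these parameters doesn't match Julius's format."""
    problems = []
    if rate != JULIUS_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {JULIUS_RATE} Hz")
    if sample_width != JULIUS_SAMPLE_WIDTH:
        problems.append(
            f"samples are {8 * sample_width}-bit, expected {8 * JULIUS_SAMPLE_WIDTH}-bit"
        )
    if channels != JULIUS_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    return problems


def check_wav(path):
    """Print the WAV header's format and bytes per second; return True if it matches."""
    with wave.open(path, "rb") as w:
        rate, width, channels = w.getframerate(), w.getsampwidth(), w.getnchannels()
    problems = format_problems(rate, width, channels)
    print(f"{path}: {rate} Hz, {8 * width}-bit, {channels} channel(s) "
          f"= {rate * width * channels} bytes/s")
    for problem in problems:
        print("  problem:", problem)
    return not problems


if __name__ == "__main__":
    for p in sys.argv[1:]:
        check_wav(p)
```

If that reading is right, recording with an explicit format, for example `arecord -f S16_LE -r 16000 -c 1 test.wav`, should produce what Julius wants; treat that command as an untested suggestion.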
So I found an example that used sox to convert the file. I had to install sox first:
$ sudo apt-get install sox
Then, I converted the file and ran it like this:
$ sox test.wav -r 16000 -b 32 -c 1 test.s32
$ ls test.s32 > test.txt
$ julius -input rawfile -filelist test.txt -C julian.jconf
Still, this is the only output that I got:
### Recognition: 1st pass (LR beam)
...........................................................................................................................
pass1_best: <s>
pass1_best_wordseq: 0
pass1_best_phonemeseq: sil
pass1_best_score: -2712.263916
### Recognition: 2nd pass (RL heuristic best-first)
WARNING: IW-triphone for word head "l-ow+t" not found, fallback to pseudo {ow+t}
WARNING: IW-triphone for word head "ow-ow+t" not found, fallback to pseudo {ow+t}
WARNING: IW-triphone for word head "t-ow+t" not found, fallback to pseudo {ow+t}
WARNING: IW-triphone for word head "uw-ow+t" not found, fallback to pseudo {ow+t}
WARNING: 00 _default: hypothesis stack exhausted, terminate search now
STAT: 00 _default: 0 sentences have been found
WARNING: 00 _default: got no candidates, search failed
STAT: 00 _default: 147 generated, 147 pushed, 147 nodes popped in 123
<search failed>
------
### read waveform input
1 files processed
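Looking back, the sox command above asked for -b 32, which means 32-bit samples, while Julius’s earlier “bytes per second != 32000” error points at 16-bit ones, so that conversion may not have produced what Julius wanted either. For the specific 8-bit-to-16-bit problem, the conversion is simple enough to do with Python’s standard wave module. This is a sketch that assumes the input is an 8-bit unsigned WAV already at 16 kHz mono (as arecord’s default output at -r 16000 would be); it does not resample or mix channels.

```python
# Sketch: widen an 8-bit unsigned WAV to 16-bit signed, keeping rate and channels.
# Assumes the input is already at the sample rate Julius wants; no resampling.
import sys
import wave


def u8_to_s16(frames):
    """Convert unsigned 8-bit PCM bytes to signed 16-bit little-endian PCM bytes."""
    out = bytearray()
    for b in frames:
        # 8-bit WAV samples are unsigned and centered on 128; re-center on 0
        # and scale up by 256 to fill the 16-bit range.
        out += ((b - 128) << 8).to_bytes(2, "little", signed=True)
    return bytes(out)


def widen_wav(src, dst):
    """Copy an 8-bit WAV to a 16-bit WAV with the same rate and channel count."""
    with wave.open(src, "rb") as r:
        if r.getsampwidth() != 1:
            raise ValueError(f"{src} is not an 8-bit WAV file")
        channels, rate = r.getnchannels(), r.getframerate()
        frames = r.readframes(r.getnframes())
    with wave.open(dst, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(u8_to_s16(frames))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 <this script> test.wav test16.wav
    widen_wav(sys.argv[1], sys.argv[2])
```

The pure-Python loop is slow for long recordings, but it is fine for a few seconds of speech and keeps the sample arithmetic visible.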
I used Audacity to clean up the file. Its Noise Removal effect improved it somewhat, but the quality still wasn’t good. Here’s the output after that:
### read waveform input
Stat: adin_file: input speechfile: test.wav
STAT: 30000 samples (1.88 sec.)
STAT: ### speech analysis (waveform -> MFCC)
### Recognition: 1st pass (LR beam)
..........................................................................................................................................................................................
pass1_best: <s> DIAL OH </s>
pass1_best_wordseq: 0 3 5 1
pass1_best_phonemeseq: sil | d ay ax l | ow | sil
pass1_best_score: -5237.150391
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 27 generated, 27 pushed, 5 nodes popped in 186
sentence1: <s> DIAL OH </s>
wseq1: 0 3 5 1
phseq1: sil | d ay ax l | ow | sil
cmscore1: 1.000 0.978 0.987 1.000
score1: -5225.757324
------
### read waveform input
1 files processed
I also tried creating a file from scratch in Audacity, and still couldn’t get a correct result:
### read waveform input
Stat: adin_file: input speechfile: test.wav
STAT: 21176 samples (1.32 sec.)
STAT: ### speech analysis (waveform -> MFCC)
### Recognition: 1st pass (LR beam)
..................................................................................................................................
pass1_best: <s> DIAL OH
pass1_best_wordseq: 0 3 5
pass1_best_phonemeseq: sil | d ay ax l | ow
pass1_best_score: -3417.226318
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 23 generated, 23 pushed, 5 nodes popped in 130
sentence1: <s> DIAL OH </s>
wseq1: 0 3 5 1
phseq1: sil | d ay ax l | ow | sil
cmscore1: 1.000 0.911 1.000 1.000
score1: -3453.692871
------
### read waveform input
1 files processed
Running on YouTube Videos
Next, I wanted to try Julius on a good-quality recording, so I went looking for a good YouTube video to run through it.
I tried clive, but it failed for some reason:
$ sudo apt-get install clive
$ clive -cnrf best http://www.youtube.com/watch?v=dePLd9HAYjQ
fetch http://www.youtube.com/watch?v=dePLd9HAYjQ ...done.
error: no match: `(?-xism:url_encoded_fmt_stream_map=(.*?)&)'
So I went back to my tried-and-true Video Downloader Firefox extension. Here is the first video I tried:
For God So Loved The World (song and hymn history)
I converted the FLV file to a WAV file like this:
$ ffmpeg -i youtube.flv -vn -acodec pcm_s16le -ar 16000 -ac 1 -f wav test.wav
And, I ran it through Julius like this:
$ ls test.wav > test.txt
$ julius -input rawfile -filelist test.txt -C julian.jconf
The end result was a segmentation fault!
I tried another one: Psalm 119 King James Holy Bible
This one also gave me a segmentation fault.
Another one, Job 41 (King James Holy Bible), ended with this error instead:
....trace_backptr: sentence length exceeded ( > 150)
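I didn’t get further than this. One plausible factor (an assumption on my part) is that the quickstart setup is meant for short spoken commands, not minutes of continuous audio; the “sentence length exceeded (> 150)” message at least suggests the input was far longer than Julius expected. A workaround worth trying is to split the audio into short pieces and hand them all to Julius through -filelist, the same way test.txt was used above. Here is a minimal sketch using Python’s wave module; the 10-second default and the chunk_NNN.wav names are arbitrary, hypothetical choices.

```python
# Sketch: split a long WAV into fixed-length pieces and write a Julius filelist.
# Chunk length and file names are arbitrary choices, not Julius requirements.
import wave


def split_wav(src, seconds=10, prefix="chunk"):
    """Split src into consecutive WAV files of at most `seconds` each.

    Writes prefix_000.wav, prefix_001.wav, ... and returns their names in order.
    """
    names = []
    with wave.open(src, "rb") as r:
        params = r.getparams()
        frames_per_chunk = int(params.framerate * seconds)
        index = 0
        while True:
            frames = r.readframes(frames_per_chunk)
            if not frames:
                break
            name = f"{prefix}_{index:03d}.wav"
            with wave.open(name, "wb") as w:
                # Copy the source format; the wave module fixes the frame
                # count in the header when the chunk is shorter.
                w.setparams(params)
                w.writeframes(frames)
            names.append(name)
            index += 1
    return names


def write_filelist(names, path="test.txt"):
    """Write one file name per line, the format used with julius -filelist."""
    with open(path, "w") as f:
        f.write("\n".join(names) + "\n")
```

After `write_filelist(split_wav("test.wav"))`, the same `julius -input rawfile -filelist test.txt -C julian.jconf` command would process each piece in turn. Cutting at fixed intervals will slice some words in half, so a better version would cut at pauses in the speech.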
VoxForge Example
If you want to play with the VoxForge add-on package, take a look at its README file, which should be located here:
/usr/share/doc/julius-voxforge/examples/README
$ dpkg -L julius-voxforge
/.
/usr
/usr/share
/usr/share/doc
/usr/share/doc/julius-voxforge
/usr/share/doc/julius-voxforge/copyright
/usr/share/doc/julius-voxforge/examples
/usr/share/doc/julius-voxforge/examples/controlapp
/usr/share/doc/julius-voxforge/examples/controlapp/mediaplayer.grammar
/usr/share/doc/julius-voxforge/examples/controlapp/command.py
/usr/share/doc/julius-voxforge/examples/controlapp/mediaplayer.voca
/usr/share/doc/julius-voxforge/examples/controlapp/README.controlapp
/usr/share/doc/julius-voxforge/examples/README
/usr/share/doc/julius-voxforge/examples/sample.grammar
/usr/share/doc/julius-voxforge/examples/sample.voca
/usr/share/doc/julius-voxforge/examples/julian.jconf.gz
/usr/share/doc/julius-voxforge/dict.gz
/usr/share/doc/julius-voxforge/changelog.Debian.gz
/usr/share/julius-voxforge
/usr/share/julius-voxforge/acoustic
/usr/share/julius-voxforge/acoustic/hmmdefs
/usr/share/julius-voxforge/acoustic/macros
/usr/share/julius-voxforge/acoustic/tiedlist
Resources
- VoxForge Quickstart
- Julius Forums: how do i run the julius executable file in the julius folder
- “The Julius Book” Version 4.1.5 PDF Format
- VoxForge: Running Julian Live
- LaunchPad Answers: How do I install Simon
- VoxForge: Running Julius on 64-bit Ubuntu 10.04
- StackOverflow: ffmpeg 0.5 flv to wav conversion creates wav files that other programs won’t open
- clive