February 2012 – Linux Sagas

Someone asked me about voice recognition the other day, so I thought it sounded like a fun little project to master. Here’s my go at it.

Unfortunately, I didn’t have much luck with it. Please post a comment if you get more working than I did.

Which Package to Install

I did a little research, and found that Wikipedia has a nice list of open source speech recognition programs. While it’s not huge, it was a good place to start. I chose Julius because it looked the most promising.

Installing Julius

Since Julius is in the repositories, installing it was easy! I just installed it from the Software Center.

I went a step further and installed the voxforge accoustic files from the “More Info” screen.

From what I can tell, there is no gui for julius (although, the project Simon might be a frontend for it). You should find it installed on the command-line though:

$ which julius
/usr/bin/julius
$ julius -help

Running Julius

Looking at the options, my first attempt was:

$ julius -input mic
ERROR: m_chkparam: you should specify at least one LM to run Julius!

The next thing I found was the VoxForge quickstart. I downloaded the tarball, and extracted it:

$tar -xzvf julius-3.5.2-quickstart-linux.tgz
$cd julius-3.5.2-quickstart-linux/
$ julius -input mic -C julian.jconf

That was closer, but it gave me this message at the end of all the output:


------
### read waveform input
Stat: adin_oss: device name = /dev/dsp (application default)
Error: adin_oss: failed to open /dev/dsp
failed to begin input stream

Adding padsp in front of the command fixed that problem:

$ padsp julius -input mic -C julian.jconf

I still got warnings though…


### read waveform input
Stat: adin_oss: device name = /dev/dsp (application default)
Stat: adin_oss: sampling rate = 16000Hz
Stat: adin_oss: going to set latency to 50 msec
Stat: adin_oss: audio I/O Latency = 32 msec (fragment size = 512 samples)
STAT: AD-in thread created
<<< please speak >>>Warning: adin_oss: no data fragment after 300 msec?
Warning: adin_oss: no data fragment after 300 msec?
Warning: adin_oss: no data fragment after 300 msec?

If you open the Sound Settings, the warnings go away. I thought was kind of flakey, but it worked. Unfortunately, the output was a little cryptic, and didn’t give me the feedback that I needed. This is what I get when I said, “Hello”:


pass1_best: <s> DIAL EIGHT
pass1_best_wordseq: 0 3 5
pass1_best_phonemeseq: sil | d ay ax l | ey t
pass1_best_score: -3177.784424
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 13 generated, 13 pushed, 5 nodes popped in 109
sentence1: <s> DIAL OH </s>
wseq1: 0 3 5 1
phseq1: sil | d ay ax l | ow | sil
cmscore1: 1.000 1.000 0.997 1.000
score1: -3393.694580

Running from a Recording

I probably could have used audacity much more easily, but since I was already on the command line, I decided to keep it there with the arecord program. I used this line to record:

$ arecord -r 16000 > test.wav

I played it back and it sounded kind of rough, but we’ll try it —

$ mplayer test.wav

Next, I ran it through julius:


$ ls test.wav > test.txt
$ julius -input rawfile -filelist test.txt -C julian.jconf

Unfortunately, mplayer could play the file, but julius could not open it for some reason.


### read waveform input
Error: adin_file: bytes per second != 32000 (16000)
Error: adin_file: error in parsing wav header at test.wav
Error: adin_file: failed to read speech data: "test.wav"
0 files processed

So, I found an example that used sox to convert it. I had to install sox with apt-get …

sudo apt-get install sox

Then, I converted the file and ran it like this:

$ sox test.wav -r 16000 -b 32 -c 1 test.s32
$ ls test.s32 > test.txt
$ julius -input rawfile -filelist test.txt -C julian.jconf

Still, this is the only output that I got:


### Recognition: 1st pass (LR beam)
...........................................................................................................................pass1_best: <s>
pass1_best_wordseq: 0
pass1_best_phonemeseq: sil
pass1_best_score: -2712.263916
### Recognition: 2nd pass (RL heuristic best-first)
WARNING: IW-triphone for word head "l-ow+t" not found, fallback to pseudo {ow+t}
WARNING: IW-triphone for word head "ow-ow+t" not found, fallback to pseudo {ow+t}
WARNING: IW-triphone for word head "t-ow+t" not found, fallback to pseudo {ow+t}
WARNING: IW-triphone for word head "uw-ow+t" not found, fallback to pseudo {ow+t}
WARNING: 00 _default: hypothesis stack exhausted, terminate search now
STAT: 00 _default: 0 sentences have been found
WARNING: 00 _default: got no candidates, search failed
STAT: 00 _default: 147 generated, 147 pushed, 147 nodes popped in 123
<search failed>
------
### read waveform input
1 files processed

I used audacity to cleanup the file. The Noise Removal improved it somewhat, but it still wasn’t good quality. Here’s the output after that:


### read waveform input
Stat: adin_file: input speechfile: test.wav
STAT: 30000 samples (1.88 sec.)
STAT: ### speech analysis (waveform -> MFCC)
### Recognition: 1st pass (LR beam)
..........................................................................................................................................................................................pass1_best: <s> DIAL OH </s>
pass1_best_wordseq: 0 3 5 1
pass1_best_phonemeseq: sil | d ay ax l | ow | sil
pass1_best_score: -5237.150391
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 27 generated, 27 pushed, 5 nodes popped in 186
sentence1: <s> DIAL OH </s>
wseq1: 0 3 5 1
phseq1: sil | d ay ax l | ow | sil
cmscore1: 1.000 0.978 0.987 1.000
score1: -5225.757324
------
### read waveform input
1 files processed

I also tried creating a file from scratch in audacity, and I still couldn’t get it:


### read waveform input
Stat: adin_file: input speechfile: test.wav
STAT: 21176 samples (1.32 sec.)
STAT: ### speech analysis (waveform -> MFCC)
### Recognition: 1st pass (LR beam)
..................................................................................................................................pass1_best: <s> DIAL OH
pass1_best_wordseq: 0 3 5
pass1_best_phonemeseq: sil | d ay ax l | ow
pass1_best_score: -3417.226318
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 23 generated, 23 pushed, 5 nodes popped in 130
sentence1: <s> DIAL OH </s>
wseq1: 0 3 5 1
phseq1: sil | d ay ax l | ow | sil
cmscore1: 1.000 0.911 1.000 1.000
score1: -3453.692871
------
### read waveform input
1 files processed

Running on YouTube Videos

My next task that I wanted to attempt was to try to capture something on a good recording. So, let’s find a good YouTube video to run through julius.

I tried clive, but it failed for some reason:

$sudo apt-get install clive

$ clive -cnrf best http://www.youtube.com/watch?v=dePLd9HAYjQ
fetch http://www.youtube.com/watch?v=dePLd9HAYjQ ...done.
error: no match: `(?-xism:url_encoded_fmt_stream_map=(.*?)&)'

So, I went back to my tried and true Video Downloader Firefox extension. Here is the first video that I tried:

For God So Loved The World (song and hymn history)

I converted the flv file to a wav like this:

ffmpeg -i youtube.flv -vn -acodec pcm_s16le -ar 16000 -ac 1 -f wav test.wav

And, I ran it through Julius like this:

$ ls test.wav > test.txt
$ julius -input rawfile -filelist test.txt -C julian.jconf

The end result was a segmentation fault!

I tried another one: Psalm 119 King James Holy Bible

This one also have me a segmentation fault.

Another: Job 41 (King James Holy Bible)

This one gave me this message:

....trace_backptr: sentence length exceeded ( > 150)

VoxForge Example

If you want to play with the VoxForge addon package, you can look at the readme file that should be located here:

/usr/share/doc/julius-voxforge/examples/README

Here are all the files installed with it:

$ dpkg -L julius-voxforge
/.
/usr
/usr/share
/usr/share/doc
/usr/share/doc/julius-voxforge
/usr/share/doc/julius-voxforge/copyright
/usr/share/doc/julius-voxforge/examples
/usr/share/doc/julius-voxforge/examples/controlapp
/usr/share/doc/julius-voxforge/examples/controlapp/mediaplayer.grammar
/usr/share/doc/julius-voxforge/examples/controlapp/command.py
/usr/share/doc/julius-voxforge/examples/controlapp/mediaplayer.voca
/usr/share/doc/julius-voxforge/examples/controlapp/README.controlapp
/usr/share/doc/julius-voxforge/examples/README
/usr/share/doc/julius-voxforge/examples/sample.grammar
/usr/share/doc/julius-voxforge/examples/sample.voca
/usr/share/doc/julius-voxforge/examples/julian.jconf.gz
/usr/share/doc/julius-voxforge/dict.gz
/usr/share/doc/julius-voxforge/changelog.Debian.gz
/usr/share/julius-voxforge
/usr/share/julius-voxforge/acoustic
/usr/share/julius-voxforge/acoustic/hmmdefs
/usr/share/julius-voxforge/acoustic/macros
/usr/share/julius-voxforge/acoustic/tiedlist

Resources

I decided it was time for a whole new post listing the latest Android 4.0/Ice Cream Sandwich ROMS for the HD2 phone. A few new ones have popped up since my last list.Currently, I have been following Tytung’s ROMs. I am on his NexusHD2-ICS-CM9 v1.2. So, here is a list of links relating to his ROMs:

Nexus HD2 Website: This has links to all of Tytung’s different ROMs. It is the best place to go to check for updates and find what you want.

XDA Thread for CM9 SD Version
XDA Thread for CM9 NAND Version
XDA Thread for AOSP Version

Here are a few more other ROMs out there:

IceCreamTosti: This ROM has some cool unlock screen tweaks from the screenshots. I haven’t tried it, but it looks cool.
Michie’s ROM: Has a few different cool features
Ankuch’s ROM: SD Version
Jama’s ROM

Note: SportsStar89’s ROM is discontinued. According to this post, he doesn’t have a working HD2 anymore. Apparently, he dropped it, cracked the digitizer, and broke it more trying to repair it.

I have picked up a number of source sites from these links that would be handy for creating a ROM.

Android Open Kang Project: Source for IceCreamTosti
MeDroid ICS Project Site: Source for Tytung’s ROMs

I can’t forget to mention the HD2 ROM site that lists all of the ROMs out there: HTC HD2 Android Roms/Builds.

Month: February 2012

VMWare Updated to 4.0.2 on Ubuntu 11.10

Voice Recognition in Ubuntu