Synthesizing an A Cappella Song with Festival Speech Synthesis

I have played with Festival before.  It easily generates speech from written text, and it seems pretty full-featured.  But I have always wanted to add pitch.  Could I make it sing the words?

I think I found my answer:

Festival Singing Synthesis

So, to try it out, I decided to try to generate the first phrase of this song:

When I Survey Sheet Music

Festival uses an XML format to describe how the notes match up to the words.  So, for the first part, the melody, I created survey1.xml to contain this:

<?xml version="1.0"?>
<!DOCTYPE SINGING PUBLIC "-//SINGING//DTD SINGING mark up//EN"
 "Singing.v0_1.dtd"
[]>
<SINGING BPM="30">
<PITCH NOTE="F3"><DURATION BEATS="0.6">When</DURATION></PITCH>
<PITCH NOTE="F3"><DURATION BEATS="0.3">I</DURATION></PITCH>
<PITCH NOTE="G3"><DURATION BEATS="0.3">Sur</DURATION></PITCH>
<PITCH NOTE="A3"><DURATION BEATS="0.6">vey</DURATION></PITCH>
<PITCH NOTE="G3"><DURATION BEATS="0.3">The</DURATION></PITCH>
<PITCH NOTE="A3"><DURATION BEATS="0.3">a</DURATION></PITCH>
<PITCH NOTE="B3"><DURATION BEATS="0.6">Won</DURATION></PITCH>
<PITCH NOTE="A3"><DURATION BEATS="0.3">dra</DURATION></PITCH>
<PITCH NOTE="G3"><DURATION BEATS="0.3">as</DURATION></PITCH>
<PITCH NOTE="A3"><DURATION BEATS="0.3">cross</DURATION></PITCH>
</SINGING>
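Typing one PITCH/DURATION line per syllable gets tedious, so here is a rough sketch of a shell helper that prints the same kind of SINGING document from a compact list of notes. The function name make_singing_xml and the NOTE:BEATS:SYLLABLE argument format are my own inventions for illustration; they are not part of Festival.

```shell
#!/bin/sh
# make_singing_xml BPM NOTE:BEATS:SYLLABLE...
# Hypothetical helper (not part of Festival): prints a SINGING document
# to stdout, one PITCH/DURATION element per NOTE:BEATS:SYLLABLE argument.
make_singing_xml() {
    bpm=$1
    shift
    printf '%s\n' '<?xml version="1.0"?>'
    printf '%s\n' '<!DOCTYPE SINGING PUBLIC "-//SINGING//DTD SINGING mark up//EN"'
    printf '%s\n' ' "Singing.v0_1.dtd"'
    printf '%s\n' '[]>'
    printf '<SINGING BPM="%s">\n' "$bpm"
    for triple in "$@"; do
        note=${triple%%:*}      # text before the first colon
        rest=${triple#*:}       # everything after the first colon
        beats=${rest%%:*}       # text before the second colon
        syllable=${rest#*:}     # everything after the second colon
        printf '<PITCH NOTE="%s"><DURATION BEATS="%s">%s</DURATION></PITCH>\n' \
            "$note" "$beats" "$syllable"
    done
    printf '%s\n' '</SINGING>'
}

# Example (writes the first four notes of the melody):
#   make_singing_xml 30 F3:0.6:When F3:0.3:I G3:0.3:Sur A3:0.6:vey > survey1.xml
```

The helper does no escaping, so it only suits plain syllables like the ones in this song.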

To play it, I could use the command:

(tts "survey1.xml" 'singing)

Here’s the full output:

skp@pecan:~/Downloads$ festival

Festival Speech Synthesis System 2.1:release November 2010
Copyright (C) University of Edinburgh, 1996-2010. All rights reserved.

clunits: Copyright (C) University of Edinburgh and CMU 1997-2010
clustergen_engine: Copyright (C) CMU 2005-2010
hts_engine:
The HMM-based speech synthesis system (HTS)
hts_engine API version 1.04 (http://hts-engine.sourceforge.net/)
Copyright (C) 2001-2010 Nagoya Institute of Technology
 2001-2008 Tokyo Institute of Technology
All rights reserved.
For details type `(festival_warranty)'
festival> (tts "survey1.xml" 'singing)
nil
festival>

Next, I wanted harmony.  So, I created a second file: survey2.xml.  It contained this:

<?xml version="1.0"?>
<!DOCTYPE SINGING PUBLIC "-//SINGING//DTD SINGING mark up//EN"
 "Singing.v0_1.dtd"
[]>
<SINGING BPM="30">
<PITCH NOTE="C3"><DURATION BEATS="0.6">When</DURATION></PITCH>
<PITCH NOTE="C3"><DURATION BEATS="0.3">I</DURATION></PITCH>
<PITCH NOTE="E3"><DURATION BEATS="0.3">Sur</DURATION></PITCH>
<PITCH NOTE="F3"><DURATION BEATS="0.6">vey</DURATION></PITCH>
<PITCH NOTE="G3"><DURATION BEATS="0.3">The</DURATION></PITCH>
<PITCH NOTE="F#3"><DURATION BEATS="0.3">a</DURATION></PITCH>
<PITCH NOTE="G3"><DURATION BEATS="0.6">Won</DURATION></PITCH>
<PITCH NOTE="F3"><DURATION BEATS="0.3">dra</DURATION></PITCH>
<PITCH NOTE="E3"><DURATION BEATS="0.3">as</DURATION></PITCH>
<PITCH NOTE="F3"><DURATION BEATS="0.3">cross</DURATION></PITCH>
</SINGING>
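Since the two parts get mixed together later, it helps to check that they add up to the same number of beats. Below is a small sketch that sums the BEATS values in a file. The function names total_beats and phrase_seconds are hypothetical, and the seconds calculation assumes that one beat lasts 60/BPM seconds, which is my reading of how the BPM attribute works rather than something I have confirmed. It also relies on grep -o, which GNU and BSD grep support but POSIX does not require.

```shell
#!/bin/sh
# total_beats FILE
# Hypothetical helper: sums every BEATS="..." value found in FILE.
total_beats() {
    grep -o 'BEATS="[0-9.]*"' "$1" |
        sed 's/[^0-9.]//g' |
        awk '{ sum += $1 } END { printf "%.1f\n", sum }'
}

# phrase_seconds FILE BPM
# Assumption: one beat lasts 60/BPM seconds.
phrase_seconds() {
    total_beats "$1" | awk -v bpm="$2" '{ printf "%.1f\n", $1 * 60 / bpm }'
}

# Example:
#   total_beats survey1.xml
#   total_beats survey2.xml
#   phrase_seconds survey1.xml 30
```

Both files in this post use the same ten durations (three of 0.6 and seven of 0.3), so both should report 3.9 beats.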

The documented command for writing this to a WAV file is “text2wave”.  Unfortunately, that just returns an error:

skp@pecan:~/Downloads$ text2wave -mode singing survey1.xml -o survey1.wav
SIOD ERROR: wrong type of argument to get_c_val

That didn’t work, so I dropped back to just recording it with arecord.  To get arecord to record from my soundcard rather than my microphone, I had to create a loopback device.  This command did the trick:

sudo modprobe snd-aloop

Next, in my volume control, I had to select output to my new loopback device:

Selecting the loopback device
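The card number that arecord needs depends on what other sound hardware is present, so it is worth checking where the loopback device ended up. Here is a minimal sketch that picks the card index out of the listing in /proc/asound/cards. The function name loopback_card is hypothetical, and it assumes the snd-aloop card shows up with the id "Loopback", which I believe is the default but which can be overridden by module options.

```shell
#!/bin/sh
# loopback_card
# Hypothetical helper: reads a card listing in the format of
# /proc/asound/cards on stdin and prints the index of the first card
# whose id starts with "Loopback". Prints nothing if there is none.
loopback_card() {
    awk '$2 ~ /^\[Loopback/ { print $1; exit }'
}

# Example:
#   loopback_card < /proc/asound/cards
```

If this prints 1, the loopback card is card 1, and a capture device on it would be addressed as hw:1,DEVICE,SUBDEVICE.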

Finally, I threw together this little script to start recording, generate the singing, and close the recording:

#!/bin/sh
# Record from the loopback device while Festival sings, then stop recording.

arecord -D hw:1,1,0 -f cd survey1.wav &
pid=$!
festival <<!
(tts_file "survey1.xml" 'singing)
!
kill "$pid"

Or, this script does all of my parts for me:

#!/bin/sh
# Record each part in turn; survey1.xml is saved as survey1.xml.wav, and so on.

for f in survey?.xml
do
 arecord -D hw:1,1,0 -f cd "$f.wav" &
 pid=$!
 festival <<!
 (tts_file "$f" 'singing)
!
 kill "$pid"
done

Now, to mix the output down to a single file, I needed to install the speech-tools package:

sudo apt-get install speech-tools

Then, this command mixed all of the parts into a single song:

ch_wave -o survey_full.wav -pc longest survey?.xml.wav

Flinger

Several sources mention a more advanced variant of Festival designed specifically for generating singing.  The variant is called Flinger.

I had trouble finding where to download Flinger.  I think this might be the place, but it requires registration.  I’ll save this for a follow-up post:

https://www.cslu.ogi.edu/tts/download/data/
