
Hi Guys,
I'm looking for a speech synthesis engine (Mac or PC)
that will output the speech samples it produces to a file.
Flite would be fine, but I haven't seen an implementation that records to a file.

I enquired about using samples produced by an online synthesis engine,
but its licensing is expensive.
 
Audio Hijack lets you record any sounds any app is making to a file. The trial version limits recording sessions to ten minutes - plenty for your needs, I think.

http://www.rogueamoeba.com/audiohijack/

So just use it with your choice of TTS - flite or the system voice, for example.
 
The built-in speech synth on OS X lets you do this with the command-line tool. For example, try this to save the output as an AIFF file:

Code:
say "Hello world" -o hello.aif
 
Thanks for the replies.
I'm not quite sure you'd be allowed to rip the Mac samples for distribution,
although I'd likely do it if it was the only option.
With Flite, I know for sure that you can.

I'll bet you'd be in trouble for making Siri say something, for example.
 
Thanks for the replies.
I'm not quite sure you'd be allowed to rip the Mac samples for distribution,
although I'd likely do it if it was the only option.
With Flite, I know for sure that you can.

I'll bet you'd be in trouble for making Siri say something, for example.

I don't know; they are not samples, though. But perhaps you should check it before using it. Otherwise Audio Hijack should work, or Soundflower, which appears as a virtual sound card.
 
Found this one:
http://vozme.com/index.php?lang=en

which is a Festival implementation with male or female voices,
and records to MP3, which can be converted to PCM later on.

This is overall the best I found:
http://www2.research.att.com/~ttsweb/tts/demo.php

Beautiful sound at a low sample rate...
but unless you plan to make a lot of money from your app,
don't bother enquiring about licensing!

Play with the "say" command in Terminal. I was surprised at how rich it is, and as the other poster said, it has options to send the output to a file, with lots of control on file format.

The "say" command also lets you control voices, set speed, add pauses, and lots of other things.
 
Play with the "say" command in Terminal. I was surprised at how rich it is, and as the other poster said, it has options to send the output to a file, with lots of control on file format.

The "say" command also lets you control voices, set speed, add pauses, and lots of other things.

I didn't know about those options, thanks :)
It is better than the output of the site I linked.
 
I've still had a hell of a time with it.
I need PCM data for the sound player, and the command posted a few
posts above doesn't produce WAV or MP3 files (the .mp3 file always comes out 16 bytes in size for me).
So I use a Winamp plugin to convert .aif to .wav for a PC wave file editor,
convert to mono, and eventually also decide it sounds better at 22050 Hz.

I used another program to convert the WAV files to C arrays, and wrote functionality
to fade the volume in and out very quickly for each sample, so I can cut
slightly into the start and end points of each sample.

It's only just ready to go now, and that was just to announce the time (25 samples).
That's why the Xcode icon is a hammer!
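Roughly, the generated header ends up looking something like this (the names and sample values below are made-up placeholders, not my actual data):

Code:
/* Hypothetical output of a "WAV to C array" converter - placeholder names
   and values only; the real arrays run to thousands of entries. */
static const unsigned int kSampleZeroRate   = 22050; /* sample rate in Hz    */
static const unsigned int kSampleZeroFrames = 4;     /* number of PCM frames */

/* 16-bit signed, mono PCM samples */
static const short kSampleZero[] = { 0, 128, -64, 32 };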
 
I've still had a hell of a time with it.
I need PCM data for the sound player, and the command posted a few
posts above doesn't produce WAV or MP3 files (the .mp3 file always comes out 16 bytes in size for me).
So I use a Winamp plugin to convert .aif to .wav for a PC wave file editor,
convert to mono, and eventually also decide it sounds better at 22050 Hz.

AIFF is an uncompressed PCM format; in fact, about the only thing that differs from WAV is the byte order. MP3 is not a PCM format; if you are going for compressed files, use AAC, which is the successor to MP3.

Why do you hardcode them as arrays, btw? It seems like it would be much more sensible to store them in the format they are in, and read them when you need them.
 
AIFF is an uncompressed PCM format; in fact, about the only thing that differs from WAV is the byte order. MP3 is not a PCM format; if you are going for compressed files, use AAC, which is the successor to MP3.
I tried to save samples from the "say" command as MP3 or WAV so that I could
open them in a wave editor on the PC, which won't accept AIFF.
They don't all come out perfect... "zero" is a good example: it has a click that can be cleaned up.
Either way, I can save from the editor to WAV files, and later ditch the file headers
with a hex editor for clean PCM data.
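For reference, what gets ditched is normally just the canonical 44-byte WAV header (simplest case only; some files carry extra chunks, so it is worth eyeballing in the hex editor first):

Code:
#include <stdint.h>

/* Canonical 44-byte WAV header (simplest case; real files may contain
   extra chunks before "data"). Everything after it is raw PCM. */
struct wav_header {
    char     riff[4];         /* "RIFF"                            */
    uint32_t chunk_size;      /* file size - 8                     */
    char     wave[4];         /* "WAVE"                            */
    char     fmt[4];          /* "fmt "                            */
    uint32_t fmt_size;        /* 16 for plain PCM                  */
    uint16_t audio_format;    /* 1 = uncompressed PCM              */
    uint16_t num_channels;    /* 1 = mono                          */
    uint32_t sample_rate;     /* e.g. 22050                        */
    uint32_t byte_rate;       /* sample_rate * channels * bits / 8 */
    uint16_t block_align;     /* channels * bits / 8               */
    uint16_t bits_per_sample; /* 8 or 16                           */
    char     data[4];         /* "data"                            */
    uint32_t data_size;       /* number of PCM bytes that follow   */
};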
Why do you hardcode them as arrays, btw? It seems like it would be much more sensible to store them in the format they are in, and read them when you need them.
It's a time announcement (so far).... 0,1,2,3...10,11,12,13,14...20,30,40,50.
When I need them is now.
They would all be loaded at program launch anyway.
Been there, done that with AVplayer delay in loading files. No thanks.
I have since downsampled all speech to 22050 Hz, 8-bit, to save RAM. Still fine for speech.

http://www.youtube.com/watch?v=DHKIryZ-yxY

The trickiest part is live fading the volume in & out for every sample;
otherwise each sample ends abruptly with a click.
So far every other interface sound is 16-bit at a 44100 Hz sample rate.
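The fade itself is nothing exotic; a minimal sketch of the idea, assuming 16-bit signed mono PCM for simplicity (8-bit WAV data is unsigned and centred on 128, so it would need the offset handled first):

Code:
#include <stddef.h>
#include <stdint.h>

/* Apply a short linear fade-in and fade-out to 16-bit signed mono PCM so
   the sample doesn't start or end with an audible click. fade_frames is a
   guess - a few hundred frames (~10 ms at 22050 Hz) is usually plenty. */
static void fade_edges(int16_t *pcm, size_t frames, size_t fade_frames)
{
    if (fade_frames > frames / 2)
        fade_frames = frames / 2;

    for (size_t i = 0; i < fade_frames; i++) {
        float gain = (float)i / (float)fade_frames;                  /* 0.0 -> 1.0 */
        pcm[i]              = (int16_t)(pcm[i] * gain);              /* fade in    */
        pcm[frames - 1 - i] = (int16_t)(pcm[frames - 1 - i] * gain); /* fade out   */
    }
}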
 
it seems like it would be much more sensible to store them in the format they are in
I'm curious as to why it seems more sensible?

Why waste time loading a file, and looking at the header to determine the size,
bits, and sample rate of audio when all of that is already known?
Even if I were to store audio in files, I would still use raw PCM data as opposed
to a standard file, unless space was the issue, and compression entered into it,
in which case, I'd probably get around that by simply compressing the raw PCM data.
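And if I did go the file route, loading raw PCM with no header is a single read anyway; a minimal sketch (the path and byte count below are made up for illustration):

Code:
#include <stdio.h>
#include <stdint.h>

/* Raw PCM, no header: size, rate and bit depth are fixed at build time,
   so loading is one fread into a buffer of known size.
   SPEECH_BYTES and the file name are made-up example values. */
#define SPEECH_BYTES (25 * 22050)   /* e.g. 25 one-second 8-bit mono samples */

static uint8_t speech_pcm[SPEECH_BYTES];

static int load_speech(const char *path)   /* e.g. "speech_22050_8bit.raw" */
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    size_t got = fread(speech_pcm, 1, SPEECH_BYTES, f);
    fclose(f);
    return (got == SPEECH_BYTES) ? 0 : -1;
}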
 
I'm curious as to why it seems more sensible?

Because at a 22,050 Hz sample rate you end up with around 22,050 bytes per second per sample (8-bit mono), in arrays that are defined in a header file.

Why waste time loading a file, and looking at the header to determine the size,
bits, and sample rate of audio when all of that is already known?
Even if I were to store audio in files, I would still use raw PCM data as opposed
to a standard file, unless space was the issue, and compression entered into it,
in which case, I'd probably get around that by simply compressing the raw PCM data.

You won't be wasting time doing any of that, since there are system functions to do that with an audio asset. Both AIFF and WAV use PCM.

----------

When I need them is now.
They would all be loaded at program launch anyway.
Been there, done that with AVplayer delay in loading files. No thanks.

You would load them at program start, not when you need them.
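The system functions in question would presumably be something along the lines of Audio File Services; a rough sketch of pulling an asset's PCM bytes into memory at program start (error handling trimmed, and the names are made up):

Code:
#include <AudioToolbox/AudioToolbox.h>
#include <stdbool.h>
#include <stdlib.h>

/* Rough sketch only: open an AIFF/WAV asset, ask how many audio data bytes
   it holds, and read them into a malloc'd buffer the caller frees. */
static void *load_audio_bytes(CFURLRef url, UInt32 *outByteCount)
{
    AudioFileID file = NULL;
    if (AudioFileOpenURL(url, kAudioFileReadPermission, 0, &file) != noErr)
        return NULL;

    UInt64 dataBytes = 0;
    UInt32 propSize = sizeof(dataBytes);
    AudioFileGetProperty(file, kAudioFilePropertyAudioDataByteCount,
                         &propSize, &dataBytes);

    void *buffer = malloc((size_t)dataBytes);
    UInt32 bytesToRead = (UInt32)dataBytes;
    AudioFileReadBytes(file, false, 0, &bytesToRead, buffer);
    AudioFileClose(file);

    *outByteCount = bytesToRead;
    return buffer;
}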
 
What kind of delays were you experiencing? Could you measure it?
The delay comes when iOS has decided to ditch the memory
and reload the source sound file, which the programmer has no control over,
or you wouldn't need a function like prepareToPlay.
You can see it happen using a program, but no, I didn't measure it.
I could start a timer from when I call the sound to play, but I don't know how
to determine when it actually starts playing; it might be possible, though.
Fortunately, I will never have to use it for an interface again, so I don't care.

Also, did you look into using AudioServicesPlaySystemSound?
You've asked me that question before, in a thread where I was frustrated with
AVPlayer, and the answer is the same: it's fast enough, but if the user has their
ringer volume and normal volume set to different levels,
the system sound plays at the ringer volume level.
Then the user has to go into Settings and adjust the ringer volume to change that app's volume level.
Apple have their guidelines, I have mine ;)
This breaks one of them, so it is only used to run the vibrator.
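For what it's worth, the vibrator call itself is just the one-liner below (kSystemSoundID_Vibrate is the real constant; on hardware without a vibration motor it is simply a no-op):

Code:
#include <AudioToolbox/AudioToolbox.h>

/* Trigger the vibration motor; does nothing on devices that lack one. */
static void buzz(void)
{
    AudioServicesPlaySystemSound(kSystemSoundID_Vibrate);
}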

You won't be wasting time doing any of that, since there are system functions to do that with an audio asset. Both AIFF and WAV use PCM.
I'm not talking about wasting my time, but a processor's.
Why put that work back on a computer when I've saved that time?
It isn't free; you only have to Google "AVPlayer lag",
or look at the Apple Multimedia Programming Guide:
• To play and record audio in the fewest lines of code, use the AV Foundation framework. See “Playing Sounds Easily with the AVAudioPlayer Class” and “Recording with the AVAudioRecorder Class.”
• To provide lowest latency audio, especially when doing simultaneous input and output (such as for a VoIP application), use the I/O unit or the Voice Processing I/O unit. See “Audio Unit Support in iOS.”

I have since drawn the waveform in the background for the speech effect,
but I don't know if that kind of access to the PCM data is possible with AVPlayer.

Because at a 22,050 Hz sample rate you end up with around 22,050 bytes per second per sample (8-bit mono), in arrays that are defined in a header file.
At the moment the data is small, but even if I were to store it as a file,
it would still be a single pure PCM stream representing all interface samples,
loaded at run time into the same explicitly declared arrays in a header file.
They won't be purged from memory by iOS. So the only difference is where the data is stored?
 
I'm not talking about wasting my time, but a processor's.
Why put that work back on a computer when I've saved that time?

Parsing an audio file header is work that wouldn't even be possible to measure. We are not even talking about a compressed format here.

It isn't free, you only have to Google AVplayer lag,
or the Apple Multimedia Programming Guide:

You need to load the sounds before you use them. Where in the Apple guide do they advise you to embed audio data in your source code?

I don't really want to argue about this, though; if you think it's sensible to embed large amounts of application assets in your source code, then go ahead.
 
Parsing an audio file header is work that wouldn't even be possible to measure. We are not even talking about a compressed format here.
More work for the compiler, yes.
The arrays, though, are compiled back into a chunk of binary data at a
known RAM address that isn't slow for the device to access.

I don't particularly want to argue either,
but am open to any reason for opposition to this. It's been going on for years.
There is an issue (for the iOS platform) where the data is compiled into
both binaries, so it is duplicated in disk space...
but I don't think you thought of that, and that isn't the reason for your opposition.

It's still faster to load on the device. That's something that can be measured.
 
More work for the compiler, yes.
The arrays, though, are compiled back into a chunk of binary data at a
known RAM address that isn't slow for the device to access.

It would not affect the compile time or compiler, as the parsing would be done at runtime when you load the sound.

I don't particularly want to argue either,
but am open to any reason for opposition to this. It's been going on for years.
There is an issue (for the iOS platform) where the data is compiled into
both binaries, so it is duplicated in disk space...
but I don't think you thought of that, and that isn't the reason for your opposition.

It's still faster to load on the device. That's something that can be measured.

In your case the binary would be bigger since the assets would be baked in; keeping the assets separate would not affect the total size that needs to be loaded. It's just a convoluted way of doing things: select a sound, edit it in a hex editor and paste it into your source code, as opposed to dropping in the sound assets as regular sound files. Make sure they are loaded when the application starts up (and never released, obviously). Embedding the data in the source code is a hack to make sure that the data is in memory at all times, and it makes your source files bloated and (comparably) hard to edit.
 