Resolved Speech Samples

Discussion in 'iOS Programming' started by xArtx, May 6, 2013.

  1. xArtx, May 6, 2013
    Last edited: May 7, 2013

    xArtx macrumors 6502a

    Joined:
    Mar 30, 2012
    #1
    Hi Guys,
    I'm looking for a synthesis engine (Mac or PC),
    that will output a file of the speech samples it produces.
    Flite would be fine, but I haven't seen an implementation that records to file.

    I enquired about using samples produced by an online synthesis engine,
    but its licensing is expensive.
     
  2. ArtOfWarfare, May 6, 2013
    Last edited by a moderator: May 6, 2013

    ArtOfWarfare macrumors 604

    ArtOfWarfare

    Joined:
    Nov 26, 2007
    #2
    Audio Hijack lets you record any sounds any app is making to a file. The trial version limits recording sessions to ten minutes - plenty for your needs, I think.

    http://www.rogueamoeba.com/audiohijack/

    So just use it with your choice of TTS - flite or the system voice, for example.
     
  3. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #3
    The built-in speech synth on OS X lets you do this with the command-line tool. Try this to save it as an .aif file, for example:

    Code:
    say "Hello world" -o hello.aif
    
     
  4. xArtx thread starter macrumors 6502a

    Joined:
    Mar 30, 2012
    #4
    Thanks for the replies.
    I'm not quite sure you'd be allowed to rip the Mac samples for distribution,
    although I'd likely do it if it were the only option.
    With Flite, I know for sure you can.

    I'll bet you'd be in trouble for making Siri say something, for example.
     
  5. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #5
    I don't know; they are not samples, though. But perhaps you should check that before using it. Otherwise Audio Hijack should work, or Soundflower, which appears as a virtual sound card.
     
  6. xArtx thread starter macrumors 6502a

    Joined:
    Mar 30, 2012
    #6
  7. Duncan C macrumors 6502a

    Duncan C

    Joined:
    Jan 21, 2008
    Location:
    Northern Virginia
    #7
    Play with the "say" command in Terminal. I was surprised at how rich it is, and as the other poster said, it has options to send the output to a file, with lots of control over the file format.

    The "say" command also lets you control voices, set the speed, add pauses, and do lots of other things.
     
  8. xArtx thread starter macrumors 6502a

    Joined:
    Mar 30, 2012
    #8
    I didn't know about those options, thanks :)
    It is better than the output of the site I linked.
     
  9. ArtOfWarfare macrumors 604

    ArtOfWarfare

    Joined:
    Nov 26, 2007
    #9
    Code:
    man say
     
  10. xArtx thread starter macrumors 6502a

    Joined:
    Mar 30, 2012
    #10
  11. Duncan C macrumors 6502a

    Duncan C

    Joined:
    Jan 21, 2008
    Location:
    Northern Virginia
    #11
    Even the man page is missing some of the options.
     
  12. xArtx thread starter macrumors 6502a

    Joined:
    Mar 30, 2012
    #12
    I've still had a hell of a time with it.
    I need PCM data for the sound player, and the command posted a few
    posts above doesn't produce WAV or MP3 files (the .mp3 file is always 16 bytes in size for me).
    So I used a Winamp plugin to convert the .aif to .wav for a PC wave-file editor,
    converted to mono, and eventually decided it sounds better at 22050 Hz.

    I used another program to convert the WAV files to C arrays, and wrote functionality
    to live fade the volume in and out very quickly for each sample, so I can cut
    slightly into the start and end points of each sample.
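
    The fade is nothing fancy; the idea is roughly this (a rough sketch, not my actual code, with a made-up ramp length):

    Code:
    /* Sketch: linear fade-in/fade-out on a 16-bit mono PCM buffer,
       to avoid clicks at the start and end of a speech sample.
       RAMP is an assumed ramp length (~12 ms at 22050 Hz). */
    #include <stdint.h>

    #define RAMP 256

    void fade_edges(int16_t *pcm, int count)
    {
        int ramp = (count < 2 * RAMP) ? count / 2 : RAMP;
        for (int i = 0; i < ramp; i++) {
            pcm[i]             = (int16_t)((int32_t)pcm[i] * i / ramp);             /* fade in  */
            pcm[count - 1 - i] = (int16_t)((int32_t)pcm[count - 1 - i] * i / ramp); /* fade out */
        }
    }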

    It's only just ready to go now, and that was just to announce the time (25 samples).
    That's why the Xcode icon is a hammer!
     
  13. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #13
    AIFF is an uncompressed PCM format; in fact, for the sample data the only thing that differs from WAV is the byte order. MP3 is not a PCM format; if you are going for compressed files, use AAC, which is the successor to MP3.
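
    If you ever do need to go the other way, AIFF-style big-endian samples to WAV-style little-endian, it's just a byte swap per 16-bit sample, roughly:

    Code:
    /* Sketch: swap the byte order of 16-bit PCM samples in place
       (AIFF stores big-endian samples, WAV stores little-endian). */
    #include <stdint.h>

    void swap_samples(int16_t *pcm, int count)
    {
        for (int i = 0; i < count; i++) {
            uint16_t s = (uint16_t)pcm[i];
            pcm[i] = (int16_t)((s << 8) | (s >> 8));
        }
    }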

    Why do you hardcode them as arrays, by the way? It seems like it would be much more sensible to store them in the format they are in and read them when you need them.
     
  14. xArtx, May 11, 2013
    Last edited: May 11, 2013

    xArtx thread starter macrumors 6502a

    Joined:
    Mar 30, 2012
    #14
    I tried to save samples from the "say" command as MP3 or WAV so that I could
    open them in a wave editor on the PC, which won't accept AIF.
    They don't all come out perfect... "zero" is a good example... it has a click that can be cleaned up.
    Either way, I can save from the editor to WAV files, and later ditch the file headers
    with a hex editor for clean PCM data.
    It's a time announcement (so far): 0, 1, 2, 3... 10, 11, 12, 13, 14... 20, 30, 40, 50.
    "When I need them" is now.
    They would all be loaded at program launch anyway.
    Been there, done that with the AVplayer delay in loading files. No thanks.
    I have since downsampled all the speech to 22050 Hz, 8-bit, to save RAM. Still fine for speech.

    http://www.youtube.com/watch?v=DHKIryZ-yxY

    The trickiest part is live fading the volume in and out for every sample;
    otherwise each sample ends abruptly with a click.
    So far, every other interface sound is 16-bit at a 44100 Hz sample rate.
     
  15. xArtx thread starter macrumors 6502a

    Joined:
    Mar 30, 2012
    #15
    I'm curious as to why that seems more sensible.

    Why waste time loading a file and looking at the header to determine the size,
    bits, and sample rate of the audio when all of that is already known?
    Even if I were to store the audio in files, I would still use raw PCM data, as opposed
    to a standard file, unless space were the issue and compression entered into it,
    in which case I'd probably get around that by simply compressing the raw PCM data.
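
    Reading raw PCM is about as simple as it gets; something like this rough sketch (the file name and length are made up):

    Code:
    /* Sketch: read a headerless PCM dump straight into a static buffer.
       "time_0.raw" and SAMPLE_COUNT are hypothetical; the format
       (8-bit mono at 22050 Hz) is already known, so nothing is parsed. */
    #include <stdio.h>
    #include <stdint.h>

    #define SAMPLE_COUNT 11025          /* assumed: 0.5 s at 22050 Hz */

    static uint8_t zero_pcm[SAMPLE_COUNT];

    int load_sample(void)
    {
        FILE *f = fopen("time_0.raw", "rb");
        if (f == NULL) return -1;
        size_t got = fread(zero_pcm, 1, SAMPLE_COUNT, f);
        fclose(f);
        return (got == SAMPLE_COUNT) ? 0 : -1;
    }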
     
  16. dejo Moderator

    dejo

    Staff Member

    Joined:
    Sep 2, 2004
    Location:
    The Centennial State
    #16
    What kind of delays were you experiencing? Could you measure them? Also, did you look into using AudioServicesPlaySystemSound?
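
    For reference, the usual pattern with it is roughly this (untested sketch; the file path is hypothetical):

    Code:
    /* Sketch: play a short sound with System Sound Services.
       The path is hypothetical; a real app would use its bundle path. */
    #include <AudioToolbox/AudioToolbox.h>

    void play_click(void)
    {
        static SystemSoundID soundID = 0;
        if (soundID == 0) {
            CFURLRef url = CFURLCreateWithFileSystemPath(kCFAllocatorDefault,
                               CFSTR("/path/to/click.wav"), kCFURLPOSIXPathStyle, false);
            AudioServicesCreateSystemSoundID(url, &soundID);
            CFRelease(url);
        }
        AudioServicesPlaySystemSound(soundID);
    }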
     
  17. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #17
    Because at a 22050 Hz sample rate you end up with 22050 bytes per second, per sample, in arrays that are defined in a header file.

    You won't be wasting time doing any of that, since there are system functions to do it with an audio asset. Both AIFF and WAV use PCM.

    ----------

    You would load them at program start, not when you need them.
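
    For example, pulling the PCM data out of an AIFF or WAV with Audio File Services is roughly this (untested sketch, error handling trimmed):

    Code:
    /* Sketch: load the raw sample data of an AIFF/WAV into memory once,
       at startup, using Audio File Services. */
    #include <AudioToolbox/AudioToolbox.h>
    #include <stdlib.h>

    void *load_audio_data(CFURLRef url, UInt32 *outSize)
    {
        AudioFileID file;
        UInt64 byteCount = 0;
        UInt32 propSize = sizeof(byteCount);

        AudioFileOpenURL(url, kAudioFileReadPermission, 0, &file);
        AudioFileGetProperty(file, kAudioFilePropertyAudioDataByteCount,
                             &propSize, &byteCount);

        UInt32 size = (UInt32)byteCount;
        void *data = malloc(size);
        AudioFileReadBytes(file, false, 0, &size, data);
        AudioFileClose(file);

        *outSize = size;
        return data;   /* keep this around for the lifetime of the app */
    }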
     
  18. xArtx thread starter macrumors 6502a

    Joined:
    Mar 30, 2012
    #18
    The delay is when iOS has decided to ditch the memory
    and reload the source sound file, which the programmer has no control over,
    or you wouldn't need a function like prepareToPlay.
    You can see it happen while using an app, but no, I didn't measure it.
    I could start a timer when I call the sound to play, but I don't know how
    to determine when it actually starts playing; it might be possible, though.
    Fortunately, I will never have to use it for an interface again, so I don't care.

    You've asked me that question before, in a thread where I was frustrated with
    AVplayer, and the answer is the same: it's fast enough, but if the user has their
    ringer volume and normal volume set to different levels,
    a system sound plays at the ringer volume level.
    Then the user has to go into Settings and adjust the ringer volume to change the volume level of that app.
    Apple have their guidelines, I have mine ;)
    This breaks one of mine, so it's only used to run the vibrator.

    I'm not talking about wasting my time, but a processor's.
    Why put that work back on a computer when I've already saved that time?
    It isn't free; you only have to Google "AVplayer lag",
    or look at the Apple Multimedia Programming Guide.
    I have since drawn the waveform in the background for the speech effect,
    but I don't know if that kind of access to the PCM data is possible with AVplayer.

    At the moment the data is small, but even if I were to store it as a file,
    it would still be a single pure PCM stream representing all the interface samples,
    loaded at run time into the same explicitly declared arrays in a header file.
    They won't be purged from memory by iOS. So the only difference is where the data is stored?
     
  19. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #19
    Parsing an audio file header is so little work it wouldn't even be possible to measure. We are not even talking about a compressed format here.

    You need to load the sounds before you use them. Where in the Apple guide do they advise you to embed audio data in your source code?

    I don't really want to argue about this, though; if you think it's sensible to embed large amounts of application assets in your source code, then go ahead.
     
  20. xArtx thread starter macrumors 6502a

    Joined:
    Mar 30, 2012
    #20
    More work for the compiler, yes.
    The arrays, though, are compiled back into a chunk of binary data at a
    known RAM address that isn't slow for the device to access.

    I don't particularly want to argue either,
    but I'm open to any reason for opposing this. I've been doing it for years.
    There is an issue (on the iOS platform) where the data is compiled into
    both binaries, so it is duplicated in disc space...
    but I don't think you thought of that, and it isn't the reason for your opposition.

    It's still faster to load on the device. That's something that can be measured.
     
  21. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #21
    It would not affect the compile time or the compiler, as the parsing would be done at runtime when you load the sound.

    In your case the binary would be bigger since the assets would be baked in; keeping the assets separate would not affect the total size that needs to be loaded. It's just a convoluted way of doing things: select the sound, edit it in a hex editor, and paste it into your source code, as opposed to dropping in the sound assets as regular sound files. Make sure they are loaded when the application starts up (and never released, obviously). Embedding the data in the source code is a hack to make sure that the data is in memory at all times, and it makes your source files bloated and (comparably) hard to edit.
     
  22. dejo Moderator

    dejo

    Staff Member

    Joined:
    Sep 2, 2004
    Location:
    The Centennial State
    #22
    I thought I had but couldn't find the thread. My apologies. I won't bring it up again. :)
     
