How to decode PCM from microphone?

Discussion in 'iOS Programming' started by HARDWARRIOR, Dec 11, 2008.

  1. macrumors member

    Joined:
    Nov 17, 2008
    #1
    I need to implement clap recognition on the iPhone.
    I know an algorithm for this that analyzes the audio signal as amplitude over time, but it's not clear to me how to get such a representation of the signal from the microphone.

    So far I only know how to record audio from the microphone to a file:

    recorder.h
    Code:
    #import <Foundation/Foundation.h>
    #import <AudioToolbox/AudioQueue.h>
    #import <AudioToolbox/AudioFile.h>
    
    #define NUM_BUFFERS 10
    
    typedef struct 	{
    	AudioFileID                 audioFile;
    	AudioStreamBasicDescription dataFormat;
    	AudioQueueRef               queue;
    	AudioQueueBufferRef         buffers[NUM_BUFFERS];
    	UInt32                      bufferByteSize; 
    	SInt64                      currentPacket;
    	BOOL                        recording;
    } RecordState;
    
    @interface recorder : NSObject {
    @private
    	BOOL recording;
    	RecordState recordState;	
    }
    
    @property(nonatomic, assign) BOOL recording;
    
    - (void)start;
    - (void)stop;
    
    @end
    
    recorder.m
    Code:
    #import "recorder.h"
    
    
    @implementation recorder
    
    -(void)setRecording:(BOOL)val {
    	if(val)
    		[self start];
    	else
    		[self stop];
    }
    
    -(BOOL)recording {
    	return self->recording;
    }
    
    - (id)init {
    	if (self = [super init]) {
    		recording = NO;
    		
    		recordState.dataFormat.mSampleRate = 8000.0;
    		recordState.dataFormat.mFormatID = kAudioFormatLinearPCM;
    		recordState.dataFormat.mFramesPerPacket = 1;
    		recordState.dataFormat.mChannelsPerFrame = 1;
    		recordState.dataFormat.mBytesPerFrame = 2;
    		recordState.dataFormat.mBytesPerPacket = 2;
    		recordState.dataFormat.mBitsPerChannel = 16;
    		recordState.dataFormat.mReserved = 0;
    		recordState.dataFormat.mFormatFlags = kLinearPCMFormatFlagIsBigEndian | kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
    	}
    	return self;
    }
    
    - (void)dealloc {
    	
    	
    	[super dealloc];
    }
    
    void AudioInputCallback(void *inUserData, AudioQueueRef inAQ, AudioQueueBufferRef inBuffer, const AudioTimeStamp *inStartTime, UInt32 inNumberPacketDescriptions, const AudioStreamPacketDescription *inPacketDescs) {
        RecordState *recordState = (RecordState *)inUserData;
    	
        OSStatus status = AudioFileWritePackets(recordState->audioFile, false, inBuffer->mAudioDataByteSize, inPacketDescs, recordState->currentPacket, &inNumberPacketDescriptions, inBuffer->mAudioData);
        if(status == 0)
            recordState->currentPacket += inNumberPacketDescriptions;
    	
        AudioQueueEnqueueBuffer(recordState->queue, inBuffer, 0, NULL);
    }
    
    - (void)start {
    	if(!recording) {
    		self->recording = YES;
    		
    		recordState.currentPacket = 0;
    		
    		OSStatus status = AudioQueueNewInput(&recordState.dataFormat, AudioInputCallback, &recordState, CFRunLoopGetCurrent(), kCFRunLoopCommonModes, 0, &recordState.queue);
    		
    		if(status == 0) {
    			for(int i = 0; i < NUM_BUFFERS; i++) {
    				AudioQueueAllocateBuffer(recordState.queue, 16000, &recordState.buffers[i]);
    				AudioQueueEnqueueBuffer(recordState.queue, recordState.buffers[i], 0, NULL);
    			}
    			
    			NSArray *paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
    			NSString *documentsDirectory = [paths objectAtIndex:0];
    			NSString *writablePath = [documentsDirectory stringByAppendingPathComponent:@"audio.aiff"];
    			
    			NSLog(@"%@", writablePath);
    			
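    			// The length argument is a byte count, so measure the UTF-8 encoding rather than the character count.
    			CFURLRef fileURL = CFURLCreateFromFileSystemRepresentation(NULL, (const UInt8 *) [writablePath UTF8String], [writablePath lengthOfBytesUsingEncoding:NSUTF8StringEncoding], NO);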
    			
    			status = AudioFileCreateWithURL(fileURL, kAudioFileAIFFType, &recordState.dataFormat, kAudioFileFlags_EraseFile, &recordState.audioFile);
    			if(status == 0)
    				status = AudioQueueStart(recordState.queue, NULL);
    		}
    		
    		if(status != 0)
    			NSLog(@"Recording Failed!");
    	}	
    }
    
    - (void)stop {
    	if(recording) {
    		self->recording = NO;
    		
    		NSLog(@"Stopping queue...");
    		AudioQueueStop(recordState.queue, true);
    		NSLog(@"Queue stopped");
    		
    		for(int i = 0; i < NUM_BUFFERS; i++)
    			AudioQueueFreeBuffer(recordState.queue, recordState.buffers[i]);
    		
    		AudioQueueDispose(recordState.queue, true);
    		AudioFileClose(recordState.audioFile);
    		NSLog(@"Audio file closed");		
    	}
    }
    
    @end
    
    My idea is to replace the AudioFileWritePackets call in the AudioInputCallback callback, so that the inBuffer->mAudioDataByteSize bytes at the pointer inBuffer->mAudioData serve as the amplitude-over-time representation of the signal. But I don't know how to decode that memory area: it looks like I need to work with pairs of bytes (mBytesPerPacket = 2). Can I represent each pair as one of the "standard" types (like int)? Is it signed or not?
     
  2. macrumors 603

    Joined:
    Jul 29, 2003
    Location:
    Silicon Valley
    #2
    With the format you described, the buffer holds arrays of signed big-endian short (16-bit) integers.
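
As a minimal sketch of what that answer means in code (the function name `decode_be16` is hypothetical and not from the thread), each pair of bytes can be combined high byte first and then interpreted as a signed two's-complement value. This assumes the 16-bit, big-endian, signed, packed format set in the `init` method above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode big-endian signed 16-bit PCM bytes into native int16_t samples.
   Returns the number of samples written to out.
   decode_be16 is a hypothetical helper name, not part of any Apple API. */
size_t decode_be16(const uint8_t *bytes, size_t num_bytes, int16_t *out)
{
    assert(bytes != NULL && out != NULL);
    size_t num_samples = num_bytes / 2;   /* a trailing odd byte is ignored */
    for (size_t i = 0; i < num_samples; i++) {
        /* Big-endian: the first byte of each pair is the high byte. */
        uint16_t raw = (uint16_t)((bytes[2 * i] << 8) | bytes[2 * i + 1]);
        /* Map the unsigned bit pattern to its two's-complement value. */
        out[i] = (raw & 0x8000u) ? (int16_t)((int32_t)raw - 65536)
                                 : (int16_t)raw;
    }
    return num_samples;
}
```

In the recording callback this would presumably be called with `inBuffer->mAudioData` as `bytes` and `inBuffer->mAudioDataByteSize` as `num_bytes`.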

    I suggest books on digital signal processing, and maybe research papers on speech/sound recognition. You might also find some good suggestions in books on computer music.

     
  3. macrumors member

    Joined:
    Sep 17, 2007
    Location:
    San Diego, CA
    #3
    Look at the sample SpeakHere app from Apple. Among other things it shows you how to monitor the amplitude of the incoming signal. To recognize a clap (or short burst of sound) you just need to take this information and look for a rapid increase and decrease in amplitude.
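
The "rapid increase and decrease" idea could be sketched roughly as follows. This is not SpeakHere's code: it works on windows of already decoded 16-bit samples, and the struct, function names and threshold fields are all hypothetical values that would need tuning on a real device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int loud_threshold;   /* hypothetical: window peak that counts as loud */
    int quiet_threshold;  /* hypothetical: window peak that counts as quiet again */
    int max_loud_windows; /* a burst must end within this many windows */
    int loud_windows;     /* state: length of the burst seen so far, in windows */
} ClapDetector;

/* Largest absolute sample value in one window. */
static int window_peak(const int16_t *samples, size_t count)
{
    int peak = 0;
    for (size_t i = 0; i < count; i++) {
        int v = abs((int)samples[i]);
        if (v > peak) peak = v;
    }
    return peak;
}

/* Feed one window; returns 1 when a short loud burst has just ended, else 0. */
int clap_detector_feed(ClapDetector *d, const int16_t *samples, size_t count)
{
    assert(d != NULL && samples != NULL);
    int peak = window_peak(samples, count);
    if (peak >= d->loud_threshold) {
        d->loud_windows++;            /* burst starts or continues */
        return 0;
    }
    if (peak <= d->quiet_threshold) {
        /* Back to quiet: it was a clap only if the burst was short enough. */
        int was_short_burst = d->loud_windows > 0 &&
                              d->loud_windows <= d->max_loud_windows;
        d->loud_windows = 0;
        return was_short_burst;
    }
    /* Between the thresholds: an ongoing burst keeps accumulating duration. */
    if (d->loud_windows > 0) d->loud_windows++;
    return 0;
}
```

Using two thresholds instead of one keeps a signal hovering around a single level from triggering repeatedly.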

    Craig
     
  4. macrumors 6502

    Joined:
    Jan 6, 2008
    #4
    Have you had any progress on this?

    I am also trying to decode the data to pass into a fast fourier transformation function.

    Any info would be greatly appreciated.
     
  5. macrumors 603

    Joined:
    Jul 29, 2003
    Location:
    Silicon Valley
    #5
    There are already too many apps in the store where developers don't know enough about the data going into or coming out of an FFT.

    I suggest an intense study, not only of DSP theory, but of data types and how to avoid problems with computer arithmetic.

     
  6. macrumors 6502

    Joined:
    Jan 6, 2008
    #6
    Thanks for the tip.

    I've figured it out.
     
  7. thread starter macrumors member

    Joined:
    Nov 17, 2008
    #7
    No luck. Actually, Apple's aurioTouch example does what I need: it visualizes the amplitude and can do an FFT. But it isn't documented at the point where the raw audio data is processed (aurioTouchAppDelegate.mm:performThru). You are given a void pointer and the number of bytes of data. aurioTouch then performs some magic in a loop, parsing this data as SInt8s and stepping 4 * sizeof(SInt8) bytes ahead. I just don't understand: why 4?
    Also, this example uses Audio Units. I don't see the point of using them on the iPhone, or even why Apple made an AudioUnit framework for the iPhone.
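
One possible reading of the step of 4, offered only as an assumption since the thread never confirms which sample format aurioTouch uses: if each sample is 32 bits wide, a byte pointer has to advance 4 bytes to reach the next sample, and a loop over SInt8 can pick a single byte out of each one. The sketch below illustrates that pattern with a hypothetical helper name and parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy one chosen byte out of every fixed-width sample.
   pick_byte_per_sample, bytes_per_sample and byte_index are hypothetical
   names for illustration; they are not taken from aurioTouch. */
size_t pick_byte_per_sample(const int8_t *data, size_t num_bytes,
                            size_t bytes_per_sample, size_t byte_index,
                            int8_t *out)
{
    assert(data != NULL && out != NULL);
    assert(bytes_per_sample > 0 && byte_index < bytes_per_sample);
    size_t num_samples = num_bytes / bytes_per_sample;
    const int8_t *p = data;
    for (size_t i = 0; i < num_samples; i++) {
        out[i] = p[byte_index];   /* one byte of the current sample */
        p += bytes_per_sample;    /* with 4-byte samples this is the "+ 4" step */
    }
    return num_samples;
}
```

Which byte carries the most useful part of the value depends on the sample layout and byte order, so `byte_index` is left to the caller.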

    For clap recognition I checked the audio levels as in the SpeakHere example. The level is actually an average power, i.e. the signal's power averaged over a period: a kind of "smoothed" amplitude. So if you only need to watch for amplitude peaks and their decay over time, it's OK. But for an FFT you can't use average power.
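
To illustrate why an averaged level cannot stand in for the waveform, here is a small sketch (helper names are hypothetical, and this is not how SpeakHere computes its meter) that computes both the mean-square power and the peak of a window of decoded samples. Two windows with very different shapes can have identical average power.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Mean of the squared samples: average power in raw sample units. */
double mean_square(const int16_t *samples, size_t count)
{
    assert(samples != NULL && count > 0);
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        double v = (double)samples[i];
        sum += v * v;
    }
    return sum / (double)count;
}

/* Largest absolute sample value in the window. */
int peak_amplitude(const int16_t *samples, size_t count)
{
    assert(samples != NULL);
    int peak = 0;
    for (size_t i = 0; i < count; i++) {
        int v = abs((int)samples[i]);
        if (v > peak) peak = v;
    }
    return peak;
}
```

For example, the windows {500, 500, 500, 500} and {1000, 0, 0, 0} both have a mean square of 250000, but their peaks are 500 and 1000. The average keeps the overall energy and discards the shape, which is exactly what an FFT needs.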

    So if you understand every line of the aurioTouch code, you'll solve your problem :) Please post any progress on the FFT in this thread ;)
     
  8. macrumors 6502

    Joined:
    Jan 6, 2008
    #8
    I had no idea that such sample code existed from Apple. I've actually succeeded in what I was trying to accomplish (I did what firewood said: signed shorts, big endian), but the use of AudioUnit in aurioTouch is interesting.
     
  9. thread starter macrumors member

    Joined:
    Nov 17, 2008
    #9
    Can you post the code where the parsing to signed shorts is done? How do you set dataFormat? I'm especially interested in dataFormat.mFormatFlags.
     
  10. macrumors 6502

    Joined:
    Jan 6, 2008
    #10
    I have no clue about the dataFormat -- I just use the default one from the sample code. My goal was simply to pass in the wav data to get some semblance of frequency. Something simple: if I whistle, it would register a higher number, and if I talk low, it would register a lower number. I didn't really care whether it actually was the right frequency. So I'm not sure how much this will help, but what I did was:


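    // Bytes are read as unsigned so the low byte is not sign-extended.
    unsigned char *buf = (unsigned char *) inBuffer->mAudioData;
    for (i = 0; i < len; i+=2)
    {
    // Big-endian 16-bit sample: high byte first, then low byte.
    SInt16 s = (SInt16) ((buf[i] << 8) | buf[i+1]);
    in[i] = (double) s;   // sample value (presumably the real part of the FFT input)
    in[i+1] = 0;          // presumably the imaginary part
    }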
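
For the stated goal of a number that goes up when whistling and down when talking low, a cruder alternative to an FFT is to count zero crossings. This is a different technique from what the poster used, sketched here under the assumption that the samples have already been decoded to signed 16-bit values; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rough frequency estimate in Hz from the number of sign changes.
   zero_crossing_hz is a hypothetical helper, not an Apple API. */
double zero_crossing_hz(const int16_t *samples, size_t count, double sample_rate)
{
    assert(samples != NULL && sample_rate > 0.0);
    if (count < 2) return 0.0;
    size_t crossings = 0;
    for (size_t i = 1; i < count; i++) {
        int prev_negative = samples[i - 1] < 0;
        int cur_negative = samples[i] < 0;
        if (prev_negative != cur_negative) crossings++;
    }
    /* A periodic tone crosses zero twice per cycle. */
    double duration = (double)(count - 1) / sample_rate;  /* seconds spanned */
    return (double)crossings / (2.0 * duration);
}
```

The estimate is only meaningful for a fairly clean single tone; noise or strong overtones add extra crossings and push the number up.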
     
  11. macrumors newbie

    Joined:
    Feb 2, 2009
    #11


    Hi, I can't understand what your code does...

    How can I get the instant audio frequency to do a successive FFT?
    What is the "in" array?

    thanks,

    regards
     
  12. macrumors newbie

    Joined:
    May 6, 2009
    #12


    adamk77, can you please tell me a way to detect a heartbeat by changing some code in the aurioTouch sample? I have been stuck on this for the last two weeks.
     
