Open a document, and search for all email addresses

Discussion in 'Mac Programming' started by czeluff, Nov 26, 2008.

  1. czeluff macrumors 6502

    Joined:
    Oct 23, 2006
    #1
    Hello to all,

    I'm still a pretty new programmer, but I'd like to try taking on a more difficult program. Someone at work asked me to write a program that'd do the following:

    1. Search for all emails within a document (.doc, .pages, .rtf, etc.; the format does not matter, whichever is easiest to code).
    2. Export the list to an rtf.

    Where should I start looking for the classes and code necessary to do this? I'm a (new) C++ developer at work, so if it's easiest to do in C++ on Windows, I'll do that. But I'd prefer to do this using Objective-C on a Mac.

    Any guidance would be greatly appreciated. :)

    Chad
     
  2. lee1210 macrumors 68040

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #2
    Plain text would be the easiest to parse, but there are likely libraries that allow access to the actual text of other file types.

    As far as pattern matching goes, querying by way of NSPredicate would likely work, but as has been mentioned here before, what counts as a valid email address is very complex. If you simplified it to anything between two spaces that contains an @, you might get close, but you would assuredly get false positives.

    -Lee
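
A minimal sketch of the "anything between two spaces with an @" heuristic, written in C++ since that is the OP's work language. It assumes the document has already been reduced to plain text, and the function name `tokensWithAt` is made up for illustration. It is deliberately naive and does no validation.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split the text on whitespace and keep every token that contains an '@'.
// This is only the rough heuristic described above, not an email validator.
std::vector<std::string> tokensWithAt(const std::string& text) {
    std::vector<std::string> hits;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {  // operator>> skips and splits on whitespace
        if (token.find('@') != std::string::npos) {
            hits.push_back(token);
        }
    }
    return hits;
}
```

Fed a sentence such as "meet @ noon", this keeps the lone "@" as a hit, and it keeps trailing punctuation attached to real addresses, which is the kind of false positive the post warns about.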
     
  3. mysterytramp macrumors 65816

    Joined:
    Jul 17, 2008
    Location:
    Maryland
    #3
    This would be pretty easy with AppleScript, too.

    mt
     
  4. kainjow Moderator emeritus

    Joined:
    Jun 15, 2000
    #4
    As for reading in those file formats, you could use Cocoa and the NSAttributedString class, which can read Word, RTF, and HTML and give you back a plain text representation, which then could be used to search for emails. Here's a sample (untested, assumes ASCII):

    Code:
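    #import <Cocoa/Cocoa.h>  // initWithPath:documentAttributes: is an AppKit addition
    #include <stdlib.h>
    
    // Returns a malloc'd C string with the document's plain text, or NULL on
    // failure (unreadable file, or text that is not pure ASCII).
    // The caller must free() the result.
    char *plainTextFromFile(const char *file) {
        char *string = NULL;
        NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
        NSAttributedString *astr = [[[NSAttributedString alloc] initWithPath:[NSString stringWithUTF8String:file] documentAttributes:NULL] autorelease];
        if (astr) {
            NSString *plain = [astr string];
            NSUInteger size = [plain length] + 1;  // +1 for the terminating NUL
            string = malloc(size);
            // maxLength counts the NUL terminator; the call fails on non-ASCII text
            if (string && ![plain getCString:string maxLength:size encoding:NSASCIIStringEncoding]) {
                free(string);
                string = NULL;
            }
        }
        [pool release];
        return string;
    }
    You could call that from a C/C++ program; just make sure you compile the file as Objective-C (the easiest way is to give it a .m extension with gcc) and link with the Cocoa framework, since the document-reading initializer lives in AppKit and Foundation alone is not enough. From C++, declare the function as extern "C". :)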
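
Once the plain text is available (for example from `plainTextFromFile` above), the two steps in the original question could be sketched in C++ roughly as follows. The function names `findEmails`, `rtfEscape`, and `emailsToRtf` are hypothetical. The regular expression is a common simplification and an assumption on my part, not a full RFC 5322 matcher: it will miss some valid addresses and accept some invalid ones. The Helvetica font in the RTF header is likewise just a placeholder choice.

```cpp
#include <regex>
#include <string>
#include <vector>

// Simplified pattern: local part, '@', domain, and a TLD of 2+ letters.
// Assumption for illustration only; real address syntax is far more complex.
std::vector<std::string> findEmails(const std::string& text) {
    static const std::regex pattern(
        R"([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})");
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}

// Escape the three characters RTF treats specially in body text.
std::string rtfEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '\\' || c == '{' || c == '}') out += '\\';
        out += c;
    }
    return out;
}

// Build a minimal RTF document with one address per paragraph.
std::string emailsToRtf(const std::vector<std::string>& emails) {
    std::string rtf =
        "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Helvetica;}}\\f0\\fs24\n";
    for (const std::string& e : emails) {
        rtf += rtfEscape(e) + "\\par\n";
    }
    rtf += "}\n";
    return rtf;
}
```

Writing the result out is then a one-liner with `<fstream>`, e.g. `std::ofstream("emails.rtf") << emailsToRtf(findEmails(text));`, where `emails.rtf` is just an example file name.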
     
