creative text file parsing methodology? any ideas?




dj.mooky
Jun 28, 2009, 04:31 PM
So, about the poker application I'm working on: I made major headway today, but noticed a glitch in the tracking of money.

Right now I break the text file down line by line, then analyze the prefix of each line to decide where to send the next few lines (without giving too much away: most of the time I can establish EXACTLY what comes before the name, so the start of the search is covered). I then split the line into an array on spaces to fetch other info, like numbers or actions, with "objectAtIndex:x"-style lookups... but on to the problem.

In a nutshell, players are allowed to have names with spaces in them, and there is no set way in the text files generated by PokerStars to discern EXACTLY where a name begins and ends.

A .txt example looks like this:


Seat 2: pebos train ($6.59 in chips)

pebos train: folds //(however this could be checks, bets, raises, calls....)

Seat 2: pebos train folded before Flop


These are 3 examples involving one player named "pebos train", and they are not the only lines throwing me for a loop. The name could have even more spaces, or it could contain something I would expect to come after it. For example, if his name were "pebos folds a lot", the line would read "pebos folds a lot: folds"; if I just searched for the first occurrence of "folds", I would get the wrong name, and then the rest of the data would be wrong too.

So I've set about writing a method to fetch the name without knowing EXACTLY what comes after it. I can come up with a list of characters, such as ":" or "(", that follow a name in given scenarios, but I was hoping to handle it all in one function.

So far I can't hammer down a reliable method for either A) getting the full range (again, mainly the end point) of a given player's name, or B) figuring out where a name ends in general, given the naming conventions PokerStars allows.

Any brilliant ideas? Has anyone else done something like this?

The function I'm using right now looks something like this:

+ (NSString *)getNameFromString:(NSArray *)anArray from:(NSString *)preString to:(NSString *)postString {
    NSUInteger count = [anArray count];
    NSUInteger start = 0;

    // Find the word that begins with preString; the name starts two words later,
    // e.g. after "Seat" and "2:" in "Seat 2: pebos train ($6.59 in chips)"
    if (preString != nil) {
        while (start < count && ![[anArray objectAtIndex:start] hasPrefix:preString]) {
            start++;
        }
        start += 2;
    }

    // The name ends just before the first word that begins with postString;
    // checking against count keeps a missing postString from running off the array
    NSUInteger finish = start;
    while (finish < count && ![[anArray objectAtIndex:finish] hasPrefix:postString]) {
        finish++;
    }

    if (start >= finish) {
        return @"";   // no words between the two markers
    }

    // Rejoin the name's words with single spaces
    return [[anArray subarrayWithRange:NSMakeRange(start, finish - start)]
               componentsJoinedByString:@" "];
}


I've thought about taking a string instead of an array as the argument and searching for specific strings that would be an expected ending. However, I always come back to the problem that the name could contain the ending string I'm searching for. And for some reason, searching from the end of the string back to the beginning just does not seem like a good idea, especially when I could be looking for multiple options... I know there is a better way; I just cannot think of it.
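A minimal sketch of the search-from-the-end idea, in plain C (which compiles unchanged inside an Objective-C project). It assumes, as described above, that the text before the name is known exactly and that the fixed text after the name never contains the terminator. `find_last` and `extract_name` are hypothetical helpers, not Cocoa API. Because the name is the only free-form part of the line, taking the LAST occurrence of the terminator keeps "pebos folds a lot" whole.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the LAST occurrence of needle in haystack,
 * or NULL if needle is empty or absent. */
static const char *find_last(const char *haystack, const char *needle)
{
    const char *last = NULL;
    const char *p = haystack;

    if (*needle == '\0')
        return NULL;
    while ((p = strstr(p, needle)) != NULL) {
        last = p;
        p++;                    /* keep scanning to the right */
    }
    return last;
}

/* Hypothetical helper: copies the text between a known `prefix` (which must
 * begin the line) and the LAST occurrence of `terminator` into `out`.
 * Returns 1 on success, 0 if the line does not fit or `out` is too small. */
static int extract_name(const char *line, const char *prefix,
                        const char *terminator, char *out, size_t out_size)
{
    size_t prefix_len, len;
    const char *start, *end;

    assert(line && prefix && terminator && out && out_size > 0);

    prefix_len = strlen(prefix);
    if (strncmp(line, prefix, prefix_len) != 0)
        return 0;

    start = line + prefix_len;
    end = find_last(start, terminator);
    if (end == NULL || end == start)
        return 0;

    len = (size_t)(end - start);
    if (len >= out_size)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

For action lines the prefix would be empty and the terminator ": "; for seat lines the prefix would be something like "Seat 2: " and the terminator " (". The approach only breaks if the text after the name can itself contain the terminator, so each line type needs its own choice of terminator.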



angelwatt
Jun 28, 2009, 05:41 PM
Generally, when I work with flat files (plain text, often each line is a record) I use a string as the separator that won't exist in the text.

field 1#$#field-2#$#some other field#$#and so on
You then just split the line based on that separator (#$#).
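A sketch of that split in C, assuming you control the writer and "#$#" never occurs inside a field; `split_fields` is a hypothetical helper. Unlike `strtok`, which treats its delimiter argument as a set of single characters, it splits on the whole multi-character separator and keeps empty fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place on every occurrence of `sep`,
 * storing up to `max_fields` pointers in `fields` (extra fields are ignored).
 * Returns the number of fields stored. */
static size_t split_fields(char *line, const char *sep,
                           char *fields[], size_t max_fields)
{
    size_t sep_len, count = 0;
    char *start = line;

    assert(line && sep && *sep && fields);
    sep_len = strlen(sep);

    while (count < max_fields) {
        char *hit = strstr(start, sep);
        fields[count++] = start;
        if (hit == NULL)
            break;              /* last field runs to the end of the line */
        *hit = '\0';            /* terminate the current field */
        start = hit + sep_len;  /* next field begins after the separator */
    }
    return count;
}
```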

firewood
Jun 28, 2009, 06:14 PM
You could always send the data to a language environment that has better support for string processing and regexes. For instance, pipe it to Perl on a Mac, or send it to JavaScript in a UIWebView on an iPhone, to do any fancy string processing needed.

dj.mooky
Jun 28, 2009, 09:28 PM
angelwatt wrote:
> Generally, when I work with flat files (plain text, often each line is a record) I use a string as the separator that won't exist in the text.
>
> field 1#$#field-2#$#some other field#$#and so on
> You then just split the line based on that separator (#$#).

I'm not the one writing the text files, actually. They are produced by a program for playing poker online (pokerstars.com, if you are interested). My only goal is to pull information OUT of the files, as opposed to writing them :)

Writing it would be a piece of cake comparatively.


and to firewood:
I currently know no Perl whatsoever. Could you expound just enough that I know what to search for in terms of examples, so I can try this idea of yours? I am always looking to expand my horizons so I can meet any challenge that comes up.

Sayer
Jun 28, 2009, 10:49 PM
It looks like the cash amounts never vary in format: they always start with the characters "($". So you could do one pass that looks for the lines containing "($" and gathers all the unique names.

Once you have the names you can use the names for breaking down the strings using the name as the delimiter.

Yeah, it's not all one function, but darn, sometimes you deal with the problem at hand any way that works.
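A rough C sketch of the two-pass idea. Assumptions: seat lines look exactly like "Seat 2: pebos train ($6.59 in chips)", there are at most 10 players with names under 64 bytes, and in pass 2 the caller passes the text where the name begins (for example, after stripping "Seat 2: "). `PlayerTable`, `collect_seat_name`, and `match_known_name` are hypothetical names. Pass 2 prefers the longest known name, so a player called "pebos" cannot shadow "pebos train".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PLAYERS 10
#define MAX_NAME 64

typedef struct {
    char names[MAX_PLAYERS][MAX_NAME];
    size_t count;
} PlayerTable;

/* Pass 1: from "Seat N: NAME ($X in chips)", record NAME once.
 * The name runs from after the first ": " to the LAST " ($". */
static void collect_seat_name(PlayerTable *t, const char *line)
{
    const char *start, *end = NULL, *p;
    size_t i, len;

    assert(t && line);
    if (strncmp(line, "Seat ", 5) != 0 || t->count >= MAX_PLAYERS)
        return;
    start = strstr(line, ": ");
    if (start == NULL)
        return;
    start += 2;

    for (p = strstr(start, " ($"); p != NULL; p = strstr(p + 1, " ($"))
        end = p;
    if (end == NULL || end == start || strstr(end, " in chips)") == NULL)
        return;                 /* not a chip-count line */

    len = (size_t)(end - start);
    if (len >= MAX_NAME)
        return;
    for (i = 0; i < t->count; i++)
        if (strlen(t->names[i]) == len && strncmp(t->names[i], start, len) == 0)
            return;             /* already known */

    memcpy(t->names[t->count], start, len);
    t->names[t->count][len] = '\0';
    t->count++;
}

/* Pass 2: return the longest known name that begins `text` and is followed
 * by ':' or ' ', or NULL if none matches. */
static const char *match_known_name(const PlayerTable *t, const char *text)
{
    const char *best = NULL;
    size_t i, len, best_len = 0;

    for (i = 0; i < t->count; i++) {
        len = strlen(t->names[i]);
        if (len > best_len && strncmp(text, t->names[i], len) == 0 &&
            (text[len] == ':' || text[len] == ' ')) {
            best = t->names[i];
            best_len = len;
        }
    }
    return best;
}
```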

ChOas
Jun 29, 2009, 03:42 AM
As I'm normally a Perl programmer, I would probably use a regular expression (I'm pretty sure ObjC/Cocoa supports this, right?)

Now, I don't know much about your input, but if I assume you have one file per match (and I read your input correctly), and you want to know all the users and the seat each one is on, then I would use this regexp to walk over the file once and collect all the users and their seats:

I'm assuming this is an 'introduction' line which describes the players:

Seat 2: pebos train ($6.59 in chips)

So for every line:

Try to match:

Seat
followed by whitespace
followed by an integer (and save the integer)
followed by colon and whitespace
Any text (the username, certainly save this)
followed by ' ($'
followed by an amount (might as well save this)
followed by 'in chips)'

In Perl (I can't test this right now, but it should work):

if (/Seat\s+(\d+):\s+(.*?)\s+\(\$([0-9.]+)\s+in chips\)$/) {
    my $seat = $1;
    my $username = $2;
    my $person_cash = $3;
    _savePersonToStructSomewhere($seat, $username, $person_cash);
}


There might be some material here on regexps for NSString: http://www.stiefels.net/2007/01/24/regular-expressions-for-nsstring/

If you have a lot of sample data and can let me know what you need exactly I might be able to figure this out in Obj-C for you tonight.
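For comparison, the same pattern can be run from C with the POSIX `<regex.h>` API (`regcomp`/`regexec`), which ships with Mac OS X and is callable from Objective-C. This is a sketch that assumes each line has had its trailing newline removed. `SeatInfo` and `parse_seat_line` are hypothetical names. POSIX extended regexes have no lazy `.*?`, but because the pattern is anchored with `$`, the greedy `(.*)` still ends at the last " ($... in chips)" on the line.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int seat;
    char name[64];
    double cash;
} SeatInfo;

/* POSIX ERE translation of the Perl pattern above. */
static const char *SEAT_PATTERN =
    "^Seat[[:space:]]+([0-9]+):[[:space:]]+(.*)[[:space:]]+"
    "\\(\\$([0-9.]+)[[:space:]]+in chips\\)$";

/* Returns 1 and fills `out` if `line` is a seat line, else 0.
 * The pattern is compiled on every call only to keep the sketch short. */
static int parse_seat_line(const char *line, SeatInfo *out)
{
    regex_t re;
    regmatch_t m[4];
    size_t name_len;
    int ok;

    assert(line && out);
    if (regcomp(&re, SEAT_PATTERN, REG_EXTENDED) != 0)
        return 0;
    ok = regexec(&re, line, 4, m, 0) == 0;
    regfree(&re);
    if (!ok)
        return 0;

    name_len = (size_t)(m[2].rm_eo - m[2].rm_so);
    if (name_len >= sizeof out->name)
        return 0;
    memcpy(out->name, line + m[2].rm_so, name_len);
    out->name[name_len] = '\0';
    out->seat = atoi(line + m[1].rm_so);          /* stops at the ':' */
    out->cash = strtod(line + m[3].rm_so, NULL);  /* stops at the space */
    return 1;
}
```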

ChOas
Jun 29, 2009, 04:10 AM
NSScanner looks pretty promising too:

http://developer.apple.com/DOCUMENTATION/Cocoa/Conceptual/Strings/Articles/Scanners.html#//apple_ref/doc/uid/20000147

cqexbesd
Jun 29, 2009, 06:52 AM
dj.mooky wrote:
> As of right now, I am breaking the text file down line by line, and then analyzing the prefix attached to it to determine where to send the

Use a regular expression. There are other ways but nothing so powerful or elegant. When the format of the file changes you will be glad you went with this method.

I haven't checked, but I think PCRE comes with OS X. I'm sure there is an Obj-C wrapper, though I don't know if that comes by default.

HTH,

Andrew

ChOas
Jun 29, 2009, 11:41 AM
Okay, I'm still pretty new to Obj-C... but how would something like this work for you?


#import <Foundation/Foundation.h>

int main (int argc, const char * argv[]) {
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];

    NSArray *testFile = [NSArray arrayWithObjects: @"Seat 2: pebos train ($6.59 in chips)\n",
                         @"Seat 5: ChOas ($15.0 in chips)\n",
                         @"something bad about iphonejudy\n",
                         @"Seat 7: Bastard ($5.5 in chips) player ($15.0 in chips)\n",
                         @"Seat 4: Easier in Perl ($15.0 in chips)\n", nil];

    NSInteger userSeat;
    NSString *userName = nil;
    float userCash;

    for (NSString *curLine in testFile) {
        NSScanner *myScanner = [NSScanner scannerWithString:curLine];

        // NSScanner skips whitespace before each token by default,
        // so the spaces between tokens don't need to be scanned explicitly
        if ([myScanner scanString:@"Seat" intoString:NULL] &&
            [myScanner scanInteger:&userSeat] &&
            [myScanner scanString:@":" intoString:NULL] &&
            [myScanner scanUpToString:@" ($" intoString:&userName] &&
            [myScanner scanString:@"($" intoString:NULL] &&
            [myScanner scanFloat:&userCash] &&
            [myScanner scanString:@"in chips)" intoString:NULL]) {

            // NSInteger needs %ld plus a cast (not %d) to be correct on 64-bit builds
            NSLog(@"Found: \nSeat: \"%ld\"\nUser:\"%@\"\nCash:\"%0.2f\"\n",
                  (long)userSeat, userName, userCash);
        }
    }
    NSLog(@"Done");
    [pool drain];
    return 0;
}


Maybe sscanf would have been possible too, but I'm not sure...

Oh, seat 7 messes stuff up... dunno how to anchor the searching to the back so I hope they sanitise the player names :)
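One way to anchor on the back, sketched in C with `sscanf`: find the LAST " ($" in the line, treat everything between "Seat N: " and it as the name, and let `sscanf` confirm that the tail is exactly " ($<amount> in chips)". `parse_seat_sscanf` is a hypothetical helper, and it assumes the chip count is always the last thing on a seat line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses "Seat N: NAME ($X in chips)", taking the name
 * up to the LAST " ($" so a name that contains " ($" stays whole.
 * Returns 1 on success, 0 otherwise. */
static int parse_seat_sscanf(const char *line, int *seat, char *name,
                             size_t name_size, double *cash)
{
    const char *tail = NULL, *p;
    int name_start = 0, consumed = 0;
    size_t len;

    assert(line && seat && name && cash && name_size > 0);

    /* %n is only stored if "Seat %d: " matched in full. */
    if (sscanf(line, "Seat %d: %n", seat, &name_start) != 1 || name_start == 0)
        return 0;

    for (p = strstr(line, " ($"); p != NULL; p = strstr(p + 1, " ($"))
        tail = p;
    if (tail == NULL || tail <= line + name_start)
        return 0;

    /* The tail must be exactly " ($<amount> in chips)" and end the line. */
    if (sscanf(tail, " ($%lf in chips)%n", cash, &consumed) != 1 ||
        consumed == 0 || tail[consumed] != '\0')
        return 0;

    len = (size_t)(tail - (line + name_start));
    if (len >= name_size)
        return 0;
    memcpy(name, line + name_start, len);
    name[len] = '\0';
    return 1;
}
```

With this, the seat 7 line yields the name "Bastard ($5.5 in chips) player" and $15.00 rather than stopping at the first "($".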

dj.mooky
Jun 29, 2009, 11:46 AM
cqexbesd wrote:
> Use a regular expression. There are other ways but nothing so powerful or elegant. When the format of the file changes you will be glad you went with this method.
>
> I haven't checked but I think PCRE comes with OSX - I'm sure there will be an Obj-C wrapper though I don't know if that will come by default.
>
> HTH,
>
> Andrew

I looked over the link from ChOas and think I am going to go the regex route, mainly because what cqexbesd said rang so true. I think overall it will make searching easier when things change, or when I decide to add compatibility for other poker sites...

Now if only I could fix that strange math error >.> When you have only 6 people total, 3 people win $30, and no one lost more than $4, something is up for sure :)

Thanks for all the help, I think I have plenty to get me going down the right path now

mysterytramp
Jun 29, 2009, 05:52 PM
dj.mooky wrote:
> In a nutshell, players are allowed to have names with spaces in them, and there is no set way in the text files generated from pokerstars to discern where EXACTLY they begin and or end.

1) You know how to find dollar amounts
2) You know all the words in pokerstars' "dictionary"

Names would be what's left, right?

mt
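The elimination idea, sketched in C: strip an optional "Seat N: " prefix, then check the end of the line against a list of fixed phrases; whatever is left is the name. The phrase list here is a small hypothetical subset of PokerStars wording, and `name_by_elimination` is a hypothetical helper. Lines with variable text after the name (such as raise amounts) would need patterns rather than fixed strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of fixed phrases that follow a player name.
 * If one phrase could be a suffix of another, list the longer one first. */
static const char *const KNOWN_SUFFIXES[] = {
    " folded before Flop",
    ": checks",
    ": folds",
};

/* If `text` ends with a known phrase, copy what precedes it (the name) into
 * `out` and return 1; otherwise return 0. */
static int name_by_elimination(const char *text, char *out, size_t out_size)
{
    size_t i, text_len, suffix_len, name_len;
    int skip = 0;

    assert(text && out && out_size > 0);

    /* Drop an optional "Seat N: " prefix; %n is only stored if it matched. */
    (void)sscanf(text, "Seat %*d: %n", &skip);
    text += skip;
    text_len = strlen(text);

    for (i = 0; i < sizeof KNOWN_SUFFIXES / sizeof KNOWN_SUFFIXES[0]; i++) {
        suffix_len = strlen(KNOWN_SUFFIXES[i]);
        if (suffix_len >= text_len)
            continue;           /* no room left for a name */
        if (strcmp(text + text_len - suffix_len, KNOWN_SUFFIXES[i]) != 0)
            continue;

        name_len = text_len - suffix_len;
        if (name_len >= out_size)
            return 0;
        memcpy(out, text, name_len);
        out[name_len] = '\0';
        return 1;
    }
    return 0;
}
```

Matching only at the end of the line is what keeps "pebos folds a lot: folds" from being cut at the first "folds".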

dj.mooky
Jun 30, 2009, 12:30 PM
I managed to get it working (100%, as far as the hand history files I can see go) with a combination of regexes, and NSScanner to pull the info out of the strings.

Much thanks for all the help from everyone.

Currently seeking 2-3 beta testers if anyone is interested >.>

The app is still buggy as all get out, but I really want to hammer down the cash tracking before I progress much further.

dj.mooky
Jun 30, 2009, 12:47 PM
Oh, and the solution was something like this:



NSPredicate *regextest;

// Escape the literal "(", "$" and ")" so the ICU regex really requires " ($6.59 in chips)"
NSString *seatedPlayer = @"Seat \\d+: .* \\(\\$[0-9.]+ in chips\\).*";

NSCharacterSet *colonSet = [NSCharacterSet characterSetWithCharactersInString:@":"];

NSString *thisGuysName = nil;

// lineByLine (the file's lines) and line (the current index) come from the surrounding code
NSString *currentLine = [lineByLine objectAtIndex:line];
NSScanner *theScanner = [NSScanner scannerWithString:currentLine];

NSLog(@"%@", currentLine);

regextest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", seatedPlayer];
if ([regextest evaluateWithObject:currentLine]) {
    // Skip "Seat N:", then take everything up to "($" as the player's name
    if ([theScanner scanUpToCharactersFromSet:colonSet intoString:NULL] &&
        [theScanner scanCharactersFromSet:colonSet intoString:NULL] &&
        [theScanner scanUpToString:@"($" intoString:&thisGuysName]) {
        // Trim any whitespace left between the name and "($"
        thisGuysName = [thisGuysName stringByTrimmingCharactersInSet:
                           [NSCharacterSet whitespaceCharacterSet]];
        NSLog(@"Seated player: %@", thisGuysName);
    } else {
        NSLog(@"Seat line matched, but the scanner could not extract a name");
    }
}




I used a similar approach to glean the rest of the info I wanted from other lines.