
mikezang

macrumors 6502a
Original poster
May 22, 2010
854
7
Tokyo, Japan
I use the code below to read a text file in which every line ends with "0x0D 0x0A" (CR LF). I have to remove the empty lines afterwards, because newlineCharacterSet treats each of the two characters as a separate separator. Do you have any idea for a simpler way to read the text file?
Code:
	NSString *contents = [NSString stringWithContentsOfFile:filePath encoding:NSShiftJISStringEncoding error:nil];
	list = [NSMutableArray arrayWithArray:[contents componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]]];
	int count = [list count];
	
	for (int i = count - 1; i >= 0; i--) {
		if ([[list objectAtIndex:i] length] == 0) {
			[list removeObjectAtIndex:i];
		}
	}
 

RonC

macrumors regular
Oct 18, 2007
108
0
Chicago-area
Well, there's always this:
Code:
NSArray *list = [contents componentsSeparatedByString:@"\r\n"];
But it still seems like we're stuck with the problem of empty lines. For example, consider a file with this content:
Code:
abc\r\n
\r\n
def\r\n
\r\n
The way I read the original request, the result should be a 2-element array with content "abc","def."

The loop that trolls the array and deletes the empty lines is still useful, but it's kinda ugly. Perhaps a better approach would be to collapse consecutive occurrences of the string "\r\n" into a single one before splitting (this code only handles Windows-style line separators, not every line separator character), with something like:
Code:
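// Match one or more consecutive CR LF pairs.
NSRegularExpression *newlines = [NSRegularExpression regularExpressionWithPattern: @"(\\r\\n)+"
                                                                          options: 0
                                                                            error: NULL];
// The template holds real CR LF characters; a backslash in a template
// only escapes the next character, so @"\\r\\n" would insert "rn".
list = [NSMutableArray arrayWithArray:
            [[newlines 
                 stringByReplacingMatchesInString: contents 
                 options: 0
                 range: NSMakeRange(0, [contents length])
                 withTemplate: @"\r\n"
             ] componentsSeparatedByString:@"\r\n"
            ]];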
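Now list doesn't contain any empty lines in the middle, though a "\r\n" at the very start or end of the file still leaves one empty string there. Some of those NSStrings probably need to be NSMutableStrings, and I haven't tried this code, so YMMV.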
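
Another way to drop the empty strings without a hand-written loop is to filter the split result with a predicate. This is only a sketch: the helper name NonEmptyLines is made up for illustration and does not appear in the thread, and it assumes that lines which are genuinely empty in the file should be dropped too.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): splits on every newline
// character, then removes the empty strings that "\r\n" pairs produce.
static NSArray *NonEmptyLines(NSString *contents) {
    NSArray *parts = [contents componentsSeparatedByCharactersInSet:
                         [NSCharacterSet newlineCharacterSet]];
    // Keep only the strings whose length is greater than zero.
    NSPredicate *notEmpty = [NSPredicate predicateWithFormat:@"length > 0"];
    return [parts filteredArrayUsingPredicate:notEmpty];
}
```

Called on the sample content above ("abc\r\n\r\ndef\r\n\r\n"), this returns the 2-element array "abc", "def". Because it splits on the character set, it also copes with files that use a bare "\n" or "\r" as line ending.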
 

mikezang

macrumors 6502a
Original poster
May 22, 2010
854
7
Tokyo, Japan
This way is better for my case, as there are no empty lines in my file.
Hi, I am sorry, but I have to report a problem.

This way doesn't give me what I need: the string is not separated, and there is only one line in my array!

Do you have any other suggestions?
 

chown33

Moderator
Staff member
Aug 9, 2009
10,740
8,416
A sea of green
Post your current code, even if it's not working.

Post some sample data, between 5 and 10 lines.
 

chown33

Moderator
Staff member
Aug 9, 2009
10,740
8,416
A sea of green
Here is my data file.

Where is your current code?

Your uncompressed data file is 220 KB. How often did you plan to parse it?

If you have a dataset that is only updated infrequently, and mostly consists of a few additions, then a better overall plan is usually to parse the big dataset once. Keep it in a format that is more easily read, such as plist, then only apply incremental additions and removals.

The overall strategy is to avoid repeating actions that don't need to be repeated. Do it once, then reuse it.
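
As a rough sketch of that strategy: parse the text file once, save the resulting array as a plist, and read the plist on later runs. The helper name CachedLines and both path parameters are assumptions made for illustration, not anything from the posted code, and the sketch does not handle incremental additions or removals.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the lines of the Shift JIS file at textPath,
// using the plist at plistPath as a cache of the already-split array.
static NSArray *CachedLines(NSString *textPath, NSString *plistPath) {
    // Fast path: a previously saved plist loads without any splitting.
    NSArray *cached = [NSArray arrayWithContentsOfFile:plistPath];
    if (cached != nil) {
        return cached;
    }

    // Slow path: read and split the text file, then save the result.
    NSString *contents = [NSString stringWithContentsOfFile:textPath
                                                   encoding:NSShiftJISStringEncoding
                                                      error:NULL];
    if (contents == nil) {
        return nil;  // file missing or not decodable as Shift JIS
    }
    NSArray *lines = [contents componentsSeparatedByString:@"\r\n"];
    [lines writeToFile:plistPath atomically:YES];
    return lines;
}
```

The cached plist goes stale when the source file changes, so it would have to be deleted or rewritten whenever a new copy of the data file arrives.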
 

mikezang

macrumors 6502a
Original poster
May 22, 2010
854
7
Tokyo, Japan
I will post it tonight, as I am not at home right now.

This file is downloaded from the Internet, and I have to use it every day.
 

lloyddean

macrumors 65816
May 10, 2009
1,047
19
Des Moines, WA
So if I understand correctly, you wish to process the contents of a Shift JIS encoded text file whose line endings currently consist of both CARRIAGE-RETURN and LINE-FEED, and change the line endings to either CARRIAGE-RETURN or LINE-FEED.

Is this correct?
 

chown33

Moderator
Staff member
Aug 9, 2009
10,740
8,416
A sea of green
Works fine for me on Mac OS with a Foundation Tool.

The data file was exactly as supplied.

Code was a simple extrapolation from code fragments posted so far.

I did not code the for:in loop, because it's not needed to demonstrate that the NSArray contains more than one line from the split file.

Output:
Code:
2010-08-18 17:10:25.820 a.out[16954] length: 195780
2010-08-18 17:10:25.821 a.out[16954] count: 4329

For comparison, here's the output from the 'wc' command, which performs "word counts":
Code:
wc index.txt
    4328    4504  224788 index.txt

The three numbers are: line count, word count, byte count.


Code:
#import <Foundation/Foundation.h>

int main (int argc, const char * argv[]) 
{
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
	
	NSString * filePath = @"index.txt";

	NSString * contents = [NSString stringWithContentsOfFile:filePath
			encoding:NSShiftJISStringEncoding error:nil];

	NSArray * list = [contents componentsSeparatedByString:@"\r\n"];

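	NSUInteger count = [list count];

	// NSUInteger is cast to unsigned long so the format matches on 32-bit and 64-bit builds.
	NSLog( @"length: %lu", (unsigned long)[contents length] );  // length of text, measured in unichars (UTF-16 code units)
	NSLog( @"count: %lu", (unsigned long)count );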

    [pool drain];
    return 0;
}
 

mikezang

macrumors 6502a
Original poster
May 22, 2010
854
7
Tokyo, Japan
I used the same code as you, though not in main; my code is inside a view controller. But I got 4329*2 lines, with every line followed by a blank line, which is why I thought it was strange.
Anyway, I will try your code tonight.
 

chown33

Moderator
Staff member
Aug 9, 2009
10,740
8,416
A sea of green
Your original code used an NSCharacterSet. That means it has the following behavior, taken from the reference doc for NSString (underline added):
The substrings in the array appear in the order they did in the receiver. Adjacent occurrences of the separator characters produce empty strings in the result. ...
http://developer.apple.com/iphone/l...SString/componentsSeparatedByCharactersInSet:


Your later post said you only got one line in the array.
That's the code we're interested in seeing.
We already know the NSCharacterSet code doesn't work.
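
For what it's worth, the documented behavior is easy to check on a tiny sample. The two helper names below are invented for this illustration and are not from the posted code.

```objc
#import <Foundation/Foundation.h>

// Illustrative helper: number of pieces when splitting on any newline character.
static NSUInteger CountBySet(NSString *s) {
    return [[s componentsSeparatedByCharactersInSet:
                [NSCharacterSet newlineCharacterSet]] count];
}

// Illustrative helper: number of pieces when splitting on the 2-character string CR LF.
static NSUInteger CountByString(NSString *s) {
    return [[s componentsSeparatedByString:@"\r\n"] count];
}
```

For the sample @"abc\r\ndef\r\n", CountBySet returns 5 ("abc", "", "def", "", "") because "\r" and "\n" each act as a separator, while CountByString returns 3 ("abc", "def", ""). The final empty string comes from the separator at the end of the text, which is consistent with the count of 4329 against the 4328 lines reported by wc.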
 

mikezang

macrumors 6502a
Original poster
May 22, 2010
854
7
Tokyo, Japan
In my test, using ...ByCharactersInSet gives twice as many lines, while using ...ByString gives only one line.
 

mikezang

macrumors 6502a
Original poster
May 22, 2010
854
7
Tokyo, Japan
Check my debug status in the picture. By the way, is it OK that there is no release in this code?
Code:
-(BOOL) application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {    
    // Override point for customization after application launch.
	NSBundle *bundle = [NSBundle mainBundle];
	NSString *filePath = [bundle pathForResource:@"index" ofType:@"txt"];
	NSString *contents = [NSString stringWithContentsOfFile:filePath encoding:NSShiftJISStringEncoding error:nil];
	NSArray *list = [contents componentsSeparatedByString:@"¥r¥n"];
//	NSArray *list = [contents componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
	int i = 0;	

	if (list) {
		NSMutableArray *stock = [[NSMutableArray alloc] init];
	
		for (NSString *item in list) {
			if ([item length] != 0 && i != 0) {
				NSRange obs1 = [item rangeOfString:@"OBS"];
				NSRange obs2 = [item rangeOfString:@"投信"];
				NSRange obs3 = [item rangeOfString:@"上場"];
				NSRange obs4 = [item rangeOfString:@"インデックス"];
				if (obs1.location == NSNotFound && obs2.location == NSNotFound && obs3.location == NSNotFound && obs4.location == NSNotFound) {
					[stock addObject:item];
				}
			}
			
			i++;
		}
		
		stockSplit = stock;
	}
    ...
}
 

Attachments

  • SnapShot 2010-08-19 at 20.05.25.jpg
    232.3 KB · Views: 91

mikezang

macrumors 6502a
Original poster
May 22, 2010
854
7
Tokyo, Japan
Shouldn't that be \r\n?
My OS is Japanese; '¥' in Japanese is the same as the backslash in English, and I can't input '\' from the keyboard ...

Thanks for confirming, you are right!

I copied the '\' you typed, and now I get what I need!

I think this is a bug from Apple, because in all other programming languages we use '¥' instead of '\'.

I think I have to report this bug to Apple. What do you think about this issue?
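
A likely explanation, assuming the source file is saved in a Unicode encoding such as UTF-8: the '¥' typed on the Japanese keyboard is then stored as U+00A5 YEN SIGN, which is a different character from the backslash (U+005C), so the compiler sees no escape sequence at all. The sketch below builds the same 4-character string explicitly; the function name YenStyleSeparator is made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds the text a compiler sees for @"¥r¥n" when
// '¥' is U+00A5 YEN SIGN, i.e. four ordinary characters: ¥ r ¥ n.
static NSString *YenStyleSeparator(void) {
    return [NSString stringWithFormat:@"%Cr%Cn",
               (unichar)0x00A5, (unichar)0x00A5];
}
```

[YenStyleSeparator() length] is 4, whereas [@"\r\n" length] is 2. Splitting @"abc\r\ndef" with the yen-sign version finds no match and returns the whole text as a single element, which matches the "only one line in my array" result, while splitting with @"\r\n" returns two elements.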
 