How to delete CRLF at text file line




mikezang
Aug 17, 2010, 10:36 AM
I use the code below to read a text file in which every line ends with "0x0A0x0D". I have to remove the empty lines afterwards, because newlineCharacterSet treats each of the two characters in "0x0A0x0D" as a separate line break. Do you have any idea for a simpler way to read the text file?

NSString *contents = [NSString stringWithContentsOfFile:filePath encoding:NSShiftJISStringEncoding error:nil];
list = [NSMutableArray arrayWithArray:[contents componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]]];
int count = [list count];

// Walk backwards so that removals do not shift the indexes still to be visited.
for (int i = count - 1; i >= 0; i--) {
    if ([[list objectAtIndex:i] length] == 0) {
        [list removeObjectAtIndex:i];
    }
}



jnic
Aug 17, 2010, 10:46 AM
http://developer.apple.com/mac/library/documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html#//apple_ref/occ/instm/NSString/stringByReplacingOccurrencesOfString:withString:options:range:

RonC
Aug 17, 2010, 01:52 PM
Well, there's always this:
NSArray *list = [contents componentsSeparatedByString:@"\r\n"];

But it still seems like we're stuck with the problem of empty lines. For example, consider a file with this content:
abc\r\n
\r\n
def\r\n
\r\n
The way I read the original request, the result should be a 2-element array with the contents "abc", "def".

The loop that trolls the array and deletes the empty lines is still useful, but it's kinda ugly. Perhaps a better approach would be to first collapse consecutive occurrences of the string "\r\n" into a single one (or do the same for any line separator character; this code only handles Windows-style line separators), perhaps with something like:

NSRegularExpression *newlines = [NSRegularExpression regularExpressionWithPattern:@"(\\r\\n)+"
                                                                          options:0
                                                                            error:NULL];
// The template holds real CR and LF characters: in a template, a backslash
// only makes the next character literal, so @"\\r\\n" would insert "rn".
list = [NSMutableArray arrayWithArray:
    [[newlines stringByReplacingMatchesInString:contents
                                        options:0
                                          range:NSMakeRange(0, [contents length])
                                   withTemplate:@"\r\n"]
        componentsSeparatedByString:@"\r\n"]];
Now list doesn't contain any empty lines in the middle, though a leading or trailing "\r\n" in the text still produces an empty first or last element. Some of those NSStrings probably need to be NSMutableStrings, and I haven't tried this code, so YMMV.
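
For comparison, here is a minimal sketch of dropping the empty lines without a regular expression or a manual loop. NonEmptyLines is a hypothetical helper name, not something from the posts above, and the sketch assumes the text uses "\r\n" line endings throughout.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): split on CRLF, then drop the
// empty strings left behind by blank lines and by a trailing "\r\n".
static NSArray *NonEmptyLines(NSString *contents) {
    NSArray *lines = [contents componentsSeparatedByString:@"\r\n"];
    NSPredicate *notEmpty = [NSPredicate predicateWithFormat:@"SELF.length > 0"];
    return [lines filteredArrayUsingPredicate:notEmpty];
}
```

Called on the sample content above (abc, blank line, def, blank line), this should give the 2-element array "abc", "def". The result is immutable, so wrap it in an NSMutableArray if the list needs to change later.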

mikezang
Aug 17, 2010, 11:52 PM
Thanks for your URL, though I had read it before your reply.

I know a lot of ways to do this in C on Windows, but I am new to Objective-C on the Mac, so I need detailed samples.

Anyway, thanks for your reply.

mikezang
Aug 18, 2010, 12:00 AM
Splitting with componentsSeparatedByString:@"\r\n", as RonC suggested, is better for my case, as there are no empty lines in my file.

mikezang
Aug 18, 2010, 03:36 AM
Sorry, I have to correct my earlier report.

That approach does not give me what I need: the string is not split at all, and there is only one line in my array!

Do you have any other suggestion?

chown33
Aug 18, 2010, 11:36 AM
Post your current code, even if it's not working.

Post some sample data, between 5 and 10 lines.

dejo
Aug 18, 2010, 01:08 PM
...this file is end with "0x0A0x0D" of all lines.

...there is only one line in my array!

I'm confused.

So, the file contains multiple lines but componentsSeparatedByString: only returns a single element in the array, is that right?

mikezang
Aug 18, 2010, 06:20 PM
Yes, that is exactly what happens.

mikezang
Aug 18, 2010, 06:25 PM
Here is my data file.

chown33
Aug 18, 2010, 06:30 PM
Where is your current code?

Your uncompressed data file is 220 KB. How often did you plan to parse it?

If you have a dataset that is only updated infrequently, and mostly consists of a few additions, then a better overall plan is usually to parse the big dataset once. Keep it in a format that is more easily read, such as plist, then only apply incremental additions and removals.

The overall strategy is to avoid repeating actions that don't need to be repeated. Do it once, then reuse it.
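
A minimal sketch of that parse-once idea, assuming the parsed result is simply an array of line strings. LoadLines and both path parameters are hypothetical names chosen for illustration. The plist calls are the NSArray file methods that were current when this thread was written.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: return the cached lines if a plist already exists at
// cachePath; otherwise parse the Shift JIS text file once and cache the result.
static NSArray *LoadLines(NSString *textPath, NSString *cachePath) {
    NSArray *cached = [NSArray arrayWithContentsOfFile:cachePath];
    if (cached != nil) {
        return cached;  // reuse the earlier parse
    }

    NSString *contents = [NSString stringWithContentsOfFile:textPath
                                                   encoding:NSShiftJISStringEncoding
                                                      error:NULL];
    if (contents == nil) {
        return nil;  // file missing, or not valid Shift JIS
    }

    NSArray *lines = [contents componentsSeparatedByString:@"\r\n"];
    // An array of strings is a valid property list, so it can be written directly.
    [lines writeToFile:cachePath atomically:YES];
    return lines;
}
```

The cache has to be deleted or rebuilt whenever the source file changes, so this only pays off if the file is read more often than it is replaced.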

lloyddean
Aug 18, 2010, 06:36 PM
What is the character encoding of the file? It doesn't appear to be plain ASCII text.

mikezang
Aug 18, 2010, 06:42 PM
Shift JIS.

mikezang
Aug 18, 2010, 06:44 PM
I will post it tonight, as I am not at home right now.

The file is downloaded from the Internet, and I have to use it every day.

lloyddean
Aug 18, 2010, 06:54 PM
So if I understand correctly, you wish to process the contents of a Shift JIS encoded text file whose line endings currently consist of both CARRIAGE-RETURN and LINE-FEED, and change each line ending to either CARRIAGE-RETURN or LINE-FEED.

Is this correct?

chown33
Aug 18, 2010, 07:23 PM
Works fine for me on Mac OS with a Foundation Tool.

The data file was exactly as supplied.

Code was a simple extrapolation from code fragments posted so far.

I did not code the for:in loop, because it's not needed to demonstrate that the NSArray contains more than one line from the split file.

Output:
2010-08-18 17:10:25.820 a.out[16954] length: 195780
2010-08-18 17:10:25.821 a.out[16954] count: 4329


For comparison, here's the output from the 'wc' command, which performs "word counts":
wc index.txt
4328 4504 224788 index.txt


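The three numbers are: line count, word count, byte count. The array has one more element than wc's line count because componentsSeparatedByString: always returns one more component than the number of separators it finds.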


#import <Foundation/Foundation.h>

int main (int argc, const char * argv[])
{
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];

    NSString * filePath = @"index.txt";

    NSString * contents = [NSString stringWithContentsOfFile:filePath
                                    encoding:NSShiftJISStringEncoding error:nil];

    NSArray * list = [contents componentsSeparatedByString:@"\r\n"];

    NSUInteger count = [list count];

    // NSUInteger is wider than int on 64-bit, so cast and print with %lu.
    NSLog( @"length: %lu", (unsigned long)[contents length] ); // length of text, measured in unichars (UTF-16 code units)
    NSLog( @"count: %lu", (unsigned long)count );

    [pool drain];
    return 0;
}

mikezang
Aug 18, 2010, 08:54 PM
I used the same code as you, though not in main; my code is inside a view controller. I got 4329*2 lines, with every line followed by a blank line, which is why I found it strange.
Anyway, I will try your code tonight.

chown33
Aug 18, 2010, 09:22 PM
Your original code used an NSCharacterSet. That means it has the following behavior, taken from the reference doc for NSString:
"The substrings in the array appear in the order they did in the receiver. Adjacent occurrences of the separator characters produce empty strings in the result. ..."
Because "\r" and "\n" are both members of the newline character set, each "\r\n" pair counts as two adjacent separators and produces an empty string between them.

http://developer.apple.com/iphone/library/documentation/cocoa/reference/foundation/Classes/NSString_Class/Reference/NSString.html#//apple_ref/occ/instm/NSString/componentsSeparatedByCharactersInSet:


Your later post said you only got one line in the array.
That's the code we're interested in seeing.
We already know the NSCharacterSet code doesn't work.
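
To make the difference concrete, here is a small sketch that counts the elements each method returns for the same text. CompareSplits is a hypothetical name used only for this illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical demo function: reports how many elements each splitting
// method returns for the same CRLF-terminated text.
static void CompareSplits(NSString *text, NSUInteger *bySet, NSUInteger *byString) {
    // Every single "\r" and "\n" is a separator here, so each CRLF pair
    // leaves an empty string between its two characters.
    *bySet = [[text componentsSeparatedByCharactersInSet:
                  [NSCharacterSet newlineCharacterSet]] count];

    // Here the two-character sequence "\r\n" is one separator.
    *byString = [[text componentsSeparatedByString:@"\r\n"] count];
}
```

For the text "abc\r\ndef\r\n" the character-set split should return 5 elements ("abc", "", "def", "", "") and the string split 3 ("abc", "def", ""), which is consistent with the roughly doubled line count reported for the character-set version.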

mikezang
Aug 18, 2010, 09:29 PM
In my test, componentsSeparatedByCharactersInSet: returns twice as many lines, while componentsSeparatedByString: returns only one line.

chown33
Aug 19, 2010, 01:03 AM
Post your code that only gets one line.

mikezang
Aug 19, 2010, 01:09 AM
Well, I will post it in about 4 hours :(

mikezang
Aug 19, 2010, 06:08 AM
Please look at my debugger state in the picture. By the way, is this code OK without a release?

-(BOOL) application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    // Override point for customization after application launch.
    NSBundle *bundle = [NSBundle mainBundle];
    NSString *filePath = [bundle pathForResource:@"index" ofType:@"txt"];
    NSString *contents = [NSString stringWithContentsOfFile:filePath encoding:NSShiftJISStringEncoding error:nil];
    NSArray *list = [contents componentsSeparatedByString:@"¥r¥n"];
    // NSArray *list = [contents componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
    int i = 0;

    if (list) {
        NSMutableArray *stock = [[NSMutableArray alloc] init];

        for (NSString *item in list) {
            // Skip empty lines and the first line (i == 0).
            if ([item length] != 0 && i != 0) {
                NSRange obs1 = [item rangeOfString:@"OBS"];
                NSRange obs2 = [item rangeOfString:@"投信"];
                NSRange obs3 = [item rangeOfString:@"上場"];
                NSRange obs4 = [item rangeOfString:@"インデックス"];
                if (obs1.location == NSNotFound && obs2.location == NSNotFound && obs3.location == NSNotFound && obs4.location == NSNotFound) {
                    [stock addObject:item];
                }
            }

            i++;
        }

        stockSplit = stock;
    }
    ...
}

jnic
Aug 19, 2010, 07:09 AM
Shouldn't that be \r\n?

mikezang
Aug 19, 2010, 07:22 AM
My OS is Japanese. '¥' in Japanese is the same as the backslash in English, and I can't input '\' from the keyboard ...

Thanks for confirming. You are right!

I copied the '\' you typed, and now I get what I need!

I think this is a bug on Apple's side, because in every other programming environment we use '¥' in place of '\'.

I think I should report this bug to Apple. What do you think about this issue?
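
A small sketch of why the split returned a single element, assuming the character typed in the source was the yen sign (U+00A5), which sits where the backslash is on Japanese keyboards. In a Unicode source file that is a different character from the backslash (U+005C), so the compiler sees no escape sequence at all. IsCRLF and YenSeparator are hypothetical names for this illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES only if the separator is the real two-character
// CR LF sequence (U+000D followed by U+000A).
static BOOL IsCRLF(NSString *separator) {
    return [separator length] == 2
        && [separator characterAtIndex:0] == 0x000D
        && [separator characterAtIndex:1] == 0x000A;
}

// Builds what a yen sign typed before 'r' and 'n' produces: four ordinary
// characters (U+00A5, 'r', U+00A5, 'n') and no control characters.
static NSString *YenSeparator(void) {
    return [NSString stringWithFormat:@"%Cr%Cn", (unichar)0x00A5, (unichar)0x00A5];
}
```

Since that four-character string never occurs in the file, componentsSeparatedByString: finds no separator and returns the whole text as one element. In Shift JIS style encodings the byte 0x5C is displayed as a yen sign, which is presumably why the two characters look interchangeable in other environments.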