replacing bytes




MrFusion
Apr 2, 2007, 09:33 AM
Hey,

I have a data file that contains some annoying bytes, e.g. ^M (carriage return) instead of \n.
I need to replace them, so I figured I'd go through the bytes one by one and replace them where necessary.

However, my program doesn't want to do it: it goes to 100% CPU usage. Everything else runs smoothly.

This is in a program that uses Core Data. The filename is stored, and each time the record is fetched the data is read from the file.


for (j=0; j< strlen(bytes); j++) {
    switch ((int)bytes[j]) {
        case 13:
            //tab
            //bytes[j] = 9;
            //newline
            //bytes[j] = 10;
            //space
            bytes[j] = 32;
            break;
        default:
            break;
    }
}


Any hints?



robbieduncan
Apr 2, 2007, 10:25 AM
Are you sure your bytes are \0 terminated? Have you tried logging every time through the loop to check what's going on?

pilotError
Apr 2, 2007, 10:32 AM
What are you trying to replace it with?

The code snippet looks to replace the char 13 with a space.

If you have text files with the ^M, you should be able to run a utility dos2unix which does this for you. Just a suggestion.

Typically you see this when transferring DOS text files to Unix.

The end of line in DOS is Carriage Return (int 13 or ^M) + Line Feed (int 10 or ^J), whereas in Unix it's just a Line Feed.

Remember, you need to write that record to a file if you want to save the changes.
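
As a rough illustration of the CR + LF versus LF point, here is a minimal C sketch that collapses each CR LF pair into a single LF inside a buffer. The function name crlf_to_lf and the choice to leave a lone CR alone are assumptions made for this example, not something from the posts above. The length is passed in explicitly, so the buffer does not need a terminator.

```c
#include <stddef.h>

/* Collapse every CR LF pair in buf[0..len) into a single LF, in place.
   A lone CR (one not followed by LF) is left untouched.
   Returns the new length; bytes past that point are stale. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t in, out = 0;
    for (in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;           /* drop the CR; the LF is copied next pass */
        buf[out++] = buf[in];
    }
    return out;
}
```

The caller keeps using the returned length rather than the original one, and still has to write the result somewhere if the change should be saved.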

CanadaRAM
Apr 2, 2007, 10:37 AM
Or if it is a one-shot requirement just pull the data file into BBEdit or TextWrangler and do a regex search and replace?

MrFusion
Apr 2, 2007, 10:48 AM
pilotError, DOS line endings are indeed the problem. I already have a script to convert the files, but I want to leave the originals untouched.

The program I am making should just display the data and plot it using gnuplot by means of a temporary plot- and datafile.

It's not a one-time deal. There will be a large number of data files over the next few years.

MrFusion
Apr 2, 2007, 11:04 AM
robbieduncan, does \0 termination matter in this case? I just count the number of bytes and evaluate them one by one, so I never process more bytes than there are.

robbieduncan
Apr 2, 2007, 11:41 AM
Yes, because that is what strlen uses to know where the end of the char (byte) array is. It just keeps going through memory until it finds a \0!

Edit to add a link to some documentation (http://www.cplusplus.com/reference/clibrary/cstring/strlen.html).
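
To make the strlen behaviour concrete, here is a small C sketch. naive_strlen is roughly what strlen does internally, and bounded_strlen is a hypothetical helper (both names are made up for this example) that only searches inside a known number of bytes using the standard memchr.

```c
#include <stddef.h>
#include <string.h>

/* Roughly what strlen does: walk forward until a 0 byte shows up.
   There is no bounds check, so it is only safe on terminated data. */
size_t naive_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Bounded alternative: look for a terminator only within the first
   max bytes, and return max if none is found there. */
size_t bounded_strlen(const char *s, size_t max)
{
    const char *end = memchr(s, '\0', max);
    return end ? (size_t)(end - s) : max;
}
```

Note also that putting strlen in the loop condition rescans the buffer on every iteration, which on its own can make a loop over a large file very slow.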

pilotError
Apr 2, 2007, 01:39 PM
I would agree with robbieduncan that strlen is probably running off into who knows where.

It also depends on how you're doing the read. If you're filling a buffer with a raw read (e.g. read() or fread()) and parsing that, there may be no null terminator to find. If you're using fgets(), that may not be the issue, since fgets() terminates what it reads. It's hard to tell from the code snippet you provided, because the loop itself looks correct. I'm not an OS X guy (coredata), so I'm not sure what to expect there.
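
A minimal C sketch of the raw-read case, assuming plain stdio rather than whatever the original program uses: fread returns a byte count and never writes a '\0', so the caller has to keep that count and add the terminator itself. The helper name read_terminated is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read up to max bytes from fp into a freshly allocated buffer and
   terminate it ourselves, because fread never writes a '\0'.
   On success stores the byte count in *out_len and returns the buffer,
   which the caller must free. Returns NULL if allocation fails. */
char *read_terminated(FILE *fp, size_t max, size_t *out_len)
{
    char *buf = malloc(max + 1);        /* +1 leaves room for '\0' */
    if (buf == NULL)
        return NULL;
    size_t n = fread(buf, 1, max, fp);  /* n is the only reliable length */
    buf[n] = '\0';
    *out_len = n;
    return buf;
}
```

If the file can contain 0 bytes of its own, the stored length remains the only trustworthy size; the terminator only helps with string functions.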

MrFusion
Apr 3, 2007, 11:46 AM
Staring me in the eyes:
[data length] instead of strlen(bytes)

[data bytes] doesn't return a null-terminated set of bytes. The docs don't say so, but I have read it elsewhere as well. So your guess was correct, robbieduncan.
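
In plain C terms, the fix amounts to passing the length alongside the pointer instead of asking strlen for it. A minimal sketch, with replace_byte as a made-up name; in the Objective-C code the two arguments would presumably come from [data mutableBytes] and [data length].

```c
#include <stddef.h>

/* Replace every occurrence of `from` with `to` in buf[0..len).
   The length is passed in, so no terminator is needed and the loop
   bound is not recomputed on each pass.
   Returns the number of bytes changed. */
size_t replace_byte(char *buf, size_t len, char from, char to)
{
    size_t i, changed = 0;
    for (i = 0; i < len; i++) {
        if (buf[i] == from) {
            buf[i] = to;
            changed++;
        }
    }
    return changed;
}
```

For the original problem the call would look something like replace_byte(bytes, length, 13, 32), which turns each carriage return into a space.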

Ok. So I figure I just add '\0' to my bytes:

more complete code snippet:

NSMutableData *data = [NSMutableData dataWithContentsOfFile:@"file"];
if (!data)
    return;
char *bytes = [data mutableBytes];

// add terminator byte here

int j;
for (j=0; j< strlen(bytes); j++) {
    switch ((int)bytes[j]) {
        //replace some bytes for testing
        case 111:
        case 112:
            bytes[j] = 100;
            break;
        default:
            break;
    }
}
[data replaceBytesInRange:NSMakeRange(0,[data length]) withBytes:bytes];
NSString *newString = [NSString stringWithCString:[data bytes]];



But the code I tried for adding a byte doesn't really work:


strcat(bytes,'\0');
//alternative
[data appendBytes:'\0' length:1];
//if file is read as string instead as data:
[string UTF8String] // but this gives a const char*


Low level C-code is not really my thing. So any hints are more than welcome.

lazydog
Apr 3, 2007, 12:28 PM
strcat() assumes the string you are appending to is already null-terminated!

You need to stick the null in yourself, i.e. bytes[length] = '\0'; Don't forget to allocate an extra byte for the null terminator when you create and initialise your bytes array.

b e n
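
A minimal C sketch of that advice, assuming the raw bytes and their length are already at hand: copy them into a buffer that is one byte longer and write the terminator at index len. The name make_c_string is made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Copy len raw bytes into a new buffer that is one byte longer and
   terminate the copy. Returns NULL if malloc fails; the caller frees
   the result. Any 0 byte inside the input would still cut the string
   short for functions like strlen. */
char *make_c_string(const void *bytes, size_t len)
{
    char *copy = malloc(len + 1);   /* extra byte for the terminator */
    if (copy == NULL)
        return NULL;
    memcpy(copy, bytes, len);
    copy[len] = '\0';
    return copy;
}
```

Writing bytes[len] = '\0' directly into the original buffer is only safe if that buffer was allocated with the extra byte, which is why this sketch copies instead.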