
farmerdoug

Original poster
Sep 16, 2008
fgets is supposed to stop at a newline.
If TextEdit shows lines, and the print statement prints 500 characters including several line breaks, why doesn't fgets stop where it is supposed to?


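Code:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *tmpbuf;
    FILE *ddata;

    tmpbuf = calloc(500, sizeof(char));
    if ((ddata = fopen("data.txt", "r")) == NULL) {
        printf("file not open\n");
        return 1;               /* don't call fgets() on a NULL stream */
    }

    fgets(tmpbuf, 500, ddata);  /* stops at '\n', at EOF, or after 499 chars */
    printf("%s\n", tmpbuf);

    fclose(ddata);
    free(tmpbuf);
    return 0;
}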

printf output:

07/13/11,20.61,49,40.552,63.1,16.86,28.01,13.79
07/12/11,20.75,49.44,41.3,63.22,16.99,28.41,13.58
07/11/11,20.58,48.68,40.84,62.94,16.82,28.24,13.65
07/08/11,19.86,46.99,39.2,59.79,16.44,27.09,13.18
07/07/11,19.58,46.72,38.75,58.49,16.28,26.74,13.1
07/06/11,832.2,20,48.05,39.94,60.28,16.53,27.46
07/05/11,20.08,48.4,40.29,59.97,16.7,27.35,13.59
07/01/11,20.02,48.87,40.5,59.24,16.67,27.64,13.78
06/30/11,20.64,50.42,41.76,61.58,17.14,28.22,14.28
06/29/11,21.04,51.78,42
 
TextEdit recognizes isolated CRs as line endings.
fgets() does not.

You should look at the binary or hex data in its original form, and confirm there are newline ('\n', 0x0A) characters at your expected line endings, and not just isolated CRs ('\r', 0x0D).

The simplest way to look at the hex data for a file is this command line:
Code:
hexdump -C your/file/path/here
If the file is large, you will probably be better off with a tool like Hex Fiend (google it) or some other hex editor.
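If you'd rather check from C, here is a minimal sketch that tallies CR and LF bytes in an already-open stream. The names count_eol_bytes and struct eol_counts are hypothetical, not part of any library. Many CRs and zero LFs means the file uses CR-only line endings.

```c
#include <stdio.h>

/* Hypothetical helper: tally carriage returns (0x0D) and line feeds (0x0A).
   If lf is 0 but cr is large, the file has CR-only line endings and
   fgets() will read straight past them. */
struct eol_counts {
    long cr;
    long lf;
};

struct eol_counts count_eol_bytes(FILE *fp)
{
    struct eol_counts n = {0, 0};
    int ch;                      /* int, not char, so EOF is distinguishable */

    rewind(fp);                  /* scan from the start of the stream */
    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\r')
            n.cr++;
        else if (ch == '\n')
            n.lf++;
    }
    return n;
}
```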

In general, if you see something anomalous with how your data is being handled by a well-known library function, you should carefully examine the original input data.
 
Works for me. What that means to me is that the source of the text file might be using a different style of linebreaks than your current platform. The three styles I know of:
CRLF - Windows
LF - Unixes
CR - Mac OS 9

If you're on OS X, fgets is looking for an LF ('\n'). If it never finds one, this result seems reasonable, though the line breaks in the output then become a bit suspicious.
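A small sketch that reproduces the effect. first_fgets_length is a hypothetical helper, and tmpfile() is used only to keep it self-contained. With CR-only endings the first fgets() call returns all three records, because none of them ends in '\n'; with LF endings it returns just the first one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write data to a temporary file, read it back with
   one fgets() call, and return the length of what that call produced.
   fgets() stops only at '\n', at EOF, or when the buffer is full. */
size_t first_fgets_length(const char *data, char *buf, int bufsize)
{
    FILE *fp = tmpfile();
    size_t len = 0;

    if (fp == NULL)
        return 0;
    fputs(data, fp);
    rewind(fp);                  /* switch from writing to reading */
    if (fgets(buf, bufsize, fp) != NULL)
        len = strlen(buf);
    fclose(fp);
    return len;
}
```

For "a,1\rb,2\rc,3\r" the first read is 12 characters (everything), while for the same data with '\n' endings it is 4 ("a,1\n").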

Can you attach the text file you're using?

-Lee
 
I used hexdump and found lots of 0d's in about the right places. Didn't find any 0A's. So I tried the following code.

Code:
char c;

  fscanf(datafile,"%c\n", &c);
    while( c != '\r')
        {
        fscanf(datafile,"%c\n", &c);
        printf("%c\n", c);
        }

It flew right by the '\r'. Prior to the 0d is a 'Q'. I substituted 'Q' for '\r'. Code stopped just like it was supposed to.
 
Please read the man page for fscanf(). Note what it says about white space in the format string.

So the format string you're using doesn't do what you seem to think it does. I'm sorta guessing here, because you haven't really described what you expected to happen, and how that differs from what actually happened.

You should probably look at the fgetc() function, too.
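To illustrate the white-space point, here is a sketch comparing the two approaches on the same data. The function names are hypothetical, and the input is assumed to look like the poster's, with each CR following a non-white-space character.

```c
#include <stdio.h>

/* Hypothetical helper: count the '\r' characters that fscanf("%c\n")
   actually delivers. The '\n' in the format is a white-space directive:
   it skips any amount of white space, '\r' included, so a CR that follows
   another character is consumed and never reaches %c. */
int crs_seen_with_fscanf(FILE *fp)
{
    char c;
    int seen = 0;

    rewind(fp);
    while (fscanf(fp, "%c\n", &c) == 1) {
        if (c == '\r')
            seen++;
    }
    return seen;
}

/* Hypothetical helper: the same count using fgetc(), which returns every
   byte exactly as stored, white space included. */
int crs_seen_with_fgetc(FILE *fp)
{
    int c;                       /* int so that EOF can be detected */
    int seen = 0;

    rewind(fp);
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\r')
            seen++;
    }
    return seen;
}
```

For input "AB\rCD\r", the fscanf() version reports no CRs at all, while the fgetc() version sees both.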


Rules of Thumb:
1. Be specific.
2. Post your code.
3. Describe what you expected to happen.
4. Describe what actually happened.
 
You're a genius. I put \n in my format strings as a matter of habit. Once I took it out, the code read up to the '\r' and stopped. But why would it be able to read to a letter in the middle of the line and not '\r' when the '\n' was there? They are both just an octal number. I guess starting with 0 must mean something special.
thanks.
 
But why would it be able to read to a letter in the middle of the line and not '\r' when the '\n' was there?

I'm not sure what you're asking here.

Post the actual code and the actual data that wasn't working the way you expected. Be sure to:
Describe what you expected to happen.
Describe what actually happened.


They are both just an octal number.
I'm not sure that makes any sense.

A '\r' represents an ASCII character. That particular notation is called backslash-escaped. The fact that it's enclosed in single-quotes makes it a character constant.

The same numeric value can also be represented any number of ways: as a decimal number (13), as a hex number (0x0d), as an octal number (015) or as binary bits. The actual binary bit-pattern is identical in all cases: 00001101.

So "They are both just an octal number" doesn't really make sense. Any value can be represented as "just an octal number", if you format it as one.
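A short sketch of that point: one value, printed in three notations with snprintf(). show_notations is a hypothetical name; the %#o and %#x flags add the leading 0 and 0x prefixes.

```c
#include <stdio.h>

/* Hypothetical helper: format one value as decimal, octal, and hex.
   The bit pattern never changes; only the notation used to print it does.
   Returns the number of characters snprintf() produced. */
int show_notations(int value, char *out, size_t outsize)
{
    /* %d = decimal, %#o = octal with leading 0, %#x = hex with leading 0x */
    return snprintf(out, outsize, "%d %#o %#x",
                    value, (unsigned)value, (unsigned)value);
}
```

On an ASCII system, show_notations('\r', buf, sizeof buf) fills buf with "13 015 0xd".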

I guess starting with 0 must mean something special.
I don't know what you mean by this, either.

If you mean how the C compiler interprets numeric literals, then yes: it means octal:
Code:
int q = 015;
int d = 13;
int x = 0x0d;
int c = '\r';
The values assigned to the variables are identical. That is, they all satisfy this relationship:
Code:
A == B
regardless of which variable names you use for A or B.
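The same example as a compilable function (literals_agree is a hypothetical name). It assumes an ASCII execution character set, as on OS X, since C itself does not fix the numeric value of '\r'.

```c
/* Hypothetical helper: four spellings of the same value. Returns 1 when
   octal, decimal, hex, and the character constant all agree, which holds
   on ASCII systems where '\r' is 13. */
int literals_agree(void)
{
    int q = 015;     /* leading 0: octal */
    int d = 13;      /* decimal */
    int x = 0x0d;    /* leading 0x: hexadecimal */
    int c = '\r';    /* character constant; 13 in ASCII */

    return q == d && d == x && x == c;
}
```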

However, AFAICT, you weren't using a leading-zero form for literal numbers in your posted code, so the relevance escapes me.
 
basically what I was saying is that if my code is
fscanf(file,"%c\n", c);
if (c != x)

it will execute properly if x is, for example, 'Q'; it will not run if x = '\r'.
if the code is
fscanf(file,"%c", c);
if (c != x)
then it will execute if x is 'Q' or '\r'
The octal number for '\r' and '\n' starts with 0, so maybe characters above a certain number are treated differently.

New question (and what I learned earlier came in very handy). Somebody sends me a text file from a Windows machine. It uses '\n' but it's in the wrong format. So I put it into Excel to edit it, and it ends up with the '\n' replaced by '\r'. Do I have to rewrite my C code, or is there a way to change '\r' to '\n'?
 
I would look into isprint() in ctype.h. It returns true (non-zero) for any printable ASCII character, which excludes both '\n' and '\r'. So you could test your variable like this, for example:

Code:
if( isprint(c) )
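Applied to a whole line, the same test can strip CRs, LFs, and other control characters in one pass. A sketch, with keep_printable as a hypothetical helper; dst is assumed to have room for at least strlen(src) + 1 bytes.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy only printable characters from src to dst,
   NUL-terminate dst, and return the number of characters copied.
   CR, LF, tabs, and other control characters are dropped, so
   "05/02/11,-128\r\n" and "05/02/11,-128\n" yield the same text. */
size_t keep_printable(const char *src, char *dst)
{
    size_t n = 0;

    for (; *src != '\0'; src++) {
        if (isprint((unsigned char)*src))   /* cast avoids UB for negative chars */
            dst[n++] = *src;
    }
    dst[n] = '\0';
    return n;
}
```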
 
basically what I was saying is that if my code is
fscanf(file,"%c\n", c);
if (c != x)

it will execute properly if x is, for example, 'Q'; it will not run if x = '\r'.
if the code is
fscanf(file,"%c", c);
if (c != x)
then it will execute if x is 'Q' or '\r'
The octal number for '\r' and '\n' starts with 0, so maybe characters above a certain number are treated differently.
Ignoring the fact that your code is wrong (c is not the same as &c), I think the answer lies in the man page for fscanf():
White space (such as blanks, tabs, or newlines) in the
format string match any amount of white space, including none, in the
input.
If it's not clear, '\r' is white space.

Frankly, that's still just a guess, because you still haven't posted actual code and actual data. In particular, you haven't posted what fscanf() format string might have been used before this code fragment is executed. If that format string ends with white space, then intervening white space (including \r's) will be consumed by that format string.

Context is important. Accuracy is important. You're providing neither.


New question (and what I learned earlier came in very handy). Somebody sends me a text file from a Windows machine. It uses '\n' but it's in the wrong format. So I put it into Excel to edit it, and it ends up with the '\n' replaced by '\r'. Do I have to rewrite my C code, or is there a way to change '\r' to '\n'?
man tr
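For a CR-only file, tr '\r' '\n' <in.txt >out.txt does the conversion, but on a CRLF file it turns each line ending into two LFs (a blank line); tr -d '\r' is the usual choice for CRLF. If the conversion has to happen inside the program, or the file mixes styles, a sketch like the following handles CR, CRLF, and LF in one pass. normalize_eol is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: copy in to out, converting CRLF and lone CR line
   endings to LF. LF-only endings pass through unchanged.
   Returns the number of line endings written. */
long normalize_eol(FILE *in, FILE *out)
{
    int c;
    long lines = 0;

    while ((c = fgetc(in)) != EOF) {
        if (c == '\r') {
            int next = fgetc(in);
            if (next != '\n' && next != EOF)
                ungetc(next, in);     /* lone CR: push back what follows */
            fputc('\n', out);         /* CR or CRLF becomes a single LF */
            lines++;
        } else {
            if (c == '\n')
                lines++;
            fputc(c, out);
        }
    }
    return lines;
}
```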
 
subsonix:

isprint is a good start, but the result of reading one file is
...04/29/11 1175

05/02/11 -128

05/03/11 -174

05/04/11 -735

05/05/11 -356
05/06/11 45
05/09/11 871
05/10/11 1550
and the other has no blank lines.
file2 has the problem.
I can't spend any more time on it today, but if someone wants to look at the files and tell me how to get around the problem without worrying about the file structure, I'd be grateful. fgets deals with file2 with no problems.
thanks
 

Attachments

  • file1.txt (5.8 KB)
  • file2.txt (6.2 KB)
'file1.txt' uses CR line endings.
'file2.txt' uses CR LF line endings.

OK, but why does the printout for file2 change in the middle? If it changed from using CR to CR LF, how might that have happened?

I got around the problem with an extra isprint() in the code. Very helpful function.

thanks.
 
OK, but why does the printout for file2 change in the middle? If it changed from using CR to CR LF, how might that have happened?

Look at your data. It's partly CRLF and partly LF-only.

Try this command-line in Terminal:
Code:
tr '\r' '@' <file2.txt >at.txt
Open at.txt. Every CR ('\r') in the original will be converted to @. Notice where the @'s end.
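The same check can be done in C. A sketch (classify_eol and struct eol_kinds are hypothetical names) that counts each line ending as CRLF, lone CR, or lone LF; nonzero counts in more than one field mean the file has mixed line endings.

```c
#include <stdio.h>

/* Hypothetical helper: classify every line ending in a stream. */
struct eol_kinds {
    long crlf;
    long cr;     /* CR not followed by LF */
    long lf;     /* LF not preceded by CR */
};

struct eol_kinds classify_eol(FILE *fp)
{
    struct eol_kinds k = {0, 0, 0};
    int c, prev = EOF;

    rewind(fp);
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            if (prev == '\r') {
                k.crlf++;
                k.cr--;      /* that CR was counted as lone; reclassify it */
            } else {
                k.lf++;
            }
        } else if (c == '\r') {
            k.cr++;          /* provisional: may turn out to be part of CRLF */
        }
        prev = c;
    }
    return k;
}
```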

If you're asking why the lines might be partly CRLF and partly LF-only, then only you can answer that. You didn't describe exactly how the files were produced, nor exactly what was in any original data you might have obtained, nor where it was obtained.


I said this in my first reply to this thread:
In general, if you see something anomalous with how your data is being handled by a well-known library function, you should carefully examine the original input data.
This time I have underlined and bolded a very important word.
 
"In general, if you see something anomalous with how your data is being handled by a well-known library function, you should carefully examine the original input data."

Unarguably true. That's why, in my day job, experiments are repeated, careful notes are taken, and more than one person looks at everything. Furthermore, I should sometimes be able to answer my own questions; they are obvious.

But this isn't my day job. The data can come in different forms and formats. I don't have complete control, and I am the only one with a clue about how to handle it. That's why the forum is the rest of my "team", to which I am always grateful despite the fact that our communications are sometimes bumpy.

This time around I learned isprint, hexdump and tr, at least.

Thanks again to all.
farmerdoug
 
But this isn't my day job. The data can come in different forms and formats. I don't have complete control, and I am the only one with a clue about how to handle it. That's why the forum is the rest of my "team", to which I am always grateful despite the fact that our communications are sometimes bumpy.

Communications would be a lot less bumpy if you followed up when you're asked to post actual code and actual data files. Lee1210 asked you to post data files in post #3. It took you until post #11 to do it, despite additional requests to do so.

Once the data was posted, it took only a few hours before you had an answer. My guess is it would have taken even less time if you hadn't posted early AM on Saturday.

If you intend to post data, but don't have the time right away, then simply say so: "I will post the data soon, but I don't have time right now".

If you ask a followup question, but haven't posted data yet, then consider that we may not be able to answer it without the data files. If the request for actual data files is repeated, it's a pretty clear indication that the data files will be necessary in order to answer the question.

In all your questions posted here, I have never once seen anyone ask for data (or code) gratuitously. It's always because the only way to answer the question is to see the actual data or the actual code. Without actual data, we have to guess, which is the worst possible way to debug something.

Your descriptions of data and/or code are simply not sufficient, yet you keep posting as if they were.

If this time around, you learned to start a thread by posting actual code and actual data files, I would consider that a more useful outcome than learning about hexdump or tr.

The reason it's more useful is that it reduces the time and effort it takes before we can make suggestions like using hexdump or tr or isprint() or any other technique. It's not only our time that's being saved; it's yours as well.
 
Communication is somewhat bumpy and mostly my fault, for more reasons than just not attaching code and files. I also appreciate your willingness to continue to work with me despite this. However, while I might have had an answer sooner, I am not confident that the lessons learned would have been the same. If my original post had code and files, you would have told me the files were different and how they were different but that in and of itself wouldn't have been enough for me to rewrite my code. I might not have had to rewrite, if I was told about tr at that point. However, I did make several attempts using preprogrammed code to change the line feeds and carriage returns, which failed. Furthermore, as it turned out, there was more wrong with the files than just the difference in end-of-line characters. I might have never found this out doing it your way. I hope we can continue to work together.
doug.
 
If my original post had code and files, you would have told me the files were different and how they were different but that in and of itself wouldn't have been enough for me to rewrite my code.

You are assuming how I would have answered. While some of my replies may be terse, I try to make them more useful than simply telling you what the differences are. If I were going to do that, I might have just posted a suitable command line using 'cmp'.

However, you do have to explain exactly what you want. This is part of the "Be specific" rule of thumb.

If your question basically amounts to "Why doesn't a CR-only line of data stop at the CR when read by fgets()?", then you are asking a "Why" question. The answer to that is a "Because" answer: because fgets() only stops at newlines.

Your original question ended:
why doesn't fgets stop where it is supposed to?

That's a "Why" question. If that's what you're asking, then I would consider the question answered by a "Because" answer.

Your original question also presupposed that fgets() was supposed to stop at CRs. Which is false. Which I pointed out. Which answered the question "why doesn't fgets stop".


What you posted next was completely unclear as to what you were trying to accomplish.


We can't read your mind. We can't tell if you're looking for an alternate solution or simply an explanation for some observed phenomenon.

If you want a different answer, you should ask a different question.

For example, "How do I get fgets() to stop at CRs?" or "How can I convert my data to LFs so fgets() will stop?". Or simply explain what you want to accomplish: "I want it to stop at CRs, and I don't care if the CR is preserved or not." Now you have defined what you want to happen, which fits the rule of thumb, "Describe what you expect to happen".

You really need to read this, and apply it:
http://www.mikeash.com/getting_answers.html

The condensed version of that essay is summarized in the 4 rules of thumb I already posted. I would only amend "Post your code" to be "Post your code and actual data".


I might not have had to rewrite, if I was told about tr at that point. However, I did make several attempts using preprogrammed code to change the line feeds and carriage returns, which failed.
How much of that code did you post? Was that what your "%c\n" attempts were? Because I don't think you ever explained that was what you were trying to do. You simply posted code with no explanation of what goal you were trying to accomplish.

I did mention fgetc() in my first reply. Did you look at it? After looking at it, did you consider writing your own function with parameters like fgets(), but which stopped on CR or LF?

fgetc() would make such a function easy to write, because it only works one char at a time. There's also a related ungetc() function in case you read ahead too much. ungetc() is specifically mentioned in fgetc's man page.
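A minimal sketch of such a function, assuming you want fgets()-like semantics but with CR, LF, and CRLF all ending a line. my_fgets is the hypothetical name used in the macro example; the terminator is stored as '\n' so callers written for fgets() still see one.

```c
#include <stdio.h>

/* Hypothetical fgets() replacement: a line ends at LF, CR, or CRLF.
   Like fgets(), it reads at most size-1 characters, keeps the terminator
   (always stored as '\n'), NUL-terminates buf, and returns NULL if nothing
   could be read. */
char *my_fgets(char *buf, int size, FILE *fp)
{
    int c, n = 0;

    if (size <= 0)
        return NULL;
    while (n < size - 1 && (c = fgetc(fp)) != EOF) {
        if (c == '\r') {
            int next = fgetc(fp);
            if (next != '\n' && next != EOF)
                ungetc(next, fp);   /* lone CR: leave the next char for later */
            c = '\n';               /* CR or CRLF is reported as '\n' */
        }
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    if (n == 0)
        return NULL;
    buf[n] = '\0';
    return buf;
}
```

Because it keeps fgets()'s parameter order and return convention, existing calls can be redirected to it, as long as my_fgets() itself never calls fgets().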

If you defined your own function, it's not necessary to rewrite all the occurrences of fgets() to call your new function. The preprocessor can do that automatically:
Code:
#define fgets(a,b,c)  my_fgets(a,b,c)
With that macro defined, all occurrences of fgets(some,things,here) will be translated to instead call the function my_fgets().

If you had said you were writing your own replacement for fgets(), but didn't want to change all your code that already used fgets(), I would have told you about writing a macro.


Furthermore, as it turned out there was more wrong with the files than just the difference in end of line characters. I might have never found this out doing it your way.
Or you might have. Or one of us might have. Or you might have quickly solved the line-ending problem and moved on to solve the other problems more quickly.

Once again, we can't read your mind, so if you're having other data problems, we can't possibly know that unless you tell us.
 
I have a number of files that consist of dates and numbers.

01/10/11, -34
01/11/11,17
etc.

They are generated in a number of ways: edited xls files converted to txt or csv; made by hand in TextEdit; made in Excel, then expanded daily in TextEdit; etc.

I don't want to worry about hidden characters in the code; I'm looking for something that works regardless of CR, LF, etc.

The following works so far. Comments?
Code:
sauce = fopen("/users/doug/file1.txt", "r");

while ( (c = fgetc(sauce))  != EOF ) 
        { 
        i = 0;
        j = 0; 
        k++;
        while  (c != ',')  // get date
            {
            if(isprint(c))
                special[0][k][j++] = c;
            c = fgetc(sauce);
            }
            special[0][k][j] = '\0';
            j = 0;
            i++;
            c = fgetc(sauce);
        while  (isprint(c) ) //get number
             {
             special[1][k][j++] = c;
             c = fgetc(sauce);
             }
         special[1][k][j] = '\0';
        }
 
1. You haven't shown any type declarations for any of the variables.

2. You haven't shown what any dimensions of 'special' are.

3. It appears to be intolerant of several kinds of data errors:
  • Lines with data but no comma will put it out of sync.
  • Lines with a single comma but no data will put it out of sync.
  • Lines with more bytes between commas than the relevant dimension of 'special' will overflow 'special'.
  • An EOF while getting date (the while(c!=',') loop) will enter an infinite loop.
Any or all of the above could cause crashes, data loss, or who knows what. It's conceivable that any of these errors could occur without being detected. Can't say without knowing what error-checks the other code is doing.

By "out of sync" I mean that the date would be placed into the 'special' slot where the number should go, or vice versa.
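One way to address those four points, sketched under assumptions: each record is date,number on one line; each field fits in FIELD_MAX bytes (an assumed limit); and the poster's special[][][] array is replaced by two arrays. read_line, split_record, and load_records are hypothetical names. Lines are read whole first, so the line-ending style no longer matters, then split at the comma. Malformed lines are counted instead of putting the arrays out of sync, fields are bounds-checked, and EOF cannot cause an infinite loop.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define FIELD_MAX 32          /* assumed maximum field length, including NUL */

/* Read one line ending in LF, CR, or CRLF into buf (at most size-1 bytes,
   terminator dropped, non-printable bytes discarded, excess bytes of an
   over-long line skipped). Returns 0 at end of file, 1 otherwise. */
static int read_line(FILE *fp, char *buf, size_t size)
{
    int c;
    size_t n = 0;

    while ((c = fgetc(fp)) != EOF && c != '\n' && c != '\r') {
        if (isprint(c) && n < size - 1)
            buf[n++] = (char)c;
    }
    if (c == '\r') {                    /* swallow the LF of a CRLF pair */
        int next = fgetc(fp);
        if (next != '\n' && next != EOF)
            ungetc(next, fp);
    }
    buf[n] = '\0';
    return !(c == EOF && n == 0);
}

/* Split "date,number" into its two fields. Returns 1 on success, 0 if the
   line has no comma, an empty field, or a field too long for FIELD_MAX. */
static int split_record(const char *line, char *date, char *number)
{
    const char *comma = strchr(line, ',');
    const char *num;
    size_t dlen;

    if (comma == NULL)
        return 0;
    dlen = (size_t)(comma - line);
    num = comma + 1;
    while (*num == ' ')
        num++;                          /* allow "01/10/11, -34" */
    if (dlen == 0 || dlen >= FIELD_MAX || *num == '\0' || strlen(num) >= FIELD_MAX)
        return 0;
    memcpy(date, line, dlen);
    date[dlen] = '\0';
    strcpy(number, num);
    return 1;
}

/* Read up to max records. Returns the number stored; *bad receives the
   number of non-blank lines that could not be parsed. */
int load_records(FILE *fp, char dates[][FIELD_MAX], char numbers[][FIELD_MAX],
                 int max, int *bad)
{
    char line[128];
    int k = 0;

    *bad = 0;
    while (k < max && read_line(fp, line, sizeof line)) {
        if (line[0] == '\0')
            continue;                   /* blank line, e.g. from doubled endings */
        if (split_record(line, dates[k], numbers[k]))
            k++;
        else
            (*bad)++;
    }
    return k;
}
```

The caller decides what to do with a nonzero bad count (report it, stop, or ignore it), which keeps data errors from passing silently.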
 
I don't want to worry about hidden characters in the code; I'm looking for something that works regardless of CR, LF, etc.

Assume that the rest of your concerns are taken care of.
 
LOL. I don't know which of us is more stubborn. :) Let's call a truce for now. The code runs with the two text files it has to deal with. If I get another one that fails, I'll post the complete code and file.
Thanks. Have a good day.
 