calling grep from C

Discussion in 'Mac Programming' started by farmerdoug, Sep 22, 2010.

  1. macrumors 6502a

    Joined:
    Sep 16, 2008
    #1
    Most of the time, calling grep from C fails. When it does, I get my error message "grep failed". The string that calls grep is always right.
    Here's the code.
    Part of the file is below, and the output is below that.
    Help is appreciated.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <math.h>
    
    void parse( char *record, char *delim, char**tarr);
    
    int main (int argc, const char * argv[]) {
    
        char *symbol, newsymbol[5],date[14], *grep, **arr;
        int i, j;
        float price;
        FILE *corncob, *voldata;
        char *buf, olddate[12];
       
        grep = (char*)calloc(150, sizeof(char));  
        buf = (char*)calloc(150, sizeof(char));   
        arr = (char **)calloc(100, sizeof(char*));
    
        symbol = calloc(100,sizeof(char));
    
        if ((voldata = fopen("/users/doug/andy/data/IV/IV_surface_200_1.txt", "r")) == NULL)
            {
            printf("couldn't open file\n");
            return(0);
            }
            
       
        while ( (fscanf(voldata,"%s\n",symbol)) != EOF)
            {   
            i = 0;
            while (symbol[i] != ',')
                {
                newsymbol[i] = symbol[i]; 
                i++;
                }
            newsymbol[i] = '\0';  
            
            while (symbol[++i] != ',');
                i++;
            j = 0;
            while (symbol[i] != ',')
                {
                date[j] = symbol[i]; 
                i++;
                j++;
                }
            date[j] = '\0';        
            if(strcmp(olddate,date) != 0)
             {  
                strcpy(grep, "grep ");
                strcat(grep, date);
                strcat(grep, " /users/doug/andy/data/IV/opens/");
                strcat(grep, newsymbol);
                strcat(grep, ".txt");
                printf("%s\n", grep);
                corncob = popen(grep, "r");
                if ( (fgets(buf,150,corncob) != NULL))
                    {
               
                printf("%s\n",buf);
                parse(buf,",", arr);
                price = atof(arr[2]);  
                printf("%s %s %f\n\n", newsymbol,date, atof(arr[2]));
                }
                else
                {
                printf("grep failed\n");
                }
                pclose(corncob);
            }
            strcpy(olddate, date);
            
             } 
     
    
        return 0;
    
    }
    
    void parse( char *record, char *delim, char**tarr)
    
    {
    	
    	char *p;
    	int i,fld=0, element = 15;
    	if ( (p = (char *) calloc(element, sizeof(char))) == NULL)
    		printf("no memory for p");
        for (i = 0; i < 100; i ++)
    	{
    		if( (tarr[i] = (char*)calloc(element, sizeof(char))) == NULL)
    			printf("no memory allocated for arr\n");
    	}
    	
    	p = strtok(record,delim);
    	
    	while(p)
    	{	
    		strcpy(tarr[fld],p);
    		
    		fld++;
    		p=strtok('\0',delim);	
    	}
    
    		
    }
    
    
    01/02/2008,1600,199.27,200.26,192.55,194.84,38542020,0
    01/03/2008,1600,195.41,197.39,192.69,194.93,30074230,0
    01/04/2008,1600,191.45,193.00,178.89,180.05,51994640,0
    01/07/2008,1600,181.25,183.60,170.23,177.64,74007344,0
    01/08/2008,1600,180.14,182.46,170.80,171.25,54425900,0
    01/09/2008,1600,171.30,179.50,168.30,179.40,64856432,0
    01/10/2008,1600,177.58,181.00,175.41,178.02,52965156,0
    01/11/2008,1600,176.00,177.85,170.00,172.69,44018848,0
    01/14/2008,1600,177.52,179.42,175.17,178.78,39303848,0
    01/15/2008,1600,177.72,179.22,164.66,169.04,83943904,0
    01/16/2008,1600,165.23,169.01,156.70,159.64,79194288,0
    01/17/2008,1600,161.51,164.01,158.42,160.89,63430200,0
    01/18/2008,1600,162.30,165.75,159.61,161.36,61590136,0
    01/22/2008,1600,148.06,159.98,146.00,155.64,86974080,0
    01/23/2008,1600,136.19,140.00,126.14,139.07,120466592,0

    grep 01/17/2008 /users/doug/andy/data/IV/opens/AAPL.txt
    grep failed
    grep 01/18/2008 /users/doug/andy/data/IV/opens/AAPL.txt
    grep failed
    grep 01/22/2008 /users/doug/andy/data/IV/opens/AAPL.txt
    01/22/2008,1600,148.06,159.98,146.00,155.64,86974080,0

    AAPL 01/22/2008 148.060000

    grep 01/23/2008 /users/doug/andy/data/IV/opens/AAPL.txt
    grep failed
    grep 01/24/2008 /users/doug/andy/data/IV/opens/AAPL.txt
    grep failed
     
  2. macrumors 603

    Joined:
    Aug 9, 2009
    #2
  3. macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #3
    I ran this code with the following as the "main" input file, replacing:
    /users/doug/andy/data/IV/IV_surface_200_1.txt
    I then copied what you had right after your code into AAPL.txt. I ran and got:
    I did use files in the local directory, but I don't think that should matter. If you can post the contents (maybe zipped, not copied and pasted) of IV_surface_200_1.txt as chown33 requested, or at least a few lines of it, this might make a difference.

    -Lee

    P.S. Attention has changed from astronomy to stock market analysis, eh?

    EDIT: Some things to try:
    run "/bin/sh -c ..." where ... is the command printed out and see if you get the results you'd expect.
    Is what you pasted a cat of /users/doug/andy/data/IV/opens/AAPL.txt? If not, cat that file and look at it to be sure things are right. It's curious that 1/22 worked and no other dates did.
     
  4. macrumors 603

    Joined:
    Aug 9, 2009
    #4
    I concur with lee1210. Make a zip of the data files and upload them. It may not be necessary to include the entire file, but the lines that lead directly to grep failures should be included.

    I made placeholder files, like lee did, although mine had a slightly different form. And as he did, I used current directory instead of a fixed absolute pathname. All the greps worked, except when I gave it intentionally non-matching patterns. I also tried it with absolute paths, and it still worked the same way.

    I suspect that the first file being read contains some kind of unexpected characters, and those are somehow appearing in the parameters being extracted from the file to run grep. Those characters are then part of the arguments to running grep, and since they aren't in the file being grepped, then grep returns no match. That's just a guess, though. If that's the cause, then editing the file to make a smaller test-case might change or remove the unexpected characters, but if the data is huge, it might be worthwhile as a first attempt.

    The only other thing I can think of is to use a hex editor like Hex Fiend.app to examine the first file, and look carefully at the actual bytes in the file, at the exact locations of the dates that lead to grep failures. There should be nothing but plain displayable ASCII on each line, terminated by 0x0A (the \n character).
    http://ridiculousfish.com/hexfiend/
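
    The same check can be done in code. Below is a minimal sketch, assuming "unexpected" means any byte that is neither printable ASCII (0x20 to 0x7E) nor the 0x0A line terminator. The function name count_unexpected_bytes is made up for illustration and is not part of the posted program.

```c
#include <stdio.h>

/* Hypothetical helper: reports every byte in `path` that is neither
   printable ASCII (0x20..0x7E) nor a newline (0x0A).
   Returns the number of such bytes, or -1 if the file cannot be opened. */
long count_unexpected_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long offset = 0, bad = 0;
    int c;

    if (fp == NULL)
        return -1;
    while ((c = getc(fp)) != EOF) {
        if (c != '\n' && (c < 0x20 || c > 0x7E)) {
            printf("offset %ld: byte 0x%02X\n", offset, (unsigned)c);
            bad++;
        }
        offset++;
    }
    fclose(fp);
    return bad;
}
```

    A result of 0 for both the list file and AAPL.txt would rule out stray carriage returns, tabs, and non-ASCII bytes as the cause.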
     
  5. thread starter macrumors 6502a

    Joined:
    Sep 16, 2008
    #5
    Here are the files.

    apple.txt is what you want to read; change its name to AAPL.txt.
    list is where the symbol and date come from (IV_surface ...).

    PS to Lee.
    The instrument came back from Palomar for an overhaul. (Any time you get to NY, let me know). I have to rework one of the control systems using LabView Real Time. (see www.ni.com)
    We are also getting a new detector. I think I'll be working in java and .NET.
    The stock stuff is after hours for a trader I know. You guys have helped me with this before. In particular with the code to use grep.
     
  6. macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #6
    Think you forgot to hit attach. I hate it when that happens.

    -Lee

    Edit: To complement chown33's suggestion of HexFiend, which is likely friendlier, a "dirtier" way to look at exactly what's in your file is:
    od -ta filename.xyz
    It will give output like this:
    Code:
    0000000    0   1   /   0   2   /   2   0   0   8   ,   1   6   0   0   ,
    0000020    1   9   9   .   2   7   ,   2   0   0   .   2   6   ,   1   9
    0000040    2   .   5   5   ,   1   9   4   .   8   4   ,   3   8   5   4
    0000060    2   0   2   0   ,   0  nl   0   1   /   0   3   /   2   0   0
    0000100    8   ,   1   6   0   0   ,   1   9   5   .   4   1   ,   1   9
    0000120    7   .   3   9   ,   1   9   2   .   6   9   ,   1   9   4   .
    0000140    9   3   ,   3   0   0   7   4   2   3   0   ,   0  nl   0   1
    
    And you can see if there are any "weird" characters in there that you weren't expecting. You might combine this with head -n2, i.e.:
    head -n2 AAPL.txt | od -ta
    so you only get a couple of lines to look at in one go. There are other options for od to see hex values, etc., but this shows you each and every character with abbreviations for whitespace characters you might not see otherwise.

    cat -v filename.txt will get you similar results.
     
  7. thread starter macrumors 6502a

    Joined:
    Sep 16, 2008
    #7
    Hit the upload button; pretty sure I did before, too.
    Preview shows them attached.
     


  8. macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #8
    Unless the text got "cleaned" when I grabbed it, the code works just fine on my machine with the files in the local directory. I'll try it with a longer path, but that really shouldn't matter. Just to be safe, try zipping the files and putting them up, so we can be sure we've got the authentic source and the upload/download/etc. didn't mess with any special characters.

    In the meantime, you might add error checking to the popen call itself. Right now "grep failed" is printed whenever there are no rows of output. This could mean that the popen failed, or it could mean that no results were found. It would be best to know which situation it is. You may also want to pass your command to system before you call popen, and see if the result you're expecting is displayed.
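
    A minimal sketch of that kind of checking, assuming a POSIX system. The function name first_line_of and its return codes are invented for illustration. It relies on grep's standard exit status: 0 for a match, 1 for no match, and greater than 1 for an error.

```c
/* Assumption: some compilers need this so popen()/pclose() are declared. */
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper: runs `cmd` through popen() and copies its first line
   of output into buf. Returns 0 if a line was read, 1 if the command ran
   but produced no output (grep found no match), and -1 for any real
   failure: popen failed, the read failed, or the command reported an error. */
int first_line_of(const char *cmd, char *buf, int buflen)
{
    FILE *pipe_fp = popen(cmd, "r");
    int got_line, read_failed, status;

    if (pipe_fp == NULL) {
        fprintf(stderr, "popen failed: %s\n", strerror(errno));
        return -1;
    }
    errno = 0;
    got_line = (fgets(buf, buflen, pipe_fp) != NULL);
    read_failed = !got_line && ferror(pipe_fp);   /* NULL can mean EOF or error */
    if (read_failed)
        fprintf(stderr, "read error: %s\n", strerror(errno));

    status = pclose(pipe_fp);
    if (got_line)
        return 0;
    if (read_failed || status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) > 1)
        return -1;
    return 1;
}
```

    With this in place, a "grep failed" message can say whether grep found nothing or whether the pipe itself went wrong, which are very different problems.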

    -Lee

    Here's my output, which is what I expect:
     
  9. thread starter macrumors 6502a

    Joined:
    Sep 16, 2008
  10. thread starter macrumors 6502a

    Joined:
    Sep 16, 2008
    #10
    I looked at the file in TextWrangler and saw nothing odd.
     
  11. macrumors 603

    Joined:
    Aug 9, 2009
    #11
    How big is it?

    Both lee1210 and I have run the posted program with the posted data, and it works. This suggests that the problem lies in the data you're using.

    If the problem's in the code, it's not a problem in the source, but a problem in how it's being built or run. You'd have to upload the Xcode project, with the 'build' folder removed to make it smaller.

    Other options:

    1. Edit the problematic data file so it's small enough to upload here, while still exhibiting the failure. You must test it to make sure it does exhibit the failure.

    2. Improve your code so it sanitizes the data being read, before anything else has a chance to parse it. The parser expects comma-separators, and none of the extracted fields contains any blanks or control characters, so you should sanitize accordingly. That means you filter out all blanks and control characters anywhere in the line, before parsing it at comma delimiters.

    3. Find an upload site, upload the big zip there, then post the URL here. Example site:
    http://filebin.ca/
    It might also be possible to "create a hosted project" on a site like Google Code:
    http://code.google.com/p/support/wiki/GettingStarted

    4. Upload to a paid service like Amazon S3, make the upload public-read, then post the URL here. S3's limit on a single upload file is 5 GB. Pricing is 15 cents per GB-month, no minimum fee. It's pretty easy to sign up, and once that's done, you can upload with an app like CyberDuck.
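
    For option 2 above, here is a minimal sketch of the filtering step, assuming each line has already been read into a writable buffer. The name strip_blanks_and_controls is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: removes every blank and control character from s,
   in place, and returns the new length. Commas and all other printable
   characters are kept, so the line can still be split at commas afterwards. */
size_t strip_blanks_and_controls(char *s)
{
    char *src = s, *dst = s;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if (!isspace(c) && !iscntrl(c))
            *dst++ = *src;
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

    Calling this on each line right after it is read means a stray space or carriage return can never end up inside the date or symbol passed to grep.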
     
  12. thread starter macrumors 6502a

    Joined:
    Sep 16, 2008
    #12
    I wrote new code that opens the file and reads the data one line at a time. There were no problems. Wouldn't this suggest that the problem is not the file?
     
  13. macrumors 603

    Joined:
    Aug 9, 2009
    #13
    There's no way to discover what the original problem is, except by analyzing a failure. Currently, you're the only one who can produce a failure, because you're using a different data set than any of us have.

    In particular, the code you've posted works perfectly fine when run on the data you've posted. I suggest that you actually try running the code you posted on the small sections of data you posted, and confirm this (or refute it) for yourself.

    We can't tell anything about the new code without seeing it.

    The problem could be the original code plus the full set of original data. There could easily be a latent bug in the original code that only happens with the original data.

    One thing I noticed about the original code is the length of some of the parsing buffers is the minimal size needed for things like the AAPL symbol name, yet when you parse the line at comma delimiters, nothing is done to prevent overflow of these buffers. So if there is one line of data where the parsed value is a little too long, then you've caused a buffer overflow. Avoiding buffer overflows is one aspect of sanitizing input data.

    Once an overflow happens, there's no way to know what other variables are affected, nor where they are damaged. It could be that such an overflow damages some variable that's important in later lines, without adversely affecting the line that caused the damage. That's just a guess, though.

    That's not the only potential problem, either. The comma-parsing doesn't stop when it encounters a nul string-terminator. It won't handle a line with fewer commas than expected. There are no length-checks anywhere, nor are the length-checking functions like strlcpy() and strlcat() used, nor is the length-limiting ability of fscanf() used. There are many ways that some slightly different data could cause a failure.
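
    As an illustration of what length-checked parsing could look like, here is a minimal sketch. The function get_field is hypothetical and is not a drop-in replacement for the posted loops. It stops at the nul terminator, reports a line with too few commas, and refuses to write past the end of the destination.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies comma-separated field number n (0-based) of
   `line` into `out`. Returns 0 on success, -1 if the line has too few
   fields or the field does not fit in outsize bytes including the nul. */
int get_field(const char *line, int n, char *out, size_t outsize)
{
    const char *start = line, *end;
    size_t len;

    while (n-- > 0) {
        start = strchr(start, ',');
        if (start == NULL)
            return -1;              /* ran out of commas before field n */
        start++;
    }
    end = strchr(start, ',');
    len = (end != NULL) ? (size_t)(end - start) : strlen(start);
    if (len >= outsize)
        return -1;                  /* would overflow the destination */
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

    The read itself can be bounded the same way: a width such as "%99s" in the fscanf format keeps a long token from overrunning a 100-byte buffer.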

    Anyway, whatever the problem is, it does not occur when run with the posted data. Two independent observers have confirmed this, and posted the results. As far as I know, you have not run the same test. You've run the program on different data, but not on just the posted data alone.

    Putting it in simple terms:
    P1 + D1 = problem
    P1 + D2 = fine
    P1 is the originally posted program. D1 is your big data set. D2 is your small posted data set.

    If you're going to assert the problem lies entirely in P1, then the only way to test that hypothesis is by making D1 available for independent corroboration and analysis. You could also learn the debugging and analysis skills necessary to analyze P1 yourself when processing D1, but I'm guessing that would take too long.

    Now, you're stating:
    P2 + D1 = fine
    and speculating whether D1 is causative or not. No one but you can possibly determine this, unless D1 is made available.

    It's possible that making P2 available can lead to a differential analysis of the code, and discover what changes might avoid certain kinds of latent bugs. The only way to test that hypothesis is by posting P2. Nevertheless, no result is guaranteed.
     
  14. thread starter macrumors 6502a

    Joined:
    Sep 16, 2008
    #14
    Sorry to take so long to get back. I went back to just opening and reading the files. Thanks for the help.
     
  15. thread starter macrumors 6502a

    Joined:
    Sep 16, 2008
    #15
    Realized that I had to use grep. I did two tests. First I tried grep from X11 which worked; then I tried my code with GuardMalloc. Grep worked again.

    Does this tell anybody anything?
     
  16. macrumors 603

    Joined:
    Aug 9, 2009
    #16
    It tells me nothing, but it does raise some questions.

    1. Did you run the posted program on the posted data or not? If so, what was your result?

    2. What is this "grep from X11" you tried? Be specific. Describe the procedure for what you did. Describe it as if we can't see your screen, but we're familiar with shell command-lines.
    If you mean you ran the command 'grep' in the X11 shell window, then that's the same grep you get in Terminal.app. You can tell by typing the command which grep in both windows (Terminal.app and X11.app) and noting the identical pathnames.

    3. Which program did you run with GuardMalloc? There are at least two versions of the program you've posted about. The first one, whose code you posted in post #1, and the second one mentioned in post #12, whose code has not been posted.

    4. What data did you use for the run with GuardMalloc? The short posted data, which is known to work, or the long unposted data, which you say fails?

    5. If you ran the posted version of the program with GuardMalloc, what were you expecting? Some of the latent buffer-overflow issues I pointed out are in stack-resident arrays (i.e. C's auto storage class). If those are overrun then GuardMalloc would not be expected to detect it, since those arrays aren't malloc'ed in the first place. The specific arrays with no bounds-checking, and thus latent overflow potential, are:
    Code:
      newsymbol[5]
      date[14]
      olddate[12]
    
    I have no idea why date's length is 14, yet olddate's length is 12, since olddate is expected to hold a copy of the contents of date:
    Code:
            strcpy(olddate, date);
    
    Perhaps you are somehow expecting C to prevent this, or to perform bounds-checking for you. This could not be more wrong.
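
    One way to make such a copy safe is sketched below, assuming silent truncation is not acceptable and the caller wants to know when the source did not fit. The name copy_checked is made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: copies src into dst, never writing more than
   dstsize bytes. Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    /* snprintf returns the length the full string would have needed. */
    int needed = snprintf(dst, dstsize, "%s", src);
    return needed >= 0 && (size_t)needed < dstsize;
}
```

    A call such as copy_checked(olddate, sizeof olddate, date) then turns an overrun into a return value that can be tested, instead of a silent overwrite of whatever sits next to olddate.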
     
  17. thread starter macrumors 6502a

    Joined:
    Sep 16, 2008
    #17
    1. Did you run the failing program on the posted data or not? If so, what was your result?

    It did not always return the right information, i.e. it failed as before.

    2. What is this "grep from X11" you tried? Be specific. If you mean you ran the command 'grep' in the X11 shell window, then that's the same grep you get in Terminal.app. You can tell by typing the command *which grep* in both windows (Terminal.app and X11.app) and noting the identical pathnames.


    I ran /usr/bin/grep, which I then put explicitly in my code. It still failed in the code.

    3. What program did you run with GuardMalloc? There are at least two versions of the program you've posted about. The first one, whose code you posted in post #1, and the second one mentioned in post #12, whose code has not been posted.

    the short code I posted
    4. What data did you use for the run with GuardMalloc? The short posted data, which is known to work, or the long unposted data, which you say fails?
    unposted data

    both, both worked


    5. If you ran the posted version of the program with GuardMalloc, what were you expecting? Some of the latent buffer-overflow issues I pointed out are in stack-resident arrays (i.e. C's auto storage class), so if those are overrun then GuardMalloc would not be expected to detect that, since those arrays aren't malloc'ed in the first place. The specific arrays with no bounds-checking, and thus latent overflow potential, are:
    Code:
    ---------
    newsymbol[5]
    date[14]
    olddate[12]
    ---------
    I have no idea why date's length is 14, yet olddate's length is 12, since olddate is expected to hold a copy of the contents of date:

    Code:
    ---------
    strcpy(olddate, date);
    ---------
    Perhaps you are expecting C to complain about this, or to perform bounds-checking for you. This could not be more wrong.
    ***************

    Code now reads.


    grep = (char*)calloc(100, sizeof(char));
    newsymbol = (char*)calloc(20, sizeof(char));
    date = (char*)calloc(20, sizeof(char));
    olddate = (char*)calloc(20, sizeof(char));
    buf = (char*)calloc(1000, sizeof(char));
    arr = (char **)calloc(100, sizeof(char*));

    symbol = calloc(100,sizeof(char));

    I hope to get some clue as to what was up.
     
  18. macrumors 603

    Joined:
    Aug 9, 2009
    #18
    In order to get a sense of the unposted data's scale, please run the following command, and post the output.
    Code:
    tr ',' ' ' </users/doug/andy/data/IV/IV_surface_200_1.txt | wc
    Also post the output of this command:
    Code:
    tr ',' ' ' </users/doug/andy/data/IV/opens/AAPL.txt | wc 
    Also post the source of the new program you mentioned in post #12.
     
  19. thread starter macrumors 6502a

    Joined:
    Sep 16, 2008
    #19
    Had trouble with your command.
    I ran just wc in case that helps.
    The working code is below.
    bash-3.2$ tr ','''</users/doug/andy/data/IV/IV_surface_200_1.txt | wc
    usage: tr [-Ccsu] string1 string2
    tr [-Ccu] -d string1
    tr [-Ccu] -s string1
    tr [-Ccu] -ds string1 string2
    0 0 0
    bash-3.2$ wc /users/doug/andy/data/IV/IV_surface_200_1.txt
    2230656 2230656 125347876 /users/doug/andy/data/IV/IV_surface_200_1.txt
    bash-3.2$ wc /users/doug/andy/data/IV/opens/aapl.txt
    686 686 38017 /users/doug/andy/data/IV/opens/aapl.txt



    Here's code that just reads from the file.

    Code:
    #include <math.h>
    
    
    #define FEE 0.015
    #define REVERSE 1
    #define REV 0
    #define PRINTMAX 0
    #define PRINTMIN 0
    #define DELTADAYS 2
    
    
    void parse( char *record, char *delim, char**tarr);
    
    
    int main (int argc, const char * argv[]) {
        // insert code here...
        FILE *voldata;//, *stocks;
        char *symbol, *newsymbol,*oldsymbol,*date, **arr;
        int i, j, k, l, p, num = -1, daynum = -1, maxdays = 0;// numsymbols = 0;
            float price;
       FILE *corncob;
     char *buf, *grep;
     grep = (char*)calloc(100, sizeof(char));  
     buf = (char*)calloc(200, sizeof(char)); 
    arr = (char **)calloc(100, sizeof(char*)); 
     newsymbol = (char*)calloc(7, sizeof(char));  
     oldsymbol = (char*)calloc(7, sizeof(char)); 
        date = (char*)calloc(14, sizeof(char));  
        symbol = calloc(100,sizeof(char));
           if ((voldata = fopen("/users/doug/andy/data/IV/IV_surface_200_1.txt", "r")) == NULL)
            {
            printf("couldn't open file\n");
            return(0);
            }
            
     strcpy(oldsymbol, "\0");
    while ( (fscanf(voldata,"%s\n",symbol)) != EOF)
        {   i = 0;
         //  printf("%s\n", symbol); 
            while (symbol[i] != ',')
                {
                newsymbol[i] = symbol[i]; 
                i++;
                }
              newsymbol[i] = '\0';  
          
       if (strcmp(newsymbol,oldsymbol) != 0)
                { 
                  strcpy(grep, "/users/doug/andy/data/IV/opens/");
                  strcpy(oldsymbol,newsymbol);
                strcat(grep, oldsymbol);
                strcat(grep, ".txt");
                         if( (corncob = fopen(grep, "r")) != NULL)  
                {fscanf(corncob, "%s\n", buf);
         // printf("here %s\n",buf);
                 fscanf(corncob, "%s\n", buf);
         // printf("%s\n",buf);
                 }
                else 
                printf("%s\n", grep);
                }
               while (symbol[++i] != ',');
                i++;
                j = 0;
                while (symbol[i] != ',')
                {
                date[j] = symbol[i]; 
                i++;
                j++;
                }
              date[j] = '\0';   
        if(corncob != NULL)
              if((fscanf(corncob, "%s\n", buf) != EOF))
                {
               parse(buf,",", arr);
                
                price = atof(arr[2]);  // note date is every other element 
                printf("%s %s %f\n\n", oldsymbol,date, atof(arr[2]));
                }
                else
                fclose(corncob);
            
    
     
      if (num > 0) 
        {
        for(p = 0; p < 1000; p ++)
        
        if(0)
        { p =  10000;
            for (i = 0; i < 3; i++)
            fscanf(voldata,"%s\n",symbol);
            fscanf(voldata,"%s\n",symbol); 
            fscanf(voldata,"%s\n",symbol);  
            fscanf(voldata,"%s\n",symbol);        
            fscanf(voldata,"%s\n",symbol); 
            for (i = 0; i < 4; i++)
                fscanf(voldata,"%s\n",symbol);        
             for (i = 0; i < 4; i++)
                fscanf(voldata,"%s\n",symbol);
            fscanf(voldata,"%s\n",symbol); 
            for (i = 0; i < 2; i++)
             fscanf(voldata,"%s\n",symbol);
            fscanf(voldata,"%s\n",symbol); 
            for (i = 0; i < 4; i++)
                fscanf(voldata,"%s\n",symbol);
            for (i = 0; i < 72; i++) // read other time frames
                    fscanf(voldata,"%s\n",symbol);
        } //if same date
           else
            {
       
           }
       }
        else
        { 
     
            for (i = 0; i < 3; i++)
            fscanf(voldata,"%s\n",symbol);
             fscanf(voldata,"%s\n",symbol);          
       fscanf(voldata,"%s\n",symbol);
       fscanf(voldata,"%s\n",symbol);                      
             fscanf(voldata,"%s\n",symbol); 
             for (i = 0; i < 4; i++)
                fscanf(voldata,"%s\n",symbol);
             for (i = 0; i < 4; i++)
                fscanf(voldata,"%s\n",symbol);        
             fscanf(voldata,"%s\n",symbol); 
                    for (i = 0; i < 2; i++)
             fscanf(voldata,"%s\n",symbol);
        fscanf(voldata,"%s\n",symbol); 
                      for (i = 0; i < 4; i++)
                fscanf(voldata,"%s\n",symbol);
              for (i = 0; i < 72; i++) // read other time frames
                    fscanf(voldata,"%s\n",symbol);
            daynum++;
    } //if same date
    } 
    
    //    fclose(stocks);
        return 0;
    
    }
    #define element 12
    void parse( char *record, char *delim, char**tarr)
    
    {
    	
    	char *p;
    	int i,fld=0;
    	if ( (p = (char *) calloc(element, sizeof(char))) == NULL)
    		printf("no memory for p");
        for (i = 0; i < 100; i ++)
    	{
    		if( (tarr[i] = (char*)calloc(element, sizeof(char))) == NULL)
    			printf("no memory allocated for arr\n");
    	}
    	
    	p = strtok(record,delim);
    	
    	while(p)
    	{	
    		strcpy(tarr[fld],p);
    		
    		fld++;
    		p=strtok('\0',delim);	
    	}
    
    		
    }
    
    
     
  20. macrumors 603

    Joined:
    Aug 9, 2009
    #20
    Is there something wrong with copy and paste on your machine?

    The malfunctioning code you posted is this:
    Code:
    tr ','''</users/doug/andy/data/IV/IV_surface_200_1.txt | wc
    
    Unfortunately, that's not what I posted, which is this:
    Code:
    tr ',' ' ' </users/doug/andy/data/IV/IV_surface_200_1.txt | wc
    
    There is a space between the quoted comma and the subsequent quoted single space. These spaces are crucial, because the translation is from comma to space, so there needs to be a separate argument of a single quoted space.

    Anyway, it does tell me that the file is about 2 million lines. So maybe the problem is related to the sheer size of the file.

    For example, the parse() function always calloc's 100 new elements, storing the pointers in its tarr arg. These are never freed, though, so over the course of 2 million lines, that kind of leak could become a significant problem. That's just a guess. There may be other such problems that somehow lead popen() to malfunction, simply because of the scale of the data. I see no practical way to analyze and debug it except by having data of that scale.
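
    A minimal sketch of a leak-free alternative, assuming the caller only needs the fields until the next line is read. The name split_in_place is hypothetical. Instead of allocating new strings, it stores pointers into the record buffer itself.

```c
#include <string.h>

/* Hypothetical replacement for parse(): splits `record` in place at any
   character in `delim` and stores pointers to the pieces in
   fields[0..max_fields-1]. Nothing is allocated, so nothing can leak.
   Returns the number of fields stored. */
int split_in_place(char *record, const char *delim, char **fields, int max_fields)
{
    int n = 0;
    char *p = strtok(record, delim);

    while (p != NULL && n < max_fields) {
        fields[n++] = p;
        p = strtok(NULL, delim);
    }
    return n;
}
```

    The returned count also lets the caller confirm that field 2 exists before passing it to atof.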
     
  21. thread starter macrumors 6502a

    Joined:
    Sep 16, 2008
    #21
    bash-3.2$ tr ',' ' '</users/doug/andy/data/IV/opens/Aapl.txt | wc
    686 5488 38017
    bash-3.2$ tr ',' ' '</users/doug/andy/data/IV/IV_surface_200_1.txt | wc
    2230656 20075904 125347876
    bash-3.2$

    The larger file is not the one it's having trouble with.
     
  22. macrumors 603

    Joined:
    Aug 9, 2009
    #22
    Then post the entire AAPL.txt file attached as a zip file. The one you posted previously is only 24 lines.


    FWIW, I expanded the previously posted IV_surface_200_1.txt (post #7, list.txt) until it was 2.3 million lines of data, by cat'ing it onto itself multiple times. When the originally posted program (post #1) is run on this large file, and also using the 24-line AAPL.txt (post #7), it runs fine: there are no unexpected 'grep failed' messages, only the expected one shown below. This is without any attempt to fix any memory leaks or potential buffer overflows.

    The expected 'grep failed' is this:
    Code:
    grep 02/05/2008 /users/doug/andy/data/IV/opens/AAPL.txt
    grep failed
    
    EDIT:
    What OS version is it failing on?

    What version of gcc? Post the output from this command:
    Code:
    gcc --version
     
  23. thread starter macrumors 6502a

    Joined:
    Sep 16, 2008
    #23
    Will answer your previous post after I get home. Seems an earlier attempted post didn't make it. I added a delay to the code; the code no longer failed.

    for (i =0; i < 1000000; i++);

    after popen, fgets (not sure)
     
  24. macrumors 6502

    Joined:
    Jun 1, 2006
    #24
    Just out of curiosity, why are you using C for this at all? Try using a scripting language like Perl or Python, or if you need to use C, try the libc regular expression library. You could probably do what you want here with a 20 line Perl script.
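
    For the libc route, here is a minimal sketch using the POSIX regex functions regcomp and regexec in place of a grep child process. The function name find_line and its return codes are invented for this example.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: finds the first line of `path` matching the POSIX
   basic regular expression `pattern` and leaves it in buf.
   Returns 0 on a match, 1 if nothing matched (buf then holds the last line
   read, if any), and -1 if the pattern or the file could not be used. */
int find_line(const char *path, const char *pattern, char *buf, int buflen)
{
    regex_t re;
    FILE *fp;
    int result = 1;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL) {
        regfree(&re);
        return -1;
    }
    while (fgets(buf, buflen, fp) != NULL) {
        if (regexec(&re, buf, 0, NULL, 0) == 0) {
            result = 0;             /* buf now holds the matching line */
            break;
        }
    }
    fclose(fp);
    regfree(&re);
    return result;
}
```

    Because no child process or pipe is involved, a "no match" result here can only mean the pattern really is absent from the file.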
     
  25. thread starter macrumors 6502a

    Joined:
    Sep 16, 2008
    #25
    You are seeing a small part of a much longer program.
     
