calling grep from C




farmerdoug
Sep 22, 2010, 09:06 PM
Most of the time, calling grep from C fails. When it does, I get my own error message, "grep failed". The command string that calls grep is always correct.
Here's the code.
Part of the data file is below the code. The output is below that.
Help is appreciated.


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

void parse( char *record, char *delim, char**tarr);

int main (int argc, const char * argv[]) {

    char *symbol, newsymbol[5],date[14], *grep, **arr;
    int i, j;
    float price;
    FILE *corncob, *voldata;
    char *buf, olddate[12];

    grep = (char*)calloc(150, sizeof(char));
    buf = (char*)calloc(150, sizeof(char));
    arr = (char **)calloc(100, sizeof(char*));

    symbol = calloc(100,sizeof(char));

    if ((voldata = fopen("/users/doug/andy/data/IV/IV_surface_200_1.txt", "r")) == NULL)
    {
        printf("couldn't open file\n");
        return(0);
    }


    while ( (fscanf(voldata,"%s\n",symbol)) != EOF)
    {
        i = 0;
        while (symbol[i] != ',')
        {
            newsymbol[i] = symbol[i];
            i++;
        }
        newsymbol[i] = '\0';

        while (symbol[++i] != ',');
        i++;
        j = 0;
        while (symbol[i] != ',')
        {
            date[j] = symbol[i];
            i++;
            j++;
        }
        date[j] = '\0';
        if(strcmp(olddate,date) != 0)
        {
            strcpy(grep, "grep ");
            strcat(grep, date);
            strcat(grep, " /users/doug/andy/data/IV/opens/");
            strcat(grep, newsymbol);
            strcat(grep, ".txt");
            printf("%s\n", grep);
            corncob = popen(grep, "r");
            if ( (fgets(buf,150,corncob) != NULL))
            {

                printf("%s\n",buf);
                parse(buf,",", arr);
                price = atof(arr[2]);
                printf("%s %s %f\n\n", newsymbol,date, atof(arr[2]));
            }
            else
            {
                printf("grep failed\n");
            }
            pclose(corncob);
        }
        strcpy(olddate, date);

    }


    return 0;

}

void parse( char *record, char *delim, char**tarr)

{

    char *p;
    int i,fld=0, element = 15;
    if ( (p = (char *) calloc(element, sizeof(char))) == NULL)
        printf("no memory for p");
    for (i = 0; i < 100; i ++)
    {
        if( (tarr[i] = (char*)calloc(element, sizeof(char))) == NULL)
            printf("no memory allocated for arr\n");
    }

    p = strtok(record,delim);

    while(p)
    {
        strcpy(tarr[fld],p);

        fld++;
        p=strtok(NULL,delim);
    }


}



01/02/2008,1600,199.27,200.26,192.55,194.84,38542020,0
01/03/2008,1600,195.41,197.39,192.69,194.93,30074230,0
01/04/2008,1600,191.45,193.00,178.89,180.05,51994640,0
01/07/2008,1600,181.25,183.60,170.23,177.64,74007344,0
01/08/2008,1600,180.14,182.46,170.80,171.25,54425900,0
01/09/2008,1600,171.30,179.50,168.30,179.40,64856432,0
01/10/2008,1600,177.58,181.00,175.41,178.02,52965156,0
01/11/2008,1600,176.00,177.85,170.00,172.69,44018848,0
01/14/2008,1600,177.52,179.42,175.17,178.78,39303848,0
01/15/2008,1600,177.72,179.22,164.66,169.04,83943904,0
01/16/2008,1600,165.23,169.01,156.70,159.64,79194288,0
01/17/2008,1600,161.51,164.01,158.42,160.89,63430200,0
01/18/2008,1600,162.30,165.75,159.61,161.36,61590136,0
01/22/2008,1600,148.06,159.98,146.00,155.64,86974080,0
01/23/2008,1600,136.19,140.00,126.14,139.07,120466592,0

grep 01/17/2008 /users/doug/andy/data/IV/opens/AAPL.txt
grep failed
grep 01/18/2008 /users/doug/andy/data/IV/opens/AAPL.txt
grep failed
grep 01/22/2008 /users/doug/andy/data/IV/opens/AAPL.txt
01/22/2008,1600,148.06,159.98,146.00,155.64,86974080,0

AAPL 01/22/2008 148.060000

grep 01/23/2008 /users/doug/andy/data/IV/opens/AAPL.txt
grep failed
grep 01/24/2008 /users/doug/andy/data/IV/opens/AAPL.txt
grep failed



chown33
Sep 22, 2010, 10:02 PM
Without knowing exactly what the input data is, it's impossible to run the program.
In particular, show example lines for "IV_surface_200_1.txt", identified as such.

Also, does 'grep' itself print any error messages to its stderr?

Offhand, this looks like a task better suited to awk. It's not a difficult language.
http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man1/awk.1.html

lee1210
Sep 22, 2010, 11:14 PM
I ran this code with the following as the "main" input file, in place of:
/users/doug/andy/data/IV/IV_surface_200_1.txt

AAPL,,01/17/2008,
AAPL,,01/18/2008,
AAPL,,01/22/2008,
AAPL,,01/23/2008,
AAPL,,01/24/2008,


I then copied what you had right after your code into AAPL.txt. I ran and got:

grep 01/17/2008 AAPL.txt
01/17/2008,1600,161.51,164.01,158.42,160.89,63430200,0

AAPL 01/17/2008 161.510000

grep 01/18/2008 AAPL.txt
01/18/2008,1600,162.30,165.75,159.61,161.36,61590136,0

AAPL 01/18/2008 162.300000

grep 01/22/2008 AAPL.txt
01/22/2008,1600,148.06,159.98,146.00,155.64,86974080,0

AAPL 01/22/2008 148.060000

grep 01/23/2008 AAPL.txt
01/23/2008,1600,136.19,140.00,126.14,139.07,120466592,0

AAPL 01/23/2008 136.190000

grep 01/24/2008 AAPL.txt
grep failed


I did use files in the local directory, but I don't think that should matter. If you can post the contents (maybe zipped, not copied and pasted) of IV_surface_200_1.txt as chown33 requested, or at least a few lines of it, this might make a difference.

-Lee

P.S. Attention has changed from astronomy to stock market analysis, eh?

EDIT: Some things to try:
run "/bin/sh -c ..." where ... is the command printed out and see if you get the results you'd expect.
Is what you pasted the output of cat /users/doug/andy/data/IV/opens/AAPL.txt? If not, cat that file and check that its contents are what you expect. It's curious that 01/22 worked and no other dates did.

chown33
Sep 23, 2010, 12:27 AM
I concur with lee1210. Make a zip of the data files and upload them. It may not be necessary to include the entire file, but the lines that lead directly to grep failures should be included.

I made placeholder files, like lee did, although mine had a slightly different form. And as he did, I used current directory instead of a fixed absolute pathname. All the greps worked, except when I gave it intentionally non-matching patterns. I also tried it with absolute paths, and it still worked the same way.

I suspect that the first file being read contains some kind of unexpected characters, and those are somehow appearing in the parameters being extracted from the file to run grep. Those characters are then part of the arguments to running grep, and since they aren't in the file being grepped, then grep returns no match. That's just a guess, though. If that's the cause, then editing the file to make a smaller test-case might change or remove the unexpected characters, but if the data is huge, it might be worthwhile as a first attempt.

The only other thing I can think of is to use a hex editor like Hex Fiend.app to examine the first file, and look carefully at the actual bytes in the file, at the exact locations of the dates that lead to grep failures. There should be nothing but plain displayable ASCII on each line, terminated by 0x0A (the \n character).
http://ridiculousfish.com/hexfiend/
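
The same byte-level check can be done in code. What follows is only a minimal sketch, not part of the posted program: the helper name first_suspicious_byte is hypothetical, and it assumes a valid line holds nothing but printable ASCII (0x20 through 0x7E) with at most one trailing newline.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index of the first byte in `line` that is
 * not printable ASCII (0x20..0x7E), ignoring one trailing '\n'.
 * Returns -1 if every byte is acceptable. */
long first_suspicious_byte(const char *line)
{
    size_t len = strlen(line);

    /* A single trailing newline is the expected line terminator. */
    if (len > 0 && line[len - 1] == '\n')
        len--;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)line[i];
        if (c < 0x20 || c > 0x7E)
            return (long)i;
    }
    return -1;
}
```

Calling this on every line read with fgets(), and printing the line number and offset whenever the result is not -1, would point directly at a stray carriage return, tab, or byte-order mark.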

farmerdoug
Sep 23, 2010, 05:14 AM
here are the files.

apple.txt is what you want to read. change name to AAPL.txt
list is where the symbol and date come from. (IV_surface ...)

PS to Lee.
The instrument came back from Palomar for an overhaul. (Any time you get to NY, let me know). I have to rework one of the control systems using LabView Real Time. (see www.ni.com)
We are also getting a new detector. I think I'll be working in java and .NET.
The stock stuff is after hours for a trader I know. You guys have helped me with this before. In particular with the code to use grep.

lee1210
Sep 23, 2010, 07:49 AM
Think you forgot to hit attach. I hate it when that happens.

-Lee

Edit: To complement chown33's suggestion of HexFiend, which is likely friendlier, a "dirtier" way to look at exactly what's in your file is:
od -ta filename.xyz
It will give output like this:

0000000 0 1 / 0 2 / 2 0 0 8 , 1 6 0 0 ,
0000020 1 9 9 . 2 7 , 2 0 0 . 2 6 , 1 9
0000040 2 . 5 5 , 1 9 4 . 8 4 , 3 8 5 4
0000060 2 0 2 0 , 0 nl 0 1 / 0 3 / 2 0 0
0000100 8 , 1 6 0 0 , 1 9 5 . 4 1 , 1 9
0000120 7 . 3 9 , 1 9 2 . 6 9 , 1 9 4 .
0000140 9 3 , 3 0 0 7 4 2 3 0 , 0 nl 0 1


And you can see if there are any "weird" characters in there that you weren't expecting. You might combine this with head -n2, i.e.:
head -n2 AAPL.txt | od -ta
so you only get a couple of lines to look at in one go. There are other options for od to see hex values, etc., but this shows you each and every character with abbreviations for whitespace characters you might not see otherwise.

cat -v filename.txt will get you similar results.

farmerdoug
Sep 23, 2010, 09:13 AM
hit the upload button, pretty sure i did before too
preview shows them attached.

lee1210
Sep 23, 2010, 09:25 AM
Unless the text got "cleaned" when i grabbed it, the code works just fine on my machine with the files in the local directory. I'll try it with a longer path, but that really shouldn't matter. Just to be safe, try zipping the files and putting them up, so we can be sure we've got the authentic source and the upload/download/etc. didn't mess with any special characters.

In the meantime, you may want to add error checking to the popen call itself. Right now "grep failed" is printed whenever there are no rows of output. That could mean that popen failed, or it could mean that grep found no matches. It would be best to know which. You may also want to pass your command to system() before you call popen, and see whether the result you're expecting is displayed.

-Lee

Here's my output, which is what i expect:

grep 01/02/2008 AAPL.txt
01/02/2008,1600,199.27,200.26,192.55,194.84,38542020,0

AAPL 01/02/2008 199.270000

grep 01/03/2008 AAPL.txt
01/03/2008,1600,195.41,197.39,192.69,194.93,30074230,0

AAPL 01/03/2008 195.410000

grep 01/04/2008 AAPL.txt
01/04/2008,1600,191.45,193.00,178.89,180.05,51994640,0

AAPL 01/04/2008 191.450000

grep 01/07/2008 AAPL.txt
01/07/2008,1600,181.25,183.60,170.23,177.64,74007344,0

AAPL 01/07/2008 181.250000

grep 01/08/2008 AAPL.txt
01/08/2008,1600,180.14,182.46,170.80,171.25,54425900,0

AAPL 01/08/2008 180.140000

grep 01/09/2008 AAPL.txt
01/09/2008,1600,171.30,179.50,168.30,179.40,64856432,0

AAPL 01/09/2008 171.300000

grep 01/10/2008 AAPL.txt
01/10/2008,1600,177.58,181.00,175.41,178.02,52965156,0

AAPL 01/10/2008 177.580000

grep 01/11/2008 AAPL.txt
01/11/2008,1600,176.00,177.85,170.00,172.69,44018848,0

AAPL 01/11/2008 176.000000

grep 01/14/2008 AAPL.txt
01/14/2008,1600,177.52,179.42,175.17,178.78,39303848,0

AAPL 01/14/2008 177.520000

grep 01/15/2008 AAPL.txt
01/15/2008,1600,177.72,179.22,164.66,169.04,83943904,0

AAPL 01/15/2008 177.720000

grep 01/16/2008 AAPL.txt
01/16/2008,1600,165.23,169.01,156.70,159.64,79194288,0

AAPL 01/16/2008 165.230000

grep 01/17/2008 AAPL.txt
01/17/2008,1600,161.51,164.01,158.42,160.89,63430200,0

AAPL 01/17/2008 161.510000

grep 01/18/2008 AAPL.txt
01/18/2008,1600,162.30,165.75,159.61,161.36,61590136,0

AAPL 01/18/2008 162.300000

grep 01/22/2008 AAPL.txt
01/22/2008,1600,148.06,159.98,146.00,155.64,86974080,0

AAPL 01/22/2008 148.060000

grep 01/23/2008 AAPL.txt
01/23/2008,1600,136.19,140.00,126.14,139.07,120466592,0

AAPL 01/23/2008 136.190000

grep 01/24/2008 AAPL.txt
01/24/2008,1600,139.99,140.70,132.01,135.60,71641408,0

AAPL 01/24/2008 139.990000

grep 01/25/2008 AAPL.txt
01/25/2008,1600,138.99,139.09,129.61,130.01,55527712,0

AAPL 01/25/2008 138.990000

grep 01/28/2008 AAPL.txt
01/28/2008,1600,128.16,132.68,126.45,130.01,52673824,0

AAPL 01/28/2008 128.160000

grep 01/29/2008 AAPL.txt
01/29/2008,1600,131.15,132.79,129.05,131.54,39287064,0

AAPL 01/29/2008 131.150000

grep 01/30/2008 AAPL.txt
01/30/2008,1600,131.37,135.45,130.00,132.18,44396740,0

AAPL 01/30/2008 131.370000

grep 01/31/2008 AAPL.txt
01/31/2008,1600,129.45,136.65,129.40,135.36,48061804,0

AAPL 01/31/2008 129.450000

grep 02/01/2008 AAPL.txt
02/01/2008,1600,136.24,136.59,132.18,133.75,36099544,0

AAPL 02/01/2008 136.240000

grep 02/04/2008 AAPL.txt
02/04/2008,1600,134.21,135.90,131.42,131.65,32116480,0

AAPL 02/04/2008 134.210000

grep 02/05/2008 AAPL.txt
grep failed

farmerdoug
Sep 25, 2010, 02:35 PM
The zip file is too big to be attached.

farmerdoug
Sep 25, 2010, 02:48 PM
I looked at the file in TextWrangler and saw nothing odd.

chown33
Sep 25, 2010, 04:51 PM
The zip file is too big to be attached.
How big is it?

I looked at the file in TextWrangler and saw nothing odd.

Both lee1210 and I have run the posted program with the posted data, and it works. This suggests that the problem lies in the data you're using.

If the problem's in the code, it's not a problem in the source, but a problem in how it's being built or run. You'd have to upload the Xcode project, with the 'build' folder removed to make it smaller.

Other options:

1. Edit the problematic data file so it's small enough to upload here, while still exhibiting the failure. You must test it to make sure it does exhibit the failure.

2. Improve your code so it sanitizes the data being read, before anything else has a chance to parse it. The parser expects comma-separators, and none of the extracted fields contains any blanks or control characters, so you should sanitize accordingly. That means you filter out all blanks and control characters anywhere in the line, before parsing it at comma delimiters.

3. Find an upload site, upload the big zip there, then post the URL here. Example site:
http://filebin.ca/
It might also be possible to "create a hosted project" on a site like Google Code:
http://code.google.com/p/support/wiki/GettingStarted

4. Upload to a paid service like Amazon S3, make the upload public-read, then post the URL here. S3's limit on a single upload file is 5 GB. Pricing is 15 cents per GB-month, no minimum fee. It's pretty easy to sign up, and once that's done, you can upload with an app like CyberDuck.
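
The sanitizing step in option 2 could look something like the sketch below. The helper name strip_unexpected is hypothetical, and dropping bytes outside 7-bit ASCII is an extra assumption on top of the "blanks and control characters" rule stated above.

```c
#include <stddef.h>

/* Hypothetical helper: removes, in place, every byte of `line` that is a
 * blank, a control character, or outside 7-bit ASCII.
 * Returns the new length of the string. */
size_t strip_unexpected(char *line)
{
    size_t out = 0;

    for (size_t in = 0; line[in] != '\0'; in++) {
        unsigned char c = (unsigned char)line[in];
        if (c > 0x20 && c < 0x7F)   /* keep printable, non-blank ASCII only */
            line[out++] = line[in];
    }
    line[out] = '\0';
    return out;
}
```

It would be called on each line right after reading it and before any splitting at commas, so that the strings later handed to grep can only contain characters that are expected in the data.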

farmerdoug
Sep 26, 2010, 06:04 AM
I wrote new code that opens the file and reads the data one line at a time. There were no problems. Wouldn't this suggest that the problem is not the file?

chown33
Sep 26, 2010, 11:19 AM
I wrote new code that opens the file and reads the data one line at a time. There were no problems. Wouldn't this suggest that the problem is not the file?

There's no way to discover what the original problem is, except by analyzing a failure. Currently, you're the only one who can produce a failure, because you're using a different data set than any of us have.

In particular, the code you've posted works perfectly fine when run on the data you've posted. I suggest that you actually try running the code you posted on the small sections of data you posted, and confirm this (or refute it) for yourself.

We can't tell anything about the new code without seeing it.

The problem could be the original code plus the full set of original data. There could easily be a latent bug in the original code that only happens with the original data.

One thing I noticed about the original code is the length of some of the parsing buffers is the minimal size needed for things like the AAPL symbol name, yet when you parse the line at comma delimiters, nothing is done to prevent overflow of these buffers. So if there is one line of data where the parsed value is a little too long, then you've caused a buffer overflow. Avoiding buffer overflows is one aspect of sanitizing input data.

Once an overflow happens, there's no way to know what other variables are affected, nor where they are damaged. It could be that such an overflow damages some variable that's important in later lines, without adversely affecting the line that caused the damage. That's just a guess, though.

That's not the only potential problem, either. The comma-parsing doesn't stop when it encounters a nul string-terminator. It won't handle a line with fewer commas than expected. There are no length-checks anywhere, nor are the length-checking functions like strlcpy() and strlcat() used, nor is the length-limiting ability of fscanf() used. There are many ways that some slightly different data could cause a failure.
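
To make the missing checks concrete, here is a minimal sketch of a bounded field extractor. The name copy_field and its return convention are hypothetical. It stops at the nul terminator, reports a line with too few commas, and refuses to write past the end of the destination buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies comma-separated field number `index`
 * (0-based) from `line` into `out`, which holds `outsize` bytes.
 * Returns 0 on success, -1 if the line has too few fields or the field
 * does not fit. Empty fields are allowed. */
int copy_field(const char *line, int index, char *out, size_t outsize)
{
    const char *start = line;
    size_t len;

    /* Skip `index` commas; strchr stops at the nul terminator,
     * so a short line is detected instead of being overrun. */
    for (int i = 0; i < index; i++) {
        start = strchr(start, ',');
        if (start == NULL)
            return -1;
        start++;
    }

    /* The field runs up to the next comma or the end of the string. */
    len = strcspn(start, ",");
    if (len >= outsize)
        return -1;

    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

In the posted program the symbol would be field 0 and the date field 2. The read itself can be length-limited too: for a 100-byte buffer, fscanf(voldata, "%99s", symbol) never stores more than 99 characters plus the terminator.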

Anyway, whatever the problem is, it does not occur when run with the posted data. Two independent observers have confirmed this, and posted the results. As far as I know, you have not run the same test. You've run the program on different data, but not on just the posted data alone.

Putting it in simple terms:
P1 + D1 = problem
P1 + D2 = fine

P1 is the originally posted program. D1 is your big data set. D2 is your small posted data set.

If you're going to assert the problem lies entirely in P1, then the only way to test that hypothesis is by making D1 available for independent corroboration and analysis. You could also learn the debugging and analysis skills necessary to analyze P1 yourself when processing D1, but I'm guessing that would take too long.

Now, you're stating:
P2 + D1 = fine

and speculating whether D1 is causative or not. No one but you can possibly determine this, unless D1 is made available.

It's possible that making P2 available can lead to a differential analysis of the code, and discover what changes might avoid certain kinds of latent bugs. The only way to test that hypothesis is by posting P2. Nevertheless, no result is guaranteed.

farmerdoug
Oct 1, 2010, 04:17 AM
Sorry to take so long to get back. I went back to just opening and reading the files. thanks for the help.

farmerdoug
Oct 2, 2010, 04:26 AM
Realized that I had to use grep. I did two tests. First I tried grep from X11 which worked; then I tried my code with GuardMalloc. Grep worked again.

Does this tell anybody anything?

chown33
Oct 2, 2010, 04:45 PM
Realized that I had to use grep. I did two tests. First I tried grep from X11 which worked; then I tried my code with GuardMalloc. Grep worked again.

Does this tell anybody anything?

It tells me nothing, but it does raise some questions.

1. Did you run the posted program on the posted data or not? If so, what was your result?

2. What is this "grep from X11" you tried? Be specific. Describe the procedure for what you did. Describe it as if we can't see your screen, but we're familiar with shell command-lines.
If you mean you ran the command 'grep' in the X11 shell window, then that's the same grep you get in Terminal.app. You can tell by typing the command which grep in both windows (Terminal.app and X11.app) and noting the identical pathnames.

3. Which program did you run with GuardMalloc? There are at least two versions of the program you've posted about. The first one, whose code you posted in post #1, and the second one mentioned in post #12, whose code has not been posted.

4. What data did you use for the run with GuardMalloc? The short posted data, which is known to work, or the long unposted data, which you say fails?

5. If you ran the posted version of the program with GuardMalloc, what were you expecting? Some of the latent buffer-overflow issues I pointed out are in stack-resident arrays (i.e. C's auto storage class). If those are overrun then GuardMalloc would not be expected to detect it, since those arrays aren't malloc'ed in the first place. The specific arrays with no bounds-checking, and thus latent overflow potential, are:
newsymbol[5]
date[14]
olddate[12]

I have no idea why date's length is 14, yet olddate's length is 12, since olddate is expected to hold a copy of the contents of date:
strcpy(olddate, date);

Perhaps you are somehow expecting C to prevent this, or to perform bounds-checking for you. This could not be more wrong.
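
A small sketch of what doing the checking yourself can look like: both date buffers share one size constant, and the copy is refused when the source would not fit. DATE_LEN and set_date are hypothetical names, and the size assumes dates always have the form MM/DD/YYYY. Note also that the posted code passes olddate to strcmp() before anything has been written to it, so it should start out as an empty string.

```c
#include <string.h>

#define DATE_LEN 11   /* assumption: "MM/DD/YYYY" plus the terminating nul */

/* Hypothetical helper: copies `src` into a DATE_LEN-byte buffer.
 * Returns 0 on success, -1 if `src` is too long (dst is left untouched). */
int set_date(char dst[DATE_LEN], const char *src)
{
    if (strlen(src) >= DATE_LEN)
        return -1;
    strcpy(dst, src);   /* safe: the length was checked above */
    return 0;
}
```

Declaring the buffers as char date[DATE_LEN] = "" and char olddate[DATE_LEN] = "" keeps the two sizes in step and gives the first strcmp() a defined value to compare against.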

farmerdoug
Oct 2, 2010, 05:25 PM
1. Did you run the failing program on the posted data or not? If so, what was your result?

It did not always return the right information, i.e. it failed as before.

2. What is this "grep from X11" you tried? Be specific. If you mean you ran the command 'grep' in the X11 shell window, then that's the same grep you get in Terminal.app. You can tell by typing the command *which grep* in both windows (Terminal.app and X11.app) and noting the identical pathnames.


I ran /usr/bin/grep from the shell, then put that path explicitly in my code. It still failed in the code.

3. What program did you run with GuardMalloc? There are at least two versions of the program you've posted about. The first one, whose code you posted in post #1, and the second one mentioned in post #12, whose code has not been posted.

the short code I posted
4. What data did you use for the run with GuardMalloc? The short posted data, which is known to work, or the long unposted data, which you say fails?
unposted data

both, both worked


5. If you ran the posted version of the program with GuardMalloc, what were you expecting? Some of the latent buffer-overflow issues I pointed out are in stack-resident arrays (i.e. C's auto storage class), so if those are overrun then GuardMalloc would not be expected to detect that, since those arrays aren't malloc'ed in the first place. The specific arrays with no bounds-checking, and thus latent overflow potential, are:
Code:
---------
newsymbol[5]
date[14]
olddate[12]
---------
I have no idea why date's length is 14, yet olddate's length is 12, since olddate is expected to hold a copy of the contents of date:

Code:
---------
strcpy(olddate, date);
---------
Perhaps you are expecting C to complain about this, or to perform bounds-checking for you. This could not be more wrong.
***************

Code now reads.


grep = (char*)calloc(100, sizeof(char));
newsymbol = (char*)calloc(20, sizeof(char));
date = (char*)calloc(20, sizeof(char));
olddate = (char*)calloc(20, sizeof(char));
buf = (char*)calloc(1000, sizeof(char));
arr = (char **)calloc(100, sizeof(char*));

symbol = calloc(100,sizeof(char));

I hope to get some clue as to what was up.

chown33
Oct 2, 2010, 06:14 PM
In order to get a sense of the unposted data's scale, please run the following command, and post the output.
tr ',' ' ' </users/doug/andy/data/IV/IV_surface_200_1.txt | wc
Also post the output of this command:
tr ',' ' ' </users/doug/andy/data/IV/opens/AAPL.txt | wc
Also post the source of the new program you mentioned in post #12.

farmerdoug
Oct 3, 2010, 07:37 AM
Had trouble with your command.
ran just wc in case that helps
working code is below
bash-3.2$ tr ','''</users/doug/andy/data/IV/IV_surface_200_1.txt | wc
usage: tr [-Ccsu] string1 string2
tr [-Ccu] -d string1
tr [-Ccu] -s string1
tr [-Ccu] -ds string1 string2
0 0 0
bash-3.2$ wc /users/doug/andy/data/IV/IV_surface_200_1.txt
2230656 2230656 125347876 /users/doug/andy/data/IV/IV_surface_200_1.txt
bash-3.2$ wc /users/doug/andy/data/IV/opens/aapl.txt
686 686 38017 /users/doug/andy/data/IV/opens/aapl.txt



Here's code that just reads from the file.


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>


#define FEE 0.015
#define REVERSE 1
#define REV 0
#define PRINTMAX 0
#define PRINTMIN 0
#define DELTADAYS 2


void parse( char *record, char *delim, char**tarr);


int main (int argc, const char * argv[]) {
    // insert code here...
    FILE *voldata;//, *stocks;
    char *symbol, *newsymbol,*oldsymbol,*date, **arr;
    int i, j, k, l, p, num = -1, daynum = -1, maxdays = 0;// numsymbols = 0;
    float price;
    FILE *corncob;
    char *buf, *grep;
    grep = (char*)calloc(100, sizeof(char));
    buf = (char*)calloc(200, sizeof(char));
    arr = (char **)calloc(100, sizeof(char*));
    newsymbol = (char*)calloc(7, sizeof(char));
    oldsymbol = (char*)calloc(7, sizeof(char));
    date = (char*)calloc(14, sizeof(char));
    symbol = calloc(100,sizeof(char));
    if ((voldata = fopen("/users/doug/andy/data/IV/IV_surface_200_1.txt", "r")) == NULL)
    {
        printf("couldn't open file\n");
        return(0);
    }

    strcpy(oldsymbol, "\0");
    while ( (fscanf(voldata,"%s\n",symbol)) != EOF)
    {
        i = 0;
        // printf("%s\n", symbol);
        while (symbol[i] != ',')
        {
            newsymbol[i] = symbol[i];
            i++;
        }
        newsymbol[i] = '\0';

        if (strcmp(newsymbol,oldsymbol) != 0)
        {
            strcpy(grep, "/users/doug/andy/data/IV/opens/");
            strcpy(oldsymbol,newsymbol);
            strcat(grep, oldsymbol);
            strcat(grep, ".txt");
            if( (corncob = fopen(grep, "r")) != NULL)
            {
                fscanf(corncob, "%s\n", buf);
                // printf("here %s\n",buf);
                fscanf(corncob, "%s\n", buf);
                // printf("%s\n",buf);
            }
            else
                printf("%s\n", grep);
        }
        while (symbol[++i] != ',');
        i++;
        j = 0;
        while (symbol[i] != ',')
        {
            date[j] = symbol[i];
            i++;
            j++;
        }
        date[j] = '\0';
        if(corncob != NULL)
            if((fscanf(corncob, "%s\n", buf) != EOF))
            {
                parse(buf,",", arr);

                price = atof(arr[2]); // note date is every other element
                printf("%s %s %f\n\n", oldsymbol,date, atof(arr[2]));
            }
            else // binds to the inner if: runs when fscanf returns EOF
                fclose(corncob);



        if (num > 0)
        {
            for(p = 0; p < 1000; p ++)

                if(0)
                {
                    p = 10000;
                    for (i = 0; i < 3; i++)
                        fscanf(voldata,"%s\n",symbol);
                    fscanf(voldata,"%s\n",symbol);
                    fscanf(voldata,"%s\n",symbol);
                    fscanf(voldata,"%s\n",symbol);
                    fscanf(voldata,"%s\n",symbol);
                    for (i = 0; i < 4; i++)
                        fscanf(voldata,"%s\n",symbol);
                    for (i = 0; i < 4; i++)
                        fscanf(voldata,"%s\n",symbol);
                    fscanf(voldata,"%s\n",symbol);
                    for (i = 0; i < 2; i++)
                        fscanf(voldata,"%s\n",symbol);
                    fscanf(voldata,"%s\n",symbol);
                    for (i = 0; i < 4; i++)
                        fscanf(voldata,"%s\n",symbol);
                    for (i = 0; i < 72; i++) // read other time frames
                        fscanf(voldata,"%s\n",symbol);
                } //if same date
                else
                {

                }
        }
        else
        {

            for (i = 0; i < 3; i++)
                fscanf(voldata,"%s\n",symbol);
            fscanf(voldata,"%s\n",symbol);
            fscanf(voldata,"%s\n",symbol);
            fscanf(voldata,"%s\n",symbol);
            fscanf(voldata,"%s\n",symbol);
            for (i = 0; i < 4; i++)
                fscanf(voldata,"%s\n",symbol);
            for (i = 0; i < 4; i++)
                fscanf(voldata,"%s\n",symbol);
            fscanf(voldata,"%s\n",symbol);
            for (i = 0; i < 2; i++)
                fscanf(voldata,"%s\n",symbol);
            fscanf(voldata,"%s\n",symbol);
            for (i = 0; i < 4; i++)
                fscanf(voldata,"%s\n",symbol);
            for (i = 0; i < 72; i++) // read other time frames
                fscanf(voldata,"%s\n",symbol);
            daynum++;
        } //if same date
    }

    // fclose(stocks);
    return 0;

}
#define element 12
void parse( char *record, char *delim, char**tarr)

{

    char *p;
    int i,fld=0;
    if ( (p = (char *) calloc(element, sizeof(char))) == NULL)
        printf("no memory for p");
    for (i = 0; i < 100; i ++)
    {
        if( (tarr[i] = (char*)calloc(element, sizeof(char))) == NULL)
            printf("no memory allocated for arr\n");
    }

    p = strtok(record,delim);

    while(p)
    {
        strcpy(tarr[fld],p);

        fld++;
        p=strtok(NULL,delim);
    }


}

chown33
Oct 3, 2010, 12:41 PM
Is there something wrong with copy and paste on your machine?

The malfunctioning code you posted is this:
tr ','''</users/doug/andy/data/IV/IV_surface_200_1.txt | wc

Unfortunately, that's not what I posted, which is this:
tr ',' ' ' </users/doug/andy/data/IV/IV_surface_200_1.txt | wc

There is a space between the quoted comma and the subsequent quoted single space. These spaces are crucial, because the translation is from comma to space, so there needs to be a separate argument of a single quoted space.

Anyway, it does tell me that the file is about 2 million lines. So maybe the problem is related to the sheer size of the file.

For example, the parse() function always calloc's 100 new elements, storing the pointers in its tarr arg. These are never freed, though, so over the course of 2 million lines, that kind of leak could become a significant problem. That's just a guess. There may be other such problems that somehow lead popen() to malfunction, simply because of the scale of the data. I see no practical way to analyze and debug it except by having data of that scale.
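
For comparison, here is a sketch of a splitter that allocates nothing at all, so there is nothing to leak. The name split_fields is hypothetical. Like the posted parse(), it uses strtok(), which modifies the record in place and treats a run of adjacent delimiters as one, so empty fields are skipped.

```c
#include <string.h>

/* Hypothetical replacement for parse(): splits `record` in place at any
 * character in `delim` and stores up to `max` pointers to the pieces in
 * `fields`. The pointers refer into `record` and are valid only as long
 * as `record` is not overwritten. Returns the number of fields stored. */
int split_fields(char *record, const char *delim, char **fields, int max)
{
    int n = 0;
    char *p = strtok(record, delim);

    while (p != NULL && n < max) {
        fields[n++] = p;
        p = strtok(NULL, delim);   /* NULL continues with the same string */
    }
    return n;
}
```

The caller can declare char *fields[8] on the stack and check that the return value is at least 3 before reading fields[2], which also covers the case of a line with fewer commas than expected.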

farmerdoug
Oct 3, 2010, 02:03 PM
bash-3.2$ tr ',' ' '</users/doug/andy/data/IV/opens/Aapl.txt | wc
686 5488 38017
bash-3.2$ tr ',' ' '</users/doug/andy/data/IV/IV_surface_200_1.txt | wc
2230656 20075904 125347876
bash-3.2$

The larger file is not the one it's having trouble with.

chown33
Oct 3, 2010, 02:30 PM
The larger file is not the one it's having trouble with.

Then post the entire AAPL.txt file attached as a zip file. The one you posted previously is only 24 lines.


FWIW, I expanded the previously posted IV_surface_200_1.txt (post #7, list.txt) until it was 2.3 million lines of data, by cat'ing it onto itself multiple times. When the originally posted program (post #1) is run on this large file, and also using the 24-line AAPL.txt (post #7), it runs fine: there are no unexpected 'grep failed' messages, only the expected one shown below. This is without any attempt to fix any memory leaks or potential buffer overflows.

The expected 'grep failed' is this:
grep 02/05/2008 /users/doug/andy/data/IV/opens/AAPL.txt
grep failed


EDIT:
What OS version is it failing on?

What version of gcc? Post the output from this command:
gcc --version

farmerdoug
Oct 4, 2010, 09:26 AM
Will answer your previous post after I get home. It seems an earlier attempt to post didn't make it. I added a delay to the code; the code no longer failed.

for (i =0; i < 1000000; i++);

after popen, fgets (not sure)

antibact1
Oct 4, 2010, 10:18 AM
Just out of curiosity, why are you using C for this at all? Try using a scripting language like Perl or Python, or if you need to use C, try the libc regular expression library. You could probably do what you want here with a 20 line Perl script.
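
As an illustration of the regular expression route in C, the sketch below searches an already opened file with the POSIX <regex.h> functions instead of starting a grep process. The helper name find_first_match is hypothetical, and the pattern is treated as a POSIX basic regular expression, the same default syntax grep uses.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: scans `fp` line by line and leaves the first line
 * that matches `pattern` in `buf`. Returns 1 if a line matched, 0 if none
 * did (buf is then empty), -1 if the pattern could not be compiled. */
int find_first_match(FILE *fp, const char *pattern, char *buf, int size)
{
    regex_t re;
    int found = 0;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;

    while (fgets(buf, size, fp) != NULL) {
        if (regexec(&re, buf, 0, NULL, 0) == 0) {
            found = 1;
            break;
        }
    }
    regfree(&re);

    if (!found)
        buf[0] = '\0';
    return found;
}
```

For a fixed date string a plain strstr() or strncmp() on each line would do just as well. Lines longer than size - 1 characters are seen by fgets() in pieces, so buf should be comfortably larger than the longest expected line.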

farmerdoug
Oct 4, 2010, 11:12 AM
You are seeing a small part of a much longer code.

chown33
Oct 4, 2010, 11:48 AM
Will answer your previous post after I get home. It seems an earlier attempt to post didn't make it. I added a delay to the code; the code no longer failed.

for (i =0; i < 1000000; i++);

after popen, fgets (not sure)

Accuracy is important. So post the exact code when you can do so, rather than guess at it when unsure. And in general, posting complete code is better than posting code fragments that have to be manually edited or combined to produce a whole.

And in addition to my earlier question about OS version and gcc version, please identify the CPU architecture by choosing "About This Mac" from the Apple menu, and posting the exact value listed after Processor.

farmerdoug
Oct 4, 2010, 02:34 PM
Definitely a timing issue. Calling popen starts a process that runs on its own clock. It needs to be synced with the launching code.

Thanks.

chown33
Oct 4, 2010, 04:24 PM
Definitely a timing issue. Calling popen starts a process that runs on its own clock. It needs to be synced with the launching code.

Excessive brevity hinders intelligibility. Expand?


If you need a real-time delay, don't use an empty loop. For one thing, the optimizer could remove it.

To delay for a fixed time, see the usleep() C function. Read its man page:
man usleep
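
A minimal sketch of a fixed delay built on usleep() follows. The wrapper name pause_ms is hypothetical. Some systems reject usleep() arguments of 1,000,000 microseconds or more, so whole seconds are handed to sleep() instead.

```c
#include <unistd.h>

/* Hypothetical helper: pauses the caller for roughly `ms` milliseconds.
 * Returns 0 on success, -1 if the pause was cut short or failed. */
int pause_ms(unsigned int ms)
{
    /* sleep() returns the number of unslept seconds if interrupted. */
    if (ms >= 1000 && sleep(ms / 1000) != 0)
        return -1;

    /* usleep() takes microseconds and returns 0 on success, -1 on error. */
    return usleep((ms % 1000) * 1000);
}
```

Unlike an empty counting loop, this cannot be optimized away and does not depend on CPU speed. A fixed delay is still only a workaround: a read from a popen() stream normally blocks until the child writes something or closes its end of the pipe, so if a delay changes the outcome, it is worth checking ferror() and errno after a failed fgets() to see what actually happened.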