Mac Text Parser: Text to MySQL Database

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
Oh hey,

I need to parse text from a text file in this form:

06/07/07 02:44:29.966 INFO Servlet.Engine.Transports : 0 accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2

To be split accordingly:

06/07/07 / 02:44:29.966 / INFO /Servlet.Engine.Transports : 0 / accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2

Into a 5 column HTML table with the fields Date, Time, Level, ID, and Info. These columns are seperated above with the bold shashies.

Any ideas? PERL isn't neccesary, it is just what I was thinking of using. Even a start would be apprectiated, you don't need to give me a full working script or anything.

Thanks for looking!
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
Use regexes. AGRegex is the best I've found (and it has my initials :cool: )

EDIT: Oh, sorry, I didn't ask which language you are using :\
I am trying to use PERL, but really at this point I would try anything. After I do this, my next step is to search these fields, so regexes will deffinitely be useful.
 

stndn

macrumors member
Oct 22, 2006
79
0
earth
Here's something to get you started:

Code:
#!/usr/bin/perl -w

my $sampleText = "06/07/07 02:44:29.966 INFO Servlet.Engine.Transports : 0 accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2";

$sampleText =~ m|^([^\s]+)\s+([^\s]+)\s+([^\s]+)\s+(.+?)(accept.*)$|i;

print "$1 / $2 / $3 / $4 / $5\n";

Not the prettiest sight and most likely unefficient, but at least it should give you some ideas.


-stndn.
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
Thanks for the sample. I should probably admit I am a complete PERL noob, so how would I go by making this work. Do I just make a text file and ad ".pl" to the end of it? What program would I use to run the script?

If PERL isn't the best choice for this task, could someone reccomend a better language to do this in?
 

iBert

macrumors regular
Jul 14, 2004
148
0
Perl is a pretty good scripting language specially for parsing text. At least in my opinion.

When are you going to need the parser to be called? Is it like a web service (just to say something) or are you manipulating something in C, C++, Java, etc and need to call this parser?
 

stndn

macrumors member
Oct 22, 2006
79
0
earth
I thought you wanted to have something in Perl, and then I realized you mentioned that Perl is not necessary in your initial post ,p

iBert has the right questions, which I'm going to add to: Is this text parsing a one-time deal? Or are you going to run the script every time you want to show the output in other format?

If it's a one time deal, then you can take the Perl script as your base, make some changes, save it as filename.pl, and run the script through perl parser (one of which can be done from command line / terminal by typing: perl filename.pl).

If it's an ongoing parsing (done per request), maybe change it to something like PHP or whatnot. Not that Perl can't be used for web, but PHP can be mixed with HTML more easily / intuitively than Perl, which seems like what you're looking for (add HTML to the output).

-stndn.
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
I thought you wanted to have something in Perl, and then I realized you mentioned that Perl is not necessary in your initial post ,p

iBert has the right questions, which I'm going to add to: Is this text parsing a one-time deal? Or are you going to run the script every time you want to show the output in other format?

If it's a one time deal, then you can take the Perl script as your base, make some changes, save it as filename.pl, and run the script through perl parser (one of which can be done from command line / terminal by typing: perl filename.pl).

If it's an ongoing parsing (done per request), maybe change it to something like PHP or whatnot. Not that Perl can't be used for web, but PHP can be mixed with HTML more easily / intuitively than Perl, which seems like what you're looking for (add HTML to the output).

-stndn.
This will be ongoing, per user request, so maybe PHP would be better? I am thinking of parsing the data to a MySQl database, though I am not sure right now if my company supports it.

So should I ditch PERL and go with PHP? I do know that PHP and MySQL work well together (as far as I can remember)
 

pengu

macrumors 6502a
Mar 20, 2005
576
0
Diddily Daddily...
PHP won't have an issue doing it (neither will Perl) but as stated, PHP is more aimed at mixing with/generating HTML.

You didn't really specify what parts (other than date/time) are likely to change? it APPEARS to be some kind of serverlog, relating to maybe Tomcat or BES or some other JSP/J2EE related server?

If you could give some examples of what other values you might get, we can probably help with a more accurate/specific/efficient method.
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
Here is a chunk of a log file:

Code:
06/07/07 02:44:27.585 INFO Servlet.Engine.Transports : 0 Type: TEST
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Site User: QA_BPORTAL
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Product:ProductId: Broker Portal Name: Broker Portal
06/07/07 02:44:27.588 INFO Servlet.Engine.Transports : 0 Retrieving user ent list
06/07/07 02:44:27.619 INFO Servlet.Engine.Transports : 0 Retrieve ent names for user: 3771000016, contract: qa_bportal
06/07/07 02:44:28.064 INFO Servlet.Engine.Transports : 0 ibg.menu.db.UserEntListDBBuilder statistics:  proc = pk_portal.fn_get_ent_nm_by_usr_ent_typ( ?,?,?,?,? ) connMillis = 7 prepareMillis = 16 executeMillis = 75 processRowsMillis = 117 postProcessMillis = 0 subClassLogInfo = 
06/07/07 02:44:28.067 INFO Servlet.Engine.Transports : 0 Linking site attributes
06/07/07 02:44:28.075 INFO Servlet.Engine.Transports : 0 Storing ent set for BROK back to user
06/07/07 02:44:28.832 INFO Servlet.Engine.Transports : 0 ibg.db.user.UserProductDBBuilder statistics:  proc = ESDBO.PK_PORTAL.FN_GET_USER_PRODUCT_INFO( ?,?,?,? ) connMillis = 7 prepareMillis = 1 executeMillis = 5 processRowsMillis = 1 postProcessMillis = 0 subClassLogInfo = 
06/07/07 02:44:28.887 INFO Servlet.Engine.Transports : 0 ibg.db.user.UserCompCheckDBBuilder statistics:  proc = ESDBO.PK_ADMIN_USER.FN_CHECK_PARENT_COMPANY(?,?,?) connMillis = 4 prepareMillis = 1 executeMillis = 14 processRowsMillis = 0 postProcessMillis = 33 subClassLogInfo = 
06/07/07 02:44:28.902 INFO Servlet.Engine.Transports : 0 ibg.db.user.UserCompCheckDBBuilder statistics:  proc = ESDBO.PK_ADMIN_USER.FN_CHECK_PARENT_COMPANY(?,?,?) connMillis = 6 prepareMillis = 1 executeMillis = 4 processRowsMillis = 0 postProcessMillis = 1 subClassLogInfo = 
06/07/07 02:44:28.910 INFO Servlet.Engine.Transports : 0 POST /XML_IBGZP_A HTTP/1.1
06/07/07 02:44:28.911 INFO Servlet.Engine.Transports : 0 content-type: application/xml
06/07/07 02:44:28.911 INFO Servlet.Engine.Transports : 0 connection: keep-alive
06/07/07 02:44:28.912 INFO Servlet.Engine.Transports : 0 accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
06/07/07 02:44:28.912 INFO Servlet.Engine.Transports : 0 host: Unknown
06/07/07 02:44:28.913 INFO Servlet.Engine.Transports : 0 content-length: 302
06/07/07 02:44:28.913 INFO Servlet.Engine.Transports : 0 user-agent: HttpConnection
06/07/07 02:44:28.914 INFO Servlet.Engine.Transports : 0 
06/07/07 02:44:28.915 INFO Servlet.Engine.Transports : 0 <?xml version="1.0" ?>
06/07/07 02:44:28.915 INFO Servlet.Engine.Transports : 0 <FBIBLV_REQUEST>
06/07/07 02:44:28.916 INFO Servlet.Engine.Transports : 0     <PSL_VIEW>FBIBLV.VIEW</PSL_VIEW>
06/07/07 02:44:28.917 INFO Servlet.Engine.Transports : 0     <FBIBLVI>
06/07/07 02:44:28.917 INFO Servlet.Engine.Transports : 0         <PRODUCT_ID>Z</PRODUCT_ID>
06/07/07 02:44:28.918 INFO Servlet.Engine.Transports : 0         <client_group_id>xxxxxxxxxx</client_group_id>
06/07/07 02:44:28.918 INFO Servlet.Engine.Transports : 0         <portal_id>xxxxxxxxxx</portal_id>
06/07/07 02:44:28.919 INFO Servlet.Engine.Transports : 0         <incl_rules_table>N</incl_rules_table>
06/07/07 02:44:28.920 INFO Servlet.Engine.Transports : 0     </FBIBLVI>
06/07/07 02:44:28.920 INFO Servlet.Engine.Transports : 0 </FBIBLV_REQUEST>
Basically this program I am writing is going to be a search. Each day, this log file will be parsed somewhere to a database, so a user can log on to a site, search via a field (such as date, time, etc.) and get a list of results. The database does not need to be viewable to the user, only their search results.

At this point, I am thinking of parsing with PHP, making a MySQL database to hold the data, and searching the database with PHP.

Any ideas, help, guidance, whatever is greatly appreciated.

Edit: If PHP and MySQL end up being the languages I use, I would need a testing server, right? I can't just test these scripts on my computer?
 

bobber205

macrumors 68020
Nov 15, 2005
2,182
0
Oregon
Here is a chunk of a log file:

Code:
06/07/07 02:44:27.585 INFO Servlet.Engine.Transports : 0 Type: TEST
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Site User: QA_BPORTAL
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Product:ProductId: Broker Portal Name: Broker Portal
06/07/07 02:44:27.588 INFO Servlet.Engine.Transports : 0 Retrieving user ent list
06/07/07 02:44:27.619 INFO Servlet.Engine.Transports : 0 Retrieve ent names for user: 3771000016, contract: qa_bportal
06/07/07 02:44:28.064 INFO Servlet.Engine.Transports : 0 ibg.menu.db.UserEntListDBBuilder statistics:  proc = pk_portal.fn_get_ent_nm_by_usr_ent_typ( ?,?,?,?,? ) connMillis = 7 prepareMillis = 16 executeMillis = 75 processRowsMillis = 117 postProcessMillis = 0 subClassLogInfo = 
06/07/07 02:44:28.067 INFO Servlet.Engine.Transports : 0 Linking site attributes
06/07/07 02:44:28.075 INFO Servlet.Engine.Transports : 0 Storing ent set for BROK back to user
06/07/07 02:44:28.832 INFO Servlet.Engine.Transports : 0 ibg.db.user.UserProductDBBuilder statistics:  proc = ESDBO.PK_PORTAL.FN_GET_USER_PRODUCT_INFO( ?,?,?,? ) connMillis = 7 prepareMillis = 1 executeMillis = 5 processRowsMillis = 1 postProcessMillis = 0 subClassLogInfo = 
06/07/07 02:44:28.887 INFO Servlet.Engine.Transports : 0 ibg.db.user.UserCompCheckDBBuilder statistics:  proc = ESDBO.PK_ADMIN_USER.FN_CHECK_PARENT_COMPANY(?,?,?) connMillis = 4 prepareMillis = 1 executeMillis = 14 processRowsMillis = 0 postProcessMillis = 33 subClassLogInfo = 
06/07/07 02:44:28.902 INFO Servlet.Engine.Transports : 0 ibg.db.user.UserCompCheckDBBuilder statistics:  proc = ESDBO.PK_ADMIN_USER.FN_CHECK_PARENT_COMPANY(?,?,?) connMillis = 6 prepareMillis = 1 executeMillis = 4 processRowsMillis = 0 postProcessMillis = 1 subClassLogInfo = 
06/07/07 02:44:28.910 INFO Servlet.Engine.Transports : 0 POST /XML_IBGZP_A HTTP/1.1
06/07/07 02:44:28.911 INFO Servlet.Engine.Transports : 0 content-type: application/xml
06/07/07 02:44:28.911 INFO Servlet.Engine.Transports : 0 connection: keep-alive
06/07/07 02:44:28.912 INFO Servlet.Engine.Transports : 0 accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
06/07/07 02:44:28.912 INFO Servlet.Engine.Transports : 0 host: Unknown
06/07/07 02:44:28.913 INFO Servlet.Engine.Transports : 0 content-length: 302
06/07/07 02:44:28.913 INFO Servlet.Engine.Transports : 0 user-agent: HttpConnection
06/07/07 02:44:28.914 INFO Servlet.Engine.Transports : 0 
06/07/07 02:44:28.915 INFO Servlet.Engine.Transports : 0 <?xml version="1.0" ?>
06/07/07 02:44:28.915 INFO Servlet.Engine.Transports : 0 <FBIBLV_REQUEST>
06/07/07 02:44:28.916 INFO Servlet.Engine.Transports : 0     <PSL_VIEW>FBIBLV.VIEW</PSL_VIEW>
06/07/07 02:44:28.917 INFO Servlet.Engine.Transports : 0     <FBIBLVI>
06/07/07 02:44:28.917 INFO Servlet.Engine.Transports : 0         <PRODUCT_ID>Z</PRODUCT_ID>
06/07/07 02:44:28.918 INFO Servlet.Engine.Transports : 0         <client_group_id>xxxxxxxxxx</client_group_id>
06/07/07 02:44:28.918 INFO Servlet.Engine.Transports : 0         <portal_id>xxxxxxxxxx</portal_id>
06/07/07 02:44:28.919 INFO Servlet.Engine.Transports : 0         <incl_rules_table>N</incl_rules_table>
06/07/07 02:44:28.920 INFO Servlet.Engine.Transports : 0     </FBIBLVI>
06/07/07 02:44:28.920 INFO Servlet.Engine.Transports : 0 </FBIBLV_REQUEST>
Basically this program I am writing is going to be a search. Each day, this log file will be parsed somewhere to a database, so a user can log on to a site, search via a field (such as date, time, etc.) and get a list of results. The database does not need to be viewable to the user, only their search results.

At this point, I am thinking of parsing with PHP, making a MySQL database to hold the data, and searching the database with PHP.

Any ideas, help, guidance, whatever is greatly appreciated.

Edit: If PHP and MySQL end up being the languages I use, I would need a testing server, right? I can't just test these scripts on my computer?
MySql and PHP can be installed locally. You can reach your server by going to locahost:80

I just checked my apps folder and I can't seem to find the app that does this server. Do a quick "PHP and MYSQL server for OS X" search on google.

EDIT: I tried that search and I can't find it. Damn.
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
MySql and PHP can be installed locally. You can reach your server by going to locahost:80

I just checked my apps folder and I can't seem to find the app that does this server. Do a quick "PHP and MYSQL server for OS X" search on google.

EDIT: I tried that search and I can't find it. Damn.
I am actually stuck at work on this XP box, which might make looking for something like that easier. I'll poke around google for something.
Does anyone know how I could do this on a windows machine? I am coming up short here.
 

macfaninpdx

macrumors regular
Mar 6, 2007
198
0
Does anyone know how I could do this on a windows machine? I am coming up short here.
You could either install them separately, by going to their respective sites (mySQL and PHP), or you could use packaged solution such as XAMPP, which will install Apache, MySQL, PHP & PEAR, and PERL along with several other useful utilities.

The advantage to installing them individually is that you only install the pieces you need. The advantage to using something like XAMPP is that it gives you administration utilities as well.
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
You could either install them separately, by going to their respective sites (mySQL and PHP), or you could use packaged solution such as XAMPP, which will install Apache, MySQL, PHP & PEAR, and PERL along with several other useful utilities.

The advantage to installing them individually is that you only install the pieces you need. The advantage to using something like XAMPP is that it gives you administration utilities as well.
Awesome, thanks. That program is exactly what I needed.

Now back to the original problem...:(
 

macfaninpdx

macrumors regular
Mar 6, 2007
198
0
I need to parse text from a text file in this form:

06/07/07 02:44:29.966 INFO Servlet.Engine.Transports : 0 accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2

To be split accordingly:

06/07/07 / 02:44:29.966 / INFO /Servlet.Engine.Transports : 0 / accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2

Into a 5 column HTML table with the fields Date, Time, Level, ID, and Info. These columns are seperated above with the bold shashies.
If you use PHP for you original question, you could try something like the following:
PHP:
$logfile = file("\\path\\to\\your\\file");

foreach ($logfile AS $linenum => $line) {
    $logdate = substr ($line, 0, 8);
    $logtime = substr ($line, 9, 12);
    $level = substr ($line, 22, 4);
    $id = substr ($line, 27, 29);
    $info = rtrim (substr ($line, 57) );
}
Doing it this way (with substr) means that the date, time, level and id must be the same exact length every time, or it will break the line in the wrong spot. So a RegEx expression would definitely be more robust, but since I do not know regex expressions very well :eek:, this is another solution.
 

pengu

macrumors 6502a
Mar 20, 2005
576
0
Diddily Daddily...
firstly. i suggest you install apache2, php5 and mysql5 seperately. there are very easy to follow tutorials, and you aren't likely to need to mess with them much once its set up.

as for the log. i'll have a look @ it tomorrow while im at work (its 1.30am here and wayyy too late to be thinking technically) and see if i can offer a basic solution for extracting/splitting the string values.
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
firstly. i suggest you install apache2, php5 and mysql5 seperately. there are very easy to follow tutorials, and you aren't likely to need to mess with them much once its set up.
I would do this, but getting programs installed on these work computers is a huge hassle. The all in one solution would be much easier and less time consuming to get installed on this box.
 

macfaninpdx

macrumors regular
Mar 6, 2007
198
0
I just thought of one more solution to splitting the log file entries, although it still may not be as efficient as a properly written regex:
PHP:
$logfile = file("\\path\\to\\your\\file"); 

foreach ($logfile AS $linenum => $line) { 
    //explode line using space delimiter
    list ($logdate, $logtime, $level, $id1, $id2, $id3, $info) = explode (" ", rtrim( $line ), 7);
    //combine id variables into one
    $id = implode (" ", array ($id1, $id2, $id3) );
}
Since the explode function is using a space character to split the line, the ID field was split into three parts. I used the implode line to combine them again. Also, since there is a limit of 7 on the explode function, the remaining info in the INFO field will all be kept together (not separated).
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
If you use PHP for you original question, you could try something like the following:
PHP:
$logfile = file("\\path\\to\\your\\file");

foreach ($logfile AS $linenum => $line) {
    $logdate = substr ($line, 0, 8);
    $logtime = substr ($line, 9, 12);
    $level = substr ($line, 22, 4);
    $id = substr ($line, 27, 29);
    $info = rtrim (substr ($line, 57) );
}
Doing it this way (with substr) means that the date, time, level and id must be the same exact length every time, or it will break the line in the wrong spot. So a RegEx expression would definitely be more robust, but since I do not know regex expressions very well :eek:, this is another solution.
I just thought of one more solution to splitting the log file entries, although it still may not be as efficient as a properly written regex:
PHP:
$logfile = file("\\path\\to\\your\\file"); 

foreach ($logfile AS $linenum => $line) { 
    //explode line using space delimiter
    list ($logdate, $logtime, $level, $id1, $id2, $id3, $info) = explode (" ", rtrim( $line ), 7);
    //combine id variables into one
    $id = implode (" ", array ($id1, $id2, $id3) );
}
Since the explode function is using a space character to split the line, the ID field was split into three parts. I used the implode line to combine them again. Also, since there is a limit of 7 on the explode function, the remaining info in the INFO field will all be kept together (not separated).
Thanks for these! I think I am going to try to use the second as a starting point for now. Unfortunately I can't test anything here at work because they have yet to supply me with a server or let me install any programs, and I can't access my school's testing servers because work blocks them.
 

stndn

macrumors member
Oct 22, 2006
79
0
earth
I've just seen the sample log files you provided, and it's safe to say that the example code with regex won't work because I used the keyword 'accept' as one of the separators towards the end.

The key with text parsing is identifying consistent section(s) of the texts you are trying to parse.

From your example log, it's safe to assume that the part that will be consistent is the timestamp. It's not clear yet (to us) whether the texts 'INFO Servlet.Engine.Transports : 0' will always be there as well. If yes, then it can be used as separator if you are going through the regex way.

If you are using the substr () or explode () way per what's given by macfaninpdx, then you need to make sure that the texts and spacings are consistent among different parts of the log.

Otherwise, you might consider mixing regex and substr/explode as well.

-stndn.
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
I've just seen the sample log files you provided, and it's safe to say that the example code with regex won't work because I used the keyword 'accept' as one of the separators towards the end.

The key with text parsing is identifying consistent section(s) of the texts you are trying to parse.

From your example log, it's safe to assume that the part that will be consistent is the timestamp. It's not clear yet (to us) whether the texts 'INFO Servlet.Engine.Transports : 0' will always be there as well. If yes, then it can be used as separator if you are going through the regex way.

If you are using the substr () or explode () way per what's given by macfaninpdx, then you need to make sure that the texts and spacings are consistent among different parts of the log.

Otherwise, you might consider mixing regex and substr/explode as well.

-stndn.
I don't know anything about regex, can you give an example?
 

krunk

macrumors regular
Jan 29, 2004
236
0
I don't know anything about regex, can you give an example?
Regex is extremely useful, but if your learning can be very time consuming ironing out the special cases and honing your regex.

If your doing this to learn how to process files, regex is a great thing to pick up toward that end. If your doing it out of some need and getting it done, done right, and done quickly is your primary concern. There are other methods with a lower learning curve...like below.

========================Ruby Code================
file = File.new('/path/to/file')
html = Fie.new('/path/to/html')

html.puts "<table>"
html.puts "<tr>"
html.puts "<th>DATE</th>"
html.puts "<th>TIME</th>"
html.puts "<th>LEVEL</th>"
html.puts "<th>ID</th>"
html.puts "<th>INFO</th>"
html.puts "</tr>"

file.readlines.each do |line|
l = line.split # l => ["06/07/07", "02:44:29.966", "INFO", "Servlet.Engine.Transports", ":", "0", "accept:", "text/html,", "image/gif,", "image/jpeg,", "*;", "q=.2,", "*/*; q=.2"]

date = l[0]
time = l[1]
level = l[2]
id = l[3] << l[3] << l[4]
info = l[4..l.length]

html.puts "<tr>"
html.puts "<td>#{date}</td>"
html.puts "<td>#{time}</td>"
html.puts "<td>#{level}</td>"
html.puts "<td>#{id}</td>"
html.puts "<td>#{info}</td>"
html.puts "</tr>"
end

html.puts "</table>"
html.close


==============================

That's ruby, the # beside the line.split is just a comment showing contents of the returned array. By default, split() separates a line by spaces into a new array.
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
I've just seen the sample log files you provided, and it's safe to say that the example code with regex won't work because I used the keyword 'accept' as one of the separators towards the end.

The key with text parsing is identifying consistent section(s) of the texts you are trying to parse.

From your example log, it's safe to assume that the part that will be consistent is the timestamp. It's not clear yet (to us) whether the texts 'INFO Servlet.Engine.Transports : 0' will always be there as well. If yes, then it can be used as separator if you are going through the regex way.

If you are using the substr () or explode () way per what's given by macfaninpdx, then you need to make sure that the texts and spacings are consistent among different parts of the log.

Otherwise, you might consider mixing regex and substr/explode as well.

-stndn.
I am unclear on this also. Let me look into it and get back to you.
 

stndn

macrumors member
Oct 22, 2006
79
0
earth
The first thing to note before using regex for anything is to make sure you can make a pattern out of what you are trying to process. Without a fixed pattern (regardless of how simple or complex it is), regex won't work. Actually, this is true for any form of (mostly text) parsing.

To learn more about regex, you may look at http://www.regular-expressions.info/.

It's not an easy learning experience, but it's worth it if you have time. Otherwise, start with simple (temporary) solution using explode() or other methods.


-stndn.
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
From your example log, it's safe to assume that the part that will be consistent is the timestamp. It's not clear yet (to us) whether the texts 'INFO Servlet.Engine.Transports : 0' will always be there as well. If yes, then it can be used as separator if you are going through the regex way.



-stndn.
OK I looked into this further, the parts with a constant size include the date, time, and the "Servlet.Engine.Transports : ". The number after that can be anywhere from one to three digits. Every log file will have those three things being the same number of digits. Does that help at all?

I put another chunk for you to take a look at for easy reference.

Code:
06/07/07 02:44:27.585 INFO Servlet.Engine.Transports : 0 Type: TEST
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Site User: QA_BPORTAL
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Product:ProductId: Broker Portal Name: Broker Portal
06/07/07 02:44:27.588 INFO Servlet.Engine.Transports : 0 Retrieving user ent list
06/07/07 02:44:27.619 INFO Servlet.Engine.Transports : 0 Retrieve ent names for user: 3771000016, contract: qa_bportal
The red parts are of constant size, the size of the part in bold blue ranges largely, but is always the end of the string:

06/07/07 02:44:27.619 INFO Servlet.Engine.Transports : 0 Retrieve ent names for user: 3771000016, contract: qa_bportal