PDA

View Full Version : Text Parser: Text to MySQL Database




barr08
Jul 26, 2007, 11:46 AM
Oh hey,

I need to parse text from a text file in this form:

06/07/07 02:44:29.966 INFO Servlet.Engine.Transports : 0 accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2

To be split accordingly:

06/07/07 / 02:44:29.966 / INFO /Servlet.Engine.Transports : 0 / accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2

Into a 5 column HTML table with the fields Date, Time, Level, ID, and Info. These columns are seperated above with the bold shashies.

Any ideas? PERL isn't neccesary, it is just what I was thinking of using. Even a start would be apprectiated, you don't need to give me a full working script or anything.

Thanks for looking!



ATG
Jul 26, 2007, 02:21 PM
Use regexes (http://www.regular-expressions.info/). AGRegex (http://sourceforge.net/projects/agkit) is the best I've found (and it has my initials :cool: )

EDIT: Oh, sorry, I didn't ask which language you are using :\

barr08
Jul 26, 2007, 02:41 PM
Use regexes (http://www.regular-expressions.info/). AGRegex (http://sourceforge.net/projects/agkit) is the best I've found (and it has my initials :cool: )

EDIT: Oh, sorry, I didn't ask which language you are using :\

I am trying to use PERL, but really at this point I would try anything. After I do this, my next step is to search these fields, so regexes will deffinitely be useful.

stndn
Jul 26, 2007, 10:32 PM
Here's something to get you started:

#!/usr/bin/perl -w

my $sampleText = "06/07/07 02:44:29.966 INFO Servlet.Engine.Transports : 0 accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2";

$sampleText =~ m|^([^\s]+)\s+([^\s]+)\s+([^\s]+)\s+(.+?)(accept.*)$|i;

print "$1 / $2 / $3 / $4 / $5\n";


Not the prettiest sight and most likely unefficient, but at least it should give you some ideas.


-stndn.

barr08
Jul 30, 2007, 09:13 AM
Thanks for the sample. I should probably admit I am a complete PERL noob, so how would I go by making this work. Do I just make a text file and ad ".pl" to the end of it? What program would I use to run the script?

If PERL isn't the best choice for this task, could someone reccomend a better language to do this in?

iBert
Jul 30, 2007, 03:55 PM
Perl is a pretty good scripting language specially for parsing text. At least in my opinion.

When are you going to need the parser to be called? Is it like a web service (just to say something) or are you manipulating something in C, C++, Java, etc and need to call this parser?

stndn
Jul 30, 2007, 08:29 PM
I thought you wanted to have something in Perl, and then I realized you mentioned that Perl is not necessary in your initial post ,p

iBert has the right questions, which I'm going to add to: Is this text parsing a one-time deal? Or are you going to run the script every time you want to show the output in other format?

If it's a one time deal, then you can take the Perl script as your base, make some changes, save it as filename.pl, and run the script through perl parser (one of which can be done from command line / terminal by typing: perl filename.pl).

If it's an ongoing parsing (done per request), maybe change it to something like PHP or whatnot. Not that Perl can't be used for web, but PHP can be mixed with HTML more easily / intuitively than Perl, which seems like what you're looking for (add HTML to the output).

-stndn.

barr08
Jul 31, 2007, 08:37 AM
I thought you wanted to have something in Perl, and then I realized you mentioned that Perl is not necessary in your initial post ,p

iBert has the right questions, which I'm going to add to: Is this text parsing a one-time deal? Or are you going to run the script every time you want to show the output in other format?

If it's a one time deal, then you can take the Perl script as your base, make some changes, save it as filename.pl, and run the script through perl parser (one of which can be done from command line / terminal by typing: perl filename.pl).

If it's an ongoing parsing (done per request), maybe change it to something like PHP or whatnot. Not that Perl can't be used for web, but PHP can be mixed with HTML more easily / intuitively than Perl, which seems like what you're looking for (add HTML to the output).

-stndn.

This will be ongoing, per user request, so maybe PHP would be better? I am thinking of parsing the data to a MySQl database, though I am not sure right now if my company supports it.

So should I ditch PERL and go with PHP? I do know that PHP and MySQL work well together (as far as I can remember)

pengu
Jul 31, 2007, 09:10 AM
PHP won't have an issue doing it (neither will Perl) but as stated, PHP is more aimed at mixing with/generating HTML.

You didn't really specify what parts (other than date/time) are likely to change? it APPEARS to be some kind of serverlog, relating to maybe Tomcat or BES or some other JSP/J2EE related server?

If you could give some examples of what other values you might get, we can probably help with a more accurate/specific/efficient method.

barr08
Jul 31, 2007, 09:37 AM
Here is a chunk of a log file:


06/07/07 02:44:27.585 INFO Servlet.Engine.Transports : 0 Type: TEST
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Site User: QA_BPORTAL
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Product:ProductId: Broker Portal Name: Broker Portal
06/07/07 02:44:27.588 INFO Servlet.Engine.Transports : 0 Retrieving user ent list
06/07/07 02:44:27.619 INFO Servlet.Engine.Transports : 0 Retrieve ent names for user: 3771000016, contract: qa_bportal
06/07/07 02:44:28.064 INFO Servlet.Engine.Transports : 0 ibg.menu.db.UserEntListDBBuilder statistics: proc = pk_portal.fn_get_ent_nm_by_usr_ent_typ( ?,?,?,?,? ) connMillis = 7 prepareMillis = 16 executeMillis = 75 processRowsMillis = 117 postProcessMillis = 0 subClassLogInfo =
06/07/07 02:44:28.067 INFO Servlet.Engine.Transports : 0 Linking site attributes
06/07/07 02:44:28.075 INFO Servlet.Engine.Transports : 0 Storing ent set for BROK back to user
06/07/07 02:44:28.832 INFO Servlet.Engine.Transports : 0 ibg.db.user.UserProductDBBuilder statistics: proc = ESDBO.PK_PORTAL.FN_GET_USER_PRODUCT_INFO( ?,?,?,? ) connMillis = 7 prepareMillis = 1 executeMillis = 5 processRowsMillis = 1 postProcessMillis = 0 subClassLogInfo =
06/07/07 02:44:28.887 INFO Servlet.Engine.Transports : 0 ibg.db.user.UserCompCheckDBBuilder statistics: proc = ESDBO.PK_ADMIN_USER.FN_CHECK_PARENT_COMPANY(?,?,?) connMillis = 4 prepareMillis = 1 executeMillis = 14 processRowsMillis = 0 postProcessMillis = 33 subClassLogInfo =
06/07/07 02:44:28.902 INFO Servlet.Engine.Transports : 0 ibg.db.user.UserCompCheckDBBuilder statistics: proc = ESDBO.PK_ADMIN_USER.FN_CHECK_PARENT_COMPANY(?,?,?) connMillis = 6 prepareMillis = 1 executeMillis = 4 processRowsMillis = 0 postProcessMillis = 1 subClassLogInfo =
06/07/07 02:44:28.910 INFO Servlet.Engine.Transports : 0 POST /XML_IBGZP_A HTTP/1.1
06/07/07 02:44:28.911 INFO Servlet.Engine.Transports : 0 content-type: application/xml
06/07/07 02:44:28.911 INFO Servlet.Engine.Transports : 0 connection: keep-alive
06/07/07 02:44:28.912 INFO Servlet.Engine.Transports : 0 accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
06/07/07 02:44:28.912 INFO Servlet.Engine.Transports : 0 host: Unknown
06/07/07 02:44:28.913 INFO Servlet.Engine.Transports : 0 content-length: 302
06/07/07 02:44:28.913 INFO Servlet.Engine.Transports : 0 user-agent: HttpConnection
06/07/07 02:44:28.914 INFO Servlet.Engine.Transports : 0
06/07/07 02:44:28.915 INFO Servlet.Engine.Transports : 0 <?xml version="1.0" ?>
06/07/07 02:44:28.915 INFO Servlet.Engine.Transports : 0 <FBIBLV_REQUEST>
06/07/07 02:44:28.916 INFO Servlet.Engine.Transports : 0 <PSL_VIEW>FBIBLV.VIEW</PSL_VIEW>
06/07/07 02:44:28.917 INFO Servlet.Engine.Transports : 0 <FBIBLVI>
06/07/07 02:44:28.917 INFO Servlet.Engine.Transports : 0 <PRODUCT_ID>Z</PRODUCT_ID>
06/07/07 02:44:28.918 INFO Servlet.Engine.Transports : 0 <client_group_id>xxxxxxxxxx</client_group_id>
06/07/07 02:44:28.918 INFO Servlet.Engine.Transports : 0 <portal_id>xxxxxxxxxx</portal_id>
06/07/07 02:44:28.919 INFO Servlet.Engine.Transports : 0 <incl_rules_table>N</incl_rules_table>
06/07/07 02:44:28.920 INFO Servlet.Engine.Transports : 0 </FBIBLVI>
06/07/07 02:44:28.920 INFO Servlet.Engine.Transports : 0 </FBIBLV_REQUEST>


Basically this program I am writing is going to be a search. Each day, this log file will be parsed somewhere to a database, so a user can log on to a site, search via a field (such as date, time, etc.) and get a list of results. The database does not need to be viewable to the user, only their search results.

At this point, I am thinking of parsing with PHP, making a MySQL database to hold the data, and searching the database with PHP.

Any ideas, help, guidance, whatever is greatly appreciated.

Edit: If PHP and MySQL end up being the languages I use, I would need a testing server, right? I can't just test these scripts on my computer?

bobber205
Jul 31, 2007, 09:51 AM
Here is a chunk of a log file:


06/07/07 02:44:27.585 INFO Servlet.Engine.Transports : 0 Type: TEST
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Site User: QA_BPORTAL
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Product:ProductId: Broker Portal Name: Broker Portal
06/07/07 02:44:27.588 INFO Servlet.Engine.Transports : 0 Retrieving user ent list
06/07/07 02:44:27.619 INFO Servlet.Engine.Transports : 0 Retrieve ent names for user: 3771000016, contract: qa_bportal
06/07/07 02:44:28.064 INFO Servlet.Engine.Transports : 0 ibg.menu.db.UserEntListDBBuilder statistics: proc = pk_portal.fn_get_ent_nm_by_usr_ent_typ( ?,?,?,?,? ) connMillis = 7 prepareMillis = 16 executeMillis = 75 processRowsMillis = 117 postProcessMillis = 0 subClassLogInfo =
06/07/07 02:44:28.067 INFO Servlet.Engine.Transports : 0 Linking site attributes
06/07/07 02:44:28.075 INFO Servlet.Engine.Transports : 0 Storing ent set for BROK back to user
06/07/07 02:44:28.832 INFO Servlet.Engine.Transports : 0 ibg.db.user.UserProductDBBuilder statistics: proc = ESDBO.PK_PORTAL.FN_GET_USER_PRODUCT_INFO( ?,?,?,? ) connMillis = 7 prepareMillis = 1 executeMillis = 5 processRowsMillis = 1 postProcessMillis = 0 subClassLogInfo =
06/07/07 02:44:28.887 INFO Servlet.Engine.Transports : 0 ibg.db.user.UserCompCheckDBBuilder statistics: proc = ESDBO.PK_ADMIN_USER.FN_CHECK_PARENT_COMPANY(?,?,?) connMillis = 4 prepareMillis = 1 executeMillis = 14 processRowsMillis = 0 postProcessMillis = 33 subClassLogInfo =
06/07/07 02:44:28.902 INFO Servlet.Engine.Transports : 0 ibg.db.user.UserCompCheckDBBuilder statistics: proc = ESDBO.PK_ADMIN_USER.FN_CHECK_PARENT_COMPANY(?,?,?) connMillis = 6 prepareMillis = 1 executeMillis = 4 processRowsMillis = 0 postProcessMillis = 1 subClassLogInfo =
06/07/07 02:44:28.910 INFO Servlet.Engine.Transports : 0 POST /XML_IBGZP_A HTTP/1.1
06/07/07 02:44:28.911 INFO Servlet.Engine.Transports : 0 content-type: application/xml
06/07/07 02:44:28.911 INFO Servlet.Engine.Transports : 0 connection: keep-alive
06/07/07 02:44:28.912 INFO Servlet.Engine.Transports : 0 accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
06/07/07 02:44:28.912 INFO Servlet.Engine.Transports : 0 host: Unknown
06/07/07 02:44:28.913 INFO Servlet.Engine.Transports : 0 content-length: 302
06/07/07 02:44:28.913 INFO Servlet.Engine.Transports : 0 user-agent: HttpConnection
06/07/07 02:44:28.914 INFO Servlet.Engine.Transports : 0
06/07/07 02:44:28.915 INFO Servlet.Engine.Transports : 0 <?xml version="1.0" ?>
06/07/07 02:44:28.915 INFO Servlet.Engine.Transports : 0 <FBIBLV_REQUEST>
06/07/07 02:44:28.916 INFO Servlet.Engine.Transports : 0 <PSL_VIEW>FBIBLV.VIEW</PSL_VIEW>
06/07/07 02:44:28.917 INFO Servlet.Engine.Transports : 0 <FBIBLVI>
06/07/07 02:44:28.917 INFO Servlet.Engine.Transports : 0 <PRODUCT_ID>Z</PRODUCT_ID>
06/07/07 02:44:28.918 INFO Servlet.Engine.Transports : 0 <client_group_id>xxxxxxxxxx</client_group_id>
06/07/07 02:44:28.918 INFO Servlet.Engine.Transports : 0 <portal_id>xxxxxxxxxx</portal_id>
06/07/07 02:44:28.919 INFO Servlet.Engine.Transports : 0 <incl_rules_table>N</incl_rules_table>
06/07/07 02:44:28.920 INFO Servlet.Engine.Transports : 0 </FBIBLVI>
06/07/07 02:44:28.920 INFO Servlet.Engine.Transports : 0 </FBIBLV_REQUEST>


Basically this program I am writing is going to be a search. Each day, this log file will be parsed somewhere to a database, so a user can log on to a site, search via a field (such as date, time, etc.) and get a list of results. The database does not need to be viewable to the user, only their search results.

At this point, I am thinking of parsing with PHP, making a MySQL database to hold the data, and searching the database with PHP.

Any ideas, help, guidance, whatever is greatly appreciated.

Edit: If PHP and MySQL end up being the languages I use, I would need a testing server, right? I can't just test these scripts on my computer?

MySql and PHP can be installed locally. You can reach your server by going to locahost:80

I just checked my apps folder and I can't seem to find the app that does this server. Do a quick "PHP and MYSQL server for OS X" search on google.

EDIT: I tried that search and I can't find it. Damn.

barr08
Jul 31, 2007, 09:53 AM
MySql and PHP can be installed locally. You can reach your server by going to locahost:80

I just checked my apps folder and I can't seem to find the app that does this server. Do a quick "PHP and MYSQL server for OS X" search on google.

EDIT: I tried that search and I can't find it. Damn.

I am actually stuck at work on this XP box, which might make looking for something like that easier. I'll poke around google for something.
Does anyone know how I could do this on a windows machine? I am coming up short here.

macfaninpdx
Jul 31, 2007, 10:08 AM
Does anyone know how I could do this on a windows machine? I am coming up short here.

You could either install them separately, by going to their respective sites (mySQL (http://mysql.org/downloads/mysql/5.0.html#win32) and PHP (http://www.php.net/downloads.php)), or you could use packaged solution such as XAMPP (http://www.apachefriends.org/en/xampp.html), which will install Apache, MySQL, PHP & PEAR, and PERL along with several other useful utilities.

The advantage to installing them individually is that you only install the pieces you need. The advantage to using something like XAMPP is that it gives you administration utilities as well.

barr08
Jul 31, 2007, 10:13 AM
You could either install them separately, by going to their respective sites (mySQL (http://mysql.org/downloads/mysql/5.0.html#win32) and PHP (http://www.php.net/downloads.php)), or you could use packaged solution such as XAMPP (http://www.apachefriends.org/en/xampp.html), which will install Apache, MySQL, PHP & PEAR, and PERL along with several other useful utilities.

The advantage to installing them individually is that you only install the pieces you need. The advantage to using something like XAMPP is that it gives you administration utilities as well.

Awesome, thanks. That program is exactly what I needed.

Now back to the original problem...:(

macfaninpdx
Jul 31, 2007, 10:26 AM
I need to parse text from a text file in this form:

06/07/07 02:44:29.966 INFO Servlet.Engine.Transports : 0 accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2

To be split accordingly:

06/07/07 / 02:44:29.966 / INFO /Servlet.Engine.Transports : 0 / accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2

Into a 5 column HTML table with the fields Date, Time, Level, ID, and Info. These columns are seperated above with the bold shashies.

If you use PHP for you original question, you could try something like the following:
$logfile = file("\\path\\to\\your\\file");

foreach ($logfile AS $linenum => $line) {
$logdate = substr ($line, 0, 8);
$logtime = substr ($line, 9, 12);
$level = substr ($line, 22, 4);
$id = substr ($line, 27, 29);
$info = rtrim (substr ($line, 57) );
}

Doing it this way (with substr) means that the date, time, level and id must be the same exact length every time, or it will break the line in the wrong spot. So a RegEx expression would definitely be more robust, but since I do not know regex expressions very well :o, this is another solution.

pengu
Jul 31, 2007, 10:33 AM
firstly. i suggest you install apache2, php5 and mysql5 seperately. there are very easy to follow tutorials, and you aren't likely to need to mess with them much once its set up.

as for the log. i'll have a look @ it tomorrow while im at work (its 1.30am here and wayyy too late to be thinking technically) and see if i can offer a basic solution for extracting/splitting the string values.

barr08
Jul 31, 2007, 10:36 AM
firstly. i suggest you install apache2, php5 and mysql5 seperately. there are very easy to follow tutorials, and you aren't likely to need to mess with them much once its set up.



I would do this, but getting programs installed on these work computers is a huge hassle. The all in one solution would be much easier and less time consuming to get installed on this box.

macfaninpdx
Jul 31, 2007, 10:42 AM
I just thought of one more solution to splitting the log file entries, although it still may not be as efficient as a properly written regex:
$logfile = file("\\path\\to\\your\\file");

foreach ($logfile AS $linenum => $line) {
//explode line using space delimiter
list ($logdate, $logtime, $level, $id1, $id2, $id3, $info) = explode (" ", rtrim( $line ), 7);
//combine id variables into one
$id = implode (" ", array ($id1, $id2, $id3) );
}

Since the explode function is using a space character to split the line, the ID field was split into three parts. I used the implode line to combine them again. Also, since there is a limit of 7 on the explode function, the remaining info in the INFO field will all be kept together (not separated).

barr08
Jul 31, 2007, 01:44 PM
If you use PHP for you original question, you could try something like the following:
$logfile = file("\\path\\to\\your\\file");

foreach ($logfile AS $linenum => $line) {
$logdate = substr ($line, 0, 8);
$logtime = substr ($line, 9, 12);
$level = substr ($line, 22, 4);
$id = substr ($line, 27, 29);
$info = rtrim (substr ($line, 57) );
}

Doing it this way (with substr) means that the date, time, level and id must be the same exact length every time, or it will break the line in the wrong spot. So a RegEx expression would definitely be more robust, but since I do not know regex expressions very well :o, this is another solution.

I just thought of one more solution to splitting the log file entries, although it still may not be as efficient as a properly written regex:
$logfile = file("\\path\\to\\your\\file");

foreach ($logfile AS $linenum => $line) {
//explode line using space delimiter
list ($logdate, $logtime, $level, $id1, $id2, $id3, $info) = explode (" ", rtrim( $line ), 7);
//combine id variables into one
$id = implode (" ", array ($id1, $id2, $id3) );
}

Since the explode function is using a space character to split the line, the ID field was split into three parts. I used the implode line to combine them again. Also, since there is a limit of 7 on the explode function, the remaining info in the INFO field will all be kept together (not separated).

Thanks for these! I think I am going to try to use the second as a starting point for now. Unfortunately I can't test anything here at work because they have yet to supply me with a server or let me install any programs, and I can't access my school's testing servers because work blocks them.

stndn
Jul 31, 2007, 09:10 PM
I've just seen the sample log files you provided, and it's safe to say that the example code with regex won't work because I used the keyword 'accept' as one of the separators towards the end.

The key with text parsing is identifying consistent section(s) of the texts you are trying to parse.

From your example log, it's safe to assume that the part that will be consistent is the timestamp. It's not clear yet (to us) whether the texts 'INFO Servlet.Engine.Transports : 0' will always be there as well. If yes, then it can be used as separator if you are going through the regex way.

If you are using the substr () or explode () way per what's given by macfaninpdx, then you need to make sure that the texts and spacings are consistent among different parts of the log.

Otherwise, you might consider mixing regex and substr/explode as well.

-stndn.

barr08
Aug 1, 2007, 09:29 AM
I've just seen the sample log files you provided, and it's safe to say that the example code with regex won't work because I used the keyword 'accept' as one of the separators towards the end.

The key with text parsing is identifying consistent section(s) of the texts you are trying to parse.

From your example log, it's safe to assume that the part that will be consistent is the timestamp. It's not clear yet (to us) whether the texts 'INFO Servlet.Engine.Transports : 0' will always be there as well. If yes, then it can be used as separator if you are going through the regex way.

If you are using the substr () or explode () way per what's given by macfaninpdx, then you need to make sure that the texts and spacings are consistent among different parts of the log.

Otherwise, you might consider mixing regex and substr/explode as well.

-stndn.

I don't know anything about regex, can you give an example?

krunk
Aug 1, 2007, 10:36 AM
I don't know anything about regex, can you give an example?
Regex is extremely useful, but if your learning can be very time consuming ironing out the special cases and honing your regex.

If your doing this to learn how to process files, regex is a great thing to pick up toward that end. If your doing it out of some need and getting it done, done right, and done quickly is your primary concern. There are other methods with a lower learning curve...like below.

========================Ruby Code================
file = File.new('/path/to/file')
html = Fie.new('/path/to/html')

html.puts "<table>"
html.puts "<tr>"
html.puts "<th>DATE</th>"
html.puts "<th>TIME</th>"
html.puts "<th>LEVEL</th>"
html.puts "<th>ID</th>"
html.puts "<th>INFO</th>"
html.puts "</tr>"

file.readlines.each do |line|
l = line.split # l => ["06/07/07", "02:44:29.966", "INFO", "Servlet.Engine.Transports", ":", "0", "accept:", "text/html,", "image/gif,", "image/jpeg,", "*;", "q=.2,", "*/*; q=.2"]

date = l[0]
time = l[1]
level = l[2]
id = l[3] << l[3] << l[4]
info = l[4..l.length]

html.puts "<tr>"
html.puts "<td>#{date}</td>"
html.puts "<td>#{time}</td>"
html.puts "<td>#{level}</td>"
html.puts "<td>#{id}</td>"
html.puts "<td>#{info}</td>"
html.puts "</tr>"
end

html.puts "</table>"
html.close


==============================

That's ruby, the # beside the line.split is just a comment showing contents of the returned array. By default, split() separates a line by spaces into a new array.

barr08
Aug 1, 2007, 01:14 PM
I've just seen the sample log files you provided, and it's safe to say that the example code with regex won't work because I used the keyword 'accept' as one of the separators towards the end.

The key with text parsing is identifying consistent section(s) of the texts you are trying to parse.

From your example log, it's safe to assume that the part that will be consistent is the timestamp. It's not clear yet (to us) whether the texts 'INFO Servlet.Engine.Transports : 0' will always be there as well. If yes, then it can be used as separator if you are going through the regex way.

If you are using the substr () or explode () way per what's given by macfaninpdx, then you need to make sure that the texts and spacings are consistent among different parts of the log.

Otherwise, you might consider mixing regex and substr/explode as well.

-stndn.

I am unclear on this also. Let me look into it and get back to you.

stndn
Aug 1, 2007, 08:22 PM
The first thing to note before using regex for anything is to make sure you can make a pattern out of what you are trying to process. Without a fixed pattern (regardless of how simple or complex it is), regex won't work. Actually, this is true for any form of (mostly text) parsing.

To learn more about regex, you may look at http://www.regular-expressions.info/.

It's not an easy learning experience, but it's worth it if you have time. Otherwise, start with simple (temporary) solution using explode() or other methods.


-stndn.

barr08
Aug 2, 2007, 12:31 PM
From your example log, it's safe to assume that the part that will be consistent is the timestamp. It's not clear yet (to us) whether the texts 'INFO Servlet.Engine.Transports : 0' will always be there as well. If yes, then it can be used as separator if you are going through the regex way.



-stndn.

OK I looked into this further, the parts with a constant size include the date, time, and the "Servlet.Engine.Transports : ". The number after that can be anywhere from one to three digits. Every log file will have those three things being the same number of digits. Does that help at all?

I put another chunk for you to take a look at for easy reference.

06/07/07 02:44:27.585 INFO Servlet.Engine.Transports : 0 Type: TEST
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Site User: QA_BPORTAL
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Product:ProductId: Broker Portal Name: Broker Portal
06/07/07 02:44:27.588 INFO Servlet.Engine.Transports : 0 Retrieving user ent list
06/07/07 02:44:27.619 INFO Servlet.Engine.Transports : 0 Retrieve ent names for user: 3771000016, contract: qa_bportal

The red parts are of constant size, the size of the part in bold blue ranges largely, but is always the end of the string:

06/07/07 02:44:27.619 INFO Servlet.Engine.Transports : 0 Retrieve ent names for user: 3771000016, contract: qa_bportal

stndn
Aug 2, 2007, 08:45 PM
Here's the updated version of the PHP code. Note that it's quickly put together so it might not be the best solution but it works for now. I'm only splitting the text for you. How you process it later is up to you -)

<?php
# Note: I modified some of the texts for testing purpose (the 1-3 digits after Transports part)

$myText[] = '06/07/07 02:44:28.917 INFO Servlet.Engine.Transports : 0 <FBIBLVI>';
$myText[] = '06/07/07 02:44:28.918 INFO Servlet.Engine.Transports : 01 <client_group_id>xxxxxxxxxx</client_group_id>';
$myText[] = '06/07/07 02:44:28.918 INFO Servlet.Engine.Transports : 04 <portal_id>xxxxxxxxxx</portal_id>';
$myText[] = '06/07/07 02:44:28.919 INFO Servlet.Engine.Transports : 098 <incl_rules_table>N</incl_rules_table>';
$myText[] = '06/07/07 02:44:27.585 INFO Servlet.Engine.Transports : 088 Type: TEST';
$myText[] = '06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 013 Site User: QA_BPORTAL';
$myText[] = '06/07/07 02:44:27.588 INFO Servlet.Engine.Transports : 0 Retrieving user ent list';
$myText[] = '06/07/07 02:44:27.619 INFO Servlet.Engine.Transports : 01 Retrieve ent names for user: 3771000016, contract: qa_bportal';

foreach ($myText as $val)
{
preg_match ('/^([^\s]+)\s+([^\s]+)\s+(.*?)\s+(Servlet.Engine.Transports : \d+)\s+(.*)$/i', $val, $myResult);

print_r ($myResult);
}

?>

For your reference, this is roughly what happens in the regex:

^ -> Start at the beginning of line
([\s])+) -> Grab 1 or more non-space characters and store it in placeholder 1 (note the () )
\s+ -> Grab 1 or more whitespace characters
([^\s]+) -> Grab 1 or more non-space characters and store it in placeholder 2
(.*?)\s+ -> Grab any characters before encountering a whitespace. Note the ?, which is a limiter. Place the 'any characters' in placeholder 3. Then grab 1 or more whitespace characters
(Servlet.Engine.Transports : \d+) -> Grab the Servlet.(etc) and then one or more digits, then store it in placeholder 4. Note that I used a single space around the : sign, but if it can be any type of whitespace, substitute it with \s+ as before.
\s+ -> Grab 1 or more whitespace
(.*) -> Grab 1 or more any characters and store it in placeholder 5.
$ -> Denotes the end of line.

The i at the end is for case insensitive matching. I put it there just to be safe.

Hope it helps.

-stndn.

Soulstorm
Aug 3, 2007, 01:21 AM
Sorry for messing up the thread, but could someone give any idea on how to answer the original question in C++?

iBert
Aug 3, 2007, 09:06 AM
Sorry for messing up the thread, but could someone give any idea on how to answer the original question in C++?

I'll suggest to use Perl or a better scripting language to work with text or strings. And then execute that script and read the new file in C++, I haven't used C++ in some time but I remember how trouble some it is to work with string in it. Just a suggestion!

barr08
Aug 3, 2007, 09:26 AM
Sorry for messing up the thread, but could someone give any idea on how to answer the original question in C++?

No worries!

I didn't even consider using C++, as I was already familiar with the parsing power of PHP and PERL. I used those as a starting point, and recently decided to go with PHP. I have been working on some code for a while, using the help I am getting here as a starting point, so I wouldn't really want to go back.

krunk
Aug 3, 2007, 11:26 AM
Sorry for messing up the thread, but could someone give any idea on how to answer the original question in C++?

If it's a flat text file that is not changing (or you don't care about constant monitoring just the data at a single point) this would suffice:

06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 013 Site User: QA_BPORTAL

C code

FILE *fp;

fp = fdopen(f, "r");
char* a[10],b[10],c[10],d[10],e[10],f[10],g[10], h[10], i[10], j[10];

// if you want to ignore (skip) a section, you add a * like this %*s
while(fscanf(fp, "%s %s %s %s %s %s %s %s %s %s", &a, &b, &c, &d, &e, &f, &g, &h, &i, &j)) {
// do stuff with your string values, output, write to file...including html
}

fp.close();


C++

fstream f.open("/path/to/log/file");

char* line[100];

while(f.getline(line, 100)) {
// and so on.
}


In C you have to do lower level processing of the input.

If you wish to "monitor" the file, as in sit and watch it for changes then do something on that event then even queues are more important. Event Queues depend on the Operating system, kqueue for bsd, epoll for linux, and something event manager or some such on Windows.

The bsd kqueue would look like:


struct kevent change;
kq = kqueue();
int f, kq, nev;

EV_SET(&change, f, EVFILT_VNODE, EV_ADD | EV_ENABLE | EV_ONESHOT, NOTE_WRITE | NOTE_EXTEND, 0,0);

for(;;) {
nev = kevent(kq, &change, 1, &event, 1, NULL);
if(nev == -1) {perror("kevent");}

else if(nev > 0) {
if(event.fflags & NOTE_WRITE || event.fflags & NOTE_EXTEND) {
this->functionToProcessDataStream(f);
}
close(kq);
close(f);
}
}


Those aren't debugged at all and are just meant to illustrate the basic pattern. If you go with the Event Queue you should look into an Observer/Dispatcher design pattern as they are particularly suited for such work.

barr08
Aug 6, 2007, 08:58 AM
OK I have been working on this for a while, and I have a decent script going, but I was wondering if anyone can help me out with my new problem.

I now need to take the script I have an do two things to it. First, I need to take data in from a text file, in the format I have been using in this thread, but thousands of lines of it. So the parser needs to pull the data from this text file, do the parsing (the parsing step I have under control), and then the second step is to unload it into a mysql database.

Any ideas or starting points would be great.
Thanks!

iBert
Aug 6, 2007, 09:35 AM
My first thought or idea would be read 1 line at a time parse it and push it to the DB. Problem with this is that it'll be too expensive in computing time. Since you want to upload this data to a DB, why not create an XML for the data you wish to push to the DB. So then you parse all the text file to an XML file and then tell the DB to read the XML file and store it.

A friend of mine works a lot with DBs and I've heard him talk about how sql can manipulate XMLs. Pretty sure MySQL can handle XML types. I guess with the second idea you do less computing or better word would be you don't do much work. Like on my first idea, where you read a line parse it then access a DB to store it vs read a line parse store it in an XML then dump all of the XML to DB.

barr08
Aug 6, 2007, 09:39 AM
My first thought or idea would be read 1 line at a time parse it and push it to the DB. Problem with this is that it'll be too expensive in computing time. Since you want to upload this data to a DB, why not create an XML for the data you wish to push to the DB. So then you parse all the text file to an XML file and then tell the DB to read the XML file and store it.

A friend of mine works a lot with DBs and I've heard him talk about how sql can manipulate XMLs. Pretty sure MySQL can handle XML types. I guess with the second idea you do less computing or better word would be you don't do much work. Like on my first idea, where you read a line parse it then access a DB to store it vs read a line parse store it in an XML then dump all of the XML to DB.

So instead of parsing directly to the DB, I could first parse to an XML file, and then parse from the XML to the DB? Would this be easier than going directly from text to the DB through PHP?

Would doing it one line at a time, without XML, take way too long? I am planning on running this at like midnight each night, and hoping for it to be available by like 7:00 AM.

iBert
Aug 6, 2007, 10:01 AM
Don't expect this to take hours, well I wouldn't. But I don't know how big of a text file are you going to be parsing. What I said was more from an execution perspective or an algorithm analysis, think about it if you do a manual work. If someone gives you a book full of things and you have to look for something specific to store elsewhere, would you organize all the stuff in the box given to you or would you stop and store the things as you find them? Remember that computers now a day and quite powerful to manipulate data, although for this think about where is that process gonna be running. A server or a simple PC.

Remember, to read a file that is 1 execution for each line then you execute your parser then you execute your storing. But if you read all your file to memory then parse it all to an XML file so that you can push the info to the DB. Remember that you need to open a connection to the DB and the push that one line, so if you have lets say 100 lines to parse that 100 inserts you'll be executing. You won't necessarily need to connect and close the DB for each line, but keeping a connection open for x amount of time to do this job could be a security breach you might want to avoid. Well this if this is running on a machine online. But if you got an XML file with all your data, you can then connect to the DB and tell the DB to read this XML and do the insert. So you'd connect to the DB tell the DB to read your XML insert the data and close.

The idea with the XML should or could be consider if MySQL can handle XML types, that way you upload your XML like a variable and SQL will handle the rest. Again, I haven't done anything with this but my friend uses it a lot when handling XMLs. One note he uses SQL Server from Microsoft, so not sure if that is only available to SQL Server or any DBM.

Another thing, this are just ideas that are coming as I've read this post. Maybe wait if others do some input or run this by anyone else you know you can talk to.

krunk
Aug 6, 2007, 11:33 AM
On line by line entry into the DB:

I've done this quite a bit, a 10,000+ line file can be parsed and entered into the DB using Perl's DBI package in less then 5m. So unless your dealing with 100,000 or more lines I'd say direct entry into MySQL should be fine.

barr08
Aug 6, 2007, 01:12 PM
On line by line entry into the DB:

I've done this quite a bit, a 10,000+ line file can be parsed and entered into the DB using Perl's DBI package in less then 5m. So unless your dealing with 100,000 or more lines I'd say direct entry into MySQL should be fine.

I will be dealing with considerably more than 100,000 lines. Suggestions?

krunk
Aug 6, 2007, 01:34 PM
I will be dealing with considerably more than 100,000 lines. Suggestions?

As a previous poster mentioned, you may want to two stage it then. Create a text file with the data in a "nice" format like xml, then load that into mysql.

If I'm not mistaken though, xml import support in mysql is pretty new and most people parse the xml file and then insert...much like your already doing.

CSV might be a better data storage as it is well supported by mysqlimport.

Best thing to do is lurk the mysql user forums and search their documentation to find the method that best suits you.

Most of these alternative methods have to do with preventing your server from being overly taxed during the import procedure. If that's not a concern, e.g. if you could set the script to run for an extended time (say late at night) without worry...mine as well just insert as you go. There's nothing inherently bad about taking 30m to an hour to import data if that's what it takes and it doesn't adversely effect users.

barr08
Aug 7, 2007, 09:47 AM
OK it has come time to test. I need a way to test my php and mysql code. How would you suggest doing this. I have installed XAMMP, but can't figure out how to use it for php. Any other suggestions?

Oh, and remember, I am in windows. Thanks.

krunk
Aug 7, 2007, 10:32 AM
OK it has come time to test. I need a way to test my php and mysql code. How would you suggest doing this. I have installed XAMMP, but can't figure out how to use it for php. Any other suggestions?

Oh, and remember, I am in windows. Thanks.

If integrating into an existing db copy that db over to "existing_dbname_testing. If not, just create your db. Take out a snippet, few 1000 lines, of the file for testing.

Write code, test, debug, fix, test, debug, fix. And so on until your sure it's good. Then run with it.

Make backups as needed.

barr08
Aug 8, 2007, 09:45 AM
If integrating into an existing db copy that db over to "existing_dbname_testing. If not, just create your db. Take out a snippet, few 1000 lines, of the file for testing.

Write code, test, debug, fix, test, debug, fix. And so on until your sure it's good. Then run with it.

Make backups as needed.

Don't I need to install something, like MySQL or PHP? I have never done this before, so you all need to be pretty specific. Thanks

krunk
Aug 8, 2007, 10:42 AM
Don't I need to install something, like MySQL or PHP? I have never done this before, so you all need to be pretty specific. Thanks

Not to be vague, but you need to install what you need hehe. If your serving it, you'll need an httpd server.

If your writing the script in php, you'll need php. . . if in perl you need to install perl.

It's been a long time since I fussed with windows, but you'll have to install all of these tools to do the job.

Perhaps a windows user can pipe in with some good tutorials on how to get these things up and running smoothly in that environment.

It bears mentioning that all these packages are included by default in osx and easily installed in any linux with a single command. Unless it's absolutely necessary to host on the windows machine, I personally would simply share the log directory via smb and do my work in a unix rather then go through the trouble of installing and configuring for a one shot job.

barr08
Aug 8, 2007, 11:54 AM
Perhaps a windows user can pipe in with some good tutorials on how to get these things up and running smoothly in that environment.

It bears mentioning that all these packages are included by default in osx and easily installed in any linux with a single command. Unless it's absolutely necessary to host on the windows machine, I personally would simply share the log directory via smb and do my work in a unix rather then go through the trouble of installing and configuring for a one shot job.

Well I don't need to host for anyone else to see, I just need a way to test. Hosting comes in from a different department.

Maybe a good php editor could do this. Does anyone know a free php editor that allows testing for windows? That would be awesome!

krunk
Aug 8, 2007, 04:25 PM
Well I don't need to host for anyone else to see, I just need a way to test. Hosting comes in from a different department.

Maybe a good php editor could do this. Does anyone know a free php editor that allows testing for windows? That would be awesome!

The best way to test for php is within a web server. For strictly scripts, ones that don't tie into server specific functions, testing is done by installing php then executing the script with the php cli interpreter.

Looks something like this:
$ php -e myUberParsingScript.php

macfaninpdx
Aug 13, 2007, 11:58 AM
OK it has come time to test. I need a way to test my php and mysql code. How would you suggest doing this. I have installed XAMMP, but can't figure out how to use it for php. Any other suggestions?

Oh, and remember, I am in windows. Thanks.

Don't I need to install something, like MySQL or PHP? I have never done this before, so you all need to be pretty specific. Thanks

If you have installed XAMPP, you have already installed your server testbed on your local machine. XAMPP includes Apache (for the http server), PHP and MySQL.

First, check your Start menu. Look for Program -> Apache Friends -> XAMPP -> XAMPP Control Panel. If you run this, a control panel should pop up telling you what is running. Click the Admin button next to Apache to see some links to a web-based administration for your installation. Follow the suggestions to get you started.

Then you can place your php folder (files) in c:\xampp\htdocs\ if you used the default location. Then fire up your browser of choice and head to http://localhost/<your_folder>/<your_file.php>.

Try to get that going before you dive into MySQL and setting up users, tables and permissions.

barr08
Aug 13, 2007, 01:08 PM
If you have installed XAMPP, you have already installed your server testbed on your local machine. XAMPP includes Apache (for the http server), PHP and MySQL.

First, check your Start menu. Look for Program -> Apache Friends -> XAMPP -> XAMPP Control Panel. If you run this, a control panel should pop up telling you what is running. Click the Admin button next to Apache to see some links to a web-based administration for your installation. Follow the suggestions to get you started.

Then you can place your php folder (files) in c:\xampp\htdocs\ if you used the default location. Then fire up your browser of choice and head to http://localhost/<your_folder>/<your_file.php>.

Try to get that going before you dive into MySQL and setting up users, tables and permissions.

Awesome, got the php to work. Thanks for the idiot-proof instructions :)

barr08
Aug 13, 2007, 01:11 PM
<?php
# Note: I modified some of the texts for testing purpose (the 1-3 digits after Transports part)

$myText[] = '06/07/07 02:44:28.917 INFO Servlet.Engine.Transports : 0 <FBIBLVI>';
$myText[] = '06/07/07 02:44:28.918 INFO Servlet.Engine.Transports : 01 <client_group_id>xxxxxxxxxx</client_group_id>';
$myText[] = '06/07/07 02:44:28.918 INFO Servlet.Engine.Transports : 04 <portal_id>xxxxxxxxxx</portal_id>';
$myText[] = '06/07/07 02:44:28.919 INFO Servlet.Engine.Transports : 098 <incl_rules_table>N</incl_rules_table>';
$myText[] = '06/07/07 02:44:27.585 INFO Servlet.Engine.Transports : 088 Type: TEST';
$myText[] = '06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 013 Site User: QA_BPORTAL';
$myText[] = '06/07/07 02:44:27.588 INFO Servlet.Engine.Transports : 0 Retrieving user ent list';
$myText[] = '06/07/07 02:44:27.619 INFO Servlet.Engine.Transports : 01 Retrieve ent names for user: 3771000016, contract: qa_bportal';

foreach ($myText as $val)
{
preg_match ('/^([^\s]+)\s+([^\s]+)\s+(.*?)\s+(Servlet.Engine.Transports : \d+)\s+(.*)$/i', $val, $myResult);

print_r ($myResult);
}

?>

This works perfectly. Now I need one more thing for this part of the process.

Right now this script is taking the sample lines of log data that is placed in the script manually. I need it to pull from the log file, in text form, and then parse it, without having to enter the lines automatically. So $myText has to be put there from the log file, not hard-written into the code. Is this possible?

The log data is in this format:
06/07/07 02:44:27.585 INFO Servlet.Engine.Transports : 0 Type: TEST
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Site User: QA_BPORTAL
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Product:ProductId: Broker Portal Name: Broker Portal
06/07/07 02:44:27.588 INFO Servlet.Engine.Transports : 0 Retrieving user ent list
06/07/07 02:44:27.619 INFO Servlet.Engine.Transports : 0 Retrieve ent names for user: 3771000016, contract: qa_bportal

A log file is just thousands of lines like these ones.

Thanks!

iBert
Aug 13, 2007, 02:04 PM
Yes barr08, should be kinda(quite) simple. Look at the read() function, I think that is the name, or google read function in php or something like that.

Just need to learn how reading a file works and you should be set. This last part you are on should be easy after you got all the parsing worked out. Couple of ways to do this, could read 1 line at a time or a x-lines at a time or the whole document. Once you get the general idea on how php read files and you can apply your parser test what works best. Hope that reading a file a few thousands of lines to memory won't kill the server. :)

barr08
Aug 13, 2007, 02:22 PM
Yes barr08, should be kinda(quite) simple. Look at the read() function, I think that is the name, or google read function in php or something like that.

Just need to learn how reading a file works and you should be set. This last part you are on should be easy after you got all the parsing worked out. Couple of ways to do this, could read 1 line at a time or a x-lines at a time or the whole document. Once you get the general idea on how php read files and you can apply your parser test what works best. Hope that reading a file a few thousands of lines to memory won't kill the server. :)

Thanks, I knew there was a name for it, I just couldn't remember it. I'll look into this.

macfaninpdx
Aug 13, 2007, 02:46 PM
$logfile = file("\\path\\to\\your\\file");

foreach ($logfile AS $linenum => $val) {
preg_match ('/^([^\s]+)\s+([^\s]+)\s+(.*?)\s+(Servlet.Engine.Transports : \d+)\s+(.*)$/i', $val, $myResult);

print_r ($myResult);
}

NOTE: The preg_match line was taken from your previous post - I haven't looked at it.

barr08
Aug 13, 2007, 03:32 PM
$logfile = file("\\path\\to\\your\\file");

foreach ($logfile AS $linenum => $val) {
preg_match ('/^([^\s]+)\s+([^\s]+)\s+(.*?)\s+(Servlet.Engine.Transports : \d+)\s+(.*)$/i', $val, $myResult);

print_r ($myResult);
}

NOTE: The preg_match line was taken from your previous post - I haven't looked at it.

Worked great, thanks a ton!

barr08
Aug 14, 2007, 09:11 AM
OK you all have been such a great help, there is one final stage in the journey of these log files. As seen in the title, I need to send the now parsed data into a table on a database.

Now my database knowledge is limited, as displayed through my previous posts on this thread, so it may be a little tricky.

Right now, the php parses the data to an html page. When the souce is viewed, it looks like this:


Array
(
[0] => 06/07/07 02:44:16.516 INFO Servlet.Engine.Transports : 0 PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40
[1] => 06/07/07
[2] => 02:44:16.516
[3] => INFO
[4] => Servlet.Engine.Transports : 0
[5] => PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40
)
Array
(
[0] => 06/07/07 02:44:21.798 INFO Servlet.Engine.Transports : 0 returning cpsManager
[1] => 06/07/07
[2] => 02:44:21.798
[3] => INFO
[4] => Servlet.Engine.Transports : 0
[5] => returning cpsManager
)


Items 1 - 5 need to be put into a table, in that order. I think I need to make a database and a table before this can draw the information, is that correct?

Thanks!

iBert
Aug 14, 2007, 12:11 PM
Not really, once you have the information in an array in php you can still use php to present the data in a table using html. You might have got a glimpse of this while looking at php, but you don't really need to make a DB to display data. They are used depending what's been developed, when you use a DB it's easier to display the information since all you do is a query and get all the information right there and create a function to make the viewing page. I'm quite certain that now a days this should be very easy to do with php and functions provided by the language.

But since you need to store this log file in a DB, I guess it would be a good exercise to learn what you need to do. Once you have managed to store the data with the testing you are doing then you should be done or close to be done with this project.

barr08
Aug 14, 2007, 12:19 PM
Well, the real point of this project is to eventually be able to search the database. There is no need to view the entire thing on the screen, it will be far too large, but there is a need to search the data in the DB by date, time, a range of dates, a range of times, or what is contained in the message part. This is why I think I should probably use a DB.

macfaninpdx
Aug 14, 2007, 12:50 PM
If this will always be used to parse the same log file, I would create the table/fields first, and then have the code send INSERT statements. But if you will be using this on many different log files (i.e. different fields each time), then you can have the code send the CREATE TABLE query.

To set up the table initially, use the Apache admin mentioned previously, and click on the phpmyadmin link on the menu at the left. This is an excellent web-based GUI to mySQL administration. Create the table and fields, and add a user with the correct permissions to the table.

Then in your PHP code, you will connect to mySQL and select the db. Then your foreach loop will send an INSERT for each line of the log file. If this takes too long to process (if you have hundreds of thousands of lines each time the code runs), you will either need to increase the PHP script timeout property, or you will need to use mysqli_multi_query.

The latter can take multiple INSERT statements separated by a semi-colon. So your foreach loop would concatenate the INSERT statements with a semi-colon (using .=) and then after your loop you could issue the mysqli_multi_query. Although I do not know what the limit is on how large the query can be (or how many statements it can contain).

barr08
Aug 14, 2007, 12:53 PM
If this will always be used to parse the same log file, I would create the table/fields first, and then have the code send INSERT statements. But if you will be using this on many different log files (i.e. different fields each time), then you can have the code send the CREATE TABLE query.

It will be for many different log files. One a day. All of this will be done at midnight, hopefully ready by 7 AM the next day.


To set up the table initially, use the Apache admin mentioned previously, and click on the phpmyadmin link on the menu at the left. This is an excellent web-based GUI to mySQL administration. Create the table and fields, and add a user with the correct permissions to the table.


Do I need to do any database-creating first? Or will this alone do the trick.

krunk
Aug 14, 2007, 02:27 PM
Do I need to do any database-creating first? Or will this alone do the trick.

Yes.

barr08
Aug 14, 2007, 02:29 PM
Yes.

Yes...to which question?

krunk
Aug 14, 2007, 02:37 PM
Yes...to which question?

hehe sorry bout that.

Do I need to do any database-creating first?

Yes. ;)

barr08
Aug 14, 2007, 02:40 PM
hehe sorry bout that.

Do I need to do any database-creating first?

Yes. ;)

Oh OK, thats the tricky part.

Do I need to write a script to create a database? Or can I do it in XAMPP?

krunk
Aug 14, 2007, 02:46 PM
Oh OK, thats the tricky part.

Do I need to write a script to create a database? Or can I do it in XAMPP?

Follow the instructions above to install phpMyAdmin. It's a very intuitive and thorough mysql front end in which you can create databases as well as manage them.

Or enter the mysql console and type:

> create database myDBName;

barr08
Aug 14, 2007, 02:47 PM
Follow the instructions above to install phpMyAdmin. It's a very intuitive and thorough mysql front end in which you can create databases as well as manage them.

Or enter the mysql console and type:

> create database myDBName;

Thanks!

stndn
Aug 14, 2007, 08:40 PM
As you mentioned more about the thing you wanted to accomplish, and that you don't know things that well, here's some learning steps you can follow to figure out how things should work and make your next task a little easier.

You mentioned that:
1. You have several log files to work with
2. The log file will be parsed and entered to database on daily basis.
3. The user will be able to search the database for specific log file.

You also mentioned that everyday the script will parse the whole log file (which I hope is a 1-day log file and not everything from the years ago).

Since one of the tasks would be searching by date, I suggest that you make (in your database) your date column with type "DATE" or even "DATETIME" if you want to store the date and time together (although I don't think DATETIME can store up to milisecond). That way searching will be much easier since you can use the SQL's DATE functions and not simply the date characters stored as-is.

This means you will have to make some changes to the PHP script to make the text '15/08/07' into something like '2007-08-15'. After that, you can store the datestamp in the table as DATE (or DATETIME) type.


After that, assuming you have PHP and MySQL available, here's what you should do:
1.
Create a database (required, of course)

2.
Create one table for each of the log files you want to work with (to avoid mixed data and too much clutter). This process only needs to be done once, and you can use phpMyAdmin or other MySQL client to do that.

3.
Find out the SQL query to insert the data manually. Don't try to build upon the PHP script provided to you just yet.

4.
Once that works, modify the PHP script you have to connect, select, and insert data to database. Since you'll want to start from a simple trial, forget about parsing the log file and just use the hard-coded texts for now. Alternatively, create a sample log file that's small (a hundred record or so) just to check that your script works.

5.
Verify that your data has been entered correctly. Run the script several times and keep checking your database to make sure it works.

6.
Clean the table in your database from your test data (use 'TRUNCATE' on the tables)

7.
Modify your script to parse the real log file, and set up a cron job to run the PHP script on daily basis.

8.
Work on the search algorithm (which in itself is not really an easy task)


Note that this is assuming that you will only ADD new log to the database on daily basis, and not recreate the whole log on daily basis (which will be overkill anyway). I know I've left out some details, but you're free to ask for more information if you're stuck somewhere. -)


-stndn.

barr08
Aug 15, 2007, 11:55 AM
As you mentioned more about the thing you wanted to accomplish, and that you don't know things that well, here's some learning steps you can follow to figure out how things should work and make your next task a little easier.

You mentioned that:
1. You have several log files to work with
2. The log file will be parsed and entered to database on daily basis.
3. The user will be able to search the database for specific log file.

You also mentioned that everyday the script will parse the whole log file (which I hope is a 1-day log file and not everything from the years ago).

It is indeed only for that day.


Since one of the tasks would be searching by date, I suggest that you make (in your database) your date column with type "DATE" or even "DATETIME" if you want to store the date and time together (although I don't think DATETIME can store up to milisecond). That way searching will be much easier since you can use the SQL's DATE functions and not simply the date characters stored as-is.

This means you will have to make some changes to the PHP script to make the text '15/08/07' into something like '2007-08-15'. After that, you can store the datestamp in the table as DATE (or DATETIME) type.

I have actually been working through this is my head for the past few days. DATETIME does not store the millisecond, which is neccesary. I thought about making the millisecond it's own column, but that would require some extra regex work, and I am lucky I got to where I am with the regex, never mind doing more.

Also, I wouldn't be sure how to change the format of the time and date, and both would require these changes if DATETIME was to apply. Do you think the date handling in MySQL is worth the extra effort here?


After that, assuming you have PHP and MySQL available, here's what you should do:
1.
Create a database (required, of course)

2.
Create one table for each of the log files you want to work with (to avoid mixed data and too much clutter). This process only needs to be done once, and you can use phpMyAdmin or other MySQL client to do that.

Would I need to manually create a new table each day, or could I automatically create a table before the info is parsed each night?


3.
Find out the SQL query to insert the data manually. Don't try to build upon the PHP script provided to you just yet.


What do you mean?


4.
Once that works, modify the PHP script you have to connect, select, and insert data to database. Since you'll want to start from a simple trial, forget about parsing the log file and just use the hard-coded texts for now. Alternatively, create a sample log file that's small (a hundred record or so) just to check that your script works.

5.
Verify that your data has been entered correctly. Run the script several times and keep checking your database to make sure it works.

6.
Clean the table in your database from your test data (use 'TRUNCATE' on the tables)

7.
Modify your script to parse the real log file, and set up a cron job to run the PHP script on daily basis.

8.
Work on the search algorithm (which in itself is not really an easy task)


Note that this is assuming that you will only ADD new log to the database on daily basis, and not recreate the whole log on daily basis (which will be overkill anyway). I know I've left out some details, but you're free to ask for more information if you're stuck somewhere. -)


-stndn.

Thank you very much for the detailed reply. I am just a little confused about a few of your steps. You're saying I should skip the parser for now, and just work on drawing data directly to the table in the DB from the log file? Then, apply what I come up with to the parsed data?

I also need to decide about the DATETIME stuff. What is your opinion on this. Is it worth the change, from a search perspective?

Thanks

stndn
Aug 15, 2007, 08:39 PM
Using the DATETIME format will be very beneficial when you are searching by date, because if you are storing it as plain text, you will have to break apart the values in your column into the date and time format before you can perform your date-based search. That means more work when selecting the data (per-use basis) vs one-time work when parsing and inserting your data.

MySQL (or rather, SQL) has many date-related functions you could use. For example, here's one from the MySQL manual on Date and Time Functions to select data from the last 30 days:

SELECT something FROM tbl_name
-> WHERE DATE_SUB(CURDATE(),INTERVAL 30 DAY) <= date_col;

As for the milisecond part, I think storing it in a separate column (of numeric type) will be better than trying to mash things (date and time) in string-type columns. Searching for numeric values in string columns is a mess.

You may need to change the PHP script a little bit. For one, you will need to convert the date and time to SQL format.

No need to use another regex (although you could). Use PHP's split() (http://www.php.net/split) to accomplish that. If you are using PHP 5, you may even use strptime () (http://www.php.net/strptime) to help you out. After you split the year, month, date, hour, minute, second, milisecond, you simply concatenate them as one string:

$datetime = "20{$y}-{$m}-{$d} {$H}:{$M}:{$S}"; # Remember to prefix the '20' for your year to avoid problems
$milisecond = $MS;

Actually, here's a modified regex to parse the date and time parts on their own:
foreach ($myText as $val)
{
preg_match ('/^(\d+)\/(\d+)\/(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\s+(.*?)\s+(Servlet.Engine.Transports : \d+)\s+(.*)$/i', $val, $myResult);

print_r ($myResult);
}

Would I need to manually create a new table each day, or could I automatically create a table before the info is parsed each night?

Only once when you initially create the database. Once it is created, you simply add more data into it on nightly basis. Or, you could be fancy and perform a check and have your script create the table if it doesn't exist. But I don't think it's necessary and it's too much work.

Find out the SQL query to insert the data manually. Don't try to build upon the PHP script provided to you just yet.

By this I mean, manually try to figure out the correct SQL query that will insert your data into your database. This means, from your MySQL client (phpMyAdmin is a good one), type your SQL INSERT query manually to perform the data entry. Once you find out the correct syntax for entering your data, then continue with the next step. This will save you frustration when your script doesn't work as expected, as you may be wondering whether the problem lies with the PHP script or the SQL query.

For step 4, sorry if it was confusing. What I meant was skip the parsing of your log file, and not skip the parsing of the input texts. So, instead of running the full script in post #50, try running the partial script in post #46 first. That is actually a trial and debugging step, where you try your completed PHP script to see whether it will work as expected. That is, check whether your PHP script can parse the data and insert it correctly. There's no point in running 5 hours of processing just to try out your script, right?

Step 3 and 4 should be repeated when you are doing your select script. So, try to come up with the correct SQL SELECT statement first, then apply it to your PHP script.

Btw, I'm assuming this is only for internal use, right? Because if it's for public use, then there's other things to worry about, such as security and all that.


-stndn.

barr08
Aug 16, 2007, 12:05 PM
Edit: Ignore me

barr08
Aug 16, 2007, 12:11 PM
OK, the updated regex returns this:

Array
(
[0] => 06/07/07 02:44:16.516 INFO Servlet.Engine.Transports : 0 PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40
[1] => 06
[2] => 07
[3] => 07
[4] => 02
[5] => 44
[6] => 16
[7] => 516
[8] => INFO
[9] => Servlet.Engine.Transports : 0
[10] => PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40
)

So now that each piece of the date and time is split into a seperate item in the array, I can use the DATETIME function for it? Do I need to write a new script for this, or can I include it in the parser?

Edit: One more question. I am making my table in my database. Should the date and time go into one column? I am going to do it this way for now, let me know if I am wrong.

Edit: Here is what I made for my table:
5 columns:
DATETIME - type DATETIME - this is for date and time
MS - type INT(3) - this is the milliseconds.
LVL - varchar(10) - this will be INFO or DEBUG or something short like that
ID - varchar(30) - this is the "Servlet.Engine.Transports : 0" part
MSG - varchar(500) - this is the mssage, it can get pretty long

How does this look?

macfaninpdx
Aug 16, 2007, 01:59 PM
So now that each piece of the date and time is split into a seperate item in the array, I can use the DATETIME function for it? Do I need to write a new script for this, or can I include it in the parser?

Edit: One more question. I am making my table in my database. Should the date and time go into one column? I am going to do it this way for now, let me know if I am wrong.

Edit: Here is what I made for my table:
5 columns:
DATETIME - type DATETIME - this is for date and time
MS - type INT(3) - this is the milliseconds.
LVL - varchar(10) - this will be INFO or DEBUG or something short like that
ID - varchar(30) - this is the "Servlet.Engine.Transports : 0" part
MSG - varchar(500) - this is the mssage, it can get pretty long

How does this look?
The DATETIME field does hold both the date and time info (see here (http://mysql.org/doc/refman/5.0/en/datetime.html)). So you will need to concatenate your array fields to the following format before inserting it into the table:
YYYY-MM-DD HH:MM:SS

You could either concatenate the fields manually:
$datefield = "{$yourarray[3]}-{$yourarray[1]}-{$yourarray[2]} {$yourarray[4]}:{$yourarray[5]}:{$yourarray[6]}";
Or you can use a combination of the mktime and date functions, as follows:
$datefield = date( "Y-m-d H:i:s", mktime( $yourarray[4], $yourarray[5], $yourarray[6], $yourarray[1], $yourarray[2], $yourarray[3] ) );

barr08
Aug 16, 2007, 02:18 PM
The DATETIME field does hold both the date and time info (see here (http://mysql.org/doc/refman/5.0/en/datetime.html)). So you will need to concatenate your array fields to the following format before inserting it into the table:
YYYY-MM-DD HH:MM:SS

You could either concatenate the fields manually:
$datefield = "{$yourarray[3]}-{$yourarray[1]}-{$yourarray[2]} {$yourarray[4]}:{$yourarray[5]}:{$yourarray[6]}";
Or you can use a combination of the mktime and date functions, as follows:
$datefield = date( "Y-m-d H:i:s", mktime( $yourarray[4], $yourarray[5], $yourarray[6], $yourarray[1], $yourarray[2], $yourarray[3] ) );

So I would need to includ this in my parsing script after the data is made into arrays then? I assume this would be placed in the script somewhere between when I parse it and when I send it to the DB.

macfaninpdx
Aug 16, 2007, 03:57 PM
So I would need to includ this in my parsing script after the data is made into arrays then? I assume this would be placed in the script somewhere between when I parse it and when I send it to the DB.

Yes, you would add the line after your regex line parses the data into an array (and of course replace $yourarray above with whatever variable name you used to store the array).

Then your insert statement will be something like,
$sql = "INSERT INTO tablename (`datetime`, `ms`, `lvl`, `id`, `msg`) VALUES ('$datefield', '{$yourarray[7]}', '{$yourarray[8]}', '{$yourarray[9]}', '{$yourarray[10]}')";

Now that I think about it, you probably shouldn't use reserved names as fieldnames in your MySQL table - I have been bitten by this before. Use something instead of DATETIME and ID for your field names.

barr08
Aug 17, 2007, 09:26 AM
Now that I think about it, you probably shouldn't use reserved names as fieldnames in your MySQL table - I have been bitten by this before. Use something instead of DATETIME and ID for your field names.

Oh yeah, good catch. Rookie mistake :o

OK here is what I have going on:


<?php

$logfile = file("C:\Documents and Settings\a407667\Desktop\Parser\sampleLogs");

foreach ($logfile AS $linenum => $val) {
preg_match ('/^(\d+)\/(\d+)\/(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\s+(.*?)\s+(Servlet.Engine.Transports : \d+)\s+(.*)$/i', $val, $myResult);

print_r ($myResult);
}

$datefield = "{$myResult[3]}-{$myResult[1]}-{$myResult[2]} {$myResult[4]}:{$myResult[5]}:{$myResult[6]}";

$sql = "INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('$datefield', '{$myResult[7]}', '{$myResult[8]}', '{$myResult[9]}', '{$myResult[10]}')";


?>



I assume I need to connect to the database at the beginning.

macfaninpdx
Aug 17, 2007, 11:39 AM
I assume I need to connect to the database at the beginning.

Yes. Check here (http://us.php.net/manual/en/function.mysqli-query.php) for sample code for connecting to your db.

macfaninpdx
Aug 17, 2007, 11:51 AM
Another thing I noticed: your array variable $myResult is getting overwritten for each line in the logfile. So that means you will either need to perform the SQL INSERT inside the foreach loop, or append the $myResult to a new array, and then use another foreach loop to process all the results.

For example:
foreach ($logfile AS $linenum => $val) {
preg_match ('/^(\d+)\/(\d+)\/(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\s+(.*?)\s+(Servlet.Engine.Transports : \d+)\s+(.*)$/i', $val, $myResult);

// print_r ($myResult);

$datefield = "{$myResult[3]}-{$myResult[1]}-{$myResult[2]} {$myResult[4]}:{$myResult[5]}:{$myResult[6]}";

$sql = "INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('$datefield', '{$myResult[7]}', '{$myResult[8]}', '{$myResult[9]}', '{$myResult[10]}')";
$result = mysqli_query( $link, $sql );
}
or
foreach ($logfile AS $linenum => $val) {
preg_match ('/^(\d+)\/(\d+)\/(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\s+(.*?)\s+(Servlet.Engine.Transports : \d+)\s+(.*)$/i', $val, $myResult);

// print_r ($myResult);

$resultsArray[] = $myResult;
}
foreach( $resultsArray AS $thisResult ) {
$datefield = "{$thisResult [3]}-{$thisResult [1]}-{$thisResult [2]} {$thisResult [4]}:{$thisResult [5]}:{$thisResult [6]}";

$sql = "INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('$datefield', '{$thisResult [7]}', '{$thisResult [8]}', '{$thisResult [9]}', '{$thisResult [10]}')";
$result = mysqli_query( $link, $sql );
}

Although as mentioned previously, this may take a long time if you have thousands of INSERT statements being executed. But I think as stndn mentioned, it is best to get the code working first before optimizing it.

barr08
Aug 17, 2007, 12:16 PM
I am using this as my code to connect to the database, but it is erroring out.

$mysqli = new mysqli("localhost","test","******","roger");
if (mysqli_connect_errno())
{
printf("Connection failed: %s\n", mysqli_connect_error());
exit();
}

Is there something wrong with it?

macfaninpdx
Aug 17, 2007, 12:21 PM
What is the error? Are you using PHP 5.0? Is your database "roger" created, and does the user "test" have access permission to it?

barr08
Aug 17, 2007, 12:31 PM
What is the error? Are you using PHP 5.0? Is your database "roger" created, and does the user "test" have access permission to it?

Yeah, roger exists, and test has access to it.

It was silly of me to post this without the error :o. Here it is:

Warning: mysqli_query() expects parameter 1 to be mysqli, null given in C:\xampp\htdocs\PHP\Parser2.php on line 20

Repeated one time for each entry in the log file.

Line 20 is:

$result = mysqli_query( $link, $sql );

macfaninpdx
Aug 17, 2007, 12:44 PM
Line 20 is:

$result = mysqli_query( $link, $sql );

Change the line $mysqli = new mysqli("localhost","test","******","roger"); to $link = mysqli_connect("localhost", "test", "******", "roger");Then change if (mysqli_connect_errno()) to if (!$link) {

barr08
Aug 17, 2007, 12:46 PM
You're so wise :)

barr08
Aug 17, 2007, 12:52 PM
Now I am getting this error:

Parse error: syntax error, unexpected $end in C:\xampp\htdocs\PHP\Parser2.php on line 26

macfaninpdx
Aug 17, 2007, 01:15 PM
Now I am getting this error:

Parse error: syntax error, unexpected $end in C:\xampp\htdocs\PHP\Parser2.php on line 26

Well, first of all, it would be easiest if you were to show me line 26, and some lines before and after.

But parse errors mean your code won't parse, most like because of a typo. Check for semi-colons at the end of the lines - it is a common cause. Also, if you literally copied and pasted the change I suggested earlier, you may have two {'s on your if loop. Check the if (!$link) loop to make sure there is exactly one open bracket and one close bracket.

barr08
Aug 17, 2007, 01:17 PM
Well, first of all, it would be easiest if you were to show me line 26, and some lines before and after.

But parse errors mean your code won't parse, most like because of a typo. Check for semi-colons at the end of the lines - it is a common cause. Also, if you literally copied and pasted the change I suggested earlier, you may have two {'s on your if loop. Check the if (!$link) loop to make sure there is exactly one open bracket and one close bracket.

Thats the thing, there is no line 26. The whole thing is only 22 lines. I forgot to mention this initially. And I noticed the bracket thing, thats not it.

macfaninpdx
Aug 17, 2007, 01:20 PM
Thats the thing, there is no line 26. The whole thing is only 22 lines. I forgot to mention this initially. And I noticed the bracket thing, thats not it.

Did you close your code with ?>? If so, I will need to see the whole Parser2.php file to be of assistance.

barr08
Aug 17, 2007, 01:21 PM
It isn't erroring any more, but I ran it, and it didn't send anything to the database, as far as I can tell.

Here is my code:

<?php

$link = mysqli_connect("localhost", "test", "xxxxxxx", "roger");
if (!$link)
{
printf("Connection failed: %s\n", mysqli_connect_error());
exit();
}

$logfile = file("C:\Documents and Settings\a407667\Desktop\Parser\sampleLogs");

foreach ($logfile AS $linenum => $val) {
preg_match ('/^(\d+)\/(\d+)\/(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\s+(.*?)\s+(Servlet.Engine.Transports : \d+)\s+(.*)$/i', $val, $myResult);

print_r ($myResult);

$datefield = "{$myResult[3]}-{$myResult[1]}-{$myResult[2]} {$myResult[4]}:{$myResult[5]}:{$myResult[6]}";

$sql = "INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('$datefield', '{$myResult[7]}', '{$myResult[8]}', '{$myResult[9]}', '{$myResult[10]}')";
$result = mysqli_query( $link, $sql );
}

?>

Thanks for sticking around and being so patient with me.

macfaninpdx
Aug 17, 2007, 01:31 PM
It isn't erroring any more, but I ran it, and it didn't send anything to the database, as far as I can tell.
Put some debugging lines in your code:
<?php

$link = mysqli_connect("localhost", "test", "xxxxxxx", "roger");
if (!$link)
{
printf("Connection failed: %s\n", mysqli_connect_error());
exit();
}

$logfile = file("C:\Documents and Settings\a407667\Desktop\Parser\sampleLogs");

foreach ($logfile AS $linenum => $val) {
preg_match ('/^(\d+)\/(\d+)\/(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\s+(.*?)\s+(Servlet.Engine.Transports : \d+)\s+(.*)$/i', $val, $myResult);

print_r ($myResult);

$datefield = "{$myResult[3]}-{$myResult[1]}-{$myResult[2]} {$myResult[4]}:{$myResult[5]}:{$myResult[6]}";
echo $datefield . "<br>"; //debug

$sql = "INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('$datefield', '{$myResult[7]}', '{$myResult[8]}', '{$myResult[9]}', '{$myResult[10]}')";
echo $sql . "<br>"; //debug
$result = mysqli_query( $link, $sql );
if ($result) { //debug
printf("INSERTED %d rows.\n", mysqli_num_rows($result));
} else {
echo "mysqli_query returned false.<br>";
}
mysqli_free_result($result);
}

?>

Also, I added the mysqli_free_result command, as I can't remember if you need that. Also, what does the print_r($myResult) display? Anything? If not, you may not be reading in the logfile correctly. View the source to see if there are a bunch of empty arrays being displayed.

barr08
Aug 17, 2007, 01:34 PM
I'll try the debug. The log files are being parsed and displayed in the source. Thanks

barr08
Aug 17, 2007, 01:38 PM
OK the error from the debug is:


Array ( ) -- ::
INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('-- ::', '', '', '', '')
mysqli_query returned false.

and

Warning: mysqli_free_result() expects parameter 1 to be mysqli_result, boolean given in C:\xampp\htdocs\PHP\Parse.php on line 28


before each array.

macfaninpdx
Aug 18, 2007, 01:23 AM
This means the array is empty, so the logfile is not being read correctly (or the preg_match isn't working).
Array ( )
This means the array is empty - the result of the $datefield line.
-- ::
This also means the array is empty, since there are empty values.
INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('-- ::', '', '', '', '')
This means that $result is false, which means that mysqli_query did not work.
mysqli_query returned false.

This is because the query did not work.
Warning: mysqli_free_result() expects parameter 1 to be mysqli_result, boolean given in C:\xampp\htdocs\PHP\Parse.php on line 28


So I am thinking there may be a problem with the file path. When you specify "C:\Documents and Settings\a407667\Desktop\Parser\sampleLogs", the backslashes are treated as escape characters and hence the path is screwy. Try "C:\\Documents and Settings\\a407667\\Desktop\\Parser\\sampleLogs".

barr08
Aug 20, 2007, 08:39 AM
I tried changing the path, same result.

I think the array is being filled, you can see it here:

Warning: mysqli_free_result() expects parameter 1 to be mysqli_result, boolean given in C:\xampp\htdocs\PHP\Parse.php on line 28
Array ( [0] => 06/07/07 02:44:16.516 INFO Servlet.Engine.Transports : 0 PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40 [1] => 06 [2] => 07 [3] => 07 [4] => 02 [5] => 44 [6] => 16 [7] => 516 [8] => INFO [9] => Servlet.Engine.Transports : 0 [10] => PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40 ) 07-06-07 02:44:16
INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('07-06-07 02:44:16', '516', 'INFO', 'Servlet.Engine.Transports : 0', 'PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40 ')
mysqli_query returned false.

In this part specifically:

[1] => 06 [2] => 07 [3] => 07 [4] => 02 [5] => 44 [6] => 16 [7] => 516 [8] => INFO [9] => Servlet.Engine.Transports : 0 [10] => PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40 ) 07-06-07 02:44:16

I tried changing the path with the extra backslash, it errored with the results above. The excerpt above is from the second result on the page. The first result is indeed an empty array, but it is the only one:

Array ( ) -- ::
INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('-- ::', '', '', '', '')
mysqli_query returned false.

macfaninpdx
Aug 20, 2007, 10:05 AM
OK the error from the debug is:


Array ( ) -- ::
INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('-- ::', '', '', '', '')
mysqli_query returned false.

I think the array is being filled, you can see it here:

Warning: mysqli_free_result() expects parameter 1 to be mysqli_result, boolean given in C:\xampp\htdocs\PHP\Parse.php on line 28
Array ( [0] => 06/07/07 02:44:16.516 INFO Servlet.Engine.Transports : 0 PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40 [1] => 06 [2] => 07 [3] => 07 [4] => 02 [5] => 44 [6] => 16 [7] => 516 [8] => INFO [9] => Servlet.Engine.Transports : 0 [10] => PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40 ) 07-06-07 02:44:16
INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('07-06-07 02:44:16', '516', 'INFO', 'Servlet.Engine.Transports : 0', 'PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40 ')
mysqli_query returned false.

In your previous post, the array was empty (or you did not post the entire results). In your recent post, the array is not empty. Did you post the entire results when you first posted the error you were getting?

It is very difficult to debug remotely, especially when we are not looking at the entire code, nor are we seeing the entire output. We don't need to see repeating portions of the output, but at least the entire piece between repeats. Repost the results of your current code so we can take a look.

Also, the mysqli_free_result warning is from a different file, Parse.php. I don't believe you have shown the code from that file, right? We were dealing with a file called Parser2.php, weren't we?

barr08
Aug 20, 2007, 10:16 AM
In your previous post, the array was empty (or you did not post the entire results). In your recent post, the array is not empty. Did you post the entire results when you first posted the error you were getting?

It is very difficult to debug remotely, especially when we are not looking at the entire code, nor are we seeing the entire output. We don't need to see repeating portions of the output, but at least the entire piece between repeats. Repost the results of your current code so we can take a look.

Also, the mysqli_free_result warning is from a different file, Parse.php. I don't believe you have shown the code from that file, right? We were dealing with a file called Parser2.php, weren't we?


Sorry, I changed the name, same code. The file is now Parse.php.

Here is a chunk of the result I get when I run the script, from the source (because it is formatted better):

Array
(
)
-- ::<br>INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('-- ::', '', '', '', '')<br>mysqli_query returned false.<br><br />
<b>Warning</b>: mysqli_free_result() expects parameter 1 to be mysqli_result, boolean given in <b>C:\xampp\htdocs\PHP\Parse.php</b> on line <b>28</b><br />
Array
(
[0] => 06/07/07 02:44:16.516 INFO Servlet.Engine.Transports : 0 PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40
[1] => 06
[2] => 07
[3] => 07
[4] => 02
[5] => 44
[6] => 16
[7] => 516
[8] => INFO
[9] => Servlet.Engine.Transports : 0
[10] => PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40
)
07-06-07 02:44:16<br>INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('07-06-07 02:44:16', '516', 'INFO', 'Servlet.Engine.Transports : 0', 'PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40
')<br>mysqli_query returned false.<br><br />
<b>Warning</b>: mysqli_free_result() expects parameter 1 to be mysqli_result, boolean given in <b>C:\xampp\htdocs\PHP\Parse.php</b> on line <b>28</b><br />
Array
(
[0] => 06/07/07 02:44:21.798 INFO Servlet.Engine.Transports : 0 returning cpsManager
[1] => 06
[2] => 07
[3] => 07
[4] => 02
[5] => 44
[6] => 21
[7] => 798
[8] => INFO
[9] => Servlet.Engine.Transports : 0
[10] => returning cpsManager
)

This is the first 3 results from the source, the rest is just the same thing but with different data in the array items. Please let me know if you need more information.

This is the most current version of the code I am working with:


<?php

$link = mysqli_connect("localhost", "test", "thumper", "roger");
if (!$link)
{
printf("Connection failed: %s\n", mysqli_connect_error());
exit();
}

$logfile = file("C:\\Documents and Settings\\a407667\\Desktop\\Parser\\sampleLogs");

foreach ($logfile AS $linenum => $val) {
preg_match ('/^(\d+)\/(\d+)\/(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\s+(.*?)\s+(Servlet.Engine.Transports : \d+)\s+(.*)$/i', $val, $myResult);

print_r ($myResult);

$datefield = "{$myResult[3]}-{$myResult[1]}-{$myResult[2]} {$myResult[4]}:{$myResult[5]}:{$myResult[6]}";
echo $datefield . "<br>"; //debug

$sql = "INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('$datefield', '{$myResult[7]}', '{$myResult[8]}', '{$myResult[9]}', '{$myResult[10]}')";
echo $sql . "<br>"; //debug
$result = mysqli_query( $link, $sql );
if ($result) { //debug
printf("INSERTED %d rows.\n", mysqli_num_rows($result));
} else {
echo "mysqli_query returned false.<br>";
}
mysqli_free_result($result);
}

?>

macfaninpdx
Aug 20, 2007, 11:30 AM
OK, now things are starting to make a little more sense. The first empty array is most likely because of a blank line in the log file. This would cause the INSERT statement to faile, and hence the initial warning.

The next line does not have an empty array, but there are single quotes in the last field og the array. These need to be escaped in PHP so mySQL will not get confused on your INSERT statement. And this is also most likely causing the rest of the warnings, since if the INSERT falils there will not be a $result fo free.

See changes below (I tried to make it easier to identify what I changed by wrapping them in comments):
<?php

$link = mysqli_connect("localhost", "test", "********", "roger");
if (!$link)
{
printf("Connection failed: %s\n", mysqli_connect_error());
exit();
}

$logfile = file("C:\\Documents and Settings\\a407667\\Desktop\\Parser\\sampleLogs");

foreach ($logfile AS $linenum => $val) {
preg_match ('/^(\d+)\/(\d+)\/(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\s+(.*?)\s+(Servlet.Engine.Transports : \d+)\s+(.*)$/i', $val, $myResult);

print_r ($myResult); //debug

//Begin Changed
if (count ($myResult) > 0 ) { //Added if loop
//End Changed
$datefield = "{$myResult[3]}-{$myResult[1]}-{$myResult[2]} {$myResult[4]}:{$myResult[5]}:{$myResult[6]}";
echo $datefield . "<br>"; //debug

//Begin Changed
$msg = addslashes (rtrim ($myResult[10]) ); //addslashes will escape the quotes, rtrim will get rid of the LF character.
$sql = "INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('$datefield', '{$myResult[7]}', '{$myResult[8]}', '{$myResult[9]}', '$msg')"; //updated line
//End Changed
echo $sql . "<br>"; //debug
$result = mysqli_query( $link, $sql );
if ($result) { //debug
printf("INSERTED %d rows.\n", mysqli_num_rows($result));
//Begin Changed
mysqli_free_result($result); //moved line
//End Changed
} else {
echo "mysqli_query returned false.<br>";
}
//Begin Changed
} //Close if loop
//End Changed
}

?>

barr08
Aug 20, 2007, 12:18 PM
Well, it worked without an error, but it didn't add any info to the tables.

Maybe there is something wrong with my database. Here is a screenshot of the phpmyadmin page:

macfaninpdx
Aug 20, 2007, 12:23 PM
Once again, you did not post any results from your ouput. It is difficult to determine what is happening without seeing what the code produces.

barr08
Aug 20, 2007, 12:24 PM
Sorry.

Array
(
)
Array
(
[0] => 06/07/07 02:44:16.516 INFO Servlet.Engine.Transports : 0 PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40
[1] => 06
[2] => 07
[3] => 07
[4] => 02
[5] => 44
[6] => 16
[7] => 516
[8] => INFO
[9] => Servlet.Engine.Transports : 0
[10] => PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40
)
07-06-07 02:44:16<br>INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('07-06-07 02:44:16', '516', 'INFO', 'Servlet.Engine.Transports : 0', 'PortalMonitorHandler is initialised with \'alertableLevel\' set to 15 and \'maxLevel\' set to 40')<br>mysqli_query returned false.<br>Array
(
[0] => 06/07/07 02:44:21.798 INFO Servlet.Engine.Transports : 0 returning cpsManager
[1] => 06
[2] => 07
[3] => 07
[4] => 02
[5] => 44
[6] => 21
[7] => 798
[8] => INFO
[9] => Servlet.Engine.Transports : 0
[10] => returning cpsManager
)
07-06-07 02:44:21<br>INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('07-06-07 02:44:21', '798', 'INFO', 'Servlet.Engine.Transports : 0', 'returning cpsManager')<br>mysqli_query returned false.<br>Array
(
[0] => 06/07/07 02:44:21.896 INFO Servlet.Engine.Transports : 0 Creating new connection to: cpsuat33.fmr.com:33443null ok: true
[1] => 06
[2] => 07
[3] => 07
[4] => 02
[5] => 44
[6] => 21
[7] => 896
[8] => INFO
[9] => Servlet.Engine.Transports : 0
[10] => Creating new connection to: cpsuat33.fmr.com:33443null ok: true
)
07-06-07 02:44:21<br>INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('07-06-07 02:44:21', '896', 'INFO', 'Servlet.Engine.Transports : 0', 'Creating new connection to: cpsuat33.fmr.com:33443null ok: true')<br>mysqli_query returned false.<br>Array
(
)

macfaninpdx
Aug 20, 2007, 12:30 PM
OK, the first thing I noticed is that in your INSERT statement, you are using a field name of LVL, but the actual field is LEVEL in the db. This will cause an error and will not execute the query correctly.

Also, to help catch these types of errors next time, change echo "mysqli_query returned false.<br>"; to printf("SQL Error: %s\n", mysqli_error($link));

barr08
Aug 20, 2007, 12:37 PM
Good catch on the LVL. I changed that and added that error catching line, here is what I got:

Array
(
)
Array
(
[0] => 06/07/07 02:44:16.516 INFO Servlet.Engine.Transports : 0 PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40
[1] => 06
[2] => 07
[3] => 07
[4] => 02
[5] => 44
[6] => 16
[7] => 516
[8] => INFO
[9] => Servlet.Engine.Transports : 0
[10] => PortalMonitorHandler is initialised with 'alertableLevel' set to 15 and 'maxLevel' set to 40
)
07-06-07 02:44:16<br>INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('07-06-07 02:44:16', '516', 'INFO', 'Servlet.Engine.Transports : 0', 'PortalMonitorHandler is initialised with \'alertableLevel\' set to 15 and \'maxLevel\' set to 40')<br><br />
<b>Warning</b>: mysqli_num_rows() expects parameter 1 to be mysqli_result, boolean given in <b>C:\xampp\htdocs\PHP\Parser2.php</b> on line <b>30</b><br />
INSERTED 0 rows.
<br />
<b>Warning</b>: mysqli_free_result() expects parameter 1 to be mysqli_result, boolean given in <b>C:\xampp\htdocs\PHP\Parser2.php</b> on line <b>32</b><br />
Array
(
[0] => 06/07/07 02:44:21.798 INFO Servlet.Engine.Transports : 0 returning cpsManager
[1] => 06
[2] => 07
[3] => 07
[4] => 02
[5] => 44
[6] => 21
[7] => 798
[8] => INFO
[9] => Servlet.Engine.Transports : 0
[10] => returning cpsManager
)
07-06-07 02:44:21<br>INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('07-06-07 02:44:21', '798', 'INFO', 'Servlet.Engine.Transports : 0', 'returning cpsManager')<br><br />
<b>Warning</b>: mysqli_num_rows() expects parameter 1 to be mysqli_result, boolean given in <b>C:\xampp\htdocs\PHP\Parser2.php</b> on line <b>30</b><br />
INSERTED 0 rows.
<br />
<b>Warning</b>: mysqli_free_result() expects parameter 1 to be mysqli_result, boolean given in <b>C:\xampp\htdocs\PHP\Parser2.php</b> on line <b>32</b><br />
Array
(
[0] => 06/07/07 02:44:21.896 INFO Servlet.Engine.Transports : 0 Creating new connection to: cpsuat33.fmr.com:33443null ok: true
[1] => 06
[2] => 07
[3] => 07
[4] => 02
[5] => 44
[6] => 21
[7] => 896
[8] => INFO
[9] => Servlet.Engine.Transports : 0
[10] => Creating new connection to: cpsuat33.fmr.com:33443null ok: true
)
07-06-07 02:44:21<br>INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('07-06-07 02:44:21', '896', 'INFO', 'Servlet.Engine.Transports : 0', 'Creating new connection to: cpsuat33.fmr.com:33443null ok: true')<br><br />
<b>Warning</b>: mysqli_num_rows() expects parameter 1 to be mysqli_result, boolean given in <b>C:\xampp\htdocs\PHP\Parser2.php</b> on line <b>30</b><br />
INSERTED 0 rows.
<br />
<b>Warning</b>: mysqli_free_result() expects parameter 1 to be mysqli_result, boolean given in <b>C:\xampp\htdocs\PHP\Parser2.php</b> on line <b>32</b><br />
Array
(
)
Array
(
)

Lines 30 - 32
printf("INSERTED %d rows.\n", mysqli_num_rows($result));
//Begin Changed
mysqli_free_result($result); //moved line

macfaninpdx
Aug 20, 2007, 12:47 PM
I don't know why the error is not outputting if the INSERT is failing. Copy the following line and try to paste it directly in phpmyadmin.
INSERT INTO diamond (`DT`, `MS`, `LVL`, `DENT`, `MSG`) VALUES ('07-06-07 02:44:16', '516', 'INFO', 'Servlet.Engine.Transports : 0', 'PortalMonitorHandler is initialised with \'alertableLevel\' set to 15 and \'maxLevel\' set to 40')
To do this, click on the SQL tab in phpMyAdmin and paste it into the SQL field. Post the results.

barr08
Aug 20, 2007, 01:03 PM
So I went to lunch, and when I came back, the table was full with my parsed data!

About 10 minutes ago I emptied it and tried parsing again to make sure it wasn't a fluke. I am still waiting for the table to re-populate. I'll keep you posted.