Perl matching issue

Discussion in 'Mac Programming' started by GeeYouEye, Aug 10, 2009.

  1. GeeYouEye macrumors 68000

    GeeYouEye

    Joined:
    Dec 9, 2001
    Location:
    State of Denial
    #1
    I have a string $contents, set to $_.

    m/^.*/ returns true
    m/<table/ returns true.
    m/^.*<table/ returns false.

    I must be missing something, can anyone tell me what?
     
  2. lee1210 macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #2
    Try escaping the < with a \, since < has some look-behind connotations.

    -Lee
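
    For reference, a bare < is an ordinary character in a Perl pattern; it only becomes special inside constructs such as the look-behind (?<=...). A minimal check, assuming a made-up sample line rather than the poster's actual data:

```perl
use strict;
use warnings;

# Hypothetical sample line, not taken from the poster's file.
my $line = 'text before <table width="800">';

# A bare '<' is a literal character in a Perl pattern.
my $plain   = $line =~ m/<table/  ? 1 : 0;
# Escaping it is harmless and matches exactly the same text.
my $escaped = $line =~ m/\<table/ ? 1 : 0;
# '<' is only special inside constructs such as the look-behind (?<=...).
my $behind  = $line =~ m/(?<=<)table/ ? 1 : 0;

print "plain=$plain escaped=$escaped lookbehind=$behind\n";
```

    All three patterns match here, so escaping the < should not change the outcome either way.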
     
  3. GeeYouEye thread starter macrumors 68000

  4. lee1210 macrumors 68040

    #4
    Can you post the relevant line of your data file and your code? I tried the following:
    Code:
    #! /usr/bin/perl
    
    open(HANDLE,"test.html") or die "Cannot open test.html: $!";
    @fileContents = <HANDLE>;
    
    if($fileContents[0] =~ m/^.*<table/) {
      print "Matches!\n";
    } else {
      print "No match!\n";
    }
    
    With a data file:
    Maybe if we can get a peek at your code and data we can help a bit more.

    -Lee
     
  5. GeeYouEye thread starter macrumors 68000

    #5
    Code:
    #!/usr/bin/perl -w
    
    open HANDLE, "test.jsp";
    @fileContents = <HANDLE>;
    $contents = join("", @fileContents);
    $_ = $contents;
    if(m/^.*<table/) {
      print "Matches!\n";
    } else {
      print "No match!\n";
    }
    And the contents of test.jsp are:

    Code:
    <%@ page language="java" import="java.util.*, java.io.*, java.text.*"%>
    <jsp:useBean id="cart" scope="session" class="com.chengdesign.Cart2" />
    <jsp:useBean id="members" scope="session" class="com.chengdesign.Members" />
    <jsp:useBean id="template" scope="page" class="com.concreteexchange.CETemplate" />
    <%
    if( request.getServerPort() == 443 ) {
        response.sendRedirect( "http://" + template.getBaseDomain() + request.getRequestURI() );
        return;
    };
    
    try {
    /********************
    * Member Access Control Logic
    * Throws ServletException, IOException
    * NOT_ALLOWED_URI: Optional. Adding this parameter, typically the login page, would force a non member to that URI for further action.
    * TERMS_URI: Optional. Adding this parameter, typically the terms page, would force a member that has not accepted the terms to that URI for further action. Leave blank for TERMS Page ONLY.
    ********************* 
    * members.directMemberAccess( REQUEST_OBJECT, RESPONSE_OBJECT, [NOT_ALLOWED_URI], [TERMS_URI] );
    ********************/
    	if( members.directMemberAccess( request, response, "", "https://" + template.getBaseDomain() + template.getBaseDirectory() + "member_terms.jsp?uri=" + request.getRequestURI() ) ) {
    		return; // End JSP Page if page has been redirected (TRUE)
    	};
    } catch ( Exception ex ) {
    	response.sendRedirect( "http://" + template.getBaseDomain() + template.getBaseDirectory() ); // Something Bad Happend Redirect to Home Page
    	return;
    };
    
    /********************
    * Template/Page Navigation Setup
    * Valid Range is 0 - Max Count
    * 0 = None Selected
    ********************* 
    * template.setHeaderProperties( PRIMARY_NAV_ID, SECONDARY_NAV_ID, MEMBERS_LEVEL );
    ********************/
    template.setHeaderProperties( 6, 5, members.getLevel() );
    
    %><html>
    <head>
       <title>Decorative Concrete Article in Pure Contemporary Magazine</title>
    	<meta name="description" content="Decorative concrete and its many uses inside the home - for floors, walls, fireplaces. Mix properties, pouring techniques and finishes. An article from Decorative Concrete, October/November 2005">										
       	<meta name="keywords" content="concrete countertops, kitchen, bathroom">						
        <link rel="stylesheet" type="text/css" href="<%= template.getBaseDirectory() %>css/style.css" title="master">
        <link rel="stylesheet" type="text/css" href="<%= template.getBaseDirectory() %>css/style_additional.css">
    	<script language="JavaScript" src="<%= template.getBaseDirectory() %>js/core.js"></script>
        <script language="JavaScript">
            <!--
            // Define PreLoad Image List
            var preLoadList = new Array;
            var preLoadListCount = 0;
            preLoadList[preLoadListCount++] = "<%= template.getBaseDirectory() %>images/top_cd_visit_m.gif";
            preLoadList[preLoadListCount++] = "<%= template.getBaseDirectory() %>images/secnav_login_m.gif";
    
            //NAV CODE BASE
            preloadImages();
            dqm__codebase = "<%= template.getBaseDirectory() %>js/";
            // -->
        </script>
    </head>
    <body bgcolor="#5f6669" bottommargin="0" topmargin="0" leftmargin="0" rightmargin="0" marginheight="0" marginwidth="0">
    <div align="center"><%= template.getPageHeaderHTML() %>
    <%= template.getPrimaryNavJS() %>
    <script language="JavaScript1.2" src="<%= template.getBaseDirectory() %>js/dqm_loader.js"></script>
    <%= template.getPrimaryNavHTML() %>
    <%= template.getSecondaryNavHTML() %>
    <table width="800" border="0" cellspacing="0" cellpadding="0">
    ...
    I'm trying to strip everything but the content table from the top of the file (I'll handle the bottom later) and replace it with a new set of headers and includes, but I reduced it to this matching problem to eliminate the possibility that the other half of the s/// was the culprit.
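
    A minimal sketch of the substitution described above, assuming the replacement text lives in a variable named $header and using a shortened stand-in for the file contents (both are placeholders, not the real data). The /s modifier lets . cross newlines, the non-greedy .*? stops at the first <table, and the look-ahead keeps the tag itself in place:

```perl
use strict;
use warnings;

# Shortened stand-in for the file contents; the real file is much longer.
my $contents = <<'END';
<html>
<head><title>Example</title></head>
<body>
<table width="800">
<tr><td><table><tr><td>inner</td></tr></table></td></tr>
</table>
END

# Hypothetical replacement header; the real one is not shown in the thread.
my $header = "<!-- new header -->\n";

# \A     : anchor at the very start of the string
# .*?    : match as little as possible, so we stop at the FIRST "<table"
# (?=..) : look-ahead, so "<table" itself is not consumed
# /s     : let "." match newlines too
(my $stripped = $contents) =~ s/\A.*?(?=<table)/$header/s;

print $stripped;
```

    Copying into $stripped first leaves $contents untouched, which makes it easier to compare before and after while debugging.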
     
  6. lee1210 macrumors 68040

    #6
    You need to turn on multiline mode, using an m after the second slash in your regex.
    Code:
    #!/usr/bin/perl -w
    
    open HANDLE, "test.jsp";
    @fileContents = <HANDLE>;
    $contents = join("", @fileContents);
    $_ = $contents;
    if(m/^.*<table/m) {
      print "Matches!\n";
    } else {
      print "No match!\n";
    }
    
    Otherwise the anchors are specific to the start and end of the whole string, and since . does not match a newline, ^.* can never get past the first line. Note that I did not know this at all when starting to look at this, but I started to notice that if I put something instead of table that matched on the first line, it worked, and started poking around.

    -Lee
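
    To make the difference concrete, here is a small sketch using a made-up three-line string (not the poster's file). With /m, ^ may match at the start of any line, so the match covers only the line containing <table. With /s, . also matches newlines, so the match runs from the start of the string, which is usually what you want when stripping everything before the tag:

```perl
use strict;
use warnings;

# Made-up sample: "<table" sits on the third line.
my $text = "<html>\n<body>\n  <table width=\"800\">\n";

# No modifier: "^" is start of string only, and "." cannot cross "\n".
my $plain = $text =~ m/^.*<table/ ? 1 : 0;

# /m: "^" may also match right after any newline; capture what matched.
my ($m_match) = $text =~ m/(^.*<table)/m;

# /s: "." also matches "\n", so the match starts at the top of the string.
my ($s_match) = $text =~ m/(^.*<table)/s;

print "plain: $plain\n";
print "/m matched: [$m_match]\n";
print "/s matched: [$s_match]\n";
```

    The two modifiers are independent and can be combined as /ms when both behaviours are needed.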
     
  7. GeeYouEye thread starter macrumors 68000

    #7
    Aha! That did it, thank you very much.

    That's rather unintuitive though, I have to say.
     
  8. GeeYouEye thread starter macrumors 68000

    #8
    Well crud, now it doesn't work in a substitution. I tried

    s/^.*<table/$header/m and
    s/\A.*<table/$header/.
    Neither worked.

    EDIT: n/m that was related to something else. Ignore.
     
  9. GeeYouEye thread starter macrumors 68000

  10. ChOas macrumors regular

    Joined:
    Nov 24, 2006
    Location:
    The Netherlands
    #10
    Only use regexes when you don't know exactly what you are looking for. When you do know what you are looking for, substr/index is usually quicker.

    Code:
    my $content;
    
    {
     open HANDLE, "test.jsp" or die "aaaargh!: $!";
     local $/ = undef;
     $content = <HANDLE>;
     close HANDLE
    };
    
    my $table = substr($content,index($content,'<table '));
    
    print "Using substr:\n$table\n";
    
    # '?' is important here, .* is greedy. Without '?' your code would break on pages with multiple tables.
    $content =~s/.*?<table//s;
    
    print "With substitute: \n$content\n";
    
    prints:

    Code:
    Using substr:
    <table width="800" border="0" cellspacing="0" cellpadding="0">
    ...
    
    With substitute:
     width="800" border="0" cellspacing="0" cellpadding="0">
    ...
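
    One caveat with the substr/index approach: index returns -1 when the search string is absent, and substr with an offset of -1 quietly returns the last character of the string. A small sketch that guards against this, using a hypothetical helper named from_first_table and made-up input strings:

```perl
use strict;
use warnings;

# Hypothetical helper: return everything from the first "<table" onward,
# or undef when the string contains no table at all.
sub from_first_table {
    my ($content) = @_;
    my $pos = index($content, '<table');
    return if $pos < 0;    # index() returns -1 when nothing is found
    return substr($content, $pos);
}

my $found   = from_first_table("<body>\n<table width=\"800\">\n...");
my $missing = from_first_table("<body>no tables here</body>");

print defined $found   ? "found: $found\n"     : "found: (none)\n";
print defined $missing ? "missing: $missing\n" : "missing: (none)\n";
```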
    
     
