PHP regex to remove HTML

Discussion in 'Web Design and Development' started by Me1000, Apr 16, 2009.

  1. Me1000 macrumors 68000

    Me1000

    Joined:
    Jul 15, 2006
    #1
    Before we start, strip_tags() doesn't work.

    Now,

    I've got some data that needs to be parsed. The problem is that I need to get rid of all the HTML, which has been formatted very strangely.
    The tags look like this:

    HTML:
    < p > blah blah blah < / p > < a href= " link.html " > blah blah blah < /a >
    None of the regexes I've tried work, and I don't know enough about regex syntax to fix them. I don't care about preserving anything inside the tags, and I would prefer to get rid of the text inside a link if I could.

    Anyone have any idea?

    (I really need to just sit down and learn regular expressions one day)
     
  2. angelwatt Moderator emeritus

    angelwatt

    Joined:
    Aug 16, 2005
    Location:
    USA
    #2
    To get rid of just the tags, you can use the regex,
    Code:
    <.+?>
    If you want to get rid of the tags plus what's inside them, use,
    Code:
    < [^/]+?>.*?< /.+?>
    but be aware that this would remove everything in your example string. You can try both patterns out at my regex testing tool.
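    For reference, here is a minimal sketch of how the two patterns above might be applied in PHP with preg_replace(). The variable names are made up for illustration, and the last pattern is a hypothetical variant that is not from this thread: it assumes you want to drop links together with their text and keep the rest of the text.

```php
<?php
// Sample input copied from post #1.
$html = '< p > blah blah blah < / p > < a href= " link.html " > blah blah blah < /a >';

// First pattern from this post: removes each tag but keeps the text between tags.
$tagsOnly = preg_replace('/<.+?>/', '', $html);

// Second pattern from this post: removes an opening tag, its contents, and the
// closing tag. "#" is used as the delimiter so the "/" needs no escaping.
$tagsAndText = preg_replace('#< [^/]+?>.*?< /.+?>#', '', $html);

// Hypothetical variant (an assumption, not from the thread): remove <a> elements
// together with their text, tolerating stray whitespace inside the tags,
// then strip whatever tags remain.
$noLinks  = preg_replace('#<\s*a\b[^>]*>.*?<\s*/\s*a\s*>#is', '', $html);
$textOnly = trim(preg_replace('/<.+?>/', '', $noLinks));

var_dump($tagsOnly, $tagsAndText, $textOnly);
```

    The lazy quantifiers (+? and *?) matter here: a greedy .+ would run from the first < to the last > and swallow the text in between. Regexes like these are only a rough fit for malformed markup of this kind and will not handle every HTML edge case.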
     
  3. Me1000 thread starter macrumors 68000

    Me1000

    Joined:
    Jul 15, 2006
    #3
    Thank you! Working great.

    That regex tool is awesome, thank you for that. :)


     
