extracting information from an <a> tag using regexp in PHP

mrjamin · Apr 15, 2004

OK, here's the dealio.

I have a string containing a load of <a href="http://domain.tld" title="detailed description">link text</a> style links, one per line.

How would I extract:

1) the URL
2) the value of title attribute
3) the link text

I figured regular expressions are the way to go, but I'm a little confused on where to start!

Any pointers? I came up with this:

PHP:

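<?php
function extractLink($link) {
	// split() was removed in PHP 7.0; explode() splits on a literal "\n" just as well
	$link = explode("\n", trim($link));
	for ($i = 0; isset($link[$i]); $i++) {
		// Splitting on double quotes gives:
		// [0] '<a href=', [1] URL, [2] ' title=', [3] title value, [4] '>link text</a>'
		$link[$i] = explode('"', $link[$i]);
		$link[$i]['url'] = substr($link[$i][1], 7);          // URL without the leading "http://"
		$link[$i]['description'] = $link[$i][3];              // value of the title attribute
		$link[$i]['title'] = substr($link[$i][4], 1, -4);     // link text: drop ">" and "</a>"
	}
	// Keep only the named keys and escape their values for output
	for ($i = 0; isset($link[$i]); $i++) {
		foreach ($link[$i] as $key => $value) {
			if (is_numeric($key)) {
				unset($link[$i][$key]);
			} else {
				$link[$i][$key] = htmlentities($value);
			}
		}
	}
	return $link;
}
?>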

It's crude, but it does the job. It gets messed up, though, if there is no title attribute.
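A minimal sketch of one way to patch the explode() approach, assuming every line has exactly the shape shown above: href first, an optional title, no other attributes and no stray double quotes. The helper name extractLinkLine is hypothetical. Counting the pieces tells you whether a title was present, and the link text is always the last piece. Unlike the original, it keeps the full URL including http://.

```php
<?php
// Hypothetical helper: a guarded version of the explode-on-quotes idea.
// Assumes each line looks exactly like <a href="..." title="...">text</a>,
// with the title attribute optional and no other attributes or stray quotes.
function extractLinkLine(string $line): array
{
    $parts = explode('"', trim($line));
    // With a title: ['<a href=', URL, ' title=', TITLE, '>text</a>'] (5 parts)
    // Without one:  ['<a href=', URL, '>text</a>']                    (3 parts)
    $last = $parts[count($parts) - 1];
    return [
        'url'   => $parts[1],
        'title' => count($parts) === 5 ? $parts[3] : '',
        'text'  => substr($last, 1, -4), // drop the leading ">" and trailing "</a>"
    ];
}

print_r(extractLinkLine('<a href="http://domain.tld">link text</a>'));
```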

Thanks in advance,

MrJ

Knox · Apr 16, 2004

PHP:

preg_match("/<a href=\"([^\"]+)\"( title=\"([^\"]+)\"|)>([^\<]+)<\/a>/", 
$link, $matches);

That fills the array $matches with the captured pieces; element 0 is always the full link. It matches links both with and without the title attribute:

PHP:

Array
(
    [0] => <a href="http://domain.tld">link text</a>
    [1] => http://domain.tld
    [2] => 
    [3] => 
    [4] => link text
)

Array
(
    [0] => <a href="http://domain.tld" title="detailed description">link text</a>
    [1] => http://domain.tld
    [2] =>  title="detailed description"
    [3] => detailed description
    [4] => link text
)
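A small script that should reproduce the two arrays above, using the sample tags from the question. The pattern is Knox's with only cosmetic changes: it sits in single quotes, so the double quotes inside need no backslashes, and `\<` is written as `<` (they mean the same thing in a character class).

```php
<?php
// Knox's pattern: [1] URL, [2] the whole " title=..." chunk, [3] title value, [4] link text
$pattern = '/<a href="([^"]+)"( title="([^"]+)"|)>([^<]+)<\/a>/';

$lines = [
    '<a href="http://domain.tld">link text</a>',
    '<a href="http://domain.tld" title="detailed description">link text</a>',
];

$results = [];
foreach ($lines as $line) {
    // preg_match() returns 1 on a match and fills $m with the captures
    if (preg_match($pattern, $line, $m) === 1) {
        $results[] = $m;
    }
}
print_r($results);
```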

Of course you can just ignore element 2; there may be a way to get rid of it, but I'm not sure.
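One way to get rid of element 2 is to make the optional title a non-capturing group, `(?: ... )?`. The captures then shift down to [1] URL, [2] title, [3] link text. When there is no title, [2] still comes back as an empty string, because a later group (the link text) did match. A sketch, checked only against the two sample tags from the question:

```php
<?php
// (?: ... )? groups the optional title without capturing it, so the
// matches are [0] full tag, [1] URL, [2] title (or ""), [3] link text.
$pattern = '/<a href="([^"]+)"(?: title="([^"]+)")?>([^<]+)<\/a>/';

preg_match($pattern, '<a href="http://domain.tld">link text</a>', $noTitle);
preg_match($pattern, '<a href="http://domain.tld" title="detailed description">link text</a>', $withTitle);

print_r($noTitle);
print_r($withTitle);
```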

I'll explain how it works if you want me to 🙂

Knox · Apr 16, 2004

Oh, come to think of it, if you're wanting to match lots of links then you'll want preg_match_all rather than just preg_match.
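A sketch of the preg_match_all() version for a whole block of links, assuming one <a> tag per line in the same shape as the question. It combines a non-capturing title group with named groups `(?P<name>...)`, so each piece can be read by name, and PREG_SET_ORDER groups the results per link rather than per capture group. The sample input (including example.com) is made up for illustration.

```php
<?php
// Hypothetical input: several links, one per line, as in the original question.
$html = <<<HTML
<a href="http://domain.tld" title="detailed description">link text</a>
<a href="http://example.com">another link</a>
HTML;

// Named groups add string keys ('url', 'title', 'text') alongside the numeric ones.
$pattern = '/<a href="(?P<url>[^"]+)"(?: title="(?P<title>[^"]+)")?>(?P<text>[^<]+)<\/a>/';

$links = [];
// PREG_SET_ORDER: $matches[0] holds every capture of the first link, and so on.
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $links[] = [
            'url'   => $m['url'],
            'title' => $m['title'], // "" when the tag had no title attribute
            'text'  => $m['text'],
        ];
    }
}
print_r($links);
```

Like the single-link pattern, this only copes with tags in exactly this form: attributes in a different order, single-quoted attribute values, or markup nested inside the link text will not match.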
