preg_replace to only allow <br />

Cabbit · May 4, 2009

^_^ Hi hi, I would like my preg_replace to only allow the <br /> tag, if anyone can help.

It looks like this just now:

$value = preg_replace('/<\/?(?:\b(?!)[^>]+?)>/i', '', $value);

Also, I have BBCode for lists; it looks like the following:

[ LIST=1 ]
[*]item1
[*]item2
[ /LIST ]

but I am unsure how to get a good replace for it:
// Numbered list //
$pattern[7] = "/\[LIST=1\](.*?)\[\/LIST\]/is";
$replace[7] = "<ol>$1</ol>";
$pattern[8] = "/\[*\](.*?)\[\/*\]/is";
$replace[8] = "<li>$1</li>";

angelwatt · May 4, 2009

I didn't thoroughly test this, but I believe this regex should work for the br,

Code:

/<\/?((?!(br))[^>]+?)>/i

The (?!(br)) says to make sure the tag name does not start with 'br', so those tags are left alone and everything else is stripped. That means it would also let <brown> through, but would still strip <abraham>. Offhand I couldn't think of any other legitimate HTML tags that start with br. The regex could be modified to take care of such cases though.
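One possible way to take care of the <brown> case is a word boundary after br, so that only br itself is spared. This is a minimal sketch, not something tested in this thread; the function name strip_tags_except_br is made up for illustration, and the regex is a variation on the one above with the lookahead moved in front of the optional slash.

```php
<?php
// Hypothetical helper name; not from the thread.
function strip_tags_except_br(string $html): string
{
    // (?!\/?br\b) refuses to match when the tag name is exactly "br",
    // so <br>, <br/> and <br /> survive while <b>, <brown>, etc. are removed.
    return preg_replace('/<(?!\/?br\b)[^>]+?>/i', '', $html);
}

echo strip_tags_except_br("a<br />b<b>c</b><brown>d</brown>");
// a<br />bcd
```

PHP's built-in strip_tags() also accepts a list of allowed tags as its second argument, for example strip_tags($value, '<br>'), which may be simpler than a hand-written regex.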

I'll have to stare at the BBCode some more before I have a solution for that.

Cabbit · May 4, 2009

^_^ Thanks. I am plodding along on a few other things before I look at it again later today; I tend to need a break before the answer comes.

What would be more useful and cleaner: I am trying to get this to support line breaks coming from a textarea into PHP. It would be good if each block of text became its own paragraph wherever there is a line break.

For example, the user enters

Hello this is my post.
Hey

PHP returns just now
Hello this is my post. Hey

but
<p>Hello this is my post.</p><p>Hey</p>
would be much better.

EDIT: sorted this bit with

PHP:

$paragraphs = explode("\n", $value);
for ($i = 0; $i < count($paragraphs); $i++)
{
	$paragraphs[$i] = '<p>' . $paragraphs[$i] . '</p>';
}
$value = implode('', $paragraphs);
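A hedged variation on the snippet above: browsers typically submit textarea line breaks as \r\n, so splitting on "\n" alone can leave a stray \r at the end of each paragraph. The sketch below splits on any newline sequence and skips blank lines. The function name lines_to_paragraphs and the choice to drop blank lines are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical helper name; the thread's version lives inside a class method.
function lines_to_paragraphs(string $value): string
{
    // \R matches \r\n, \r or \n, so Windows-style textarea input works too.
    $lines = preg_split('/\R/', $value);
    $out = '';
    foreach ($lines as $line) {
        // Assumption: blank lines should not become empty <p></p> pairs.
        if (trim($line) === '') {
            continue;
        }
        $out .= '<p>' . $line . '</p>';
    }
    return $out;
}

echo lines_to_paragraphs("Hello this is my post.\r\nHey");
// <p>Hello this is my post.</p><p>Hey</p>
```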

angelwatt · May 4, 2009

OK got the [ * ] part figured out,

Code:

\[\*\]([\w\W]+?)\n?(?=(?:(?:\[\*\])|(?:\[\/list\])))

My test case for this was,

Code:

[list=1]
[*]hello there.
[*]second[*]oops
[*]third
[/list]

I think you pretty well had the ol part figured out. If not, let me know.

Note: you'll want to run this regex before the list one, because its lookahead needs the [/list] tag to still be in the text.

Cabbit · May 4, 2009

angelwatt said:
Note: Given this regex you'll want to do this regex before the list one.

This is the function as it stands, so where would I want to fit it in?

PHP:

// Function to convert the BBCode tags to HTML. //
	private function bbtags_html()
	{
		// Variables
		$value = $this->build_paragraphs();
		
		/*
			Formatting Tags
		*/
		// Bold //
		$pattern[0] = "/\[b\](.*?)\[\/b\]/is";
		$replace[0] = "<strong>$1</strong>";
		
		// Italic //
		$pattern[1] = "/\[i\](.*?)\[\/i\]/is";
		$replace[1] = "<i>$1</i>";
		
		// Underlined //
		$pattern[2] = "/\[u\](.*?)\[\/u\]/is";
		$replace[2] = "<u>$1</u>";
		
		// url //
		$pattern[3] = "/\[url\](.*?)\[\/url\]/is";
		$replace[3] = "<a href=\"$1\">$1</a>";
		
		// img //
		$pattern[4] = "/\[img\](.*?)\[\/img\]/is";
		$replace[4] = "<img src=\"$1\" alt=\"$1\" />";
		
		// quote //
		$pattern[5] = "/\[quote\](.*?)\[\/quote\]/is";
		$replace[5] = "<div class=\"quote\">$1</div>";
		
		// code //
		$pattern[6] = "/\[code\](.*?)\[\/code\]/is";
		$replace[6] = "<div class=\"code\">$1</div>";
		
		// Numbered list //
		$pattern[7] = "/\[LIST=1\](.*?)\[\/LIST\]/is";
		$replace[7] = "<ol>$1</ol>";
		// unordered list //
		$pattern[8] = "/\[LIST\](.*?)\[\/LIST\]/is";
		$replace[8] = "<ul>$1</ul>";
		
		// list element //
		$pattern[9] = "/\[*\](.*?)\[\/*\]/is";
		$replace[9] = "<li>$1</li>";
		
		/*
			Image Tags
		*/
		// smile
		$pattern[10] = '/\:\)/';
		$replace[10] = '<img src="/include/class/bbeditor/images/smilies/emoticon_smile.png" alt=":)" />';
	
		// tongue 
		$pattern[11] = '/\:p/';
		$replace[11] = '<img src="/include/class/bbeditor/images/smilies/emoticon_tongue.png" alt=":p" />';
		$pattern[12] = '/\:P/';
		$replace[12] = '<img src="/include/class/bbeditor/images/smilies/emoticon_tongue.png" alt=":p" />';
		
		
		/*
			Output
		*/
		$value = preg_replace($pattern, $replace, $value);
		return $value;
	}

angelwatt · May 4, 2009

Cabbit said:
This is the function as it stands, so where would I want to fit it in?

Mine would become #7 and your current #9 would disappear.

Cabbit · May 4, 2009

angelwatt said:
Mine would become #7 and your current #9 would disappear.

I did it like so:

PHP:

// Numbered list //
		$pattern[7] = "\[\*\]([\w\W]+?)\n?(?=(?:(?:\[\*\])|(?:\[\/list\])))";
		$replace[7] = "<ol>$1</ol>";

though it seems to have knocked out the function, as it does not return the value now.

angelwatt · May 4, 2009

Cabbit said:
PHP:

// Numbered list //
$pattern[7] = "\[\*\]([\w\W]+?)\n?(?=(?:(?:\[\*\])|(?:\[\/list\])))";
$replace[7] = "<ol>$1</ol>";

though it seems to have knocked out the function, as it does not return the value now.

That regex was for the li not ol. So the replace array needs updating.

Cabbit · May 4, 2009

PHP:

		// list element //
		$pattern[7] = "\[\*\]([\w\W]+?)\n?(?=(?:(?:\[\*\])|(?:\[\/list\])))";
		$replace[7] = "<li>$1</li>";

		// Numbered list //
		$pattern[8] = "/\[list=1\](.*?)\[\/LIST\]/is";
		$replace[8] = "<ol>$1</ol>";

		// unordered list //
		$pattern[9] = "/\[LIST\](.*?)\[\/LIST\]/is";
		$replace[9] = "<ul>$1</ul>";

Like this it returns
Warning: preg_replace() [function.preg-replace]: Delimiter must not be alphanumeric or backslash in /home/abcomfor/development_html/include/class/textarea.class.php on line 152

angelwatt · May 4, 2009

Cabbit said:
Warning: preg_replace() [function.preg-replace]: Delimiter must not be alphanumeric or backslash in /home/abcomfor/development_html/include/class/textarea.class.php on line 152

I believe it needs the / delimiters around the regex I gave. I've been messing around with the regex in a non-PHP environment, so I forgot to include them.
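To illustrate the delimiter point, here is a small sketch with made-up sample input. When a pattern has no valid delimiters, preg_replace() emits the warning quoted above and returns NULL, which would also explain why the function seemed to return nothing earlier. The variable names are just for illustration.

```php
<?php
$text = "[list]\n[*]one\n[*]two\n[/list]";

// No delimiters: PCRE sees the leading backslash as the delimiter and rejects it.
$bad = '\[\*\]([\w\W]+?)\n?(?=(?:(?:\[\*\])|(?:\[\/list\])))';

// Same expression wrapped in / ... / delimiters.
$good = '/\[\*\]([\w\W]+?)\n?(?=(?:(?:\[\*\])|(?:\[\/list\])))/';

// @ only hides the warning here so the NULL result is easy to see.
$result = @preg_replace($bad, '<li>$1</li>', $text);
var_dump($result === null); // bool(true)

echo preg_replace($good, '<li>$1</li>', $text);
```

Because one bad pattern in the $pattern array makes the whole call fail, it is worth checking for a NULL result while debugging.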

A note for the ol and ul regexes: they currently won't be able to handle nested lists. It is uncommon for people to use them, though admittedly I have here on Mac Rumors. A single regex won't be able to handle it, but I think a regex with a for loop may be able to do it. You may also want to download someone else's BBCode parser to see how they handle it. (A quick Google showed many parsers just ignored lists. The wimps.)

Essentially you could use,

Code:

/\[list\]([\w\W]+)\[\/list\]/

with a replace and loop over it until no match is found. At least that would work in theory. I have my doubts whether it would hold up when multiple lists are used in the post. It may need a fancier regex.
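A rough sketch of the loop idea, with one change from the regex above: instead of the greedy [\w\W]+, the body is not allowed to contain another [list], so each pass only converts the innermost lists and the loop works outward. That change is my own assumption about what a "fancier regex" could look like, and the function name is hypothetical.

```php
<?php
// Hypothetical helper; converts innermost [list] blocks first and repeats.
function convert_nested_lists(string $text): string
{
    // (?!\[list\]) before every character keeps a nested opener out of the body.
    $pattern = '/\[list\]((?:(?!\[list\])[\w\W])*?)\[\/list\]/i';
    do {
        // $count receives the number of replacements made in this pass.
        $text = preg_replace($pattern, '<ul>$1</ul>', $text, -1, $count);
    } while ($count > 0);
    return $text;
}

echo convert_nested_lists("[list]a[list]b[/list]c[/list] [list]d[/list]");
// <ul>a<ul>b</ul>c</ul> <ul>d</ul>
```

This sketch only deals with the [list] wrappers; the [*] items would still need their own replacement.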

Cabbit · May 4, 2009

PHP:

 		// list element // 
        $pattern[7] = '/\[\*\]([\w\W]+?)\n?(?=(?:(?:\[\*\])|(?:\[\/list\])))/'; 
        $replace[7] = "<li>$1</li>";
		
		// Numbered list // 
        $pattern[8] = "/\[list=1\](.*?)\[\/LIST\]/is"; 
        $replace[8] = "<ol>$1</ol>"; 

        // unordered list // 
        $pattern[9] = "/\[LIST\](.*?)\[\/LIST\]/is"; 
        $replace[9] = "<ul>$1</ul>";

If you try it out (it posts the input formatted), making a list 1 2 3 returns <li>1</li><li>2[ * ]3

try me

and my other function happily shoves the <ol> etc. in <p> tags >.< Need more logic.

PHP:

	private function build_paragraphs()
	{
		$value = $this->bbtags_html();
		
		// Rebuilding the paragraphs //
		$paragraphs = explode("\n", $value);
		for ($i = 0; $i < count ($paragraphs); $i++)
		{
			$paragraphs[$i] = '<p>' . $paragraphs[$i] . '</p>';
		}
		$value = implode ('', $paragraphs);
		
		// Making the paragraphs HTML
		$value = preg_replace('/<(.p?)>/', '<$1>', $value);
		
		// Return the value
		return $value;
	}

angelwatt · May 4, 2009

It seemed to work for me.
Given:

Code:

[list]
[*]1
[*]2
[*]3
[/list]

I got back,

HTML:

<ul>
<p></p><li>1
</li><li>2
</li><li>3
</li></ul>

Cabbit · May 4, 2009

With the same data I got back

HTML:

<p><ul>
</p><p><li>1
</li><li>2
</li>[*]3
</p><p></ul></p>

Ah, I found the issue: the BBCode I am using is uppercase, but the regex doing the replace is lowercase.
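The case issue can be reproduced in isolation. In this sketch (sample input and variable names are made up), the list-item regex is run once as posted and once with the i flag added; without the flag the last item never finds a closing [/list] to look ahead to, which matches the stray [*]3 in the output above.

```php
<?php
$input = "[LIST]\n[*]1\n[*]2\n[*]3\n[/LIST]";

// Case-sensitive: the lookahead wants [/list] in lowercase.
$sensitive = '/\[\*\]([\w\W]+?)\n?(?=(?:(?:\[\*\])|(?:\[\/list\])))/';
// Same pattern with the i (case-insensitive) flag.
$insensitive = $sensitive . 'i';

echo preg_replace($sensitive, '<li>$1</li>', $input), "\n";
// [LIST]
// <li>1</li><li>2</li>[*]3
// [/LIST]

echo preg_replace($insensitive, '<li>$1</li>', $input), "\n";
// [LIST]
// <li>1</li><li>2</li><li>3</li>[/LIST]
```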

angelwatt · May 4, 2009

Hmm, interesting. For what it's worth I'm on Windows XP using Firefox 3.0.10.

Are you doing the paragraph part before or after the BBCode replacements?

Cabbit · May 4, 2009

angelwatt said:
Hmm, interesting. For what it's worth I'm on Windows XP using Firefox 3.0.10.

Are you doing the paragraph part before or after the BBCode replacements?

The paragraph bit happens after the BBCode; I have tried it both ways. Here is the full class for you to have a noodle. I got the list thing working by adding is at the end of the pattern; I think the i means ignore case.

PHP:

<?php
/************************************************************************************
  								Kittenbunny CMS
 								Filename: textarea.class.php
 								Class: Validate Textarea 
************************************************************************************/

class textarea
{
	// Variables
  	public $post;
		
	// Function to remove html tags //
	private function sanitize_htmltags()
	{
		$value = $this->post;

		// Sets the allowed tags //
		$value = preg_replace('/<\/?(?:\b(?!)[^>]+?)>/i', '', $value);  
	
		// Does the htmlspecialchars bit //
		$value = htmlspecialchars($value, ENT_QUOTES, "UTF-8");
		
		
		
		return $value;
	}
	
	// Adds paragraph tags to the html
	private function build_paragraphs()
	{
		$value = $this->bbtags_html();
		
		// Rebuilding the paragraphs //
		$paragraphs = explode("\n", $value);
		for ($i = 0; $i < count ($paragraphs); $i++)
		{
			$paragraphs[$i] = '<p>' . $paragraphs[$i] . '</p>';
		}
		$value = implode ('', $paragraphs);
		
		// Making the paragraphs HTML
		$value = preg_replace('/<(.p?)>/', '<$1>', $value);
		
		// Return the value
		return $value;
	}
	
	// Function to convert the BBCode tags to HTML. //
	private function bbtags_html()
	{
		// Variables
		$value = $this->sanitize_htmltags();
		
		/*
			Formatting Tags
		*/
		// Bold //
		$pattern[0] = "/\[b\](.*?)\[\/b\]/is";
		$replace[0] = "<strong>$1</strong>";
		
		// Italic //
		$pattern[1] = "/\[i\](.*?)\[\/i\]/is";
		$replace[1] = "<i>$1</i>";
		
		// Underlined //
		$pattern[2] = "/\[u\](.*?)\[\/u\]/is";
		$replace[2] = "<u>$1</u>";
		
		// url //
		$pattern[3] = "/\[url\](.*?)\[\/url\]/is";
		$replace[3] = "<a href=\"$1\">$1</a>";
		
		// img //
		$pattern[4] = "/\[img\](.*?)\[\/img\]/is";
		$replace[4] = "<img src=\"$1\" alt=\"$1\" />";
		
		// quote //
		$pattern[5] = "/\[quote\](.*?)\[\/quote\]/is";
		$replace[5] = "<div class=\"quote\">$1</div>";
		
		// code //
		$pattern[6] = "/\[code\](.*?)\[\/code\]/is";
		$replace[6] = "<div class=\"code\">$1</div>";
		
		// list element // 
        $pattern[7] = '/\[\*\]([\w\W]+?)\n?(?=(?:(?:\[\*\])|(?:\[\/LIST\])))/is'; 
        $replace[7] = "<li>$1</li>";
		
		// Numbered list // 
        $pattern[8] = "/\[LIST=1\](.*?)\[\/LIST\]/is"; 
        $replace[8] = "<ol>$1</ol>"; 

        // unordered list // 
        $pattern[9] = "/\[LIST\](.*?)\[\/LIST\]/is"; 
        $replace[9] = "<ul>$1</ul>";  
		
		/*
			Image Tags
		*/
		
		// smile
		$pattern[10] = '/\:\)/';
		$replace[10] = '<img src="/include/class/bbeditor/images/smilies/emoticon_smile.png" alt=":)" />';
	
		// tongue 
		$pattern[11] = '/\:p/';
		$replace[11] = '<img src="/include/class/bbeditor/images/smilies/emoticon_tongue.png" alt=":p" />';
		$pattern[12] = '/\:P/';
		$replace[12] = '<img src="/include/class/bbeditor/images/smilies/emoticon_tongue.png" alt=":p" />';
		
		// happy 
		$pattern[13] = '/\:d/';
		$replace[13] = '<img src="/include/class/bbeditor/images/smilies/emoticon_happy.png" alt=":D" />';
		$pattern[14] = '/\:D/';
		$replace[14] = '<img src="/include/class/bbeditor/images/smilies/emoticon_happy.png" alt=":D" />';
		
		// Kitty Smile
		$pattern[15] = '/\:3/';
		$replace[15] = '<img src="/include/class/bbeditor/images/smilies/emoticon_waii.png" alt=":3" />';
		
		// grin 
		$pattern[16] = '/\:grin\:/';
		$replace[16] = '<img src="/include/class/bbeditor/images/smilies/emoticon_grin.png" alt=":grin:" />';
		$pattern[17] = '/\:GRIN\:/';
		$replace[17] = '<img src="/include/class/bbeditor/images/smilies/emoticon_grin.png" alt=":grin:" />';
		
		// wink
		$pattern[18] = '/\;\)/';
		$replace[18] = '<img src="/include/class/bbeditor/images/smilies/emoticon_wink.png" alt=";)" />';
		
		// twisted 
		$pattern[19] = '/\:twisted\:/';
		$replace[19] = '<img src="/include/class/bbeditor/images/smilies/emoticon_evilgrin.png" alt=":twisted:" />';
		$pattern[20] = '/\:TWISTED\:/';
		$replace[20] = '<img src="/include/class/bbeditor/images/smilies/emoticon_evilgrin.png" alt=":twisted:" />';
		
		// surprised
		$pattern[21] = '/\:o/';
		$replace[21] = '<img src="/include/class/bbeditor/images/smilies/emoticon_surprised.png" alt=":o" />';
		$pattern[22] = '/\:O/';
		$replace[22] = '<img src="/include/class/bbeditor/images/smilies/emoticon_surprised.png" alt=":o" />';
		
		// sad
		$pattern[23] = '/\:\(/';
		$replace[23] = '<img src="/include/class/bbeditor/images/smilies/emoticon_unhappy.png" alt=":(" />';

		
		/*
			Output
		*/
		$value = preg_replace($pattern, $replace, $value);
		return $value;
	}
	
	// Function to check for errors. //
	private function error_checking()
	{
		$value = $this->sanitize_htmltags();

		// Trimming excess space from the value. // 
		if(!$value || strlen($value = trim($value)) == 0)
		{
			// If the value is empty. //
			return "The post is empty.";
    	}
   	 	else if (preg_match('/[\w]{1,}/', $value)) 
    	{ 
			// Checking if the value is a number. //
			if (is_numeric($value))
			{
				// Sets the is numeric error. //
				return "The post is numeric; it must be alphanumeric.";
			}
			// The value is not a number so let's proceed. //
			else
			{
				// Now making sure the string is not too short, to avoid laziness //
				if (strlen($value) < 5)
				{
					// Sets the error as too short. //
					return "The post is too short. It must be at least 5 characters.";
				}
				else if (strlen($value) > 12500)
				{
					// Sets the error as too long. //
					return "The post is too long. It must be at most 12500 characters.";
				}
        	}
    	} 
	}
	
	// Function to return any errors. //
	public function error_return()
	{
		return $this->error_checking();
	}
	
	// Function to return the post with bbcode. html tags removed. //
	public function return_bbcode()
	{
		return $this->sanitize_htmltags();
	}
	
	// Function to return the post in html format for display. //
	public function return_formatedhtml()
	{
		return $this->build_paragraphs();
	}
}
?>

angelwatt · May 4, 2009

How attached are you to using list=1 for ol? It makes the regex quite difficult.

Here's an update for the paragraph function. I just added an if statement. It essentially checks whether the line starts with a tag, and if so doesn't make it a paragraph. This stops it from wrapping ul/ol in a paragraph. It also means any line that starts with another tag, like strong, won't be wrapped either. This can be fine-tuned further if you like.

PHP:

            // Only wrap lines that don't already start with a tag
            if (substr($paragraphs[$i], 0,1) != '<') {
              $paragraphs[$i] = '<p>' . $paragraphs[$i] . '</p>';
            }
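For context, here is how that check might sit inside the whole paragraph step. This is a standalone sketch with a hypothetical function name rather than the class method itself, and it uses foreach in place of the for loop.

```php
<?php
// Hypothetical standalone version of the paragraph step, for illustration only.
function wrap_untagged_lines(string $value): string
{
    $paragraphs = explode("\n", $value);
    foreach ($paragraphs as $i => $line) {
        // Leave lines that already start with a tag (e.g. <ul>, <li>) alone.
        if (substr($line, 0, 1) != '<') {
            $paragraphs[$i] = '<p>' . $line . '</p>';
        }
    }
    return implode('', $paragraphs);
}

echo wrap_untagged_lines("Hello\n<ul>\n<li>1</li>\n</ul>\nBye");
// <p>Hello</p><ul><li>1</li></ul><p>Bye</p>
```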

On the pattern/replace arrays, you don't need to give an index since you're appending to them, so you can have,

PHP:

$pattern[] = '/.*/';
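A short sketch of what that looks like with two of the existing tags (the sample string is made up). As long as each $pattern[] is followed by its matching $replace[], the two arrays stay in step without any manual numbering.

```php
<?php
$pattern = [];
$replace = [];

// Bold
$pattern[] = '/\[b\](.*?)\[\/b\]/is';
$replace[] = '<strong>$1</strong>';

// Italic
$pattern[] = '/\[i\](.*?)\[\/i\]/is';
$replace[] = '<i>$1</i>';

echo preg_replace($pattern, $replace, '[b]bold[/b] and [i]italic[/i]');
// <strong>bold</strong> and <i>italic</i>
```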

Here's patterns/replacements for the ul, ol, and li,

PHP:

        // unordered list // 
        $pattern[] = '/\[(\/?)list\]/is'; 
        $replace[] = '<$1ul>';       

        // Numbered list // 
        $pattern[] = '/\[(\/?)ol\]/is'; 
        $replace[] = '<$1ol>'; 

        // list element //         
        $pattern[] = '/\[\*\]([\w\W]+?)\n?(?=(?:(?:\[\*\])|(?:<\/ul>)|(?:<\/ol>)|(?:<li>)))/is'; 
        $replace[] = "<li>$1</li>\n";

It assumes you're up for changing the way ordered lists are done.
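Put together, the three list patterns can be tried out on their own like this. It is only a sketch: the wrapper function name is hypothetical, and it assumes the list-item lookahead is meant to stop at either </ul> or </ol>, since by the time it runs the list tags have already been converted.

```php
<?php
// Hypothetical wrapper around the list patterns from this post.
function convert_lists(string $value): string
{
    $pattern = [];
    $replace = [];

    // [list] and [/list] become <ul> and </ul>
    $pattern[] = '/\[(\/?)list\]/is';
    $replace[] = '<$1ul>';

    // [ol] and [/ol] become <ol> and </ol>
    $pattern[] = '/\[(\/?)ol\]/is';
    $replace[] = '<$1ol>';

    // Each [*] item runs until the next item or the end of the list
    // (assumption: both </ul> and </ol> end an item).
    $pattern[] = '/\[\*\]([\w\W]+?)\n?(?=(?:(?:\[\*\])|(?:<\/ul>)|(?:<\/ol>)|(?:<li>)))/is';
    $replace[] = "<li>$1</li>\n";

    return preg_replace($pattern, $replace, $value);
}

echo convert_lists("[ol]\n[*]1\n[*]2\n[/ol]");
// <ol>
// <li>1</li>
// <li>2</li>
// </ol>
```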

The last piece is for handling nested lists, which gets executed right before all of the other replacements.

PHP:

        // handle nested lists inside list items
        $value = preg_replace('/\[\*\]([^\*]+?(?:\[list\]|\[ol\]).*?(?:\[\/list\]|\[\/ol\]))/is', '<li>$1</li>', $value);
        $value = preg_replace($pattern, $replace, $value);
