Remove HTML with Regular Expressions

When working with websites, you often need to strip HTML tags, tag attributes or the complete contents of an HTML tag from some text. Regular expressions can make this very easy, so we thought we would share some that we use all the time.

Find HTML Tags

<.*?>

This expression matches every HTML opening and closing tag, with or without attributes, so you can use it to strip all HTML tags from an input string.
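
As a quick illustration of stripping every tag, here is a minimal sketch. The class and method names (HtmlStripper, StripTags) are made up for this example, and it assumes the input contains no ">" characters inside attribute values or comments, which a simple pattern like this cannot handle.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // Hypothetical helper: removes anything that looks like an HTML tag.
    public static string StripTags(string html)
    {
        // "<.*?>" lazily matches from each '<' to the nearest '>'.
        return Regex.Replace(html, "<.*?>", string.Empty);
    }

    public static void Main()
    {
        string input = "<p>Hello <strong>world</strong></p>";
        Console.WriteLine(StripTags(input)); // prints "Hello world"
    }
}
```

Because the match is lazy, each tag is removed on its own and the text between tags is left untouched.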

Find HTML Tag and Content

<head.*?>(.|\n)*?</head>

With this expression we match an opening <head> tag, the closing </head> tag and everything in between. This gives us the option to remove the complete <head> section from a document.

Using the Regular Expressions

The following C# code uses the second regular expression to remove the <head> section from the HTML content by replacing it with an empty string:

using System.Text.RegularExpressions;

string content = "<html><head><title>Using Regular Expressions</title></head><body><h1>Using Regular Expressions</h1><p>Regular expressions are really quite powerful and can make replacing HTML really easy.";

// (.|\n)*? lazily matches any character, including newlines, up to the first </head>
string pattern = "<head.*?>(.|\n)*?</head>";
string replacedContent = Regex.Replace(content, pattern, string.Empty);
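
The (.|\n) alternation is needed because "." does not match a newline by default. One possible variation, sketched here as an alternative rather than as the pattern described above, is to pass RegexOptions.Singleline so that "." matches newlines, and RegexOptions.IgnoreCase so that <HEAD> is caught too. The \b in this version is an addition that stops the pattern from also matching <header>; the names HeadRemover and RemoveHead are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeadRemover
{
    // Hypothetical helper: removes the <head> element and everything inside it.
    public static string RemoveHead(string html)
    {
        // \b keeps "<head" from matching "<header"; [^>]* skips any attributes.
        const string pattern = @"<head\b[^>]*>.*?</head>";
        return Regex.Replace(html, pattern, string.Empty,
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<html><HEAD>\n<title>Demo</title>\n</HEAD><body><header>Hi</header></body></html>";
        Console.WriteLine(RemoveHead(html));
        // prints "<html><body><header>Hi</header></body></html>"
    }
}
```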

To remove all attributes from some HTML, you could use the first regular expression with a MatchEvaluator:

string content = "<div class=\"a-class\" id=\"an-id\">Strip <em style=\"color:#0f0\">any</em> HTML attributes from this content</div>";

string pattern = "<.*?>";
string filteredContent = System.Text.RegularExpressions.Regex.Replace(content, pattern, delegate(System.Text.RegularExpressions.Match match)
{
	// called each time there is a match
	string m = match.ToString();
	// keep everything before the first space (the tag name) and close the tag
	int spacePosition = m.IndexOf(" ");
	if (spacePosition >= 0)
		return m.Substring(0, spacePosition) + ">";
	// no attributes, so return the tag unchanged
	return m;
});
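
One thing to be aware of with the approach above is that cutting at the first space also drops the trailing slash of a self-closing tag, so <br /> becomes <br>. If that matters, a possible variation is to capture the tag name and the optional slash and rebuild the tag from those. This is only a sketch: the names AttributeStripper and StripAttributes are invented for the example, and it assumes attribute values are quoted and contain no ">" characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributeStripper
{
    // Hypothetical helper: strips attributes but keeps a self-closing slash.
    public static string StripAttributes(string html)
    {
        // Group 1: optional leading '/' plus the tag name.
        // Group 2: the optional '/' that ends a self-closing tag.
        const string pattern = @"<(/?[A-Za-z][A-Za-z0-9]*)\b[^>]*?(/?)>";
        return Regex.Replace(html, pattern, "<$1$2>");
    }

    public static void Main()
    {
        string html = "<div class=\"a-class\" id=\"an-id\">Strip <em style=\"color:#0f0\">any</em> attributes<br /></div>";
        Console.WriteLine(StripAttributes(html));
        // prints "<div>Strip <em>any</em> attributes<br/></div>"
    }
}
```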