<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.windowsclient.net/utility/FeedStylesheets/atom.xsl" media="screen"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><title type="html">REA_ANDREW</title><subtitle type="html" /><id>http://blogs.windowsclient.net/rea_andrew/atom.aspx</id><link rel="alternate" type="text/html" href="http://blogs.windowsclient.net/rea_andrew/default.aspx" /><link rel="self" type="application/atom+xml" href="http://blogs.windowsclient.net/rea_andrew/atom.aspx" /><generator uri="http://communityserver.org" version="3.0.20423.869">Community Server</generator><updated>2008-05-03T09:14:35Z</updated><entry><title>Bad Word Filter With Regular Expressions</title><link rel="alternate" type="text/html" href="http://blogs.windowsclient.net/rea_andrew/archive/2008/05/03/bad-word-filter-with-regular-expressions.aspx" /><id>http://blogs.windowsclient.net/rea_andrew/archive/2008/05/03/bad-word-filter-with-regular-expressions.aspx</id><published>2008-05-03T08:14:35Z</published><updated>2008-05-03T08:14:35Z</updated><content type="html">&lt;p&gt;I have seen many versions of these, and a lot of the time they expect a bad word to be written out in full, i.e. BADWORD.&amp;#160; They often overlook the fact that people who work out this rule simply bypass it by adding symbols in between, i.e. 
B*A*D*W*O*R*D.&amp;#160; Of course this would not be recognized by simply searching the string for BADWORD.&lt;/p&gt;  &lt;p&gt;The technique I have used here relies on a base list of words in XML.&amp;#160; I have created a class called BadWordFilter which uses the singleton pattern.&amp;#160; I do this because the class first has to compile a list of Regex objects from the words inside the base XML file, and as I do not want these recompiled at every bad word check, I have opted for the singleton pattern.&lt;/p&gt;  &lt;p&gt;For any word in the list, the rendered pattern follows a set trend.&amp;#160; So if we look again at BADWORD, the regular expression I have come up with would be as follows.&lt;/p&gt;  &lt;div style="font-size:12px;margin:10px;position:relative;width:95%;border-bottom:#cccccc 1px solid;"&gt;     &lt;pre style="overflow-y:hidden;overflow-x:auto;padding-bottom:30px;"&gt;([bB][\W]*[aA][\W]*[dD][\W]*[wW][\W]*[oO][\W]*[rR][\W]*[dD][\W]*)&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&amp;#160;&lt;/p&gt;

&lt;p&gt;I create the pattern at runtime.&amp;#160; For each letter of the word the pattern accepts either the lower or the upper case form, followed by any number of non-word characters (\W).&amp;#160; Ultimately it matches anything which, if we ignore everything that is not a word character, spells our bad word.&lt;/p&gt;
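
&lt;p&gt;As a minimal sketch of that idea, assuming the word contains only letters, the snippet below builds the pattern for a single word and tries it against a few inputs.&amp;#160; PatternSketch and BuildPattern are hypothetical names used for illustration only; they are not part of the class shown further down.&lt;/p&gt;

&lt;div style="font-size:12px;margin:10px;position:relative;width:95%;border-bottom:#cccccc 1px solid;"&gt;
  &lt;pre style="overflow-y:hidden;overflow-x:auto;padding-bottom:30px;"&gt;using System;
using System.Text;
using System.Text.RegularExpressions;

public static class PatternSketch
{
    // Hypothetical helper: builds the pattern for one word.
    // Assumes the word contains only letters (no regex metacharacters).
    public static string BuildPattern(string word)
    {
        StringBuilder builder = new StringBuilder(&amp;quot;(&amp;quot;);
        foreach (char c in word)
        {
            // Either case of the letter, then any run of non-word characters
            builder.AppendFormat(&amp;quot;[{0}{1}][\\W]*&amp;quot;, char.ToLowerInvariant(c), char.ToUpperInvariant(c));
        }
        builder.Append(&amp;quot;)&amp;quot;);
        return builder.ToString();
    }

    public static void Main()
    {
        Regex regex = new Regex(BuildPattern(&amp;quot;badword&amp;quot;));
        Console.WriteLine(regex.IsMatch(&amp;quot;B*A*D*W*O*R*D&amp;quot;));   // True
        Console.WriteLine(regex.IsMatch(&amp;quot;b a d - w o r d&amp;quot;)); // True
        Console.WriteLine(regex.IsMatch(&amp;quot;a good word&amp;quot;));     // False
        Console.WriteLine(regex.IsMatch(&amp;quot;B_A_D_W_O_R_D&amp;quot;));   // False
    }
}&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The last line shows a limit of this approach: \W only covers characters which are not letters, digits or the underscore, so a word broken up with underscores or digits would slip through.&lt;/p&gt;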

&lt;p&gt;&amp;#160;&lt;/p&gt;

&lt;p&gt;I have created a simple test page here to have a go.&amp;#160; Please note that I have only put the really serious words in the list for the purposes of this demonstration.&amp;#160; I have not published this list as I do not think it is necessary.&amp;#160; I have used a simple XML structure, so please feel free to copy the code here and add as many bad words as you like &amp;lt;s&amp;gt;.&lt;/p&gt;

&lt;p&gt;&amp;#160;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Page :&lt;/strong&gt; &lt;a title="http://andrewrea.co.uk/badwordfilter/Default.aspx" href="http://andrewrea.co.uk/badwordfilter/Default.aspx" target="_blank"&gt;http://andrewrea.co.uk/badwordfilter/Default.aspx&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&amp;#160;&lt;/p&gt;

&lt;h3&gt;The BadWordFilter class&lt;/h3&gt;

&lt;div style="font-size:12px;margin:10px;position:relative;width:95%;border-bottom:#cccccc 1px solid;"&gt;

  &lt;pre style="overflow-y:hidden;overflow-x:auto;padding-bottom:30px;"&gt;&lt;span style="color:#0000ff;"&gt;using&lt;/span&gt; System;
&lt;span style="color:#0000ff;"&gt;using&lt;/span&gt; System.Collections.Generic;
&lt;span style="color:#0000ff;"&gt;using&lt;/span&gt; System.Text;
&lt;span style="color:#0000ff;"&gt;using&lt;/span&gt; System.Text.RegularExpressions;
&lt;span style="color:#0000ff;"&gt;using&lt;/span&gt; System.Web;
&lt;span style="color:#0000ff;"&gt;using&lt;/span&gt; System.Xml;

&lt;span style="color:#808080;"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
&lt;span style="color:#808080;"&gt;/// Singleton filter which compiles one regex per listed bad word and cleans input text with them&lt;/span&gt;
&lt;span style="color:#808080;"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
&lt;span style="color:#0000ff;"&gt;public&lt;/span&gt; &lt;span style="color:#0000ff;"&gt;class&lt;/span&gt; BadWordFilter
{

    &lt;span style="color:#808080;"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// These are the options which I use in order to determine the way I handle any bad text&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
    &lt;span style="color:#0000ff;"&gt;public&lt;/span&gt; &lt;span style="color:#0000ff;"&gt;enum&lt;/span&gt; CleanUpOptions
    {
        ReplaceEachWord,
        BlankBadText,
        ReplaceWholeText
    }

    &lt;span style="color:#808080;"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// Private constructor and instantiate the list of regex&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
    &lt;span style="color:#0000ff;"&gt;private&lt;/span&gt; BadWordFilter()
    {
        patterns = &lt;span style="color:#0000ff;"&gt;new&lt;/span&gt; List&amp;lt;Regex&amp;gt;();
    }

    &lt;span style="color:#808080;"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// The patterns&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
    &lt;span style="color:#0000ff;"&gt;private&lt;/span&gt; List&amp;lt;Regex&amp;gt; patterns;

    
    &lt;span style="color:#0000ff;"&gt;public&lt;/span&gt; List&amp;lt;Regex&amp;gt; Patterns
    {
        &lt;span style="color:#0000ff;"&gt;get&lt;/span&gt; { &lt;span style="color:#0000ff;"&gt;return&lt;/span&gt; patterns; }
        &lt;span style="color:#0000ff;"&gt;set&lt;/span&gt; { patterns = &lt;span style="color:#0000ff;"&gt;value&lt;/span&gt;; }
    }

    &lt;span style="color:#0000ff;"&gt;private&lt;/span&gt; &lt;span style="color:#0000ff;"&gt;static&lt;/span&gt; BadWordFilter m_instance = &lt;span style="color:#0000ff;"&gt;null&lt;/span&gt;;

    &lt;span style="color:#0000ff;"&gt;public&lt;/span&gt; &lt;span style="color:#0000ff;"&gt;static&lt;/span&gt; BadWordFilter Instance
    {
        &lt;span style="color:#0000ff;"&gt;get&lt;/span&gt;
        {
            &lt;span style="color:#0000ff;"&gt;if&lt;/span&gt; (m_instance == &lt;span style="color:#0000ff;"&gt;null&lt;/span&gt;)
                m_instance = CreateBadWordFilter(HttpContext.Current.Server.MapPath(&amp;quot;listofwords&lt;span style="color:#8b0000;"&gt;.xml&lt;/span&gt;&amp;quot;));

            &lt;span style="color:#0000ff;"&gt;return&lt;/span&gt; m_instance;
        }
    }

    &lt;span style="color:#808080;"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// Create all the patterns required and add them to the list&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// &amp;lt;param name=&amp;quot;badWordFile&amp;quot;&amp;gt;&amp;lt;/param&amp;gt;&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// &amp;lt;returns&amp;gt;&amp;lt;/returns&amp;gt;&lt;/span&gt;
    &lt;span style="color:#0000ff;"&gt;protected&lt;/span&gt; &lt;span style="color:#0000ff;"&gt;static&lt;/span&gt; BadWordFilter CreateBadWordFilter(&lt;span style="color:#0000ff;"&gt;string&lt;/span&gt; badWordFile)
    {
        BadWordFilter filter = &lt;span style="color:#0000ff;"&gt;new&lt;/span&gt; BadWordFilter();
        XmlDocument badWordDoc = &lt;span style="color:#0000ff;"&gt;new&lt;/span&gt; XmlDocument();
        badWordDoc.Load(badWordFile);

        &lt;span style="color:#008000;"&gt;//Fetch the word nodes once rather than on every iteration&lt;/span&gt;
        XmlNodeList wordNodes = badWordDoc.GetElementsByTagName(&amp;quot;&lt;span style="color:#8b0000;"&gt;word&lt;/span&gt;&amp;quot;);

        &lt;span style="color:#008000;"&gt;//Loop through the xml document for each bad word in the list&lt;/span&gt;
        &lt;span style="color:#0000ff;"&gt;for&lt;/span&gt; (&lt;span style="color:#0000ff;"&gt;int&lt;/span&gt; i = 0; i &amp;lt; wordNodes.Count; i++)
        {
            &lt;span style="color:#008000;"&gt;//Split each word into a character array&lt;/span&gt;
            &lt;span style="color:#0000ff;"&gt;char&lt;/span&gt;[] characters = wordNodes[i].InnerText.ToCharArray();
            
            &lt;span style="color:#008000;"&gt;//We need a fast way of appending to an existing string&lt;/span&gt;
            StringBuilder patternBuilder = &lt;span style="color:#0000ff;"&gt;new&lt;/span&gt; StringBuilder();

            &lt;span style="color:#008000;"&gt;//The start of the pattern&lt;/span&gt;
            patternBuilder.Append(&amp;quot;&lt;span style="color:#8b0000;"&gt;(&lt;/span&gt;&amp;quot;);

            &lt;span style="color:#008000;"&gt;//We next go through each letter and append the part of the pattern.&lt;/span&gt;
            &lt;span style="color:#008000;"&gt;//It is this stage which generates the upper and lower case variations&lt;/span&gt;
            &lt;span style="color:#0000ff;"&gt;for&lt;/span&gt; (&lt;span style="color:#0000ff;"&gt;int&lt;/span&gt; j = 0; j &amp;lt; characters.Length; j++)
            {
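                &lt;span style="color:#008000;"&gt;//No | between the two cases: inside a character class it would match a literal pipe&lt;/span&gt;
                patternBuilder.AppendFormat(&amp;quot;&lt;span style="color:#8b0000;"&gt;[{0}{1}][\\W]*&lt;/span&gt;&amp;quot;, characters[j].ToString().ToLower(), characters[j].ToString().ToUpper());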
            }

            &lt;span style="color:#008000;"&gt;//End the pattern&lt;/span&gt;
            patternBuilder.Append(&amp;quot;&lt;span style="color:#8b0000;"&gt;)&lt;/span&gt;&amp;quot;);

            &lt;span style="color:#008000;"&gt;//Add the new pattern to our list.&lt;/span&gt;
            filter.Patterns.Add(&lt;span style="color:#0000ff;"&gt;new&lt;/span&gt; Regex(patternBuilder.ToString()));
        }
        &lt;span style="color:#0000ff;"&gt;return&lt;/span&gt; filter;
    }

    &lt;span style="color:#808080;"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// The function which returns the manipulated string&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// &amp;lt;param name=&amp;quot;input&amp;quot;&amp;gt;&amp;lt;/param&amp;gt;&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// &amp;lt;param name=&amp;quot;options&amp;quot;&amp;gt;&amp;lt;/param&amp;gt;&lt;/span&gt;
    &lt;span style="color:#808080;"&gt;/// &amp;lt;returns&amp;gt;&amp;lt;/returns&amp;gt;&lt;/span&gt;
    &lt;span style="color:#0000ff;"&gt;public&lt;/span&gt; &lt;span style="color:#0000ff;"&gt;string&lt;/span&gt; GetCleanString(&lt;span style="color:#0000ff;"&gt;string&lt;/span&gt; input, CleanUpOptions options)
    {
        &lt;span style="color:#0000ff;"&gt;if&lt;/span&gt; (options == CleanUpOptions.BlankBadText)
        {
            &lt;span style="color:#0000ff;"&gt;for&lt;/span&gt; (&lt;span style="color:#0000ff;"&gt;int&lt;/span&gt; i = 0; i &amp;lt; patterns.Count; i++)
            {
                &lt;span style="color:#008000;"&gt;//In this instance we want to return an empty string if we find any bad word&lt;/span&gt;
                &lt;span style="color:#0000ff;"&gt;if&lt;/span&gt; (patterns[i].Match(input).Success)
                    &lt;span style="color:#0000ff;"&gt;return&lt;/span&gt; String.Empty;
            }
        }
        &lt;span style="color:#0000ff;"&gt;else&lt;/span&gt; &lt;span style="color:#0000ff;"&gt;if&lt;/span&gt; (options == CleanUpOptions.ReplaceWholeText)
        {
            &lt;span style="color:#0000ff;"&gt;for&lt;/span&gt; (&lt;span style="color:#0000ff;"&gt;int&lt;/span&gt; i = 0; i &amp;lt; patterns.Count; i++)
            {
                &lt;span style="color:#008000;"&gt;//In this instance we want to return a specified statement if we find any bad word&lt;/span&gt;
                &lt;span style="color:#0000ff;"&gt;if&lt;/span&gt; (patterns[i].Match(input).Success)
                    &lt;span style="color:#0000ff;"&gt;return&lt;/span&gt; &amp;quot;&lt;span style="color:#8b0000;"&gt;The text contains unsuitable content&lt;/span&gt;&amp;quot;;
            }
        }
        &lt;span style="color:#0000ff;"&gt;else&lt;/span&gt;
        {
            &lt;span style="color:#0000ff;"&gt;for&lt;/span&gt; (&lt;span style="color:#0000ff;"&gt;int&lt;/span&gt; i = 0; i &amp;lt; patterns.Count; i++)
            {
                &lt;span style="color:#008000;"&gt;//In this instance we actually replace each instance of any bad word with a specified string.&lt;/span&gt;
                input = patterns[i].Replace(input, &amp;quot;&lt;span style="color:#8b0000;"&gt;**Unsuitable Word**&lt;/span&gt;&amp;quot;);
            }
        }

        &lt;span style="color:#008000;"&gt;//return the manipulated string&lt;/span&gt;
        &lt;span style="color:#0000ff;"&gt;return&lt;/span&gt; input;
    }
}&lt;/pre&gt;

&lt;/div&gt;
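
&lt;p&gt;To make the three clean-up options easier to compare, here is a small sketch which applies each of them to the same input.&amp;#160; It is a simplified stand-in, not the class above: CleanUpSketch and its Clean method are hypothetical names, and it works on one hard-coded pattern so that it runs without an HttpContext or an XML file.&lt;/p&gt;

&lt;div style="font-size:12px;margin:10px;position:relative;width:95%;border-bottom:#cccccc 1px solid;"&gt;
  &lt;pre style="overflow-y:hidden;overflow-x:auto;padding-bottom:30px;"&gt;using System;
using System.Text.RegularExpressions;

public static class CleanUpSketch
{
    // Mirrors the CleanUpOptions enum of the class above
    public enum CleanUpOptions { ReplaceEachWord, BlankBadText, ReplaceWholeText }

    // Hypothetical single-pattern version of GetCleanString, for illustration only
    public static string Clean(Regex pattern, string input, CleanUpOptions options)
    {
        switch (options)
        {
            case CleanUpOptions.BlankBadText:
                return pattern.IsMatch(input) ? String.Empty : input;
            case CleanUpOptions.ReplaceWholeText:
                return pattern.IsMatch(input) ? &amp;quot;The text contains unsuitable content&amp;quot; : input;
            default:
                return pattern.Replace(input, &amp;quot;**Unsuitable Word**&amp;quot;);
        }
    }

    public static void Main()
    {
        // The pattern which would be generated for the word &amp;quot;bad&amp;quot;
        Regex bad = new Regex(&amp;quot;([bB][\\W]*[aA][\\W]*[dD][\\W]*)&amp;quot;);
        string input = &amp;quot;That is B*A*D news&amp;quot;;

        // Prints: That is **Unsuitable Word**news
        Console.WriteLine(Clean(bad, input, CleanUpOptions.ReplaceEachWord));
        // Prints: The text contains unsuitable content
        Console.WriteLine(Clean(bad, input, CleanUpOptions.ReplaceWholeText));
        // Prints: 0
        Console.WriteLine(Clean(bad, input, CleanUpOptions.BlankBadText).Length);
    }
}&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Two things are worth noticing in the first result.&amp;#160; The space after the word has gone, because the final [\W]* in the pattern also consumes whatever non-word characters follow the last letter.&amp;#160; The pattern also has no word boundaries, so the same expression would match inside a longer, harmless word such as badge.&lt;/p&gt;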

&lt;p&gt;&amp;#160;&lt;/p&gt;

&lt;p&gt;The XML file which I have used is below.&amp;#160; Dead simple, but does the job.&lt;/p&gt;

&lt;div style="font-size:12px;margin:10px;position:relative;width:95%;border-bottom:#cccccc 1px solid;"&gt;

  &lt;pre style="overflow-y:hidden;overflow-x:auto;padding-bottom:30px;"&gt;&lt;span style="color:#0000ff;"&gt;&amp;lt;?&lt;/span&gt;xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;utf-8&amp;quot; &lt;span style="color:#0000ff;"&gt;?&amp;gt;&lt;/span&gt;
&lt;span style="color:#0000ff;"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color:#800000;"&gt;words&lt;/span&gt;&lt;span style="color:#0000ff;"&gt;&amp;gt;&lt;/span&gt;
  &lt;span style="color:#0000ff;"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color:#800000;"&gt;word&lt;/span&gt;&lt;span style="color:#0000ff;"&gt;&amp;gt;&lt;/span&gt;bad word&lt;span style="color:#0000ff;"&gt;&amp;lt;/&lt;/span&gt;&lt;span style="color:#800000;"&gt;word&lt;/span&gt;&lt;span style="color:#0000ff;"&gt;&amp;gt;&lt;/span&gt;
  &lt;span style="color:#0000ff;"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color:#800000;"&gt;word&lt;/span&gt;&lt;span style="color:#0000ff;"&gt;&amp;gt;&lt;/span&gt;ugly word&lt;span style="color:#0000ff;"&gt;&amp;lt;/&lt;/span&gt;&lt;span style="color:#800000;"&gt;word&lt;/span&gt;&lt;span style="color:#0000ff;"&gt;&amp;gt;&lt;/span&gt;
  &lt;span style="color:#0000ff;"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color:#800000;"&gt;word&lt;/span&gt;&lt;span style="color:#0000ff;"&gt;&amp;gt;&lt;/span&gt;bla bla bla&lt;span style="color:#0000ff;"&gt;&amp;lt;/&lt;/span&gt;&lt;span style="color:#800000;"&gt;word&lt;/span&gt;&lt;span style="color:#0000ff;"&gt;&amp;gt;&lt;/span&gt;
&lt;span style="color:#0000ff;"&gt;&amp;lt;/&lt;/span&gt;&lt;span style="color:#800000;"&gt;words&lt;/span&gt;&lt;span style="color:#0000ff;"&gt;&amp;gt;&lt;/span&gt;&lt;/pre&gt;

&lt;/div&gt;
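
&lt;p&gt;If you want to check what the filter will read from a file like this, the following sketch pulls the word entries out of an XML string.&amp;#160; WordListSketch and ReadWords are hypothetical names, and the XML is passed in as a string here (rather than loaded from listofwords.xml) only so that the example runs on its own.&lt;/p&gt;

&lt;div style="font-size:12px;margin:10px;position:relative;width:95%;border-bottom:#cccccc 1px solid;"&gt;
  &lt;pre style="overflow-y:hidden;overflow-x:auto;padding-bottom:30px;"&gt;using System;
using System.Collections.Generic;
using System.Xml;

public static class WordListSketch
{
    // Hypothetical helper: returns the text of every word element
    public static List&amp;lt;string&amp;gt; ReadWords(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        List&amp;lt;string&amp;gt; words = new List&amp;lt;string&amp;gt;();
        foreach (XmlNode node in doc.GetElementsByTagName(&amp;quot;word&amp;quot;))
        {
            // Trim stray whitespace around the entry
            words.Add(node.InnerText.Trim());
        }
        return words;
    }

    public static void Main()
    {
        string xml = &amp;quot;&amp;lt;words&amp;gt;&amp;lt;word&amp;gt;bad word&amp;lt;/word&amp;gt;&amp;lt;word&amp;gt;ugly word&amp;lt;/word&amp;gt;&amp;lt;/words&amp;gt;&amp;quot;;
        foreach (string word in ReadWords(xml))
        {
            Console.WriteLine(word); // prints bad word, then ugly word
        }
    }
}&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;One thing to keep in mind with entries made of several words: the pattern builder treats the space like any other character, so the input has to contain a space at that point.&amp;#160; An entry such as bad word would therefore not catch badword or bad*word, and it is probably safer to list such entries without the space.&lt;/p&gt;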

&lt;p&gt;&amp;#160;&lt;/p&gt;

&lt;p&gt;Cheers,&lt;/p&gt;

&lt;p&gt;&amp;#160;&lt;/p&gt;

&lt;p&gt;Andrew :-)&lt;/p&gt;</content><author><name>REA_ANDREW</name><uri>http://blogs.windowsclient.net/members/REA_5F00_ANDREW.aspx</uri></author></entry></feed>