profanity filter for umbraco

Lee Kelleher 4026 posts 15837 karma points MVP 13x admin c-trib

Mar 04, 2010 @ 15:41

0

Profanity Filter for Umbraco

A long time ago (Sept 2008) I posted on the old forum about a Profanity Filter for Umbraco. I ended up writing one. It was quite quick-n-dirty (sic) but it did the job.

Been thinking that I should package up the code and release it on Our Umbraco - which I'll do soon.

So, my question is... at present the bad words are all hard-coded (in English). Obviously this needs to be i18n/L10n-ized and customisable. Where should this be done? Via a .config file, or a custom section (appTree) in the back-office admin?

Also should I be considering that it might be used on multi-lingual sites? (i.e. would applying the English profanity filter on German content cause any undesired effects?)

Any suggestions?

Thanks, Lee.

dandrayne 1138 posts 2262 karma points

Mar 04, 2010 @ 15:58

1

F***ing good idea, just be sure not to make the clbuttic mistake when making it ;-)

As for configuration, I'd be happy using a config file for this kind of thing.
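For anyone who hasn't met it, the "clbuttic" mistake comes from replacing bad words as plain substrings, so innocent words that merely contain one get mangled. A minimal Python sketch of the difference follows; the function names `naive_replace` and `censor` are made up for illustration, and this is not the package's actual code, which the thread doesn't show.

```python
import re

def naive_replace(text, bad, replacement):
    """Plain substring replacement: the approach that causes the mistake."""
    return text.replace(bad, replacement)

def censor(text, bad_words, mask="*"):
    """Mask whole-word, case-insensitive matches only."""
    # re.escape keeps each entry literal; \b anchors matches to word boundaries.
    alternatives = "|".join(re.escape(word) for word in bad_words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    # Replace each match with a mask of the same length.
    return pattern.sub(lambda m: mask * len(m.group(0)), text)

print(naive_replace("a classic mistake", "ass", "butt"))
print(censor("a classic mistake", ["ass"]))
```

The first call mangles "classic", while the second leaves it alone because the bad word only matches when it stands as a word of its own.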

Douglas Robar 3570 posts 4711 karma points MVP ∞ admin c-trib

Mar 04, 2010 @ 16:07
3
I'd go for a .config file. It could always be edited via the config file package.

And, unless there's a big performance penalty for lots of words in the filter, I think a person could just have one file with words for all langs in it. One word/phrase per line, case-insensitive:
```
blah
yada
howdy
good oh
good-o
```
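As a rough idea of how a one-entry-per-line list like that could be consumed, here is a Python sketch that compiles the whole list into a single case-insensitive pattern. `build_filter` and `contains_profanity` are hypothetical names, and the choice to match only whole words and phrases is an assumption, not something the thread specifies.

```python
import re

WORD_LIST = """\
blah
yada
howdy
good oh
good-o
"""

def build_filter(word_list):
    """Compile one case-insensitive pattern from a one-entry-per-line list."""
    entries = [line.strip() for line in word_list.splitlines() if line.strip()]
    if not entries:
        # An empty list must match nothing; (?!) is a pattern that always fails.
        return re.compile(r"(?!)")
    # Longest entries first, so a phrase wins over any shorter entry inside it.
    entries.sort(key=len, reverse=True)
    body = "|".join(re.escape(entry) for entry in entries)
    # The lookarounds require that no word character touches either end.
    return re.compile(r"(?<!\w)(?:" + body + r")(?!\w)", re.IGNORECASE)

def contains_profanity(text, pattern):
    """True when any listed word or phrase appears in the text."""
    return pattern.search(text) is not None
```

Compiling once and reusing the pattern keeps the per-request cost to a single scan of the text, which should help with the performance worry for long lists.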
If you really wanted to be thorough you could have a config file that allows for the culture as well, and you'd use the appropriate culture based on the page being served. It could also have a section holding the words that apply to all cultures, to save duplication if there is any. Something like:
```
<Words>
    blah
    yada
</Words>
<Words culture="EN-US">
    howdy
</Words>
<Words culture="EN-GB">
    good oh
    good-o
</Words>
```
Or for those who just have to have full XML:
```
<ProfanityFilter>
    <Words>
        <Word>blah</Word>
        <Word>yada</Word>
    </Words>

    <Words culture="EN-US">
        <Word>howdy</Word>
    </Words>

    <Words culture="EN-GB">
        <Word>good oh</Word>
        <Word>good-o</Word>
    </Words>
</ProfanityFilter>
```
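To illustrate the lookup, here is a Python sketch that reads the full-XML layout and returns the shared words plus those for one culture. The function name `words_for_culture` and the case-insensitive comparison of culture names are assumptions made for the example; the real package would presumably do this in .NET.

```python
import xml.etree.ElementTree as ET

# GB is the ISO 3166 region code for the United Kingdom.
CONFIG = """\
<ProfanityFilter>
    <Words>
        <Word>blah</Word>
        <Word>yada</Word>
    </Words>
    <Words culture="EN-US">
        <Word>howdy</Word>
    </Words>
    <Words culture="EN-GB">
        <Word>good oh</Word>
        <Word>good-o</Word>
    </Words>
</ProfanityFilter>
"""

def words_for_culture(config_xml, culture):
    """Return the shared words plus those listed for the given culture."""
    root = ET.fromstring(config_xml)
    selected = []
    for group in root.findall("Words"):
        group_culture = group.get("culture")
        # A <Words> element with no culture attribute applies to every page.
        if group_culture is None or group_culture.lower() == culture.lower():
            selected.extend(w.text.strip() for w in group.findall("Word") if w.text)
    return selected
```

A page served in a culture with no section of its own (say de-DE) would still get the shared words, which is the duplication-saving behaviour described above.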
cheers,
doug.
jaygreasley 416 posts 403 karma points

Mar 04, 2010 @ 20:32

1

I love that even Doug's swear words are no worse than Howdy ;-)

Chriztian Steinmeier 2800 posts 8791 karma points MVP 8x admin c-trib

Mar 04, 2010 @ 22:47
2
Just because I'm a pedant - if you go the XML route (which I'd also vote for) please, please, please use the xml:lang attribute for designating the culture, as in:
```
<ProfanityFilter>
    <Words>
        <Word>blah</Word>
        <Word>yada</Word>
    </Words>

    <Words xml:lang="en-US">
        <Word>howdy</Word>
    </Words>

    <Words xml:lang="en-GB">
        <Word>good oh</Word>
        <Word>good-o</Word>
    </Words>
</ProfanityFilter>
```
XPath has a companion function lang() which selects nodes based on their language, e.g., to select all the English (whether UK- or US-variant) Word elements:
```
<xsl:apply-templates select="/ProfanityFilter/Words/Word[lang('en')]" />
```
or to grab only the US-variant:
```
<xsl:apply-templates select="/ProfanityFilter/Words/Word[lang('en-US')]" />
```
Anyway, you get the idea...
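If the config ends up being read from code rather than XSLT, the same matching rule is easy to mimic. The Python sketch below uses only the standard library, whose ElementTree has no `lang()` function, so the inheritance of `xml:lang` from ancestors is walked by hand. `lang_matches` and `words_in_lang` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

CONFIG = """\
<ProfanityFilter>
    <Words>
        <Word>blah</Word>
    </Words>
    <Words xml:lang="en-US">
        <Word>howdy</Word>
    </Words>
    <Words xml:lang="en-GB">
        <Word>good oh</Word>
    </Words>
</ProfanityFilter>
"""

def lang_matches(actual, wanted):
    """Mirror XPath lang(): exact match or a sub-language, ignoring case."""
    if actual is None:
        return False
    actual, wanted = actual.lower(), wanted.lower()
    return actual == wanted or actual.startswith(wanted + "-")

def words_in_lang(config_xml, wanted):
    """Collect <Word> text whose inherited xml:lang matches `wanted`."""
    found = []

    def walk(element, inherited):
        # xml:lang applies to the element and everything beneath it.
        current = element.get(XML_LANG, inherited)
        if element.tag == "Word" and element.text and lang_matches(current, wanted):
            found.append(element.text.strip())
        for child in element:
            walk(child, current)

    walk(ET.fromstring(config_xml), None)
    return found
```

One thing worth knowing: like XPath's lang(), this never matches the Word elements that have no xml:lang in scope, so the shared list would have to be selected separately.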

/Chriztian
Lee Kelleher 4026 posts 15837 karma points MVP 13x admin c-trib

Mar 05, 2010 @ 12:24
0
Thanks for the responses guys... I've gone with Doug's suggestion of the XML config (with Chriztian's xml:lang attribute suggestion, although I doubt the bad-words will ever be accessible via XSLT).
```
<ProfanityFilter>
    <words xml:lang="en-US">
        <![CDATA[damn
dangnamit
gosh
poohsticks
]]>
    </words>
</ProfanityFilter>
```
I haven't gone for the "default" set of stop-words... (maybe in a future version?) The words are newline/tab delimited - I find it easier to read (and parse in code) ... otherwise there's too much XML (IMHO).
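For readers wondering what parsing that format involves, here is a Python sketch, not the package's actual code (which isn't shown in this thread). It relies on the fact that a CDATA section reaches the parser as ordinary element text; `load_words` is a made-up name.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

CONFIG = """\
<ProfanityFilter>
    <words xml:lang="en-US">
        <![CDATA[damn
dangnamit
gosh
poohsticks
]]>
    </words>
</ProfanityFilter>
"""

def load_words(config_xml):
    """Map each xml:lang value to its newline/tab-delimited word list."""
    result = {}
    for group in ET.fromstring(config_xml).findall("words"):
        lang = group.get(XML_LANG, "")
        # Split on runs of newlines or tabs, then trim the indentation.
        tokens = re.split(r"[\r\n\t]+", group.text or "")
        result.setdefault(lang, []).extend(t.strip() for t in tokens if t.strip())
    return result

print(load_words(CONFIG))
```

Trimming each token means the indentation around the CDATA block doesn't leak into the word list.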

Next question.... what default words should I release it with?

I have a long list of en-GB bad words (which I won't publish here - too rude!) ... anyone know of a good resource for bad-words in other languages? i.e. Dutch, German, French, etc.

Thanks, Lee.
Douglas Robar 3570 posts 4711 karma points MVP ∞ admin c-trib

Mar 05, 2010 @ 12:58

1

Using the list in DansGuardian seems to be the basic starting point for most. Here's a more detailed description and list (English)... http://stackoverflow.com/questions/273516/how-do-you-implement-a-good-profanity-filter

cheers,
doug.

Lee Kelleher 4026 posts 15837 karma points MVP 13x admin c-trib

Mar 05, 2010 @ 13:52

0

Thanks Doug.

I ended up releasing with just the en-GB version ... otherwise I wouldn't have got the package out for the weekend.

I'll look at adding extra language profanities for next version. (Hopefully that will be soon... i.e. "Release early, release often").

Cheers, Lee.
