
  • Lee Kelleher 4026 posts 15837 karma points MVP 13x admin c-trib
    Mar 04, 2010 @ 15:41
    Lee Kelleher
    0

    Profanity Filter for Umbraco

    A long time ago (Sept 2008) I posted on the old forum about a Profanity Filter for Umbraco.  I ended up writing one.  It was quite quick-n-dirty (sic) but it did the job.

    Been thinking that I should package up the code and release it on Our Umbraco - which I'll do soon.

    So, my question is... at present the bad words are all hard-coded (in English); obviously this needs to be i18n/L10n-ized and customisable. Where should this be done? Via a .config file, or a custom section (appTree) in the back-office admin?

    Also should I be considering that it might be used on multi-lingual sites?  (i.e. would applying the English profanity filter on German content cause any undesired effects?)

    Any suggestions?

    Thanks, Lee.

  • dandrayne 1138 posts 2262 karma points
    Mar 04, 2010 @ 15:58
    dandrayne
    1

    F***ing good idea, just be sure not to make the clbuttic mistake when making it ;-)

    As for configuration, I'd be happy using a config file for this kind of thing.
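
The "clbuttic" mistake comes from replacing a bad word wherever it appears as a substring, including inside harmless longer words. A minimal sketch of whole-word matching follows, assuming a plain in-memory word list; the function names `build_filter` and `censor` are hypothetical and not part of any released package.

```python
import re

def build_filter(bad_words):
    """Compile one case-insensitive pattern that matches whole words only."""
    # Longest entries first, so multi-word phrases win over shorter prefixes.
    ordered = sorted((w for w in bad_words if w), key=len, reverse=True)
    if not ordered:
        return re.compile(r"(?!)")  # empty list: a pattern that never matches
    alternation = "|".join(re.escape(w) for w in ordered)
    # The lookarounds reject matches embedded in a longer word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)

def censor(text, pattern, mask="*"):
    """Replace every matched word with mask characters of the same length."""
    return pattern.sub(lambda m: mask * len(m.group(0)), text)

pattern = build_filter(["ass", "good oh"])
print(censor("A classic ass.", pattern))  # prints "A classic ***."
```

Whole-word matching still misses deliberate obfuscation (extra spaces, digits for letters), so it trades recall for fewer false positives.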

  • Douglas Robar 3570 posts 4711 karma points MVP ∞ admin c-trib
    Mar 04, 2010 @ 16:07
    Douglas Robar
    3

    I'd go for a .config file. It could always be edited via the config file package.

    And, unless there's a big performance penalty for lots of words in the filter, I think a person could just have one file with words for all langs in it. One word/phrase per line, case-insensitive:

    blah
    yada
    howdy
    good oh
    good-o

    If you really wanted to be thorough you could have a config file that allows for the culture as well, and you'd use the appropriate culture based on the page being served. Plus a section with words for all cultures, to save duplication if there is any. Something like:

    <Words>
    blah
    yada
    </Words>
    <Words culture="EN-US">
    howdy
    </Words>
    <Words culture="EN-UK">
    good oh
    good-o
    </Words>

    Or for those who just have to have full XML:

    <ProfanityFilter>
    <Words>
    <Word>blah</Word>
    <Word>yada</Word>
    </Words>

    <Words culture="EN-US">
    <Word>howdy</Word>
    </Words>

    <Words culture="EN-UK">
    <Word>good oh</Word>
    <Word>good-o</Word>
    </Words>
    </ProfanityFilter>
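
As a rough sketch of how the full-XML layout above might be consumed: combine the culture-neutral `<Words>` group with the group for the culture of the page being served. This assumes the `culture` attribute is compared case-insensitively; the helper name `words_for_culture` is hypothetical.

```python
import xml.etree.ElementTree as ET

CONFIG = """
<ProfanityFilter>
  <Words>
    <Word>blah</Word>
    <Word>yada</Word>
  </Words>
  <Words culture="EN-US">
    <Word>howdy</Word>
  </Words>
  <Words culture="EN-UK">
    <Word>good oh</Word>
    <Word>good-o</Word>
  </Words>
</ProfanityFilter>
"""

def words_for_culture(xml_text, culture):
    """Return the lower-cased words that apply to the given culture."""
    root = ET.fromstring(xml_text)
    words = set()
    for group in root.findall("Words"):
        group_culture = group.get("culture")
        # A group without a culture attribute applies to every culture.
        if group_culture is None or group_culture.lower() == culture.lower():
            for word in group.findall("Word"):
                if word.text and word.text.strip():
                    words.add(word.text.strip().lower())
    return words

print(sorted(words_for_culture(CONFIG, "en-us")))
```

Storing the words lower-cased keeps the list case-insensitive, as suggested for the one-word-per-line variant.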

     

    cheers,
    doug.

  • jaygreasley 416 posts 403 karma points
    Mar 04, 2010 @ 20:32
    jaygreasley
    1

    I love that even Doug's swear words are no worse than Howdy ;-)

  • Chriztian Steinmeier 2800 posts 8791 karma points MVP 8x admin c-trib
    Mar 04, 2010 @ 22:47
    Chriztian Steinmeier
    2

    Just because I'm a pedant - if you go the XML route (which I'd also vote for) please, please, please use the xml:lang attribute for designating the culture, as in:

    <ProfanityFilter>
        <Words>
            <Word>blah</Word>
            <Word>yada</Word>
        </Words>
    
        <Words xml:lang="en-US">
            <Word>howdy</Word>
        </Words>
    
        <Words xml:lang="en-GB">
            <Word>good oh</Word>
            <Word>good-o</Word>
        </Words>
    </ProfanityFilter>

    XPath has a companion function lang() which selects nodes based on their language, e.g., to select all the English (whether UK- or US-variant) Word elements:

    <xsl:apply-templates select="/ProfanityFilter/Words/Word[lang('en')]" />

    or to grab only the US-variant:

    <xsl:apply-templates select="/ProfanityFilter/Words/Word[lang('en-US')]" />

    Anyway, you get the idea... 
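
For code that reads the file without an XPath engine, the matching rule behind lang() can be approximated by hand: the language in effect is the nearest xml:lang on the element or an ancestor, and it matches if it equals the requested tag or starts with it followed by a hyphen, ignoring case. Below is a minimal sketch using Python's standard ElementTree, which has no lang() support of its own. The helper names are hypothetical, the sample uses en-GB (the registered tag for British English), and unlike the XPath above it collects Word elements at any depth.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

CONFIG = """
<ProfanityFilter>
  <Words><Word>blah</Word><Word>yada</Word></Words>
  <Words xml:lang="en-US"><Word>howdy</Word></Words>
  <Words xml:lang="en-GB"><Word>good oh</Word><Word>good-o</Word></Words>
</ProfanityFilter>
"""

def lang_matches(effective, wanted):
    """Approximate XPath lang(): exact match or prefix match before a hyphen."""
    if not effective:
        return False
    effective, wanted = effective.lower(), wanted.lower()
    return effective == wanted or effective.startswith(wanted + "-")

def words_in_language(xml_text, wanted):
    """Collect Word texts whose inherited xml:lang matches the wanted tag."""
    found = []

    def walk(element, inherited):
        # xml:lang on the element overrides whatever the ancestors declared.
        effective = element.get(XML_LANG, inherited)
        if element.tag == "Word" and lang_matches(effective, wanted):
            found.append((element.text or "").strip())
        for child in element:
            walk(child, effective)

    walk(ET.fromstring(xml_text), None)
    return found

print(words_in_language(CONFIG, "en"))     # ['howdy', 'good oh', 'good-o']
print(words_in_language(CONFIG, "en-US"))  # ['howdy']
```

Words in the group without xml:lang are not matched here, which mirrors lang() returning false when no language is declared.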

    /Chriztian

  • Lee Kelleher 4026 posts 15837 karma points MVP 13x admin c-trib
    Mar 05, 2010 @ 12:24
    Lee Kelleher
    0

    Thanks for the responses guys... I've gone with Doug's suggestion of the XML config (with Chriztian's xml:lang attribute suggestion, although I doubt the bad words will ever be accessible via XSLT).

    <ProfanityFilter>
        <words xml:lang="en-US">
            <![CDATA[damn
    dangnamit
    gosh
    poohsticks
    ]]>
        </words>
    </ProfanityFilter>

    I haven't gone for the "default" set of stop-words... (maybe in a future version?)  The words are newline/tab delimited - I find it easier to read (and parse in code) ... otherwise there's too much XML (IMHO).
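
To illustrate the "easier to parse" point, here is a rough sketch of reading that CDATA layout: take the text of each words element, split on newlines and tabs, and drop blanks. It is written in Python purely for illustration and is not the package's actual code; `load_words` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

CONFIG = """<ProfanityFilter>
    <words xml:lang="en-US">
        <![CDATA[damn
dangnamit
gosh
poohsticks
]]>
    </words>
</ProfanityFilter>"""

def load_words(xml_text):
    """Map each xml:lang value to its list of newline/tab-delimited words."""
    root = ET.fromstring(xml_text)
    by_lang = {}
    for group in root.findall("words"):
        lang = group.get(XML_LANG, "")
        # The parser merges the CDATA section into the element's text.
        tokens = re.split(r"[\r\n\t]+", group.text or "")
        words = [t.strip() for t in tokens if t.strip()]
        by_lang.setdefault(lang, []).extend(words)
    return by_lang

print(load_words(CONFIG))
```

Splitting only on newlines and tabs, not spaces, leaves room for multi-word phrases on a single line.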

     

    Next question.... what default words should I release it with?

    I have a long list of en-GB bad words (which I won't publish here - too rude!) ... anyone know of a good resource for bad words in other languages? i.e. Dutch, German, French, etc.

    Thanks, Lee.

  • Douglas Robar 3570 posts 4711 karma points MVP ∞ admin c-trib
    Mar 05, 2010 @ 12:58
    Douglas Robar
    1

    Using the list in DansGuardian seems to be the basic starting point for most. Here's a more detailed description and list (English)... http://stackoverflow.com/questions/273516/how-do-you-implement-a-good-profanity-filter

    cheers,
    doug.

  • Lee Kelleher 4026 posts 15837 karma points MVP 13x admin c-trib
    Mar 05, 2010 @ 13:52
    Lee Kelleher
    0

    Thanks Doug.

    I ended up releasing with just the en-GB version ... otherwise I wouldn't have got the package out for the weekend.

    I'll look at adding extra language profanities for next version. (Hopefully that will be soon... i.e. "Release early, release often").

    Cheers, Lee.
