Designing a Filter to Remove Profanity Strings

Duffield

Hey Guys,

I just want to see if anyone has any input. I'm working with an Arduino GSM shield for a public, family-friendly festival, where the audience can send SMS messages to control some visualizations and sound in the space. The SMS messages will be projected on display for the public, so I'm trying to design a filter to remove swearing / profanity. I know it won't be perfect, but I'm trying to build a basic dictionary of profanity and terms that may be sexually and / or racially offensive, and replace them with another string (e.g., @#$%).

I've been working with [regexp] and admittedly haven't fully wrapped my head around it, but I'm also wondering if there's another way to do string matching for a large number of words. For example, I can think of like five different ways to trick the system just to say "ass" (hope no one's offended ;) ) (e.g., ass, a s s, a_s_s, etc.), and this is where [regexp] is great! However, although I can filter out many variations of the word, a word like 'bass' also triggers as a profane word (code pasted below). I can't figure out how to avoid a match when the term is part of another word or sits at the end of a word. Additionally, one giant regexp covering every word that might be offensive seems like a pretty large object, and I don't know if efficiency will be an issue.

That said, I'd almost prefer something like [coll] or a text file with [zl.filter], as I could just "type out my profanity dictionary" easily. The obvious problem with [coll] is that I can't get it to match things like "a s s" or any special characters.

1. Any advice or alternate objects / methods to look into?
2. If not, could someone point me in the right direction for handling words that merely end in a profane word? (e.g., 'bass' registers as 'ass' in [regexp])

Max Patch

Swearing Filter Code:

Thanks in advance!

Jan M

Another way would be to leave that to a third-party service. There are some profanity filters out there on the web that can be used via API calls (REST/XML). I found this one: http://www.purgomalum.com It should be straightforward to call the API with [maxurl].
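For anyone prototyping this outside of Max, here is a rough Python sketch of what such a call might look like. The endpoint path, the `text` query parameter, and the `result` field in the JSON reply are assumptions that should be checked against the service's own documentation, and the function names are made up for illustration. It returns None on any network problem so a local filter can take over.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the PurgoMalum documentation before use.
BASE_URL = "https://www.purgomalum.com/service/json"


def build_request_url(message):
    """Build the GET URL for one SMS message (parameter name is an assumption)."""
    return BASE_URL + "?" + urlencode({"text": message})


def filter_remote(message, timeout=3.0):
    """Return the filtered text, or None if the request or parsing fails."""
    try:
        with urlopen(build_request_url(message), timeout=timeout) as resp:
            reply = json.loads(resp.read().decode("utf-8"))
        return reply.get("result")
    except (OSError, ValueError, AttributeError):
        # Covers no network, timeouts, and replies that are not the expected JSON.
        return None
```

The short timeout matters for a live installation: a stalled request should not hold up the projection.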

hz37

Max Patch

There's always \b (word boundary), which you can use in a [regexp] to distinguish between "ass" and "bass", as you can see in the example patch. It's a tricky business, though, because word boundaries will make you miss aggregate words. Jan's advice is awesome, unless you can't guarantee a flawless internet connection. Good luck!
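To make the trade-off concrete in text form, here is a small Python sketch of the same word-boundary idea (not the contents of the patch). The names `MASK`, `PATTERN`, and `mask` are hypothetical, and the set of separator characters allowed between letters is an assumption.

```python
import re

MASK = "@#$%"

# \b anchors the match at word edges, so "bass" no longer triggers.
# [\s_]* allows spaces or underscores between letters ("a s s", "a_s_s").
PATTERN = re.compile(r"\ba[\s_]*s[\s_]*s\b", re.IGNORECASE)


def mask(text):
    """Replace every whole-word match with the mask string."""
    return PATTERN.sub(MASK, text)


print(mask("nice bass line"))   # left alone: no boundary before the "a"
print(mask("what an a_s_s"))    # masked
print(mask("dumbass"))          # left alone: the aggregate-word miss
```

The last call shows the downside mentioned above: a word boundary protects "bass" but also lets aggregate words through, so those need their own entries.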

Duffield

Thanks a lot! My internet connection is an unknown, but that word boundary could be the key! Also, the PurgoMalum site has a list of profane words, which makes it easier for me to compile my own.

I'll play with this stuff and if I think it's good enough, I'll post my results!

Duffield

So here's what I cooked up, because I've learned not to always count on proper Wi-Fi for public events ;)

Max Patch

It's not perfect, and I tried to include as many words / aggregates as I could fathom. I ended up chaining together a bunch of [regexp] filters. I'm sure there's a better / more efficient / elegant way of doing this, but for anyone who needs a vanilla Max profanity remover, here it is! Feel free to use it (although pasting my name in the patch is always courteous ;))! I made a subpatch because I hate keeping track of abstractions. I tested it as best I could, but if there's an issue let me know, although I don't really plan on maintaining this.
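For readers who want the same chained-filter idea outside of Max, here is a hedged Python sketch. It is not a transcription of the patch: it generates one pattern per entry from a plain word list, so the dictionary can still be typed out as simple text. All names (`SEPARATORS`, `word_to_pattern`, `build_filters`, `clean`) and the separator character set are assumptions for illustration.

```python
import re

MASK = "@#$%"

# Characters people slip between letters to dodge a filter (assumed set).
SEPARATORS = r"[\s_.\-*]*"


def word_to_pattern(word, whole_word=True):
    """Turn a plain dictionary entry into a separator-tolerant regex."""
    body = SEPARATORS.join(re.escape(ch) for ch in word)
    if whole_word:
        # Word boundaries keep short entries from firing inside longer words.
        body = r"\b" + body + r"\b"
    return re.compile(body, re.IGNORECASE)


def build_filters(whole_words, embedded_words=()):
    """One compiled pattern per entry; aggregates are matched anywhere."""
    filters = [word_to_pattern(w) for w in whole_words]
    filters += [word_to_pattern(w, whole_word=False) for w in embedded_words]
    return filters


def clean(text, filters):
    """Run the text through each filter in turn, like a chain of [regexp] objects."""
    for pattern in filters:
        text = pattern.sub(MASK, text)
    return text


filters = build_filters(["ass"], embedded_words=["dumbass"])
print(clean("the bass player is a d.u.m.b.a.s.s", filters))
```

Short entries go in the whole-word list and known aggregates in the embedded list. Allowing whitespace as a separator can occasionally match across neighbouring words, so the list is worth testing against ordinary sentences. For SMS-length messages, looping over even a few hundred compiled patterns should be cheap.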

hz37

Excellent! The regexps read like poetry! Good luck with the installation.

Wetterberg

Snippet stolen! Good work!