Designing a Filter to Remove Profanity Strings
Hey Guys,
I just want to see if anyone has any input. I'm working with an Arduino GSM shield for a public, family-friendly festival where the audience can send SMS messages in the space to control some visualizations and sound. The SMS messages will be projected and displayed for the public, so I'm trying to design a filter to remove swearing and profanity. I know it won't be perfect, but I'm trying to build a basic dictionary of profanity and terms that may be sexually or racially offensive, and replace them with another string (e.g., @#$%).
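As a rough illustration of the basic dictionary-and-replace idea, here is a minimal Python sketch (Python standing in for a Max patch). The word list and the `censor` helper are hypothetical placeholders; a real list would be much longer. Note that it matches substrings anywhere, so it will also hit words that merely contain a listed term.

```python
import re

# Hypothetical placeholder dictionary; swap in your own terms.
BLOCKLIST = ["darn", "heck"]
GRAWLIX = "@#$%"  # replacement string shown on the projection

def censor(message, words=BLOCKLIST):
    """Replace every occurrence of each listed word, ignoring case."""
    for word in words:
        # re.escape makes each dictionary entry match as literal text.
        message = re.sub(re.escape(word), GRAWLIX, message, flags=re.IGNORECASE)
    return message

print(censor("Oh HECK, that darn sound"))  # Oh @#$%, that @#$% sound
```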
I've been working with [regexp] and I admittedly haven't fully wrapped my head around it, but I'm also wondering if there's another way to do string matching for a large number of words. For example, I can think of at least five different ways to trick the system into saying "ass" (hope no one's offended ;) ), e.g., ass, a s s, a_s_s, and so on, and this is where [regexp] is great! However, although I can filter out many variations of the word, a word like "bass" also triggers as profane (code pasted below). I can't figure out how to avoid a match when the term is part of another word, such as at the end of a word. Also, one giant regexp covering every word that might be offensive seems like it would be a pretty large object, and I don't know whether efficiency will be an issue.
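One way to catch spaced-out spellings is to build the pattern from the word itself, allowing optional separators between letters. This Python sketch assumes whitespace, `_`, `.` and `-` are the separators worth catching (an assumption; extend the set as needed), and `obfuscation_pattern` is a hypothetical helper. It also reproduces the "bass" false positive:

```python
import re

# Separators someone might put between letters (assumed set; extend as needed).
SEP = r"[\s_.\-]*"

def obfuscation_pattern(word):
    """Match `word` even when its letters are split by separator characters."""
    body = SEP.join(re.escape(ch) for ch in word)
    return re.compile(body, re.IGNORECASE)

pat = obfuscation_pattern("ass")
for msg in ["ass", "a s s", "a_s_s", "A.S.S", "bass"]:
    print(msg, bool(pat.search(msg)))
# Every line prints True, including "bass": the false positive.
```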
That said, I'd almost prefer something like [coll] or a text file with [zl.filter], since I could just type out my profanity dictionary. The obvious problem with [coll] is that I can't get it to match variants like "a s s" or anything with special characters.
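If you want a typed-out dictionary but still want regex matching, one option is to keep the word list in a plain text file and compile it into a single alternation when it loads. A minimal Python sketch; the file format (one entry per line, `#` for comments) and the helper names are assumptions:

```python
import re

def load_dictionary(text):
    """Parse a plain-text word list: one entry per line, '#' starts a comment."""
    words = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            words.append(entry)
    return words

def compile_dictionary(words):
    """Join all entries into one case-insensitive alternation, longest first,
    so a longer term wins over a shorter one starting at the same position."""
    ordered = sorted(set(words), key=len, reverse=True)
    if not ordered:
        raise ValueError("empty dictionary would match everything")
    return re.compile("|".join(re.escape(w) for w in ordered), re.IGNORECASE)

# Hypothetical contents of a dictionary text file.
DICTIONARY = """
# mild placeholders
heck
darn
darnit
"""

pattern = compile_dictionary(load_dictionary(DICTIONARY))
print(pattern.sub("@#$%", "Darnit, heck!"))  # @#$%, @#$%!
```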
1. Any advice or alternate objects / methods to look into?
2. If not, could someone point me in the right direction for handling words that merely end in a profane word? (e.g., "bass" registers as "ass" in [regexp])
Swearing Filter Code:
Thanks in advance!
Another option is to leave it to a third-party service. There are profanity filters on the web that can be used through API calls (REST/XML), and it should be straightforward to call the API with [maxurl]. I found this one: http://www.purgomalum.com
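A sketch of what the request might look like, in Python for illustration. The `/service/json?text=...` URL layout and the `result` field reflect my understanding of PurgoMalum's documented API, so verify them against the site; in Max you would hand the same URL to [maxurl]. The message has to be URL-encoded first:

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint layout: /service/<format>?text=... (check the PurgoMalum docs).
BASE = "https://www.purgomalum.com/service/"

def purgomalum_url(message, fmt="json"):
    """Build the request URL with the message URL-encoded as `text`."""
    return BASE + fmt + "?" + urllib.parse.urlencode({"text": message})

def filter_remote(message, timeout=2.0):
    """Send `message` to the service and return the censored text.
    Network errors propagate as exceptions, so the caller can drop the
    message rather than project it unfiltered."""
    with urllib.request.urlopen(purgomalum_url(message), timeout=timeout) as resp:
        return json.load(resp)["result"]
```

The timeout matters at a live event: if the connection drops, the request should fail fast instead of holding up incoming messages.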
There's always \b (word boundary), which you can use in [regexp] to distinguish between "ass" and "bass", as you can see in the example below. But it's a tricky business, because word boundaries will make you miss aggregate (compound) words. Jan's advice is awesome, unless you can't guarantee a reliable internet connection. Good luck!
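A quick way to see both sides of the \b trade-off, sketched in Python (as far as I know, Python's `re` and the PCRE-style syntax used in [regexp] treat \b the same way for plain ASCII text). The `compare` helper is hypothetical:

```python
import re

plain = re.compile(r"ass", re.IGNORECASE)       # substring match
bounded = re.compile(r"\bass\b", re.IGNORECASE)  # whole-word match only

def compare(msg):
    """Return (substring hit, word-boundary hit) for one message."""
    return bool(plain.search(msg)), bool(bounded.search(msg))

for msg in ["ass", "bass", "nice bass line", "dumbass"]:
    print(msg, compare(msg))
# ass (True, True)
# bass (True, False)
# nice bass line (True, False)
# dumbass (True, False)   <- the compound word slips through
```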
Thanks a lot! I don't know what the Internet connection will be like, but that word boundary could be the key! Also, PurgoMalum has a list of profane words, which makes it easier for me to compile my own.
I'll play with this stuff and if I think it's good enough, I'll post my results!
So here's what I cooked up, since I've learned not to always count on proper Wi-Fi at public events ;)
It's not perfect, but I tried to include as many words and aggregates as I could think of. I ended up chaining together a bunch of [regexp] filters. I'm sure there's a better, more efficient, or more elegant way of doing this, but for anyone who needs a vanilla Max profanity remover, here it is! Feel free to use it (although pasting my name in the patch is always courteous ;))! I made a subpatch because I hate keeping track of abstractions. I tested it as best I could, but if there's an issue let me know... although I don't really plan on maintaining this.
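A chain of [regexp] substitutions can be sketched as an ordered list of (pattern, replacement) stages, where each stage's output feeds the next. This Python version uses hypothetical stages (spaced spellings first, then whole-word matches), not the actual patterns from the patch:

```python
import re

GRAWLIX = "@#$%"

# Ordered (pattern, replacement) stages, like [regexp] objects patched in series.
# These two stages are hypothetical examples.
STAGES = [
    # Spaced or punctuated spellings: "a s s", "a_s_s", "A.S.S."
    (re.compile(r"\ba[\s_.\-]+s[\s_.\-]+s\b", re.IGNORECASE), GRAWLIX),
    # Whole-word matches only, so "bass" passes through untouched.
    (re.compile(r"\bass\b", re.IGNORECASE), GRAWLIX),
]

def run_chain(message, stages=STAGES):
    """Apply each substitution stage in order, feeding output to the next."""
    for pattern, replacement in stages:
        message = pattern.sub(replacement, message)
    return message

print(run_chain("what an ass!"))    # what an @#$%!
print(run_chain("nice bass line"))  # nice bass line
```

Chaining keeps each rule small and easy to edit. Merging everything into one alternation would scan each message once instead of once per stage, but for SMS-length input (160 characters for a single GSM-7 message) the difference is unlikely to matter.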
Excellent! The regexps read like poetry! Good luck with the installation.
Snippet stolen! Good work!