Substrings/substitutions/regular expressions in C


    Nov 25 2008 | 11:56 am
    As far as I know, standard C does not come with a regular expressions library, so what would be the best way to implement the following expressions, which I currently use in the regexp object:
    [regexp -M[\w].+(?i:ms)] - substrings
    [regexp (-M|ms) @substitute " "] - substitutions
    There may be a way to make it simpler. I want to search a list of various atoms (I have modified simplelist to do this), match elements using regular expressions, and output them in a different form.
    So, using the two regexp objects, I can convert -M34.6ms to 34.6.
    Perhaps this is possible using if/then logic, but I can't think how to split the symbols up without regular expressions.
    I hope this is clear.
    Thanks!

    • Nov 25 2008 | 12:12 pm
      Here's a patch in Max which demonstrates roughly what I'm trying to do:
      I want to write this as an external object because I have lots of different atoms of varying formats to match, which would be criminally inefficient in Max (and I want to practice my dev skills! ;)
    • Nov 25 2008 | 1:59 pm
      On Nov 25, 2008, at 3:56 AM, fairesigneaumachiniste wrote:
      > As standard C does not come with a regular expressions library (as far as I know) so what would be the best way to implement these expressions I use in the regexp object:
      I could suggest you use the PCRE library (http://www.pcre.org), which is what the Max regexp object uses, but if the regexp object is slow, you will not see any performance improvement in your own object if it also uses PCRE. So what would I suggest? A tight string-walking function of your own. Alternatively, sscanf/strtok etc. might be useful if you have a decent expectation of what the input looks like.
      From your simple example, you can probably do something along the lines of the following quick code, typed straight into my email client:
        void stripnumber(char *dst, const char *src)
        {
            char c;
            // keep numbers, spaces, and periods; strip everything else
            while ((c = *src++) != '\0') {
                switch (c) {
                case ' ': case '.':
                case '0': case '1': case '2': case '3': case '4':
                case '5': case '6': case '7': case '8': case '9':
                    *dst++ = c;   // copy char from input to output
                    break;
                default:
                    break;        // skip
                }
            }
            *dst = '\0';          // null terminate output
        }
      Here's a simple tutorial on working with strings in C: http://www.eskimo.com/~scs/cclass/notes/sx8.html
      You can also look into finite automata implementations of regular expressions, which can be much faster than the backtracking algorithms typically used by Perl and PCRE. Here's one reasonably clear paper on the subject, with code samples, if you're feeling really nerdy.
      If you get deeper into string processing in C, you'll also need to pay attention to the UTF-8 Unicode representation if you want to handle non-ASCII characters:
      Hope this gets you started. If you have further questions about this stuff, I'd suggest you search online. Obviously lots of info out there.
      -Joshua
    • Nov 26 2008 | 8:37 am
      I'll also add that on OS X there is a regex library installed; I believe it is the POSIX implementation (see regex.h).
      That said, from what I have read, I would be careful about how you use it because it can be slow. I have not yet seen this in my own external, though.
      The functions of interest are:
      regcomp, regexec
      You can capture substrings with () groups very easily: regexec fills an array of regmatch_t offsets to the matches. I then use the standard string functions to build up what I want.
      However, if efficiency is what you want, I'd follow the advice above and write very specific functions with the C string functions, which are going to be ULTIMATELY faster.