regexp question (truly exact match ?...)


    Dec 17 2010 | 12:13 am
    hi,
    after having done some serious research in the forum and the web concerning regexp code i still haven't found the solution for this (i hope) very trivial problem:
    i want to check, if a sentence (i.e. a string) contains the word "there".
    as i am looking for an exact match i surround it with \b word boundaries so the object looks like this:
    regexp (\bthere\b)
    the problem is that strings like "there's a xxx" are also matched. could anybody help me modify the regexp so that apostrophes are excluded ?
    (the general issue here is that i really miss something like a NOT operator in regexp... or did i just overlook something ?)
    thanks for any hint !

    • Dec 17 2010 | 12:20 am
      The caret (^) acts as a NOT operator, but only as the first character inside a pair of square brackets []
      I forget exactly how regexp works a lot of the time so whenever I want to write a regexp I end up here:
      HTH
      Alex
    • Dec 17 2010 | 12:43 am
      I'm not really familiar with regexp but it seems to me you have to include the spaces in front of and after "there" so it will only match "stand-alone" occurrences of the word
      Like this; " there "
      Good luck!
      FRid
    • Dec 17 2010 | 12:45 am
      hi, alex thanks a lot for your reply. only ... it doesn't work ;-)
      the problem here is that with this caret in brackets REGEXP searches for an extra element, whereas i want it to match exactly after "there" and not match after "there's"
      i.e. the formula
      ============================ regexp (\bthere[^']\b) ============================
      wouldn't match "there" anymore
      mmmhhh....
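A small Python sketch of why the bracketed caret fails here, assuming Python's `re` module treats these constructs the same way as the PCRE-style engine behind [regexp]. `[^']` has to consume one real character, so a bare "there" at the end of the input leaves nothing for it to match. The second pattern uses a negative lookahead, `(?!')`, which nobody in this thread proposes; it is shown only as one possible way to say "not followed by an apostrophe" without consuming anything. The helper name `hits` is made up for the example.

```python
import re

# Pattern from the post: [^'] must consume one character after "there".
consuming = re.compile(r"\bthere[^']\b")

# Assumption (not from the thread): a negative lookahead checks the next
# character without consuming it.
lookahead = re.compile(r"\bthere\b(?!')")

def hits(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None

for text in ["there", "go there now", "there's a xxx"]:
    print(repr(text), hits(consuming, text), hits(lookahead, text))
```

Note that the lookahead version only guards the right-hand side; a leading apostrophe would need a similar check in front.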
    • Dec 17 2010 | 1:38 am
      You probably need to roll your own version of \b possibly making use of the pipe operator to match several slightly different scenarios.
      Start by enumerating *exactly* when you want it to match and when not.
      This might work:
      (\At|[^'\w]t)her(e\Z|e[^'\w])
      This should match at the start of the input or after a non-word character except ' Same kind of thing at the end.
      Might have got it wrong though - haven't tested it.
      A.
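Since the pattern above was posted untested, here is a quick way to try it outside Max. This is a sketch in Python, assuming its `re` module agrees with [regexp] on `\A`, `\Z`, `\w` and negated classes; the sample strings and the helper name `matches` are made up for the test.

```python
import re

# Alex's pattern, copied unchanged from the post above.
pattern = re.compile(r"(\At|[^'\w]t)her(e\Z|e[^'\w])")

def matches(text):
    """Return True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None

samples = ["there", "go there now", "there's a xxx",
           "theremin", "weathered", "'there'"]
for s in samples:
    print(repr(s), matches(s))
```

Two things to keep in mind: the match includes the neighbouring character on each side (for example the spaces in " there "), and in Python `\Z` matches only at the very end of the string, whereas PCRE's `\Z` also matches just before a final newline.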
    • Dec 17 2010 | 1:51 am
      yeah ! that really seems to work ! i did no heavy testing yet, but the example strings behave like they should!
      thanks a lot, alex !
      (but hey, what a monster of regexp code, really ugly to look at ...)
      ciao
      oliver
    • Dec 17 2010 | 1:52 am
      you also *might* be able to do it with numbers through [atoi]. Run the whole string through [atoi] and you'll get a list of the "number-chars", then try [zl sub] to compare the list to the word "there" (also run through [atoi], which gives 32 116 104 101 114 101 32, including spaces at beginning and end). So you'd be looking for that group of values in the master list, and [zl sub] will give you "found: 1" and "position: 10" or wherever.
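The same idea can be sketched outside Max. The Python below is only a rough stand-in: `to_codes` plays the role of [atoi] and `find_sublist` the role of [zl sub]; both names are made up here, and the returned index is 0-based, which may not match how [zl sub] counts positions.

```python
def to_codes(text):
    """Rough stand-in for [atoi]: one integer code per character."""
    return [ord(ch) for ch in text]

def find_sublist(haystack, needle):
    """Return the first index where needle occurs in haystack, or -1."""
    n = len(needle)
    for i in range(len(haystack) - n + 1):
        if haystack[i:i + n] == needle:
            return i
    return -1

needle = to_codes(" there ")  # 32 116 104 101 114 101 32
print(needle)
print(find_sublist(to_codes("is it over there or here"), needle))
print(find_sublist(to_codes("there's a xxx"), needle))
```

Because the search list includes the surrounding spaces, this will not find "there" at the very start or end of the sentence, or when it is followed by punctuation.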
    • Dec 17 2010 | 2:58 am
      Don't bother with the \b and stuff..
      If you only want to check for "there" in a sentence then all you need to do is tell the regexp to look for "there" surrounded by 'something', where that something can be whatever it is; it's not your concern.
      So: "^.*there.*$" (without the quotes, but see snippet below).
      This means: From the start of the sentence (^) accept any kind of character (.* where . is any character and * means "zero or more occurrences") until you get to the point where a specific letter combination needs to be there ('there'). After this combination any kind of characters can appear until the end of the sentence ($).
      And because you want to pinpoint 'there' you want to wrap it in () so that it becomes a capture group you can use as a back reference.
      Alas, here's my proof of example:
      Edit: The trigger object doesn't have to be there, but I used it here to make my example as easy to understand as possible.
    • Dec 17 2010 | 4:03 am
      That last example will match "theremin" and "weathered" though. You probably need to include all the punctuation you don't mind preceding/following the word in non-capturing brackets (or the option of it being the very first or last character) like so:
      regexp (?:[\s({[]|^)(there)(?:[]\s}).,!?';:]|$)
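To compare the wildcard idea with this delimiter-based pattern, here is a minimal Python sketch, again assuming Python's `re` is close enough to [regexp] for these constructs. The wildcard pattern is written as `^.*(there).*$`, and the only change to the pattern above is that the literal `[` and `]` inside the character classes are backslash-escaped. The helper name `found` is made up.

```python
import re

# Wildcard approach: anything, then "there", then anything.
loose = re.compile(r"^.*(there).*$")

# Delimiter-based pattern from the post above; literal [ and ] inside the
# classes are escaped here, which should not change what it matches.
strict = re.compile(r"(?:[\s({\[]|^)(there)(?:[\]\s}).,!?';:]|$)")

def found(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None

for text in ["theremin", "weathered", "go (there).", "there's a xxx"]:
    print(repr(text), found(loose, text), found(strict, text))
```

The trailing class lists the apostrophe as an allowed follower, so "there's a xxx" still matches the delimiter-based pattern; removing `'` from that class should exclude it, at the cost of also rejecting a closing single quote.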
    • Dec 17 2010 | 2:58 pm
      hi, regexp gurus !
      thanks a lot for joining into this discussion and for all your nice help. now i already gained way more inside into the whole thing than i had before.
      concerning the simplicity of the task i always re-learn that regular expression is really quite a beast to tame ...
      cheers, guys !
      oliver
    • Dec 17 2010 | 3:00 pm
      sorry, should have been "insight"
      greetings from the land of PISA loosers ;-)