Regexp filtering
Hi,
I am looking for a method to filter a string containing slashes. If the input is "I saw Mr/Mrs Barack/Michele Obama", I need to get "I saw Mr Barack Obama" (all parts of the words BEFORE slashes) and "I saw Mrs Michele Obama" (all parts of the words AFTER slashes).
I guess using one (or two?) regexps would be the best method, but I've never really understood them. Could you help me a little?
Thanks a lot !
V.
One approach would be to target every occurrence of a forward slash (/) together with the word characters (\w) after or before it. For a slash and the word after it, you'd write: /\w+
The plus after the \w makes it match one or more word characters, stopping at the first non-\w character. This is standard regex syntax; in Max you need to double the backslash. Now that you've matched what you want, you can remove it by substituting it with nothing (%0).
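To make the idea concrete outside of Max, here is a minimal sketch in plain JavaScript (not a Max patch). The function names are made up for illustration. Note that in a JavaScript regex literal the slash itself has to be escaped, while the backslash is not doubled as it would be in Max.

```javascript
// Hypothetical helper: remove each slash together with the word
// that follows it, keeping the words BEFORE the slashes.
function keepBeforeSlash(text) {
  return text.replace(/\/\w+/g, "");
}

// Hypothetical helper: remove each word together with the slash
// that follows it, keeping the words AFTER the slashes.
function keepAfterSlash(text) {
  return text.replace(/\w+\//g, "");
}

var input = "I saw Mr/Mrs Barack/Michele Obama";
console.log(keepBeforeSlash(input)); // I saw Mr Barack Obama
console.log(keepAfterSlash(input));  // I saw Mrs Michele Obama
```

The second pattern is simply the mirror image of the first: the word comes before the slash instead of after it.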
Regex is hard to understand and, once understood, hard to remember, so I recommend using a regex cheat sheet. Here's the one I use.
As is typical when comparing regular expressions to code that does the same thing, Dimitri's pure Max solution with regexp is far shorter than my Javascript solution with js. But many people find text parsing with code easier to understand than regular expressions. That's why I gave up trying to learn regex. Take your pick!
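For anyone curious what text parsing without a regular expression can look like, here is a rough sketch in plain JavaScript. It is not the code from the js patch mentioned above, just an illustration with a made-up function name (pickSide), and it assumes that words are separated by single spaces and contain at most one slash.

```javascript
// Hypothetical helper: pick one side of every "a/b" word.
// side = 0 keeps the part before the slash, side = 1 the part after.
function pickSide(text, side) {
  return text
    .split(" ")
    .map(function (word) {
      var parts = word.split("/");
      // Words without a slash have a single part and pass through unchanged.
      return parts.length > 1 ? parts[side] : word;
    })
    .join(" ");
}

var input = "I saw Mr/Mrs Barack/Michele Obama";
console.log(pickSide(input, 0)); // I saw Mr Barack Obama
console.log(pickSide(input, 1)); // I saw Mrs Michele Obama
```

It is longer than the regex version, but every step (split into words, split each word at the slash, rejoin) can be read on its own.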
i am regexp-blind, too, so i would solve that by list iteration, atoi, lots of registers and logic, probably requiring about 35 objects. (could have learned regexp in the same time?)
Haha yes, doing it in pure Max and without regexp would surely be by far the most long-winded way! Maybe you could even more easily have learned JavaScript in the same time, if you don't already know it. ;-)
it is just a bit of syntax and 4-5 things it can do, so it should not be more difficult than [if] and [expr] and [sprintf].
my favorite excuse is that i only figured around 2013 or so that there is actually a third party regexp external for my outdated version of max, which originally came without.
parts of the syntax don't make much sense to me, i would eventually need a worded explanation of "what it does" before i start using it... and before i can give proper support or make a "missing tutorial" file.
https://regexone.com/
https://regexone.com/lesson/introduction_abcs
or maybe the whole idea of "regular language" is already incompatible with 110.
wouldn't you have to zl.iter it anyway for the OP's request?
then the "matching" part is easy: .\/.
but how the hell do you remove the / (or take whatever altering action) after you have found it ?
re-reading his post i think i would write the strings into a coll or zl nth and reassemble the sentences from there using integer numbers.
The way to handle this is with a regex like "(\w*)\/(\w*)" and then running a substitution with "%1" or "%2" as the substitution, depending on whether you want the first or second word.
The pair of (\w*) match the words on either side of a / and capture them. The sub values %1 and %2 contain the captured words. So the regex operation reads as "match every pair of words separated by a /, and replace that pair with either the first (%1) or second (%2) word".
I don't think this can all be done in a single execution of Regex, but I guess only a single regex object is needed if you zl.iter over it with %1 and %2 in a list.
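The same capture-group idea can be sketched in plain JavaScript, purely as an illustration and not as a Max patch. JavaScript writes the captured groups as $1 and $2 where Max's regexp uses %1 and %2, and the function name splitVariants is made up. Mapping over the two substitutions plays the role of iterating a list of %1 and %2 through a single regexp object.

```javascript
// Two captured words separated by a slash, matched globally.
var pattern = /(\w*)\/(\w*)/g;

// Hypothetical helper: run the same pattern once per substitution
// and return both sentences, first words then second words.
function splitVariants(text) {
  return ["$1", "$2"].map(function (sub) {
    return text.replace(pattern, sub);
  });
}

var result = splitVariants("I saw Mr/Mrs Barack/Michele Obama");
console.log(result[0]); // I saw Mr Barack Obama
console.log(result[1]); // I saw Mrs Michele Obama
```

Each pass replaces the whole "word/word" pair with just one of its captured halves, so two passes over the original string give the two sentences.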
Yes, thank you all for your help, you're wonderful!
@Dimitri, I definitely need to learn this kind of magic and a regexp cheat sheet is a good start. Thanks a lot !
@Simon, nice use of js, it seems less scary than I thought ;-)
@Roman: haha, using pure Max is the way I would have done it too if this forum had not existed!
useful for trying out:
RegExr