
regexp & line feed …


tep
January 19, 2012 | 4:04 am

I need to extract several strings and values from some HTML code. The elements I’m interested in are separated by line feeds.

Here is the kind of fragment that interests me, as shown in jit.textfile:

Aasiaat




-13 


January 19, 2012 | 11:32 am

[jit.str.regexp @re "

([^< ]+)\s+

\s+

\s+([^< ]+)"]

I tend to use \s+ to get past line breaks: sometimes they are line feeds, sometimes carriage returns, and they are often followed by tabs or spaces. \s+ matches one or more consecutive white-space characters, so it covers all of these cases.
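To illustrate the \s+ idea outside of Max, here is a minimal Python sketch. The sample strings are hypothetical stand-ins for the page content (the real markup is not reproduced in this thread), and Python's `re` module is only assumed to behave like the regex engine in jit.str.regexp for this simple case.

```python
import re

# Hypothetical samples: the same two fields separated by different
# kinds of line break and indentation.
samples = [
    "Aasiaat\n-13",         # line feed only
    "Aasiaat\r\n-13",       # carriage return + line feed
    "Aasiaat \r\n\t  -13",  # trailing space, CRLF, then tab and spaces
]

# \s+ consumes one or more white-space characters of any kind,
# so one pattern handles every separator above.
pattern = re.compile(r"(\S+)\s+(-?\d+)")

results = [pattern.search(s).groups() for s in samples]
for r in results:
    print(r)
```

Each sample yields the same pair of captures, whatever mix of line feeds, carriage returns, tabs, or spaces sits between the two fields.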

I’d also recommend ([^>]+) for finding text between HTML tags, as it involves less backtracking to find actual matches. It matches anything but a closing angle bracket and, as long as you don’t have any literal ones in your text string (they should be encoded in valid HTML), it will work a lot faster. I tend to avoid .* unless I can’t find a way around it.
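A rough Python sketch of the negated-character-class technique follows. The markup is made up for illustration, and the class used here excludes `<` (the start of the next tag) so that the capture stops at the end of the text node; adapt the excluded character to whichever delimiter bounds your text.

```python
import re

# Hypothetical markup; the actual HTML from the page is not shown in this thread.
html = '<td class="city">Aasiaat</td>\r\n\t<td class="temp">-13</td>'

# After a tag ends with ">", capture text up to the next "<".
# The first captured character must not be white space, which skips
# the line break and tab sitting between the two cells.
between_tags = re.compile(r">([^<\s][^<]*)<")

fields = between_tags.findall(html)
print(fields)
```

Because the class can never run past the next bracket, the engine has no long overshoot to backtrack from, which is the usual reason to prefer it over .* in this kind of scraping.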

The only other changes I’ve made are replacing your string of dots with \d{5}, since the only unique part of the URL appears to be a run of 5 digits, and escaping the literal quotes that appear around the URL.
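As a small sketch of those two changes in Python: the link below is entirely hypothetical (the real URL is not shown in this thread), and only the shape matters, namely a fixed path with a 5-digit id wrapped in literal double quotes.

```python
import re

# Hypothetical link; the path and file extension are assumptions.
html = '<a href="/weather/12345.html">Aasiaat</a>'

# \d{5} matches exactly five digits instead of five "any character" dots.
# The quotes are escaped as \" because the pattern itself is written
# inside double quotes, much as it is inside a Max attribute.
pattern = re.compile(r"href=\"/weather/(\d{5})\.html\">([^<]+)")

match = pattern.search(html)
station_id, city = match.groups()
print(station_id, city)
```

Using \d{5} rather than five dots also keeps the pattern from matching unrelated links whose id contains letters or has a different length.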

I hope this helps, let me know if you need a better explanation!



tep
January 22, 2012 | 5:44 pm

Thanks again for this Luke. Very useful.
I don’t think I could make it much cleaner. I removed all the *, and it works fine.

Now I would have liked to use @substitute in order (if possible) to "print" all the data on a single line, but it’s not hard work to group them in a coll anyway.
That said, I still don’t understand how @substitute is used…


THANKS !
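For readers wondering what regex substitution does in general, here is a minimal Python sketch that puts the captured fields on a single line. It uses `re.sub` with backreferences and is not a description of how @substitute behaves in jit.str.regexp, which this thread does not settle; the input string is a hypothetical stand-in.

```python
import re

# Hypothetical input: two fields separated by several line breaks and a tab.
text = "Aasiaat\r\n\t\r\n-13"

# Replace the whole match with "capture 1, one space, capture 2".
one_line = re.sub(r"(\S+)\s+(-?\d+)", r"\1 \2", text)
print(one_line)
```

The replacement string refers back to the capture groups, so everything matched by \s+ between them is dropped and replaced by a single space.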

