regexp & line feed ...

tep

I need to extract various strings and values from some HTML code. The elements of interest are separated by line feeds.

Here is the kind of fragment I'm interested in, as shown in jit.textfile:

Aasiaat

-13

Luke Hall

[jit.str.regexp @re "
([^\s+

\s+
\s+([^"]

I tend to use \s+ to get past line breaks, since they are sometimes line feeds, sometimes carriage returns, and often followed by tabs or spaces. \s+ matches one or more consecutive whitespace characters, so it covers all of these cases.
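To illustrate the point about \s+ outside of Max, here is a minimal Python sketch. The sample strings and the pattern are my own assumptions (the original HTML is not shown here); Python's `re` module stands in for jit.str.regexp, and only the behaviour of \s+ is the point.

```python
import re

# Hypothetical samples: the same two values separated by different kinds of whitespace.
samples = [
    "Aasiaat\n-13",        # line feed only
    "Aasiaat\r\n\t-13",    # carriage return + line feed + tab
    "Aasiaat \r  -13",     # carriage return surrounded by spaces
]

# \s+ matches one or more whitespace characters of any kind,
# so one pattern handles every line-ending variant above.
pattern = re.compile(r"([A-Za-z]+)\s+(-?\d+)")

results = [pattern.search(s).groups() for s in samples]
print(results)
```

Each sample yields the same pair of captures, whichever line ending separates the two values.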

I'd also recommend ([^>]+) for finding text between HTML tags, as it involves less backtracking than .* to find the actual matches. It matches anything except a closing angle bracket and, as long as you don't have any literal ones in your text (they should be encoded in valid HTML), it will work a lot faster. I tend to avoid .* unless I can't find a way around it.
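A small Python sketch of the difference, assuming a made-up pair of `<td>` cells (the real markup is not shown here, so the tags are purely illustrative). It compares a greedy .* capture with the negated class ([^>]+):

```python
import re

# Hypothetical markup: two table cells on one line.
html = "<td>Aasiaat</td><td>-13</td>"

# Greedy .* runs to the end of the line and backtracks to the LAST </td>,
# so both cells end up inside a single capture.
greedy = re.findall(r"<td>(.*)</td>", html)

# [^>]+ cannot cross a closing angle bracket, so each capture stays
# inside one cell and the engine has far less to backtrack over.
negated = re.findall(r"<td>([^>]+)</td>", html)

print(greedy)
print(negated)
```

The negated class gives one capture per cell, while .* swallows the markup between them.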

The only other changes I've made are replacing your string of dots with \d{5}, since the only unique part of the URL appears to be a run of 5 digits, and escaping the literal quotes that appear around the URL.
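As a rough Python sketch of the \d{5} idea: the URL, the `example.com` host, and the name `station_id` are all hypothetical, since the real link is not shown here. In a Python single-quoted raw string the double quotes need no backslash; in the Max attribute they do, because the quotes also delimit the expression.

```python
import re

# Hypothetical markup: this URL is made up for illustration.
html = '<a href="http://example.com/weather/04220.html">Aasiaat</a>'

# \d{5} matches exactly five digits, unlike five dots, which would
# match any five characters. The double quotes are literal here.
pattern = re.compile(r'href="http://example\.com/weather/(\d{5})\.html"')

match = pattern.search(html)
station_id = match.group(1) if match else None
print(station_id)

# A four-digit code does not satisfy \d{5}, so this near-miss is rejected.
near_miss = pattern.search('<a href="http://example.com/weather/0422.html">')
print(near_miss)
```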

I hope this helps, let me know if you need a better explanation!

tep

Thanks again for this Luke. Very useful.
I don't think I could make it much cleaner: I removed all the *, and it works fine.

I would have liked to use @substitute to "print" all the data on a single line, if that's possible, but it's not much work to group the values in a coll instead.
I still don't understand how @substitute is meant to be used, though...

THANKS!