regexp & line feed ...
[jit.str.regexp @re "
([^\s+
\s+
\s+([^"]
I tend to use \s+ to get past line breaks, as sometimes they are line feeds, sometimes carriage returns, and often followed by tabs or spaces. \s+ matches one or more consecutive white-space characters, so it covers all of these cases.
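To illustrate the point about \s+, here is a minimal sketch in Python rather than Max (the sample strings and variable names are made up for the example). The same pattern matches whether the break is a line feed, a carriage return plus line feed, or a break followed by tabs and spaces:

```python
import re

# Hypothetical samples: two tokens separated by different kinds of line break.
samples = [
    "alpha\nbeta",          # line feed only
    "alpha\r\nbeta",        # carriage return + line feed
    "alpha\r\n\t  beta",    # line break followed by a tab and spaces
]

# \s+ consumes one or more white-space characters of any kind.
pattern = re.compile(r"(\w+)\s+(\w+)")

results = [pattern.search(s).groups() for s in samples]
print(results)
```

All three samples give the same pair of captured groups, which is why \s+ is a safe way to step over a line break without knowing its exact form.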
I'd also recommend ([^>]+) for finding text between HTML tags, as it involves less backtracking to find actual matches. It matches anything but a closing angle bracket and, as long as you don't have any literal ones in your text string (they should be encoded in valid HTML), will work a lot faster. I tend to avoid .* unless I can't find a way around it.
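A small Python sketch of the difference, assuming a made-up HTML snippet (this is not the original patch, just an illustration of the two patterns). The greedy .* runs to the last closing bracket on the line and has to backtrack, while [^>]+ stops at the first one:

```python
import re

# Hypothetical one-line HTML snippet.
html = '<a href="page.html" class="x">link</a> <b>bold</b>'

# Greedy .* grabs as much as it can, then backtracks to the LAST '>'.
greedy = re.search(r"<a (.*)>", html).group(1)

# The negated class cannot cross a '>', so it stops at the FIRST one.
negated = re.search(r"<a ([^>]+)>", html).group(1)

print(greedy)
print(negated)
```

The negated class returns only the attributes of the opening tag, while the greedy version swallows the rest of the line up to the final bracket.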
The only other changes I've made are replacing your string of dots with \d{5}, as the only unique part of the URL appears to be a row of 5 digits, and escaping the literal quotes that appear around the URL.
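As a rough sketch of the \d{5} idea in Python (the URL below is a placeholder invented for the example, not the one from the patch): a run of dots accepts any five characters, whereas \d{5} only accepts exactly five digits.

```python
import re

# Hypothetical text; the URLs are placeholders, not the ones from the patch.
text = 'x "http://example.com/item/12345.html" y "http://example.com/item/abcde.html"'

# Five dots match ANY five characters, so both URLs are accepted.
dots = re.findall(r'"(http://example\.com/item/.....\.html)"', text)

# \d{5} matches exactly five digits, so only the numeric URL is accepted.
digits = re.findall(r'"(http://example\.com/item/\d{5}\.html)"', text)

print(len(dots), digits)
```

In Python the double quotes need no escaping inside a single-quoted raw string; the escaping mentioned above is only needed because the expression is typed inside a Max object box.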
I hope this helps, let me know if you need a better explanation!
Thanks again for this Luke. Very useful.
I don't think I could make it much cleaner. I removed all the *, and it works fine.
I would have liked to use @substitute to "print" all the data on a single line, if that's possible, but it's no hard work to group them in a coll.
I still don't understand how to use @substitute, though...
THANKS !