download text from wikipedia

freepoulite's icon

Hello

I would like to download some text from Wikipedia and get it into a message box.
I've looked at several posts on the forum, but I can't figure out how to use jit.uldl properly (I always get error -1).
Any help would be really appreciated.

thank you

PB

$Adam's icon

Hi,

here's a short example using [sadam.tcpClient] (see https://cycling74.com/forums/announce-the-sadam-library-version-2012-10-08) that will download the page about Giovanni Pierluigi da Palestrina from the English Wikipedia. Unfortunately, this will give you the whole HTML document, so you'll need some additional processing to get the actual text.

Max Patch
Copy patch and select New From Clipboard in Max.
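
For readers who want to see what goes over the wire, here is a rough Python sketch of the same idea: open a TCP connection, send a 'GET' request, and read back whatever the server answers. It is not a transcription of the patch. The helper names (`build_get_request`, `fetch_raw`, `split_response`), the User-Agent string and the use of port 80 are assumptions made for illustration. Many sites, probably including Wikipedia, now answer plain HTTP with a redirect to HTTPS, so an actual fetch may need a TLS connection instead.

```python
import socket


def build_get_request(host, path):
    """Return a minimal HTTP/1.0 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "User-Agent: forum-example/0.1",  # hypothetical client name
        "Connection: close",
        "",  # the two empty strings produce the blank line
        "",  # that terminates the request headers
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host, path, port=80, timeout=10.0):
    """Send the request over a plain TCP socket and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def split_response(raw):
    """Separate the response headers from the body (the HTML document)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body
```

Calling `fetch_raw("en.wikipedia.org", "/wiki/Giovanni_Pierluigi_da_Palestrina")` and passing the result to `split_response` would give the headers and the HTML body separately. As in the patch, the body is still the whole HTML document.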

Hope this helps,
Ádám

freepoulite's icon

Thanks a lot, Ádám.
Unfortunately, the "additional processing" looks really tricky to me...
But I will work on it :)
thx again

PB

$Adam's icon

Hi again,

here's a more advanced example. It first searches for a tag and throws away everything that comes before it. Then it parses the HTML with [sadam.rapidXML], which gives you structured access to the HTML document. XML might not be the best representation for your purpose (for example, formatting commands will be parsed as separate XML elements, as you will probably see), but it might give you a good starting point.

Max Patch
Copy patch and select New From Clipboard in Max.
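
The two steps described above (discard everything before a tag, then parse what is left as XML) can be sketched in Python as follows. This is only an illustration under assumptions: the sample HTML is made up, `cut_element` is a hypothetical helper, and it also trims what follows the closing tag so that the strict parser in Python's standard library accepts the fragment. Real web pages are often not well-formed XML, so a strict parser may reject them.

```python
import xml.etree.ElementTree as ET


def cut_element(document, open_marker, close_marker):
    """Keep only the text from open_marker up to and including close_marker."""
    start = document.find(open_marker)
    if start == -1:
        raise ValueError(f"marker {open_marker!r} not found")
    end = document.find(close_marker, start)
    if end == -1:
        raise ValueError(f"marker {close_marker!r} not found")
    return document[start:end + len(close_marker)]


# Made-up stand-in for a downloaded page.
sample = (
    "<!DOCTYPE html><html><head><title>x</title></head>"
    "<body><p>Giovanni <b>Pierluigi</b> da Palestrina was a composer.</p>"
    "</body></html>"
)

fragment = cut_element(sample, "<body", "</body>")
root = ET.fromstring(fragment)

# The formatting tag <b> shows up as a separate child element of <p>,
# so the sentence is split across p.text, b.text and b.tail.
paragraph = root.find("p")
bold = paragraph[0]

# itertext() walks the element and its children in document order,
# which flattens the paragraph back into plain text.
plain_text = "".join(paragraph.itertext())
```

Here `paragraph.text` holds only `"Giovanni "`, while the rest of the sentence sits in `bold.text` and `bold.tail`. This is the inconvenience mentioned above; joining `itertext()` is one way to get a single string back.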

HTH,
Ádám

freepoulite's icon

really great! it will help me a lot!!

many thanks

PB

$Adam's icon

Hi,

and here's another version. This one directly fetches the printer-friendly version of the page, without the sidebar and other Wikipedia-related stuff, so there's a lot less to care about. However, the links are still structured as XML elements, which might still cause some headaches...

Max Patch
Copy patch and select New From Clipboard in Max.

HTH,
Ádám

freepoulite's icon

thanks a lot Ádám!

cj od's icon

Hi,

how about an .aspx or .php address?

$Adam's icon

Hi,

there should be no difference. If you check the last example that I posted, it is executing a PHP query on Wikipedia. The point is, you always send an HTTP 'GET' request to the web server (this is widely documented on the web, with examples) and then process whatever answer you get back.
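
To illustrate why the file extension makes no difference, here is a small Python sketch that builds the first line of a 'GET' request for a script address with a query string. The `/w/index.php` path and the `printable=yes` parameter are assumptions about what a printer-friendly Wikipedia request might look like, not taken from the patch. `/search.aspx` and `request_line` are hypothetical names used only for the example.

```python
from urllib.parse import urlencode


def request_line(script_path, params):
    """Build the first line of a GET request for a script plus its query."""
    query = urlencode(params)  # e.g. {"a": "b c"} becomes "a=b+c"
    target = f"{script_path}?{query}" if query else script_path
    return f"GET {target} HTTP/1.0"


# Assumed form of a printer-friendly Wikipedia request.
php_line = request_line(
    "/w/index.php",
    {"title": "Giovanni_Pierluigi_da_Palestrina", "printable": "yes"},
)

# A hypothetical .aspx address is requested in exactly the same way.
aspx_line = request_line("/search.aspx", {"q": "max msp"})
```

In both cases the server just sees a 'GET' for a path with a query string. What comes back still has to be processed by the client.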

HTH,
Ádám