download text from wikipedia
Hello
I would like to download some text from Wikipedia and get it into a message box.
I've looked at several posts on the forum, but I can't figure out how to use jit.uldl properly (I always get error -1).
Any help would be really appreciated.
thank you
PB
Hi,
here's a short example using [sadam.tcpClient] (see https://cycling74.com/forums/announce-the-sadam-library-version-2012-10-08 ) that downloads the page about Giovanni Pierluigi da Palestrina from the English Wikipedia. Unfortunately, this gives you the whole HTML document, so you'll need some additional processing to get the actual text.
Hope this helps,
Ádám
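For readers without the patch, the same idea (open a TCP connection, send a GET, read whatever comes back) can be sketched in Python. This is only a rough illustration, not the contents of the [sadam.tcpClient] example: the function names, the User-Agent string and the use of TLS on port 443 are assumptions made here, and the article path is guessed from the page title.

```python
import socket
import ssl


def build_get_request(host, path):
    """Build a minimal HTTP/1.0 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "User-Agent: forum-example/0.1",  # hypothetical client name
        "Connection: close",
        "",  # the two empty entries produce the blank line
        "",  # that terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split a raw HTTP response into (header block, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body


def fetch(host, path, port=443):
    """Send the request over a TLS-wrapped TCP socket; return the raw reply."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as tcp:
        with context.wrap_socket(tcp, server_hostname=host) as sock:
            sock.sendall(build_get_request(host, path))
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # server closed the connection
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("en.wikipedia.org", "/wiki/Giovanni_Pierluigi_da_Palestrina")` and passing the result to `split_response` should give the response headers and the whole HTML document as bytes, which still needs the additional processing mentioned above.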
thank you a lot Ádám
Unfortunately, the "additional processing" looks really tricky to me...
But I will work on it :)
thx again
PB
Hi again,
here's a more advanced example. It first searches for a tag and discards everything that comes before it. Then it parses the HTML with [sadam.rapidXML], which gives you structured access to the HTML document. This might not be the best choice for you, since XML is not an ideal representation here (as you will probably see, formatting commands, for example, are parsed as separate XML elements), but it might give you a good starting point.
HTH,
Ádám
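The two steps described above (cut the document at a tag, then hand the rest to an XML parser) might look like this in Python. It is a sketch under assumptions: the markers `<body` and `</body>` and the sample page are made up for illustration, and Python's standard `xml.etree.ElementTree` stands in for [sadam.rapidXML]. Real Wikipedia HTML is not guaranteed to be well-formed XML, so the parser may reject it.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a downloaded page (made up, deliberately well-formed).
SAMPLE = (
    "<!DOCTYPE html>\n"
    "<html><head><title>Demo</title></head>"
    '<body><p>Born in <a href="/wiki/Palestrina">Palestrina</a>, he '
    "<b>composed</b> masses.</p></body></html>"
)


def cut_element(document, start_marker, end_marker):
    """Keep only the text from start_marker through end_marker."""
    start = document.find(start_marker)
    if start < 0:
        raise ValueError(f"{start_marker!r} not found")
    end = document.find(end_marker, start)
    if end < 0:
        raise ValueError(f"{end_marker!r} not found")
    return document[start:end + len(end_marker)]


def outline(root):
    """List (tag, text, tail) for every element, in document order."""
    return [(el.tag, el.text, el.tail) for el in root.iter()]


body = cut_element(SAMPLE, "<body", "</body>")
root = ET.fromstring(body)
for tag, text, tail in outline(root):
    print(tag, repr(text), repr(tail))
```

The printed outline shows the issue with formatting commands: a single sentence ends up spread over the text and tail fields of the `p`, `a` and `b` elements.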
really great! it will help me a lot!!
many thanks
PB
Hi,
and here's another version. This one directly fetches the printer-friendly version of the page, without the sidebar and other Wikipedia-related stuff, so there's a lot less to deal with. However, the links are still structured as XML elements, which might still give some headaches...
HTH,
Ádám
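One possible way around the link headache is to flatten each paragraph back into plain text once it has been parsed. The Python sketch below uses `xml.etree.ElementTree` rather than [sadam.rapidXML], and the fragment and function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up fragment with a link and a formatting element inside a paragraph.
FRAGMENT = (
    '<p>Born in <a href="/wiki/Palestrina">Palestrina</a>, he '
    "<b>composed</b>\n   masses.</p>"
)


def plain_text(element):
    """Join all text inside the element, ignoring the nested tags."""
    joined = "".join(element.itertext())
    return " ".join(joined.split())  # collapse runs of whitespace


def links(element):
    """Collect (href, visible text) for every <a> inside the element."""
    return [(a.get("href"), plain_text(a)) for a in element.iter("a")]


paragraph = ET.fromstring(FRAGMENT)
print(plain_text(paragraph))
print(links(paragraph))
```

The flattened string is what you would send to a message box; the list of links is kept separately in case the targets matter.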
thanks a lot Ádám!
Hi,
how about an .aspx or .php address?
Hi,
there should be no difference. If you check the last example that I posted, it executes a PHP query on Wikipedia. The point is that you always send a 'GET' command to the web server (which is broadly documented on the web, with examples) and then process whatever answer you get back.
HTH,
Ádám
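To illustrate that a .php or .aspx address changes nothing but the path, here is a small Python sketch that builds the request line for a script with a query string. The `/w/index.php` path with `title` and `printable` parameters is an assumption about what the printer-friendly request looked like, and `/search.aspx` is a purely hypothetical address.

```python
from urllib.parse import urlencode


def build_path(script, params):
    """Append URL-encoded query parameters to a script path."""
    query = urlencode(params)
    return f"{script}?{query}" if query else script


def request_line(path):
    """First line of the GET request; only the path differs per page."""
    return f"GET {path} HTTP/1.0\r\n"


# Assumed form of the printer-friendly Wikipedia request.
php_path = build_path(
    "/w/index.php",
    {"title": "Giovanni Pierluigi da Palestrina", "printable": "yes"},
)

# Hypothetical .aspx page on some other server.
aspx_path = build_path("/search.aspx", {"q": "max msp"})

print(request_line(php_path), end="")
print(request_line(aspx_path), end="")
```

Either request line is followed by the usual headers and a blank line, and the reply is processed in the same way as for a static page.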