
XML to plain text

December 13, 2008 | 5:31 am

Basically, I’m looking for a way to convert XML to plain text. Specifically, I’m working on a patch that prompts the user for a URL (for example, a cnn.com news story) and then returns just the main text block (the article itself) to Max for parsing.

Right now, I’m using the jit.uldl object to download the page into a jit.textfile in XML format. From everything I’ve read, it seems I’ll have to use JavaScript to convert this to plain text. I’m wondering if there is another way to do this, as I have no experience with js or with using it in Max. Does anyone at least know of a precedent or a sample patch that I could learn from?

Another specific of the project is that I would like to divide the text block itself into an array of sentences that can be accessed individually. For example, given the word "economy", locate and print every sentence in the text block that contains the word "economy". Should I be looking at pattr objects for this?

Thanks, I am obviously very new to this and sincerely appreciate any suggestions, hints, etc.


December 13, 2008 | 8:09 am

You should be able to do this with the [regexp] or [jit.regexp] objects. I’m about to leave for work, but when I get back I can post some examples of how you might want to start with this project.

lh
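
For anyone who wants to try the regular-expression idea outside Max first, here is a minimal Python sketch of stripping tags from downloaded markup. It is not a [regexp] patch and not what lh had in mind specifically; the function name strip_tags and the sample markup are made up for illustration, and regex-based tag stripping is known to be fragile on messy real-world pages.

```python
import re
from html import unescape


def strip_tags(markup):
    """Return a rough plain-text version of an HTML/XML string.

    Hypothetical helper for illustration only; it does not try to
    isolate the article body from navigation, ads, and so on.
    """
    # Remove <script> and <style> elements together with their contents.
    text = re.sub(r"<(script|style)\b.*?</\1\s*>", " ", markup,
                  flags=re.IGNORECASE | re.DOTALL)
    # Remove comments.
    text = re.sub(r"<!--.*?-->", " ", text, flags=re.DOTALL)
    # Remove every remaining tag.
    text = re.sub(r"<[^>]+>", " ", text)
    # Decode entities such as &amp; and collapse runs of whitespace.
    text = unescape(text)
    return re.sub(r"\s+", " ", text).strip()


page = ("<html><body><p>The economy grew.</p>"
        "<script>var x = 1;</script>"
        "<p>Rain &amp; wind.</p></body></html>")
print(strip_tags(page))  # The economy grew. Rain & wind.
```

The same three patterns (script/style blocks, comments, remaining tags) should carry over to [regexp] more or less as written, but that is an assumption worth testing in the patch.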


December 13, 2008 | 12:26 pm

Hi Nathan,

You might have a look at the [regexp] object.
Some useful info concerning the syntax can be found here:

http://www.python.org/doc/2.5.2/lib/re-syntax.html

Best,

Martijn
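
As a rough illustration of the sentence question, using the Python regular-expression syntax linked above: the sketch below splits a text block into sentences and keeps the ones containing a given word. The function names and the sample text are hypothetical, and the splitting rule (sentence ends at ., ! or ? followed by whitespace) is a simplifying assumption that will mis-split abbreviations such as "U.S. economy".

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when whitespace follows."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentences_containing(text, word):
    """Return every sentence that contains `word` as a whole word,
    ignoring case."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [s for s in split_sentences(text) if pattern.search(s)]


article = ("The economy is slowing. Markets fell today! "
           "Is the Economy in recession? Nobody knows.")
for sentence in sentences_containing(article, "economy"):
    print(sentence)
# The economy is slowing.
# Is the Economy in recession?
```

In Max, the list returned by split_sentences would correspond to whatever storage you choose for the sentences (a [coll], for instance); that mapping is a suggestion, not something tested here.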



December 13, 2008 | 8:11 pm

have a look at [detox] in my collection. it’s a basic xml-parser
external.

various people have had success in parsing web-content based xml files
with it.

it is freely available at this location http://www.jasch.ch/dl/

cheers

/*j


December 14, 2008 | 5:25 am

Thanks everyone for your help! This has been very informative already. Thanks jasch, these are great tools!

Right now I’m downloading to a jit.textfile matrix using jit.uldl. I can’t seem to figure out how to convert the matrix into a symbol (or symbols) to use with the detox object. Really, I can’t figure out how to get any form of the text out of the jit.textfile object. Any suggestions? Thanks!


December 14, 2008 | 8:49 am

> Really, I can’t figure out how to get any form of the text out of
> the jit.textfile object. Any suggestions? Thanks!

connect [jit.textfile]’s middle outlet to [jit.spill] then to [itoa]
which gives you a symbol

/*j
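
To make the [jit.spill] then [itoa] step concrete: the matrix cells come out as a list of integer character codes, and the conversion turns that list back into text. A minimal Python sketch of the same idea follows. The function name codes_to_text is hypothetical, and dropping zero-valued cells is an assumption about how unused matrix cells show up, not documented behaviour.

```python
def codes_to_text(codes):
    """Turn a list of integer character codes into a string.

    Zero values are skipped on the assumption that they are
    empty cells rather than real characters.
    """
    return "".join(chr(c) for c in codes if c != 0)


# Character codes for "<p>Hi</p>" followed by two empty cells.
cells = [60, 112, 62, 72, 105, 60, 47, 112, 62, 0, 0]
print(codes_to_text(cells))  # <p>Hi</p>
```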


December 14, 2008 | 10:31 am

thanks again jasch!


December 14, 2008 | 11:40 am

also have a look at mzed’s weather-patch in this thread. it uses the
exact combination of objects you mentioned.
message #78731 Thu, 31 August 2006 20:39

http://www.cycling74.com/forums/index.php?t=msg&th=21535&rid=0&S=22e2f02a52001ed90e73030499bb6175

/*j


December 14, 2008 | 4:50 pm

What about tap.xml.sax from the Tap tools – from electrotap.com:
" tap.xml.sax is a streaming XML file parser that allows you use any of a
myriad XML-based formats including music-xml, xhtml, and SVG (Scalable
Vector Graphics) files. "
Any feedback?

More fun would be to make your own lisp program within maxlispj (should be
there: http://music.columbia.edu/~brad/maxlispj )

J-F.



December 16, 2008 | 8:04 am

thanks for all the help!

