html to plain text

This question seems to pop up now and again on the list.

Has anyone created an external to parse HTML and extract the text content?
Earlier posts refer to the Java Swing HTML parser, the Maxent tagger from
Stanford, or using JavaScript to strip the tags. Has anyone created an
external or patch that does this, or can anyone give me some pointers in the right direction?
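This sketch assumes "external" here means a Pd or Max/MSP object written in C. The text-extraction part of such an object could be a plain C function like the hypothetical `html_to_text` below. It drops everything between `<` and `>`, decodes a handful of common entities, and collapses whitespace. It is a naive tag stripper rather than a real HTML parser: comments, `<script>`/`<style>` bodies and most entities are not handled. The Pd/Max wrapper code (inlets, outlets, symbol handling) is left out because the post does not say which environment is meant.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not an existing Pd/Max API.
 * Returns a newly malloc'd string (caller frees) or NULL on failure.
 * Limitations: no comment/<script>/<style> handling, only a few entities. */
char *html_to_text(const char *html)
{
    /* A small, hand-picked subset of HTML character references. */
    static const struct { const char *name; char ch; } entities[] = {
        { "&amp;", '&' },  { "&lt;", '<' },   { "&gt;", '>' },
        { "&quot;", '"' }, { "&#39;", '\'' }, { "&nbsp;", ' ' },
    };
    const size_t n_entities = sizeof entities / sizeof entities[0];

    size_t len = strlen(html);
    /* Output is never longer than the input, so len + 1 bytes suffice. */
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    size_t o = 0;
    int in_tag = 0;         /* currently inside <...> */
    int pending_space = 0;  /* saw whitespace or a tag since the last char */

    for (size_t i = 0; i < len; i++) {
        char c = html[i];

        if (in_tag) {                 /* drop everything up to '>' */
            if (c == '>') {
                in_tag = 0;
                pending_space = 1;    /* treat a tag as a word break */
            }
            continue;
        }
        if (c == '<') {
            in_tag = 1;
            continue;
        }
        if (c == '&') {               /* decode a known entity, if any */
            for (size_t k = 0; k < n_entities; k++) {
                size_t n = strlen(entities[k].name);
                if (strncmp(html + i, entities[k].name, n) == 0) {
                    c = entities[k].ch;
                    i += n - 1;       /* the loop's i++ moves past ';' */
                    break;
                }
            }
        }
        if (isspace((unsigned char)c)) {  /* collapse whitespace runs */
            pending_space = 1;
            continue;
        }
        if (pending_space && o > 0)   /* no leading space in the output */
            out[o++] = ' ';
        pending_space = 0;
        out[o++] = c;
    }
    out[o] = '\0';
    return out;
}
```

For example, `html_to_text("<p>Hello &amp; <b>welcome</b></p>")` returns `"Hello & welcome"`. Treating every tag as a word break keeps `<p>a</p><p>b</p>` from running together as `ab`. The trade-off is that inline markup inside a word (`he<b>ll</b>o`) gets split with spaces. For messy real-world pages, a proper parser such as the Java Swing one mentioned above will be more robust.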

Thanks