haarbol

I'm trying to import a .txt file with several columns in it, each of which can contain a sentence with any number of words. What I would like is a way to output each of those sentences to a different outlet. The columns are separated by tabs.

I am using the [text] object to open the text file and have tried to use the [zl] object to break it up, but I haven't found a way to split it at the tabs. I only seem to be able to shave off a fixed number of words.

Is there something I'm overlooking? Is there a way to split up a line in a .txt file by tab-separated column?

split-up-tab-seperated-textfile

SENTENCE01  THE PRESIDENT  YELLED, LOUD  HELLO THIS IS A SENTENCE
SENTENCE02  ME   TEXT


Can you just re-create the original file with only one sentence per line? That is, do you need them to be in separate columns?

[regexp] is a great idea for many many things, but...would it recognize tabs as tabs, or just as whitespace? maybe you could check if there were multiple spaces in a row. I am far from an expert on that object so I could be way off.

Another way is to use [atoi] and see when you get a tab character, which should be the number 9, I think.
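To make the character-code idea concrete, here is a rough sketch in Python rather than a Max patch. It assumes [atoi] hands you a list of ASCII codes and that the tab really arrives as 9; the function name and the sample line are made up for illustration.

```python
TAB = 9  # ASCII code of the tab character

def split_on_tab_codes(codes):
    """Split a list of character codes into strings at every tab code (hypothetical helper)."""
    columns = []
    current = []
    for code in codes:
        if code == TAB:
            # A tab closes the current column and starts a new one.
            columns.append("".join(chr(c) for c in current))
            current = []
        else:
            current.append(code)
    # Whatever is left after the last tab is the final column.
    columns.append("".join(chr(c) for c in current))
    return columns

# Assumed sample line; the character codes stand in for what [atoi] would output.
codes = [ord(ch) for ch in "SENTENCE02\tME\tTEXT"]
print(split_on_tab_codes(codes))
```

In a patch the same logic would mean collecting codes until a 9 comes through, then converting the collected group back to text.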

Thanks for both suggestions. I tried to look into the regexp object yesterday, including the tutorial patch with usage examples, but I have no idea where to start with that, since I've never worked with regular expressions before. Is there anyone who could give me some pointers on that, maybe an example patch?

Seejay: I don't think so. It's an export from a larger Excel file that someone else makes (copy/paste), which consists of three columns. Manually changing the layout each time would slow down the process too much.

I will look into the atoi object tonight. Thank you for the suggestion!

Essentially, you use a character class that matches everything that's not a tab, [^\t], and group a run of those, ([^\t]*). That effectively splits the line at each tab (\t), and each run is what goes into a substring. Then take the substrings output from the [regexp] object.
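If it helps to see the pattern outside of Max, here is a small Python sketch of the same idea. It is only an illustration, not the [regexp] object itself: the function name and the sample line are assumptions, and it uses `+` instead of `*` so that Python does not return empty matches between fields (which also means empty columns are skipped).

```python
import re

# One or more characters that are not a tab: one column per match.
FIELD = re.compile(r"[^\t]+")

def split_columns(line):
    """Return the tab-delimited columns of a line (hypothetical helper)."""
    return FIELD.findall(line)

# Assumed sample line with real tab characters between the columns.
line = "SENTENCE01\tTHE PRESIDENT\tYELLED, LOUD\tHELLO THIS IS A SENTENCE"
for column in split_columns(line):
    print(column)
```

Spaces and commas inside a column are left alone, since only the tab is excluded by the character class.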

That works like a charm, thanks big_pause!

There is one small inconvenience with it, but I think that has more to do with the way the [text] object processes its input:

If you import a line from a text document with tabs in it, the tabs get converted to normal spaces. Your method works perfectly if you put "" at the beginning of the sentence in the text document, but that means you have to copy the text from Excel and then manually add the quotation marks. Is there a way to import text without having to use the quotation marks and still keep the tabs in there after import?

Hmm, never realised that, you learn something new every day.

Unless anyone has some bright ideas, I think it's time to roll your own text file reader in js or java. Seems a bit shit though.
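For what a hand-rolled reader would roughly have to do, here is a sketch. It is written in Python purely to show the logic, not as a drop-in for Max; the function names are made up and UTF-8 encoding is an assumption about the exported file.

```python
def parse_tab_separated(lines):
    """Split each non-empty line into its tab-separated columns (hypothetical helper)."""
    rows = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # drop the line ending, keep the tabs
        if line:
            rows.append(line.split("\t"))
    return rows

def read_tab_separated(path):
    """Read a file without any tab-to-space conversion and split it into rows."""
    # Assumption: the exported file is UTF-8 encoded.
    with open(path, encoding="utf-8", newline="") as handle:
        return parse_tab_separated(handle)

# Assumed sample content with real tabs, standing in for the exported file.
sample = ["SENTENCE01\tTHE PRESIDENT\tYELLED, LOUD\r\n", "SENTENCE02\tME\tTEXT\n"]
for row in parse_tab_separated(sample):
    print(row)
```

The point is only that the file is read as raw text, so the tabs survive until the split; each row could then be sent out as one column per outlet.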

Yeah, and I still have to try this out, but I think I could probably get away with having a first column with a quotation mark in it for every line. Small inconvenience.

Also, if I use OpenOffice instead of Excel, I can save as a .csv file, where it will put quotation marks around every column, so I could probably look into how to break it up by the content within quotation marks. :)
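Splitting on quoted columns could look something like the sketch below, again in Python just to illustrate the logic. It assumes the export uses a comma as delimiter and double quotes around each column; the function name and the sample text are hypothetical.

```python
import csv
import io

def split_quoted_csv(text, delimiter=","):
    """Split quoted CSV text into rows of columns (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"')
    # Blank lines come back as empty lists, so leave them out.
    return [row for row in reader if row]

# Assumed sample of what a quoted export might look like.
sample = '"SENTENCE01","THE PRESIDENT","YELLED, LOUD"\n"SENTENCE02","ME","TEXT"\n'
for row in split_quoted_csv(sample):
    print(row)
```

The quotation marks are what keep a comma inside a sentence, as in "YELLED, LOUD", from being treated as a column break.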
