Split up tab-separated text file

Mar 22, 2012 at 7:44am

Hi all,

I’m trying to import a .txt file with several columns, each of which can contain a sentence with any number of words. What I would like is a way to output each of those sentences to a different outlet. The columns are separated by tabs.

I am using the [text] object to open the text file and have tried to use the [zl] object to break it up, but I haven’t found a way to split it at the tabs. I only seem to be able to slice off a fixed number of words.

Is there something I’m overlooking? Is there a way to split up a line in a .txt file by tab-separated column?

Thank you!

#62534
Mar 22, 2012 at 10:38am

To give an example of two sentences -

SENTENCE01 THE PRESIDENT YELLED, LOUD HELLO THIS IS A SENTENCE
SENTENCE02
ME TEXT

#225851
Mar 22, 2012 at 2:56pm

check out the [regexp] object

#225852
Mar 22, 2012 at 10:35pm

can you just re-create the original file with only one sentence per line? that is, do you need them to be in separate columns?

[regexp] is a great idea for many many things, but…would it recognize tabs as tabs, or just as whitespace? maybe you could check if there were multiple spaces in a row. I am far from an expert on that object so I could be way off.

Another way is to use [atoi] and see when you get a tab character, which should be the number 9, I think.
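
To make the character-code idea concrete, here is a minimal sketch in Python rather than a Max patch. It assumes the line arrives as a list of character codes (roughly what [atoi] emits) and starts a new column whenever it sees 9. The function name and the sample line, including where its tabs sit, are made up for illustration.

```python
TAB = 9  # ASCII code of the tab character


def split_on_tab_codes(codes):
    """Split a list of character codes into column strings at every tab (9)."""
    columns = []
    current = []
    for code in codes:
        if code == TAB:
            # A tab closes the current column and opens a new one.
            columns.append("".join(chr(c) for c in current))
            current = []
        else:
            current.append(code)
    # Whatever is left after the last tab is the final column.
    columns.append("".join(chr(c) for c in current))
    return columns


if __name__ == "__main__":
    # Hypothetical row; the tab positions are an assumption.
    line = "SENTENCE01\tTHE PRESIDENT YELLED, LOUD\tHELLO THIS IS A SENTENCE"
    codes = [ord(ch) for ch in line]
    for column in split_on_tab_codes(codes):
        print(column)
```

Note that this only helps if the tab actually survives up to the point where the codes are generated.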

#225853
Mar 23, 2012 at 7:31am

Hi,
Thanks for both suggestions. I tried to look into the [regexp] object yesterday, including the tutorial patch with usage examples, but I have no idea where to start with it, since I’ve never worked with regular expressions before. Could anyone give me some pointers, maybe an example patch?

Seejay: I don’t think so. It’s an export from a larger Excel file that someone else makes (copy/paste), which consists of three columns. Manually changing the layout each time would slow down the process too much.

I will look into the atoi object tonight. Thank you for the suggestion!

#225854
Mar 23, 2012 at 11:41am

how’s this for you

Essentially, you have a character class of everything that’s not a tab, [^\t], and you group a run of these, ([^\t]*). Each run stops at the next tab (\t), so each group captures one column, and that’s what goes into each substring. Then take the substrings output from the [regexp] object.
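
For anyone who wants to try the same pattern outside Max, here is a rough Python sketch. It is not the patch from this post, only the "run of non-tab characters" idea. It uses `+` instead of `*`, because Python's `findall` would otherwise also return empty matches; the trade-off is that empty columns are skipped. The function name and sample row are hypothetical.

```python
import re

# One or more characters that are not a tab, i.e. the contents of one column.
NOT_TAB_RUN = re.compile(r"[^\t]+")


def split_columns(line):
    """Return the non-empty tab-separated columns of a single line."""
    return NOT_TAB_RUN.findall(line)


if __name__ == "__main__":
    # Hypothetical row; the tab positions are an assumption.
    row = "SENTENCE01\tTHE PRESIDENT YELLED, LOUD\tHELLO THIS IS A SENTENCE"
    for index, column in enumerate(split_columns(row)):
        print(index, column)
```

Spaces inside a sentence are left alone, since only the tab is excluded from the character class.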

(Pasted Max patch)

#225855
Mar 24, 2012 at 9:34am

That works like a charm, thanks big_pause!

There is one small inconvenience with it, but I think that has more to do with the way the [text] object processes its input:

If you import a line from a text document with tabs in it, the tabs get converted to normal spaces. Your method works perfectly if you put “” at the beginning of the sentence in the text document, but that means that you have to copy the text from Excel and then manually add the quotation marks. Is there a way to import text without having to use the quotation mark and still keep the tabs in there after import?

#225856
Mar 24, 2012 at 4:48pm

Hmm, never realised that, you learn something new every day.

Unless anyone has some bright ideas, I think it’s roll-your-own text file reader time in js or Java. Seems a bit shit though.
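
As a rough idea of what such a hand-rolled reader has to do, here is a minimal sketch. It is written in Python as a stand-in, not in the js or Java that Max would actually host, and the function name is made up. It reads the file directly, so the tabs are never converted to spaces, and splits every non-empty line at the tab character.

```python
def read_tab_columns(path):
    """Read a text file and return one list of columns per non-empty line."""
    rows = []
    # newline="" keeps line endings untouched so they can be stripped explicitly.
    with open(path, encoding="utf-8", newline="") as handle:
        for raw in handle:
            line = raw.rstrip("\r\n")
            if line:
                # Split only at tabs; spaces inside a sentence are preserved.
                rows.append(line.split("\t"))
    return rows
```

Each inner list then corresponds to one line of the export, with one entry per column, which maps naturally onto sending each column to its own outlet.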

#225857
Mar 25, 2012 at 8:13am

Yeah, and I still have to try this out, but I think I could probably get away with having a first column with a quotation mark in it for every line. Small inconvenience.

Also, if I use Open Office instead of Excel, I can save as a .csv file, where it will put quotation marks around every column, so I could probably look into how to break it up by the content within quotation marks. :)
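
Breaking up by quoted content can be sketched like this, again in Python rather than in Max. It assumes the export uses a comma as the delimiter and double quotes around every column, which depends on the export settings. The function name and the sample row are hypothetical.

```python
import csv
import io


def parse_quoted_csv(text, delimiter=","):
    """Parse CSV text whose columns are wrapped in double quotes."""
    # newline="" leaves line endings for the csv module to handle.
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=delimiter, quotechar='"')
    # Skip completely empty lines.
    return [row for row in reader if row]


if __name__ == "__main__":
    sample = '"SENTENCE01","THE PRESIDENT YELLED, LOUD","HELLO THIS IS A SENTENCE"\n'
    for row in parse_quoted_csv(sample):
        for column in row:
            print(column)
```

The quotes matter here because a sentence can itself contain a comma; anything between a pair of quotes stays in one column.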

#225858
