Removing an entire line from a txt file

toni.bird

Hi everyone, I hope someone can help.
I have a txt file that looks like this:
|0150|90|REDECARD S.A|1058|01425787000104|||3550308||AV. JUSCELINO |
|0150|1254|INTEGRA FROTAS LTDA.|1058|25265787000144|||3304557||FRANCISCO |
|0150|2|EAI CLUBE AUTOMOBILISTA S/A|1058|34656383000172|||3550308||AVENIDA BRIGADEIRO |
|0150|1253|VEGAS CARD DO BRASIL CARTÕES DE CRÉDITO |
|0150|203|EMPRESA BRAS DE TECNOLOGIA E ADMIN DE CONVENIOS HAAG |
|0150|91|SERVNET ADMINISTRAÇÃO DE CARTÕES LTDA|1058|29759316000143|||3170206||RUA |
|0190|LT|Litros|
|0190|UN|Unidade|
|0200|2262|IPI F1 MASTER SINT 0W20 SN 1LT|||LT|00|27101932||27||18|0600700|

I need to remove the lines that start with |0150|.

many thanks
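For comparison outside Max: the task itself is just a prefix test on every line. This is a minimal Python sketch, not something from the thread. The function names are hypothetical, and it assumes the file is UTF-8 (encoding comes up later in the thread). It streams the file line by line. `newline=""` keeps the original CRLF/LF line endings intact.

```python
def remove_lines_with_prefix(lines, prefix="|0150|"):
    """Yield only the lines that do not start with the given prefix."""
    for line in lines:
        if not line.startswith(prefix):
            yield line


def filter_file(src_path, dst_path, prefix="|0150|", encoding="utf-8"):
    """Copy src_path to dst_path, dropping lines that start with prefix.

    newline="" disables newline translation, so CRLF/LF endings survive.
    The UTF-8 default is an assumption about the input file.
    """
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.writelines(remove_lines_with_prefix(src, prefix))


sample = [
    "|0150|90|REDECARD S.A|1058|01425787000104|||3550308||AV. JUSCELINO |\n",
    "|0190|LT|Litros|\n",
    "|0190|UN|Unidade|\n",
]
print(list(remove_lines_with_prefix(sample)))
```

Including the closing pipe in the prefix means a hypothetical `|01500|` line would not be removed.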

Source Audio

If it has to be done inside Max, and you don't want to use any external binaries like awk, grep, etc.,

regexp could filter out lines that contain 0150:
regexp (0150?)
If you must include |0150 in the search (| is ASCII 124),
then use
regexp (\\|0150?)

because | has a special meaning (alternation) in regexp and must be escaped.

Max Patch
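The escaping point is easier to see in Python's `re` module, which follows the same Perl-style rules for `|` and `?`. In a Python raw string, a single backslash is enough. The first sample line comes from the question. The `|0151|` line is hypothetical. The last check shows a side effect of the trailing `?`: it makes the final `0` optional.

```python
import re

line = "|0190|LT|Litros|"

# Unescaped "|" is alternation: "|0150" means "empty string OR 0150",
# and the empty alternative matches at position 0 of any line.
unescaped = re.search(r"|0150", line)
# Escaped "\|" is a literal pipe, so this only matches the text "|0150".
escaped = re.search(r"\|0150", line)
print(unescaped is not None, escaped is not None)  # True False

# "?" makes the preceding character optional, so "\|0150?" means
# "|015" plus an optional "0" and also matches "|0151|...".
print(re.search(r"\|0150?", "|0151|X|") is not None)  # True
```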

pdelges

I would add a ^ at the beginning of the expression; otherwise all lines that include |0150 would be rejected, instead of only those that start with |0150.

[regexp ^(\\|0150?)]
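Here is the effect of the `^` anchor, sketched in Python. The first line is shortened from the question. The second line is hypothetical: it mentions `|0150|` in the middle of the line rather than at the start.

```python
import re

lines = [
    "|0150|90|REDECARD S.A|",
    "|0450|24|refers to |0150| in the text|",  # hypothetical line
]

unanchored = re.compile(r"\|0150")
anchored = re.compile(r"^\|0150")  # ^ = start of the line

print([l for l in lines if not unanchored.search(l)])  # []
print([l for l in lines if not anchored.search(l)])    # ['|0450|24|refers to |0150| in the text|']
```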

toni.bird

Thank you all for your help. I tried regexp but was far from getting the formatting right.

cheers

toni.bird

Hi, and thanks again, it's working nicely. I'm just having issues with special characters like Ç, Ã, Õ: they get swapped for «√, …, ”. Thanks in advance.

Source Audio

I get no such substitutions in the text object, nor in the written text file.
Can you describe exactly where that happens,
which OS you use, and which app handles the text files?

Maybe the wrong encoding is set.

toni.bird

test.maxpat (Max Patch)

z1.txt (txt, 2.73 MB)

pdelges

Are you using Windows? I can't find any character substitution here (OS X) when running your patch (please use "copy compressed" in the future) with your big file (a shorter one, with only lines 13 and 14, would have been enough to illustrate your problem).

Source Audio

You uploaded a text file more than 40,000 lines long?
And no info at all?
As you can see, on my Max and my system there is no problem.

It has nothing to do with text encoding, but one mistake is passing the text through all those regexp objects.

Why these repetitions?

toni.bird

Hi, thanks for your comments, and sorry for the long text. I actually typed some info, but I'm not sure why it didn't appear in my reply, sorry for that. I am on Windows 10, Max version 8.3.1.

Max Patch

Still having the issue, but thanks to your help it's now a better patch, and compressed. Thanks for the advice.

Source Audio

I'll add that the same text file you posted here
looks different when downloaded than when one
opens it raw in a new window and copy-pastes it into a text document.

But whatever was read into the Max text object
got written back to disk without any changes.
---------

You can place all the strings into a single regexp:
regexp (\\|0400?)|(\\|0450?)|(\\|0460?)|(\\|0990?)|(\\|1001?)|(\\|1010?)|(\\|B001?)|(\\|B990?)|(\\|C001?)|(\\|C800?)|(\\|C850?)|(\\|D001?)|(\\|D100?)|(\\|D190?)|(\\|D990?)|(\\|E001?)|(\\|E100?)|(\\|E110?)|(\\|E116?)|(\\|G001?)|(\\|G990?)|(\\|H001?)|(\\|H005?)|(\\|H010?)|(\\|H990?)|(\\|K001?)|(\\|K990?)

but long text files like that take ages to dump from text.

On Mac it gets done in a few seconds using grep in the shell.

On Windows you can try findstr.
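This Python sketch builds the same kind of single alternation from a list instead of typing it by hand. It deliberately differs from the regexp above in three ways. It drops the `?`. It anchors with `^`, as suggested earlier in the thread. It also requires the closing pipe, so each code must match the whole first field. `RECORD_TYPES`, `UNWANTED`, and `keep` are illustrative names. `re.escape` is used so that any special characters in a code are treated literally.

```python
import re

# The record types from the combined regexp above.
RECORD_TYPES = [
    "0400", "0450", "0460", "0990", "1001", "1010", "B001", "B990",
    "C001", "C800", "C850", "D001", "D100", "D190", "D990", "E001",
    "E100", "E110", "E116", "G001", "G990", "H001", "H005", "H010",
    "H990", "K001", "K990",
]

# ^ anchors at the start of the line; the closing \| requires the whole
# first field to match, so "|0400|" matches but "|04001|" does not.
UNWANTED = re.compile(
    r"^\|(?:" + "|".join(re.escape(t) for t in RECORD_TYPES) + r")\|"
)


def keep(line):
    """True if the line's first field is not one of the unwanted types."""
    return UNWANTED.search(line) is None


print(keep("|0400|x|"), keep("|04001|x|"), keep("|0190|LT|Litros|"))  # False True True
```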

Source Audio

P.S.
Maybe inserting an atoi - itoa combo with set utf8 would fix that character problem.

I re-uploaded the patch; Max stripped the leading zeros
in the (\\|0400?)|(\\|0450?)|(\\|0460?)|(\\|0990?) strings while pasting them.

Max Patch

Source Audio

There is another issue: I compared the regexp output
to Mac grep, and a lot more gets removed using regexp.
Taking a close look,
the filter |1000 also wipes |100000.
You would have to specify exactly what really needs to be matched.
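The `|1000` vs `|100000` problem is a plain prefix match: nothing stops the pattern after the four digits. A hedged Python illustration follows. The sample line is made up. The fix shown is to include the closing pipe, so that the whole field has to match.

```python
import re

line = "|100000|hypothetical record|"

prefix_only = re.compile(r"^\|1000")    # also matches "|100000..."
whole_field = re.compile(r"^\|1000\|")  # requires the field to end after 1000

print(prefix_only.search(line) is not None)  # True
print(whole_field.search(line) is not None)  # False
```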

toni.bird

Hi, and thanks very much for all the support. I have noticed that the swapping only happens if I use [read] or [open] in the text object; if I copy and paste the txt file into the text object, it keeps all the characters and works well.

Source Audio

I think it has nothing to do with the text object itself, but with the text file encoding.
It downloads as ANSI, while the browser displays it as UTF-8.
Open the downloaded file in Notepad and re-save it as UTF-8.
Then the text object will have no problem reading it.
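Re-saving as UTF-8 means decoding the bytes with the old code page and encoding them again. Here is a minimal Python sketch of that step. It assumes "ANSI" here means Windows-1252, the usual code page on Western European and Portuguese Windows installs. The function names are hypothetical, and the file helper writes no BOM.

```python
def ansi_to_utf8(data: bytes, ansi_codec: str = "cp1252") -> bytes:
    """Re-encode bytes from a Windows "ANSI" code page to UTF-8.

    cp1252 is an assumption: "ANSI" means the system code page,
    which is Windows-1252 on Western European/Portuguese Windows.
    """
    return data.decode(ansi_codec).encode("utf-8")


def convert_file(src_path, dst_path, ansi_codec="cp1252"):
    """Roughly like re-saving as UTF-8 in an editor (no BOM written)."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(ansi_to_utf8(data, ansi_codec))


ansi = "CARTÕES DE CRÉDITO".encode("cp1252")
print(ansi)                # b'CART\xd5ES DE CR\xc9DITO'
print(ansi_to_utf8(ansi))  # b'CART\xc3\x95ES DE CR\xc3\x89DITO'
```

In cp1252 each accented letter is a single byte, while UTF-8 uses two bytes for it. A reader that assumes the wrong encoding therefore shows different characters.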

toni.bird

Hi, and many thanks again for all your help. I got it sorted by converting to UTF-8 with Notepad++ first.

Source Audio

I had a close look at your big text file...
Max is NOT capable of processing this efficiently.
You have single or double quotes in quite a few lines, backslashes, semicolons, commas, etc.
The Max text object cannot cope with that as-is.
Even if you removed the matching lines using the findstr command
and read the resulting text file into Max,
the output from the text object would remain troublesome,
for example this line with a single double quote ("):
|0200|376|PALHETA TECHONE 14"|||UN|00|85129000||85||18||
or this one with a backslash:
|0450|24|Pedido Ipiranga 469081 Baseado em Pedidos de venda 95779. \nPV95779|
or this one with semicolons:
|0450|25|DOCUMENTO EMITIDO POR ME OU EPP OPTANTE PELO SIMPLES NACIONAL; II - NAO PERMITE O CREDITO FISCAL DO IPI;III - VALOR APROX. DOS TRIBUTOS: R$ (20,92), (38%)|

Those will never come out of the text object the same way.
---
In the first place, when the lines read into text are run through regexp, those metacharacters
cause unwanted output. That's why I suggested removing the lines that match the
search criteria with the shell and a script, outside of Max, in the first place.
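Outside Max, one option is to skip decoding the text entirely and compare raw bytes. That way, quotes, backslashes, semicolons, and accented characters in the kept lines pass through byte for byte. Python is used here as an illustrative stand-in for the shell tools mentioned in the thread. `drop_prefixed_lines` is a hypothetical name, and the sample lines are adapted from the ones quoted above.

```python
def drop_prefixed_lines(data: bytes, prefixes: tuple) -> bytes:
    """Remove lines starting with any of the byte prefixes.

    No decoding happens, so the encoding, quotes, backslashes and
    semicolons in the kept lines are never touched.
    """
    kept = [
        line
        for line in data.splitlines(keepends=True)  # keeps CRLF/LF endings
        if not line.startswith(prefixes)
    ]
    return b"".join(kept)


original = (
    b'|0150|90|REDECARD S.A|\r\n'
    b'|0200|376|PALHETA TECHONE 14"|||UN|00|85129000||85||18||\r\n'
    b'|0450|24|Pedido Ipiranga 469081 Baseado em Pedidos de venda 95779. \\nPV95779|\r\n'
)
result = drop_prefixed_lines(original, (b"|0150|",))
print(result == original.split(b"\r\n", 1)[1])  # True
```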
-----

Depending on what you need that text for, one could suggest some workarounds...

kLSDiz

One could check the first part of each line via [fromsymbol @separator |]. Far from ideal, but a bit more "Maxy" than multi-escaped RE magic. Here's my entry for the least-efficient-algorithm competition:

Max Patch
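The same field-based idea can be sketched in Python: split on the pipe, then compare the first field against a set of unwanted codes. The field stays a string, so leading zeros are preserved. The lines are shortened from the question, and the names are hypothetical.

```python
def first_field(line: str) -> str:
    """Return the record type, e.g. "0150" for "|0150|90|...".

    The line starts with "|", so split("|") yields ["", "0150", "90", ...].
    """
    parts = line.split("|")
    return parts[1] if len(parts) > 1 else ""


UNWANTED_TYPES = {"0150"}  # compared as strings, so leading zeros are kept

lines = [
    "|0150|2|EAI CLUBE AUTOMOBILISTA S/A|",
    "|0190|UN|Unidade|",
]
print([l for l in lines if first_field(l) not in UNWANTED_TYPES])  # ['|0190|UN|Unidade|']
```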

Source Audio

That is another option, but it is as inefficient as text -> regexp: it takes almost 1 minute
to dump the text file...
Now not only 0150 but a bunch of other strings need to be matched, also with leading zeros,
like |0850|, which gets stripped to 850 if converted to int. This will not work reliably,
because any line beginning with 150 etc. would get removed, but we are looking for |0150|.
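Here is the leading-zero problem in a nutshell, shown in Python. Once the field becomes an integer, distinct codes collapse to the same value. Comparing strings, or matching the literal `|0150|` text, avoids that.

```python
codes = ["0150", "150", "00150"]

as_ints = {int(c) for c in codes}  # int() drops leading zeros
as_strings = set(codes)

print(as_ints)          # {150}
print(len(as_strings))  # 3
```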

If one really wants to fight that battle in Max,
a better option is to use filein, then dump the file through itoa, set to detect whether the text file is
UTF-8 or not and adjust itself, then run it through regexp and insert the remaining lines into the text object.
In the ASCII stream, one could filter out metacharacters if needed
(; \ , " etc.) to make the text file more Max-conformant.

But that depends on what the text file needs to be used for, and
whether the remaining lines must be left intact.
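The filein -> itoa route described above boils down to three steps: read raw bytes, guess the encoding, then optionally strip the characters Max treats specially. This rough Python sketch covers the last two steps. It does not reproduce what [itoa] does internally. It assumes that any file which is not valid UTF-8 is Windows-1252, and the function names are hypothetical. The stripping is lossy (for example, `20,92` becomes `2092`), so it only fits when the kept lines need not stay intact.

```python
def decode_guess(data: bytes) -> str:
    """Decode as UTF-8 if possible, otherwise fall back to Windows-1252.

    A heuristic: cp1252 text with accented letters is rarely valid
    UTF-8. errors="replace" covers the few byte values cp1252 leaves
    undefined.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("cp1252", errors="replace")


# Characters the thread lists as troublesome for Max messages.
MAX_META = str.maketrans("", "", ';\\,"')


def strip_max_meta(line: str) -> str:
    """Lossy: only use this if the kept lines need not stay intact."""
    return line.translate(MAX_META)


utf8_bytes = "CARTÕES".encode("utf-8")
ansi_bytes = "CARTÕES".encode("cp1252")
print(decode_guess(utf8_bytes), decode_guess(ansi_bytes))  # CARTÕES CARTÕES
print(strip_max_meta('14";a\\b,c'))  # 14abc
```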

Reading and dumping the same text file this way takes only a few seconds, compared to
text read -> dump.

Max Patch

-----------------------

But on Windows, this line sent to the shell (with the real file path) would do it much faster...

or create a small script droplet...

kLSDiz

Yep, Wrong Tool for the Task (tm) ;) Text processing is one of those areas where Max does not excel. Without resorting to external tools we also have [js], and I would probably go that way (File:readLine), but I haven't measured its efficiency.