atoi in latin-1 character encoding rather than unicode?
Hello everyone,
I need to make a text string and rotate it. (ticker-style)
I first used the atoi object, did some zl rotate action on it, and back to itoa. It works fine. The only problem is: French! ;)
In French, and other non-English languages, we have letters that are not part of ASCII. That means that in UTF-8 (Unicode) these characters are encoded in two bytes. That makes operations on the list of integers that atoi gives us rather tricky. Is there an atoi-like object or script somewhere that deals with latin-1 encoding instead? The latin-1 encoding uses only one byte per character. It would be easier to deal with.
Another option would be to use a script to do the text handling. Maybe JavaScript or Lua! If anyone has a code snippet that does this kind of text manipulation and outputs the result as a symbol, it would be quite useful.
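For what it's worth, a minimal sketch of the script route in plain JavaScript. It rotates the text by characters rather than by bytes, so the UTF-8 byte count never comes into it. `rotateText` is a hypothetical helper name, not a Max built-in, and the sketch assumes the JavaScript engine provides `Array.from`; sending the result out of a [js] object as a symbol is left out here.

```javascript
// Rotate a string left by n characters (negative n rotates right).
// rotateText is a hypothetical helper, not part of Max.
function rotateText(text, n) {
  // Array.from splits by code point, so a precomposed "é" stays in one piece.
  var chars = Array.from(text);
  var len = chars.length;
  if (len === 0) return text;
  // Normalise negative and oversized shifts into the range 0..len-1.
  var k = ((n % len) + len) % len;
  return chars.slice(k).concat(chars.slice(0, k)).join("");
}
```

Calling `rotateText` repeatedly with a shift of 1 on its own output would give the ticker effect, one character per step.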
See the attached patch to view an example of the issue I am facing.
Best regards,
Alex
I would also like to know how to do this.
Thanks
phiol
You'll probably need to parse the stream of ints yourself. When you get output from atoi in the range 192–223 (the lead byte of a 2-byte UTF-8 sequence), you need to pull it, together with the following byte, out of the list and combine them into a single atom (either pack into a list and tosymbol, or perform an invertible mathematical mapping of your choice, e.g., 256*$i1+$i2). Then put the combined item back into the list you pass into [zl group]. Don't forget to reverse the packing process between zl and itoa.
Should work.
You ought to perform similar processing for the 3- and 4-byte combinations in UTF-8 (the original design also allowed 5- and 6-byte sequences, but the current standard stops at 4), but if you're only concerned about Western European Roman, the 2-byte case will probably get you through the night.
Latin-1 would sort of be a step backwards, n'est-ce pas?
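The packing idea above can be sketched in plain JavaScript as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be a well-formed list of UTF-8 byte values such as atoi produces, and only the 2-byte case is handled.

```javascript
// Merge each 2-byte UTF-8 sequence (lead byte 192-223) into one integer,
// using the invertible mapping 256 * lead + continuation.
function packPairs(bytes) {
  var out = [];
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    if (b >= 192 && b <= 223 && i + 1 < bytes.length) {
      out.push(256 * b + bytes[i + 1]);
      i++; // the continuation byte has been consumed
    } else {
      out.push(b); // single-byte character, passed through unchanged
    }
  }
  return out;
}

// Reverse the mapping: any value above 255 is a packed pair.
function unpackPairs(items) {
  var out = [];
  for (var i = 0; i < items.length; i++) {
    var v = items[i];
    if (v > 255) {
      out.push(Math.floor(v / 256), v % 256);
    } else {
      out.push(v);
    }
  }
  return out;
}

// Stand-in for the rotation step: rotate a list left by n items.
function rotateLeft(list, n) {
  if (list.length === 0) return [];
  var k = ((n % list.length) + list.length) % list.length;
  return list.slice(k).concat(list.slice(0, k));
}

// "café!" as UTF-8 bytes; é is the pair 195 169.
var bytes = [99, 97, 102, 195, 169, 33];
var rotated = unpackPairs(rotateLeft(packPairs(bytes), 4));
```

Rotating `bytes` directly by 4 would put 169 at the front and leave 195 at the end, which is exactly the split that garbles the accented letter. Rotating the packed list keeps the pair together.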
Hi Peter,
thanks for the suggestion
Here is an example patch of the problem I'm encountering (the problem is written in red)
I get a weird symbol change when used with jit.gl.text.
Would what you suggest help in this situation?
Thanks again :-)
Hi.
You may want to have a look at Daniele Ghisi's sy package:
https://cycling74.com/forums/introducing-sy-a-library-for-handling-symbols-in-max/
cheers
My suggestion is fiddly enough that my first choice would be to write an external in C. It can be done in Max, but the details will take more thinking than I have spare time for until well after Easter.
(But, for starters, you'll need a [zl iter 1] to break the list up into a stream [or substitute spell for atoi], then [split 0 127] to funnel off single-byte UTF-8, daisy-chained to a [split 192 223] to handle chars with a 2-byte encoding in UTF-8 and probably controlling a gate to redirect the next byte into, for instance, an expr like the one I posted previously. Then merge the result back into the main stream, which is fed into a [zl group] or something before passing it on to [zl rot]. And then you're going to have to [zl iter] through the rotated result, split up the two-byte-characters-encoded-as-a-single-int and merge those back into the stream, etc.)
I said it was fiddly. You may find Daniele's library a lot easier. (Actually, I just had a slightly easier idea, but it's still probably more fiddly than you want to know about.)
The important thing is that you can't allow multibyte chars to be split up, not by [zl rot] or by anything else. This is a fundamental warning that Apple and MS and Unix libraries have been preaching to developers for over two decades. I suppose Cycling could pass on the caveat to Max users, who mostly don't get developer memos. I've no idea where this should be documented, though.
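To make the "never split a multibyte char" rule concrete, here is a hedged sketch in plain JavaScript that groups a UTF-8 byte list into whole characters before rotating. The function names are hypothetical, the lead-byte ranges are those of UTF-8 as currently specified (up to 4 bytes), and the input is assumed to be well-formed; malformed input is not validated.

```javascript
// Number of bytes in the UTF-8 sequence that starts with this lead byte.
function sequenceLength(lead) {
  if (lead >= 240 && lead <= 247) return 4;
  if (lead >= 224 && lead <= 239) return 3;
  if (lead >= 192 && lead <= 223) return 2;
  return 1; // ASCII, or a stray byte that is passed through on its own
}

// Split a flat byte list into one sub-list per character.
function groupCharacters(bytes) {
  var groups = [];
  var i = 0;
  while (i < bytes.length) {
    var len = sequenceLength(bytes[i]);
    groups.push(bytes.slice(i, i + len));
    i += len;
  }
  return groups;
}

// Rotate left by n characters, then flatten back to a byte list.
function rotateBytesByCharacter(bytes, n) {
  var groups = groupCharacters(bytes);
  if (groups.length === 0) return [];
  var k = ((n % groups.length) + groups.length) % groups.length;
  var rotated = groups.slice(k).concat(groups.slice(0, k));
  return [].concat.apply([], rotated);
}
```

For example, the bytes for "é€a" are 195 169 226 130 172 97. Rotating by one character moves the pair 195 169 to the end as a unit, and the three bytes of "€" stay adjacent.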