OK, here’s one way of converting UTF-8 sequences to their Unicode values.
The “help” file has examples of the lowest and highest Unicode characters for single-byte, double-byte, and triple-byte UTF-8 sequences. (Well, close to the extreme values: the absolute extremes are mostly non-printable characters, or not even used by Unicode.) I’ve checked the calculated values against the values given in the Character View palette from the Input menu on Mac OS.
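For anyone who wants to check the arithmetic outside Max, here is a rough Python sketch of the same kind of conversion. It is not a transcription of the patch, just the standard UTF-8 bit layout; the function name utf8_to_codepoint is made up for this example, and it makes no attempt to reject overlong encodings.

```python
def utf8_to_codepoint(seq):
    """Decode one UTF-8 sequence (a list of byte values 0-255) to a code point.

    Hypothetical helper for illustration; overlong forms are not rejected.
    """
    lead = seq[0]
    if lead < 0x80:              # 0xxxxxxx: single byte, value is the byte itself
        length, value = 1, lead
    elif lead >> 5 == 0b110:     # 110xxxxx: lead byte of a 2-byte sequence
        length, value = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:    # 1110xxxx: lead byte of a 3-byte sequence
        length, value = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:   # 11110xxx: lead byte of a 4-byte sequence
        length, value = 4, lead & 0x07
    else:
        raise ValueError("invalid lead byte")

    if len(seq) != length:
        raise ValueError("wrong number of bytes for this lead byte")

    for byte in seq[1:]:
        if byte >> 6 != 0b10:    # continuation bytes must look like 10xxxxxx
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (byte & 0x3F)  # append the low 6 payload bits
    return value


# Euro sign: the bytes E2 82 AC should come out as U+20AC.
print(hex(utf8_to_codepoint([0xE2, 0x82, 0xAC])))
```

Each lead byte says how many bytes follow and carries the top few bits; every continuation byte contributes six more.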
I tried some 4-byte sequences, but itoa seems to be generating incorrect UTF-8 sequences for Unicode code points above U+10000. So now someone’s got to write a bug report :-(
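If it helps with the bug report, here is a small Python sketch that produces the byte sequence UTF-8 calls for, so there is something to compare against. The function name codepoint_to_utf8 is made up for this example, and it says nothing about what itoa does internally.

```python
def codepoint_to_utf8(cp):
    """Encode one Unicode code point as a list of UTF-8 byte values.

    Hypothetical helper for illustration only.
    """
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogates are not valid in UTF-8")
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return [cp]
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return [0xC0 | (cp >> 6),
                0x80 | (cp & 0x3F)]
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return [0xE0 | (cp >> 12),
                0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)]
    if cp <= 0x10FFFF:   # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return [0xF0 | (cp >> 18),
                0x80 | ((cp >> 12) & 0x3F),
                0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)]
    raise ValueError("code point out of range")


# Cross-check a 4-byte case against Python's built-in encoder.
cp = 0x1F600
print([hex(b) for b in codepoint_to_utf8(cp)])
print(codepoint_to_utf8(cp) == list(chr(cp).encode("utf-8")))
```

Anything from U+10000 up to U+10FFFF should come out as four bytes with a lead byte in the range F0 to F4.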
Hope this helps,
(PS: nice seeing you guys at the weekend.)
Jan 31, 2011 at 1:01pm #196697