Unicode

Data Structures

struct  t_charset_converter
 The charset_converter object. More...

Functions

t_max_err charset_convert (t_symbol *src_encoding, const char *in, long inbytes, t_symbol *dest_encoding, char **out, long *outbytes)
 A convenience function that simplifies usage by wrapping the other charset functions.
unsigned short * charset_utf8tounicode (char *s, long *outlen)
 Convert a UTF8 C-String into a 16-bit-wide-character array.
char * charset_unicodetoutf8 (unsigned short *s, long len, long *outlen)
 Convert a 16-bit-wide-character array into a UTF-8 C-string.
long charset_utf8_count (char *utf8, long *bytecount)
 Returns utf8 character count, and optionally bytecount.
char * charset_utf8_offset (char *utf8, long charoffset, long *byteoffset)
 Returns utf8 character offset (positive or negative), and optionally byte offset.

Detailed Description

Character Encodings

Currently supported character encodings

Example Usage

    t_charset_converter *conv = object_new(CLASS_NOBOX, gensym("charset_converter"), ps_macroman, ps_ms_ansi);
    char *cstr = "Text to convert";
    char *cvtbuffer = NULL; // to-be-allocated data buffer
    long cvtbuflen = 0; // length of buffer on output

    if (conv) {
        // note that it isn't necessary to send in a 0-terminated string, although we do so here
        if (object_method(conv, gensym("convert"), cstr, strlen(cstr) + 1, &cvtbuffer, &cvtbuflen) == ERR_NONE) {
            // do something with the converted buffer
            sysmem_freeptr(cvtbuffer); // free newly allocated data buffer
        }
        object_free(conv); // free converter
    }

Function Documentation

t_max_err charset_convert ( t_symbol *  src_encoding,
const char *  in,
long  inbytes,
t_symbol *  dest_encoding,
char **  out,
long *  outbytes 
)

A convenience function that simplifies usage by wrapping the other charset functions.

Parameters:
src_encoding   The name of the encoding of the input.
in             The input string.
inbytes        The number of bytes in the input string.
dest_encoding  The name of the encoding to use for the output.
out            The address of a char*, which will be allocated and filled with the string in the new encoding.
outbytes       The address of a value that will hold the length of the output in bytes upon return.
Returns:
A Max error code.
Remarks:
Remember to call sysmem_freeptr(*out) to free any allocated memory.
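
The out/outbytes convention above (the callee allocates, the caller frees) can be sketched in plain C without the Max SDK. This is only an illustration under stated assumptions: latin1_to_utf8_sketch is a hypothetical helper, ISO-8859-1 is used purely as a simple source encoding, it returns 0 or -1 rather than a t_max_err, and it uses malloc/free where the SDK uses sysmem_freeptr.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in, not an SDK function: converts ISO-8859-1 bytes to
   UTF-8 using the same out/outbytes convention as charset_convert.
   Returns 0 on success and -1 on allocation failure. */
static int latin1_to_utf8_sketch(const char *in, long inbytes,
                                 char **out, long *outbytes)
{
    unsigned char *buf;
    long n = 0;

    assert(in != NULL && inbytes >= 0 && out != NULL && outbytes != NULL);
    /* Each input byte becomes at most 2 UTF-8 bytes; +1 for a terminator. */
    buf = malloc((size_t)inbytes * 2 + 1);
    if (buf == NULL)
        return -1;

    for (long i = 0; i < inbytes; ++i) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            buf[n++] = c;                                   /* ASCII is unchanged */
        } else {
            buf[n++] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte */
            buf[n++] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation byte */
        }
    }
    buf[n] = '\0';
    *out = (char *)buf;   /* the caller owns this buffer and must free it */
    *outbytes = n;        /* byte count, not counting the terminator */
    return 0;
}
```

As with charset_convert, the caller passes the address of an uninitialized char* and releases the result afterwards (here with free, in the SDK with sysmem_freeptr).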

char* charset_unicodetoutf8 ( unsigned short *  s,
long  len,
long *  outlen 
)

Convert a 16-bit-wide-character array into a UTF-8 C-string.

Accepts either null termination, or not (len is zero in the latter case).

Parameters:
s       An array of wide (16-bit) unicode characters.
len     The length of s.
outlen  The address of a variable that receives the number of chars in the output, not counting the NULL terminator.
Returns:
A UTF8-encoded C-string.
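
To make the conversion concrete, here is a minimal standalone sketch of encoding 16-bit units as UTF-8. It is not the SDK implementation: u16_to_utf8_sketch is a hypothetical name, it always takes an explicit len, it allocates with malloc, and it assumes the input is UTF-16 (so a valid surrogate pair becomes one 4-byte sequence), which the text above does not state.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in, not the SDK code: encodes `len` 16-bit units as a
   malloc'd, NUL-terminated UTF-8 string. *outlen excludes the terminator. */
static char *u16_to_utf8_sketch(const unsigned short *s, long len, long *outlen)
{
    unsigned char *buf;
    long n = 0;

    assert(s != NULL && len >= 0);
    /* Each 16-bit unit expands to at most 3 UTF-8 bytes; +1 for the NUL. */
    buf = malloc((size_t)len * 3 + 1);
    if (buf == NULL)
        return NULL;

    for (long i = 0; i < len; ++i) {
        unsigned long cp = s[i];
        /* Combine a high surrogate with the low surrogate that follows it. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len
                && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            buf[n++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            buf[n++] = (unsigned char)(0xC0 | (cp >> 6));
            buf[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            buf[n++] = (unsigned char)(0xE0 | (cp >> 12));
            buf[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            buf[n++] = (unsigned char)(0xF0 | (cp >> 18));
            buf[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    buf[n] = '\0';
    if (outlen != NULL)
        *outlen = n;
    return (char *)buf;
}
```

The buffer is sized for the worst case up front, so the loop never reallocates; the caller frees the result.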

long charset_utf8_count ( char *  utf8,
long *  bytecount 
)

Returns utf8 character count, and optionally bytecount.

Parameters:
utf8       The UTF-8 encoded string whose characters are to be counted.
bytecount  The address of a variable to hold the byte count on return. Pass NULL if you don't require the byte count.
Returns:
The number of characters in the UTF8 string.
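
The difference between the character count and the byte count can be illustrated with a small standalone sketch. It is not the SDK implementation: utf8_count_sketch is a hypothetical name, and it assumes a NUL-terminated, well-formed UTF-8 string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in, not the SDK code: counts the characters in a
   NUL-terminated UTF-8 string by skipping continuation bytes (10xxxxxx). */
static long utf8_count_sketch(const char *utf8, long *bytecount)
{
    long chars = 0;
    long bytes = 0;

    assert(utf8 != NULL);
    for (const unsigned char *p = (const unsigned char *)utf8; *p != 0; ++p) {
        if ((*p & 0xC0) != 0x80)   /* an ASCII or lead byte starts a character */
            ++chars;
        ++bytes;
    }
    if (bytecount != NULL)         /* the byte count is optional */
        *bytecount = bytes;
    return chars;
}
```

For the string "caf\xC3\xA9" (the word "café"), this sketch returns 4 characters and reports 5 bytes, because the last character occupies two bytes.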

char* charset_utf8_offset ( char *  utf8,
long  charoffset,
long *  byteoffset 
)

Returns utf8 character offset (positive or negative), and optionally byte offset.

Parameters:
utf8        A UTF-8 encoded string.
charoffset  The char offset into the string at which to find the byte offset.
byteoffset  The address of a variable to hold the byte offset on return. Pass NULL if you don't require the byte offset.
Returns:
The character offset.
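
How a character offset maps to a byte offset can be sketched as follows. This is a hypothetical helper, not the SDK implementation: utf8_offset_sketch handles non-negative offsets only (how the SDK treats negative offsets is not described here), assumes well-formed input, and stops at the terminator if the offset runs past the end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in, not the SDK code: returns a pointer to the character
   at index `charoffset` (non-negative only) and optionally its byte offset. */
static const char *utf8_offset_sketch(const char *utf8, long charoffset,
                                      long *byteoffset)
{
    const unsigned char *p = (const unsigned char *)utf8;

    assert(utf8 != NULL && charoffset >= 0);
    while (charoffset > 0 && *p != 0) {
        ++p;                           /* step past the lead (or ASCII) byte */
        while ((*p & 0xC0) == 0x80)    /* then past its continuation bytes */
            ++p;
        --charoffset;
    }
    if (byteoffset != NULL)            /* the byte offset is optional */
        *byteoffset = (long)(p - (const unsigned char *)utf8);
    return (const char *)p;
}
```

With the string "h\xC3\xA9llo" ("héllo"), character offset 2 corresponds to byte offset 3, since the second character takes two bytes.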

unsigned short* charset_utf8tounicode ( char *  s,
long *  outlen 
)

Convert a UTF8 C-String into a 16-bit-wide-character array.

Parameters:
s       The string to be converted to unicode.
outlen  The address of a variable that receives the number of characters in the output, not counting the NULL terminator.
Returns:
A pointer to the buffer of unicode (wide) characters.
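
The decoding direction can be sketched in the same spirit. Again, this is not the SDK implementation: utf8_to_u16_sketch is a hypothetical name, it allocates with malloc, it assumes well-formed UTF-8 input, and it emits surrogate pairs for characters beyond U+FFFF, a behavior the text above does not confirm for the SDK function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in, not the SDK code: decodes well-formed UTF-8 into a
   malloc'd, 0-terminated array of 16-bit units. *outlen excludes the 0. */
static unsigned short *utf8_to_u16_sketch(const char *s, long *outlen)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned short *buf;
    long n = 0;

    assert(s != NULL);
    /* A UTF-8 string never needs more 16-bit units than it has bytes. */
    buf = malloc((strlen(s) + 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;

    while (*p != 0) {
        unsigned long cp;
        if (*p < 0x80) {                     /* 1 byte:  0xxxxxxx */
            cp = *p++;
        } else if ((*p & 0xE0) == 0xC0) {    /* 2 bytes: 110xxxxx 10xxxxxx */
            cp = (unsigned long)(p[0] & 0x1F) << 6 | (p[1] & 0x3F);
            p += 2;
        } else if ((*p & 0xF0) == 0xE0) {    /* 3 bytes: 1110xxxx + 2 more */
            cp = (unsigned long)(p[0] & 0x0F) << 12
               | (unsigned long)(p[1] & 0x3F) << 6 | (p[2] & 0x3F);
            p += 3;
        } else {                             /* 4 bytes: beyond U+FFFF */
            cp = (unsigned long)(p[0] & 0x07) << 18
               | (unsigned long)(p[1] & 0x3F) << 12
               | (unsigned long)(p[2] & 0x3F) << 6 | (p[3] & 0x3F);
            p += 4;
        }
        if (cp >= 0x10000) {                 /* split into a surrogate pair */
            cp -= 0x10000;
            buf[n++] = (unsigned short)(0xD800 | (cp >> 10));
            buf[n++] = (unsigned short)(0xDC00 | (cp & 0x3FF));
        } else {
            buf[n++] = (unsigned short)cp;
        }
    }
    buf[n] = 0;
    if (outlen != NULL)
        *outlen = n;
    return buf;
}
```

The count written to outlen is in 16-bit units, so a character outside the Basic Multilingual Plane contributes 2 to it in this sketch.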