Unicode

Data Structures

struct  t_charset_converter
 The charset_converter object.
 

Functions

t_max_err charset_convert (t_symbol *src_encoding, const char *in, long inbytes, t_symbol *dest_encoding, char **out, long *outbytes)
 A convenience function that simplifies usage by wrapping the other charset functions.
 
unsigned short * charset_utf8tounicode (char *s, long *outlen)
 Convert a UTF-8 C-string into a 16-bit-wide-character array.
 
char * charset_unicodetoutf8 (unsigned short *s, long len, long *outlen)
 Convert a 16-bit-wide-character array into a UTF-8 C-string.
 
long charset_utf8_count (char *utf8, long *bytecount)
 Returns the UTF-8 character count, and optionally the byte count.
 
char * charset_utf8_offset (char *utf8, long charoffset, long *byteoffset)
 Returns the UTF-8 character offset (positive or negative), and optionally the byte offset.
 

Detailed Description

Character Encodings

Currently supported character encodings

Example Usage

t_charset_converter *conv = object_new(CLASS_NOBOX, gensym("charset_converter"), ps_macroman, ps_ms_ansi);
char *cstr = "Text to convert";
char *cvtbuffer = NULL; // to-be-allocated data buffer
long cvtbuflen = 0; // length of buffer on output
if (conv) {
    // note that it isn't necessary to send in a 0-terminated string, although we do so here
    if (object_method(conv, gensym("convert"), cstr, strlen(cstr) + 1, &cvtbuffer, &cvtbuflen) == ERR_NONE) {
        // do something with the converted buffer
        sysmem_freeptr(cvtbuffer); // free newly allocated data buffer
    }
    object_free(conv); // free converter
}

Function Documentation

t_max_err charset_convert ( t_symbol *  src_encoding,
const char *  in,
long  inbytes,
t_symbol *  dest_encoding,
char **  out,
long *  outbytes 
)

A convenience function that simplifies usage by wrapping the other charset functions.

Parameters
src_encoding  The name of the encoding of the input.
in  The input string.
inbytes  The number of bytes in the input string.
dest_encoding  The name of the encoding to use for the output.
out  The address of a char*, which will be allocated and filled with the string in the new encoding.
outbytes  The address of a value that will hold the length of the output, in bytes, upon return.
Returns
A Max error code.
Remarks
Remember to call sysmem_freeptr(*out) to free any allocated memory.
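
The out/outbytes contract above (the function allocates the output buffer, the caller frees it) can be illustrated without the SDK. The following is a minimal sketch, not the SDK implementation: latin1_to_utf8_sketch is a hypothetical name, it only handles ISO 8859-1 input, it returns 0 on success as a stand-in for a Max error code, and it uses malloc/free where the real API uses sysmem memory and sysmem_freeptr().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in that mirrors the out/outbytes contract of
   charset_convert(). It converts ISO 8859-1 bytes to UTF-8 only. */
int latin1_to_utf8_sketch(const char *in, long inbytes, char **out, long *outbytes)
{
    char *buf;
    long i, n = 0;

    assert(in != NULL && out != NULL && outbytes != NULL && inbytes >= 0);

    /* Each Latin-1 byte becomes at most two UTF-8 bytes; add one for a terminator. */
    buf = malloc((size_t)inbytes * 2 + 1);
    if (!buf)
        return -1; /* stand-in for a non-zero error code */

    for (i = 0; i < inbytes; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            buf[n++] = (char)c;                   /* ASCII passes through unchanged */
        } else {
            buf[n++] = (char)(0xC0 | (c >> 6));   /* lead byte */
            buf[n++] = (char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    buf[n] = '\0';

    *out = buf;    /* the caller owns this buffer and must free it */
    *outbytes = n; /* bytes written, not counting the extra terminator */
    return 0;
}
```

As with charset_convert(), the caller is responsible for releasing the buffer once it is done with it; in this sketch that means free(), in the SDK it means sysmem_freeptr().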
char* charset_unicodetoutf8 ( unsigned short *  s,
long  len,
long *  outlen 
)

Convert a 16-bit-wide-character array into a UTF-8 C-string.

Accepts either null termination, or not (len is zero in the latter case).

Parameters
s  An array of wide (16-bit) unicode characters.
len  The length of s.
outlen  The address of a variable to hold the number of chars in the output, not counting the NULL terminator.
Returns
A UTF8-encoded C-string.
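
For readers who want to see what this kind of conversion involves, here is a minimal portable sketch of encoding 16-bit units as UTF-8. It is not the SDK implementation: utf16_to_utf8_sketch is a hypothetical name, the buffer comes from malloc rather than sysmem, surrogate pairs are assumed to be combined into one code point, and the sketch always takes an explicit length (it does not model the null-terminated calling convention described above).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for charset_unicodetoutf8(); explicit length only. */
char *utf16_to_utf8_sketch(const unsigned short *s, long len, long *outlen)
{
    char *out;
    long i, n = 0;

    assert(s != NULL && len >= 0);

    /* A 16-bit unit needs at most 3 UTF-8 bytes; add one for the terminator. */
    out = malloc((size_t)len * 3 + 1);
    if (!out)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned long cp = s[i];

        /* Combine a high/low surrogate pair into a single code point. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            i++;
        }

        if (cp < 0x80) {
            out[n++] = (char)cp;
        } else if (cp < 0x800) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xF0 | (cp >> 18));
            out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[n] = '\0';

    if (outlen)
        *outlen = n; /* bytes written, not counting the terminator */
    return out;
}
```

The caller frees the returned buffer with free() in this sketch.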
long charset_utf8_count ( char *  utf8,
long *  bytecount 
)

Returns utf8 character count, and optionally bytecount.

Parameters
utf8  The UTF-8 encoded string whose characters are to be counted.
bytecount  The address of a variable to hold the byte count on return. Pass NULL if you don't require the byte count.
Returns
The number of characters in the UTF8 string.
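
The difference between the character count and the byte count can be shown with a short portable sketch. This is not the SDK implementation: utf8_count_sketch is a hypothetical name, "character" is taken to mean one code point, the input is assumed to be well-formed and null-terminated, and the byte count is assumed to exclude the terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for charset_utf8_count(). */
long utf8_count_sketch(const char *utf8, long *bytecount)
{
    long chars = 0;
    long bytes = 0;

    assert(utf8 != NULL);

    for (; utf8[bytes] != '\0'; bytes++) {
        /* Continuation bytes look like 10xxxxxx; any other byte starts a character. */
        if (((unsigned char)utf8[bytes] & 0xC0) != 0x80)
            chars++;
    }

    if (bytecount)
        *bytecount = bytes; /* optional output, as in the documented API */
    return chars;
}
```

For the string "héllo" encoded as UTF-8, the sketch reports 5 characters and 6 bytes, because "é" occupies two bytes.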
char* charset_utf8_offset ( char *  utf8,
long  charoffset,
long *  byteoffset 
)

Returns utf8 character offset (positive or negative), and optionally byte offset.

Parameters
utf8  A UTF-8 encoded string.
charoffset  The char offset into the string at which to find the byte offset.
byteoffset  The address of a variable to hold the byte offset on return. Pass NULL if you don't require the byte offset.
Returns
A pointer to the character at the requested offset within the string.
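
Mapping a character offset to a byte offset means stepping over whole multi-byte sequences. The sketch below shows the idea for non-negative offsets only. It is not the SDK implementation: utf8_offset_sketch is a hypothetical name, the input is assumed to be well-formed, an offset past the end is assumed to clamp to the terminator, and negative offsets are left out because their exact semantics are not described here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for charset_utf8_offset(); non-negative offsets only. */
const char *utf8_offset_sketch(const char *utf8, long charoffset, long *byteoffset)
{
    long i = 0;

    assert(utf8 != NULL && charoffset >= 0);

    while (charoffset > 0 && utf8[i] != '\0') {
        i++; /* step past the lead byte */
        /* then past any continuation bytes (10xxxxxx) that belong to it */
        while (((unsigned char)utf8[i] & 0xC0) == 0x80)
            i++;
        charoffset--;
    }

    if (byteoffset)
        *byteoffset = i; /* optional output, as in the documented API */
    return utf8 + i;
}
```

For "héllo", character offset 2 corresponds to byte offset 3, and the returned pointer refers to the first "l".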
unsigned short* charset_utf8tounicode ( char *  s,
long *  outlen 
)

Convert a UTF8 C-String into a 16-bit-wide-character array.

Parameters
s  The string to be converted to unicode.
outlen  The address of a variable to hold the number of chars in the output, not counting the NULL terminator.
Returns
A pointer to the buffer of unicode (wide) characters.
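
As an illustration of what decoding UTF-8 into 16-bit units involves, here is a minimal portable sketch. It is not the SDK implementation: utf8_to_utf16_sketch is a hypothetical name, the buffer comes from malloc rather than sysmem, the input is assumed to be well-formed, and code points above U+FFFF are assumed to be written as surrogate pairs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for charset_utf8tounicode(). */
unsigned short *utf8_to_utf16_sketch(const char *s, long *outlen)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned short *out;
    long n = 0;

    assert(s != NULL);

    /* Worst case is one 16-bit unit per input byte, plus the terminator. */
    out = malloc((strlen(s) + 1) * sizeof *out);
    if (!out)
        return NULL;

    while (*p) {
        unsigned long cp;
        int extra; /* number of continuation bytes expected */

        if (*p < 0x80)      { cp = *p;        extra = 0; }
        else if (*p < 0xE0) { cp = *p & 0x1F; extra = 1; }
        else if (*p < 0xF0) { cp = *p & 0x0F; extra = 2; }
        else                { cp = *p & 0x07; extra = 3; }
        p++;

        /* Fold in the payload bits of each continuation byte. */
        while (extra-- > 0 && (*p & 0xC0) == 0x80)
            cp = (cp << 6) | (*p++ & 0x3F);

        if (cp >= 0x10000) {
            /* Outside the BMP: emit a high/low surrogate pair. */
            cp -= 0x10000;
            out[n++] = (unsigned short)(0xD800 | (cp >> 10));
            out[n++] = (unsigned short)(0xDC00 | (cp & 0x3FF));
        } else {
            out[n++] = (unsigned short)cp;
        }
    }
    out[n] = 0;

    if (outlen)
        *outlen = n; /* units written, not counting the terminator */
    return out;
}
```

The caller frees the returned buffer with free() in this sketch.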
  Copyright © 2015, Cycling '74