Data Structures
struct	t_charset_converter
	The charset_converter object.

Functions
t_max_err	charset_convert (t_symbol *src_encoding, const char *in, long inbytes, t_symbol *dest_encoding, char **out, long *outbytes)
	A convenience function that simplifies usage by wrapping the other charset functions.

unsigned short *	charset_utf8tounicode (char *s, long *outlen)
	Convert a UTF-8 C-string into a 16-bit-wide-character array.

char *	charset_unicodetoutf8 (unsigned short *s, long len, long *outlen)
	Convert a 16-bit-wide-character array into a UTF-8 C-string.

long	charset_utf8_count (char *utf8, long *bytecount)
	Returns the UTF-8 character count and, optionally, the byte count.

char *	charset_utf8_offset (char *utf8, long charoffset, long *byteoffset)
	Returns the UTF-8 character offset (positive or negative) and, optionally, the byte offset.

Detailed Description

Character Encodings

Currently supported character encodings

_sym_utf_8; // utf-8, no bom
_sym_utf_16; // utf-16, big-endian
_sym_utf_16be; // utf-16, big-endian
_sym_utf_16le; // utf-16, little-endian
_sym_iso_8859_1; // iso-8859-1 (latin-1)
_sym_us_ascii; // us-ascii 7-bit
_sym_ms_ansi; // ms-ansi (microsoft code page 1252)
_sym_macroman; // mac roman
_sym_charset_converter;
_sym_convert;

Example Usage

t_charset_converter *conv = object_new(CLASS_NOBOX, gensym("charset_converter"), ps_macroman, ps_ms_ansi);
char *cstr = "Text to convert";
char *cvtbuffer = NULL; // to-be-allocated data buffer
long cvtbuflen = 0; // length of buffer on output
if (conv) {
    // note that it isn't necessary to send in a 0-terminated string, although we do so here
    if (object_method(conv, gensym("convert"), cstr, strlen(cstr) + 1, &cvtbuffer, &cvtbuflen) == ERR_NONE) {
        // do something with the converted buffer
        sysmem_freeptr(cvtbuffer); // free newly allocated data buffer
    }
    object_free(conv); // free converter
}

Function Documentation

t_max_err charset_convert	(	t_symbol *	src_encoding,
		const char *	in,
		long	inbytes,
		t_symbol *	dest_encoding,
		char **	out,
		long *	outbytes
	)

A convenience function that simplifies usage by wrapping the other charset functions.

Parameters

src_encoding	The name of the encoding of the input.
in	The input string.
inbytes	The number of bytes in the input string.
dest_encoding	The name of the encoding to use for the output.
out	The address of a char*, which will be allocated and filled with the string in the new encoding.
outbytes	The address of a variable that, upon return, holds the length of the output in bytes.

Returns: A Max error code.

Remarks: Remember to call sysmem_freeptr(*out) to free any allocated memory.
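
The allocate-and-return contract of out and outbytes can be illustrated without the Max SDK. The following is a minimal sketch, not the SDK implementation: sketch_latin1_to_utf8 is a hypothetical helper that handles only one encoding pair (ISO-8859-1 to UTF-8) and allocates with malloc, so its result is released with free rather than sysmem_freeptr.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of the Max API: converts ISO-8859-1 bytes to
   UTF-8 and hands back a newly allocated buffer through out/outbytes.
   Returns 0 on success, -1 on bad arguments or allocation failure. */
int sketch_latin1_to_utf8(const char *in, long inbytes, char **out, long *outbytes)
{
    if (!in || inbytes < 0 || !out || !outbytes)
        return -1;

    /* Each Latin-1 byte expands to at most two UTF-8 bytes. */
    char *buf = malloc((size_t)inbytes * 2 + 1);
    if (!buf)
        return -1;

    long n = 0;
    for (long i = 0; i < inbytes; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            buf[n++] = (char)c;                    /* ASCII passes through */
        } else {
            buf[n++] = (char)(0xC0 | (c >> 6));    /* lead byte */
            buf[n++] = (char)(0x80 | (c & 0x3F));  /* continuation byte */
        }
    }
    buf[n] = '\0';  /* terminator added for convenience; not counted in *outbytes */

    *out = buf;      /* the caller now owns this buffer */
    *outbytes = n;
    return 0;
}
```

As with charset_convert, the caller passes the address of an unallocated char* and is responsible for releasing the buffer once it is no longer needed.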

char* charset_unicodetoutf8	(	unsigned short *	s,
		long	len,
		long *	outlen
	)

Convert a 16-bit-wide-character array into a UTF-8 C-string.

Accepts either null termination, or not (len is zero in the latter case).

Parameters

s	An array of wide (16-bit) unicode characters.
len	The length of s.
outlen	The address of a variable to hold the number of chars in the output. The count does not include the NULL terminator.

Returns: A UTF-8-encoded C-string.

long charset_utf8_count	(	char *	utf8,
		long *	bytecount
	)

Returns the UTF-8 character count and, optionally, the byte count.

Parameters

utf8	The UTF-8 encoded string whose characters are to be counted.
bytecount	The address of a variable to hold the byte count on return. Pass NULL if you don't require the byte count.

Returns: The number of characters in the UTF8 string.
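
The difference between the character count and the byte count can be shown with a standalone sketch. sketch_utf8_count below is a hypothetical stand-in, not the SDK implementation; it assumes well-formed, null-terminated input and assumes the byte count excludes the terminator.

```c
#include <stddef.h>

/* Hypothetical stand-in for illustration; not the SDK implementation.
   Counts UTF-8 characters and, if bytecount is non-NULL, bytes as well. */
long sketch_utf8_count(const char *utf8, long *bytecount)
{
    long chars = 0;
    long bytes = 0;

    if (utf8) {
        for (const unsigned char *p = (const unsigned char *)utf8; *p; p++) {
            /* Continuation bytes look like 10xxxxxx; any other byte starts a character. */
            if ((*p & 0xC0) != 0x80)
                chars++;
            bytes++;
        }
    }
    if (bytecount)
        *bytecount = bytes;
    return chars;
}
```

For a string such as "h\xC3\xA9llo" (the word "héllo"), the two counts differ because the accented letter occupies two bytes.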

char* charset_utf8_offset	(	char *	utf8,
		long	charoffset,
		long *	byteoffset
	)

Returns the UTF-8 character offset (positive or negative) and, optionally, the byte offset.

Parameters

utf8	A UTF-8 encoded string.
charoffset	The char offset into the string at which to find the byte offset.
byteoffset	The address of a variable to hold the byte offset on return. Pass NULL if you don't require the byte offset.

Returns: The character offset.
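
How a character offset maps to a byte offset can be sketched in standalone C. sketch_utf8_offset is a hypothetical stand-in, not the SDK implementation. It assumes well-formed input, assumes the returned pointer addresses the character at the requested offset, and handles only non-negative offsets, because the reference point for negative offsets is not stated here.

```c
#include <stddef.h>

/* Hypothetical stand-in for illustration; not the SDK implementation.
   Returns a pointer to the character charoffset characters into utf8 and,
   if byteoffset is non-NULL, stores the matching byte offset.
   Negative offsets are rejected in this sketch. */
const char *sketch_utf8_offset(const char *utf8, long charoffset, long *byteoffset)
{
    const unsigned char *p = (const unsigned char *)utf8;

    if (!p || charoffset < 0)
        return NULL;

    while (charoffset > 0 && *p) {
        p++;                                /* step over the lead byte */
        while ((*p & 0xC0) == 0x80)         /* then over its continuation bytes */
            p++;
        charoffset--;
    }
    if (byteoffset)
        *byteoffset = (long)(p - (const unsigned char *)utf8);
    return (const char *)p;
}
```

An offset past the end of the string stops at the terminator in this sketch; the SDK function may behave differently.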

unsigned short* charset_utf8tounicode	(	char *	s,
		long *	outlen
	)

Convert a UTF-8 C-string into a 16-bit-wide-character array.

Parameters

s	The string to be converted to unicode.
outlen	The address of a variable to hold the number of chars in the output. The count does not include the NULL terminator.

Returns: A pointer to the buffer of unicode (wide) characters.
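
The decoding step behind such a conversion can be sketched as follows. sketch_utf8_to_wide is a hypothetical stand-in, not the SDK implementation: it decodes only 1- to 3-byte sequences (code points that fit in a single 16-bit unit), returns NULL for 4-byte sequences or broken continuation bytes, does not reject overlong forms, and allocates with malloc so the result is released with free.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for illustration; not the SDK implementation.
   Decodes a null-terminated UTF-8 string into a newly allocated,
   zero-terminated array of 16-bit units. *outlen excludes the terminator. */
unsigned short *sketch_utf8_to_wide(const char *s, long *outlen)
{
    if (!s)
        return NULL;

    const unsigned char *p = (const unsigned char *)s;
    /* The output never has more units than the input has bytes. */
    unsigned short *buf = malloc((strlen(s) + 1) * sizeof *buf);
    if (!buf)
        return NULL;

    long n = 0;
    while (*p) {
        unsigned int cp;
        int extra;  /* number of continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else { free(buf); return NULL; }  /* 4-byte or invalid lead byte */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80) { free(buf); return NULL; }
            cp = (cp << 6) | (*p++ & 0x3F);  /* append 6 payload bits */
        }
        buf[n++] = (unsigned short)cp;
    }
    buf[n] = 0;

    if (outlen)
        *outlen = n;
    return buf;
}
```

Characters outside the 16-bit range would need surrogate pairs in UTF-16, which this sketch deliberately leaves out.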
