Files
python3-cookbook/source/c15/p15_converting_c_string_to_python.rst

67 lines
2.2 KiB
ReStructuredText
Raw Normal View History

==============================
15.15 C字符串转换为Python字符串
==============================
----------
问题
----------
2015-05-07 15:43:57 +08:00
You want to convert strings from C to Python bytes or a string object.
2015-05-07 16:05:14 +08:00
|
----------
解决方案
----------
2015-05-07 15:43:57 +08:00
For C strings represented as a pair char *, int, you must decide whether or not you
want the string presented as a raw byte string or as a Unicode string. Byte objects can
be built using Py_BuildValue() as follows:
2015-05-07 15:43:57 +08:00
char *s; /* Pointer to C string data */
int len; /* Length of data */
/* Make a bytes object */
PyObject *obj = Py_BuildValue("y#", s, len);
If you want to create a Unicode string and you know that s points to data encoded as
UTF-8, you can use the following:
PyObject *obj = Py_BuildValue("s#", s, len);
If s is encoded in some other known encoding, you can make a string using PyUni
code_Decode() as follows:
PyObject *obj = PyUnicode_Decode(s, len, "encoding", "errors");
/* Examples /*
obj = PyUnicode_Decode(s, len, "latin-1", "strict");
obj = PyUnicode_Decode(s, len, "ascii", "ignore");
If you happen to have a wide string represented as a wchar_t *, len pair, there are a
few options. First, you could use Py_BuildValue() as follows:
wchar_t *w; /* Wide character string */
int len; /* Length */
PyObject *obj = Py_BuildValue("u#", w, len);
Alternatively, you can use PyUnicode_FromWideChar():
PyObject *obj = PyUnicode_FromWideChar(w, len);
For wide character strings, no interpretation is made of the character data—it is assumed
to be raw Unicode code points which are directly converted to Python.
2015-05-07 16:05:14 +08:00
|
----------
讨论
----------
2015-05-07 15:43:57 +08:00
Conversion of strings from C to Python follow the same principles as I/O. Namely, the
data from C must be explicitly decoded into a string according to some codec. Common
encodings include ASCII, Latin-1, and UTF-8. If youre not entirely sure of the encoding
or the data is binary, youre probably best off encoding the string as bytes instead.
When making an object, Python always copies the string data you provide. If necessary,
its up to you to release the C string afterward (if required). Also, for better reliability,
you should try to create strings using both a pointer and a size rather than relying on
NULL-terminated data.