Core API: Strings

construct.setglobalstringencoding(encoding)

Sets the encoding globally for all String/PascalString/CString/GreedyString instances.

Parameters: encoding – a string like “utf8”, or None, which means working with bytes (not unicode)
construct.String(length, encoding=None, padchar='\x00', paddir='right', trimdir='right')

Configurable, fixed-length or variable-length string field.

When parsing, pad characters are stripped from the byte string on the side(s) given by paddir, and the result is then decoded with the given encoding. Length is a constant integer or a context function. When building, the string is encoded, then padded with padchar on the side(s) given by paddir if it is too short, or trimmed on the side given by trimdir if it is too long.

Padding is controlled by padchar and paddir, and trimming by trimdir; by default, strings are padded with null bytes on the right and trimmed from the right.

If encoding is not specified, it works with bytes (not unicode strings).

Parameters:
  • length – length in bytes (not unicode characters), as integer or context function
  • encoding – encoding (e.g. “utf8”) or None for bytes
  • padchar – bytes character used to pad strings (default b"\x00")
  • paddir – direction to pad strings (one of: right, left, center)
  • trimdir – direction to trim strings (one of: right, left)
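
The pad, trim and strip rules above can be sketched in plain Python without Construct. The helpers build_fixed and parse_fixed are hypothetical names, not part of Construct's API. The sketch assumes that only the first byte of padchar is used as fill when building (as the center-padding example below suggests), that a "left" trim keeps the last length bytes, and that length is already a resolved integer rather than a context function.

```python
def build_fixed(data, length, encoding=None, padchar=b"\x00",
                paddir="right", trimdir="right"):
    """Encode, then pad or trim to exactly `length` bytes (hypothetical helper)."""
    raw = data.encode(encoding) if encoding else data
    fill = padchar[:1]  # assumption: only the first pad byte is used when building
    if len(raw) > length:
        # Too long: "right" drops bytes from the end, "left" drops them from the start.
        return raw[:length] if trimdir == "right" else raw[len(raw) - length:]
    if paddir == "right":
        return raw.ljust(length, fill)
    if paddir == "left":
        return raw.rjust(length, fill)
    return raw.center(length, fill)  # paddir == "center"


def parse_fixed(raw, length, encoding=None, padchar=b"\x00", paddir="right"):
    """Take `length` bytes, strip padding on the paddir side(s), then decode."""
    raw = raw[:length]
    # bytes.strip/rstrip/lstrip treat padchar as a set of byte values,
    # which is why b"XYZabcXYZY" strips down to b"abc".
    if paddir == "right":
        raw = raw.rstrip(padchar)
    elif paddir == "left":
        raw = raw.lstrip(padchar)
    else:  # "center": strip both ends
        raw = raw.strip(padchar)
    return raw.decode(encoding) if encoding else raw


print(build_fixed(b"abc", 10, padchar=b"XYZ", paddir="center"))  # b'XXXabcXXXX'
print(parse_fixed(b"XYZabcXYZY", 10, padchar=b"XYZ", paddir="center"))  # b'abc'
```

One consequence of stripping on parse: data that genuinely ends in the pad byte (for example binary content with a trailing b"\x00") cannot be told apart from padding, so those bytes are removed as well.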

Example:

>>> d = String(10)
>>> d.build(b"hello")
b'hello\x00\x00\x00\x00\x00'
>>> d.parse(_)
b'hello'
>>> d.sizeof()
10

>>> d = String(10, encoding="utf8")
>>> d.build(u"Афон")
b'\xd0\x90\xd1\x84\xd0\xbe\xd0\xbd\x00\x00'
>>> d.parse(_)
u'Афон'

>>> d = String(10, padchar=b"XYZ", paddir="center")
>>> d.build(b"abc")
b'XXXabcXXXX'
>>> d.parse(b"XYZabcXYZY")
b'abc'

>>> d = String(10, trimdir="right")
>>> d.build(b"12345678901234567890")
b'1234567890'
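
Because length counts bytes and trimming happens after encoding, a trimmed multi-byte string can end in the middle of a character. This stdlib-only check (the helper utf8_decodable is hypothetical, for illustration only) applies byte-level cuts to the UTF-8 encoding of "Афон" from the earlier example, as a right trim to 4 or 5 bytes would.

```python
def utf8_decodable(raw):
    """Return True if `raw` is complete, valid UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf8")
        return True
    except UnicodeDecodeError:
        return False


encoded = "Афон".encode("utf8")     # 4 characters, 8 bytes
print(len("Афон"), len(encoded))     # 4 8
print(utf8_decodable(encoded[:4]))   # True: the cut falls between characters
print(utf8_decodable(encoded[:5]))   # False: the cut splits the third character
```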
construct.PascalString(lengthfield, encoding=None)

Length-prefixed string. The length field can be variable length (such as VarInt) or fixed length (such as Int64ul). VarInt is recommended for new designs. Stored length is in bytes, not characters.

Parameters:
  • lengthfield – a field used to parse and build the length (e.g. VarInt, Int64ul)
  • encoding – encoding (e.g. “utf8”), or None for bytes

Example:

>>> d = PascalString(VarInt, encoding="utf8")
>>> d.build(u"Афон")
b'\x08\xd0\x90\xd1\x84\xd0\xbe\xd0\xbd'
>>> d.parse(_)
u'Афон'
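
The length-prefix layout in this example can be reproduced with a small stdlib sketch. It assumes VarInt is the protobuf-style unsigned LEB128 encoding (7 data bits per byte, high bit set on every byte except the last); the function names are hypothetical and not Construct's API.

```python
import io


def varint_encode(n):
    """Unsigned LEB128: 7 data bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def varint_decode(stream):
    result = shift = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("stream ended inside a VarInt")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:  # high bit clear: this was the last byte
            return result
        shift += 7


def pascal_build(data, encoding=None):
    raw = data.encode(encoding) if encoding else data
    return varint_encode(len(raw)) + raw  # the prefix counts bytes, not characters


def pascal_parse(stream, encoding=None):
    size = varint_decode(stream)
    raw = stream.read(size)
    if len(raw) != size:
        raise EOFError("stream ended before the declared length")
    return raw.decode(encoding) if encoding else raw


blob = pascal_build("Афон", encoding="utf8")
print(blob)  # b'\x08\xd0\x90\xd1\x84\xd0\xbe\xd0\xbd'
print(pascal_parse(io.BytesIO(blob), encoding="utf8"))  # Афон
```

The prefix is 8 for a 4-character string, matching the note above that the stored length is in bytes. Lengths up to 127 fit in a single VarInt byte, which is one reason a VarInt prefix is compact for short strings.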
construct.CString(terminators='\x00', encoding=None)

String ending in a terminator byte.

By default, the terminator is the \x00 byte. The terminators argument can be a longer bytes string, in which case any one of its bytes ends the string when parsing. The first terminator byte is used when building.

Parameters:
  • terminators – sequence of valid terminators, first is used when building, all are used when parsing
  • encoding – encoding (e.g. “utf8”), or None for bytes
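
The terminator rules can be sketched with a byte-by-byte scan in plain Python. The names cstring_build and cstring_parse are hypothetical; the sketch assumes the terminator is consumed but not returned, and raising EOFError when no terminator is found is this sketch's own choice, not necessarily Construct's error type.

```python
import io


def cstring_build(data, terminators=b"\x00", encoding=None):
    raw = data.encode(encoding) if encoding else data
    return raw + terminators[:1]  # only the first terminator byte is written


def cstring_parse(stream, terminators=b"\x00", encoding=None):
    out = bytearray()
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("stream ended before a terminator")  # sketch's choice
        if b in terminators:  # any one of the terminator bytes ends the string
            break
        out += b
    raw = bytes(out)
    return raw.decode(encoding) if encoding else raw


stream = io.BytesIO(b"line one\nline two\x00")
print(cstring_parse(stream, terminators=b"\x00\n"))  # b'line one'
print(cstring_parse(stream, terminators=b"\x00\n"))  # b'line two'
```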

Warning

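Do not use encodings with multi-byte code units, such as UTF16 or UTF32, with CString: their encoded characters can contain \x00 bytes, which are mistaken for the terminator when parsing.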
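
A short demonstration with Python's own codecs (not Construct) of what goes wrong: in UTF-16-LE every ASCII character is followed by a zero byte, so a scan for a single \x00 terminator stops after the first byte and leaves a fragment that is not valid UTF-16.

```python
encoded = "AB".encode("utf-16-le")
print(encoded)  # b'A\x00B\x00'

# A single-byte terminator scan stops at the first zero byte.
cut = encoded.index(b"\x00")
fragment = encoded[:cut]
print(fragment)  # b'A'

try:
    fragment.decode("utf-16-le")  # one byte is only half of a UTF-16 code unit
except UnicodeDecodeError:
    print("fragment is not valid UTF-16")
```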

Example:

>>> d = CString(encoding="utf8")
>>> d.build(u"Афон")
b'\xd0\x90\xd1\x84\xd0\xbe\xd0\xbd\x00'
>>> d.parse(_)
u'Афон'
construct.GreedyString(encoding=None)

String that, when parsing, reads the rest of the stream until EOF and, when building, writes the given string as is (encoded first, if an encoding is set). If no encoding is specified, this is essentially GreedyBytes.

Parameters: encoding – encoding (e.g. “utf8”), or None for bytes

See also

Analogous to GreedyBytes, and identical to it when no encoding is used.

Example:

>>> d = GreedyString(encoding="utf8")
>>> d.build(u"Афон")
b'\xd0\x90\xd1\x84\xd0\xbe\xd0\xbd'
>>> d.parse(_)
u'Афон'
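
A minimal stdlib sketch of the same behaviour, using io.BytesIO as the stream; greedy_build and greedy_parse are hypothetical names, not Construct's API.

```python
import io


def greedy_build(data, encoding=None):
    # Written as is, after encoding when an encoding is given.
    return data.encode(encoding) if encoding else data


def greedy_parse(stream, encoding=None):
    raw = stream.read()  # consume everything up to EOF
    return raw.decode(encoding) if encoding else raw


stream = io.BytesIO(greedy_build("Афон", encoding="utf8"))
print(greedy_parse(stream, encoding="utf8"))  # Афон
print(stream.read())  # b'': nothing is left for any field that follows
```

Since it consumes the stream to EOF, a greedy string in practice only makes sense as the last field of a structure.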