Core API: Strings

construct.setglobalstringencoding(encoding)

Sets the encoding globally for all String, PascalString, CString, and GreedyString instances.

Parameters: encoding – a string like "utf8", or None to work with raw bytes

construct.String(length, encoding=None, padchar='\x00', paddir='right', trimdir='right')

A configurable, fixed-length or variable-length string field.

When parsing, the byte string is stripped of the pad character from the padding direction and then decoded with the given encoding. Length is a constant integer or a function of the context. When building, the string is encoded and then either padded with the pad character in the padding direction or, if it is longer than the field, trimmed as bytes from the trim direction.

The padding character and direction must be specified for padding to work. The trim direction must be specified for trimming to work.

Parameters:
  • length – length in bytes (not unicode characters), as int or context function
  • encoding – encoding (e.g. "utf8") or None for bytes
  • padchar – b-string character to pad out strings (by default b"\x00")
  • paddir – direction to pad out strings (one of: right left center)
  • trimdir – direction to trim strings (one of: right left)

Example:

>>> from construct import *
>>> String(10).build(b"hello")
b'hello\x00\x00\x00\x00\x00'
>>> String(10).parse(_)
b'hello'
>>> String(10).sizeof()
10

>>> String(10, encoding="utf8").build("Афон")
b'\xd0\x90\xd1\x84\xd0\xbe\xd0\xbd\x00\x00'
>>> String(10, encoding="utf8").parse(_)
'Афон'

>>> String(10, padchar=b"XYZ", paddir="center").build(b"abc")
b'XXXabcXXXX'
>>> String(10, padchar=b"XYZ", paddir="center").parse(b"XYZabcXYZY")
b'abc'

>>> String(10, trimdir="right").build(b"12345678901234567890")
b'1234567890'
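
The pad, trim, and strip rules described above can be sketched in plain Python. This is only an illustration of the byte-level behaviour, not the library's implementation: `build_fixed` and `parse_fixed` are hypothetical helper names, a single-byte pad character is assumed, and only the right and left directions are handled.

```python
# Illustrative stand-ins only; these are NOT construct's own functions.

def build_fixed(text, length, encoding=None, padchar=b"\x00",
                paddir="right", trimdir="right"):
    """Encode, then pad or trim so the result is exactly `length` bytes."""
    data = text.encode(encoding) if encoding else text
    if len(data) > length:
        # Trimming works on bytes, so a multi-byte character may be cut.
        if trimdir == "right":
            return data[:length]
        return data[len(data) - length:]
    missing = length - len(data)
    if paddir == "right":
        return data + padchar * missing
    if paddir == "left":
        return padchar * missing + data
    raise ValueError("this sketch only handles paddir 'right' and 'left'")


def parse_fixed(data, encoding=None, padchar=b"\x00", paddir="right"):
    """Strip the pad byte from the padded side, then decode."""
    if paddir == "right":
        data = data.rstrip(padchar)
    elif paddir == "left":
        data = data.lstrip(padchar)
    else:
        raise ValueError("this sketch only handles paddir 'right' and 'left'")
    return data.decode(encoding) if encoding else data


padded = build_fixed("Афон", 10, encoding="utf8")
restored = parse_fixed(padded, encoding="utf8")
```

Because the length counts bytes, the four-character string above occupies eight bytes and leaves room for only two pad bytes.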

construct.PascalString(lengthfield, encoding=None)

A length-prefixed string.

PascalString is named after the string types of Pascal, which are length-prefixed. Lisp strings also follow this convention.

When parsing, only the string is returned; the length field does not appear in the containing dict. When building, the actual length is prepended to the encoded string. The length field can be variable-length (such as VarInt). The stored length is in bytes, not characters.

Parameters:
  • lengthfield – a field used to parse and build the length
  • encoding – encoding (e.g. "utf8") or None for bytes

Example:

>>> PascalString(VarInt, encoding="utf8").build("Афон")
b'\x08\xd0\x90\xd1\x84\xd0\xbe\xd0\xbd'
>>> PascalString(VarInt, encoding="utf8").parse(_)
'Афон'
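
To make the byte-counted length prefix concrete, here is a rough pure-Python sketch of a length-prefixed string. It assumes the length field is a protobuf-style base-128 varint (low seven bits first, high bit set on every byte except the last); all four function names are hypothetical and are not part of the library.

```python
# Illustrative stand-ins only; these are NOT construct's own functions.

def encode_varint(n):
    """Unsigned base-128 integer; the high bit marks a continuation byte."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def decode_varint(data):
    """Return (value, number of bytes consumed)."""
    value = 0
    shift = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, index + 1
        shift += 7
    raise ValueError("truncated varint")


def build_prefixed(text, encoding="utf8"):
    """Prepend the payload length in bytes, not characters."""
    payload = text.encode(encoding)
    return encode_varint(len(payload)) + payload


def parse_prefixed(data, encoding="utf8"):
    """Read the length prefix, then exactly that many payload bytes."""
    length, used = decode_varint(data)
    payload = data[used:used + length]
    if len(payload) != length:
        raise ValueError("stream ended before the string was complete")
    return payload.decode(encoding)


blob = build_prefixed("Афон")
text = parse_prefixed(blob)
```

The four Cyrillic characters encode to eight UTF-8 bytes, so the prefix is 8 rather than 4. Payloads of 128 bytes or more need a second prefix byte under this varint scheme.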

construct.CString(terminators='\x00', encoding=None)

A string ending in a terminator b-string character.

CString is similar to the strings of C.

By default, the terminator is the NULL byte (b'\x00'). The terminators argument can be a longer b-string, in which case any of its characters ends parsing. The first terminator byte is used when building.

Parameters:
  • terminators – sequence of valid terminators, first is used when building, all are used when parsing
  • encoding – encoding (e.g. "utf8") or None for bytes

Example:

>>> CString(encoding="utf8").build("Афон")
b'\xd0\x90\xd1\x84\xd0\xbe\xd0\xbd\x00'
>>> CString(encoding="utf8").parse(_)
'Афон'
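
The asymmetry between building (first terminator only) and parsing (any terminator) can be sketched in plain Python. The helper names below are hypothetical and the code is only an approximation of the described behaviour, not the library's implementation.

```python
# Illustrative stand-ins only; these are NOT construct's own functions.

def build_terminated(text, terminators=b"\x00", encoding=None):
    """Encode, then append the FIRST terminator byte."""
    data = text.encode(encoding) if encoding else text
    return data + terminators[:1]


def parse_terminated(data, terminators=b"\x00", encoding=None):
    """Stop at the first byte that matches ANY of the terminators."""
    for index, byte in enumerate(data):
        if byte in terminators:
            raw = data[:index]
            return raw.decode(encoding) if encoding else raw
    raise ValueError("no terminator found before end of data")


line = build_terminated(b"key=value", terminators=b";\n")
first = parse_terminated(b"key=value\nrest", terminators=b";\n")
```

Here `line` ends in `;` because that is the first terminator, while parsing stops at the newline because either byte is accepted.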

construct.GreedyString(encoding=None)

A string that reads the rest of the stream until EOF, and writes a given string as is. If no encoding is given, this is essentially GreedyBytes.

Parameters: encoding – encoding (e.g. "utf8") or None for bytes

See also

Analogous to GreedyBytes, and identical to it when no encoding is used.

Example:

>>> GreedyString(encoding="utf8").build("Афон")
b'\xd0\x90\xd1\x84\xd0\xbe\xd0\xbd'
>>> GreedyString(encoding="utf8").parse(_)
'Афон'