The Basics


Fields are the most fundamental unit of construction: they parse (read data from the stream and return an object) and build (take an object and write it down onto a stream). There are many kinds of fields, each working with a different type of data (numeric, boolean, strings, etc.).

Some examples of parsing:

>>> from construct import Int16ub, Int16ul
>>> Int16ub.parse(b"\x01\x02")
>>> Int16ul.parse(b"\x01\x02")

Some examples of building:

>>> from construct import Int16ub, Int16sb

Other fields like:

>>> Flag.parse(b"\x01")
>>> Enum(Byte, g=8, h=11).parse(b"\x08")
>>> Enum(Byte, g=8, h=11).build(11)
>>> Single.parse(_)

Variable-length fields

>>> VarInt.sizeof()
construct.core.SizeofError: cannot calculate size

Fields are sometimes fixed size and some composites behave differently when they are composed of those. Keep that detail in mind. Classes that cannot determine size always raise SizeofError in response. There are few classes where same instance may return an integer or raise SizeofError depending on circumstances. Array size depends on whether count of elements is constant (can be a context lambda) and subcon is fixed size (can be variable size). For example, many classes take context lambdas and SizeofError is raised if the key is missing from the context.

>>> Int16ub[2].sizeof()
>>> VarInt[1].sizeof()
construct.core.SizeofError: cannot calculate size


For those of you familiar with C, Structs are very intuitive, but here’s a short explanation for the larger audience. A Struct is a collection of ordered and usually named fields (field means an instance of Construct class), that are parsed/built in that same order. Names are used for two reasons: (1) when parsed, values are returned in a dictionary where keys are matching the names, and when build, each field gets build with a value taken from a dictionary from a matching key (2) fields parsed and built values are inserted into the context dictionary under matching names.

>>> format = Struct(
...     "signature" / Const(b"BMP"),
...     "width" / Int8ub,
...     "height" / Int8ub,
...     "pixels" / Array(this.width * this.height, Byte),
... )
>>> format.parse(b'BMP\x03\x02\x07\x08\t\x0b\x0c\r')
Container(signature=b'BMP')(width=3)(height=2)(pixels=[7, 8, 9, 11, 12, 13])

Usually members are named but there are some classes that build from nothing and return nothing on parsing, so they have no need for a name (they can stay anonymous). Duplicated names within same struct can have unknown side effects.

>>> test = Struct(
...     Const(b"XYZ"),
...     Padding(2),
...     Pass,
...     Terminated,
... )
>>> test.parse(_)

Note that this syntax works ONLY on Python 3.6 due to ordered keyword arguments:

>>> Struct(a=Byte, b=Byte, c=Byte, d=Byte)

Operator + can also be used to make Structs, and to merge them. Structs are embedded (not nested) when added.

>>> st = "count"/Byte + "items"/Byte[this.count] + Terminated
>>> st.parse(b"\x03\x01\x02\x03")
Container(count=3)(items=[1, 2, 3])


What is that Container object, anyway? Well, a Container is a regular Python dictionary. It provides pretty-printing and allows accessing items as attributes as well as keys, and also preserves insertion order. Let’s see more of those:

>>> st = Struct("float"/Single)
>>> x = st.parse(b"\x00\x00\x00\x01")
>>> x.float
>>> x["float"]
>>> repr(x)
>>> print(x)
    float = 1.401298464324817e-45

As you can see, Containers provide human-readable representation of the data when printed, which is very important. By default, it truncates byte-strings and unicode-strings and hides EnumFlags unset flags (false values). If you would like a full print, you can use these functions:

>>> setGlobalPrintFalseFlags(True)
>>> setGlobalPrintFullStrings(True)

Thanks to blapid, containers can also be searched. Structs nested within Structs return containers within containers on parsing. One can search the entire “tree” of dicts for a particular name. Regular expressions are supported.

>>> con = Container(Container(a=1,d=Container(a=2)))
>>> con.search_all("a")
[1, 2]

Nesting and embedding

Structs can be nested. Structs can contain other Structs, as well as any other constructs. Here’s how it’s done:

>>> st = Struct(
...     "inner" / Struct(
...         "data" / Bytes(4),
...     )
... )
>>> st.parse(b"1234")
>>> print(_)
    inner = Container:
        data = b'1234'

A Struct can be embedded into an enclosing Struct. This means that all fields of the embedded Struct get merged into the fields of the enclosing Struct. This is useful when you want to split a big Struct into multiple parts, and then combine them all into one Struct. If names are duplicated, consistency is not guaranteed (you should avoid that).

>>> outer = Struct(
...     Embedded(Struct(
...         "data" / Bytes(4),
...     )),
... )
>>> outer.parse(b"1234")

Embedded structs should not be named, see Embedded .


Sequences are very similar to Structs, but operate with lists rather than containers. Sequences are less commonly used than Structs, but are very handy in certain situations. Since a list is returned in place of an attribute container, the names of the sub-constructs are not important. Two constructs with the same name will not override or replace each other. Names are used for the purposes of context dict.

>>> seq = Sequence(
...     Int16ub,
...     CString("utf8"),
...     GreedyBytes,
... )

Operator >> can also be used to make Sequences, or to merge them (but this syntax is not recommended).

>>> seq = Int16ub >> CString("utf8") >> GreedyBytes
>>> seq.parse(b"\x00\x80lalalaland\x00\x00\x00\x00\x00")
[128, 'lalalaland', b'\x00\x00\x00\x00']


Repeaters, as their name suggests, repeat a given unit for a specified number of times. At this point, we’ll only cover static repeaters where count is a constant integer. Meta-repeaters take values at parse/build time from the context and they will be covered in the meta-constructs tutorial. Arrays and GreedyRanges differ from Sequences in that they are homogenous, they process elements of same kind. We have three kinds of repeaters.

Arrays have a fixed constant count of elements. Operator [] is used instead of calling the Array class (and is recommended syntax).

>>> d = Array(10, Byte) or Byte[10]
>>> d.parse(b"1234567890")
[49, 50, 51, 52, 53, 54, 55, 56, 57, 48]

GreedyRange attempts to parse until EOF or subcon fails to parse correctly. Either way, when GreedyRange encounters either failure it seeks the stream back to a position after last successful subcon parsing. This means the stream must be seekable/tellable (doesnt work inside Bitwise).

>>> d = GreedyRange(Byte)
>>> d.parse(b"dsadhsaui")
[100, 115, 97, 100, 104, 115, 97, 117, 105]

RepeatUntil is different than the others. Each element is tested by a lambda predicate. The predicate signals when a given element is the terminal element. The repeater inserts all previous items along with the terminal one, and returns just the same.

Note that all elements accumulated during parsing are provided as additional lambda parameter (second in order).

>>> d = RepeatUntil(lambda obj,lst,ctx: obj > 10, Byte)
>>> d.parse(b"\x01\x05\x08\xff\x01\x02\x03")
[1, 5, 8, 255]
>>> d = RepeatUntil(lambda x,lst,ctx: lst[-2:] == [0,0], Byte)
>>> d.parse(b"\x01\x00\x00\xff")
[1, 0, 0]

Processing on-the-fly

Data can be parsed and processed before further items get parsed. Hooks can be attached by using * operator.

Repeater classes like GreedyRange support indexing feature, which inserts incremental numbers into the context under _index key, in case you want to enumerate the objects. If you dont want to process further data, just raise CancelParsing from within the hook, and the parse method will exit clean.

def printobj(obj, ctx):
    if ctx._._index+1 >= 3:
        raise CancelParsing
st = Struct(
    "first" / Byte * printobj,
    "second" / Byte,
d = GreedyRange(st * printobj)

If you want to process gigabyte-sized data, then GreedyRange has an option to discard each element after it was parsed (and processed by the hook). Otherwise you would end up consuming gigabytes of RAM, because GreedyRange normally accumulates all parsed objects and returns them in a list.

d = GreedyRange(Struct(...) * printobj, discard=True)