Custom Serialization¶
When we communicate data between computers we first convert that data into a sequence of bytes that can be communicated across the network.
Dask can convert data to bytes using the standard solutions of Pickle and Cloudpickle. However, sometimes pickle and cloudpickle are suboptimal so Dask also supports custom serialization formats for special types. This helps Dask to be faster on common formats like NumPy and Pandas and gives power-users more control about how their objects get moved around on the network if they want to extend the system.
We include a small example and then follow with the full API documentation
describing the serialize
and deserialize
functions, which convert
objects into a msgpack header and a list of bytestrings and back.
Example¶
Here is how we special case handling raw Python bytes objects. In this case
there is no need to call pickle.dumps
on the object. The object is
already a sequnce of bytes.
def serialize_bytes(obj):
header = {} # no special metadata
frames = [obj]
return header, frames
def deserialize_bytes(header, frames):
return frames[0]
register_serialization(bytes, serialize_bytes, deserialize_bytes)
API¶
register_serialization (cls, serialize, ...) |
Register a new class for custom serialization |
serialize (x) |
Convert object to a header and list of bytestrings |
deserialize (header, frames) |
Convert serialized header and list of bytestrings back to a Python object |
-
distributed.protocol.serialize.
register_serialization
(cls, serialize, deserialize)¶ Register a new class for custom serialization
Parameters: Examples
>>> class Human(object): ... def __init__(self, name): ... self.name = name
>>> def serialize(human): ... header = {} ... frames = [human.name.encode()] ... return header, frames
>>> def deserialize(header, frames): ... return Human(frames[0].decode())
>>> register_serialization(Human, serialize, deserialize) >>> serialize(Human('Alice')) ({}, [b'Alice'])
See also
-
distributed.protocol.serialize.
serialize
(x)¶ Convert object to a header and list of bytestrings
This takes in an arbitrary Python object and returns a msgpack serializable header and a list of bytes or memoryview objects. By default this uses pickle/cloudpickle but can use special functions if they have been pre-registered.
Examples
>>> serialize(1) ({}, [b'\x80\x04\x95\x03\x00\x00\x00\x00\x00\x00\x00K\x01.'])
>>> serialize(b'123') # some special types get custom treatment ({'type': 'builtins.bytes'}, [b'123'])
>>> deserialize(*serialize(1)) 1
Returns: - header (dictionary containing any msgpack-serializable metadata)
- frames (list of bytes or memoryviews, commonly of length one)
See also
deserialize()
- Convert header and frames back to object
to_serialize()
- Mark that data in a message should be serialized
register_serialization()
- Register custom serialization functions