This sub-package provides the capability to compress and decompress data using the block specification.
Because the LZ4 block format doesn’t define a container format, the Python bindings will by default insert the original data size as an integer at the start of the compressed payload. However, it is possible to disable this by passing store_size=False to compress(), and you may wish to do so for compatibility with other language bindings, such as the Java bindings.
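With store_size=True (the default), the compressed payload is a size header followed by a raw LZ4 block. The sketch below assumes that the header is a 4-byte little-endian unsigned integer, which is the layout used by current python-lz4 releases (verify this against the version you use). It shows how to strip the header, for example to hand a raw block to a decoder that does not expect one, and how to add it back. The helper names split_size_header and add_size_header are hypothetical and are not part of lz4.block.

```python
import struct

# Assumption: python-lz4 stores the uncompressed size as a 4-byte
# little-endian unsigned integer in front of the raw LZ4 block.
_HEADER = struct.Struct("<I")


def split_size_header(payload: bytes) -> tuple[int, bytes]:
    """Hypothetical helper: return (uncompressed_size, raw_block)."""
    if len(payload) < _HEADER.size:
        raise ValueError("payload too short to contain a size header")
    (size,) = _HEADER.unpack_from(payload, 0)
    return size, payload[_HEADER.size:]


def add_size_header(raw_block: bytes, uncompressed_size: int) -> bytes:
    """Hypothetical helper: prepend the size header to a raw LZ4 block."""
    return _HEADER.pack(uncompressed_size) + raw_block
```

Under that assumption, the raw block returned by split_size_header(lz4.block.compress(data)) can be decompressed with lz4.block.decompress(raw_block, uncompressed_size=size).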
Using the lz4 block format bindings is straightforward:
>>> import lz4.block
>>> import os
>>> input_data = 20 * 128 * os.urandom(1024)  # 1 KiB of random bytes repeated 2560 times (2.5 MiB)
>>> compressed_data = lz4.block.compress(input_data)
>>> output_data = lz4.block.decompress(compressed_data)
>>> input_data == output_data
True
In this simple example, the size of the uncompressed data is stored in the compressed data, and decompress() uses it to allocate a buffer of the correct size. For compatibility with the Java bindings, you may instead want to omit the stored size. The example below shows how to use the block format without storing the size of the uncompressed data; the size must then be passed to decompress() through the uncompressed_size argument.
>>> import lz4.block
>>> data = b'0' * 255
>>> compressed = lz4.block.compress(data, store_size=False)
>>> decompressed = lz4.block.decompress(compressed, uncompressed_size=255)
>>> decompressed == data
True
The uncompressed_size argument specifies an upper bound on the size of the uncompressed data rather than its exact value, so the following example also works:
>>> import lz4.block
>>> data = b'0' * 255
>>> compressed = lz4.block.compress(data, store_size=False)
>>> decompressed = lz4.block.decompress(compressed, uncompressed_size=2048)
>>> decompressed == data
True
A common situation is not knowing the size of the uncompressed data at decompression time. The following example illustrates a strategy that can be used in this case.
>>> import lz4.block
>>> data = b'0' * 2048
>>> compressed = lz4.block.compress(data, store_size=False)
>>> usize = 255
>>> max_size = 4096
>>> while True:
...     try:
...         decompressed = lz4.block.decompress(compressed, uncompressed_size=usize)
...         break
...     except lz4.block.LZ4BlockError:
...         usize *= 2
...         if usize > max_size:
...             print('Error: data too large or corrupt')
...             break
...
>>> decompressed == data
True
In this example we catch the lz4.block.LZ4BlockError exception. This exception is raised if the LZ4 library call fails, which can be caused either by the buffer used to store the uncompressed data (as set by usize) being too small, or by the compressed input being invalid. It is not possible to distinguish the two cases, which is why we set an absolute upper bound (max_size) on the memory that can be allocated for the uncompressed data. Without this precaution, code passed invalid compressed data would keep trying to allocate larger and larger buffers for decompression until the system ran out of memory.
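The retry loop above can be packaged as a reusable function. This is a sketch, not part of lz4.block: the function name decompress_with_growing_buffer, its parameters, and the choice to raise ValueError instead of printing are assumptions. Passing in the decompressor and its exception type keeps the sketch independent of the library, and raising on failure means a caller can never go on to use an unset result.

```python
from typing import Callable, Type


def decompress_with_growing_buffer(
    compressed: bytes,
    decompress: Callable[..., bytes],
    error: Type[Exception],
    initial_size: int = 255,
    max_size: int = 4096,
) -> bytes:
    """Hypothetical helper: retry with a doubling buffer up to max_size.

    `decompress` is expected to behave like lz4.block.decompress: it takes
    an `uncompressed_size` keyword and raises `error` when the call fails.
    """
    usize = initial_size
    while usize <= max_size:
        try:
            return decompress(compressed, uncompressed_size=usize)
        except error:
            # Buffer too small or data corrupt: the two cannot be told
            # apart, so try a larger buffer until the hard limit is hit.
            usize *= 2
    raise ValueError(f"data larger than {max_size} bytes or corrupt")
```

With the library, a call might look like decompress_with_growing_buffer(compressed, lz4.block.decompress, lz4.block.LZ4BlockError, max_size=64 * 1024 * 1024), with max_size chosen to suit the memory you are willing to allocate.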
lz4.block.compress(source, mode='default', acceleration=1, compression=0, return_bytearray=False)
Compress source, returning the compressed data as a bytes object (or a bytearray, see return_bytearray). Raises an exception if any error occurs.
Parameters:
source (str, bytes or buffer-compatible object) – Data to compress.
mode (str) – If 'default' or unspecified, use the default LZ4 compression mode. Set to 'fast' to use the fast LZ4 compression mode at the expense of compression ratio. Set to 'high_compression' to use the LZ4 high-compression mode at the expense of speed.
acceleration (int) – When mode is set to 'fast', this argument specifies the acceleration. The larger the acceleration, the faster the compression, but the lower the compression ratio. A value of 1 is equivalent to the default mode.
compression (int) – When mode is set to 'high_compression', this argument specifies the compression level. Valid values are between 1 and 12. Values between 4 and 9 are recommended, and 9 is the default.
store_size (bool) – If True (the default), the size of the uncompressed data is stored at the start of the compressed block.
return_bytearray (bool) – If False (the default), the function returns a bytes object. If True, the function returns a bytearray object.
dict (str, bytes or buffer-compatible object) – If specified, perform compression using this initial dictionary.
Returns: Compressed data.
Return type: bytes or bytearray
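Each compression mode pairs with at most one tuning argument: acceleration for 'fast' and compression for 'high_compression'. The hypothetical helper below (not part of lz4.block) captures that pairing and checks the documented ranges before the call. Treating any acceleration below 1 as an error is a choice made for this sketch, not documented library behavior.

```python
from typing import Optional


def block_compress_kwargs(mode: str = "default", level: Optional[int] = None) -> dict:
    """Hypothetical helper: build keyword arguments for lz4.block.compress.

    `level` maps to `acceleration` in 'fast' mode and to `compression`
    in 'high_compression' mode; 'default' mode takes no level.
    """
    if mode == "default":
        if level is not None:
            raise ValueError("'default' mode takes no level")
        return {"mode": "default"}
    if mode == "fast":
        accel = 1 if level is None else level
        if accel < 1:
            raise ValueError("acceleration must be >= 1")
        return {"mode": "fast", "acceleration": accel}
    if mode == "high_compression":
        comp = 9 if level is None else level  # 9 is the documented default
        if not 1 <= comp <= 12:
            raise ValueError("compression must be between 1 and 12")
        return {"mode": "high_compression", "compression": comp}
    raise ValueError(f"unknown mode: {mode!r}")
```

A call would then read lz4.block.compress(data, **block_compress_kwargs('high_compression', 9)).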
lz4.block.decompress(source, uncompressed_size=-1, return_bytearray=False)
Decompress source, returning the uncompressed data as a bytes object (or a bytearray, see return_bytearray). Raises an exception if any error occurs.
Parameters:
source (str, bytes or buffer-compatible object) – Data to decompress.
uncompressed_size (int) – If not specified or negative, the uncompressed data size is read from the start of the source block. If specified, the whole of source is treated as compressed data (with no size header), and uncompressed_size is taken as the maximum possible size of the buffer used to hold the uncompressed data, so the returned data may be shorter than uncompressed_size. If uncompressed_size is too small, LZ4BlockError is raised; by catching LZ4BlockError it is possible to increase uncompressed_size and try again.
return_bytearray (bool) – If False (the default), the function returns a bytes object. If True, the function returns a bytearray object.
dict (str, bytes or buffer-compatible object) – If specified, perform decompression using this initial dictionary.
Returns: Decompressed data.
Return type: bytes or bytearray
Raises: LZ4BlockError – raised if the call to the LZ4 library fails. This can be caused by uncompressed_size being too small, or by invalid compressed data.