bw_processing.io_parquet_helpers

This module contains helpers to serialize numpy.ndarray objects to Apache Parquet files and to deserialize them again. The numpy.ndarray objects are converted to pyarrow.Table objects as an intermediate step.

Functions

load_ndarray_from_parquet(→ numpy.ndarray)

Deserialize a numpy ndarray from a parquet file.

read_parquet_file_to_ndarray(→ numpy.ndarray)

Read an ndarray from a parquet file.

save_arr_to_parquet(→ None)

Serialize a numpy ndarray to a parquet file.

write_ndarray_to_parquet_file(file, arr, meta_object, ...)

Serialize ndarray objects to file.

Module Contents

bw_processing.io_parquet_helpers.load_ndarray_from_parquet(file: io.RawIOBase) → numpy.ndarray

Deserialize a numpy ndarray from a parquet file.

Parameters

file (io.RawIOBase or fsspec file object): File to read from.

Returns

The corresponding numpy ndarray.

bw_processing.io_parquet_helpers.read_parquet_file_to_ndarray(file: io.RawIOBase) → numpy.ndarray

Read an ndarray from a parquet file.

Parameters:

file (io.RawIOBase or fsspec file object) – File to read from.

Raises:

WrongDatatype

Returns:

The corresponding numpy ndarray.

bw_processing.io_parquet_helpers.save_arr_to_parquet(file: io.RawIOBase, arr: numpy.ndarray, meta_object: str, meta_type: str) → None

Serialize a numpy ndarray to a parquet file.

Parameters

file (RawIOBase): The file to save to.

arr (ndarray): The array object to save.

meta_object (str): “vector” or “matrix”.

meta_type (str): Type of object to serialize (see io_pyarrow_helpers.py).

bw_processing.io_parquet_helpers.write_ndarray_to_parquet_file(file: io.BufferedWriter, arr: numpy.ndarray, meta_object: str, meta_type: str)

Serialize ndarray objects to file.

Parameters

file (io.BufferedWriter): File to save to.

arr (ndarray): Array to serialize.

meta_object (str): “vector” or “matrix”.

meta_type (str): Type of object to serialize (see io_pyarrow_helpers.py).