bw_processing.io_pyarrow_helpers

This module contains helpers to convert numpy.ndarray objects to and from Apache Arrow tables (pyarrow.Table).

We use pyarrow.Table objects to save data to, and retrieve data from, parquet files. A metadata section in the pyarrow.Table (and in the parquet file) records what type of data was serialized. Both specific type codes (indices, distributions) and a generic one exist.

The metadata object is a dict object that looks like this:

{"object": "vector", "type": "generic"}. object can be vector (ndim == 1) or matrix (ndim == 2); type can be:

  • indices (dtype is INDICES_DTYPE);

  • distributions (dtype is UNCERTAINTY_DTYPE);

  • generic (dtype is a standard NumPy type, e.g. float64).

Attributes

INDICES_SCHEMA

NBR_UNCERTAINTY_FIELDS

PA_UNCERTAINTY_FIELDS

UNCERTAINTY_FIELDS_NAMES

UNCERTAINTY_SCHEMA

Functions

numpy_distributions_vector_to_pyarrow_distributions_vector_table(...)

Convert a specific distributions (numpy) vector to an (arrow) table.

numpy_generic_matrix_to_pyarrow_generic_matrix_table(...)

Convert a generic (numpy) matrix to an (arrow) table.

numpy_generic_vector_to_pyarrow_generic_vector_table(...)

Convert a generic (numpy) vector to an (arrow) table.

numpy_indices_vector_to_pyarrow_indices_vector_table(...)

Convert a specific indices (numpy) vector to an (arrow) table.

pyarrow_distributions_vector_table_to_numpy_distributions_vector(...)

Convert a specific distributions (arrow) vector table to a (numpy) array.

pyarrow_generic_matrix_table_to_numpy_generic_matrix(...)

Convert a generic (arrow) matrix table to a (numpy) array.

pyarrow_generic_vector_table_to_numpy_generic_vector(...)

Convert a generic (arrow) vector table to a (numpy) array.

pyarrow_indices_vector_table_to_numpy_indices_vector(...)

Convert a specific indices (arrow) vector table to a (numpy) array.

Module Contents

bw_processing.io_pyarrow_helpers.numpy_distributions_vector_to_pyarrow_distributions_vector_table(arr: numpy.ndarray) → pyarrow.Table

Convert a specific distributions (numpy) vector to an (arrow) table.

Parameters:

arr (np.ndarray) – A numpy array that corresponds to a distributions vector, i.e. its dimension is 1 and its dtype is UNCERTAINTY_DTYPE.

See:

pyarrow_distributions_vector_table_to_numpy_distributions_vector

Returns:

The corresponding pyarrow.Table object.

bw_processing.io_pyarrow_helpers.numpy_generic_matrix_to_pyarrow_generic_matrix_table(arr: numpy.ndarray) → pyarrow.Table

Convert a generic (numpy) matrix to an (arrow) table.

Parameters:

arr (ndarray) – A numpy array that corresponds to a generic matrix, i.e. its dimension is 2.

See:

pyarrow_generic_matrix_table_to_numpy_generic_matrix.

Returns:

The corresponding pyarrow.Table object.

bw_processing.io_pyarrow_helpers.numpy_generic_vector_to_pyarrow_generic_vector_table(arr: numpy.ndarray) → pyarrow.Table

Convert a generic (numpy) vector to an (arrow) table.

Parameters:

arr (ndarray) – A numpy array that corresponds to a vector, i.e. its dimension is 1.

See:

pyarrow_generic_vector_table_to_numpy_generic_vector.

Returns:

The corresponding pyarrow.Table object.

bw_processing.io_pyarrow_helpers.numpy_indices_vector_to_pyarrow_indices_vector_table(arr: numpy.ndarray) → pyarrow.Table

Convert a specific indices (numpy) vector to an (arrow) table.

Parameters:

arr (ndarray) – A numpy array that corresponds to an indices vector, i.e. its dimension is 1 and its dtype is INDICES_DTYPE.

See:

pyarrow_indices_vector_table_to_numpy_indices_vector.

Returns:

The corresponding pyarrow.Table object.

bw_processing.io_pyarrow_helpers.pyarrow_distributions_vector_table_to_numpy_distributions_vector(table: pyarrow.Table) → numpy.ndarray

Convert a specific distributions (arrow) vector table to a (numpy) array.

Parameters:

table (pa.Table) – A pyarrow table that corresponds to a distributions vector.

See:

numpy_distributions_vector_to_pyarrow_distributions_vector_table.

Returns:

The corresponding np.ndarray object.

bw_processing.io_pyarrow_helpers.pyarrow_generic_matrix_table_to_numpy_generic_matrix(table: pyarrow.Table) → numpy.ndarray

Convert a generic (arrow) matrix table to a (numpy) array.

Parameters:

table (pa.Table) – A pyarrow table that corresponds to a generic matrix.

See:

numpy_generic_matrix_to_pyarrow_generic_matrix_table.

Returns:

The corresponding np.ndarray object.

bw_processing.io_pyarrow_helpers.pyarrow_generic_vector_table_to_numpy_generic_vector(table: pyarrow.Table) → numpy.ndarray

Convert a generic (arrow) vector table to a (numpy) array.

Parameters:

table (pa.Table) – A pyarrow table that corresponds to a vector.

See:

numpy_generic_vector_to_pyarrow_generic_vector_table.

Returns:

The corresponding np.ndarray object.

bw_processing.io_pyarrow_helpers.pyarrow_indices_vector_table_to_numpy_indices_vector(table: pyarrow.Table) → numpy.ndarray

Convert a specific indices (arrow) vector table to a (numpy) array.

Parameters:

table (pa.Table) – A pyarrow table that corresponds to an indices vector.

See:

numpy_indices_vector_to_pyarrow_indices_vector_table.

Returns:

The corresponding np.ndarray object.

bw_processing.io_pyarrow_helpers.INDICES_SCHEMA
bw_processing.io_pyarrow_helpers.NBR_UNCERTAINTY_FIELDS
bw_processing.io_pyarrow_helpers.PA_UNCERTAINTY_FIELDS
bw_processing.io_pyarrow_helpers.UNCERTAINTY_FIELDS_NAMES
bw_processing.io_pyarrow_helpers.UNCERTAINTY_SCHEMA