bw_processing.merging

Functions

add_resource_suffix(→ dict)

Update the name, path, and group values to include the suffix. The suffix comes after the basename but before the data type suffix (e.g. indices, data).

mask_resource(→ Any)

merge_datapackages_with_mask(...)

Merge two resources using a NumPy boolean mask. Returns elements from first_dp where the mask is True, otherwise from second_dp.

update_nrows(→ dict)

write_data_to_fs(→ None)

Module Contents

bw_processing.merging.add_resource_suffix(metadata: dict, suffix: str) → dict

Update the name, path, and group values to include the suffix. The suffix comes after the basename but before the data type suffix (e.g. indices, data).

Given the suffix _foo and the metadata:

{
    "name": "sa-data-vector-from-dict.indices",
    "path": "sa-data-vector-from-dict.indices.npy",
    "group": "sa-data-vector-from-dict",
}

It will return:

{
    "name": "sa-data-vector-from-dict_foo.indices",
    "path": "sa-data-vector-from-dict_foo.indices.npy",
    "group": "sa-data-vector-from-dict_foo",
}
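The renaming rule can be sketched as follows. This is a hypothetical re-implementation (add_resource_suffix_sketch is not part of bw_processing) that assumes name and path both begin with the group label followed by a dot. It returns a new dict instead of modifying its input, which may differ from the library function.

```python
def add_resource_suffix_sketch(metadata: dict, suffix: str) -> dict:
    """Hypothetical sketch of the renaming rule; returns a new dict.

    Assumes "name" and "path" both begin with the group label followed by ".".
    """
    group = metadata["group"]
    updated = dict(metadata)  # shallow copy; the input dict is left unchanged
    for key in ("name", "path"):
        value = metadata[key]
        if not value.startswith(group + "."):
            raise ValueError(f"{key!r} does not start with the group label {group!r}")
        # Insert the suffix between the basename and the data type suffix
        updated[key] = group + suffix + value[len(group):]
    updated["group"] = group + suffix
    return updated


meta = {
    "name": "sa-data-vector-from-dict.indices",
    "path": "sa-data-vector-from-dict.indices.npy",
    "group": "sa-data-vector-from-dict",
}
print(add_resource_suffix_sketch(meta, "_foo"))
```

Applied to the metadata above, the sketch produces the same name, path, and group values as the documented example.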

bw_processing.merging.mask_resource(obj: Any, mask: numpy.ndarray) → Any

bw_processing.merging.merge_datapackages_with_mask(first_dp: bw_processing.datapackage.DatapackageBase, first_resource_group_label: str, second_dp: bw_processing.datapackage.DatapackageBase, second_resource_group_label: str, mask_array: numpy.ndarray, output_fs: fsspec.AbstractFileSystem | None = None, metadata: dict | None = None) → bw_processing.datapackage.DatapackageBase

Merge two resources using a NumPy boolean mask. Returns elements from first_dp where the mask is True, otherwise from second_dp.

The two resource arrays and the mask array must all have the same length.

Both datapackages must be static, i.e. not interfaces. This is because matrix_utils does not yet support selecting only some of the values in a resource group.

This function currently will not mask or filter JSON or CSV metadata.
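The selection rule can be illustrated with plain NumPy arrays. This is a minimal sketch, not the library's implementation: merge_group_arrays is a hypothetical helper, and it assumes a resource group is held in memory as a dict mapping array names (e.g. indices, data) to arrays. It does not read or write datapackages.

```python
import numpy as np


def merge_group_arrays(first: dict, second: dict, mask: np.ndarray) -> dict:
    """Hypothetical helper: element-wise merge of two in-memory resource groups.

    Each group is assumed to be a dict mapping array names (e.g. "indices",
    "data") to 1-D NumPy arrays of equal length.
    """
    mask = np.asarray(mask)
    if mask.dtype != bool:
        raise TypeError("mask_array must be a boolean array")
    if first.keys() != second.keys():
        raise ValueError("both resource groups must contain the same arrays")

    merged = {}
    for name in first:
        a, b = first[name], second[name]
        if not (len(a) == len(b) == len(mask)):
            raise ValueError(f"{name!r}: arrays and mask must have the same length")
        # Start from the second group, then overwrite positions where mask is True
        out = b.copy()
        out[mask] = a[mask]
        merged[name] = out
    return merged


indices_dtype = [("row", np.int32), ("col", np.int32)]
first = {
    "indices": np.array([(0, 0), (1, 1), (2, 2)], dtype=indices_dtype),
    "data": np.array([1.0, 2.0, 3.0]),
}
second = {
    "indices": np.array([(5, 5), (6, 6), (7, 7)], dtype=indices_dtype),
    "data": np.array([10.0, 20.0, 30.0]),
}
merged = merge_group_arrays(first, second, np.array([True, False, True]))
print(merged["data"])
```

Boolean index assignment (out[mask] = a[mask]) is used here rather than numpy.where; it behaves the same for plain and structured dtypes, and the result keeps the dtype of the second group's arrays.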

Parameters:
  • first_dp (bw_processing.datapackage.DatapackageBase) – The datapackage from which values are taken where mask_array is True.

  • first_resource_group_label (str) – Label of the resource group in first_dp to select values from.

  • second_dp (bw_processing.datapackage.DatapackageBase) – The datapackage from which values are taken where mask_array is False.

  • second_resource_group_label (str) – Label of the resource group in second_dp to select values from.

  • mask_array (numpy.ndarray) – Boolean NumPy array with the same length as the resource arrays.

  • output_fs (fsspec.AbstractFileSystem | None) – Filesystem to write the new datapackage to, if any.

  • metadata (dict | None) – Metadata for the new datapackage, if any.

Returns:

A Datapackage instance. Will write the resulting datapackage to output_fs if provided.

bw_processing.merging.update_nrows(resource: dict, data: Any) → dict

bw_processing.merging.write_data_to_fs(resource: dict, data: Any, fs: fsspec.AbstractFileSystem) → None