bw_processing.indexing

Functions

_get_csv_data(datapackage, metadata_name)

Utility function to get CSV data from datapackage.

reindex(datapackage, metadata_name, data_iterable, ...) → None

Use the metadata to set the integer indices in datapackage to those used in data_iterable.

reset_index(datapackage, metadata_name) → bw_processing.datapackage.Datapackage

Reset the numerical indices in datapackage to sequential integers starting from zero.

Module Contents

bw_processing.indexing._get_csv_data(datapackage: bw_processing.datapackage.Datapackage | fsspec.AbstractFileSystem, metadata_name: str)

Utility function to get CSV data from datapackage.

Parameters:
  • datapackage (bw_processing.datapackage.Datapackage | fsspec.AbstractFileSystem) – datapackage or Filesystem. Input to the load_datapackage function.

  • metadata_name (str) – Name identifying a CSV metadata resource in datapackage.

Raises:
  • KeyError – metadata_name is not in datapackage.

  • ValueError – metadata_name is not CSV metadata.

  • KeyError – A resource referenced by the CSV metadata's valid_for entry is not in datapackage.

Returns:

  • datapackage object

  • pandas DataFrame with CSV data

  • metadata (dict) stored with dataframe

  • list of indices arrays referenced by the CSV data

  • indices of those arrays
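As a rough illustration of the lookup and validation steps described above, here is a minimal pure-Python sketch. It does not use bw_processing: the dict-based package, the "kind" markers, and the name get_csv_data_sketch are hypothetical stand-ins, and valid_for is simplified to a plain list of resource names.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical model of a datapackage: resource name -> resource dict.
# CSV metadata:  {"kind": "csv", "rows": [...], "valid_for": [resource names]}
# Indices array: {"kind": "indices", "data": [(row, col), ...]}
Package = Dict[str, Dict[str, Any]]


def get_csv_data_sketch(
    package: Package, metadata_name: str
) -> Tuple[List[Dict[str, Any]], List[List[Tuple[int, int]]], List[str]]:
    """Return CSV rows, the indices arrays they refer to, and those arrays' names."""
    if metadata_name not in package:
        raise KeyError(f"{metadata_name} is not in datapackage")
    metadata = package[metadata_name]
    if metadata.get("kind") != "csv":
        raise ValueError(f"{metadata_name} is not CSV metadata")

    arrays: List[List[Tuple[int, int]]] = []
    names: List[str] = []
    for resource_name in metadata["valid_for"]:
        # Every resource named in valid_for must exist in the package.
        if resource_name not in package:
            raise KeyError(f"{resource_name} referenced by valid_for not in datapackage")
        arrays.append(package[resource_name]["data"])
        names.append(resource_name)
    return metadata["rows"], arrays, names
```

The sketch returns the package's own lists rather than copies, which is what would let a caller update the indices arrays in place.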

bw_processing.indexing.reindex(datapackage: bw_processing.datapackage.Datapackage | fsspec.AbstractFileSystem, metadata_name: str, data_iterable: collections.abc.Iterable, fields: List[str] = None, id_field_datapackage: str = 'id', id_field_destination: str = 'id') → None

Use the metadata to set the integer indices in datapackage to those used in data_iterable.

Used in data exchange: the integer ids provided in a data package are often arbitrary and need to be mapped to the values present in your database.

Updates the datapackage in place.

Parameters:
  • datapackage (bw_processing.datapackage.Datapackage | fsspec.AbstractFileSystem) – datapackage or Filesystem. Input to the load_datapackage function.

  • metadata_name (str) – Name identifying a CSV metadata resource in datapackage.

  • data_iterable (collections.abc.Iterable) – Iterable whose items support .get() (for example, dicts).

  • fields (List[str], optional) – Optional list of fields used to match rows of the CSV metadata against objects in data_iterable.

  • id_field_datapackage (str, default 'id') – Name of the column providing the integer id in the datapackage's CSV metadata.

  • id_field_destination (str, default 'id') – Name of the field providing the integer id in the objects of data_iterable.

Raises:
  • KeyError – data_iterable is missing the id_field_destination field.

  • KeyError – metadata_name is missing the id_field_datapackage field.

  • NonUnique – Multiple objects in data_iterable match the fields in datapackage.

  • KeyError – metadata_name is not in datapackage.

  • KeyError – No object in data_iterable matches the fields in datapackage.

  • ValueError – metadata_name is not CSV metadata.

  • ValueError – The resources given for metadata_name are not present in this datapackage.

  • AttributeError – data_iterable doesn’t support field retrieval using .get().
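The matching behaviour implied by fields, id_field_destination and the errors listed above can be sketched as follows. This is a minimal, stdlib-only sketch under stated assumptions: CSV rows are plain dicts, NonUnique is a local stand-in class, and match_id and build_mapping are hypothetical helper names, not part of bw_processing.

```python
from typing import Any, Dict, Iterable, List, Mapping


class NonUnique(Exception):
    """Local stand-in for the NonUnique error listed under Raises."""


def match_id(
    row: Mapping[str, Any],
    objects: List[Any],
    fields: List[str],
    id_field_destination: str = "id",
) -> Any:
    """Return the destination id of the one object whose `fields` equal the row's."""
    # Objects only need a .get() method; anything else raises AttributeError here.
    matches = [obj for obj in objects if all(obj.get(f) == row[f] for f in fields)]
    if not matches:
        raise KeyError(f"No object in data_iterable matches {row!r}")
    if len(matches) > 1:
        raise NonUnique(f"Multiple objects in data_iterable match {row!r}")
    new_id = matches[0].get(id_field_destination)
    if new_id is None:
        raise KeyError(f"data_iterable object is missing {id_field_destination!r}")
    return new_id


def build_mapping(
    rows: List[Mapping[str, Any]],
    data_iterable: Iterable[Any],
    fields: List[str],
    id_field_datapackage: str = "id",
    id_field_destination: str = "id",
) -> Dict[Any, Any]:
    """Map each CSV row's datapackage id to the matching destination id."""
    objects = list(data_iterable)  # materialise once so generators can be reused
    return {
        # row[id_field_datapackage] raises KeyError if the CSV lacks the id column.
        row[id_field_datapackage]: match_id(row, objects, fields, id_field_destination)
        for row in rows
    }
```

Materialising data_iterable once is a choice made for the sketch: each CSV row needs a full pass over the destination objects, which a one-shot generator could not provide.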

Returns:

Datapackage instance with modified data
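Once each datapackage id has a destination id, the indices arrays have to be rewritten in place. A minimal sketch, assuming each indices array is modelled as a list of (row, col) tuples and that a hypothetical axis argument selects which position the CSV metadata applies to:

```python
from typing import Dict, List, Tuple

IndexPair = Tuple[int, int]


def remap_in_place(array: List[IndexPair], mapping: Dict[int, int], axis: int) -> None:
    """Rewrite the row (axis=0) or column (axis=1) ids of `array` in place.

    Ids missing from `mapping` are left unchanged; this is a choice made for
    the sketch, not documented bw_processing behaviour.
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 (row) or 1 (col)")
    for i, (row, col) in enumerate(array):
        # Assigning by index keeps the same list object, so callers see the change.
        if axis == 0:
            array[i] = (mapping.get(row, row), col)
        else:
            array[i] = (row, mapping.get(col, col))
```

Leaving unmapped ids untouched keeps the sketch permissive; raising an error for them instead would be an equally reasonable design.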

bw_processing.indexing.reset_index(datapackage: bw_processing.datapackage.Datapackage | fsspec.AbstractFileSystem, metadata_name: str) → bw_processing.datapackage.Datapackage

Reset the numerical indices in datapackage to sequential integers starting from zero.

Updates the datapackage in place.

Parameters:
  • datapackage (bw_processing.datapackage.Datapackage | fsspec.AbstractFileSystem) – datapackage or Filesystem. Input to the load_datapackage function.

  • metadata_name (str) – Name identifying a CSV metadata resource in datapackage.

Returns:

Datapackage instance with modified data
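A minimal sketch of the renumbering idea behind reset_index, assuming a simplified model (CSV rows as dicts, indices arrays as lists of (row, col) tuples). The ascending-order assignment of new ids, the axis argument, and the name reset_index_sketch are assumptions for illustration; the documentation above only states that the result is sequential integers starting from zero.

```python
from typing import Any, Dict, List, Tuple

IndexPair = Tuple[int, int]


def reset_index_sketch(
    rows: List[Dict[str, Any]],
    arrays: List[List[IndexPair]],
    axis: int,
    id_field: str = "id",
) -> Dict[int, int]:
    """Renumber ids to 0, 1, 2, ... in place and return the old -> new mapping.

    New ids follow ascending order of the old ids (an assumption of this sketch).
    """
    mapping = {old: new for new, old in enumerate(sorted({row[id_field] for row in rows}))}
    for row in rows:
        row[id_field] = mapping[row[id_field]]
    for array in arrays:
        for i, (r, c) in enumerate(array):
            # Only the position the metadata applies to (row or col) is renumbered.
            array[i] = (mapping.get(r, r), c) if axis == 0 else (r, mapping.get(c, c))
    return mapping
```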