bw_processing.unique_fields
Functions
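- as_unique_attributes(data, exclude=None, include=None, raise_error=False): Format data as a unique set of attributes and values for use in create_processed_datapackage.
- as_unique_attributes_dataframe(df, exclude=None, include=None, raise_error=False)
- greedy_set_cover(data, exclude=None, raise_error=True): Find a set of attributes that uniquely identifies each element in data.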
Module Contents
- bw_processing.unique_fields.as_unique_attributes(data, exclude=None, include=None, raise_error=False)
Format data as a unique set of attributes and values for use in create_processed_datapackage.
Each element in data must have the attribute id, and it must be unique. However, the field "id" is not used in selecting the unique set of attributes.
If no set of attributes that uniquely identifies all features is found, all fields are used. To have this case raise an error, pass raise_error=True.
    data = [ {}, ]
- Parameters:
data (iterable) – List of dictionaries with the same fields.
exclude (iterable) – Fields to exclude during search for uniqueness. id is always excluded.
include (iterable) – Fields to include when returning, even if not unique.
- Returns:
(list of field names as strings, dictionary of data ids to values for given field names)
- Raises:
InconsistentFields – Not all features provide all fields.
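The preconditions described above (every element has a unique id, and all elements share the same fields) can be checked before calling the function. The following is a minimal sketch, not the library's code: check_rows is a hypothetical helper, InconsistentFields is redefined locally as a stand-in for the exception named above, and raising ValueError for duplicate ids is an assumption, since the documentation does not say what happens in that case.

```python
class InconsistentFields(Exception):
    """Local stand-in for the exception named in the docs above."""


def check_rows(data):
    """Validate the documented preconditions on data (illustrative only)."""
    rows = list(data)
    ids = [row["id"] for row in rows]  # KeyError if a row lacks "id"
    if len(set(ids)) != len(ids):
        # Assumption: the docs do not state which error duplicate ids cause.
        raise ValueError("every 'id' must be unique")
    expected = set(rows[0]) if rows else set()
    for row in rows:
        if set(row) != expected:
            raise InconsistentFields("Not all features provide all fields")
    return rows
```

Running a check like this first surfaces malformed input with a clear message instead of a failure partway through the uniqueness search.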
- bw_processing.unique_fields.as_unique_attributes_dataframe(df, exclude=None, include=None, raise_error=False)
- bw_processing.unique_fields.greedy_set_cover(data, exclude=None, raise_error=True)
Find a set of attributes that uniquely identifies each element in data.
Feature selection is a well-known problem, and is analogous to the set cover problem, for which there is a well-known greedy heuristic.
Example:
    data = [
        {'a': 1, 'b': 2, 'c': 3},
        {'a': 2, 'b': 2, 'c': 3},
        {'a': 1, 'b': 2, 'c': 4},
    ]
    greedy_set_cover(data)
    >>> {'a', 'c'}
- Parameters:
data (iterable) – List of dictionaries with the same fields.
exclude (iterable) – Fields to exclude during search for uniqueness. id is always excluded.
- Returns:
Set of attributes (strings)
- Raises:
NonUnique – The given fields are not enough to ensure uniqueness.
Note that NonUnique is not raised if raise_error is false.
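To make the set cover analogy concrete, here is a minimal sketch of one greedy heuristic for this problem. It is not the bw_processing implementation. The universe to cover is every pair of elements that must be told apart, and each field covers the pairs on which its values differ. The function name greedy_unique_fields is hypothetical, NonUnique is redefined locally as a stand-in, and returning all candidate fields when raise_error is false is an assumption.

```python
from itertools import combinations


class NonUnique(Exception):
    """Local stand-in for the exception named in the docs above."""


def greedy_unique_fields(data, exclude=None, raise_error=True):
    """Illustrative greedy set-cover heuristic; not the bw_processing code."""
    rows = list(data)
    excluded = set(exclude or ()) | {"id"}  # "id" is always excluded
    # Sorted so that ties between fields are broken deterministically.
    fields = sorted(set(rows[0]) - excluded) if rows else []

    # Universe to cover: every pair of rows that must be told apart.
    uncovered = set(combinations(range(len(rows)), 2))
    # A field "covers" the pairs of rows on which its values differ.
    covers = {
        f: {(i, j) for i, j in uncovered if rows[i][f] != rows[j][f]}
        for f in fields
    }

    chosen = set()
    while uncovered:
        # Greedy step: take the field that separates the most remaining pairs.
        best = max(fields, key=lambda f: len(covers[f] & uncovered), default=None)
        gained = covers[best] & uncovered if best is not None else set()
        if not gained:
            if raise_error:
                raise NonUnique("fields are not enough to ensure uniqueness")
            return set(fields)  # assumed fallback: every candidate field
        chosen.add(best)
        uncovered -= gained
    return chosen


rows = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 2, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 4},
]
print(greedy_unique_fields(rows) == {"a", "c"})  # True
```

Field b never separates any pair of rows, so it is never chosen. Fields a and c each separate two pairs, and together they separate all three. Like any greedy heuristic for set cover, this approach is not guaranteed to return the smallest possible set of fields.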