bw_processing.unique_fields
===========================

.. py:module:: bw_processing.unique_fields

Functions
---------

.. autoapisummary::

   bw_processing.unique_fields.as_unique_attributes
   bw_processing.unique_fields.as_unique_attributes_dataframe
   bw_processing.unique_fields.greedy_set_cover


Module Contents
---------------

.. py:function:: as_unique_attributes(data, exclude=None, include=None, raise_error=False)

   Format ``data`` as a unique set of attributes and values for use in ``create_processed_datapackage``.

   Each element in ``data`` must have the attribute ``id``, and it must be unique. However, the field ``id`` is not used in selecting the unique set of attributes.

   If no set of attributes that uniquely identifies all features is found, all fields are used. To have this case raise an error, pass ``raise_error=True``.

   ::

       data = [
           {},
       ]

   :param data: List of dictionaries with the same fields.
   :type data: iterable
   :param exclude: Fields to exclude during the search for uniqueness. ``id`` is always excluded.
   :type exclude: iterable
   :param include: Fields to include when returning, even if not unique.
   :type include: iterable

   :returns: (list of field names as strings, dictionary of data ids to values for given field names)

   :raises InconsistentFields: Not all features provide all fields.


.. py:function:: as_unique_attributes_dataframe(df, exclude=None, include=None, raise_error=False)


.. py:function:: greedy_set_cover(data, exclude=None, raise_error=True)

   Find a set of attributes that uniquely identifies each element in ``data``.

   Feature selection is a well-known problem and is analogous to the set cover problem, for which there is a well-known heuristic.

   Example::

       data = [
           {'a': 1, 'b': 2, 'c': 3},
           {'a': 2, 'b': 2, 'c': 3},
           {'a': 1, 'b': 2, 'c': 4},
       ]
       greedy_set_cover(data)
       >>> {'a', 'c'}

   :param data: List of dictionaries with the same fields.
   :type data: iterable
   :param exclude: Fields to exclude during the search for uniqueness. ``id`` is always excluded.
   :type exclude: iterable

   :returns: Set of attributes (strings)

   :raises NonUnique: The given fields are not enough to ensure uniqueness.
      ``NonUnique`` is not raised if ``raise_error`` is false.
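
The greedy heuristic that ``greedy_set_cover`` refers to can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the name ``greedy_unique_fields`` is hypothetical, ``ValueError`` stands in for ``NonUnique``, and all field values are assumed to be hashable. At each step it adds the field that yields the most distinct value combinations, and it stops once every element is distinguishable.

```python
def greedy_unique_fields(data, exclude=None, raise_error=True):
    """Illustrative stand-in (hypothetical name), not bw_processing code."""
    # ``id`` is always left out of the search, as documented above.
    excluded = set(exclude or ()) | {"id"}
    # Sort the candidates so ties are broken deterministically.
    candidates = sorted({key for row in data for key in row} - excluded)
    chosen = []

    def distinct(fields):
        # Number of distinct value combinations the given fields produce.
        return len({tuple(row.get(f) for f in fields) for row in data})

    # Greedy step: add the field that separates the most rows so far.
    while distinct(chosen) < len(data):
        if not candidates:
            if raise_error:
                # ValueError stands in for the library's NonUnique here.
                raise ValueError("fields do not uniquely identify the data")
            break
        best = max(candidates, key=lambda f: distinct(chosen + [f]))
        candidates.remove(best)
        chosen.append(best)
    return set(chosen)


data = [
    {"id": 1, "a": 1, "b": 2, "c": 3},
    {"id": 2, "a": 2, "b": 2, "c": 3},
    {"id": 3, "a": 1, "b": 2, "c": 4},
]
fields = greedy_unique_fields(data)
# Map each id to its values for the chosen fields, mirroring the
# (field names, id -> values) pair described for as_unique_attributes.
lookup = {row["id"]: {f: row[f] for f in sorted(fields)} for row in data}
```

For the sample data the sketch selects ``a`` and ``c``; ``b`` is never chosen because it has the same value in every element. Being greedy, the procedure is not guaranteed to return the smallest possible set of fields.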