bw2data.backends.base#
Attributes#
Classes#
SQLiteBackend | A base class for SQLite backends.
Functions#
Module Contents#
- class bw2data.backends.base.SQLiteBackend(*args, **kwargs)[source]#
Bases: bw2data.data_store.ProcessedDataStore
A base class for SQLite backends.
Subclasses must support at least the following calls:
load()
write(data)
In addition, they should specify their backend with the backend attribute (a unicode string).
rename
copy
find_dependents
random
process
For new classes to be recognized by the DatabaseChooser, they need to be registered with the config object, e.g.:
config.backends['backend type string'] = BackendClass
Instantiation does not load any data. If this database is not yet registered in the metadata store, a warning is written to stdout.
The data schema for databases in voluptuous is:
exchange = {
    Required("input"): valid_tuple,
    Required("type"): basestring,
}
exchange.update(uncertainty_dict)
lci_dataset = {
    Optional("categories"): Any(list, tuple),
    Optional("location"): object,
    Optional("unit"): basestring,
    Optional("name"): basestring,
    Optional("type"): basestring,
    Optional("exchanges"): [exchange]
}
db_validator = Schema({valid_tuple: lci_dataset}, extra=True)
- where:
valid_tuple is a dataset identifier, like ("ecoinvent", "super strong steel")
uncertainty_fields are fields from an uncertainty dictionary.
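As a rough illustration of the shape this schema accepts, the sketch below builds a one-dataset database dictionary and checks it with plain Python instead of voluptuous. The helpers is_valid_tuple and looks_like_valid_dataset, the two-item-tuple assumption, and the sample values are all made up for this example; they are not part of bw2data.

```python
# Hypothetical stand-in for the voluptuous schema above; not bw2data code.

def is_valid_tuple(key):
    """Assume a dataset identifier is a two-item (database name, code) tuple."""
    return isinstance(key, tuple) and len(key) == 2


def looks_like_valid_dataset(key, dataset):
    """Check the required parts of one entry of the database dictionary."""
    if not is_valid_tuple(key):
        return False
    for exchange in dataset.get("exchanges", []):
        # "input" and "type" are the two Required exchange fields.
        if not is_valid_tuple(exchange.get("input")):
            return False
        if not isinstance(exchange.get("type"), str):
            return False
    return True


sample = {
    ("ecoinvent", "super strong steel"): {
        "name": "super strong steel",
        "unit": "kilogram",
        "exchanges": [
            {"input": ("ecoinvent", "iron ore"), "type": "technosphere", "amount": 2.0},
        ],
    }
}

all_ok = all(looks_like_valid_dataset(k, v) for k, v in sample.items())
print(all_ok)
```

The optional dataset fields (categories, location, unit, name, type) are deliberately not checked here, mirroring their Optional status in the schema.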
Processing a Database actually produces two parameter arrays: one for the exchanges, which make up the technosphere and biosphere matrices, and a geomapping array which links activities to locations.
- Parameters:
*name* (unicode string) – Name of the database to manage.
- _add_inventory_geomapping_to_datapackage(dp: bw_processing.Datapackage) → None[source]#
Add the inventory geomapping array to an existing datapackage.
Separated out to allow for easier use in subclasses.
- _efficient_write_dataset(ds: dict, exchanges: list, activities: list, check_typos: bool = True)[source]#
- _efficient_write_many_data(data: list, indices: bool = True, check_typos: bool = True) → None[source]#
- copy(name)[source]#
Make a copy of the database.
Internal links within the database will be updated to match the new database name, i.e. ("old name", "some id") will be converted to ("new name", "some id") for all exchanges.
- Parameters:
name – Name of the new database. Must not already exist.
- copy_activities(activities: List[bw2data.backends.proxies.Activity], target_database: str, signal: bool = True) → List[bw2data.backends.proxies.Activity][source]#
Copy multiple Activity instances and their exchanges to a new database.
This method copies the given activities and all their exchanges to the target database. Edges (exchanges) always have an input and output. Edges from copied activities are resolved to the new copied activities if they point to activities in the input list; otherwise, they remain pointing to the original database.
If input activities have type “process” and have functional edges (input or output) to “product” nodes, those product nodes are also copied to the new database. Product nodes that will be copied are logged.
- Parameters:
activities – List of Activity instances to copy
target_database – Name of the target database (must already exist)
signal – Whether to emit signals during save operations
- Returns:
List of new Activity instances in the target database
- Raises:
ValueError – If target_database does not exist
- delete(keep_params: bool = False, warn: bool = True, vacuum: bool = True, signal: bool = True)[source]#
Delete all data from the SQLite database and the search index.
- delete_duplicate_exchanges(fields=['amount', 'type'])[source]#
Delete exchanges which are exact duplicates. Useful if you accidentally ran your input data notebook twice.
To determine uniqueness, we look at the exchange input and output nodes, and at the exchange's values for the fields named in the fields argument.
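The uniqueness rule described above can be sketched on an in-memory list of exchange dictionaries. This is only an illustration of the idea: the function drop_duplicate_exchanges and the sample exchanges are hypothetical, and the real method works against the SQLite tables rather than a Python list.

```python
# Hypothetical in-memory illustration; bw2data operates on its SQLite tables.

def drop_duplicate_exchanges(exchanges, fields=("amount", "type")):
    """Keep the first exchange for each (input, output, *fields) combination."""
    seen = set()
    unique = []
    for exc in exchanges:
        key = (exc["input"], exc["output"]) + tuple(exc.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(exc)
    return unique


exchanges = [
    {"input": ("db", "a"), "output": ("db", "b"), "amount": 1.0, "type": "technosphere"},
    {"input": ("db", "a"), "output": ("db", "b"), "amount": 1.0, "type": "technosphere"},
    {"input": ("db", "a"), "output": ("db", "b"), "amount": 2.0, "type": "technosphere"},
]

deduplicated = drop_duplicate_exchanges(exchanges)
print(len(deduplicated))
```

The third exchange survives because its amount differs, so it is not an exact duplicate under the default fields.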
- edges_to_dataframe(categorical: bool = True, formatters: List[Callable] | None = None) → pandas.DataFrame[source]#
Return a pandas DataFrame with all database exchanges. Standard DataFrame columns are:
target_id: int,
target_database: str,
target_code: str,
target_name: Optional[str],
target_reference_product: Optional[str],
target_location: Optional[str],
target_unit: Optional[str],
target_type: Optional[str]
source_id: int,
source_database: str,
source_code: str,
source_name: Optional[str],
source_product: Optional[str],  # Note different label
source_location: Optional[str],
source_unit: Optional[str],
source_categories: Optional[str]  # Tuple concatenated with “::” as in bw2io
edge_amount: float,
edge_type: str,
Target is the node consuming the edge, source is the node or flow being consumed. The terms target and source were chosen because they also work well for biosphere edges.
Args:
categorical will turn each string column into a pandas Categorical Series. This takes 1-2 extra seconds, but saves around 50% of the memory consumption.
formatters is a list of callables that modify each row. These functions must take the following keyword arguments, and use the Wurst internal data format:
node: The target node, as a dict
edge: The edge, including attributes of the source node
row: The current row dict being modified.
The functions in formatters don’t need to return anything; they modify row in place.
Returns a pandas DataFrame.
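A formatter following the calling convention described above might look like the sketch below. The function flag_local_source and the extra column source_is_local are hypothetical names chosen for this example, and the inputs are stand-in dictionaries so the snippet runs without a database.

```python
# Hypothetical formatter; the "source_is_local" column is made up for this example.

def flag_local_source(node, edge, row):
    """Add a column telling whether source and target share a database."""
    row["source_is_local"] = row["source_database"] == row["target_database"]
    # No return value: the row dict is modified in place.


# Stand-in inputs so the formatter can be exercised without a database.
row = {"target_database": "my_db", "source_database": "biosphere", "edge_amount": 0.5}
flag_local_source(node={}, edge={}, row=row)
print(row["source_is_local"])
```

With a real database object, such a function would presumably be passed as edges_to_dataframe(formatters=[flag_local_source]), based on the signature shown above.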
- exchange_data_iterator(qs_func, dependents, flip=False)[source]#
Iterate over exchanges and format for bw_processing arrays.
dependents is a set of dependent database names.
flip means flip the numeric sign; see bw_processing docs.
Uses raw sqlite3 to retrieve data for ~2x speed boost.
- find_dependents(data=None, ignore=None)[source]#
Get sorted list of direct dependent databases (databases linked from exchanges).
- Parameters:
data – Inventory data
ignore – List of database names to ignore
- Returns:
List of database names
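The idea of collecting direct dependents from exchange links can be sketched in plain Python. The function direct_dependents is hypothetical, and it assumes that the database's own name is excluded from the result; the actual method may differ in such details.

```python
# Hypothetical sketch of the idea only; not bw2data's implementation.

def direct_dependents(data, own_name, ignore=None):
    """Return sorted names of other databases referenced by exchange inputs."""
    ignore = set(ignore or [])
    found = set()
    for dataset in data.values():
        for exchange in dataset.get("exchanges", []):
            database_name = exchange["input"][0]
            # Assumption: links back into the same database are not dependents.
            if database_name != own_name and database_name not in ignore:
                found.add(database_name)
    return sorted(found)


data = {
    ("my_db", "steel"): {
        "exchanges": [
            {"input": ("my_db", "iron"), "amount": 1.0},
            {"input": ("ecoinvent", "electricity"), "amount": 0.3},
            {"input": ("biosphere", "co2"), "amount": 1.8},
        ]
    }
}

print(direct_dependents(data, "my_db"))
print(direct_dependents(data, "my_db", ignore=["biosphere"]))
```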
- find_graph_dependents()[source]#
Recursively get list of all dependent databases.
- Returns:
A set of database names
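Following dependencies transitively can be sketched as a graph traversal over each database's direct dependents. The depends mapping and the function graph_dependents below are hypothetical, and the traversal is written iteratively with a stack rather than recursively, which reaches the same set.

```python
# Hypothetical metadata: each database name maps to its direct "depends" list.
depends = {
    "my_db": ["ecoinvent", "biosphere"],
    "ecoinvent": ["biosphere"],
    "biosphere": [],
}


def graph_dependents(name, depends):
    """Collect every database reachable from `name` through "depends" links."""
    found = set()
    stack = list(depends.get(name, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(depends.get(current, []))
    return found


print(sorted(graph_dependents("my_db", depends)))
```

Tracking visited names in found also keeps the loop from running forever if two databases depend on each other.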
- load(*args, **kwargs)[source]#
Load the intermediate data for this object.
- Returns:
The intermediate data.
- new_node(code: str = None, **kwargs)[source]#
Create a new activity node in this database.
Creates a new Activity object (node) in the current database. The node is not automatically saved to the database; you must call save() on the returned object.
- Parameters:
code – Optional unique identifier for the node. If not provided, a random UUID will be generated. The code must be unique within the database.
**kwargs – Additional attributes to set on the node. Common attributes include:
name: Human-readable name for the activity
type: Node type (e.g., “process”, “product”, “emission”). Must be a valid node type, not an edge type (a warning will be issued if an edge type is used).
unit: Unit of measurement (e.g., “kg”, “m3”)
location: Geographic location code (e.g., “GLO”, “US”). If not provided, defaults to config.global_location.
categories: List or tuple of category classifications
Any other valid activity attributes
- Returns:
A new Activity proxy object with the specified attributes. The object is not yet saved to the database.
- Return type:
- Raises:
ValueError – If database is provided in kwargs and doesn’t match this database’s name, or if id is provided (ids are auto-generated).
DuplicateNode – If a node with the same database/code combination already exists.
UserWarning – If an edge type (e.g., “technosphere”, “biosphere”) is used for the type parameter instead of a node type.
Examples
Create a simple process node:
>>> db = DatabaseChooser("my_db")
>>> db.register()
>>> activity = db.new_node(code="process_1", name="Steel production",
...                        type="process", unit="kg", location="GLO")
>>> activity.save()
Create a node with auto-generated code:
>>> activity = db.new_node(name="Custom process", type="process")
>>> print(activity["code"])  # Random UUID
>>> activity.save()
Create a product node:
>>> product = db.new_node(code="steel", name="Steel", type="product",
...                       unit="kg", location="GLO")
>>> product.save()
- nodes_to_dataframe(columns: List[str] | None = None, return_sorted: bool = True) → pandas.DataFrame[source]#
Return a pandas DataFrame with all database nodes. Uses the provided node attributes by default, such as name, unit, location.
By default, returns a DataFrame sorted by name, reference product, location, and unit. Set return_sorted to False to skip sorting.
Specify columns to get custom columns. For further customization you will need to write your own function; there are endless possibilities here.
Returns a pandas DataFrame.
- process(csv=False)[source]#
Create structured arrays for the technosphere and biosphere matrices.
Uses bw_processing for array creation and metadata serialization.
Also creates a geomapping array, linking activities to locations. Used for regionalized calculations.
Uses a raw SQLite3 cursor instead of Peewee for a ~2 times speed advantage.
- random(filters=True, true_random=False)[source]#
True random requires loading and sorting data in SQLite, and can be resource-intensive.
- register(write_empty=True, **kwargs)[source]#
Register a database with the metadata store.
Databases must be registered before data can be written.
- Writing data automatically sets the following metadata:
depends: Names of the databases that this database references, e.g. “biosphere”
number: Number of processes in this database.
- Parameters:
format – Format that the database was converted from, e.g. “Ecospold”
- relabel_data(data: dict, old_name: str, new_name: str) → dict[source]#
Relabel database keys and exchanges.
For exchanges that link to datasets within the same database, update those links to the new database name new_name.
Needed to copy a database completely or cut out a section of a database.
For example:
data = {
    ("old and boring", 1): {
        "exchanges": [
            {"input": ("old and boring", 42), "amount": 1.0},
        ]
    },
    ("old and boring", 2): {
        "exchanges": [
            {"input": ("old and boring", 1), "amount": 4.0}
        ]
    }
}
print(db.relabel_data(data, "old and boring", "shiny new"))
>> {
    ("shiny new", 1): {
        "exchanges": [
            {"input": ("old and boring", 42), "amount": 1.0},
        ]
    },
    ("shiny new", 2): {
        "exchanges": [
            {"input": ("shiny new", 1), "amount": 4.0}
        ]
    }
}
In the example, the exchange to ("old and boring", 42) does not change, as this is not part of the updated data.
- Parameters:
data – The data to modify
old_name – The current name of the database
new_name – The name of the modified database
- Returns:
The modified data
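The relabeling behavior described above can be sketched as a small pure-Python function. This is a hypothetical re-implementation for illustration, not the library's code; the name relabel_sketch is made up, and it assumes that an exchange counts as internal only when its input is a key of the data being relabeled.

```python
# Hypothetical re-implementation for illustration; not bw2data's own code.

def relabel_sketch(data, old_name, new_name):
    """Rename dataset keys and internal exchange inputs from old_name to new_name."""

    def relabel_key(key):
        return (new_name, key[1]) if key[0] == old_name else key

    def relabel_exchange(exchange):
        # Only links to datasets present in `data` count as internal.
        if exchange["input"] in data:
            return {**exchange, "input": relabel_key(exchange["input"])}
        return exchange

    return {
        relabel_key(key): {
            **dataset,
            "exchanges": [relabel_exchange(e) for e in dataset.get("exchanges", [])],
        }
        for key, dataset in data.items()
    }


data = {
    ("old and boring", 1): {"exchanges": [{"input": ("old and boring", 42), "amount": 1.0}]},
    ("old and boring", 2): {"exchanges": [{"input": ("old and boring", 1), "amount": 4.0}]},
}

relabeled = relabel_sketch(data, "old and boring", "shiny new")
print(relabeled)
```

The sketch builds a new dictionary instead of mutating its input, which makes it easy to compare the data before and after relabeling.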
- rename(name)[source]#
Rename a database. Modifies exchanges to link to new name. Deregisters old database.
- Parameters:
name – New name.
- Returns:
New Database object.
- search(string, **kwargs)[source]#
Search this database for string.
The searcher includes the following fields:
name
comment
categories
location
reference product
string can include wild cards, e.g. "trans*".
By default, the name field is given the most weight. The full weighting set is called the boost dictionary, and the default weights are:
{
    "name": 5,
    "comment": 1,
    "product": 3,
    "categories": 2,
    "location": 3
}
Optional keyword arguments:
limit: Number of results to return.
boosts: Dictionary of field names and numeric boosts - see default boost values above. New values must be in the same format, but with different weights.
filter: Dictionary of criteria that search results must meet, e.g. {'categories': 'air'}. Keys must be one of the above fields.
mask: Dictionary of criteria that exclude search results. Same format as filter.
facet: Field to facet results. Must be one of name, product, categories, location, or database.
proxy: Return Activity proxies instead of dictionary index Models. Default is True.
Returns a list of Activity datasets.
- set_geocollections()[source]#
Set geocollections attribute for databases which don’t currently have it.
- write(data: dict | list, process: bool = True, searchable: bool = True, check_typos: bool = True, signal: bool | None = None)[source]#
Write data to database.
data must be a dictionary of the form:
{
    ('database name', 'dataset code'): {dataset}
}
Writing a database will first delete all existing data.
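A dictionary in the required form can be assembled from a list of dataset dictionaries, as in the sketch below. The database name, dataset codes, and attribute values are made up for this example, and the final write call is left out so the snippet runs without bw2data installed.

```python
# Hypothetical sample data; database and dataset names are made up.
database_name = "my_db"

datasets = [
    {"code": "steel", "name": "Steel production", "unit": "kg"},
    {"code": "iron", "name": "Iron mining", "unit": "kg"},
]

# Key each dataset by (database name, dataset code), the form write() expects.
data = {(database_name, ds["code"]): ds for ds in datasets}

for key in data:
    print(key)
```

With a registered database object, this dictionary would then be passed to write(data). Since writing first deletes all existing data, the dictionary should contain the complete database rather than only the new datasets.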