Data and data formats
Geocollections
Geocollections are containers that organize and describe sets of spatial
data identifiers, and possibly other data, including spatial supports.
They are similar to the concept of Databases in Brightway2: an inventory
dataset could be identified by ("My new database", "Dataset 14"),
while a spatial unit could have a similar id:
("My new raster", "Cell 42, 11"). However, only metadata is stored for
each geocollection, so there is no Geocollection object, only the
geocollections metadata store.
Geocollections are used by both inventory data sets (e.g. for custom locations) and impact assessment methods.
Geocollections can refer to vector or raster data. For example, one geocollection could be the set of world countries, as described by the Natural Earth data, while another could be the raster cells used in a particular impact assessment method.
Geocollections have two purposes in Brightway2:
They provide a conceptual and physical grouping of spatial data into manageable units
They provide a way to uniquely identify spatial data
A geocollection is not a geodatabase; no spatial data is required. However, if the original spatial data is available (as a vector or raster file), it can be specified and used later in analysis.
A geocollection is specified by a unique string, which is usually the name of the geocollection.
Metadata fields
There are no required metadata fields for geocollections, but some common fields are useful when the original data is available. All of the metadata field values listed below should be strings:
filepath: Filepath for the vector or raster file
layer: Layer name (vector data only)
field: Field name that uniquely identifies each feature (vector data only)
vfs: Virtual file system used to load zipped shapefiles
encoding: Text encoding
Note
It is strongly preferred that ESRI shapefiles be zipped into a single
file, with appropriate metadata. Ideally, each geocollection should have
its associated spatial data in a single file. For example, the Natural
Earth 50m political data is specified as:
vfs="zip:///Users/cmutel/Downloads/Geodata/ne_50m_admin_0_countries.zip", filepath="/ne_50m_admin_0_countries.shp".
See the Fiona manual for more details.
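As an illustration, the Natural Earth example above can be written as a plain Python dictionary of geocollection metadata. The helper non_string_fields is hypothetical (it is not part of bw2regional); it only checks the rule stated above that metadata values should be strings. The raster path in the second call is also made up for illustration.

```python
# Hypothetical helper, not part of bw2regional: reports geocollection
# metadata fields whose values are not strings.
def non_string_fields(metadata: dict) -> list:
    """Return the sorted names of metadata fields with non-string values."""
    return sorted(key for key, value in metadata.items() if not isinstance(value, str))


# Natural Earth 50m countries, using the vfs and filepath values from the note above.
natural_earth = {
    "vfs": "zip:///Users/cmutel/Downloads/Geodata/ne_50m_admin_0_countries.zip",
    "filepath": "/ne_50m_admin_0_countries.shp",
}

print(non_string_fields(natural_earth))  # []
print(non_string_fields({"filepath": "/data/cells.tif", "encoding": None}))  # ['encoding']
```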
Standard geocollections
The following standard geocollections are installed automatically:
global: Only the global location “GLO”
countries: All the countries in the world, as defined by ISO, and identified by the ISO 2-letter codes.
regions: UN regions and subregions
ecoinvent 2: Special regions defined by ecoinvent in version 2 of the database
ecoinvent 3: Special regions defined by ecoinvent in version 3 of the database
Regionalized impact assessment {#regionalized-ia}
Regionalized impact assessment methods have characterization factors that depend both on biosphere flows and spatial units. Characterization factors can be static or uncertain.
In Brightway2, the Method object can store site-generic, regionalized,
or both site-generic and regionalized characterization factors.
Similarly, the methods metadata store works the same for site-generic
and regionalized IA methods.
Specifying spatial objects {#specifying-spatial}
Following the Brightway2 principle of KISS, spatial objects have a simple format with reasonable defaults:
If no location data is provided, the global location “GLO” is assumed
A two-letter ISO country code can be used
Otherwise, spatial objects should be identified by the combination of geocollection and id, just like objects in Databases:
(geocollection, spatial unit id).
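The three accepted forms can be summarized in a small normalization function. This is a minimal sketch: normalize_location is a hypothetical name, not a bw2regional function, and it assumes that ISO codes are written in upper case and that any other bare string is an error.

```python
from typing import Optional, Tuple, Union

# A spatial object is either a bare location code or a (geocollection, id) pair.
SpatialObject = Union[str, Tuple[str, str]]


def normalize_location(location: Optional[SpatialObject] = None) -> SpatialObject:
    """Hypothetical helper applying the defaults for spatial objects."""
    if location is None:
        # No location data: assume the global location.
        return "GLO"
    if isinstance(location, str):
        # Accept "GLO" or an upper-case two-letter ISO country code such as "DK".
        if location == "GLO" or (len(location) == 2 and location.isalpha() and location.isupper()):
            return location
        raise ValueError(f"Not a valid location code: {location!r}")
    if isinstance(location, tuple) and len(location) == 2:
        # Full form: (geocollection, spatial unit id), just like database keys.
        return location
    raise ValueError(f"Unrecognized spatial object: {location!r}")


print(normalize_location())                                  # GLO
print(normalize_location("DK"))                              # DK
print(normalize_location(("My new raster", "Cell 42, 11")))  # ('My new raster', 'Cell 42, 11')
```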
Metadata
In addition to the standard metadata for IA methods, such as unit and
description, regionalized IA methods should include the following for
complete functionality:
geocollections: List of one or more strings identifying the geocollections. Normally only one geocollection is associated with a regionalized IA method. See Geocollections.
band: Band number in original raster data set. Needed to import characterization factors.
cf_field: Field name of characterization factor value. Needed to import characterization factors.
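For example, the metadata of a vector-based regionalized method might look like the dictionary below. All values are hypothetical, and so is the helper missing_ia_fields. It assumes that band is the field needed for raster data and cf_field the one needed for vector data, which the descriptions above suggest but do not state outright.

```python
# Hypothetical helper, not part of bw2regional. Assumes "band" is needed for
# raster data and "cf_field" for vector data when importing CFs.
def missing_ia_fields(metadata: dict, raster: bool) -> list:
    """List the recommended regionalized IA metadata fields that are absent."""
    required = ["geocollections", "band" if raster else "cf_field"]
    return [field for field in required if field not in metadata]


method_metadata = {
    "unit": "kg N-eq",                # hypothetical values for illustration
    "description": "Example regionalized method",
    "geocollections": ["countries"],  # normally a single geocollection
    "cf_field": "CF",                 # hypothetical attribute name in the vector file
}

print(missing_ia_fields(method_metadata, raster=False))  # []
print(missing_ia_fields(method_metadata, raster=True))   # ['band']
```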
Data format
Site-generic IA methods have a simple data format:
[
[biosphere flow, maybe uncertainty],
]
Where maybe uncertainty is either a floating-point number (implying no
uncertainty) or a stats_arrays uncertainty
dictionary, such as:
{'loc': 2, 'scale': 0.5, 'uncertainty_type': NormalUncertainty.id}
Note
In site-generic CFs, where a location is not given, the “GLO” location is assumed.
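The GLO default can be applied mechanically by appending a location to every two-element row. In this sketch the biosphere flow keys are hypothetical (database, code) tuples, with_locations is a hypothetical helper, and NormalUncertainty.id is hard-coded as 3 (its value in stats_arrays) so the example runs without that package installed.

```python
# NormalUncertainty.id in the stats_arrays package is 3; hard-coded here so the
# example runs without stats_arrays installed.
NORMAL_UNCERTAINTY_ID = 3

# Hypothetical site-generic CFs: one static, one with normal uncertainty.
site_generic_cfs = [
    [("biosphere3", "flow-a"), 1.5],
    [("biosphere3", "flow-b"), {"loc": 2, "scale": 0.5, "uncertainty_type": NORMAL_UNCERTAINTY_ID}],
]


def with_locations(cfs: list) -> list:
    """Hypothetical helper: append "GLO" to CF rows that lack a spatial object."""
    return [row if len(row) == 3 else [*row, "GLO"] for row in cfs]


for row in with_locations(site_generic_cfs):
    print(row)
```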
Regionalized IA methods are almost the same; they just have an additional field for a location specifier.
[
[biosphere flow, maybe uncertainty, spatial object],
]
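Because regionalized CFs depend on both the flow and the spatial unit, a natural in-memory view is a lookup keyed first by flow and then by spatial object. The sketch below uses a hypothetical "watersheds" geocollection and hypothetical flow keys; cfs_by_flow is not a bw2regional function, and it silently keeps the last value if a (flow, spatial object) pair is repeated.

```python
from collections import defaultdict

# Hypothetical regionalized CFs: the same flow has different CFs in
# different spatial units of a hypothetical "watersheds" geocollection.
regionalized_cfs = [
    [("biosphere3", "flow-a"), 0.8, ("watersheds", "basin 1")],
    [("biosphere3", "flow-a"), 2.4, ("watersheds", "basin 2")],
    [("biosphere3", "flow-b"), 1.1, ("watersheds", "basin 1")],
]


def cfs_by_flow(cfs: list) -> dict:
    """Group CF rows as {flow: {spatial object: maybe uncertainty}}."""
    grouped = defaultdict(dict)
    for flow, value, location in cfs:
        grouped[flow][location] = value  # later duplicates overwrite earlier ones
    return dict(grouped)


lookup = cfs_by_flow(regionalized_cfs)
print(lookup[("biosphere3", "flow-a")][("watersheds", "basin 2")])  # 2.4
```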
Loadings
Background loadings are data on the spatial patterns of emissions, and are used to determine the relative likelihood that a given inventory dataset occurs in an impact assessment spatial unit. The idea is that existing patterns of emissions are reasonable predictors of where present or future emissions will occur.
In Brightway2, background loadings are represented by Loading
objects, and metadata about all loadings is stored in loadings.
Because loadings are a density of predicted activity, their units are a physical quantity (e.g. mass or energy) per unit area. Loadings are multiplied by intersected areas and then normalized by the total loading, so their units cancel out in the end.
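Written out, this normalization gives the share of an inventory spatial unit i that is attributed to impact assessment spatial unit j. The symbols are introduced here only for illustration: L_j is the loading (quantity per area) in IA spatial unit j, and A_ij is the area of the intersection of i and j.

```latex
w_{ij} = \frac{L_j \, A_{ij}}{\sum_{k} L_k \, A_{ik}}, \qquad \sum_{j} w_{ij} = 1
```

Both L_j A_ij and L_k A_ik have units of quantity, so w_ij is dimensionless. Setting every L_j to the same constant reduces this to the area-based fallback, w_ij = A_ij / Σ_k A_ik.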
Because loadings are emission-specific, loadings for different biosphere flows could predict different spatial patterns for the same inventory activity. There has been little research on the importance of this inconsistency.
Background loadings are, in general, supplied by the impact assessment method developers. If no loadings are supplied, the generic fallback is to allocate impact assessment spatial units to inventory spatial units by intersected area; however, proxy loadings can also be calculated independently. See this example IPython notebook for one such procedure.
Background loadings should always use the same geocollection as their IA methods.
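The weighting described above (multiply loadings by intersected areas, then normalize by the total), together with the area-only fallback, can be sketched in a few lines. allocation_weights is a hypothetical helper, not the bw2regional implementation; the spatial unit names are made up, and IA units with no loading are assumed to have zero loading.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Tuple


def allocation_weights(
    intersections: Dict[Tuple[Hashable, Hashable], float],
    loadings: Optional[Dict[Hashable, float]] = None,
) -> Dict[Hashable, Dict[Hashable, float]]:
    """Hypothetical helper: {inventory unit: {IA unit: share}}."""
    weighted = defaultdict(dict)
    for (inv_unit, ia_unit), area in intersections.items():
        # Without loadings every IA unit gets the same density, so allocation is
        # by intersected area. IA units missing from `loadings` get zero (assumption).
        density = 1.0 if loadings is None else loadings.get(ia_unit, 0.0)
        # density (quantity per area) * area = loading inside the intersection
        weighted[inv_unit][ia_unit] = density * area
    weights = {}
    for inv_unit, row in weighted.items():
        total = sum(row.values())
        # Dividing by the total loading cancels the units; shares sum to 1.
        # If no loading falls inside this inventory unit, nothing is allocated.
        weights[inv_unit] = {ia: value / total for ia, value in row.items()} if total else {}
    return weights


intersections = {("DK", "cell 1"): 30.0, ("DK", "cell 2"): 10.0}
print(allocation_weights(intersections))
# {'DK': {'cell 1': 0.75, 'cell 2': 0.25}}
print(allocation_weights(intersections, {"cell 1": 1.0, "cell 2": 3.0}))
# {'DK': {'cell 1': 0.5, 'cell 2': 0.5}}
```

With equal areas weighting, the larger intersection dominates; once loadings are supplied, a small intersection with a high loading density can receive as much weight as a large one.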
Metadata
There are no required fields for background loadings, as most metadata
comes from the IA method. Fields such as description can be used.
Data format
[
[maybe_uncertainty, location_id],
]
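Note that loading rows put the value first and the location second, the reverse of the order in CF rows. The hypothetical helper below turns static loadings into a dictionary keyed by location, which is convenient for the area weighting described above. It deliberately rejects uncertainty dictionaries rather than guessing which of their values to use; the location ids are made up.

```python
def loading_lookup(rows: list) -> dict:
    """Hypothetical helper: convert [value, location id] rows into {location id: value}."""
    lookup = {}
    for value, location in rows:
        if isinstance(value, dict):
            # Simplification for this sketch: uncertain loadings are not handled.
            raise TypeError(f"Uncertain loading for {location!r} is not handled here")
        lookup[location] = float(value)
    return lookup


loading_data = [
    [1.0, "cell 1"],  # value first, location id second
    [3.0, "cell 2"],
]
print(loading_lookup(loading_data))  # {'cell 1': 1.0, 'cell 2': 3.0}
```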
Warning
Industry sector-specific loadings and emission-specific loadings are not yet supported in bw2regional.
Intersections
Data for the geographic transform matrix G is stored in Intersection
objects, and the metadata store for areal intersection data is
intersections. Each combination of geocollections should be a separate
Intersection object.
Intersection data is calculated using pandarus. Conversion from the
pandarus data format is done with the utility function
load-from-pandarus.
Metadata
No metadata is required for intersections.
The pandarus-filepath field is the filepath of the Pandarus file, and
is set automatically when imported.
Data format
We assume that data is written automatically after conversion from pandarus, so users shouldn’t be writing or manipulating intersection data themselves. Nevertheless, here is the data format:
[
[spatial object 1, spatial object 2, intersection area],
]
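To see how these rows relate to the matrix G, they can be arranged as a dense table of areas. This is a sketch only: intersection_matrix is a hypothetical helper, the "watersheds" geocollection is made up, and the orientation (first spatial object as rows) and index ordering (order of first appearance) are assumptions rather than what bw2regional uses internally.

```python
def intersection_matrix(rows: list):
    """Hypothetical helper: arrange [spatial object 1, spatial object 2, area] rows.

    Rows follow the first spatial object and columns the second, each in order
    of first appearance. Repeated pairs are summed.
    """
    row_labels = list(dict.fromkeys(obj1 for obj1, _, _ in rows))
    col_labels = list(dict.fromkeys(obj2 for _, obj2, _ in rows))
    row_index = {label: i for i, label in enumerate(row_labels)}
    col_index = {label: j for j, label in enumerate(col_labels)}
    matrix = [[0.0] * len(col_labels) for _ in row_labels]
    for obj1, obj2, area in rows:
        matrix[row_index[obj1]][col_index[obj2]] += area
    return row_labels, col_labels, matrix


rows = [
    [("countries", "DK"), ("watersheds", "basin 1"), 30.0],
    [("countries", "DK"), ("watersheds", "basin 2"), 10.0],
    [("countries", "SE"), ("watersheds", "basin 2"), 25.0],
]
_, _, matrix = intersection_matrix(rows)
print(matrix)  # [[30.0, 10.0], [0.0, 25.0]]
```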
Extension tables
Metadata
In addition to standard metadata such as description, extension tables
should include the following for complete functionality:
geocollection: String identifying a geocollection. See Geocollections.
xt_field: Field name used for extension table values. Only needed for vector spatial data.
band: Raster band index for extension table values.
Data format
[
[float, spatial object],
]