Infrastructure list
The infrastructure list contains sites and equipment. Each site is defined by a polygon and each item of equipment is defined by a point. Equipment is linked to a site by a site ID. The process in bridger_kmz/infrastructure/create_infrastructure_list.py extracts sites and equipment from the input survey data, removes duplicates and combines polygons where required. The relevant code is found in the bridger_kmz/infrastructure folder.
Each Bridger survey reports a list of infrastructure. However, this infrastructure list is not consistent between surveys. For sites, the polygons can vary significantly, sometimes overlapping perfectly but in other cases covering only different parts of the same site. Here we describe the process used to create a consistent infrastructure list.
Sites and site polygons
First we remove exact duplicates and polygons that almost entirely overlap (more than 90% overlap). In most cases these are nearly identical polygons that have been drawn slightly differently.
1.) If two sites have the same name and the same polygon we remove one of them.
2.) If either of the two polygons overlaps the other by more than 90% then we keep the entry with the larger polygon and discard the other.
Then we must deal with more complex cases.
3.) If two site polygons overlap by less than 90% but more than 30% then we combine the polygons into one. This is done by taking the union of the two polygons.
4.) If two site polygons overlap by less than 30% and have the same site names then we combine the polygons into one. This is done by taking the union of the two polygons.
5.) If two site polygons overlap by less than 30% and have different names then we treat them as separate sites. The overlapping part is removed from one of the polygons.
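Rules 1 to 5 can be summarised as a single decision on a pair of sites. The following is a minimal sketch, not the code in bridger_kmz/infrastructure. The function names, the action labels, and the choice to measure overlap as the intersection area divided by each polygon's own area (taking the larger of the two fractions) are assumptions made for illustration. The rules above do not say how overlaps of exactly 90% or 30% are handled; here they fall into the lower band.

```python
def overlap_fraction(area_a, area_b, area_intersection):
    """Largest share of either polygon covered by the other (assumed metric)."""
    if area_a <= 0 or area_b <= 0:
        raise ValueError("polygon areas must be positive")
    return max(area_intersection / area_a, area_intersection / area_b)


def classify_site_pair(name_a, name_b, area_a, area_b, area_intersection,
                       identical_polygons=False):
    """Return a hypothetical action label for a pair of site entries."""
    same_name = name_a == name_b
    if same_name and identical_polygons:
        return "drop_duplicate"          # rule 1: exact duplicate
    frac = overlap_fraction(area_a, area_b, area_intersection)
    if frac > 0.9:
        return "keep_largest"            # rule 2: near-identical polygons
    if frac > 0.3:
        return "union"                   # rule 3: substantial overlap
    if same_name:
        return "union"                   # rule 4: small overlap, same name
    return "separate_trim_overlap"       # rule 5: small overlap, different names


# Example: two differently named sites sharing 10% of their area stay separate.
print(classify_site_pair("Site A", "Site B", 100.0, 100.0, 10.0))
```

In practice the areas would come from the site polygons themselves; they are passed in as plain numbers here so that the decision logic can be read on its own.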
Finally sites are modified if there is equipment that is not within the site polygon. This is done by expanding the polygon to include the equipment. Equipment is discussed in more detail below.
6.) Once all equipment has been cleaned, if any items are not within a site polygon we extend the closest site polygon to include them. A check ensures that the equipment is not too far from the site polygon.
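As an illustration of rule 6, the sketch below extends the closest site polygon to include a stray equipment point, subject to a distance limit. It is not the project's implementation. It assumes planar (projected) coordinates, assumes the point lies outside every polygon, and uses a convex hull as the way to expand the polygon, which discards any concave detail in the original outline. The names `extend_closest_site` and `max_distance` are hypothetical.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_boundary(point, polygon):
    """Distance from a point to the nearest edge of a polygon (vertex list)."""
    n = len(polygon)
    return min(_point_segment_distance(point, polygon[i], polygon[(i + 1) % n])
               for i in range(n))


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def extend_closest_site(sites, point, max_distance):
    """Extend the nearest polygon in `sites` (dict: site ID -> vertex list).

    Returns the ID of the extended site, or None if the point is too far away.
    """
    site_id = min(sites, key=lambda s: distance_to_boundary(point, sites[s]))
    if distance_to_boundary(point, sites[site_id]) > max_distance:
        return None  # distance check failed: leave all polygons unchanged
    sites[site_id] = convex_hull(sites[site_id] + [point])
    return site_id
```

Returning None when the distance check fails leaves the decision about distant equipment to the caller, for example flagging it for manual review.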
That concludes all changes that are made to site polygons.
Equipment
Each item of equipment is defined by a position, an equipment ID and an equipment type. We see the same equipment across multiple surveys and we also see changes and inconsistencies between surveys. For example, the same piece of equipment may have a different ID in different surveys, or it may move slightly but keep the same ID. We need to create a consistent list of equipment, prioritising the most recent information we have.
1.) If two items have exactly the same position then we keep the most recent data. The survey time for each infrastructure report is obtained by merging with the swath coverage report. The list of all Bridger equipment IDs seen at this location is then stored in a new column so that we have a record of every equipment ID used at this location.
2.) If two items have the same ID we take the most recent one.
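The two equipment rules can be sketched as follows. This is an illustrative outline rather than the project's code. It assumes each record already carries its survey date (that is, the merge with the swath coverage report has been done), and the field names `equipment_id`, `position`, `survey_date` and `all_ids` are hypothetical.

```python
from datetime import date


def deduplicate_equipment(records):
    """Apply rule 1 (same position) and then rule 2 (same ID), newest wins."""
    # Rule 1: one record per exact position; remember every ID seen there.
    by_position = {}
    for rec in sorted(records, key=lambda r: r["survey_date"]):
        previous = by_position.get(rec["position"])
        ids = list(previous["all_ids"]) if previous else []
        if rec["equipment_id"] not in ids:
            ids.append(rec["equipment_id"])
        # Later (more recent) records overwrite earlier ones.
        by_position[rec["position"]] = {**rec, "all_ids": ids}

    # Rule 2: one record per equipment ID, again keeping the most recent.
    by_id = {}
    for rec in sorted(by_position.values(), key=lambda r: r["survey_date"]):
        by_id[rec["equipment_id"]] = rec
    return list(by_id.values())


records = [
    {"equipment_id": "E1", "position": (1.0, 2.0), "survey_date": date(2023, 1, 1)},
    {"equipment_id": "E2", "position": (1.0, 2.0), "survey_date": date(2023, 6, 1)},
    {"equipment_id": "E3", "position": (5.0, 5.0), "survey_date": date(2023, 1, 1)},
    {"equipment_id": "E3", "position": (5.1, 5.0), "survey_date": date(2023, 6, 1)},
]
cleaned = deduplicate_equipment(records)
```

In this example the position (1.0, 2.0) keeps the newer ID E2 while recording that E1 was also seen there, and E3 keeps its more recent position.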
We do extensive checking for duplication and equipment within close proximity to ensure there are no duplications remaining.
Combining sites and equipment
Once the equipment and sites have been cleaned we assign each item of equipment to a site polygon. Since the site polygons have already been cleaned and overlaps removed, there is at most one site polygon at any given location. After this matching process, some equipment items may still not be within a site polygon. We therefore extend the site polygons to include this equipment and run the matching process again, checking that all equipment is now associated with a site.
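The matching step can be sketched as a point-in-polygon test followed by a check for leftovers. This is a minimal illustration, not the project's implementation. It uses a ray-casting test on planar coordinates, the names `point_in_polygon` and `assign_equipment` are hypothetical, and points lying exactly on a polygon boundary may be classified either way.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_equipment(equipment, sites):
    """Map each equipment ID to the site containing it.

    equipment: dict of equipment ID -> (x, y)
    sites:     dict of site ID -> list of (x, y) vertices
    Returns (assigned, unassigned). Because overlaps between sites have been
    removed, the first matching site is taken to be the only one.
    """
    assigned, unassigned = {}, []
    for eq_id, position in equipment.items():
        site_id = next((s for s, poly in sites.items()
                        if point_in_polygon(position, poly)), None)
        if site_id is None:
            unassigned.append(eq_id)
        else:
            assigned[eq_id] = site_id
    return assigned, unassigned


sites = {"S1": [(0, 0), (2, 0), (2, 2), (0, 2)],
         "S2": [(10, 0), (12, 0), (12, 2), (10, 2)]}
equipment = {"E1": (1, 1), "E2": (11, 1), "E3": (3, 1)}
assigned, unassigned = assign_equipment(equipment, sites)
```

Here E3 is left unassigned on the first pass. After the nearest polygon has been extended to cover it, running `assign_equipment` again should return an empty `unassigned` list, which is the final check described above.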