Highway community matching shenanigans
The aim of this text is to enrich and proper the earlier one on the identical topic. In that article, I introduced an strategy to reconstructing lacking map-matched information from the Prolonged Automobile Vitality Dataset¹ (EVED) [1]. My method explored the mathematical properties of triangles to seek out the misplaced information on a highway community database shortly. Sadly, a refined error within the code made it fail beneath some circumstances.
Right here, I current a correction to the code and talk about the outcomes of making use of these methods to reconstruct greater than thirty-two thousand trajectories on the EVED.
Within the earlier article, I illustrated two strategies to suit map-matched areas to highway community segments utilizing triangular properties. The geometrical ideas that assist the concept are satisfactory, however the code must be corrected. The problem happens on the node filtering half and impacts each matching strategies, the ratio-based and the distance-based. Let us take a look at the unique code once more utilizing the ratio-based perform:
def get_matching_edge(self, latitude, longitude, bearing=None):loc = np.array([latitude, longitude])_, r = self.geo_spoke.query_knn(loc, 1)radius = self.max_edge_length + r[0]node_idx, dists = self.geo_spoke.query_radius(loc, radius)nodes = self.ids[node_idx]distances = dict(zip(nodes, dists))adjacent_set = set()graph = self.graph
best_edge = Nonefor node in nodes:if node not in adjacent_set:adjacent_nodes = np.intersect1d(np.array(graph.adj[node]),nodes, assume_unique=True)
adjacent_set.replace(adjacent_nodes)for adjoining in adjacent_nodes:edge_length = graph[node][adjacent][0][‘length’]ratio = (distances[node] + distances[adjacent]) / edge_lengthif best_edge is None or ratio < best_edge[2]:best_edge = (node, adjoining, ratio)
if bearing is just not None:best_edge = fix_edge_bearing(best_edge, bearing, graph)return best_edge
The perform makes use of a set to retailer all of the processed adjoining nodes to keep away from repetitions. And it does keep away from repetitions, sadly. That is the place the error lies as a result of by excluding a node, the perform can also be stopping all different adjoining edges from being examined. This example is particularly prevalent for nodes with a number of hyperlinks. Correcting the code requires a conceptual change, eradicating processed highway community edges as a substitute of processed nodes.
However one other challenge surfaced after utilizing the code with just a few completely different trajectories, a black swan. What occurs when the enter GPS location is near one of many nodes? In some uncommon conditions, the sampled GPS areas match one of many highway community nodes very carefully, which induces very refined floating level instabilities. The result’s the collection of a nonsensical edge, ruining the trail reconstruction. To resolve this challenge, I added a default parameter with the minimal distance from the enter GPS location to the closest node. When this distance is under one meter (three toes), the perform refuses to pick an edge and returns an empty worth. This answer is average as it will occur comparatively occasionally, and the following reconstruction mechanism compensates for that.
You’ll be able to see the corrected code under, that includes the set of examined highway community edges and the minimal distance examine.
def get_matching_edge(self, latitude, longitude,bearing=None, min_r=1.0):best_edge = Noneloc = np.array([latitude, longitude])_, r = self.geo_spoke.query_knn(loc, 1)if r > min_r:radius = self.max_edge_length + r[0]node_idx, dists = self.geo_spoke.query_radius(loc, radius)nodes = self.ids[node_idx]distances = dict(zip(nodes, dists))tested_edges = set()graph = self.graphnode_set = set(nodes)
for node in nodes:adjacent_nodes = node_set & set(graph.adj[node])
for adjoining in adjacent_nodes:if (node, adjoining) not in tested_edges:edge_length = graph[node][adjacent][0][‘length’]ratio = edge_length / (distances[node] + distances[adjacent])
if best_edge is None or ratio > best_edge[2]:best_edge = (node, adjoining, ratio)tested_edges.add((node, adjoining))tested_edges.add((adjoining, node))
if bearing is just not None:best_edge = fix_edge_bearing(best_edge, bearing, graph)return best_edge
Observe that I additionally modified the speed formulation in order that it ranges between zero and one, with bigger values equivalent to a greater match. Apparently, the rate-based perform works higher and sooner than the distance-based one.
The primary a part of the trail reconstruction implies making use of the perform above to each level of the enter trajectory and holding solely the distinctive highway community edges. The code under exhibits a Python perform that does that.
def match_edges(road_network, trajectory):edges = []unique_locations = set()edge_set = set()for p in trajectory:if p not in unique_locations:e = road_network.get_matching_edge(*p, min_r=1.0)if e is just not None:n0, n1, _ = eedge = (n0, n1)if edge not in edge_set:edge_set.add(edge)edges.append(edge)unique_locations.add(p)return edges
The perform assigns a highway community edge to every distinctive trajectory location. Determine 1 under exhibits a pattern of such matching.
As you may see, there are some gaps between the matched hyperlinks as a result of GPS sampling frequency. Reconstructing the trail requires us to fill in these gaps, and we achieve this by discovering the shortest method between the highway community edges. The code under does that by holding collectively the related hyperlinks and filling within the gaps for all of the others utilizing OSMnx’s companies [2].
def build_path(rn, edges):path = []for e0, e1 in pairwise(edges):if not len(path):path.append(e0[0])if e0[0] != e1[0] and e0[1] != e1[1]:if e0[1] == e1[0]:path.prolong([e0[1], e1[1]])else:n0, n1 = int(e0[1]), int(e1[0])sp = ox.distance.shortest_path(rn, n0, n1)if sp is just not None:path.prolong(sp[1:])return path
After calling this perform on the earlier matches, we must always have the trajectory totally reconstructed as a sequence of highway community edges. We see a steady outcome by superimposing the trajectory areas to the patched highway community edges, as depicted in Determine 2 under.
The Case for a Minimal Distance
Whereas reviewing this answer, I discovered a problem when iterating by means of the person trajectories. Determine 3 under particulars the issue.
A spurious cycle within the route arises from an surprising worth within the enter information for the patching course of. We will dig deeper into what is occurring by reverting to the view displaying which highway community edges the algorithm matches to those GPS areas.
Determine 4 under illustrates the state of affairs on the fork, the place the matched GPS location coincides (or is extraordinarily shut) to one of many highway community nodes. This may probably trigger some numeric instability yielding a flawed outcome. A strong algorithm ought to keep away from these conditions, however how?
After contemplating it, I made a decision to alter each matching algorithms and use the gap to the closest highway community node as a criterion to incorporate or exclude the enter location. If this distance is smaller than a given threshold (one meter or three toes as default), the match perform returns an empty worth signaling the calling code that it can’t safely assign a hyperlink. That is high-quality for the next step because it ought to infer the lacking hyperlinks as a result of that algorithm relies upon solely on highway community edges and never the enter areas.
Now that now we have sanitized the entire algorithm, allow us to see the way it performs on all the dataset.
To evaluate how the algorithm behaves on the entire EVED, I created a brand new standalone script that iterates by means of all trajectories, matches the sides, generates a path, and compares the unique trajectory’s size to the reconstructed path. The Python script information all of the variations so we are able to later analyze them utilizing a regular statistical toolset.
The script core lies in its essential loop, detailed within the code under.
def process_trajectories():rn = download_network()road_network = RoadNetwork(rn)
state = load_state()if state is None:state = {“trajectories”: get_trajectories(),”errors”: []}
save_counter = 0trajectories = state[“trajectories”]whereas len(trajectories) > 0:trajectory_id = trajectories[0]trajectory = load_trajectory_points(trajectory_id,distinctive=True)if len(trajectory) > 3:edges = match_edges(road_network, trajectory)path = build_path(rn, edges)
if len(path) > 0:diff = calculate_difference(rn, path, trajectory)print(f”Trajectory: {trajectory_id}, Distinction: {diff}”)state[“errors”].append((trajectory_id, diff))
trajectories = trajectories[1:]state[“trajectories”] = trajectories
save_counter += 1if save_counter % 100 == 0:save_state(state)
save_state(state)
The perform begins by downloading and preprocessing the highway community information from OpenStreetMap. As a result of it is a long-running script, I added a persistent state serialized as a JSON-formatted textual content file and saved each 100 trajectories. This characteristic permits the consumer to interrupt the Python script with out shedding all the information from already-processed trajectories. The perform that calculates the size distinction between the unique enter trajectory and the reconstructed path is displayed under.
def calculate_difference(rn, path, trajectory):p_loc = np.array([(rn.nodes[n][‘y’], rn.nodes[n][‘x’]) for n in path])t_loc = np.array([(t[0], t[1]) for t in trajectory])
p_length = vec_haversine(p_loc[1:, 0], p_loc[1:, 1], p_loc[:-1, 0], p_loc[:-1, 1]).sum()t_length = vec_haversine(t_loc[1:, 0], t_loc[1:, 1], t_loc[:-1, 0], t_loc[:-1, 1]).sum()return p_length – t_length
The code converts each inputs into sequences of geolocations, computes their cumulative lengths, and at last computes their distinction. When the script finishes, we are able to load the state information and analyze it utilizing Pandas.
We see how the distinction between each distances distributes in Determine 5 under (named “error”).
To see extra element round zero, I clipped the histogram and created the one in Determine 6 under.
As you may see, there are some variations within the detrimental a part of the spectrum. Nonetheless, it’s primarily optimistic, that means that the reconstructed path lengths are normally longer than trajectories, which is expectable. Do not forget that trajectory factors seldom match highway community nodes, that means that the most typical state of affairs must be a match halfway by means of a phase.
We will additionally examine the place the mid-fifty p.c of variations is by drawing a box-and-whiskers plot, as depicted under in Determine 7.
The picture above accommodates important data on the distribution of variations and the variety of outliers. Utilizing Tukey’s fences technique, we are able to classify as outliers 5,332² trajectories out of 32,528. This proportion represents a comparatively excessive proportion worth of 16.4%, that means there are most likely higher reconstruction strategies than this or that there’s a sizable quantity of poorly processed trajectories. We will look into a few of the most dramatic distance distinction outliers to try to perceive these outcomes.
Outlier Evaluation
As beforehand famous, there are over five-thousand trajectories with poor matching. We will shortly examine just a few to find out why the distinction between the sampled trajectory and the reconstruction is so important. We begin with a revealing case, trajectory quantity 59, depicted under in Determine 8.
The case above exhibits that the map-matching course of incorrectly positioned three GPS samples on a parallel highway, forcing the reconstruction course of to discover a spurious detour. Such a error may outcome from the insufficient parameterization of the map-matching course of, one thing to analysis in a later article.
Lastly, we are able to see a really radical case in Determine 9 under. The sampled GPS areas are so distant from the highway that the map-matching course of gave up. As you may see, the reconstruction course of assigns highway community hyperlinks that bear no relation to the meant trajectory.
On this article, I corrected beforehand revealed code on highway community edge matching and assessed its efficiency utilizing the EVED dataset. I collected the efficiency evaluation information utilizing a specialised Python script that exposed a comparatively excessive variety of matched trajectories with seemingly irregular deviations within the measured distances. To grasp why this outcome may very well be higher, I’ll flip subsequent to the unique map-matching software program, Valhalla, to try to repair the problems above.
The unique paper authors licensed the dataset beneath the Apache 2.0 License (see the VED and EVED GitHub repositories). Observe that this additionally applies to by-product work.The supply of all pictures and calculations is in a Jupyter pocket book, out there from the general public GitHub repository.
[1] Zhang, S., Fatih, D., Abdulqadir, F., Schwarz, T., & Ma, X. (2022). Prolonged car vitality dataset (eVED): An enhanced large-scale dataset for deep studying on car journey vitality consumption. ArXiv. https://doi.org/10.48550/arXiv.2203.08630
[2] Boeing, G. 2017. OSMnx: New Strategies for Buying, Developing, Analyzing, and Visualizing Advanced Road Networks. Computer systems, Atmosphere and City Programs 65, 126–139. doi:10.1016/j.compenvurbsys.2017.05.004