Saturday, October 22, 2011

PostGIS Nearest Neighbor Speed Tests


I've used several different techniques for finding the nearest neighbor in PostGIS but never really knew which was the fastest. Below are some speed tests using 3 different approaches, all limited the spatial analysis to within 200 miles. Test 1 was the slowest, but returns the entire dataset with Nulls for rows that did not meet the maximum distance requirement. Test 2 was consistently the fastest and used a PostgreSQL Windows Function to rank by distance,  it can also be used to return X many nearest neighbors, but requires a little more logic than the others. Test 3 uses a combination of Distinct and Order By to find the nearest neighbor, the advantage here is that if you would like to return the distance, you only have to calculate it once as it is required in your primary Select statement. Test results are in bold green and are in milliseconds. Obviously you'll have different reasons for using each, and I still need to test Boston GISs pgfnnn function, but hopefully this helps give you some guidance

The tests were run against 43,000 point features in the source dataset and 20 polyline features in the neighboring dataset.

--Test 1 1882, 1729, 1820, 1778, 1597, 1595
SELECT
a.gid,
(SELECT b.gid
FROM epa_iag.wecc_proposed b
WHERE ST_DWithin(a.the_geom, b.the_geom, 200*1609.344) AND a.fallow = 3
ORDER BY ST_Distance(a.the_geom, b.the_geom) LIMIT 1) AS b_gid
FROM
epa_iag.fallow_lands_cnty a
WHERE
a.fallow = 3

--Test 2 1667, 857, 1612, 694, 755, 702
SELECT
i.gid,
i.b_gid
--ST_Distance(i.the_geom, i.b_geom) / 1609.344 AS dist
FROM(
SELECT
a.gid,
b.gid AS b_gid,
a.the_geom, b.the_geom AS b_geom,
rank() OVER (PARTITION BY a.gid ORDER BY
ST_Distance(a.centroid_geom, b.the_geom) / 1609.344 ASC) AS pos
FROM
epa_iag.wecc_proposed b,
epa_iag.fallow_lands_cnty a
WHERE
ST_DWithin(a.the_geom, b.the_geom, 200*1609.344) AND a.fallow = 3 ) i
WHERE
pos = 1

--Test 3 2409, 1057, 1192, 1098, 1089, 2167
SELECT DISTINCT ON (a.gid)
a.gid,
b.gid AS b_gid,
ST_Distance(a.centroid_geom, b.the_geom) / 1609.344 AS dist
FROM
epa_iag.wecc_proposed b,
epa_iag.fallow_lands_cnty a
WHERE
ST_DWithin(a.the_geom, b.the_geom, 200*1609.344) AND a.fallow = 3
ORDER BY
a.gid, dist ASC;