Wednesday, November 21, 2012

PostGIS - Raster Tabulate Area

This post compares - in a very basic sense - tabulating raster area in ESRI and PostGIS.

1. Data
- Raster: 90 meter resolution covering most of the contiguous United States. Represents available land for solar development.
- Vector: Native American lands from the Bureau of Indian Affairs. Roughly 300 distinct boundaries.

2. Storage
-ESRI: 113 MB
-PostGIS: Loaded in using 100x100 tiles. Table = 137 MB, spatial index = 19 MB

3. Tabulate Area - Time Trial
-ESRI: tabulate area tool finished in roughly 1 minute.
-PostGIS: Using the SQL below, finished in roughly 13 minutes. Confirmed spatial index was used.

CREATE TABLE tabulate_area AS
SELECT
a.tribe_name 

       ,SUM(ST_Count(ST_Clip(b.rast, 1, a.the_geom)) pow(90,2)) AS square_meters
FROM tribal_boundaries a
INNER JOIN available_land b ON ST_Intersects(a.the_geom, b.rast)

4. Compare Results
I found that PostGIS likes to automatically snap to the raster whereas ESRI defaults to not snapping. If not snapping in ESRI, results between ESRI and PostGIS are different (as would be expected). If snapping in ESRI, then results are exactly the same. Loaded in the ESRI results and compared the two with the following SQL:

SELECT
    
post.tribe_name
  
,post.sqm AS post_sqm
  
,esri.sqm AS esri_sqm
  
,post.sqm - esri.sqm AS tabulate_diff

FROM tabulate_area AS post
LEFT OUTER JOIN esri ON post.tribe_name = esri.tribe_name;

5. Why is PostGIS so slow? It's not! Read on...
Well...12 minutes slower is simply not acceptable...so what is the reason? I decided to look at the tile size on the raster. My initial thought was, the tighter the tiles, the better the spatial index, so I set it at 100x100 or roughly 9 kilometers ((90cell sizex100tile size)/1000meters). Thinking on it, sure my ST_Intersects() was fast, but my clip was choking on all the individual tiles. So I loaded in the raster again, this time at 1000x1000tile size. Ran the query again and que bueno! New runtime for PostGIS is 1 1/2 minutes.

Lesson learned: think about tile size and the analysis you'll be performing before you load your raster.

Thursday, November 1, 2012

PostGIS - Majority Point Value in Polygon

Here I present a simple method for determining the majority point value within each polygon in PostGIS. This method utilizes the postgres Window function to rank order the count of point values within each polygon.  




--Majority Count of Point Values
SELECT
  
i.*

FROM(
SELECT
    
polygons.gid
  
,points.value
  
,COUNT(*) AS point_count
  
,RANK() OVER(PARTITION BY polygons.gid ORDER BY COUNT(*) DESC) AS point_value_rank

FROM polygons, points
WHERE ST_Intersects(polygons.geom, points.geom)
GROUP BY polygons.gid, points.value ) i
WHERE i.point_value_rank = 1;  

Oh, and by the way. My SQL has always look terrible because I usually get lazy halfway through edited the color etc (apologies). But now I have come across SQL "Prettify". Neat.

PostgreSQL - Copy Table to Another Server


My previous workflow for copying tables from one server to another was (1) pg_dump on server "A" then (2) SCP to server "B" then (3) restore via psql. Not a terrible workflow, but my new workflow is much more eloquent. 

First, I login to server B (my new home for the data), then pg_dump with connection params set to serverA (current location of data). Then pipe directly to psql with connection params set to serverB a.k.a. localhost. 

pg_dump -t schema.table -h serverA -U username databaseA | psql -h localhost -U username databaseB;

An important side note:

If your postgres databases are of different vintages, you should always pg_dump using the latest vintage i.e. If serverB is 9.1 and serverA is 8.4, dump using serverB. pg_* are usually backwards compatible, but not always forwards compatible.