Thursday, July 19, 2012

U.S. Renewable Energy Technical Potentials Report

Finally, finally, finally published! I've been working on this for some time and now, its finished. If your interested in using GIS to estimate renewable energy electric power potential, this report is for you.

See it on the NREL GIS webpage: http://www.nrel.gov/gis/re_potential.html

Sunday, July 15, 2012

PostGIS 2.0 - Intersect Raster and Polygon

I've just taken up the task to experiment with PostGIS 2.0 raster support. At this point in time, it seems that there is little documentation (by other folks) regarding raster analysis using PostGIS, so I figured I'd write down some of what I'm experiencing. First, raster support in PostGIS is HUGE!  I'm eternally thankful that ESRI is will no longer be my sole raster analysis tool (I know, there's more out there than just ESRI...).

The Data:
-One 10x10km grid cell from the NREL/Perez Direct Normal Irradiance grid
-90m Landscan Raster detailing population for greater Denver area from Oak Ridge National Lab

The Analysis Intent:
-Summarize population within grid cell, first using ESRI and then PostGIS and compare. Easy, no?

1. Load Some Data
First, loading the raster into PostGIS. This is pretty straight forward with one exception, tile size. What tile size do you use?  I would love to see theoretical or empirical analysis detailing the optimal tile size per given cell size and extent. Anyone? Until then, I'll guess.

-C : applies contraints, -M : vacuum analyze, -Y : use copy rather than insert, -t : tile size, -s : SRID, -U : username, -d : database
raster2pgsql -s 4326 -C -M -Y landscanden -t 100x100 ls_den | psql -U postgres -d postgis2

2. Get a Base Estimate
Get a base estimate using ESRI. Using Zonal Statistics I get the following:
Population: 207,578
Cell Count: 14,400

3. How Much Can We Explain?
If my results are different between ESRI and PostGIS, how much could be explained? Assume one program summarizes all border cells while the other does not. This will give us a percent difference we can easily explain. So, if I have a 10x10km grid cell as the zone, then I could fit 111 90m cells on each border for a total of 444 border cells. Square 111.9 to get 12,321 90m cells that could fit inside the 10x10km grid cell. Divide the border cells by the total 444/12,321*100 = 3.6%. We can explain 3.6% of the difference (if there is any) by just the border cells.

4. First Attempts at Raster Analysis in PostGIS
I thought I was thinking logically when I wrote this...should have just looked at the documentation first!
--Create a vectorized result of the geometric intersection between my raster layer and vector polygon
CREATE TABLE sum_pop AS
SELECT gid
,(gv).val AS population
,(gv).geom AS geom
FROM (
SELECT  a.gid
,ST_Intersection(b.rast, a.geom) gv
FROM perez_grid a, ls_den b
WHERE ST_Intersects(b.rast, a.geom) ) foo;

Spatial results looked good. Just needed to summarize
SELECT SUM(population), COUNT(*) AS cell_count FROM sum_pop;

Results:
Population: 193,492
Cell Count: 9,811
Population Percent Difference: ~7%

 Well...that doesn't look right...how do I explain the additional 3.4%? First things first, what is the deal with the count? Well, that one is pretty easy, when the raster is vectorized, it groups cells of like values. So count doesn't really matter.

5. A Hunch or Swing in the Dark?
Ok, not sure why I decided to try this, but I loaded in the exact same raster dataset except for this time I used 200x200 tile size. Shouldn't make a difference, right? WRONG!!! This still vexes me, if anyone can explain I would be grateful.

Results:
Population: 193,446
Count: 9,787

6. The Correct Method
After reading the documentation, I have a plan of attack. I'll follow their example and see what I get.
CREATE TABLE sum_pop2 AS
WITH 
   feat AS (SELECT gid, geom FROM perez_grid AS b ),
   b_stats AS
(SELECT  gid, (stats).*
FROM (
SELECT gid, ST_SummaryStats(ST_Clip(rast,1,geom)) AS stats
FROM ls_den
INNER JOIN feat
ON ST_Intersects(feat.geom,rast) ) AS foo )
SELECT gid, SUM(count) AS cell_count
  ,SUM(sum) AS population
FROM b_stats
 WHERE count > 0
GROUP BY gid
ORDER BY gid;

Results:
Population: 207,578
Cell Count: 14,400
Population Percent Difference: 0%

SUCCESS, however, I'm not sure why examples must be so verbose!  It is a beautiful query don't get me wrong...

7. Discussion
So whats difference between this amd my first attempt?
The WITH clause simply allows you to join queries, you could right this as a query with a subquery and get the same results. Using ST_Clip(), the raster is never converted to vector, the vector is used to clip the raster than a simple SummaryStats() is performed. That's the bulk of it. The GROUP BY clause aggregates all the rids (aka tiles).
The BIG question is, why does the first attempt produce the wrong answer? The intersection produces geomval pairs for only those cells that intersect, so why the difference? I would have guessed that it would have resulted in more population because it grabs even the slightest sliver that intersects (its vector on vector) while ESRI grabs only cells whose centroid falls within the intersect, but PostGIS resulted in less! 

8. A Simple Alternative to the Correct Method
For those that find the above query verbose and hard to understand, I'll provide an alternative query that produces the same result in similar return time.

CREATE TABLE sum_pop3 AS 
SELECT gid, SUM((ST_SummaryStats(ST_Clip(rast,1,geom))).sum)
FROM perez_grid, ls_den
WHERE ST_Intersects(geom,rast) 
GROUP BY gid