Sunday, February 12, 2012

PostgreSQL - Generate Series

Generate Series (PostgreSQL Doc) is a powerful uncomplicated function to have in your tool-belt. Straight from the PostgreSQL help docs, generate series, generates a series from a start and stop value.

i.e.
SELECT * FROM generate_series(1,3) AS series
series
_ _ _ _ _ 
1
2
3


So, how you ask can you use this? The other week I was processing weather files in an hourly PV simulation model via Python and passing results to a PostgreSQL table. I had ~2.4million weather files (100,000 unique spatial locations * 8 years * 3 different system types) to process but was sure some would fail, but I did not want to have to restart the process every time a file failed so I threw everything in a try, except block. With the processing finished I needed to check what finished and what failed, so I needed a table with all possible outcomes to join my results to - in came generate_series.

--Create table of all possible outcomes
CREATE TABLE unique_outcomes AS
SELECT
    gid AS spatial_location_id,
    generate_series(1998,2005) AS years,
    generate_series(0,2) AS system
FROM
    weather_files

spatial_location_id, years, system
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
1, 1998, 0
1, 1998, 1
1, 1998, 2
1, 1999, 0
1, 2999, 1
...


Next step was to simply join my results to the unique_outcomes table and select null joined values. A simple and powerful function; and yes, I could have just saved my failed ids in my try, except block, but then I wouldn't have anything to post, ha.