Prod Geography Postgres & PGUnit Setup

1. Overview

This document overviews how to configure the lehd geoprocessing database. These steps were performed on RHEL7 using the Red Hat repositories.

2. Requirements

Postgres 9.2 and PostGIS 2.0 configured using the postgres user as the owner.
To continue these steps personal user will need to be able to sudo into the postgres user.
Individual users should also be able to access postgres and postgis executables such as psql and shp2pgsql.

3. Cluster Initialization

This process creates a cluster on disk - make sure there is an appropriate amount of space available.

# login to the server
ssh servername
# access the postgres cluster owner user
sudo su - postgres
# initialize a database
initdb -D /path/to/cluster/on/disk
# start the database
pg_ctl -D /path/to/cluster/on/disk start

4. Create DB and Enable PostGIS

Create a database within the new cluster.

createdb geoprocessing
psql -d geoprocessing

CREATE EXTENSION postgis;
GRANT SELECT, INSERT ON TABLE public.spatial_ref_sys TO public;
\q

5. Grant DB Access to User

# create a user in the database
createuser yourusername

# login as personal user
psql -d geoprocessing

Test postgis access is working correctly.

select * from spatial_ref_sys limit 1;

6. Insert the Albers North America Spatial Reference

Create the esri SRID reference following the EPSG definition.

INSERT into spatial_ref_sys (srid, auth_name, auth_srid, proj4text, srtext) values ( 102003, 'ESRI', 102003, '+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs ', 'PROJCS["USA_Contiguous_Albers_Equal_Area_Conic",GEOGCS["GCS_North_American_1983",DATUM["North_American_Datum_1983",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["False_Easting",0],PARAMETER["False_Northing",0],PARAMETER["longitude_of_center",-96],PARAMETER["Standard_Parallel_1",29.5],PARAMETER["Standard_Parallel_2",45.5],PARAMETER["latitude_of_center",37.5],UNIT["Meter",1],AUTHORITY["EPSG","102003"]]');
\q

Logout of the postgres user

exit

7. Download the Existing Geoprocessing Code

A future version of these docs will reference an internal git repo rather than this personal copy.

cd /path/to/location
git clone https://githubusername:githubaccesskey@github.com/jodyhoonstarr/lehd-geoprocessing.git

If run internally this needs some odd handling to get through the firewall. If auth fails when running clone then

GIT_CURL_VERBOSE=1 GIT_TRACE=1 git clone ...

Look at the 307 redirect output then open the url in browser.

Remove the git history since this currently references a personal repository.

cd lehd-geoprocessing
rm -rf .git

8. Allow Create Access

External users can’t access the db so allow any internal user create access.

sudo su - postgres
psql geoprocessing

grant create on database geoprocessing to public;
\q

Logout of the postgres user

exit

9. Set log level

sudo su - postgres
psql geoprocessing

ALTER DATABASE geoprocessing SET client_min_messages TO WARNING;
\q

10. PGUnit Setup

cd /path/to/location/lehd-geoprocessing
psql geoprocessing -f unit_testing_setup/pgunit_edited.sql
sudo su - postgres
psql geoprocessing

ALTER DATABASE geoprocessing SET search_path TO public, pgunit;
\q

11. PGTune Settings

The default configuration for postgres is inadequate for processing the data in a timely manner. Use the online PGTune utility to generate settings as needed. The available resources on the data cluster are enormous, but the postgres server will be disconnected from the typical job scheduler - as such it should play nice with the rest of the system resources. It is tenatively assumed that the behavior should match that of a moderate class single instance database server (12 cpu, 32G mem) and should behave like a data warehouse (dw).

sudo su - postgres
cd /path/to/postgres/data/

Modify postgresql.conf to include the following config changes.

# DB Version: 9.2
# OS Type: linux
# DB Type: mixed
# Total Memory (RAM): 264 GB
# Data Storage: san

max_connections = 100
shared_buffers = 66GB
effective_cache_size = 198GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 300
work_mem = 346030kB
checkpoint_segments = 32