1. Overview

This process is nearly identical to the OnTheMap process as the requirements are very similar. To reduce code complexity the full set of LODES geography is used in the EM process but deleted at the end. In this way there are less chances for divergence in handling the data. These OTM attributes are dropped from the output data during export. EM specific instructions are documented below.

2. Set environment vars

TIGER_YEAR=YYYY

3. Unpack Faces Data

runtime: 10 minutes

Unpack faces data needed for xwalk production. The full set of faces and facesah are used in creating the block crosswalk files. This step loads the raw data into the database.

mkdir ~/workspace/postgis_tiger/otm_tiger_current
cd ~/workspace/postgis_tiger/otm_tiger_current
ln -s /data/production/prod/current/extiger/us/tl_${TIGER_YEAR}*faces*gz .
time gunzip *.gz -f
cd ..

4. Set environment variables

This file defines a standard set of environment variables used by the geography processing modules. In order to keep distinct vintages separate there exists a schema prefix which will be appended to the beginning of the postgresql schema. For example, if 'otm2018' is set as the schema prefix, data will be stored in 'otm2018public', 'otm2018final', 'otm2018names' and so on. Set the new schema prefix and environment variables prior to running any new vintage as it will drop/overwrite existing schemas and tables.

vi 00_setenvvars.bash

5. Load faces

runtime: 3 hours

This program loads the faces or facesah shape and dbf files into the prefixed public postgres schema. All files in the provided directory that match the form of faces.shp or facesah.dbf will be loaded.

time ./01_facesloader.bash /path/to/files faces

6. Format faces for crosswalking

runtime: 3.5 hours

This program takes the raw faces data in the prefixed public schema and formats the fields into crosswalkable identifiers. Fields are given 9 flags and state fips codes if necessary.

time ./02_xwalk.bash faces

7. Unpack shapefile data

runtime: 10 minutes

Unpack the shape data needed for processing the full suite of polygons and crosswalks. As of this writing the needed tables are identical to those required for OnTheMap although this can change and be trimmed in future iterations.

cd ~/workspace/postgis_tiger/otm_tiger_current
ln -s /data/production/prod/current/extiger/us/tl_${TIGER_YEAR}*{aiannh,aitsn,anrc,arealm,bg,cbsa,cd116,county,cousub,
elsd,mil,
necta,place,pointlm,rails,scsd,sldl,sldu,state,tabblock10,tract,unsd,zcta510}*gz .
time gunzip *.gz -f
cd ..

8. Load shp→ db

runtime: 30 minutes

This program loads TIGER shapefile into the public schema. Providing a path (/path otm) and type will load all application files expected for that type. Providing a path and single layer will load a single layer.

time ./04_shapeloader.bash otm_tiger_current otm

9. Get WIB csv

unpack the wib csv data

mkdir wib_csvs
cd wib_csvs
ln -s /data/production/prod/current/exwib/us/*${TIGER_YEAR}.csv.gz .
gunzip *.gz -f
chmod u+w *.csv
cd ..

10. Load WIB Defs

runtime: 10 seconds

This program loads the wib csv definitions into the working.wib_definitions table. This process will also correct expected encoding issues in the wib csv definitions.

time ./05_wibcsvloader.bash path/to/wib_csvs/

11. Create public WIBs

runtime: 1 minute

This program creates a wib table of the component parts from county, county sub, and place tables in the prefixed public schema. A scratch table called working.wib_geometry is also used.

time ./06_wibshape.bash wib

12. Validate Schemas

This program validates the table schemas in postgres vs the csv schemas in the ./schemas directory. The postgis tables checked are in the otmgeo database. The schemas verified are of raw tables only - those loaded in the public schema. Check that the output lists the expected tables as "validated". Any errors that occur here will require manual adjustment of downstream processes.

./07_checkschemas_qa.bash [otm|qwi]

13. Project Geography

runtime: process dependent

This program projects tables in the public schema into new tables in the working schema. All non-geomtry data is copied over unchanged. A set of predefined tables and projections are associated with the 'otm' and 'qwi' keywords but single layers can be projected as needed. See the program help instructions for more info.

time ./08_project.bash [otm|qwi]

14. Remove undefined

runtime: 10 seconds

This program removes undefined congressional districts, state legislative districts (upper and lower), and county subdivisions

time ./09_removeundefined.bash all

15. Remove unpopulated

runtime: 6 minutes

This program removes unpopulated and unused areas from the geography tables, either all tables or individual. Guam, American Samoa, and the volcanic Hawiian chain are removed. A working.invalid table is created that contains all of the geographies to be deleted. This table persists so it can be used to delete elements from the tables that created it.

The block definitions for the AK islands is now required. Located at: ./input_data/ak_island_blocks.csv

time ./10_removeunpopulated.bash otm

16. Create Schools

runtime: 20 seconds

This program creates the schools layer from the unsd and elsd tables in the prefixed working schema

./11_createschools.bash schools

17. Format names

runtime: 1 second

This program adjusts names on individual layers. The current iteration only makes adjustments to the mil layer.

./12_formatnames.bash mil

18. Crosswalk

runtime: 3 days

This program creates the takes the faces data and chooses the centermost face on a block to use for an internal point. The result data, block_xwalk_intpt in the prefixed working schema, represents the block level data. All non-centermost faces have been deleted in this file.

time ./13_intptxwalk.bash xwalk

runtime: 20 seconds

19. CD Block Splits

The crosswalk assigns higher level geographies based on the centermost face. Congressional districts get special handling because of their political import. Geo produces a pdf file that specifies a handful of assignments that we use wholesale.
DO NOT USE THE FULL CSV FILE OF BLOCK ASSIGNMENTS

Verify that this is the correct congress session, then download the CD116 Block Splits pdf file and convert to a csv of the form:

State Name State Code County Name County Code Tract Code Block CD’s within Block Data Tabulated to CD

Kentucky

21

Harrison

097

950100

1068

4,6

4

Kentucky

21

Harrison

097

950100

1070

4,6

4

Kentucky

21

Harrison

097

950400

2024

4,6

6

Kentucky

21

Harrison

097

950400

2032

4,6

6

Kentucky

21

Jefferson

111

010316

1013

3,4

4

Use the following commands to load and patch the crosswalk with the updated congressional districts.

time ./14_assigncdxwalk.bash path/to/CD.csv

20. Create the formatted names

Create the formatted names for all OnTheMap layers.

runtime: 14 hrs

time ./15_createnames.bash otm

21. Assign exoids to layers

Take the exoids from the crosswalk and update the references in each layer table

runtime: 45 minutes

time ./16_assignexoid.bash all

22. Generate water mask

runtime: 20 minutes

Generate the coastal mask from the difference between the CBF and TIGER.

The CBF layer is hardcoded as it’s not released on the same schedule as tiger. Verify the most recent CBF release and update the year reference in the program below if necessary.
time ./17_generatecbfwatermask.bash otm

23. Finalize the layer tables

runtime: 1 day

Create a final version of all polygon layer tables in the expected layout.

time ./18_finalizegeo.bash em

24. Export the Crosswalk CSV

runtime: 1.5 hours

Create a directory to store the final crosswalk csv and export all relevant columns. There are a few different versions of the crosswalk available. Choose as needed.

time ./19_exportxwalks.bash /some/folder acs_xwalk

25. Export shapefiles

runtime: 15 minutes

Export all the shapefiles for EM needed by the acs_xwalk. This process generates checksums and gzips the output files.

time ./24_export.bash /some/folder em

26. Deliver files to Excensus dropbox

Export all the shapefiles for EM needed by the acs_xwalk. This process generates checksums and gzips the output files.

rsync -avz -e "ssh -A dev ssh" ./src lehd@treehorn2:/dst

27. Set table ownership

Once all layers are complete, hand ownership of tables over to the otmgeo user

./00_setpermissions.bash