1. Overview
This process is nearly identical to the OnTheMap process as the requirements are very similar. To reduce code complexity the full set of LODES geography is used in the EM process but deleted at the end. In this way there are less chances for divergence in handling the data. These OTM attributes are dropped from the output data during export. EM specific instructions are documented below.
2. Set environment vars
TIGER_YEAR=YYYY
3. Unpack Faces Data
runtime: 10 minutes
Unpack faces data needed for xwalk production. The full set of faces and facesah are used in creating the block crosswalk files. This step loads the raw data into the database.
mkdir ~/workspace/postgis_tiger/otm_tiger_current
cd ~/workspace/postgis_tiger/otm_tiger_current
ln -s /data/production/prod/current/extiger/us/tl_${TIGER_YEAR}*faces*gz .
time gunzip *.gz -f
cd ..
4. Set environment variables
This file defines a standard set of environment variables used by the geography processing modules. In order to keep distinct vintages separate there exists a schema prefix which will be appended to the beginning of the postgresql schema. For example, if 'otm2018' is set as the schema prefix, data will be stored in 'otm2018public', 'otm2018final', 'otm2018names' and so on. Set the new schema prefix and environment variables prior to running any new vintage as it will drop/overwrite existing schemas and tables.
vi 00_setenvvars.bash
5. Load faces
runtime: 3 hours
This program loads the faces or facesah shape and dbf files into the prefixed public postgres schema. All files in the provided directory that match the form of faces.shp or facesah.dbf will be loaded.
time ./01_facesloader.bash /path/to/files faces
6. Format faces for crosswalking
runtime: 3.5 hours
This program takes the raw faces data in the prefixed public schema and formats the fields into crosswalkable identifiers. Fields are given 9 flags and state fips codes if necessary.
time ./02_xwalk.bash faces
7. Unpack shapefile data
runtime: 10 minutes
Unpack the shape data needed for processing the full suite of polygons and crosswalks. As of this writing the needed tables are identical to those required for OnTheMap although this can change and be trimmed in future iterations.
cd ~/workspace/postgis_tiger/otm_tiger_current
ln -s /data/production/prod/current/extiger/us/tl_${TIGER_YEAR}*{aiannh,aitsn,anrc,arealm,bg,cbsa,cd116,county,cousub,
elsd,mil,
necta,place,pointlm,rails,scsd,sldl,sldu,state,tabblock10,tract,unsd,zcta510}*gz .
time gunzip *.gz -f
cd ..
8. Load shp→ db
runtime: 30 minutes
This program loads TIGER shapefile into the public schema. Providing a path (/path otm) and type will load all application files expected for that type. Providing a path and single layer will load a single layer.
time ./04_shapeloader.bash otm_tiger_current otm
9. Get WIB csv
unpack the wib csv data
mkdir wib_csvs
cd wib_csvs
ln -s /data/production/prod/current/exwib/us/*${TIGER_YEAR}.csv.gz .
gunzip *.gz -f
chmod u+w *.csv
cd ..
10. Load WIB Defs
runtime: 10 seconds
This program loads the wib csv definitions into the working.wib_definitions table. This process will also correct expected encoding issues in the wib csv definitions.
time ./05_wibcsvloader.bash path/to/wib_csvs/
11. Create public WIBs
runtime: 1 minute
This program creates a wib table of the component parts from county, county sub, and place tables in the prefixed public schema. A scratch table called working.wib_geometry is also used.
time ./06_wibshape.bash wib
12. Validate Schemas
This program validates the table schemas in postgres vs the csv schemas in the ./schemas directory. The postgis tables checked are in the otmgeo database. The schemas verified are of raw tables only - those loaded in the public schema. Check that the output lists the expected tables as "validated". Any errors that occur here will require manual adjustment of downstream processes.
./07_checkschemas_qa.bash [otm|qwi]
13. Project Geography
runtime: process dependent
This program projects tables in the public schema into new tables in the working schema. All non-geomtry data is copied over unchanged. A set of predefined tables and projections are associated with the 'otm' and 'qwi' keywords but single layers can be projected as needed. See the program help instructions for more info.
time ./08_project.bash [otm|qwi]
14. Remove undefined
runtime: 10 seconds
This program removes undefined congressional districts, state legislative districts (upper and lower), and county subdivisions
time ./09_removeundefined.bash all
15. Remove unpopulated
runtime: 6 minutes
This program removes unpopulated and unused areas from the geography tables, either all tables or individual. Guam, American Samoa, and the volcanic Hawiian chain are removed. A working.invalid table is created that contains all of the geographies to be deleted. This table persists so it can be used to delete elements from the tables that created it.
The block definitions for the AK islands is now required. Located at: ./input_data/ak_island_blocks.csv
time ./10_removeunpopulated.bash otm
16. Create Schools
runtime: 20 seconds
This program creates the schools layer from the unsd and elsd tables in the prefixed working schema
./11_createschools.bash schools
17. Format names
runtime: 1 second
This program adjusts names on individual layers. The current iteration only makes adjustments to the mil layer.
./12_formatnames.bash mil
18. Crosswalk
runtime: 3 days
This program creates the takes the faces data and chooses the centermost face on a block to use for an internal point. The result data, block_xwalk_intpt in the prefixed working schema, represents the block level data. All non-centermost faces have been deleted in this file.
time ./13_intptxwalk.bash xwalk
runtime: 20 seconds
19. CD Block Splits
The crosswalk assigns higher level geographies based on the centermost face. Congressional districts get special handling because of their political import. Geo produces a pdf file that specifies a handful of assignments that we use wholesale. |
DO NOT USE THE FULL CSV FILE OF BLOCK ASSIGNMENTS |
Verify that this is the correct congress session, then download the CD116 Block Splits pdf file and convert to a csv of the form:
State Name | State Code | County Name | County Code | Tract Code | Block | CD’s within Block | Data Tabulated to CD |
---|---|---|---|---|---|---|---|
Kentucky |
21 |
Harrison |
097 |
950100 |
1068 |
4,6 |
4 |
Kentucky |
21 |
Harrison |
097 |
950100 |
1070 |
4,6 |
4 |
Kentucky |
21 |
Harrison |
097 |
950400 |
2024 |
4,6 |
6 |
Kentucky |
21 |
Harrison |
097 |
950400 |
2032 |
4,6 |
6 |
Kentucky |
21 |
Jefferson |
111 |
010316 |
1013 |
3,4 |
4 |
Use the following commands to load and patch the crosswalk with the updated congressional districts.
time ./14_assigncdxwalk.bash path/to/CD.csv
20. Create the formatted names
Create the formatted names for all OnTheMap layers.
runtime: 14 hrs
time ./15_createnames.bash otm
21. Assign exoids to layers
Take the exoids from the crosswalk and update the references in each layer table
runtime: 45 minutes
time ./16_assignexoid.bash all
22. Generate water mask
runtime: 20 minutes
Generate the coastal mask from the difference between the CBF and TIGER.
The CBF layer is hardcoded as it’s not released on the same schedule as tiger. Verify the most recent CBF release and update the year reference in the program below if necessary. |
time ./17_generatecbfwatermask.bash otm
23. Finalize the layer tables
runtime: 1 day
Create a final version of all polygon layer tables in the expected layout.
time ./18_finalizegeo.bash em
24. Export the Crosswalk CSV
runtime: 1.5 hours
Create a directory to store the final crosswalk csv and export all relevant columns. There are a few different versions of the crosswalk available. Choose as needed.
time ./19_exportxwalks.bash /some/folder acs_xwalk
25. Export shapefiles
runtime: 15 minutes
Export all the shapefiles for EM needed by the acs_xwalk. This process generates checksums and gzips the output files.
time ./24_export.bash /some/folder em
26. Deliver files to Excensus dropbox
Export all the shapefiles for EM needed by the acs_xwalk. This process generates checksums and gzips the output files.
rsync -avz -e "ssh -A dev ssh" ./src lehd@treehorn2:/dst
27. Set table ownership
Once all layers are complete, hand ownership of tables over to the otmgeo user
./00_setpermissions.bash