1. Validate the table/column identities
1.1. Preparation
-
TODO turn YYYY into params before running this again
-
download the tableshells Example: https://www2.census.gov/programs-surveys/acs/summary_file/2019/documentation/user_tools/ACS2019_Table_Shells .xlsx[ACS2019_Table_Shells]
-
convert them to csv files Open in excel and save as UTF-8 CSV
-
scp over to the production servers
scp ACS2019_TableShells.csv prodserver:/path/to/workdir
-
remove any windows text formatting with:
dos2unix filename
-
login to database
1.2. Create schema and tableshells
CREATE SCHEMA acs;
CREATE TABLE acs.tableshells2018 (
TableID TEXT,
Line TEXT,
UniqueID TEXT,
Stub TEXT,
DataRelease TEXT
);
CREATE TABLE acs.tableshells2019 (
TableID TEXT,
Line TEXT,
UniqueID TEXT,
Stub TEXT,
DataRelease TEXT
);
1.3. Load data
\copy acs.tableshells2018 FROM 'ACS2018_Table_Shells_fix.csv' DELIMITER ',' CSV HEADER
\copy acs.tableshells2019 FROM 'ACS2019_Table_Shells_fix.csv' DELIMITER ',' CSV HEADER
1.4. Drop all with data with a null uniqueid
DELETE FROM acs.tableshells2018 WHERE uniqueid IS NULL;
DELETE FROM acs.tableshells2019 WHERE uniqueid IS NULL;
1.5. Check differences in EM Lookups
SELECT l.tableid, l.uniqueid, l.stub, r.stub
FROM acs.tableshells2017 l
LEFT JOIN acs.tableshells2018 r
ON l.uniqueid = r.uniqueid
WHERE l.tableid in ('B02001', 'B22010', 'B25034', 'B09020', 'C17002', 'B11002', 'B25040', 'B25024', 'B25044', 'B11007', 'C21007', 'B16004', 'B19059', 'B17021', 'B19055', 'B19057', 'B19056', 'B19051', 'B03003', 'B25082') AND
l.stub <> r.stub;
Check that the output here returns 0 rows. Also, consider removing the stub <> comparison to test the join. |
1.6. Clear tables
TRUNCATE acs.tableshells2017;
TRUNCATE acs.tableshells2018;
1.7. Reload the data
\copy acs.tableshells2018 FROM 'ACS2018_Table_Shells_fix.csv' DELIMITER ',' CSV HEADER
\copy acs.tableshells2019 FROM 'ACS2019_Table_Shells_fix.csv' DELIMITER ',' CSV HEADER
1.8. Drop rows with no tableid
DELETE FROM acs.tableshells2018 WHERE tableid IS NULL;
DELETE FROM acs.tableshells2019 WHERE tableid IS NULL;
1.9. Drop all data with uniqueid (to get table info)
DELETE FROM acs.tableshells2018 WHERE uniqueid IS NOT NULL;
DELETE FROM acs.tableshells2019 WHERE uniqueid IS NOT NULL;
1.10. Drop rows where text says "Universe:"
DELETE FROM acs.tableshells2018 WHERE stub LIKE '%Universe:%';
DELETE FROM acs.tableshells2019 WHERE stub LIKE '%Universe:%';
1.11. Drop where text is blank
DELETE FROM acs.tableshells2018 WHERE stub IS NULL;
DELETE FROM acs.tableshells2019 WHERE stub IS NULL;
1.12. Test that the tablenames haven’t changed
SELECT l.tableid, l.stub, r.stub
FROM acs.tableshells2017 l
LEFT JOIN acs.tableshells2018 r
ON l.tableid = r.tableid
WHERE l.tableid in ('B02001', 'B22010', 'B25034', 'B09020', 'C17002', 'B11002', 'B25040', 'B25024', 'B25044', 'B11007', 'C21007', 'B16004', 'B19059', 'B17021', 'B19055', 'B19057', 'B19056', 'B19051', 'B03003', 'B25082') AND l.stub <> r.stub;
Check that the output here returns 0 rows. Also, consider removing the stub <> comparison to test the join. |
2. Validate the Sequence and Offset Values
ACS Produces a lookup table for getting seq and offset. The 2017 5-year sheet can be found [here](https://www2.census.gov/programs-surveys/acs/summary_file/2017/documentation/user_tools/ACS_5yr_Seq_Table_Number_Lookup.xls).
2.1. Validate table below vs ACS excel sheet
See the planio tickets for the most recent version of this table. The one below is an example.
Table | Sequence | Start Positon |
---|---|---|
B02001 |
4 |
7 |
B22010 |
75 |
121 |
B25034 |
114 |
7 |
B09020 |
34 |
114 |
C17002 |
50 |
219 |
B11002 |
36 |
97 |
B25040 |
114 |
63 |
B25024 |
113 |
65 |
B25044 |
114 |
114 |
B11007 |
37 |
61 |
C21007 |
74 |
16 |
B16004 |
46 |
7 |
B19059 |
63 |
110 |
B17021 |
55 |
58 |
B19055 |
63 |
98 |
B19057 |
63 |
104 |
B19056 |
63 |
101 |
B19051 |
63 |
86 |
B03003 |
5 |
59 |
B25082 |
116 |
28 |
At this point the ACS Tables are valid and ready for loading in EM |