4.3. covid19_stats.engine.gis module
The meat of the functionality: this module generates post-processed GIS data describing the territorial units (counties, etc.) of the United States. Territorial unit information includes their identifiers, latitude and longitude boundary arrays as of 2018, and their estimated population in 2019.
- covid19_stats.engine.gis.calculate_total_bbox(shapes)
This gets the bounding box (minimum and maximum latitude, and minimum and maximum longitude) of a list of shapes. For example, take the Sacramento, CA, metropolitan statistical area. It consists of four counties: El Dorado County, Placer County, Sacramento County, and Yolo County. Fig. 4.2 demonstrates how this algorithm works. It takes the bounding box of each county's shape (shown in pink), and then takes the minimum and maximum latitude and longitude over all of those county bboxes to get the total bounding box (in green).
Fig. 4.2 Demonstration of this functionality on the four counties in the Sacramento, CA MSA. In green is the total bounding box of the lat/longitude shapes of all four counties.
Now here is the API description.
- Parameters:
shapes (list) – A list of shapes. Each shape is an \(N \times 2\) shaped array, with \(N\) points describing the boundary. Each row is the latitude and longitude of a point: first is the latitude, and second is the longitude. See gis_calculate_total_bbox_sacramento.pkl.gz for a clear example given by Fig. 4.2.
- Returns:
a four element tuple of the total bounding box of the shape collection: minimum longitude, minimum latitude, maximum longitude, and maximum latitude.
- Return type:
tuple
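The two-stage procedure illustrated in Fig. 4.2 (per-shape boxes, then the extremes over all boxes) can be sketched in a few lines of NumPy. This is a minimal sketch, not the package's source: the name total_bbox is hypothetical, and it simply follows the conventions stated above, reading column 0 as latitude and column 1 as longitude and returning (min longitude, min latitude, max longitude, max latitude).

```python
import numpy as np

def total_bbox(shapes):
    """Hypothetical sketch of the total bounding box of a list of (N, 2) arrays.

    Follows the documented conventions: column 0 is latitude, column 1 is
    longitude; the result is (min_lng, min_lat, max_lng, max_lat).
    """
    # One row per shape: that shape's own bounding box (the pink boxes).
    boxes = np.array([
        [shape[:, 1].min(), shape[:, 0].min(), shape[:, 1].max(), shape[:, 0].max()]
        for shape in shapes
    ])
    # The total bounding box (the green box) is the extreme of each column.
    return (float(boxes[:, 0].min()), float(boxes[:, 1].min()),
            float(boxes[:, 2].max()), float(boxes[:, 3].max()))
```

For two small made-up shapes, `total_bbox([np.array([[38.0, -121.5], [38.5, -121.0]]), np.array([[38.7, -120.5], [39.0, -120.0]])])` gives `(-121.5, 38.0, -120.0, 39.0)`.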
- covid19_stats.engine.gis.construct_adjacency(fips_data, filename='/usr/WS2/islam5/covid19_stats/covid19_stats/resources/fips_2019_adj.pkl.gz')
Creates, and then stores (or loads), the adjacency dictionary of all US counties and territorial units. If the storage file, which is by default fips_2019_adj.pkl.gz, does not exist, then this function creates the data and stores it into the storage file. In either case, it returns the data.
- Parameters:
fips_data (dict) – the US county dict produced by, for example, create_and_store_fips_2019.
filename (str) – the location of the adjacency dictionary file, which is by default fips_2019_adj.pkl.gz located in the covid19_stats resource directory.
- Returns:
a dict of adjacency. Each key is a FIPS code of a county, and each value is a set of counties and other territories adjacent to it. See get_fips_adjacency for an example of this adjacency information for a single county.
- Return type:
dict
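construct_adjacency, like most functions in this module, follows a "create and store, or load" pattern around a .pkl.gz resource file. Judging only from the file suffix, these appear to be gzip-compressed pickles. The helper below, load_or_create, is hypothetical and not part of covid19_stats; it is a minimal sketch of that pattern using the standard gzip and pickle modules.

```python
import gzip
import os
import pickle

def load_or_create(filename, create):
    """Hypothetical helper: load a gzipped pickle if present, else build and store it.

    `create` is a zero-argument callable that computes the data from scratch.
    """
    if os.path.exists(filename):
        # Fast path: reuse the serialized object.
        with gzip.open(filename, "rb") as fh:
            return pickle.load(fh)
    # Slow path: compute once, then cache to disk for the next call.
    data = create()
    with gzip.open(filename, "wb") as fh:
        pickle.dump(data, fh)
    return data
```

Only the first call pays the cost of `create()`; later calls with the same `filename` read the cached object back.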
- covid19_stats.engine.gis.create_and_store_fips_2019()
Utility function that loads in the US Census 2018 county information, located in cb_2019_us_county_500k.shp as a collection of Shapefiles, and returns a dict of county information. If there is no serialized version of this dictionary, this method also serializes the data structure into fips_2019_data.pkl.gz for easy reloading. Subsequently, if fips_2019_data.pkl.gz exists, then it loads that file and returns that object.
- Returns:
a dict of US county geographic data. The key is the FIPS code for the county. Each value is a dict, in which bbox is the lat/lng bounding box for that county and points is a list of shapes for that county. Each shape is an \(N \times 2\) shaped array, with \(N\) points describing the boundary. Each row is the latitude and longitude of a point: first is the latitude, and second is the longitude.
This method uses shapefile.Reader to load in cb_2019_us_county_500k.shp if fips_2019_data.pkl.gz does not exist.
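A county made of several disjoint pieces (islands, for instance) is stored in a shapefile as one record with several parts, which is presumably why points is a list of shapes. The sketch below assumes pyshp's Shape.points (a flat list of (x, y) pairs) and Shape.parts (the starting index of each part) as inputs; split_parts is a hypothetical name, not a covid19_stats function. Note that pyshp stores (x, y), i.e. (longitude, latitude) for geographic data, so matching the latitude-first layout documented above would additionally need a column swap such as `arr[:, ::-1]`.

```python
import numpy as np

def split_parts(points, parts):
    """Hypothetical: split one shapefile record into an (N, 2) array per part.

    points: flat sequence of (x, y) pairs, as in pyshp's Shape.points
    parts:  index of the first point of each part, as in pyshp's Shape.parts
    """
    # The last part runs to the end of the point list.
    bounds = list(parts) + [len(points)]
    return [np.asarray(points[start:end], dtype=float)
            for start, end in zip(bounds[:-1], bounds[1:])]
```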
- covid19_stats.engine.gis.create_and_store_fips_counties_2019()
- Returns:
a two element tuple. The first element is a dict of FIPS code to a dict value with keys county and state. The second element is the reverse dict of a tuple (of county and state) to its FIPS code. The first dict is stored in all_2019_fips_cs_dict.pkl.gz, and the second dict is stored in all_2019_cs_fips_dict.pkl.gz. If either file does not exist, then the dictionary is created and stored into the appropriate file.
If the file exists, then the object is loaded from that file.
- Return type:
tuple
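The second dict is just the inverse of the first, keyed by the (county, state) pair. A minimal sketch of that inversion, assuming the first dict has the shape described above; reverse_fips_dict is a hypothetical name, and the county and state strings in the usage line are illustrative only, since the exact spelling used by the package is not documented here.

```python
def reverse_fips_dict(fips_cs):
    """Hypothetical: invert {fips: {'county': c, 'state': s}} into {(c, s): fips}.

    Assumes each (county, state) pair is unique, so no entries collide.
    """
    return {(value["county"], value["state"]): fips
            for fips, value in fips_cs.items()}
```

For example, `reverse_fips_dict({'06067': {'county': 'Sacramento', 'state': 'California'}})` gives `{('Sacramento', 'California'): '06067'}`.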
- covid19_stats.engine.gis.create_and_store_msas_and_fips_2019()
This returns a fully normalized dict of MSAs consistent with the NY Times COVID-19 database. It also stores this data into the file msa_2019_dict.pkl.gz if it does not exist. If it does exist, then it loads the file msa_2019_dict.pkl.gz and returns that data. It will also dump the normalized list of MSA data into msa_2019_post.pkl.gz.
This method does four things:
merges San Francisco, San Jose, and Napa MSAs into the SF Bay Area.
merges NYC into the NYC metro area.
renames Washington, DC to the DC metro area.
merges Los Angeles, Riverside, and Oxnard MSAs into the “Los Angeles” metro area (greater Los Angeles).
- Returns:
a dict of MSA information. The key is the MSA data prefix, and the value is a dict of prefix, region name, fips, and population. For example, for St. Louis, it is
{'stlouis': {'prefix': 'stlouis', 'region name': 'St. Louis Metro Area', 'fips': {'17005', '17013', '17027', '17083', '17117', '17119', '17133', '17163', '29071', '29099', '29113', '29183', '29189', '29219', '29510'}, 'population': 2803228}}
- Return type:
dict
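Because each value carries a fips set, the returned dict can be inverted into a county-to-MSA lookup. This is a minimal sketch assuming the layout shown in the St. Louis example and assuming each county belongs to at most one MSA; fips_to_msa_prefix is a hypothetical helper, not part of the package.

```python
def fips_to_msa_prefix(msa_dict):
    """Hypothetical: map each county FIPS code to the prefix of its MSA."""
    lookup = {}
    for prefix, info in msa_dict.items():
        for fips in info["fips"]:
            # If a county appeared in two MSAs, the later one would win here.
            lookup[fips] = prefix
    return lookup
```

With the St. Louis entry above, `fips_to_msa_prefix(...)['29510']` is `'stlouis'`, and the lookup has 15 entries, one per county in that MSA.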
- covid19_stats.engine.gis.create_fips_popmap_2019()
Creates a dict of the estimated 2019 US Census population in each US county or territory. It also stores this data into the file fips_2019_popdict.pkl.gz if it does not exist. If it does exist, then it loads the file fips_2019_popdict.pkl.gz and returns that data.
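One natural use of this map is to total the population of a group of counties, such as the fips set of an MSA. The sketch below assumes the map is keyed by FIPS code strings with integer populations, which is not spelled out above; region_population is a hypothetical helper, and the numbers in the usage line are made up.

```python
def region_population(fips_codes, popmap):
    """Hypothetical: sum the per-county populations over a set of FIPS codes.

    Raises KeyError if a code is missing from popmap, rather than silently
    treating it as zero.
    """
    return sum(popmap[fips] for fips in fips_codes)
```

For example, with a toy map `{'06067': 100, '06061': 50, '06017': 7}`, `region_population({'06067', '06061'}, ...)` is `150`.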
- covid19_stats.engine.gis.create_msa_2019()
Creates and returns a raw, unnormalized list of Metropolitan statistical areas initially recorded in msa_2019.csv, sorted by population from smallest to largest, and stores the object into msa_2019.pkl.gz if it does not exist. If msa_2019.pkl.gz exists, then it loads this file and returns the stored object.
Each entry in the list looks like this. For example, for the St. Louis, MO MSA,
{'msa': 41180, 'pop est 2019': 2803228, 'fips': {'17005', '17013', '17027', '17083', '17117', '17119', '17133', '17163', '29071', '29099', '29113', '29183', '29189', '29219', '29510'}, 'state': 'MO-IL', 'RNAME': 'St. Louis', 'prefix': 'stlouis', 'region name': 'St. Louis Metro Area'}
The keys for each MSA are: msa, an integer code; pop est 2019, the US Census 2019 estimated population; fips, a set of counties by FIPS code located in this MSA; state, the states this MSA covers; RNAME, a legend name for plotting; prefix, the name used to identify those files that contain data for this MSA; and region name, the common and accepted MSA name. create_and_store_msas_and_fips_2019 returns the fully normalized dict of Metropolitan statistical areas used by the NY Times COVID-19 database, and merge_msas performs the normalization.
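The ordering and the prefix key described above can be reproduced on any list of entries in this format. A minimal sketch: sort_by_population and index_by_prefix are hypothetical helpers, and the second entry in the usage line is an invented toy MSA.

```python
def sort_by_population(msas):
    """Hypothetical: order MSA entries from smallest to largest 'pop est 2019'."""
    return sorted(msas, key=lambda entry: entry["pop est 2019"])

def index_by_prefix(msas):
    """Hypothetical: look up MSA entries by their 'prefix' key."""
    return {entry["prefix"]: entry for entry in msas}
```

Given the St. Louis entry above and a toy entry `{'prefix': 'toy', 'pop est 2019': 1000}`, `sort_by_population` puts the toy entry first, and `index_by_prefix(...)['stlouis']['msa']` is `41180`.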
- covid19_stats.engine.gis.do_bbox_intersect(bbox1, bbox2)
Checks if two bounding boxes intersect.
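Two axis-aligned boxes intersect exactly when their longitude intervals overlap and their latitude intervals overlap. A minimal sketch, assuming the (min longitude, min latitude, max longitude, max latitude) order returned by calculate_total_bbox; bboxes_overlap is a hypothetical name. Whether boxes that merely touch should count is a design choice. This sketch counts them, since adjacent counties share edges.

```python
def bboxes_overlap(bbox1, bbox2):
    """Hypothetical: True if two (min_lng, min_lat, max_lng, max_lat) boxes intersect.

    Boxes that only touch along an edge or at a corner count as intersecting.
    """
    min_x1, min_y1, max_x1, max_y1 = bbox1
    min_x2, min_y2, max_x2, max_y2 = bbox2
    # Overlap on both axes is required; a gap on either axis means no intersection.
    return (min_x1 <= max_x2 and min_x2 <= max_x1 and
            min_y1 <= max_y2 and min_y2 <= max_y1)
```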
- covid19_stats.engine.gis.get_fips_adjacency(fips, fips_data)
Finds the FIPS code of all counties adjacent to a specified county. For example, Sacramento County, with FIPS code of 06067, has eight counties adjacent to it: 06005, 06013, 06017, 06061, 06077, 06095, 06101, 06113. Fig. 4.3 demonstrates that.
Fig. 4.3 Visualization of the eight counties adjacent to Sacramento County (06067). Sacramento County is accentuated for easier visualization here.
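The text above does not say how adjacency is determined. One plausible approach, sketched here purely as an assumption, is to prefilter candidates with a bounding-box test and then call two counties adjacent if their boundaries share at least one vertex, on the assumption that neighboring counties in the boundary data share boundary points. adjacent_fips and its helpers are hypothetical names; fips_data is assumed to have the bbox/points layout returned by create_and_store_fips_2019.

```python
def _boxes_touch(b1, b2):
    # Boxes are (min_lng, min_lat, max_lng, max_lat); touching counts as overlap.
    return b1[0] <= b2[2] and b2[0] <= b1[2] and b1[1] <= b2[3] and b2[1] <= b1[3]

def _vertices(shapes, ndigits):
    # Round coordinates so tiny floating-point differences don't hide shared points.
    return {(round(float(a), ndigits), round(float(b), ndigits))
            for shape in shapes for a, b in shape}

def adjacent_fips(fips, fips_data, ndigits=6):
    """Hypothetical: FIPS codes whose boundary shares a vertex with county `fips`."""
    target = fips_data[fips]
    target_vertices = _vertices(target["points"], ndigits)
    return {other for other, entry in fips_data.items()
            if other != fips
            and _boxes_touch(target["bbox"], entry["bbox"])  # cheap prefilter
            and not target_vertices.isdisjoint(_vertices(entry["points"], ndigits))}
```

The bounding-box prefilter keeps the comparatively expensive vertex comparison to nearby counties only.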
Now here is the API description.
- covid19_stats.engine.gis.merge_msas(regionName, prefix, msaids, all_data_msas)
This takes an input MSA, defined by its prefix, and gives it a new or existing regionName by merging into it one or more MSAs from the set msaids, within a list of MSAs as returned by, e.g., create_msa_2019. It then returns a new list of MSAs in the same format as all_data_msas.
This is used, for example, to normalize the MSA data by merging all five boroughs in NYC into a single fake county, NYC, in the New York City MSA.
- Parameters:
regionName (str) – the region name (region name key) of the merged MSA.
prefix (str) – the named identifier of the MSA to be merged.
msaids (set) – the collection of MSAs to be merged into the prefix MSA.
all_data_msas (list) – the input list of MSAs. Implicitly, all_data_msas must contain the MSA identified by prefix.
- Returns:
a new list of MSAs in the same style as all_data_msas, sorted by population from lowest to highest. This new collection does not contain any of the MSAs in msaids.
- Return type:
list
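The merge can be pictured as folding the absorbed MSAs' counties and populations into the prefix MSA, dropping the absorbed entries, and re-sorting. This is a minimal sketch, not the package's implementation: merge_msas_sketch is a hypothetical name, and it assumes that msaids holds the integer msa codes of entries shaped like the create_msa_2019 output.

```python
def merge_msas_sketch(region_name, prefix, msaids, all_data_msas):
    """Hypothetical: fold the MSAs whose 'msa' code is in msaids into the prefix MSA."""
    targets = [e for e in all_data_msas if e["prefix"] == prefix]
    if len(targets) != 1:
        raise ValueError(f"expected exactly one MSA with prefix {prefix!r}")
    target = targets[0]
    absorbed = [e for e in all_data_msas if e["msa"] in msaids and e is not target]
    # Copy so the input entries are left untouched.
    merged = dict(target)
    merged["region name"] = region_name
    merged["fips"] = set(target["fips"]).union(*(e["fips"] for e in absorbed))
    merged["pop est 2019"] = target["pop est 2019"] + sum(e["pop est 2019"] for e in absorbed)
    rest = [e for e in all_data_msas if e is not target and e["msa"] not in msaids]
    # Keep the documented ordering: smallest population first.
    return sorted(rest + [merged], key=lambda e: e["pop est 2019"])
```

Returning a new list, rather than mutating all_data_msas, matches the description above and lets several merges be chained one after another.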