4.3. covid19_stats.engine.gis module
This module contains the core functionality: it generates post-processed GIS data describing the territorial units (counties, etc.) of the United States. Territorial unit information includes their identifiers, latitude and longitude boundary arrays as of 2018, and their estimated population in 2019.
- covid19_stats.engine.gis.calculate_total_bbox(shapes)
This gets the bounding box (the minimum and maximum latitude, and the minimum and maximum longitude) of a list of shapes. For example, take the Sacramento, CA, metropolitan statistical area. It consists of four counties: El Dorado County, Placer County, Sacramento County, and Yolo County. Fig. 4.2 demonstrates how this algorithm works. It takes the bounding box of each county's shape (shown in pink), and then takes the minimum and maximum latitude and longitude over all of those bounding boxes to get the total bounding box (in green).
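The two-step procedure in Fig. 4.2 (one bounding box per shape, then the extremes over those boxes) can be sketched in plain Python. This is a minimal sketch, not the module's implementation: shape_bbox and total_bbox are hypothetical helpers, shapes are plain sequences of (latitude, longitude) rows (row-indexable arrays work the same way), and the output tuple uses the documented order of minimum longitude, minimum latitude, maximum longitude, maximum latitude.

```python
from typing import Sequence, Tuple

# A shape is a sequence of (latitude, longitude) rows, like one N x 2 array.
Shape = Sequence[Sequence[float]]
BBox = Tuple[float, float, float, float]  # (min lng, min lat, max lng, max lat)


def shape_bbox(shape: Shape) -> BBox:
    """Bounding box of a single shape."""
    lats = [row[0] for row in shape]
    lngs = [row[1] for row in shape]
    return (min(lngs), min(lats), max(lngs), max(lats))


def total_bbox(shapes: Sequence[Shape]) -> BBox:
    """Combine the per-shape bounding boxes into one enclosing bounding box."""
    boxes = [shape_bbox(shape) for shape in shapes]
    return (
        min(box[0] for box in boxes),  # minimum longitude
        min(box[1] for box in boxes),  # minimum latitude
        max(box[2] for box in boxes),  # maximum longitude
        max(box[3] for box in boxes),  # maximum latitude
    )


# Two toy "counties" given as (latitude, longitude) rows.
county_a = [(38.0, -121.5), (38.5, -121.5), (38.5, -121.0)]
county_b = [(38.4, -121.2), (39.0, -120.5), (38.8, -120.8)]
print(total_bbox([county_a, county_b]))  # (-121.5, 38.0, -120.5, 39.0)
```

Computing the per-shape boxes first mirrors the pink-then-green construction in the figure; taking the extremes over all points at once would give the same answer.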
Now here is the API description.
- Parameters:
shapes (list) – A list of shapes. Each shape is an \(N \times 2\) shaped array, with \(N\) points describing the boundary. Each row is the latitude and longitude of a point: first is the latitude, and second is the longitude. See gis_calculate_total_bbox_sacramento.pkl.gz for the example shown in Fig. 4.2.
- Returns:
a four element tuple of the total bounding box of the shape collection: minimum longitude, minimum latitude, maximum longitude, and maximum latitude.
- Return type:
tuple
- covid19_stats.engine.gis.construct_adjacency(fips_data, filename='/usr/WS2/islam5/covid19_stats/covid19_stats/resources/fips_2019_adj.pkl.gz')
Creates and stores (or loads) the adjacency dictionary of all US counties and territorial units. If the storage file (by default, fips_2019_adj.pkl.gz) does not exist, this function creates the data and stores it in that file. In either case, it returns the data.
- Parameters:
fips_data (dict) – the US county dict produced by, for example, create_and_store_fips_2018.
filename (str) – the location of the adjacency dictionary file, which by default is fips_2019_adj.pkl.gz in the covid19_stats resource directory.
- Returns:
a dict of adjacency. Each key is the FIPS code of a county, and each value is the set of counties and other territories adjacent to it. See get_fips_adjacency for an example of this adjacency information for a single county.
- Return type:
dict
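The create-or-load behavior described above can be sketched as follows. This assumes the storage file is a gzip-compressed pickle (as the .pkl.gz suffix suggests); load_or_build, build_adjacency, and the toy neighbor table are hypothetical illustrations, not part of covid19_stats.engine.gis.

```python
import gzip
import os
import pickle
import tempfile
from typing import Callable, Dict, Iterable, Set


def load_or_build(filename: str, builder: Callable[[], object]) -> object:
    """Load a gzipped pickle if it exists; otherwise build, store, and return the data."""
    if os.path.exists(filename):
        with gzip.open(filename, "rb") as fh:
            return pickle.load(fh)
    data = builder()
    with gzip.open(filename, "wb") as fh:
        pickle.dump(data, fh)
    return data


def build_adjacency(fips_codes: Iterable[str],
                    neighbors: Callable[[str], Set[str]]) -> Dict[str, Set[str]]:
    """Map every FIPS code to the set of FIPS codes adjacent to it."""
    return {fips: set(neighbors(fips)) for fips in fips_codes}


# Toy neighbor table standing in for a real per-county adjacency computation.
toy_neighbors = {"06067": {"06061", "06113"}, "06061": {"06067"}, "06113": {"06067"}}
path = os.path.join(tempfile.mkdtemp(), "toy_adj.pkl.gz")
first = load_or_build(path, lambda: build_adjacency(toy_neighbors, toy_neighbors.get))
second = load_or_build(path, lambda: {})  # file now exists, so this builder is not called
print(first == second)  # True
```

Caching the whole dictionary pays off because the per-county adjacency computation is done once for every county, while loading the stored file is cheap.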
- covid19_stats.engine.gis.create_and_store_fips_2019()
Utility function that loads the US Census 2018 county information, stored in cb_2019_us_county_500k.shp as a collection of Shapefiles, and returns a dict of county information. If there is no serialized version of this dictionary, this method also serializes the data structure into fips_2019_data.pkl.gz for easy reloading. Subsequently, if fips_2019_data.pkl.gz exists, then this method loads that file and returns that object.
- Returns:
a dict of US county geographic data. The key is the FIPS code for the county. Each value is a dict in which:
bbox is the lat/lng bounding box for that county.
points is a list of shapes for that county. Each shape is an \(N \times 2\) shaped array, with \(N\) points describing the boundary. Each row is the latitude and longitude of a point: first is the latitude, and second is the longitude.
This method uses shapefile.Reader to load cb_2019_us_county_500k.shp if fips_2019_data.pkl.gz does not exist.
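A single county record in the shapefile can contain several polygons. The sketch below assumes pyshp-style geometry, where a Shape exposes a flat points list of (x, y) = (longitude, latitude) pairs and a parts list holding the start index of each ring; it shows how such geometry could be split into the per-county list of (latitude, longitude) shapes described in the Returns entry. split_into_shapes is a hypothetical helper, and the actual module may do this differently.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def split_into_shapes(points: Sequence[Sequence[float]],
                      parts: Sequence[int]) -> List[List[Point]]:
    """Split a flat (longitude, latitude) point list into rings of (lat, lng) rows.

    `parts` holds the starting index of each ring, as in a pyshp Shape.
    """
    bounds = list(parts) + [len(points)]
    shapes = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Swap (lng, lat) into the (lat, lng) row order used by the county dict.
        shapes.append([(lat, lng) for lng, lat in points[start:end]])
    return shapes


# A toy county with two polygons (for example, a mainland part and an island).
points = [(-122.0, 37.0), (-121.0, 37.0), (-121.0, 38.0), (-122.0, 37.0),
          (-123.0, 36.0), (-122.5, 36.0), (-122.5, 36.5), (-123.0, 36.0)]
parts = [0, 4]
shapes = split_into_shapes(points, parts)
print(len(shapes), shapes[1][0])  # 2 (36.0, -123.0)
```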
- covid19_stats.engine.gis.create_and_store_fips_counties_2019()
- Returns:
a two element tuple. The first element is a dict that maps each FIPS code to a dict with two keys, county and state. The second element is the reverse dict, which maps a tuple (of county and state) to its FIPS code. The first dict is stored in all_2019_fips_cs_dict.pkl.gz, and the second dict is stored in all_2019_cs_fips_dict.pkl.gz. If either file does not exist, then the dictionary is created and stored into the appropriate file. If the file exists, then the object is loaded from that file.
- Return type:
tuple
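The second dictionary is the inverse of the first. A minimal sketch of that inversion follows; reverse_fips_dict is hypothetical, and whether state holds a full state name or a postal abbreviation is an assumption here. The duplicate check guards the one-to-one mapping that a reverse lookup relies on.

```python
from typing import Dict, Tuple


def reverse_fips_dict(fips_cs: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], str]:
    """Invert {fips: {'county': ..., 'state': ...}} into {(county, state): fips}."""
    cs_fips = {}
    for fips, info in fips_cs.items():
        key = (info["county"], info["state"])
        if key in cs_fips:
            raise ValueError(f"duplicate county/state pair: {key}")
        cs_fips[key] = fips
    return cs_fips


fips_cs = {
    "06067": {"county": "Sacramento", "state": "California"},
    "06113": {"county": "Yolo", "state": "California"},
}
cs_fips = reverse_fips_dict(fips_cs)
print(cs_fips[("Yolo", "California")])  # 06113
```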
- covid19_stats.engine.gis.create_and_store_msas_and_fips_2019()
This returns a fully normalized dict of MSAs consistent with the NY Times COVID-19 database. It also stores this data into the file msa_2019_dict.pkl.gz if that file does not exist. If it does exist, then this method loads msa_2019_dict.pkl.gz and returns that data. In addition, it dumps the normalized list of MSA data into msa_2019_post.pkl.gz. This method does four things:
merges the San Francisco, San Jose, and Napa MSAs into the SF Bay Area.
merges NYC into the NYC metro area.
renames Washington, DC to the DC metro area.
merges the Los Angeles, Riverside, and Oxnard MSAs into the “Los Angeles” metro area (greater Los Angeles).
- Returns:
a dict of MSA information. The key is the MSA data prefix, and the value is a dict of prefix, region name, fips, and population. For example, for St. Louis, it is
{'stlouis': {'prefix': 'stlouis', 'region name': 'St. Louis Metro Area', 'fips': {'17005', '17013', '17027', '17083', '17117', '17119', '17133', '17163', '29071', '29099', '29113', '29183', '29189', '29219', '29510'}, 'population': 2803228}}
- Return type:
dict
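The returned dict can be read as a re-keying of raw MSA entries in the format produced by create_msa_2019. A minimal sketch of that reshaping follows. It assumes that population is copied from the raw pop est 2019 field (the two St. Louis examples in this module share the value 2803228); normalize_msas is hypothetical, it leaves out the four merge steps listed above, and the FIPS set in the toy entry is trimmed for brevity.

```python
from typing import Dict, List


def normalize_msas(raw_msas: List[dict]) -> Dict[str, dict]:
    """Key raw MSA entries by prefix, keeping only the normalized fields."""
    return {
        msa["prefix"]: {
            "prefix": msa["prefix"],
            "region name": msa["region name"],
            "fips": set(msa["fips"]),
            "population": msa["pop est 2019"],
        }
        for msa in raw_msas
    }


raw = [{"msa": 41180, "pop est 2019": 2803228, "fips": {"29510", "29189"},
        "state": "MO-IL", "RNAME": "St. Louis", "prefix": "stlouis",
        "region name": "St. Louis Metro Area"}]
normalized = normalize_msas(raw)
print(normalized["stlouis"]["population"])  # 2803228
```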
- covid19_stats.engine.gis.create_fips_popmap_2019()
Creates a dict of the estimated 2019 US Census population of each US county or territory. Also stores this data into the file fips_2019_popdict.pkl.gz if it does not exist. If it does exist, then loads fips_2019_popdict.pkl.gz and returns that data.
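A typical use of this mapping is to total the population of a region from its member counties. The sketch assumes the dictionary is keyed by FIPS code strings with integer populations (the key format is not stated here); region_population is hypothetical and the populations are made up for illustration.

```python
from typing import Dict, Iterable


def region_population(fips_codes: Iterable[str], popmap: Dict[str, int]) -> int:
    """Total population of a region given its member county FIPS codes.

    Indexing (rather than .get) makes a missing FIPS code raise KeyError
    instead of silently undercounting the region.
    """
    return sum(popmap[fips] for fips in fips_codes)


popmap = {"06067": 1_500_000, "06061": 400_000, "06113": 220_000}  # made-up values
print(region_population({"06067", "06061", "06113"}, popmap))  # 2120000
```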
- covid19_stats.engine.gis.create_msa_2019()
Creates and returns a raw, unnormalized list of Metropolitan statistical areas initially recorded in msa_2019.csv, sorted by population from smallest to largest, and stores the object in msa_2019.pkl.gz if that file does not exist. If msa_2019.pkl.gz exists, then this method loads that file and returns the resulting object. Each entry in the list looks like this; for example, for the St. Louis, MO MSA,
{'msa': 41180, 'pop est 2019': 2803228, 'fips': {'17005', '17013', '17027', '17083', '17117', '17119', '17133', '17163', '29071', '29099', '29113', '29183', '29189', '29219', '29510'}, 'state': 'MO-IL', 'RNAME': 'St. Louis', 'prefix': 'stlouis', 'region name': 'St. Louis Metro Area'}
The keys for each MSA are:
msa is an integer code.
pop est 2019 is the US Census 2019 estimated population.
fips is a set of counties, by FIPS code, located in this MSA.
state lists the states this MSA covers.
RNAME is a legend name for plotting.
prefix is the name used to identify those files that contain data for this MSA.
region name is the common and accepted MSA name.
create_and_store_msas_and_fips returns the fully normalized dict of Metropolitan statistical areas used by the NY Times COVID-19 database, and merge_msas performs the normalization.
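Two operations on this list are easy to sketch: ordering entries by the pop est 2019 field, as this function's output is ordered, and inverting the fips sets into a county-to-MSA lookup. Both helpers are hypothetical, the entries are trimmed, and the toytown entry with code 99999 is made up. The lookup assumes each county belongs to at most one MSA; otherwise a later entry silently overwrites an earlier one.

```python
from typing import Dict, List


def sort_by_population(msas: List[dict]) -> List[dict]:
    """Order MSA entries from smallest to largest 'pop est 2019'."""
    return sorted(msas, key=lambda msa: msa["pop est 2019"])


def county_to_msa_prefix(msas: List[dict]) -> Dict[str, str]:
    """Map each county FIPS code to the prefix of the MSA containing it."""
    return {fips: msa["prefix"] for msa in msas for fips in msa["fips"]}


msas = sort_by_population([
    {"msa": 41180, "pop est 2019": 2803228, "fips": {"29510", "17163"}, "prefix": "stlouis"},
    {"msa": 99999, "pop est 2019": 150000, "fips": {"00001"}, "prefix": "toytown"},
])
print([msa["prefix"] for msa in msas])  # ['toytown', 'stlouis']
print(county_to_msa_prefix(msas)["29510"])  # stlouis
```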
- covid19_stats.engine.gis.do_bbox_intersect(bbox1, bbox2)
Checks if two bounding boxes intersect.
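A standard axis-aligned overlap test is sketched below, assuming bounding boxes use the (minimum longitude, minimum latitude, maximum longitude, maximum latitude) order returned by calculate_total_bbox. Whether the module treats boxes that merely touch as intersecting is not stated; this sketch counts them as intersecting, which would be the safer choice if the test served as a cheap prefilter for county adjacency, since neighboring counties share an edge. bboxes_intersect is a hypothetical name.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (min lng, min lat, max lng, max lat)


def bboxes_intersect(bbox1: BBox, bbox2: BBox) -> bool:
    """True if two axis-aligned boxes overlap or touch."""
    min_x1, min_y1, max_x1, max_y1 = bbox1
    min_x2, min_y2, max_x2, max_y2 = bbox2
    # Boxes are disjoint exactly when one lies entirely to one side of the other.
    return not (max_x1 < min_x2 or max_x2 < min_x1 or max_y1 < min_y2 or max_y2 < min_y1)


a = (-121.5, 38.0, -121.0, 38.5)
print(bboxes_intersect(a, (-121.2, 38.4, -120.5, 39.0)))  # True
print(bboxes_intersect(a, (-120.0, 38.0, -119.5, 38.5)))  # False
print(bboxes_intersect(a, (-121.0, 38.5, -120.0, 39.0)))  # True (corners touch)
```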
- covid19_stats.engine.gis.get_fips_adjacency(fips, fips_data)
Finds the FIPS codes of all counties adjacent to a specified county. For example, Sacramento County, with FIPS code 06067, has eight adjacent counties: 06005, 06013, 06017, 06061, 06077, 06095, 06101, 06113. Fig. 4.3 demonstrates this.
Now here is the API description.
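How adjacency is decided is not described here. One plausible approach, shown purely as an illustration, is to prefilter candidate counties with a bounding-box overlap check and then call two counties adjacent when their boundaries share at least one vertex after rounding. This shared-vertex rule is an assumption, not necessarily what get_fips_adjacency does: it can miss counties whose common edge has no common vertex, and it counts corner-only contact as adjacency. All names in the sketch are hypothetical, and the counties are toy unit squares rather than real FIPS data.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]  # (latitude, longitude)
BBox = Tuple[float, float, float, float]  # (min lng, min lat, max lng, max lat)


def bbox_of(shapes: List[List[Point]]) -> BBox:
    """Bounding box enclosing every point of every shape."""
    lats = [lat for shape in shapes for lat, _ in shape]
    lngs = [lng for shape in shapes for _, lng in shape]
    return (min(lngs), min(lats), max(lngs), max(lats))


def boxes_overlap(b1: BBox, b2: BBox) -> bool:
    """True if two boxes overlap or touch."""
    return not (b1[2] < b2[0] or b2[2] < b1[0] or b1[3] < b2[1] or b2[3] < b1[1])


def vertex_set(shapes: List[List[Point]], ndigits: int) -> Set[Point]:
    """Boundary vertices, rounded so that slightly different precisions are more likely to match."""
    return {(round(lat, ndigits), round(lng, ndigits)) for shape in shapes for lat, lng in shape}


def adjacent_fips(fips: str, counties: Dict[str, List[List[Point]]], ndigits: int = 6) -> Set[str]:
    """Counties that pass a bbox prefilter and share at least one boundary vertex with fips."""
    target_box = bbox_of(counties[fips])
    target_vertices = vertex_set(counties[fips], ndigits)
    neighbors = set()
    for other, shapes in counties.items():
        if other == fips or not boxes_overlap(target_box, bbox_of(shapes)):
            continue  # skip the county itself and anything whose box cannot touch it
        if target_vertices & vertex_set(shapes, ndigits):
            neighbors.add(other)
    return neighbors


# Toy unit squares: A and B share an edge, C is far away.
counties = {
    "A": [[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]],
    "B": [[(0.0, 1.0), (1.0, 1.0), (1.0, 2.0), (0.0, 2.0)]],
    "C": [[(0.0, 3.0), (1.0, 3.0), (1.0, 4.0), (0.0, 4.0)]],
}
print(adjacent_fips("A", counties))  # {'B'}
```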
- covid19_stats.engine.gis.merge_msas(regionName, prefix, msaids, all_data_msas)
This takes an input MSA, defined by its prefix, gives it a new or existing regionName, and merges into it one or more MSAs given by the set msaids, within a list of MSAs as returned by, e.g., create_msa_2019. It then returns a new list of MSAs in the same format as all_data_msas. This is used, for example, to normalize the MSA data by merging all five boroughs in NYC into a single fake county, NYC, in the New York City MSA.
- Parameters:
regionName (str) – the region name (region name key) of the merged MSA.
prefix (str) – the named identifier of the MSA into which the others are merged.
msaids (set) – the collection of MSAs to be merged into the prefix MSA.
all_data_msas (list) – the input list of MSAs. Implicitly, all_data_msas must contain the MSA identified by prefix.
- Returns:
a new list of MSAs in the same style as all_data_msas, sorted by population from lowest to highest. The MSAs in msaids do not appear as separate entries in this new collection.
- Return type:
list
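The merge described above can be sketched as follows, under several assumptions that this documentation does not settle: msaids holds the integer msa codes of the MSAs to absorb, the MSA named by prefix keeps its prefix and receives the new region name, populations are summed (which presumes the merged MSAs share no counties), and other fields such as msa, state, and RNAME stay as they were on the prefix MSA. merge_msas_sketch is a hypothetical stand-in, not the module's merge_msas, and the codes and populations below are made up.

```python
from typing import List, Set


def merge_msas_sketch(region_name: str, prefix: str, msaids: Set[int],
                      all_data_msas: List[dict]) -> List[dict]:
    """Absorb the MSAs whose 'msa' code is in msaids into the MSA named by prefix."""
    target = next((msa for msa in all_data_msas if msa["prefix"] == prefix), None)
    if target is None:
        raise ValueError(f"no MSA with prefix {prefix!r}")
    absorbed = [msa for msa in all_data_msas if msa["msa"] in msaids and msa is not target]
    merged = dict(target)  # shallow copy; the input entries are not modified
    merged["region name"] = region_name
    merged["fips"] = set(target["fips"]).union(*(msa["fips"] for msa in absorbed))
    # Summing assumes the merged MSAs share no counties.
    merged["pop est 2019"] = target["pop est 2019"] + sum(msa["pop est 2019"] for msa in absorbed)
    kept = [msa for msa in all_data_msas if msa is not target and msa["msa"] not in msaids]
    return sorted(kept + [merged], key=lambda msa: msa["pop est 2019"])


# Made-up codes and populations, loosely modeled on the SF Bay Area merge.
msas = [
    {"msa": 1, "prefix": "sf", "region name": "San Francisco", "fips": {"06075"}, "pop est 2019": 300},
    {"msa": 2, "prefix": "sanjose", "region name": "San Jose", "fips": {"06085"}, "pop est 2019": 200},
    {"msa": 3, "prefix": "napa", "region name": "Napa", "fips": {"06055"}, "pop est 2019": 100},
    {"msa": 4, "prefix": "other", "region name": "Other", "fips": {"00001"}, "pop est 2019": 450},
]
result = merge_msas_sketch("SF Bay Area", "sf", {2, 3}, msas)
print([msa["prefix"] for msa in result])  # ['other', 'sf']
```

Returning a freshly sorted list, rather than editing all_data_msas in place, keeps the "same format, sorted by population" contract from the Returns entry and lets several merges be chained.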