4.3. covid19_stats.engine.gis module

The core of the functionality: this module generates post-processed GIS data describing the territorial units (counties, etc.) of the United States. Territorial unit information includes their identifiers, latitude and longitude boundary arrays as of 2018, and their estimated population in 2019.

covid19_stats.engine.gis.calculate_total_bbox(shapes)

This gets the bounding box – minimum and maximum latitude, and minimum and maximum longitude – of a list of shapes.

For example, take the Sacramento, CA, metropolitan statistical area (MSA). It consists of four counties: El Dorado County, Placer County, Sacramento County, and Yolo County. Fig. 4.2 demonstrates how this algorithm works. It computes the bounding box of each county's shape (shown in pink), and then takes the smallest minimum and the largest maximum latitude and longitude across those boxes to get the total bounding box (in green).


Fig. 4.2 Demonstration of this functionality on the four counties in the Sacramento, CA MSA. In green is the total bounding box of the lat/longitude shapes of all four counties.

Now here is the API description.

Parameters:

shapes (list) – A list of shapes. Each shape is an \(N \times 2\) shaped array, with \(N\) points describing the boundary. Each row is the latitude and longitude of a point – first is the latitude, and second is the longitude. See gis_calculate_total_bbox_sacramento.pkl.gz for an example: the shapes plotted in Fig. 4.2.

Returns:

a four element tuple of the total bounding box of the shape collection: minimum longitude, minimum latitude, maximum longitude, and maximum latitude.

Return type:

tuple
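The two-step procedure in Fig. 4.2 (per-shape boxes, then the extreme values across those boxes) can be sketched in plain Python. This is a minimal sketch, not the module's implementation: shape_bbox and total_bbox are hypothetical names, and it assumes the documented conventions, namely that input rows are (latitude, longitude) and that the result is ordered (min longitude, min latitude, max longitude, max latitude). The coordinates in the usage line are toy values, not real county boundaries.

```python
from typing import Iterable, Sequence, Tuple

# (min lng, min lat, max lng, max lat), matching the documented return order
BBox = Tuple[float, float, float, float]


def shape_bbox(shape: Sequence[Sequence[float]]) -> BBox:
    """Bounding box of one N x 2 shape whose rows are (lat, lng)."""
    lats = [row[0] for row in shape]
    lngs = [row[1] for row in shape]
    return (min(lngs), min(lats), max(lngs), max(lats))


def total_bbox(shapes: Iterable[Sequence[Sequence[float]]]) -> BBox:
    """Combine per-shape boxes: smallest minima and largest maxima."""
    boxes = [shape_bbox(shape) for shape in shapes]
    return (
        min(box[0] for box in boxes),
        min(box[1] for box in boxes),
        max(box[2] for box in boxes),
        max(box[3] for box in boxes),
    )


# Two toy "counties", each reduced to two boundary points.
shapes = [[(38.0, -121.5), (38.5, -121.0)], [(38.3, -120.8), (39.0, -120.0)]]
print(total_bbox(shapes))  # (-121.5, 38.0, -120.0, 39.0)
```

Because it only iterates over rows, the same functions also accept NumPy \(N \times 2\) arrays.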

covid19_stats.engine.gis.construct_adjacency(fips_data, filename='/usr/WS2/islam5/covid19_stats/covid19_stats/resources/fips_2019_adj.pkl.gz')

Creates and stores, or loads, the adjacency dictionary of all US counties and territorial units. If the storage file (by default fips_2019_adj.pkl.gz) does not exist, this function creates the adjacency data and stores it in that file; otherwise it loads the data from that file. In either case, it returns the adjacency dictionary.

Parameters:
  • fips_data (dict) – the US county dict produced by, for example, create_and_store_fips_2019.

  • filename (str) – the location of the adjacency dictionary file, which is by default fips_2019_adj.pkl.gz located in the covid19_stats resource directory.

Returns:

an adjacency dict. Each key is the FIPS code of a county, and each value is the set of FIPS codes of the counties and other territorial units adjacent to it. See get_fips_adjacency for an example of this adjacency information for a single county.

Return type:

dict
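construct_adjacency, like several other functions in this module, follows a "create and store if missing, otherwise load" pattern around a .pkl.gz file. A minimal standard-library sketch of that pattern follows. load_or_create is a hypothetical helper, and the assumption that these files are gzip-compressed pickles is inferred from the .pkl.gz extension rather than stated by the API.

```python
import gzip
import os
import pickle
from typing import Any, Callable


def load_or_create(filename: str, create: Callable[[], Any]) -> Any:
    """Return the object pickled in `filename`; build and store it first if the file is missing."""
    if os.path.exists(filename):
        with gzip.open(filename, "rb") as fh:
            return pickle.load(fh)
    data = create()  # expensive step, e.g. computing county adjacency
    with gzip.open(filename, "wb") as fh:
        pickle.dump(data, fh)
    return data
```

The first call pays the cost of `create`; later calls only unpickle the stored object, which is why the module ships these cache files in its resource directory.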

covid19_stats.engine.gis.create_and_store_fips_2019()

Utility function that loads the US Census 2018 county information, stored in the cb_2019_us_county_500k.shp collection of Shapefiles, and returns a dict of county information.

If no serialized version of this dictionary exists, this method also serializes it into fips_2019_data.pkl.gz for easy reloading.

On subsequent calls, when fips_2019_data.pkl.gz exists, this method loads that file and returns the stored object.

Returns:

a dict of US county geographic data. The key is the FIPS code for the county. Each value is a dict with the following keys:

  • bbox is the lat/lng bounding box for that county.

  • points is a list of shapes for that county. Each shape is an \(N \times 2\) shaped array, with \(N\) points describing the boundary. Each row is the latitude and longitude of a point – first is the latitude, and second is the longitude.

This method uses shapefile.Reader to load in cb_2019_us_county_500k.shp if fips_2019_data.pkl.gz does not exist.
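In pyshp, each record read through shapefile.Reader exposes a Shape whose points attribute is a flat list of [x, y] pairs (longitude, latitude) and whose parts attribute gives the starting index of each ring. A county with several rings (islands, for example) therefore has to be split and reordered to get the documented list of (latitude, longitude) shapes. The sketch below assumes this is how the module builds the points value; split_parts is a hypothetical helper, and it returns lists of tuples rather than NumPy arrays.

```python
from typing import List, Sequence, Tuple

LatLng = Tuple[float, float]


def split_parts(points: Sequence[Sequence[float]], parts: Sequence[int]) -> List[List[LatLng]]:
    """Split a flat (lng, lat) point list into rings and swap each point to (lat, lng)."""
    bounds = list(parts) + [len(points)]  # each ring runs from parts[i] up to the next start
    shapes = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        shapes.append([(lat, lng) for lng, lat in points[start:stop]])
    return shapes


# Toy record with two rings: points 0-2 and points 3-4.
pts = [(-121.0, 38.0), (-120.0, 38.0), (-120.0, 39.0), (-122.0, 37.0), (-121.5, 37.5)]
print(split_parts(pts, [0, 3]))
```

With a real file, the inputs would come from something like `shape.points` and `shape.parts` for each shape returned by the reader; wrapping each ring in numpy.array yields the \(N \times 2\) arrays described above.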

covid19_stats.engine.gis.create_and_store_fips_counties_2019()
Returns:

a two-element tuple. The first element is a dict mapping each FIPS code to a dict value holding its county and state. The second element is the reverse dict, mapping a (county, state) tuple to its FIPS code.

The first dict is stored in all_2019_fips_cs_dict.pkl.gz, and the second dict is stored in all_2019_cs_fips_dict.pkl.gz.

If either file does not exist, then the corresponding dictionary is created and stored in that file.

If a file exists, then the dictionary is loaded from that file.

Return type:

tuple
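The second dict of the returned tuple is the inverse of the first. A minimal sketch of that inversion follows; reverse_fips_dict is a hypothetical name, and the inner keys "county" and "state" are assumptions, since the documentation only says each value holds a county and a state.

```python
from typing import Dict, Tuple


def reverse_fips_dict(fips_cs: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], str]:
    """Invert {FIPS: {"county": ..., "state": ...}} into {(county, state): FIPS}."""
    cs_fips: Dict[Tuple[str, str], str] = {}
    for fips, info in fips_cs.items():
        key = (info["county"], info["state"])
        if key in cs_fips:
            # A silent overwrite would make the reverse lookup ambiguous.
            raise ValueError(f"duplicate (county, state) pair: {key}")
        cs_fips[key] = fips
    return cs_fips


fips_cs = {
    "06067": {"county": "Sacramento", "state": "California"},
    "06113": {"county": "Yolo", "state": "California"},
}
print(reverse_fips_dict(fips_cs)[("Yolo", "California")])  # 06113
```

Raising on duplicates is a design choice: the inversion is only well defined if each (county, state) pair identifies exactly one FIPS code.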

covid19_stats.engine.gis.create_and_store_msas_and_fips_2019()

This returns a fully normalized dict of MSAs consistent with the NY Times COVID-19 database. It also stores this data in the file msa_2019_dict.pkl.gz if that file does not exist; if it does exist, it loads msa_2019_dict.pkl.gz and returns that data. It also dumps the normalized list of MSA data into msa_2019_post.pkl.gz.

This method does four things:

  • merges San Francisco, San Jose, and Napa MSAs into the SF Bay Area.

  • merges NYC into the NYC metro area.

  • renames Washington, DC to the DC metro area.

  • merges Los Angeles, Riverside, and Oxnard MSAs into the “Los Angeles” metro area (greater Los Angeles).

Returns:

a dict of MSA information. The key is the MSA data prefix, and the value is a dict with the keys prefix, region name, fips, and population. For example, the entry for St. Louis is:

{'stlouis': {'prefix': 'stlouis',
  'region name': 'St. Louis Metro Area',
  'fips': {'17005',
   '17013',
   '17027',
   '17083',
   '17117',
   '17119',
   '17133',
   '17163',
   '29071',
   '29099',
   '29113',
   '29183',
   '29189',
   '29219',
   '29510'},
  'population': 2803228}}

Return type:

dict
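Because each value stores its counties under fips, the normalized dict can be inverted to answer "which metro area is this county in?" in constant time. The sketch below is an illustration only: fips_to_msa_index is a hypothetical helper, and it assumes that after normalization each county belongs to at most one MSA (it raises otherwise). The sample uses two of the St. Louis FIPS codes shown above.

```python
from typing import Dict


def fips_to_msa_index(msas: Dict[str, dict]) -> Dict[str, str]:
    """Invert {prefix: msa_info} into {county FIPS code: prefix}."""
    index: Dict[str, str] = {}
    for prefix, info in msas.items():
        for fips in info["fips"]:
            if fips in index and index[fips] != prefix:
                # Assumption: a normalized collection places each county in at most one MSA.
                raise ValueError(f"{fips} is in both {index[fips]} and {prefix}")
            index[fips] = prefix
    return index


msas = {"stlouis": {"prefix": "stlouis", "region name": "St. Louis Metro Area",
                    "fips": {"17005", "29510"}, "population": 2803228}}
index = fips_to_msa_index(msas)
print(index["29510"])  # stlouis
```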

covid19_stats.engine.gis.create_fips_popmap_2019()

Creates a dict of estimated 2019 US Census population in each US county or territory. Also stores this data into the file, fips_2019_popdict.pkl.gz, if it does not exist. If it does exist, then loads the file fips_2019_popdict.pkl.gz and returns that data.

Returns:

a dict of FIPS code to estimated 2019 US census population.

Return type:

dict
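A FIPS-to-population dict makes it straightforward to aggregate population over any set of counties, such as the fips set of an MSA. Below is a minimal sketch; region_population is a hypothetical helper, and the population numbers in the example are made up for illustration.

```python
from typing import Dict, Iterable


def region_population(fips_codes: Iterable[str], popmap: Dict[str, int]) -> int:
    """Sum the estimated populations of the given counties.

    A FIPS code missing from popmap raises KeyError instead of silently counting as zero.
    """
    return sum(popmap[fips] for fips in fips_codes)


popmap = {"06067": 1000, "06113": 200, "06061": 50}  # toy values, not Census estimates
print(region_population({"06067", "06113"}, popmap))  # 1200
```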

covid19_stats.engine.gis.create_msa_2019()

Creates and returns the raw, unnormalized list of metropolitan statistical areas initially recorded in msa_2019.csv, sorted by population from smallest to largest, and stores this list in msa_2019.pkl.gz if that file does not exist. If msa_2019.pkl.gz exists, this function loads that file and returns the stored object.

Each entry in the list is a dict. For example, the entry for the St. Louis MSA is:

{'msa': 41180,
 'pop est 2019': 2803228,
 'fips': {'17005',
  '17013',
  '17027',
  '17083',
  '17117',
  '17119',
  '17133',
  '17163',
  '29071',
  '29099',
  '29113',
  '29183',
  '29189',
  '29219',
  '29510'},
 'state': 'MO-IL',
 'RNAME': 'St. Louis',
 'prefix': 'stlouis',
 'region name': 'St. Louis Metro Area'}

The keys for each MSA are:

  • msa is an integer code identifying the MSA.

  • pop est 2019 is the US Census 2019 estimated population.

  • fips is the set of FIPS codes of the counties located in this MSA.

  • state lists the states this MSA covers.

  • RNAME is a legend name for plotting.

  • prefix is the name used to identify the files that contain data for this MSA.

  • region name is the common and accepted MSA name.

create_and_store_msas_and_fips_2019 returns the fully normalized dict of metropolitan statistical areas used by the NY Times COVID-19 database, and merge_msas performs the normalization.

Returns:

a sorted, but unnormalized, list of Metropolitan statistical areas as defined by the 2019 US Census.

Return type:

list
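Since entries carry their population under the pop est 2019 key, the documented smallest-to-largest ordering, and its reverse, are one sorted call away. A minimal sketch, with sort_msas and largest_msas as hypothetical helpers and toy entries that only include the keys the sketch needs:

```python
from typing import List


def sort_msas(msas: List[dict]) -> List[dict]:
    """Order MSA entries by estimated 2019 population, smallest first."""
    return sorted(msas, key=lambda msa: msa["pop est 2019"])


def largest_msas(msas: List[dict], n: int) -> List[dict]:
    """The n most populous MSAs, largest first."""
    return sort_msas(msas)[::-1][:n]


msas = [{"prefix": "a", "pop est 2019": 5},
        {"prefix": "b", "pop est 2019": 50},
        {"prefix": "c", "pop est 2019": 20}]
print([msa["prefix"] for msa in largest_msas(msas, 2)])  # ['b', 'c']
```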

covid19_stats.engine.gis.do_bbox_intersect(bbox1, bbox2)

Checks if two bounding boxes intersect.

Parameters:
  • bbox1 (tuple) – the four-element tuple of bounding box #1: minimum lng/lat, and maximum lng/lat.

  • bbox2 (tuple) – the four-element tuple of bounding box #2: minimum lng/lat, and maximum lng/lat.

Returns:

True if the two bounding boxes intersect, False otherwise.

Return type:

bool
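The usual test for two axis-aligned boxes is that they are disjoint exactly when one lies entirely to one side of the other along some axis. A minimal sketch using the documented (min lng, min lat, max lng, max lat) order; bboxes_intersect is a hypothetical name, and whether the module treats boxes that merely touch as intersecting is an assumption (here they do).

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (min lng, min lat, max lng, max lat)


def bboxes_intersect(bbox1: BBox, bbox2: BBox) -> bool:
    """True unless one box lies entirely west, east, south, or north of the other."""
    min_lng1, min_lat1, max_lng1, max_lat1 = bbox1
    min_lng2, min_lat2, max_lng2, max_lat2 = bbox2
    return not (
        max_lng1 < min_lng2 or max_lng2 < min_lng1  # separated east-west
        or max_lat1 < min_lat2 or max_lat2 < min_lat1  # separated north-south
    )


print(bboxes_intersect((0, 0, 1, 1), (0.5, 0.5, 2, 2)))  # True
print(bboxes_intersect((0, 0, 1, 1), (2, 2, 3, 3)))  # False
```

This simple comparison does not handle boxes that wrap across the 180° meridian, which matters for parts of Alaska's Aleutian Islands.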

covid19_stats.engine.gis.get_fips_adjacency(fips, fips_data)

Finds the FIPS codes of all counties adjacent to a specified county. For example, Sacramento County, with FIPS code 06067, has eight counties adjacent to it: 06005, 06013, 06017, 06061, 06077, 06095, 06101, and 06113. Fig. 4.3 shows these counties.


Fig. 4.3 Visualization of the eight counties adjacent to Sacramento County (06067). Sacramento County is highlighted for easier visualization.

Now here is the API description.

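Parameters:
  • fips (str) – the FIPS code of the county, for example 06067 for Sacramento County.

  • fips_data (dict) – the US county dict produced by, for example, create_and_store_fips_2019.
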
Returns:

a set of FIPS codes of counties adjacent to fips.

Return type:

set
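The documentation does not say how adjacency is decided. One plausible approach, sketched below purely as an illustration, is to reject distant counties with a cheap bounding-box check and then call two counties adjacent if their boundaries share at least one vertex. Every name here is hypothetical, and the code assumes fips_data values hold a points list of (lat, lng) shapes as described for create_and_store_fips_2019. The test misses neighbors whose shared border has no common vertex, and it counts corner-only contact as adjacency.

```python
from typing import Dict, List, Sequence, Set, Tuple

Shape = Sequence[Sequence[float]]  # N x 2, rows are (lat, lng)


def _vertices(shapes: List[Shape], ndigits: int = 6) -> Set[Tuple[float, float]]:
    """All boundary vertices, rounded so tiny floating-point differences still match."""
    return {(round(lat, ndigits), round(lng, ndigits)) for shape in shapes for lat, lng in shape}


def _bbox(shapes: List[Shape]) -> Tuple[float, float, float, float]:
    """(min lng, min lat, max lng, max lat) over every vertex of every shape."""
    lats = [lat for shape in shapes for lat, _ in shape]
    lngs = [lng for shape in shapes for _, lng in shape]
    return (min(lngs), min(lats), max(lngs), max(lats))


def _boxes_overlap(b1, b2) -> bool:
    return not (b1[2] < b2[0] or b2[2] < b1[0] or b1[3] < b2[1] or b2[3] < b1[1])


def adjacent_fips(fips: str, fips_data: Dict[str, dict]) -> Set[str]:
    """Hypothetical rule: overlapping bboxes and at least one shared boundary vertex."""
    target = fips_data[fips]["points"]
    target_box, target_vertices = _bbox(target), _vertices(target)
    neighbors = set()
    for other, info in fips_data.items():
        if other == fips:
            continue
        shapes = info["points"]
        # Cheap bounding-box rejection before the more expensive vertex comparison.
        if not _boxes_overlap(target_box, _bbox(shapes)):
            continue
        if target_vertices & _vertices(shapes):
            neighbors.add(other)
    return neighbors


# Toy unit squares: A and B share an edge, C is far away.
fips_data = {
    "A": {"points": [[(0, 0), (0, 1), (1, 1), (1, 0)]]},
    "B": {"points": [[(0, 1), (0, 2), (1, 2), (1, 1)]]},
    "C": {"points": [[(5, 5), (5, 6), (6, 6), (6, 5)]]},
}
print(adjacent_fips("A", fips_data))  # {'B'}
```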

covid19_stats.engine.gis.merge_msas(regionName, prefix, msaids, all_data_msas)

This takes an input MSA, identified by its prefix, and merges into it one or more MSAs identified by msaids, within a list of MSAs as returned by, e.g., create_msa_2019. The merged MSA is given a new or existing region name, regionName. It then returns a new list of MSAs in the same format as all_data_msas.

This is used, for example, when normalizing the MSA data, such as merging all five boroughs of NYC into a single fake county, NYC, in the New York City MSA.

Parameters:
  • regionName (str) – the region name (region name key) of the merged MSA.

  • prefix (str) – the named identifier of the MSA to be merged.

  • msaids (set) – the collection of MSAs to be merged into the prefix MSA.

  • all_data_msas (list) – the input list of MSAs, in the format returned by create_msa_2019. Implicitly, all_data_msas must contain the MSA identified by prefix.

Returns:

a new list of MSAs in the same style as all_data_msas, sorted by population from lowest to highest. The MSAs in msaids no longer appear as separate entries in this new collection.

Return type:

list
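A minimal sketch of the merge described above, working on entries shaped like the create_msa_2019 example. It is not the module's implementation: merge_msas_sketch is a hypothetical name, it assumes msaids holds the integer msa codes, and it assumes the merged MSAs share no counties so their populations can simply be added (the real function might instead recompute population from the county population map).

```python
import copy
from typing import Iterable, List


def merge_msas_sketch(region_name: str, prefix: str, msaids: Iterable[int],
                      all_data_msas: List[dict]) -> List[dict]:
    """Fold the MSAs whose 'msa' code is in msaids into the MSA named by prefix."""
    msaids = set(msaids)
    # Deep-copy so that the caller's sets and dicts are left untouched.
    result = [copy.deepcopy(msa) for msa in all_data_msas]
    target = next((msa for msa in result if msa["prefix"] == prefix), None)
    if target is None:
        raise KeyError(f"no MSA with prefix {prefix!r}")
    absorbed = [msa for msa in result if msa["msa"] in msaids and msa is not target]
    for msa in absorbed:
        target["fips"] |= msa["fips"]
        # Assumes the merged MSAs share no counties, so populations simply add.
        target["pop est 2019"] += msa["pop est 2019"]
    target["region name"] = region_name
    remaining = [msa for msa in result if msa is target or msa["msa"] not in msaids]
    return sorted(remaining, key=lambda msa: msa["pop est 2019"])


msas = [
    {"msa": 1, "prefix": "a", "region name": "A", "fips": {"1"}, "pop est 2019": 10},
    {"msa": 2, "prefix": "b", "region name": "B", "fips": {"2"}, "pop est 2019": 5},
    {"msa": 3, "prefix": "c", "region name": "C", "fips": {"3"}, "pop est 2019": 7},
]
merged = merge_msas_sketch("Merged Area", "a", {2}, msas)
print([(m["prefix"], m["pop est 2019"]) for m in merged])  # [('c', 7), ('a', 15)]
```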