4.4. covid19_stats.engine.core module

This module provides the core functionality that the covid19_stats command line tools use to summarize and visualize COVID-19 case and death statistics. It includes utility methods to identify data on geographical regions such as MSAs, states, or the CONUS; to create cumulative case and death data for those regions; and to print summary reports of COVID-19 cases and deaths for MSAs in different formats.

covid19_stats.engine.core.create_readme_from_template(mainURL='https://tanimislam.sfo3.digitaloceanspaces.com/covid19movies', dirname_for_readme_location='/usr/WS2/islam5/covid19_stats/docsrc/source', verify=True, topN_json=None)

This recreates README.rst to reflect the latest COVID-19 data, using the Jinja2 template README_template.rst. This is the back-end method for covid19_update_readme.

Parameters:
  • mainURL (str) – the URL directory in which to look for the manifest JSON file of cumulative COVID-19 cases and deaths for the most populous MSAs, covid19_topN_LATEST.json. This manifest file’s location is <mainURL>/covid19_topN_LATEST.json. By default, this is https://tanimislam.sfo3.digitaloceanspaces.com/covid19movies.

  • dirname_for_readme_location (str) – the directory on disk where the downloaded covid19_stats repository lives. By default, this is the current working directory.

  • verify (bool) – optional argument, whether to verify SSL connections. Default is True.

  • topN_json (str) – optional argument, the location of the manifest JSON file on disk. If specified, then this method ignores the online location, <mainURL>/covid19_topN_LATEST.json.
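The precedence rule between topN_json and the online manifest can be sketched as follows. This is a minimal illustration, not the library's implementation: resolve_manifest_location is a hypothetical helper, and tolerating a trailing slash on mainURL is an assumption added here.

```python
import os
from typing import Optional

# Default mainURL, copied from the create_readme_from_template signature.
DEFAULT_MAIN_URL = "https://tanimislam.sfo3.digitaloceanspaces.com/covid19movies"


def resolve_manifest_location(mainURL: str = DEFAULT_MAIN_URL,
                              topN_json: Optional[str] = None) -> str:
    """Hypothetical helper: where the topN manifest would be read from.

    A local topN_json file, when given, takes precedence over the online
    manifest at <mainURL>/covid19_topN_LATEST.json.
    """
    if topN_json is not None:
        return os.path.abspath(topN_json)
    # Assumption: tolerate a trailing slash on mainURL.
    return f"{mainURL.rstrip('/')}/covid19_topN_LATEST.json"


print(resolve_manifest_location())
```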

covid19_stats.engine.core.display_tabulated_metros(form='simple', selected_metros=None)

Prints a tabulated summary of COVID-19 cumulative cases and deaths for all or selected MSAs to stdout. The form argument selects the format: plain text (simple), GitHub-flavored Markdown (github), reStructuredText (rst), or list-table reStructuredText (rst-simple).

If form is json, this method instead returns that information as a list.

This is the API back-end for the MSA summary functionality of covid19_create_movie_or_summary. See the demonstration output for what this data looks like on screen.

Parameters:
  • form (str) –

    If one of simple, github, rst, or rst-simple, then prints the table of MSA summary COVID-19 data to screen. If one of simple, github, or rst, then uses tabulate to format the data.

    If json, then returns a list of summary data of COVID-19 cumulative cases and deaths for all or specified MSAs as dict entries. This list is sorted from largest MSA population to smallest. An example output is core_summary_data.json. Here are its first two entries:

    [{'RANK': 1,
      'PREFIX': 'nyc',
      'NAME': 'NYC Metro Area',
      'POPULATION': 19216182,
      'FIRST INC.': '01 March 2020',
      'NUM DAYS': 324,
      'NUM CASES': 1390557,
      'NUM DEATHS': 50378,
      'MAX CASE COUNTY': 541846,
      'MAX CASE COUNTY NAME': 'New York City, New York'},
     {'RANK': 2,
      'PREFIX': 'losangeles',
      'NAME': 'LA Metro Area',
      'POPULATION': 18711436,
      'FIRST INC.': '25 January 2020',
      'NUM DAYS': 360,
      'NUM CASES': 1828244,
      'NUM DEATHS': 21240,
      'MAX CASE COUNTY': 1032277,
      'MAX CASE COUNTY NAME': 'Los Angeles County, California'}]
    

  • selected_metros (list) – optional argument. By default, prints or returns information on all MSAs. Otherwise, specify a list of MSAs, each of which must be a key of the data_msas_2019 dictionary.
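To make the json form and the github form concrete, here is a standard-library sketch that renders entries shaped like core_summary_data.json as a GitHub Markdown table. It is not how the library formats tables (the form options above use tabulate); to_github_markdown is hypothetical, and matching selected_metros against each entry's PREFIX value is an assumption.

```python
from typing import Iterable, List, Optional, Sequence

# First two entries of core_summary_data.json, trimmed to the keys used here.
SUMMARY = [
    {"RANK": 1, "PREFIX": "nyc", "NAME": "NYC Metro Area",
     "POPULATION": 19216182, "NUM CASES": 1390557, "NUM DEATHS": 50378},
    {"RANK": 2, "PREFIX": "losangeles", "NAME": "LA Metro Area",
     "POPULATION": 18711436, "NUM CASES": 1828244, "NUM DEATHS": 21240},
]

COLUMNS = ("RANK", "NAME", "POPULATION", "NUM CASES", "NUM DEATHS")


def to_github_markdown(summary: List[dict],
                       selected: Optional[Iterable[str]] = None,
                       columns: Sequence[str] = COLUMNS) -> str:
    """Hypothetical helper: render summary entries as a GitHub Markdown table."""
    rows = summary
    if selected is not None:
        # Assumption: selected_metros keys match each entry's PREFIX value.
        wanted = set(selected)
        rows = [entry for entry in summary if entry["PREFIX"] in wanted]
    # Keep the documented ordering: largest MSA population first.
    rows = sorted(rows, key=lambda entry: entry["POPULATION"], reverse=True)
    lines = ["| " + " | ".join(columns) + " |",
             "|" + "|".join(" --- " for _ in columns) + "|"]
    for entry in rows:
        lines.append("| " + " | ".join(str(entry[col]) for col in columns) + " |")
    return "\n".join(lines)


print(to_github_markdown(SUMMARY, selected=["losangeles"]))
```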

covid19_stats.engine.core.display_tabulated_metros_fromjson(summary_data_json)

Takes a list of summary COVID-19 cumulative case and death data for MSAs and prints it as a list-table reStructuredText table. An example of this data structure is core_summary_data.json.

Parameters:

summary_data_json (list) – a list of summary data of COVID-19 cumulative cases and deaths for MSAs as dict entries. This list is sorted from largest MSA population to smallest. An example is core_summary_data.json; its first two entries are shown under display_tabulated_metros.
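The reStructuredText list-table directive that this output targets can be produced with plain string handling. This is a minimal sketch, not the library's code: to_rst_list_table is hypothetical, and the choice of columns is an assumption.

```python
# First two entries of core_summary_data.json, trimmed to the columns used here.
SUMMARY = [
    {"RANK": 1, "NAME": "NYC Metro Area", "NUM CASES": 1390557, "NUM DEATHS": 50378},
    {"RANK": 2, "NAME": "LA Metro Area", "NUM CASES": 1828244, "NUM DEATHS": 21240},
]


def to_rst_list_table(summary_data_json,
                      columns=("RANK", "NAME", "NUM CASES", "NUM DEATHS")):
    """Hypothetical helper: render summary entries as an RST list-table."""
    lines = [".. list-table::", "   :header-rows: 1", ""]
    header = list(columns)
    body = [[entry[col] for col in columns] for entry in summary_data_json]
    for row in [header] + body:
        for i, value in enumerate(row):
            # "* -" opens a new row; a "-" aligned under it adds a cell to that row.
            prefix = "   * - " if i == 0 else "     - "
            lines.append(f"{prefix}{value}")
    return "\n".join(lines)


print(to_rst_list_table(SUMMARY))
```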

covid19_stats.engine.core.get_boundary_dict(fips_collection)

Returns a dict mapping each FIPS code to the collection of geographic regions for that county or territorial unit. The geographic data comes from fips_data_2019.

Parameters:

fips_collection (set) – the set of FIPS codes of counties or US territorial units for which to get geographic information.

Returns:

a dict. The key is the FIPS code of a county or territorial unit, and the value is the list of geographic regions for that county or unit.

Return type:

dict
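A bounding box like the bbox entry that get_incident_data returns could be derived from such a dict. This is a sketch under explicit assumptions: the point layout (each value a list of regions, each region a sequence of (lat, lng) pairs) and the (min_lat, min_lng, max_lat, max_lng) order are guesses, and bounding_box is a hypothetical helper.

```python
def bounding_box(boundary_dict):
    """Return (min_lat, min_lng, max_lat, max_lng) over every boundary point.

    Assumed layout: each value is a list of regions, and each region is a
    sequence of (lat, lng) pairs.
    """
    points = [point
              for regions in boundary_dict.values()
              for region in regions
              for point in region]
    lats = [lat for lat, _ in points]
    lngs = [lng for _, lng in points]
    return min(lats), min(lngs), max(lats), max(lngs)


# Toy boundaries keyed by hypothetical FIPS codes (not real county shapes).
toy = {
    "00001": [[(37.0, -122.0), (38.0, -121.5)]],
    "00002": [[(36.5, -121.0)]],
}
print(bounding_box(toy))
```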

covid19_stats.engine.core.get_clustering_fips(collection_of_fips, adj=None)

Finds the separate clusters of adjacent counties or territorial units in a collection. This identifies groups of counties that may be disconnected from each other. If no adjacency dict is supplied, this method uses the adjacency dictionary that fips_adj_2018 returns. See fips_2019_adj.pkl.gz for what this dictionary looks like.

Parameters:
  • collection_of_fips (list) – the list of counties or territorial units, each identified by its FIPS code.

  • adj (dict) – optional adjacency dictionary. If not specified, this method uses the dictionary that fips_adj_2018 returns. See fips_2019_adj.pkl.gz for what this dictionary looks like.

Returns:

a list of clusters. Each cluster is a set of FIPS codes of counties grouped together.

Return type:

list
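Finding these clusters amounts to finding connected components of the adjacency graph restricted to the given FIPS codes. Below is a minimal breadth-first-search sketch, assuming adj maps each FIPS code to a set of adjacent FIPS codes and that adjacency is symmetric. cluster_fips and the toy codes are hypothetical; this is not necessarily how get_clustering_fips is implemented.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def cluster_fips(collection_of_fips: Iterable[str],
                 adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group FIPS codes into connected components of the adjacency graph.

    Only neighbors that are themselves in collection_of_fips are followed,
    so each returned set is one contiguous cluster within the collection.
    """
    remaining = set(collection_of_fips)
    clusters = []
    while remaining:
        start = remaining.pop()
        cluster = {start}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            current = queue.popleft()
            for neighbor in adj.get(current, ()):
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    cluster.add(neighbor)
                    queue.append(neighbor)
        clusters.append(cluster)
    return clusters


# Toy, symmetric adjacency over hypothetical codes: A-B touch, C-D touch.
toy_adj = {"A": {"B"}, "B": {"A"}, "C": {"D"}, "D": {"C"}}
print(cluster_fips({"A", "B", "C", "D"}, toy_adj))
```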

covid19_stats.engine.core.get_county_state(fips)

Returns the county and state of a county or territorial unit, given its FIPS code.

Parameters:

fips (str) – the FIPS code of the county or territorial unit.

Returns:

a dict of county and state.

Return type:

dict

covid19_stats.engine.core.get_data_fips(fips)

Returns the COVID-19 cumulative cases and deaths record of a single county or territorial unit, identified by its FIPS code. The data comes from the cumulative cases and deaths records of the NY Times COVID-19 database (see all_counties_nytimes_covid19_data).

Parameters:

fips (str) – the FIPS code of the county or territorial unit.

Returns:

a two-element tuple. The first element is the FIPS code, and the second is the DataFrame of cumulative COVID-19 cases and deaths, ordered from earliest to latest date. This DataFrame has three columns: date is the date of recorded incidence in that county, cases_<fips> is the cumulative number of COVID-19 cases on that date, and deaths_<fips> is the cumulative number of COVID-19 deaths on that date. Here, <fips> is the FIPS code of that county.

Return type:

tuple
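The shape of this record can be illustrated with the standard library alone. This sketch assumes the NY Times county file layout (date, county, state, fips, cases, deaths columns) and returns a list of row dicts in place of the DataFrame; county_record and the toy rows are hypothetical.

```python
import csv
import io

# Toy rows in the column layout assumed for the NY Times county data;
# the counties and numbers are made up.
TOY_CSV = """date,county,state,fips,cases,deaths
2020-03-02,Example,Example State,00001,3,0
2020-03-01,Example,Example State,00001,1,0
2020-03-01,Other,Example State,00002,5,1
"""


def county_record(fips, csv_text):
    """Return (fips, rows); each row has date, cases_<fips>, deaths_<fips> keys."""
    rows = [
        {"date": row["date"],
         f"cases_{fips}": int(row["cases"]),
         f"deaths_{fips}": int(row["deaths"])}
        for row in csv.DictReader(io.StringIO(csv_text))
        if row["fips"] == fips
    ]
    # ISO-format dates sort chronologically as strings: earliest to latest.
    rows.sort(key=lambda r: r["date"])
    return fips, rows


print(county_record("00001", TOY_CSV))
```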

covid19_stats.engine.core.get_fips_msa(county, state)

Given a county and state of a county or territorial unit, returns its FIPS code and the data structure on the MSA in which this county lies.

Parameters:
  • county (str) – county name.

  • state (str) – state name.

Returns:

a two-element tuple. First element is the county or territorial unit FIPS code. Second element is the geographical information on the MSA (see St. Louis data for example).

Return type:

tuple
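The two lookups this method combines (county and state to FIPS code, then FIPS code to the containing MSA) can be sketched with plain dicts. Both tables below are hypothetical stand-ins for the library's data; only the Providence MSA dict is taken from the get_mp4_album_name example, and fips_and_msa is a hypothetical helper.

```python
from typing import Optional, Tuple

# Hypothetical lookup tables for illustration only.
COUNTY_STATE_TO_FIPS = {
    ("Providence County", "Rhode Island"): "44007",
    ("Example County", "Example State"): "00001",
}
MSAS = {
    "providence": {
        "prefix": "providence",
        "region name": "Providence Metro Area",
        "fips": {"25005", "44001", "44003", "44005", "44007", "44009"},
        "population": 1624578,
    },
}


def fips_and_msa(county: str, state: str) -> Tuple[str, Optional[dict]]:
    """Return (fips, msa_dict); msa_dict is None if no MSA contains the county."""
    fips = COUNTY_STATE_TO_FIPS[(county, state)]
    for msa in MSAS.values():
        if fips in msa["fips"]:
            return fips, msa
    return fips, None


print(fips_and_msa("Providence County", "Rhode Island")[1]["region name"])
```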

covid19_stats.engine.core.get_incident_data(data=None, multiprocess=True)

Given geographical information on a region, returns COVID-19 cumulative statistics for all the counties or territorial units in that region. This is best shown by example.

For example, for the bayearea MSA, the output incident data structure for 26 February 2021 lives in core_incident_data_bayarea.pkl.gz. This structure is a dict with the following keys and values.

  • bbox is a 4-element tuple of the region bounding box: minimum lat/lng, and maximum lat/lng.

  • boundaries is a dict of boundary information. Each key is the FIPS code, and its value is a list of boundary lat/lngs for that county or territorial unit. Look at gis_calculate_total_bbox_sacramento.pkl.gz for an example of this data structure.

  • last day is the number of days (from first COVID-19 incident) in this incident data set.

  • df is the DataFrame that contains COVID-19 cumulative case and death data for all counties or territorial units in that region.

  • df_1day is the DataFrame that contains the 1-day averaged COVID-19 new case and death data for all counties or territorial units in that region.

  • df_7day is the DataFrame that contains the 7-day averaged COVID-19 new case and death data for all counties or territorial units in that region.

  • prefix is the string inherited from the input prefix key in the data dict.

  • region name is the string inherited from the input region name key in the data dict.

  • population is the int inherited from the input population key in the data dict.

  • fips is the set inherited from the input fips key in the data dict.

The pandas DataFrame under the df key has the following columns. Its rows are ordered from first to last incident date.

  • days_from_beginning is the day relative to the first incident. It starts at 0 and ends at last day.

  • date contains the date of the incident day, from first to last.

  • cases are the cumulative COVID-19 cases for the whole region from first to last incident date.

  • death are the cumulative COVID-19 deaths for the whole region from first to last incident date.

  • cases_<NUM> are the cumulative COVID-19 cases for a given county or territorial unit in the region (<NUM> is its FIPS code) from first to last incident date.

  • deaths_<NUM> are the cumulative COVID-19 deaths for a given county or territorial unit in the region (<NUM> is its FIPS code) from first to last incident date.

Parameters:
  • data (dict) – optional argument. If specified, it is the geographical information of a region; by default, it is the bayarea MSA. See St. Louis data for an example of an MSA, Rhode Island data for an example of a US state or territory, and CONUS data for the CONUS.

  • multiprocess (bool) – if True, then use multiprocessing to get the incident data information, otherwise do not. Default is True.

Returns:

the dict described above, see core_incident_data_bayarea.pkl.gz.

Return type:

dict
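The relationship between the cumulative columns of df and the new-case data in df_1day and df_7day can be sketched with the standard library. This is an illustration under assumptions, not the library's code: it treats day 0's new count as its cumulative value and uses a trailing 7-day mean that averages over fewer days at the start of the series. The helper names and toy series are hypothetical.

```python
from typing import Dict, List


def region_totals(per_county: Dict[str, List[int]]) -> List[int]:
    """Sum equal-length per-county cumulative series into one regional series."""
    return [sum(day) for day in zip(*per_county.values())]


def new_per_day(cumulative: List[int]) -> List[int]:
    """Daily new counts; day 0 is assumed to count everything up to that day."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def trailing_average(values: List[int], window: int = 7) -> List[float]:
    """Trailing mean over up to `window` days (fewer days at the series start)."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Toy cumulative case series keyed like the cases_<NUM> columns (made-up FIPS codes).
toy_cases = {"cases_00001": [1, 3, 6, 10], "cases_00002": [0, 1, 1, 2]}
regional = region_totals(toy_cases)   # analogue of the regional "cases" column
daily = new_per_day(regional)         # 1-day new cases
weekly = trailing_average(daily)      # 7-day trailing average
print(regional, daily, weekly)
```

With pandas, the analogous operations on a cumulative Series are Series.diff and Series.rolling(7, min_periods=1).mean().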

covid19_stats.engine.core.get_max_cases_county(inc_data)

Convenience method that returns a dict of the FIPS code, county, state, and number of cases of the county or territorial unit in a region with the most COVID-19 cases.

Parameters:

inc_data (dict) – the incident data structure for a region. See get_incident_data for what this output looks like.

Returns:

a dict of summary information on the county in the region with the most COVID-19 cases. For core_incident_data_bayarea.pkl.gz, this is,

{'fips': '06085',
 'cases': 94366,
 'county': 'Santa Clara County',
 'state': 'California'}

Return type:

dict

covid19_stats.engine.core.get_maximum_cases(inc_data)

Convenience method that returns a two-element tuple of the FIPS code and number of COVID-19 cases for the county with the most COVID-19 cases.

Parameters:

inc_data (dict) – the incident data structure for a region. See get_incident_data for what this output looks like.

Returns:

the two-element tuple of FIPS code and cumulative number of COVID-19 cases. For core_incident_data_bayarea.pkl.gz, this is,

('06085', 94366)

Return type:

tuple
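Picking this county amounts to taking the largest final cumulative value among the cases_<fips> columns of df. A minimal sketch, assuming the last row of df is given as a plain dict of column name to value; maximum_cases is hypothetical, and apart from the documented ('06085', 94366) entry, the FIPS codes and values are toy data.

```python
from typing import Dict, Tuple


def maximum_cases(last_row: Dict[str, int]) -> Tuple[str, int]:
    """Return (fips, cases) for the cases_<fips> column with the largest value.

    Assumption: last_row maps df column names to their final-day values.
    The region-wide "cases" column and the deaths columns are skipped.
    """
    county_cases = {
        column[len("cases_"):]: value
        for column, value in last_row.items()
        if column.startswith("cases_")
    }
    fips = max(county_cases, key=county_cases.get)
    return fips, county_cases[fips]


# '06085' -> 94366 is the documented bayarea maximum; the other entries are toy values.
toy_last_row = {"cases": 94666, "cases_06085": 94366, "cases_00001": 100,
                "cases_00002": 200, "deaths_00001": 1}
print(maximum_cases(toy_last_row))
```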

covid19_stats.engine.core.get_mp4_album_name(inc_data)

This method operates on MP4 movie output from command line tools that produce COVID-19 case and death summary movies (such as covid19_create_movie_or_summary, covid19_state_summary, or covid19_movie_updates) or from methods that create MP4 files (such as create_summary_cases_or_deaths_movie_frombeginning or create_summary_movie_frombeginning).

It determines whether the geographic region is classified as a STATE, CONUS, METROPOLITAN STATISTICAL AREA, or CUSTOM REGION. It is used in four CLI functionalities:

Parameters:

inc_data (dict) –

the dict that contains geographic data on the identified region. See St. Louis data for an example of an MSA geographic data dict, Rhode Island data for an example of a US state or territory, and CONUS data for the CONUS. Here is the dict for the Providence MSA:

{'prefix': 'providence',
 'region name': 'Providence Metro Area',
 'fips': {'25005', '44001', '44003', '44005', '44007', '44009'},
 'population': 1624578}

Returns:

If the prefix is one of the MSAs, then METROPOLITAN STATISTICAL AREA. If the prefix is identified as a state, then STATE. If the prefix is conus, then CONUS. Otherwise, CUSTOM REGION.

Return type:

str
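The classification rules above can be written as a short chain of membership tests. This is a hypothetical re-creation, not the library's code: album_name and the prefix sets are illustrative (the real sets come from the library's MSA and state data, and "rhodeisland" is an invented state prefix), and treating every other prefix as CUSTOM REGION is an assumption based on the four categories listed for this method.

```python
from typing import Collection


def album_name(inc_data: dict, msa_prefixes: Collection[str],
               state_prefixes: Collection[str]) -> str:
    """Hypothetical re-creation of the documented classification rules."""
    prefix = inc_data["prefix"]
    if prefix in msa_prefixes:
        return "METROPOLITAN STATISTICAL AREA"
    if prefix in state_prefixes:
        return "STATE"
    if prefix == "conus":
        return "CONUS"
    # Assumption: anything else falls into the fourth category.
    return "CUSTOM REGION"


providence = {"prefix": "providence", "region name": "Providence Metro Area",
              "fips": {"25005", "44001", "44003", "44005", "44007", "44009"},
              "population": 1624578}
# Hypothetical prefix sets; the real ones come from the library's data.
MSA_PREFIXES = {"providence", "bayarea"}
STATE_PREFIXES = {"rhodeisland"}
print(album_name(providence, MSA_PREFIXES, STATE_PREFIXES))
```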

covid19_stats.engine.core.get_msa_data(msaname)

Returns the geographical information of an MSA, given its identifier name.

Parameters:

msaname (str) – the identifier name for the MSA, which must be one of the keys in the dict that data_msas_2019 returns.

Returns:

the MSA geographical information (see St. Louis data for example).

Return type:

dict