4.4. covid19_stats.engine.core module
This module provides the core functionality that the covid19_stats command line tools use to summarize and visualize COVID-19 case and death statistics. It contains utility methods to identify data on geographical regions such as MSAs, states, or the CONUS; to create cumulative case and death data for geographical regions; and to print out summary reports of COVID-19 cases and deaths for MSAs in different formats.
- covid19_stats.engine.core.create_readme_from_template(mainURL='https://tanimislam.sfo3.digitaloceanspaces.com/covid19movies', dirname_for_readme_location='/usr/WS2/islam5/covid19_stats/docsrc/source', verify=True, topN_json=None)
This recreates the README.rst to reflect the latest COVID-19 data, using the Jinja2 templated README_template.rst. This is the back-end method to covid19_update_readme.
- Parameters:
mainURL (str) – the URL directory in which to look for a manifest JSON file of cumulative COVID-19 cases and deaths for the top population MSAs, covid19_topN_LATEST.json. This manifest file’s location is <mainURL>/covid19_topN_LATEST.json. By default, this is https://tanimislam.sfo3.digitaloceanspaces.com/covid19movies.
dirname_for_readme_location (str) – the location, on disk, where the covid19_stats downloaded repository lives. By default this is the current working directory.
verify (bool) – optional argument, whether to verify SSL connections. Default is True.
topN_json (str) – optional argument, the location of the manifest JSON file on disk. If specified, then this method ignores the online location, <mainURL>/covid19_topN_LATEST.json.
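The manifest lookup rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name resolve_manifest_location is hypothetical, and only the default URL and the manifest file name come from the documentation.

```python
from typing import Optional

DEFAULT_MAIN_URL = "https://tanimislam.sfo3.digitaloceanspaces.com/covid19movies"
MANIFEST_NAME = "covid19_topN_LATEST.json"


def resolve_manifest_location(main_url: str = DEFAULT_MAIN_URL,
                              topN_json: Optional[str] = None) -> str:
    """Hypothetical helper: return where the manifest JSON would be read from."""
    if topN_json is not None:
        # A manifest file on disk takes precedence over the online location.
        return topN_json
    # Otherwise the manifest lives at <mainURL>/covid19_topN_LATEST.json.
    return main_url.rstrip("/") + "/" + MANIFEST_NAME
```

Calling resolve_manifest_location() with no arguments yields the online manifest location, while passing topN_json returns that path unchanged.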
- covid19_stats.engine.core.display_tabulated_metros(form='simple', selected_metros=None)
Prints summary COVID-19 cumulative cases and deaths of all or selected MSAs to stdout as a tabulated table, in one of four formats: a simple format with simple, GitHub-flavored Markdown with github, reStructuredText with rst, or list-table reStructuredText with rst-simple. Otherwise, if one chooses json, then returns a list of that information.
This acts as an API back-end to the MSA summary functionality in covid19_create_movie_or_summary. Please see demonstration output for what this data looks like on screen.
- Parameters:
form (str) – If one of simple, github, rst, or rst-simple, then prints the table of MSA summary COVID-19 data to screen. If one of simple, github, or rst, then uses tabulate to format the data.
If json, then returns a list of summary data of COVID-19 cumulative cases and deaths for all or specified MSAs as dict entries. This list is sorted from largest MSA population to smallest. An example output is core_summary_data.json. Here are the first two entries,
[{'RANK': 1, 'PREFIX': 'nyc', 'NAME': 'NYC Metro Area', 'POPULATION': 19216182, 'FIRST INC.': '01 March 2020', 'NUM DAYS': 324, 'NUM CASES': 1390557, 'NUM DEATHS': 50378, 'MAX CASE COUNTY': 541846, 'MAX CASE COUNTY NAME': 'New York City, New York'},
 {'RANK': 2, 'PREFIX': 'losangeles', 'NAME': 'LA Metro Area', 'POPULATION': 18711436, 'FIRST INC.': '25 January 2020', 'NUM DAYS': 360, 'NUM CASES': 1828244, 'NUM DEATHS': 21240, 'MAX CASE COUNTY': 1032277, 'MAX CASE COUNTY NAME': 'Los Angeles County, California'}]
selected_metros (list) – Optional argument. By default, will print or return information on all MSAs. Otherwise specify the list of MSAs available as keys of the data_msas_2019 dictionary.
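As a rough sketch of how the json output might be consumed, the snippet below indexes the two example entries shown above by their PREFIX and derives a per-capita case rate. The helper cases_per_100k is hypothetical and is not part of covid19_stats; the data is copied from the example entries.

```python
# The two entries below are copied from the example output in this section.
summary = [
    {'RANK': 1, 'PREFIX': 'nyc', 'NAME': 'NYC Metro Area',
     'POPULATION': 19216182, 'FIRST INC.': '01 March 2020', 'NUM DAYS': 324,
     'NUM CASES': 1390557, 'NUM DEATHS': 50378, 'MAX CASE COUNTY': 541846,
     'MAX CASE COUNTY NAME': 'New York City, New York'},
    {'RANK': 2, 'PREFIX': 'losangeles', 'NAME': 'LA Metro Area',
     'POPULATION': 18711436, 'FIRST INC.': '25 January 2020', 'NUM DAYS': 360,
     'NUM CASES': 1828244, 'NUM DEATHS': 21240, 'MAX CASE COUNTY': 1032277,
     'MAX CASE COUNTY NAME': 'Los Angeles County, California'},
]


def cases_per_100k(entry):
    """Hypothetical helper: cumulative cases per 100,000 residents."""
    return 100_000 * entry['NUM CASES'] / entry['POPULATION']


# Index the entries by MSA prefix for direct lookup.
by_prefix = {entry['PREFIX']: entry for entry in summary}

# Per-capita case rate for each MSA, keyed by prefix.
rates = {prefix: cases_per_100k(entry) for prefix, entry in by_prefix.items()}
```

Because each entry is a plain dict, the same pattern works for any other field in the summary, such as NUM DEATHS.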
- covid19_stats.engine.core.display_tabulated_metros_fromjson(summary_data_json)
Takes the list of summary COVID-19 data for MSAs, and prints it out as list-table reStructuredText. An example of this data structure is core_summary_data.json.
- covid19_stats.engine.core.get_boundary_dict(fips_collection)
Returns a dict that maps each FIPS code to the collection of geographic areas for that county. The geographic data comes from fips_data_2019.
- covid19_stats.engine.core.get_clustering_fips(collection_of_fips, adj=None)
Finds the separate clusters of counties or territorial units that are adjacent to one another. This is used to identify clusters of counties that may be separate from each other. If one does not supply an adjacency dict, it uses the adjacency dictionary that fips_adj_2018 returns. Look at fips_2019_adj.pkl.gz to see what this dictionary looks like.
- Parameters:
collection_of_fips (list) – the list of counties or territorial units, each identified by its FIPS code.
adj (dict) – optionally specified adjacency dictionary. Otherwise it uses the fips_adj_2018 returned dictionary. Look at fips_2019_adj.pkl.gz to see what this dictionary looks like.
- Returns:
a list of counties clustered together. Each cluster is a set of FIPS codes of counties grouped together.
- Return type:
list
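One way to picture the clustering is as a connected-components search over the adjacency dictionary. The sketch below is an illustration only, not the library's code: the function name cluster_fips is hypothetical, the adjacency dict is assumed to map each FIPS code to an iterable of adjacent FIPS codes, and the FIPS codes in the toy data are made up.

```python
from collections import deque


def cluster_fips(collection_of_fips, adj):
    """Group FIPS codes into clusters of mutually reachable units.

    Assumes adj maps each FIPS code to an iterable of adjacent FIPS codes.
    Only units inside collection_of_fips can link two units together.
    """
    remaining = set(collection_of_fips)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        queue = deque([seed])
        # Breadth-first search restricted to units still unassigned.
        while queue:
            current = queue.popleft()
            for neighbor in adj.get(current, ()):
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    cluster.add(neighbor)
                    queue.append(neighbor)
        clusters.append(cluster)
    return clusters


# Made-up adjacency data: 90004 links 90003 and 90005 geographically,
# but it is left out of the collection below, so two clusters result.
toy_adj = {
    '90001': {'90002'},
    '90002': {'90001', '90003'},
    '90003': {'90002', '90004'},
    '90004': {'90003', '90005'},
    '90005': {'90004', '90006'},
    '90006': {'90005'},
}
toy_clusters = cluster_fips(['90001', '90002', '90003', '90005', '90006'], toy_adj)
```

Here the first three units form one cluster and the last two form another, since the only unit joining them is not part of the input collection.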
- covid19_stats.engine.core.get_county_state(fips)
- covid19_stats.engine.core.get_data_fips(fips)
Returns the COVID-19 cumulative cases and deaths record of a single county or territorial unit identified by its FIPS code. Takes the data from the cumulative cases and deaths record of the NY Times COVID-19 database (see all_counties_nytimes_covid19_data).
- Parameters:
fips (str) – the FIPS code of the county or territorial unit.
- Returns:
a two-element tuple. First element is the fips, and the second is the DataFrame representing the cumulative COVID-19 cases and deaths, ordered from earliest to latest date. This DataFrame has three columns: date is the date of recorded incidence in that county, cases_<fips> is the cumulative COVID-19 cases on that date, and deaths_<fips> is the cumulative COVID-19 deaths on that date. Here, <fips> is the FIPS code of that county.
- Return type:
tuple
- covid19_stats.engine.core.get_fips_msa(county, state)
Given the county name and state of a county or territorial unit, returns its FIPS code and the data structure of the MSA in which this county lies.
- covid19_stats.engine.core.get_incident_data(data=None, multiprocess=True)
Given geographical information on a region, returns COVID-19 cumulative statistics on all the counties or territorial units of that region. This is best shown by example.
For example, for the bayarea MSA, the output incident data structure for 26 February 2021 lives in core_incident_data_bayarea.pkl.gz. This structure is a dict with the following keys and values.
  - bbox is a 4-element tuple of the region bounding box: minimum lat/lng, and maximum lat/lng.
  - boundaries is a dict of boundary information. Each key is the FIPS code, and its value is a list of boundary lat/lngs for that county or territorial unit. Look at gis_calculate_total_bbox_sacramento.pkl.gz for an example of this data structure.
  - last day is the number of days (from first COVID-19 incident) in this incident data set.
  - df is the DataFrame that contains COVID-19 cumulative case and death data for all counties or territorial units in that region.
  - df_1day is the DataFrame that contains the 1-day averaged COVID-19 new case and death data for all counties or territorial units in that region.
  - df_7day is the DataFrame that contains the 7-day averaged COVID-19 new case and death data for all counties or territorial units in that region.
  - prefix is the string inherited from the input prefix key in the data dict.
  - region name is the string inherited from the input region name key in the data dict.
  - population is the int inherited from the input population key in the data dict.
  - fips is the set inherited from the input fips key in the data dict.
The Pandas DataFrame located under the df key has the following columns, with rows ordered from first to last incident date.
  - days_from_beginning is the day relative to the first incident. It starts at 0 and ends at last day.
  - date contains the date of the incident day, from first to last.
  - cases are the cumulative COVID-19 cases for the whole region from first to last incident date.
  - death are the cumulative COVID-19 deaths for the whole region from first to last incident date.
  - cases_<NUM> are the cumulative COVID-19 cases for a given county or territorial unit in the region (<NUM> is its FIPS code) from first to last incident date.
  - deaths_<NUM> are the cumulative COVID-19 deaths for a given county or territorial unit in the region (<NUM> is its FIPS code) from first to last incident date.
- Parameters:
data (dict) – Optional argument, but if specified is the geographical information of a region. By default is the bayarea MSA. See St. Louis data for an example of an MSA. See Rhode Island data for an example of a US state or territory. See CONUS data for the CONUS.
multiprocess (bool) – if True, then use multiprocessing to get the incident data information, otherwise do not. Default is True.
- Returns:
the dict described above, see core_incident_data_bayarea.pkl.gz.
- Return type:
dict
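To illustrate how cumulative counts (as in df) relate to averaged new counts (as in df_7day), here is a small pure-Python sketch. It is an assumption-laden illustration rather than the library's method: the documentation does not state the averaging convention, so a trailing window is assumed, and the function names and sample numbers are made up.

```python
def daily_new_counts(cumulative):
    """Convert a cumulative series into per-day new counts."""
    if not cumulative:
        return []
    # The first day has no predecessor, so its cumulative value is its new count.
    return [cumulative[0]] + [
        later - earlier for earlier, later in zip(cumulative, cumulative[1:])
    ]


def trailing_average(values, window=7):
    """Average each value with up to window - 1 preceding values (assumed convention)."""
    averaged = []
    for index in range(len(values)):
        # Early days use a shorter window because fewer values are available.
        chunk = values[max(0, index - window + 1): index + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


# Made-up cumulative case counts for eight consecutive days.
cumulative_cases = [1, 3, 6, 10, 15, 21, 28, 36]
new_cases = daily_new_counts(cumulative_cases)
new_cases_7day = trailing_average(new_cases, window=7)
```

With pandas, a comparable result could be obtained with diff followed by a rolling mean; the plain-list version is used here only to keep the sketch dependency-free.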
- covid19_stats.engine.core.get_max_cases_county(inc_data)
Convenience method that returns a dict of the FIPS code, county, state, and cases for the county or territorial unit in a region that has the largest number of COVID-19 cases.
- Parameters:
inc_data (dict) – the incident data structure for a region. See get_incident_data for what this output looks like.
- Returns:
a dict of summary information on the worst-performing county in the region, COVID-19 case wise. For core_incident_data_bayarea.pkl.gz, this is,
{'fips': '06085', 'cases': 94366, 'county': 'Santa Clara County', 'state': 'California'}
- Return type:
dict
- covid19_stats.engine.core.get_maximum_cases(inc_data)
Convenience method that returns a two-element tuple of the FIPS code and number of COVID-19 cases, for the worst-performing county, case-wise.
- Parameters:
inc_data (dict) – the incident data structure for a region. See get_incident_data for what this output looks like.
- Returns:
the two-element tuple of FIPS code and cumulative number of COVID-19 cases. For core_incident_data_bayarea.pkl.gz, this is,
('06085', 94366)
- Return type:
tuple
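The idea behind both convenience methods can be sketched by scanning the per-county cases_<NUM> columns for the largest cumulative value. This is a hedged illustration, not the library's implementation: the function name maximum_cases_from_row is hypothetical, the input is assumed to be the final row of the df table expressed as a plain dict, and every number except the 94366 cases for FIPS 06085 is made up.

```python
def maximum_cases_from_row(last_row):
    """Return (fips, cases) for the county column with the largest count.

    last_row is a hypothetical dict holding the most recent row of the
    cumulative table, keyed by column name.
    """
    column_prefix = 'cases_'
    # Keep only per-county columns; the regional 'cases' total has no suffix.
    per_county = {
        key[len(column_prefix):]: value
        for key, value in last_row.items()
        if key.startswith(column_prefix)
    }
    fips = max(per_county, key=per_county.get)
    return fips, per_county[fips]


# Illustrative final row: only the 06085 case count comes from the documentation.
example_last_row = {
    'days_from_beginning': 391,
    'cases': 208366,
    'death': 2500,
    'cases_06085': 94366,
    'cases_06001': 80000,
    'cases_06075': 34000,
    'deaths_06085': 1200,
    'deaths_06001': 900,
    'deaths_06075': 400,
}
worst_fips, worst_cases = maximum_cases_from_row(example_last_row)
```

The dict form returned by get_max_cases_county would then add the county and state names for that FIPS code.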
- covid19_stats.engine.core.get_mp4_album_name(inc_data)
This method operates on MP4 movie output from command line tools that produce COVID-19 case and death summary movies (such as covid19_create_movie_or_summary, covid19_state_summary, or covid19_movie_updates) or methods that create MP4 files (such as create_summary_cases_or_deaths_movie_frombeginning or create_summary_movie_frombeginning).
It determines whether the geographic region is classified as a STATE, CONUS, METROPOLITAN STATISTICAL AREA, or CUSTOM REGION. It is used in four CLI functionalities:
  - movie mode functionality for covid19_create_movie_or_summary.
  - movie cases death mode functionality for covid19_create_movie_or_summary.
  - movie cases death mode functionality for covid19_state_summary.
- Parameters:
inc_data (dict) – the dict that contains geographic data on the identified region. See St. Louis data for an example of an MSA geographic data dict. See Rhode Island data for an example of a US state or territory. See CONUS data for the CONUS.
{'prefix': 'providence', 'region name': 'Providence Metro Area', 'fips': {'25005', '44001', '44003', '44005', '44007', '44009'}, 'population': 1624578}
- Returns:
If the prefix is one of the MSAs, METROPOLITAN STATISTICAL AREA. If its prefix is identified as a state, then STATE. If its prefix is conus, then CONUS.
- Return type:
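The classification rule above might be sketched as a lookup on the prefix key. This is an assumption-based illustration only: the function classify_region, the two prefix sets, and the state prefix value are hypothetical, and falling back to CUSTOM REGION for an unrecognized prefix is an assumption based on the four labels listed earlier, not something the documentation states.

```python
# Hypothetical lookup sets; the real library derives these from its own data.
KNOWN_MSA_PREFIXES = {'providence', 'bayarea', 'nyc', 'losangeles'}
KNOWN_STATE_PREFIXES = {'example_state'}


def classify_region(inc_data,
                    msa_prefixes=KNOWN_MSA_PREFIXES,
                    state_prefixes=KNOWN_STATE_PREFIXES):
    """Classify a region dict by its 'prefix' key (illustrative sketch)."""
    prefix = inc_data['prefix']
    if prefix in msa_prefixes:
        return 'METROPOLITAN STATISTICAL AREA'
    if prefix in state_prefixes:
        return 'STATE'
    if prefix == 'conus':
        return 'CONUS'
    # Assumed fallback for regions that match none of the above.
    return 'CUSTOM REGION'


# The Providence example dict from the parameter description above.
providence = {
    'prefix': 'providence',
    'region name': 'Providence Metro Area',
    'fips': {'25005', '44001', '44003', '44005', '44007', '44009'},
    'population': 1624578,
}
providence_label = classify_region(providence)
```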
- covid19_stats.engine.core.get_msa_data(msaname)
- Parameters:
msaname (str) – the identifier name for the MSA, which must be one of the keys in the dict that data_msas_2019 returns.
- Returns:
the MSA geographical information (see St. Louis data for example).
- Return type: