4.4. covid19_stats.engine.core module
This module provides the core functionality that the covid19_stats command line tools use to summarize and visualize COVID-19 case and death statistics. It contains utility methods to identify data on geographical regions such as MSAs, states, or the CONUS; to create cumulative case and death data for geographical regions; and to print out summary reports of COVID-19 cases and deaths for MSAs in different formats.
- covid19_stats.engine.core.create_readme_from_template(mainURL='https://tanimislam.sfo3.digitaloceanspaces.com/covid19movies', dirname_for_readme_location='/usr/WS2/islam5/covid19_stats/docsrc/source', verify=True, topN_json=None)
This recreates the README.rst to reflect the latest COVID-19 data, using the Jinja2 templated README_template.rst. This is the back-end method to covid19_update_readme.
- Parameters:
mainURL (str) – the URL directory in which to look for a manifest JSON file of cumulative COVID-19 cases and deaths for the top population MSAs, covid19_topN_LATEST.json. This manifest file’s location is <mainURL>/covid19_topN_LATEST.json. By default, this is https://tanimislam.sfo3.digitaloceanspaces.com/covid19movies.
dirname_for_readme_location (str) – the location, on disk, where the covid19_stats downloaded repository lives. By default this is the current working directory.
verify (bool) – optional argument, whether to verify SSL connections. Default is True.
topN_json (str) – optional argument, the location of the manifest JSON file on disk. If specified, then this method ignores the online location, <mainURL>/covid19_topN_LATEST.json.
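The choice between the online and on-disk manifest described above can be sketched as follows. This is only an illustration under stated assumptions, not the module's implementation: resolve_manifest_location is a hypothetical helper, and the only details taken from the description are the <mainURL>/covid19_topN_LATEST.json convention and the rule that topN_json, when given, overrides the online location.

```python
from typing import Optional

# File name given in the parameter description above.
MANIFEST_NAME = "covid19_topN_LATEST.json"


def resolve_manifest_location(mainURL: str, topN_json: Optional[str] = None) -> str:
    """Hypothetical helper: report where the manifest would be read from."""
    if topN_json is not None:
        # A manifest on disk takes precedence over the online location.
        return topN_json
    # Otherwise the manifest lives at <mainURL>/covid19_topN_LATEST.json.
    return f"{mainURL.rstrip('/')}/{MANIFEST_NAME}"


if __name__ == "__main__":
    print(resolve_manifest_location("https://tanimislam.sfo3.digitaloceanspaces.com/covid19movies"))
```

Stripping a trailing slash from mainURL is a convenience of this sketch; whether the real method tolerates one is not stated.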
- covid19_stats.engine.core.display_tabulated_metros(form='simple', selected_metros=None)
Prints summary COVID-19 cumulative cases and deaths of all or selected MSAs to stdout as a tabulated table, in one of four formats: a simple format with simple, GitHub-flavored Markdown with github, reStructuredText with rst, or list-table reStructuredText with rst-simple. Otherwise, if one chooses json, then returns a list of that information.
This acts as an API back-end to the summarizing MSAs functionality in covid19_create_movie_or_summary. Please see demonstration output for what this data looks like on screen.
- Parameters:
form (str) –
If one of simple, github, rst, or rst-simple, then prints the table of MSA summary COVID-19 data to screen. If one of simple, github, or rst, then uses tabulate to format the data.
If json, then returns a list of summary data of COVID-19 cumulative cases and deaths for all or specified MSAs as dict entries. This list is sorted from largest MSA population to smallest. An example output is core_summary_data.json. Here are the first two entries,
[{'RANK': 1, 'PREFIX': 'nyc', 'NAME': 'NYC Metro Area', 'POPULATION': 19216182, 'FIRST INC.': '01 March 2020', 'NUM DAYS': 324, 'NUM CASES': 1390557, 'NUM DEATHS': 50378, 'MAX CASE COUNTY': 541846, 'MAX CASE COUNTY NAME': 'New York City, New York'},
 {'RANK': 2, 'PREFIX': 'losangeles', 'NAME': 'LA Metro Area', 'POPULATION': 18711436, 'FIRST INC.': '25 January 2020', 'NUM DAYS': 360, 'NUM CASES': 1828244, 'NUM DEATHS': 21240, 'MAX CASE COUNTY': 1032277, 'MAX CASE COUNTY NAME': 'Los Angeles County, California'}]
selected_metros (list) – Optional argument. By default, will print or return information on all MSAs. Otherwise specify the list of MSAs available as keys of the data_msas_2019 dictionary.
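As an illustration of how the json form might be consumed, the sketch below computes cumulative cases per 100,000 residents from entries shaped like the example above. The two records reuse values from that example with unused fields trimmed; cases_per_100k is a hypothetical helper and not part of covid19_stats.

```python
# First two entries of the example output shown above (unused fields trimmed).
summary = [
    {"RANK": 1, "PREFIX": "nyc", "POPULATION": 19216182, "NUM CASES": 1390557},
    {"RANK": 2, "PREFIX": "losangeles", "POPULATION": 18711436, "NUM CASES": 1828244},
]


def cases_per_100k(entry: dict) -> float:
    """Cumulative cases per 100,000 residents for one MSA summary entry."""
    return 1e5 * entry["NUM CASES"] / entry["POPULATION"]


rates = {entry["PREFIX"]: cases_per_100k(entry) for entry in summary}

# List MSAs from highest to lowest cumulative case rate.
for prefix, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{prefix:12s} {rate:10.1f}")
```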
- covid19_stats.engine.core.display_tabulated_metros_fromjson(summary_data_json)
Takes the list of summary COVID-19 case and death data for MSAs, and prints it out as list-table reStructuredText. An example of this data structure is core_summary_data.json.
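For readers unfamiliar with the list-table directive, here is a minimal sketch of turning a list of dict entries into that form. It assumes nothing about the exact columns or directive options this method emits; to_rst_list_table and the column selection are hypothetical.

```python
def to_rst_list_table(rows, columns):
    """Render a list of dicts as a reStructuredText list-table directive."""
    lines = [".. list-table::", "   :header-rows: 1", ""]
    # The header row is written like any other row, using the column names.
    header = dict(zip(columns, columns))
    for record in [header] + list(rows):
        for idx, column in enumerate(columns):
            marker = "   * - " if idx == 0 else "     - "
            lines.append(f"{marker}{record[column]}")
    return "\n".join(lines)


if __name__ == "__main__":
    rows = [
        {"RANK": 1, "NAME": "NYC Metro Area", "NUM CASES": 1390557},
        {"RANK": 2, "NAME": "LA Metro Area", "NUM CASES": 1828244},
    ]
    print(to_rst_list_table(rows, ["RANK", "NAME", "NUM CASES"]))
```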
- covid19_stats.engine.core.get_boundary_dict(fips_collection)
Returns a dict of FIPS code to the collection of geographic areas for that county. The geographic data comes from fips_data_2019.
- covid19_stats.engine.core.get_clustering_fips(collection_of_fips, adj=None)
Finds the separate clusters of adjacent counties or territorial units within a collection. This is used to identify groups of counties that may be geographically separate from each other. If one does not supply an adjacency dict, it uses the adjacency dictionary that fips_adj_2018 returns. Look at fips_2019_adj.pkl.gz to see what this dictionary looks like.
- Parameters:
collection_of_fips (list) – the list of counties or territorial units, each identified by its FIPS code.
adj (dict) – optionally specified adjacency dictionary. Otherwise it uses the fips_adj_2018 returned dictionary. Look at fips_2019_adj.pkl.gz to see what this dictionary looks like.
- Returns:
a list of counties clustered together. Each cluster is a set of FIPS codes of counties grouped together.
- Return type:
list
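One common way to find such clusters is a connected-components search over the adjacency dictionary. The sketch below illustrates that technique only; it is not taken from the module. It assumes the adjacency dictionary maps each FIPS code to an iterable of adjacent FIPS codes and that adjacency is symmetric. The function name cluster_fips and the five-digit codes in the usage example are made up.

```python
def cluster_fips(collection_of_fips, adj):
    """Group FIPS codes into clusters of mutually reachable neighbors."""
    remaining = set(collection_of_fips)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        frontier = [seed]
        while frontier:
            current = frontier.pop()
            # Only follow neighbors that belong to the input collection
            # and have not yet been assigned to a cluster.
            neighbors = set(adj.get(current, ())) & remaining
            remaining -= neighbors
            cluster |= neighbors
            frontier.extend(neighbors)
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    # Hypothetical adjacency: 00001-00002-00003 form a chain, 00009 is isolated.
    toy_adj = {
        "00001": {"00002"},
        "00002": {"00001", "00003"},
        "00003": {"00002"},
        "00009": set(),
    }
    print(cluster_fips(["00001", "00002", "00003", "00009"], toy_adj))
```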
- covid19_stats.engine.core.get_county_state(fips)
- covid19_stats.engine.core.get_data_fips(fips)
Returns the COVID-19 cumulative cases and deaths record of a single county or territorial unit identified by its FIPS code. Takes the data from the cumulative cases and deaths record of the NY Times COVID-19 database (see all_counties_nytimes_covid19_data).
- Parameters:
fips (str) – the FIPS code of the county or territorial unit.
- Returns:
a two-element tuple. First element is the fips, and the second is the DataFrame representing the cumulative COVID-19 cases and deaths ordered by earliest to latest date. This DataFrame has three columns: date is the date of recorded incidence in that county, cases_<fips> is the cumulative COVID-19 cases on that date, and deaths_<fips> is the cumulative COVID-19 deaths on that date. Here, <fips> is the FIPS code of that county.
- Return type:
tuple
- covid19_stats.engine.core.get_fips_msa(county, state)
Given the county and state names of a county or territorial unit, returns its FIPS code and the data structure on the MSA in which this county lies.
- covid19_stats.engine.core.get_incident_data(data=None, multiprocess=True)
Given geographical information on a region, returns COVID-19 cumulative statistics on all the counties or territorial units of that region. This is best shown by example.
For example, for the bayarea MSA, the output incident data structure for 26 February 2021 lives in core_incident_data_bayarea.pkl.gz. This structure is a dict with the following keys and values.
bbox is a 4-element tuple of the region bounding box: minimum lat/lng, and maximum lat/lng.
boundaries is a dict of boundary information. Each key is the FIPS code, and its value is a list of boundary lat/lngs for that county or territorial unit. Look at gis_calculate_total_bbox_sacramento.pkl.gz for an example of this data structure.
last day is the number of days (from first COVID-19 incident) in this incident data set.
df is the DataFrame that contains COVID-19 cumulative case and death data for all counties or territorial units in that region.
df_1day is the DataFrame that contains the 1-day averaged COVID-19 new case and death data for all counties or territorial units in that region.
df_7day is the DataFrame that contains the 7-day averaged COVID-19 new case and death data for all counties or territorial units in that region.
prefix is the string inherited from the input prefix key in the data dict.
region name is the string inherited from the input region name key in the data dict.
population is the int inherited from the input population key in the data dict.
fips is the set inherited from the input fips key in the data dict.
This Pandas DataFrame, located under the df key, has the following columns ordered by first to last incident date.
days_from_beginning is the day relative to the first incident. It starts at 0 and ends at last day.
date contains the date of the incident day, from first to last.
cases are the cumulative COVID-19 cases for the whole region from first to last incident date.
death are the cumulative COVID-19 deaths for the whole region from first to last incident date.
cases_<NUM> are the cumulative COVID-19 cases for a given county or territorial unit in the region (<NUM> is its FIPS code) from first to last incident date.
deaths_<NUM> are the cumulative COVID-19 deaths for a given county or territorial unit in the region (<NUM> is its FIPS code) from first to last incident date.
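To make the relationship between the cumulative columns and the averaged new-case data concrete, the sketch below derives daily increments from a cumulative series and smooths them with a trailing 7-day mean. The windowing convention is an assumption: the description says only "7-day averaged", so the actual df_7day values may be computed differently. Plain lists stand in for DataFrame columns, and both function names are hypothetical.

```python
def daily_new(cumulative):
    """Convert a cumulative series into per-day increments."""
    previous = 0
    increments = []
    for total in cumulative:
        increments.append(total - previous)
        previous = total
    return increments


def trailing_average(values, window=7):
    """Average each value with up to window - 1 values before it."""
    averaged = []
    for idx in range(len(values)):
        # Early entries use a shorter window because fewer days are available.
        chunk = values[max(0, idx - window + 1): idx + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


if __name__ == "__main__":
    cumulative_cases = [1, 3, 6, 10, 15, 21, 28, 36]  # made-up toy series
    new_cases = daily_new(cumulative_cases)
    print(new_cases)
    print(trailing_average(new_cases))
```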
- Parameters:
data (dict) – Optional argument, but if specified is the geographical information of a region. By default is the bayarea MSA. See St. Louis data for an example of an MSA. See Rhode Island data for an example of a US state or territory. See CONUS data for the CONUS.
multiprocess (bool) – if True, then use multiprocessing to get the incident data information, otherwise do not. Default is True.
- Returns:
the dict described above, see core_incident_data_bayarea.pkl.gz.
- Return type:
dict
- covid19_stats.engine.core.get_max_cases_county(inc_data)
Convenience method that returns a dict of the FIPS code, county, state, and cases for the county or territorial unit in a region that has the highest number of COVID-19 cases.
- Parameters:
inc_data (dict) – the incident data structure for a region. See get_incident_data for what this output looks like.
- Returns:
a dict of summary information on the worst-performing county in the region, COVID-19 case wise. For core_incident_data_bayarea.pkl.gz, this is,
{'fips': '06085', 'cases': 94366, 'county': 'Santa Clara County', 'state': 'California'}
- Return type:
dict
- covid19_stats.engine.core.get_maximum_cases(inc_data)
Convenience method that returns a two-element tuple of the FIPS code and number of COVID-19 cases, for the worst-performing county, case-wise.
- Parameters:
inc_data (dict) – the incident data structure for a region. See get_incident_data for what this output looks like.
- Returns:
the two-element tuple of FIPS code and cumulative number of COVID-19 cases. For core_incident_data_bayarea.pkl.gz, this is,
('06085', 94366)
- Return type:
tuple
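The selection of the worst-performing county can be sketched from the per-county cases_<NUM> columns described for get_incident_data. This is a hedged illustration rather than the module's code: a plain dict of lists stands in for the df DataFrame, the latest cumulative value is assumed to be the one compared, and maximum_cases_sketch plus the toy FIPS codes and counts are made up.

```python
def maximum_cases_sketch(columns):
    """Return (fips, cases) for the county with the most cumulative cases.

    columns maps column names to lists of values ordered from first to
    last incident date, mirroring the df layout described above.
    """
    latest = {
        name[len("cases_"):]: values[-1]
        for name, values in columns.items()
        # The region-wide "cases" column has no FIPS suffix and is skipped.
        if name.startswith("cases_")
    }
    fips = max(latest, key=latest.get)
    return fips, latest[fips]


if __name__ == "__main__":
    toy_columns = {
        "cases": [3, 8, 16],
        "cases_00001": [1, 3, 7],
        "cases_00002": [2, 5, 9],
        "deaths_00001": [0, 0, 1],
        "deaths_00002": [0, 1, 1],
    }
    print(maximum_cases_sketch(toy_columns))
```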
- covid19_stats.engine.core.get_mp4_album_name(inc_data)
This method operates on MP4 movie output from command line tools that produce COVID-19 case and death summary movies (such as covid19_create_movie_or_summary, covid19_state_summary, or covid19_movie_updates) or from methods that create MP4 files (such as create_summary_cases_or_deaths_movie_frombeginning or create_summary_movie_frombeginning).
It determines whether the geographic region is classified as a STATE, CONUS, METROPOLITAN STATISTICAL AREA, or CUSTOM REGION. It is used in the following CLI functionalities:
movie mode functionality for covid19_create_movie_or_summary.
movie cases death mode functionality for covid19_create_movie_or_summary.
movie cases death mode functionality for covid19_state_summary.
- Parameters:
inc_data (dict) –
the dict that contains geographic data on the identified region. See St. Louis data for an example of an MSA geographic data dict. See Rhode Island data for an example of a US state or territory. See CONUS data for the CONUS.
{'prefix': 'providence', 'region name': 'Providence Metro Area', 'fips': {'25005', '44001', '44003', '44005', '44007', '44009'}, 'population': 1624578}
- Returns:
If the prefix is one of the MSAs, then METROPOLITAN STATISTICAL AREA. If its prefix is identified as a state, then STATE. If its prefix is conus, then CONUS.
- Return type:
str
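The classification rules above can be sketched as a simple lookup on the region's prefix. This is an illustration under assumptions: album_name_sketch is hypothetical, the sets of MSA and state prefixes are passed in rather than read from the package's data, the state prefix in the usage example is invented, and the fallback to CUSTOM REGION for unrecognized prefixes is inferred from the list of four classifications rather than stated in the return description.

```python
def album_name_sketch(inc_data, msa_prefixes, state_prefixes):
    """Classify a region by the prefix key of its geographic data dict."""
    prefix = inc_data["prefix"]
    if prefix in msa_prefixes:
        return "METROPOLITAN STATISTICAL AREA"
    if prefix in state_prefixes:
        return "STATE"
    if prefix == "conus":
        return "CONUS"
    # Assumption: anything unrecognized is treated as a custom region.
    return "CUSTOM REGION"


if __name__ == "__main__":
    providence = {
        "prefix": "providence",
        "region name": "Providence Metro Area",
        "fips": {"25005", "44001", "44003", "44005", "44007", "44009"},
        "population": 1624578,
    }
    # Hypothetical prefix collections, for illustration only.
    print(album_name_sketch(providence, {"providence", "bayarea"}, {"some_state"}))
```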
- covid19_stats.engine.core.get_msa_data(msaname)
- Parameters:
msaname (str) – the identifier name for the MSA, which must be one of the keys in the dict that data_msas_2019 returns.
- Returns:
the MSA geographical information (see St. Louis data for example).
- Return type:
dict