RKI Data Scraper
At the beginning of the COVID-19 pandemic I felt the need to play around with the RKI data myself and found out that it is not so easy to get at the data.
After a while I found a CSV file with the corona data of the RKI and wrote a script to parse it into a pandas DataFrame:
```python
import pandas as pd
import requests
from datetime import datetime


def scrape(rki_csv_path='covid_19.csv'):
    """Download the RKI CSV and write it to the given path."""
    url = 'https://opendata.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0.csv'
    r = requests.get(url, allow_redirects=True)
    with open(rki_csv_path, 'wb') as f:
        f.write(r.content)


def get_rki(rki_csv_path='covid_19.csv'):
    """Read the RKI CSV from path and return a pandas DataFrame."""
    def dateparser(x):
        """Parse RKI CSV date formats (there are two different ones)."""
        try:
            return datetime.strptime(x, "%d.%m.%Y, %H:%M Uhr")
        except ValueError:
            return datetime.strptime(x, "%Y/%m/%d %H:%M:%S")

    rki = pd.read_csv(rki_csv_path,
                      index_col=0,
                      parse_dates=[8, 10, 13],
                      date_parser=dateparser)
    return rki
```
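A side note on `scrape()`: `requests.get()` has no default timeout and will happily write an HTML error page to disk if the server responds with an error. A slightly more defensive variant (same ArcGIS URL as above; the name `scrape_checked` is just a placeholder) could look like this:

```python
import requests


def scrape_checked(rki_csv_path='covid_19.csv'):
    """Like scrape(), but fail loudly on network or HTTP errors."""
    url = 'https://opendata.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0.csv'
    # timeout so a stalled connection does not hang forever
    r = requests.get(url, allow_redirects=True, timeout=60)
    # raise on 4xx/5xx instead of silently saving an error page
    r.raise_for_status()
    with open(rki_csv_path, 'wb') as f:
        f.write(r.content)
```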
```python
# MAIN
scrape()
rki = get_rki()

# Data for Germany:
deutschland = rki[['AnzahlFall', 'AnzahlTodesfall', 'Meldedatum']].groupby('Meldedatum').sum().cumsum()
deutschland.to_csv('covid-19_germany.csv')

# Data for Baden-Württemberg:
bw = rki.query("Bundesland == 'Baden-Württemberg'")[['AnzahlFall', 'AnzahlTodesfall', 'Meldedatum']].groupby('Meldedatum').sum().cumsum()
bw.to_csv('covid-19_bw.csv')

# Data for the Ostalbkreis:
oak = rki.query("Landkreis == 'LK Ostalbkreis'")[['AnzahlFall', 'AnzahlTodesfall', 'Meldedatum']].groupby('Meldedatum').sum().cumsum()
oak.to_csv('covid-19_oak.csv')
```
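Because of the `cumsum()`, the three CSVs contain cumulative counts per report date. If you want daily new cases back, `diff()` inverts the `cumsum()`; a small sketch on a made-up miniature frame (the real one has the same shape after the `groupby`):

```python
import pandas as pd

# Hypothetical stand-in for the aggregated frame produced by
# groupby('Meldedatum').sum().cumsum() above: cumulative counts per day.
cumulative = pd.DataFrame(
    {'AnzahlFall': [5, 12, 20], 'AnzahlTodesfall': [0, 1, 3]},
    index=pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-03']),
)
cumulative.index.name = 'Meldedatum'

# diff() undoes the cumsum(); the first row becomes NaN, so fill it
# back in from the cumulative frame and restore integer dtype.
daily = cumulative.diff().fillna(cumulative).astype(int)
print(daily)
```

On this toy data `daily['AnzahlFall']` comes out as 5, 7, 8, i.e. the per-day increments.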
Other sources for COVID-19 data are:
- Johns Hopkins University
- Worldometer.info
- Süddeutsche Zeitung
- or, if you want to create your own simulation: CovidSim