RKI Data Scraper
At the beginning of the COVID-19 pandemic I felt the need to play around with the RKI data myself and found out that it is not so easy to get at the data.
After a while I found a CSV file with the corona data of the RKI and wrote a script to parse it into a pandas DataFrame:
```python
import pandas as pd
import requests
from datetime import datetime


def scrape(rki_csv_path='covid_19.csv'):
    """Download the RKI CSV and write it to the given path."""
    url = 'https://opendata.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0.csv'
    r = requests.get(url, allow_redirects=True)
    with open(rki_csv_path, 'wb') as f:
        f.write(r.content)


def get_rki(rki_csv_path='covid_19.csv'):
    """Read the RKI CSV from path and return a pandas DataFrame."""
    def dateparser(x):
        """Parse RKI CSV date formats (there are two different ones)."""
        try:
            return datetime.strptime(x, "%d.%m.%Y, %H:%M Uhr")
        except ValueError:
            return datetime.strptime(x, "%Y/%m/%d %H:%M:%S")

    rki = pd.read_csv(rki_csv_path,
                      index_col=0,
                      parse_dates=[8, 10, 13],
                      date_parser=dateparser)
    return rki
```
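A side note on `scrape()`: `requests.get()` has no default timeout and will happily write an HTML error page to disk if the server responds with an error. A slightly more defensive variant (same ArcGIS URL as above; the name `scrape_checked` is just a placeholder) could look like this:

```python
import requests


def scrape_checked(rki_csv_path='covid_19.csv'):
    """Like scrape(), but fail loudly on network or HTTP errors."""
    url = 'https://opendata.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0.csv'
    # timeout so a stalled connection does not hang forever
    r = requests.get(url, allow_redirects=True, timeout=60)
    # raise on 4xx/5xx instead of silently saving an error page
    r.raise_for_status()
    with open(rki_csv_path, 'wb') as f:
        f.write(r.content)
```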
```python
# MAIN
scrape()
rki = get_rki()

# Data for Germany:
deutschland = rki[['AnzahlFall', 'AnzahlTodesfall', 'Meldedatum']].groupby('Meldedatum').sum().cumsum()
deutschland.to_csv('covid-19_germany.csv')

# Data for Baden-Württemberg:
bw = rki.query("Bundesland == 'Baden-Württemberg'")[['AnzahlFall', 'AnzahlTodesfall', 'Meldedatum']].groupby('Meldedatum').sum().cumsum()
bw.to_csv('covid-19_bw.csv')

# Data for the Ostalbkreis:
oak = rki.query("Landkreis == 'LK Ostalbkreis'")[['AnzahlFall', 'AnzahlTodesfall', 'Meldedatum']].groupby('Meldedatum').sum().cumsum()
oak.to_csv('covid-19_oak.csv')
```
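Because of the `cumsum()`, the three CSVs contain cumulative counts per report date. If you want daily new cases back, `diff()` inverts the `cumsum()`; a small sketch on a made-up miniature frame (the real one has the same shape after the `groupby`):

```python
import pandas as pd

# Hypothetical stand-in for the aggregated frame produced by
# groupby('Meldedatum').sum().cumsum() above: cumulative counts per day.
cumulative = pd.DataFrame(
    {'AnzahlFall': [5, 12, 20], 'AnzahlTodesfall': [0, 1, 3]},
    index=pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-03']),
)
cumulative.index.name = 'Meldedatum'

# diff() undoes the cumsum(); the first row becomes NaN, so fill it
# back in from the cumulative frame and restore integer dtype.
daily = cumulative.diff().fillna(cumulative).astype(int)
print(daily)
```

On this toy data `daily['AnzahlFall']` comes out as 5, 7, 8, i.e. the per-day increments.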
Other sources for COVID-19 data are:
- Johns Hopkins University
- Worldometer.info
- Süddeutsche Zeitung
- or, if you want to create your own simulation: CovidSim