Dealing with missing data
In the previous example, the rolling_sum column contains NaN values, so we can use it to demonstrate how to deal with missing data.
When a CSV file contains empty fields, pandas reads them into the DataFrame as NaN. There are two main ways to handle them: dropna() removes the rows or columns that contain null values, while fillna() replaces NaN values with a value of your choice. Passing backfill (or bfill) as the method argument of fillna() fills each missing value backward, using the next valid observation. Passing ffill as the method argument fills forward, using the previous valid observation. Since pandas 2.1, the method argument of fillna() is deprecated, and the dedicated bfill() and ffill() methods are the recommended equivalents.
Python3
# importing pandas
import pandas as pd

# reading csv file
data = pd.read_csv('covid_data.csv')

# converting string data to datetime
data['ObservationDate'] = pd.to_datetime(data['ObservationDate'])
data['Last Update'] = pd.to_datetime(data['Last Update'])

# setting index and keeping only the columns we need
data = data.set_index('ObservationDate')
data = data[['Last Update', 'Confirmed']].copy()

# rolling sum over the numeric column only; rolling over the whole
# frame would also try to sum the datetime column
data['rolling_sum'] = data['Confirmed'].rolling(5).sum()
print(data.head())

# dealing with missing data: backward fill, equivalent to
# fillna(method='backfill'), which is deprecated in recent pandas
data['rolling_backfilled'] = data['rolling_sum'].bfill()
print(data.head(5))
Output:
                        Last Update  Confirmed  rolling_sum
ObservationDate
2020-01-22      2020-01-22 17:00:00        1.0          NaN
2020-01-22      2020-01-22 17:00:00       14.0          NaN
2020-01-22      2020-01-22 17:00:00        6.0          NaN
2020-01-22      2020-01-22 17:00:00        1.0          NaN
2020-01-22      2020-01-22 17:00:00        0.0         22.0

                        Last Update  Confirmed  rolling_sum  rolling_backfilled
ObservationDate
2020-01-22      2020-01-22 17:00:00        1.0          NaN                22.0
2020-01-22      2020-01-22 17:00:00       14.0          NaN                22.0
2020-01-22      2020-01-22 17:00:00        6.0          NaN                22.0
2020-01-22      2020-01-22 17:00:00        1.0          NaN                22.0
2020-01-22      2020-01-22 17:00:00        0.0         22.0                22.0
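To see how the different strategies compare, here is a minimal sketch on a small made-up Series. The values are hypothetical and are not taken from covid_data.csv. The gaps sit at the start, middle, and end so the direction of each fill is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps at the start, middle, and end
s = pd.Series([np.nan, 2.0, np.nan, 4.0, np.nan])

backfilled = s.bfill()      # each gap takes the next valid value
forwardfilled = s.ffill()   # each gap takes the previous valid value
constant = s.fillna(0)      # every gap takes a fixed value
dropped = s.dropna()        # entries with gaps are removed

comparison = pd.DataFrame({
    "original": s,
    "bfill": backfilled,
    "ffill": forwardfilled,
    "fillna(0)": constant,
})
print(comparison)
print(dropped)
```

Note that a backward fill cannot fill a trailing gap and a forward fill cannot fill a leading gap, because there is no valid value on that side to copy from.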
Manipulating Time Series Data in Python
Time-series data is a collection of observations (activity) for a single subject (entity) recorded at successive points in time. For metrics, the observations are equally spaced; for events, they are unequally spaced. With pandas, we can attach a date and time to each record, fetch records from a DataFrame, and find the data that falls within a specific date and time range.
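As a rough illustration of fetching records within a date range, the sketch below builds a small, equally spaced daily series and selects a window from it. The dates and the Confirmed values are made up for the example.

```python
import pandas as pd

# Hypothetical daily metric: ten equally spaced observations
dates = pd.date_range("2020-01-22", periods=10, freq="D")
df = pd.DataFrame({"Confirmed": range(10)}, index=dates)

# Label-based slicing on a DatetimeIndex; both endpoints are included
window = df.loc["2020-01-25":"2020-01-28"]

# The same selection written as an explicit boolean mask
start, end = pd.Timestamp("2020-01-25"), pd.Timestamp("2020-01-28")
same = df[(df.index >= start) & (df.index <= end)]

print(window)
```

Slicing with date strings works here because the index is a sorted DatetimeIndex. The boolean mask is the more general form and also works when the dates live in an ordinary column.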