[ This article was originally published on Medium on May 29, 2020 ]
As the coronavirus is spreading exponentially and has caused so much damage to mankind across the globe, it is interesting to analyse and predict the trend of its spread.
Here we'll perform time series analysis on the count of COVID-19 infected patients in India and forecast the next 2 months (June and July 2020).
About The Dataset
The dataset was downloaded from Kaggle and was formatted and cleaned before use.
Importing Necessary Libraries
Load The Dataset
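The original loading cell is not reproduced here, so the following is only a minimal sketch of how a cleaned, date-indexed file could be read with pandas. The column names `Date` and `Confirmed`, the helper `load_counts`, and the three sample rows are all hypothetical placeholders, not the actual Kaggle data.

```python
import io

import pandas as pd

# Placeholder rows standing in for the cleaned Kaggle file. The column names
# ("Date", "Confirmed") and the values are hypothetical, not real counts.
sample_csv = io.StringIO(
    "Date,Confirmed\n"
    "2020-03-01,3\n"
    "2020-03-02,5\n"
    "2020-03-03,6\n"
)


def load_counts(source):
    """Read a date-indexed table of case counts from a CSV path or buffer."""
    df = pd.read_csv(source, parse_dates=["Date"], index_col="Date")
    return df.sort_index()


df = load_counts(sample_csv)
print(df.head())
```

With a real file, the same helper would be called with the file path instead of the in-memory buffer.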
The graph below shows the COVID-19 infection count (01 Mar to 28 May)
Visualizing Trend, Seasonal and Irregular components
Plotting Rolling-Mean and Rolling-Standard-Deviation for the dataset
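A quick way to eyeball stationarity is to check whether the rolling mean and rolling standard deviation stay flat over time. Here is a minimal sketch on a synthetic series; the 7-day window is an assumption, not necessarily the window used originally.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (NOT actual case counts): a random
# walk with drift on the log scale, i.e. noisy exponential growth.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-03-01", "2020-05-28", freq="D")
log_level = np.log(5) + np.cumsum(rng.normal(0.1, 0.03, len(dates)))
series = pd.Series(np.exp(log_level), index=dates, name="Confirmed")

window = 7  # assumed window length (one week)
rolling_mean = series.rolling(window=window).mean()
rolling_std = series.rolling(window=window).std()

# The first window-1 entries are NaN because the window is not yet full.
print(pd.DataFrame({"rolling_mean": rolling_mean, "rolling_std": rolling_std}).tail())
```

For a stationary series both curves would hover around a constant level; a rolling mean that keeps climbing is a sign of a trend.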
Performing Augmented Dickey-Fuller Test to check if dataset is stationary
We can see from the ADF test above that the dataset is not stationary.
Taking the Log of the dataset since it is not stationary
Train/Test Split of the dataset
Using an AR model to Fit the train data
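The idea behind the log transform is that exponential growth becomes a roughly linear trend on the log scale, which differencing can then remove. A minimal sketch on a synthetic series follows; the extra differencing step is shown only for illustration and is an assumption about how one might proceed.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (NOT actual case counts): a random
# walk with drift on the log scale, i.e. noisy exponential growth.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-03-01", "2020-05-28", freq="D")
log_level = np.log(5) + np.cumsum(rng.normal(0.1, 0.03, len(dates)))
series = pd.Series(np.exp(log_level), index=dates, name="Confirmed")

# Natural log of the counts (all values must be strictly positive).
log_series = np.log(series)

# First difference of the log series, roughly the daily growth rate.
log_diff = log_series.diff().dropna()

print(log_series.tail())
print(log_diff.describe())
```

Forecasts made on the log scale can be mapped back to counts with `np.exp`.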
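For time series the split has to be chronological rather than random, so that the model is always tested on dates later than the ones it was trained on. A small sketch, assuming the last 10 days are held out (the original split size is not stated):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (NOT actual case counts): a random
# walk with drift on the log scale, i.e. noisy exponential growth.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-03-01", "2020-05-28", freq="D")
log_level = np.log(5) + np.cumsum(rng.normal(0.1, 0.03, len(dates)))
series = pd.Series(np.exp(log_level), index=dates, name="Confirmed")
log_series = np.log(series)

test_size = 10  # assumed hold-out length in days
train = log_series.iloc[:-test_size]
test = log_series.iloc[-test_size:]

print("train:", train.index.min().date(), "to", train.index.max().date())
print("test: ", test.index.min().date(), "to", test.index.max().date())
```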
Checking Mean Squared Error
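Mean squared error is the average of the squared differences between actual and predicted values. The helper below is a plain NumPy sketch with toy numbers (the function name and inputs are illustrative only). Note that if predictions are compared on the log scale, the error is in log units; back-transform with `np.exp` first to get an error in counts.

```python
import numpy as np


def mean_squared_error(actual, predicted):
    """Average of squared differences between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))


# Toy values for illustration only.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]

mse = mean_squared_error(actual, predicted)
rmse = mse ** 0.5  # root mean squared error, in the same units as the data

print(f"MSE:  {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
```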
Plotting the AR model For the Train data
Using ARIMA Model for Prediction
Fitting the model:
Plotting Predictions for 2 More months (June and July) with 95% Confidence:
Forecasting for 2 Months:
Plotting Of The Forecast
The Final Forecast
The graph below shows the final prediction. The blue line represents the actual data points, while the red line represents the prediction, which runs through July 2020.
I have taken the utmost care in analysing the dataset, but suggestions are welcome. The prediction is based on the current trend and may change if the trend changes.
Hope you enjoyed reading!