A telecommunications company is concerned about the number of customers leaving its land-line business for cable competitors, and it needs to understand who is leaving. We will help the company by finding out who is leaving and why.
In this article, we will build a machine learning model for the telecommunications company using logistic regression.
Understanding The Dataset
We will use a telecommunications dataset to predict customer churn. It is a historical customer dataset in which each row represents one customer.
The dataset includes information about:
Customers who left within the last month: the column is called Churn
Services that each customer has signed up for: phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
Customer account information: how long they had been a customer, contract, payment method, paperless billing, monthly charges, and total charges
Demographic information about customers: gender, age range, and whether they have partners and dependents
Import the Required Libraries
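The original import cell is not reproduced in this text. As a minimal sketch, assuming the walkthrough relies on NumPy, pandas and scikit-learn (the author's exact imports are not shown), these are the libraries the later steps would need:

```python
# Assumed imports for this walkthrough; the article's own import cell is not shown.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, jaccard_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

print("numpy", np.__version__, "| pandas", pd.__version__)
```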
Understand The Data
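The churn.describe() function gives summary statistics for each numeric column: count, mean, standard deviation, minimum, the 25th, 50th (median) and 75th percentiles, and maximum. The interquartile range (IQR) is not reported directly, but it is the difference between the 75th and 25th percentiles.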
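A sketch of this step. Because the data file is not included here, the code builds a hypothetical 200-row placeholder DataFrame with the article's column names filled with random integers; with the real data you would load the CSV instead (the file name in the comment is an assumption). It also shows how the median and IQR are read off the describe() output:

```python
import numpy as np
import pandas as pd

feature_cols = ["tenure", "age", "address", "income", "ed",
                "employ", "equip", "callcard", "wireless"]

# Hypothetical placeholder data (random values, NOT the real churn dataset).
# With the real file you would use something like: churn = pd.read_csv("ChurnData.csv")
rng = np.random.default_rng(0)
churn = pd.DataFrame(rng.integers(0, 50, size=(200, len(feature_cols))),
                     columns=feature_cols)
churn["churn"] = rng.integers(0, 2, size=200).astype(float)

stats = churn.describe()   # count, mean, std, min, 25%, 50%, 75%, max per column
print(stats)

median = stats.loc["50%"]                   # the median is the 50th percentile
iqr = stats.loc["75%"] - stats.loc["25%"]   # IQR = Q3 - Q1
print(iqr)
```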
Data pre-processing and Feature selection
Let’s select some features for modeling.
‘tenure’, ‘age’, ‘address’, ‘income’, ‘ed’, ‘employ’, ‘equip’, ‘callcard’, and ‘wireless’ will be used as the feature columns.
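A minimal sketch of the selection, run on hypothetical random placeholder data. It assumes the target column is named 'churn' (lowercase, like the feature names) and stored as 0.0/1.0 floats, so it is cast to integer class labels; both points are assumptions rather than details given in the article:

```python
import numpy as np
import pandas as pd

feature_cols = ["tenure", "age", "address", "income", "ed",
                "employ", "equip", "callcard", "wireless"]

# Hypothetical placeholder standing in for the loaded dataset (random values).
rng = np.random.default_rng(0)
churn = pd.DataFrame(rng.integers(0, 50, size=(200, len(feature_cols))),
                     columns=feature_cols)
churn["churn"] = rng.integers(0, 2, size=200).astype(float)
churn["extra_column"] = 0  # an extra column we do NOT want to model on

# Keep only the chosen features plus the target; .copy() avoids chained-assignment warnings.
churn_df = churn[feature_cols + ["churn"]].copy()
# Assumption: the target is stored as floats, so cast it to integer class labels.
churn_df["churn"] = churn_df["churn"].astype(int)
print(churn_df.head())
```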
A correlation heatmap of the selected columns does not show a high correlation between any feature column and the target variable, “Churn”.
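The numbers behind such a heatmap can be computed with pandas alone. This sketch uses hypothetical random placeholder data, and the 0.5 cut-off for a "high" correlation is an assumed threshold, since the article does not define one. Drawing the heatmap itself needs seaborn and matplotlib, shown only as a comment:

```python
import numpy as np
import pandas as pd

feature_cols = ["tenure", "age", "address", "income", "ed",
                "employ", "equip", "callcard", "wireless"]

# Hypothetical placeholder data (random values), already reduced to features + target.
rng = np.random.default_rng(0)
churn_df = pd.DataFrame(rng.integers(0, 50, size=(200, len(feature_cols))),
                        columns=feature_cols)
churn_df["churn"] = rng.integers(0, 2, size=200)

corr = churn_df.corr()                        # Pearson correlation matrix, shape (10, 10)
target_corr = corr["churn"].drop("churn")     # correlation of each feature with the target
print(target_corr.round(2).sort_values())

# Assumed threshold for "high" correlation; the article does not give one.
strong = target_corr[target_corr.abs() > 0.5]
print("Features with |r| > 0.5:", list(strong.index))

# To draw the heatmap (requires seaborn and matplotlib):
# import seaborn as sns; sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
```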
Let’s split the dataset into X (the feature columns) and y (the target, churn).
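A short sketch of this split, assuming the target column is named 'churn' and using hypothetical random placeholder data in place of the real table. X becomes a 2-D NumPy array with one column per feature, and y a 1-D array of class labels:

```python
import numpy as np
import pandas as pd

feature_cols = ["tenure", "age", "address", "income", "ed",
                "employ", "equip", "callcard", "wireless"]

# Hypothetical placeholder data (random values) with the article's column names.
rng = np.random.default_rng(0)
churn_df = pd.DataFrame(rng.integers(0, 50, size=(200, len(feature_cols))),
                        columns=feature_cols)
churn_df["churn"] = rng.integers(0, 2, size=200)

X = np.asarray(churn_df[feature_cols])   # feature matrix, shape (n_samples, 9)
y = np.asarray(churn_df["churn"])        # target vector, shape (n_samples,)
print(X.shape, y.shape)
```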
Splitting Train and Test data
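A sketch of the split, assuming an 80/20 train/test ratio and a fixed random_state for reproducibility (neither value is given in the article). Hypothetical random arrays stand in for X and y:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical placeholder arrays with the same shapes as X and y (random values).
rng = np.random.default_rng(0)
X = rng.integers(0, 50, size=(200, 9)).astype(float)
y = rng.integers(0, 2, size=200)

# Assumed settings: hold out 20% for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4)
print("Train set:", X_train.shape, y_train.shape)
print("Test set:", X_test.shape, y_test.shape)
```

Passing stratify=y to train_test_split would keep the proportion of churners the same in both splits; whether the article's split was stratified is not stated.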
Modeling using Logistic regression
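We will use logistic regression for prediction. It models the probability that a customer churns as the sigmoid (logistic) function of a weighted sum of the feature values.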
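A minimal sketch of the modeling step, run on hypothetical random placeholder data with the same shape as X and y. The hyperparameters (C=0.01, the liblinear solver) and the StandardScaler preprocessing step are assumptions for illustration, not settings confirmed by the article:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical placeholder data (random values) standing in for X and y.
rng = np.random.default_rng(0)
X = rng.integers(0, 50, size=(200, 9)).astype(float)
y = rng.integers(0, 2, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# Standardize the features, then fit an L2-regularized logistic regression.
# C (inverse regularization strength) and the solver are assumed values.
LR = make_pipeline(StandardScaler(),
                   LogisticRegression(C=0.01, solver="liblinear"))
LR.fit(X_train, y_train)
print(LR)

model = LR.named_steps["logisticregression"]
print("coefficients:", model.coef_.round(3))
print("intercept:", model.intercept_.round(3))
```

Putting the scaler inside the pipeline means it is fitted on the training split only, so no test-set statistics leak into training. Because the features are standardized, coefficient magnitudes can be compared with each other, which gives a rough view of which features push the prediction toward churn.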
Fitting the model returns the fitted LogisticRegression estimator, and printing it shows its configuration.
Now we can use the trained model to make predictions on the test set:
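A sketch of the prediction step, again on hypothetical random placeholder data and with the assumed model settings (StandardScaler plus LogisticRegression with C=0.01 and the liblinear solver). It returns both hard class labels and per-class probabilities:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical placeholder data (random values) and the assumed model settings.
rng = np.random.default_rng(0)
X = rng.integers(0, 50, size=(200, 9)).astype(float)
y = rng.integers(0, 2, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
LR = make_pipeline(StandardScaler(), LogisticRegression(C=0.01, solver="liblinear"))
LR.fit(X_train, y_train)

# Predict on the held-out test set; the model never saw these rows during fitting.
yhat = LR.predict(X_test)              # predicted class label (0 or 1) per customer
yhat_prob = LR.predict_proba(X_test)   # column 0: P(churn = 0), column 1: P(churn = 1)
print(yhat[:10])
print(yhat_prob[:5].round(3))
```

For this binary case, predict returns the class with the higher probability, which matches thresholding yhat_prob[:, 1] at 0.5.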
1. Jaccard index
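For binary labels, scikit-learn's jaccard_score is the size of the intersection divided by the size of the union of the predicted-positive and actually-positive sets, i.e. TP / (TP + FP + FN). The sketch below checks the library value against that formula on hypothetical placeholder data with the assumed model settings; treating churners (label 1) as the positive class is also an assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical placeholder data (random values) and the assumed model settings.
rng = np.random.default_rng(0)
X = rng.integers(0, 50, size=(200, 9)).astype(float)
y = rng.integers(0, 2, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
LR = make_pipeline(StandardScaler(), LogisticRegression(C=0.01, solver="liblinear"))
yhat = LR.fit(X_train, y_train).predict(X_test)

score = jaccard_score(y_test, yhat, pos_label=1)

# Same quantity by hand: intersection over union of the "churn" sets.
tp = np.sum((y_test == 1) & (yhat == 1))
fp = np.sum((y_test == 0) & (yhat == 1))
fn = np.sum((y_test == 1) & (yhat == 0))
denom = tp + fp + fn
manual = tp / denom if denom else 0.0
print(f"Jaccard (sklearn): {score:.3f}  by hand: {manual:.3f}")
```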
2. Confusion Matrix
Another way of looking at the accuracy of the classifier is to look at its confusion matrix.
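A sketch of building the confusion matrix and reading accuracy from it, on hypothetical placeholder data with the assumed model settings. In scikit-learn's convention, rows are true labels and columns are predicted labels, so correct predictions sit on the diagonal:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical placeholder data (random values) and the assumed model settings.
rng = np.random.default_rng(0)
X = rng.integers(0, 50, size=(200, 9)).astype(float)
y = rng.integers(0, 2, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
LR = make_pipeline(StandardScaler(), LogisticRegression(C=0.01, solver="liblinear"))
yhat = LR.fit(X_train, y_train).predict(X_test)

# Rows are true labels, columns are predicted labels, in the order given by `labels`.
cm = confusion_matrix(y_test, yhat, labels=[0, 1])
print(cm)

correct = np.trace(cm)           # diagonal: true negatives + true positives
incorrect = cm.sum() - correct   # off-diagonal: false positives + false negatives
accuracy = correct / cm.sum()
print(f"{correct} correct, {incorrect} incorrect, accuracy = {accuracy:.3f}")
print("accuracy_score:", accuracy_score(y_test, yhat))
```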
This means 28 + 5 = 33 correct predictions and 4 + 3 = 7 incorrect predictions, out of 40 test samples in total.
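Dividing the correct predictions by the total number of predictions turns these counts into the overall accuracy, and the incorrect ones into the error rate:

```latex
\text{accuracy} = \frac{28 + 5}{28 + 5 + 4 + 3} = \frac{33}{40} = 0.825,
\qquad
\text{error rate} = \frac{4 + 3}{40} = \frac{7}{40} = 0.175
```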
That’s it!!!
I hope you have enjoyed the article.
Happy learning!!!