Customer Churn Prediction

Objective: Predict telecom customer churn with XGBoost + SHAP explainability, deployed as a live API and dashboard.
Stack: Python · XGBoost · FastAPI · Streamlit · Docker · HuggingFace
Dataset: IBM Telco Customer Churn — 7,043 customers, 21 features
Result: 0.85+ AUC-ROC · End-to-end deployed on HuggingFace Spaces

Project Overview

Customer churn prediction is a classic binary classification problem in the telecom domain. The goal is to predict whether a customer will stop using a service (churn = 1) or continue (churn = 0) based on their usage patterns, demographics, and subscription details.

"It costs 5–25x more to acquire a new customer than to retain an existing one. Even a 5% reduction in churn can increase profits by 25–95%. Every prediction in this project comes with a SHAP explanation so business teams understand WHY a customer is at risk."

Dataset: IBM Telco Customer Churn (Kaggle) — 7,043 customers, 21 features. Target distribution: 73.5% Stay, 26.5% Churn — class imbalanced. Features include 3 numerical (tenure, MonthlyCharges, TotalCharges) and 16 categorical (Contract, PaymentMethod, InternetService, etc.).

Phase 1 — Exploratory Data Analysis

EDA is the process of understanding data before modelling — exploring patterns, distributions, correlations, and anomalies.

Key steps: Loaded dataset with pandas, checked shape/dtypes/missing values. Found TotalCharges was stored as string due to 11 blank entries — converted using pd.to_numeric(errors='coerce'). Visualized churn distribution and plotted distributions for all features.
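The TotalCharges fix can be reproduced in a few lines (toy values; the real column came from the Telco CSV):

```python
import pandas as pd

# Blank strings force pandas to store the column as object (string) dtype.
df = pd.DataFrame({"TotalCharges": ["29.85", " ", "108.15", " "]})
assert df["TotalCharges"].dtype == object

# errors='coerce' turns unparseable entries (the blanks) into NaN,
# leaving a clean numeric column to impute or drop.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

print(df["TotalCharges"].isna().sum())  # count of blanks that became NaN
```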

Key findings from EDA:

Phase 2 — Data Preprocessing

Preprocessing converts raw data into a format ML models can understand. Most models can't work with strings or missing values — everything must be numerical and clean.

Pipeline steps: Removed customerID (no predictive power). Fixed TotalCharges. Encoded target (Yes → 1, No → 0). Train/Test split 80/20 with stratify=y. Built a ColumnTransformer: StandardScaler for numerical, OneHotEncoder for categorical. Applied SMOTE only on training data to handle class imbalance.

| Technique | What It Does | Why Used Here |
|---|---|---|
| StandardScaler | Normalizes features to mean=0, std=1 | Prevents TotalCharges from dominating the model |
| OneHotEncoder | Converts categories to binary columns | ML models need numbers, not strings |
| SMOTE | Creates synthetic minority-class examples | Balances 4,100 Stay vs 1,400 Churn → 4,100 vs 4,100 |
| Pipeline | Bundles preprocessing + model together | Ensures the same transformations in training and production |
"SMOTE is applied ONLY on training data — never on test data. Applying SMOTE to test data causes data leakage, where the model 'sees' test data during training and reports fake accuracy."
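The fit-on-train-only rule can be sketched with scikit-learn alone (toy data with illustrative column names; SMOTE itself, from the imbalanced-learn package, would be applied to the transformed training split under the same rule):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Tiny frame standing in for the Telco data.
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8, 22, 10, 60, 3, 17],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.3, 70.7,
                       99.65, 89.1, 29.75, 104.8, 56.15],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year",
                 "Month-to-month", "Two year", "Month-to-month", "Two year",
                 "Month-to-month", "One year"],
    "Churn": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})
X, y = df.drop(columns="Churn"), df["Churn"]

# Stratified 80/20 split keeps the churn ratio identical in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["tenure", "MonthlyCharges"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
])

# Fit on the training set ONLY, then transform both splits. SMOTE
# (imblearn's SMOTE().fit_resample) would run on X_train_t only.
X_train_t = preprocess.fit_transform(X_train)
X_test_t = preprocess.transform(X_test)
print(X_train_t.shape, X_test_t.shape)
```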

Phase 3 — Model Training & Selection

Four models were trained and compared. XGBoost was selected as the final model based on AUC-ROC performance.

| Model | Characteristics | Role in Project |
|---|---|---|
| Logistic Regression | Simple, interpretable, fast | Baseline |
| Random Forest | Ensemble of trees, handles non-linearity | Comparison |
| XGBoost | Gradient-boosted trees, regularization, handles missing values | Final model (best AUC) |
| LightGBM | Faster than XGBoost, histogram-based | Comparison |

Hyperparameter tuning with Optuna: 50 trials using Bayesian optimization, optimizing for AUC-ROC via cross-validation. Parameters tuned: n_estimators, max_depth, learning_rate, subsample, colsample_bytree, reg_alpha, reg_lambda. All experiments tracked with MLflow.

Why AUC-ROC and not accuracy? With 73.5% non-churners, a model that predicts "never churn" achieves 73.5% accuracy but is completely useless. AUC-ROC measures the model's ability to rank churners above non-churners regardless of threshold. Our model achieved 0.85+ AUC-ROC.
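The "never churn" baseline argument is easy to verify numerically: a constant predictor scores well on accuracy but exactly 0.5 on AUC-ROC.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
# Labels mirroring the Telco imbalance: ~73.5% stay (0), ~26.5% churn (1).
y_true = (rng.random(10_000) < 0.265).astype(int)

# Degenerate "never churn" model: constant predictions and scores.
acc = accuracy_score(y_true, np.zeros_like(y_true))
auc = roc_auc_score(y_true, np.zeros(len(y_true)))

print(f"accuracy: {acc:.3f}")  # close to 0.735 despite learning nothing
print(f"AUC-ROC:  {auc:.3f}")  # exactly 0.5: no ranking ability at all
```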

Phase 4 — SHAP Explainability

SHAP (SHapley Additive exPlanations) explains WHY the model made a specific prediction. It assigns an importance value to each feature for each individual prediction, grounded in game theory (Shapley values).

How SHAP values work: A positive SHAP value pushes the prediction toward Churn. A negative SHAP value pushes toward No Churn. The sum of all SHAP values plus the base value equals the final prediction score.
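The additivity property can be checked by hand without the shap library: for a linear model, the exact Shapley value of feature i is w_i * (x_i - E[x_i]), a standard result, and the base value is the model's output at the feature means.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))           # toy feature matrix
w, b = np.array([0.8, -1.2, 0.5]), 0.3  # toy linear "model"

x = X[0]                                # one customer to explain
base_value = w @ X.mean(axis=0) + b     # expected model output
shap_values = w * (x - X.mean(axis=0))  # exact per-feature contributions

prediction = w @ x + b
# Positive values push toward Churn, negative toward No Churn, and
# base value + sum of contributions reconstructs the prediction exactly.
assert np.isclose(base_value + shap_values.sum(), prediction)
print(shap_values.round(3))
```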

Visualizations created:

Key SHAP findings: Low tenure is the strongest predictor of churn. High MonthlyCharges increases churn risk. Month-to-month contract is a strong churn indicator. Absence of TechSupport or OnlineSecurity significantly increases churn risk.

Phase 5 — FastAPI Backend

FastAPI is a modern Python web framework for building REST APIs. The ML model lives in the API — the dashboard sends customer data and receives predictions back. This separation allows the model to be used by any frontend.

| Endpoint | Method | Purpose |
|---|---|---|
| /health | GET | Returns API status — used for monitoring |
| / | GET | Root endpoint, confirms the API is running |
| /predict | POST | Single-customer prediction + SHAP explanation |
| /predict/batch | POST | Batch predictions for multiple customers |
"Critical bug fixed: the model was loaded BEFORE the FastAPI app was created. HuggingFace's health check hit /health immediately on startup, but the app wasn't ready yet, causing a timeout crash. Fix: create app = FastAPI() first, register /health second, then load the model third."

Phase 6 — Streamlit Dashboard

Streamlit is a Python library for building interactive web apps without writing HTML/CSS/JavaScript. Three pages were built: a Home overview, a Single Prediction form (19 customer features → churn probability + SHAP), and a Batch Prediction page (CSV upload → download results).

How the dashboard talks to the API: User fills form → app.py sends POST request to FastAPI → FastAPI returns JSON → app.py displays result. The dashboard requires no ML libraries — all ML work happens in the API.

Environment fix: Local machine uses http://127.0.0.1:8000, HuggingFace uses the live URL. os.getenv() was used so the URL can be set via environment variable without changing code.
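A sketch of that environment fix (the variable name API_URL is an assumption):

```python
import os

# Locally the fallback default applies; on HuggingFace the environment
# variable points at the live Space, with no code change needed.
API_URL = os.getenv("API_URL", "http://127.0.0.1:8000")
print(API_URL)
```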

Phase 7 — Docker & Deployment

Docker packages the application and all dependencies into a container — a portable, self-contained unit that runs the same everywhere. Each HuggingFace Space is a Git repository: push code and it automatically builds and deploys.

HuggingFace Spaces specs: Free tier offers 512MB RAM, 0.1 CPU. Port 7860 is the only exposed port. Spaces sleep after 48h inactivity and wake on visit (1–2 min cold start).
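A minimal Dockerfile consistent with those constraints might look like this (file and module names are assumptions, not the project's actual layout):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# HuggingFace Spaces only exposes port 7860 -- bind to it explicitly.
EXPOSE 7860
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]
```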

The key Docker issues resolved during deployment are included in the table below.

Problems Faced & Solutions

| Problem | Cause | Fix |
|---|---|---|
| Streamlit blank screen | Python 3.14 incompatibility | Downgraded to Python 3.11.9 |
| Windows path errors | Hardcoded Windows paths broke on other operating systems | Used pathlib.Path throughout |
| Git push 403 | GitHub no longer accepts passwords | Used a Personal Access Token (PAT) |
| HF push rejected | Binary files (.png, .pkl) too large for HF git | Used git filter-branch to remove them from history |
| numpy conflict | shap needs numpy≥2, streamlit 1.32 needs numpy<2 | Upgraded to streamlit 1.45+ |
| scikit-learn mismatch | Model trained on 1.8.0, Docker installed 1.3.2 | Pinned scikit-learn==1.8.0 in requirements |
| API timeout on HF | Model loaded before the FastAPI app was created | Moved app = FastAPI() to the top of main.py |
| Port blocked on HF | API ran on 8000, HF only exposes 7860 | Changed the port to 7860 in the Dockerfile CMD |

Designed and crafted with ❤️ by Biswajit Pradhan