CIS 9660 - Group 2 Final Presentation
2026-05-11
CIS 9660 - Data Mining for Business Analytics
Baruch College, Spring 2026
Research Question
Can we predict whether a telecom customer will churn based on their contract type, monthly charges, and service usage?
Predictive: Build a model that correctly classifies customers as likely to churn or stay
Inferential: Test whether the combination of InternetService and Contract type significantly affects churn
🔗 kaggle.com/datasets/blastchar/telco-customer-churn
| Property | Value |
|---|---|
| Observations | 7,032 (after cleaning) |
| Features | 23 (after encoding) |
| Target | Churn (Yes / No) |
| Churn Rate | 26.6% |
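The "after cleaning" and "after encoding" counts in the table imply a preprocessing pass. A minimal sketch of one way to do it, assuming pandas: the four-row frame is made up, the column names follow the Kaggle file, and the choice of `drop_first=True` is an assumption rather than something the slides confirm.

```python
import pandas as pd

# Made-up rows for illustration only, NOT the Telco data.
raw = pd.DataFrame({
    "TotalCharges": ["29.85", " ", "108.15", "1840.75"],
    "Partner": ["Yes", "No", "No", "Yes"],
    "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year"],
    "Churn": ["No", "No", "Yes", "No"],
})

clean = raw.copy()

# Blank strings cannot be parsed as numbers, so they become NaN and are dropped.
clean["TotalCharges"] = pd.to_numeric(clean["TotalCharges"], errors="coerce")
clean = clean.dropna(subset=["TotalCharges"]).reset_index(drop=True)

# Binary Yes/No columns are mapped to 1/0.
for col in ("Partner", "Churn"):
    clean[col] = clean[col].map({"Yes": 1, "No": 0})

# Multi-level categoricals are one-hot encoded; dropping the first level
# (an assumption here) leaves it as the reference category.
encoded = pd.get_dummies(clean, columns=["Contract"], drop_first=True, dtype=int)

print(encoded)
```

Dropping the first level keeps the dummy columns from being perfectly collinear with the intercept, which matters for the logistic regression that follows.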
TotalCharges
.map()
InternetService, Contract, PaymentMethod
MonthlyCharges, TotalCharges, tenure
Contract Type
| Contract | Churn Rate |
|---|---|
| Month-to-month | 42.7% |
| One year | 11.3% |
| Two year | 2.8% |
Internet Service
| Service | Churn Rate |
|---|---|
| Fiber optic | 41.9% |
| DSL | 19.0% |
| No internet | 7.4% |
Continuous Variables
| Variable | No Churn | Churn |
|---|---|---|
| Monthly Charges (median) | $64 | $80 |
| Tenure (median) | 38 mo | 10 mo |
Payment Method
| Method | Churn Rate |
|---|---|
| Electronic check | 45.3% |
| Bank transfer | 16.7% |
| Credit card | 15.3% |
| Mailed check | 19.2% |
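Churn-rate tables like the ones above can be produced with a single group-by. The sketch below assumes pandas and uses a made-up six-row sample, so its numbers will not match the slides; `churn_rate_by` is a hypothetical helper name.

```python
import pandas as pd

# Made-up sample, NOT the Telco data; column names follow the Kaggle file.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "Two year"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "No"],
})

# Encode the target as 0/1 so that a group mean equals the churn rate.
df["churn_flag"] = df["Churn"].map({"Yes": 1, "No": 0})


def churn_rate_by(frame, column):
    """Return the churn rate (%) for each level of `column`, highest first."""
    rates = frame.groupby(column)["churn_flag"].mean() * 100
    return rates.sort_values(ascending=False).round(1)


rates = churn_rate_by(df, "Contract")
print(rates)
```

The same helper could be called with `"InternetService"` or `"PaymentMethod"` on the full dataset.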
Tenure
Correlation with churn: -0.35
Strongest numeric predictor → longer-tenured customers are significantly less likely to leave
Monthly Charges
Correlation with churn: +0.19
Higher bills are associated with increased churn risk
Total Charges
Correlation with tenure: 0.83
Excluded from modeling to avoid multicollinearity
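The correlations above can be reproduced with Pearson's r, which for a 0/1 target is the point-biserial correlation. This sketch assumes pandas and that the excluded variable is `TotalCharges`; the six rows are made up, with `TotalCharges` built as tenure times monthly charge to show why the two move together.

```python
import pandas as pd

# Made-up rows, NOT the Telco data.
df = pd.DataFrame({
    "tenure": [1, 3, 5, 40, 60, 72],
    "MonthlyCharges": [70.0, 80.0, 90.0, 60.0, 65.0, 75.0],
    "churn_flag": [1, 1, 1, 0, 0, 0],
})
# Total charges accumulate over time, so they track tenure closely.
df["TotalCharges"] = df["tenure"] * df["MonthlyCharges"]

# Correlation of each numeric feature with the 0/1 churn flag.
numeric = ["tenure", "MonthlyCharges", "TotalCharges"]
corr_with_churn = df[numeric].corrwith(df["churn_flag"])

# Feature-to-feature correlation used as a multicollinearity check.
tenure_total_corr = df["tenure"].corr(df["TotalCharges"])

print(corr_with_churn.round(2))
print(round(tenure_total_corr, 2))
```

When two predictors are this strongly correlated, their logistic regression coefficients become unstable, which is the usual reason for dropping one of them.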
Logistic regression using all features after preprocessing
max_iter=1000 for convergence
Extends baseline with 4 interaction terms
Contract1 × FiberOptic
Contract2 × FiberOptic
Contract1 × NoInternet
Contract2 × NoInternet
Month-to-month × DSL serve as the reference baseline
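The four interaction terms listed above can be built as products of dummy columns. The sketch below assumes scikit-learn and pandas, assumes `Contract1` means one-year and `Contract2` means two-year, and uses synthetic data with random labels, so the fitted coefficients carry no meaning.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in data, NOT the Telco file.
df = pd.DataFrame({
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
    "InternetService": rng.choice(["DSL", "Fiber optic", "No"], size=n),
    "tenure": rng.integers(1, 73, size=n),
})
y = rng.integers(0, 2, size=n)  # random labels for illustration only

# Dummy indicators; Month-to-month and DSL are omitted as reference levels.
X = pd.DataFrame({
    "tenure": df["tenure"],
    "Contract1": (df["Contract"] == "One year").astype(int),
    "Contract2": (df["Contract"] == "Two year").astype(int),
    "FiberOptic": (df["InternetService"] == "Fiber optic").astype(int),
    "NoInternet": (df["InternetService"] == "No").astype(int),
})

# Each interaction is 1 only when both of its parent dummies are 1.
for contract in ("Contract1", "Contract2"):
    for service in ("FiberOptic", "NoInternet"):
        X[f"{contract}_x_{service}"] = X[contract] * X[service]

model = LogisticRegression(max_iter=1000).fit(X, y)
coefs = dict(zip(X.columns, model.coef_[0]))
print(sorted(coefs))
```

An interaction coefficient measures how much the effect of a contract type changes for a given internet service, relative to the month-to-month DSL reference group.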
| Metric | Baseline | Interaction | Target |
|---|---|---|---|
| ROC-AUC | 0.8289 | 0.8341 | ≥ 0.80 ✅ |
| Accuracy | 79.8% | 80.1% | ≥ 80% ✅ |
| Churn Recall | 56.4% | 56.4% | ≥ 75% ❌ |
| Churn Precision | 63.6% | 64.0% | — |
Recall falls short at the default threshold → addressed through threshold tuning
| Threshold | Accuracy | Churn Recall | Precision |
|---|---|---|---|
| 0.50 | 80.1% | 56.4% | 64.0% |
| 0.40 | 78.3% | 67.1% | 57.8% |
| 0.30 | 74.5% | 75.1% | 51.4% |
At threshold = 0.30 the model catches ~75% of actual churners ✅
The trade-off is acceptable:
Missing a churner = losing a customer
Flagging a loyal customer = small cost of a discount offer
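The threshold sweep above amounts to re-labelling the same predicted probabilities at different cut-offs. A dependency-free sketch follows; the ten labels and probabilities are made up, and `metrics_at_threshold` is a hypothetical helper name.

```python
def metrics_at_threshold(y_true, y_prob, threshold):
    """Return (accuracy, recall, precision) for the positive (churn) class."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, recall, precision


# Made-up labels (1 = churned) and predicted churn probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.90, 0.60, 0.45, 0.35, 0.55, 0.32, 0.20, 0.10, 0.05, 0.25]

for threshold in (0.50, 0.40, 0.30):
    acc, rec, prec = metrics_at_threshold(y_true, y_prob, threshold)
    print(f"threshold={threshold:.2f} accuracy={acc:.2f} "
          f"recall={rec:.2f} precision={prec:.2f}")
```

Lowering the threshold can only add predicted churners, so recall never decreases, while precision typically falls as more loyal customers are flagged.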
Fiber optic alone does not drive churn: fiber optic customers locked into long-term contracts are the most at risk
Positive coefficients (raise churn odds)
| Feature | Coefficient |
|---|---|
| InternetService_fiber_optic | +1.33 |
| MonthlyCharges | +0.72 |
| StreamingTV | +0.49 |
| PaymentMethod_electronic_check | +0.41 |
Negative coefficients (lower churn odds)
| Feature | Coefficient |
|---|---|
| tenure | -0.82 |
| Contract_two_year | -0.69 |
| InternetService_no | -1.02 |
| Contract_one_year | -0.29 |
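Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. The sketch below uses the coefficient values from the two tables above. What "one unit" means for the numeric features depends on how they were scaled, which the slides do not state.

```python
import math

# Coefficients copied from the slides (log-odds scale).
coefficients = {
    "InternetService_fiber_optic": 1.33,
    "MonthlyCharges": 0.72,
    "StreamingTV": 0.49,
    "PaymentMethod_electronic_check": 0.41,
    "tenure": -0.82,
    "Contract_two_year": -0.69,
    "InternetService_no": -1.02,
    "Contract_one_year": -0.29,
}

# exp(beta) is the multiplicative change in churn odds per one-unit increase,
# holding the other features fixed.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in sorted(odds_ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:32s} {ratio:5.2f}")
```

An odds ratio above 1 raises the odds of churn and one below 1 lowers them. For example, the fiber optic coefficient of +1.33 corresponds to roughly 3.8 times the churn odds of the reference service.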
Offer discounted annual contracts to month-to-month customers whose predicted churn probability is ≥ 0.30
Loyalty discounts for high-charge customers in the top risk quartile
The 41.9% churn rate among fiber optic customers, especially those on long-term contracts, signals a service quality or pricing issue worth investigating
Churn risk is highest in the first 10 months ✅
Invest in onboarding and early retention programs!
The four-term interaction model showed that fiber optic service alone does not drive churn: long-term contract holders on fiber optic are most at risk, which suggests dissatisfaction with a service they feel locked into
statsmodels: identified which specific interaction terms are statistically significant (* p<0.05, ** p<0.01, *** p<0.001)
Both models agree on the same top predictors (tenure, contract type, and fiber optic service), strengthening confidence that these reflect genuine patterns and supporting the choice of the simpler, more interpretable logistic regression
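A significance check on the interaction terms, as mentioned above for statsmodels, might look like the following sketch. It assumes the statsmodels formula API, uses synthetic data with random labels (so no term is expected to be significant), and `stars` is a hypothetical helper that mirrors the legend on the slide.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 900

# Synthetic stand-in data, NOT the Telco file.
df = pd.DataFrame({
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
    "InternetService": rng.choice(["DSL", "Fiber optic", "No"], size=n),
})
df["churn"] = rng.binomial(1, 0.3, size=n)  # random labels, arbitrary 30% rate

# `a * b` expands to both main effects plus their interaction;
# Treatment(...) sets Month-to-month and DSL as the reference levels.
formula = (
    "churn ~ C(Contract, Treatment(reference='Month-to-month'))"
    " * C(InternetService, Treatment(reference='DSL'))"
)
result = smf.logit(formula, data=df).fit(disp=False)


def stars(p):
    """Map a p-value to the conventional significance stars."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Interaction terms are the parameters whose names contain ':'.
interaction_p = result.pvalues[result.pvalues.index.str.contains(":")]
for name, p in interaction_p.items():
    print(f"{name}: p={p:.3f} {stars(p)}")
```

Unlike scikit-learn's `LogisticRegression`, statsmodels reports standard errors and p-values for each coefficient, which is what makes a per-term significance test possible.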
Dataset: Telco Customer Churn (Kaggle), 7,032 obs × 23 features
Method: Logistic Regression + 4 Interaction Terms (Contract × InternetService)
Key Finding: Long-term fiber optic customers are the highest-risk churn segment
Full Report: RaulSolaNavarro.github.io/CIS9660-2026-SPRING/churn-report.html
CIS 9660 · Group 2 Project · Spring 2026