Machine Learning for Credit Risk Assessment: Credit Default Prediction Model

Late credit card payments disrupt cash flow and increase bad debt. This project aims to build a classification model to identify high-risk customers early, enabling proactive collection strategies.

🎯 The Objective

Develop a model that predicts the Unpaid Tagging label (Default) with a Recall above 60%.

⚠️ Why Recall?

Minimizing False Negatives is crucial: each one is a customer who goes on to default without ever being flagged. We must catch as many at-risk customers as possible, even if that means flagging some safe ones (lower Precision).
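To make the trade-off concrete, here is a minimal sketch of how Recall and Precision come out of the confusion-matrix counts. The counts are hypothetical, chosen only for illustration; they are not taken from the project data.

```python
def recall(tp: int, fn: int) -> float:
    """Share of actual defaulters that the model flags."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of flagged customers who actually default."""
    return tp / (tp + fp)


# Hypothetical counts (assumption, not project results):
# 100 real defaulters, 60 caught, 40 missed, plus 90 safe customers flagged.
tp, fn, fp = 60, 40, 90

print(f"Recall:    {recall(tp, fn):.2f}")     # 0.60
print(f"Precision: {precision(tp, fp):.2f}")  # 0.40
```

In this made-up case the model sits at 60% Recall while only 40% of the flagged customers are real defaulters, which is the kind of cost the objective accepts.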

🧪 Experimental Approach

To find the best predictor, I ran two experiments using Logistic Regression, Gradient Boosting, and Random Forest:

Experiment 1 (Annual Review): Analyzing behavior over the last 12 months (Q1-Q4).
Experiment 2 (Semester Review): Focusing on recent behavior over the last 6 months (Q3-Q4).
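The two experiments can be sketched as one loop over feature windows and algorithms. This is only an outline under stated assumptions: the data is synthetic (`make_classification`), the eight columns are read as two features per quarter, and every setting shown (split size, `max_iter`, random seeds) is illustrative rather than what the project used.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the customer data (assumption): 8 columns,
# treated as 2 features for each of Q1..Q4, with defaulters as the minority.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Column indices per experiment (hypothetical layout).
feature_sets = {
    "Exp 1 (Q1-Q4)": list(range(8)),     # last 12 months
    "Exp 2 (Q3-Q4)": list(range(4, 8)),  # last 6 months
}
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

results = {}
for exp_name, cols in feature_sets.items():
    for model_name, template in models.items():
        # clone() gives a fresh, unfitted copy for each experiment.
        model = clone(template).fit(X_train[:, cols], y_train)
        y_pred = model.predict(X_test[:, cols])
        results[(exp_name, model_name)] = recall_score(y_test, y_pred)

for (exp_name, model_name), rec in results.items():
    print(f"{exp_name:15s} {model_name:20s} recall={rec:.3f}")
```

Keeping the same train/test split across both feature windows means any difference in Recall comes from the window, not from a different sample of customers.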

📊 Feature Importance (Gradient Boosting)

📉 Model Performance Results

Algorithm	Test Accuracy	Test Recall	Validation Recall
Logistic Regression	77.7%	43.5%	26.2%
Gradient Boosting (Exp 1)	68.6%	60.6% ✅	44.9% 📉
Random Forest	80.8%	34.0%	35.0%

💡 Evaluation & Next Steps

Key Insight: In the Gradient Boosting feature importances, Vintage_CR (Credit Card Tenure) and Delta Balance (Balance Fluctuation) rank highest. This suggests that how long a customer has held the card and how sharply their balance changes are the strongest indicators of default risk.
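For readers who want to see how such a ranking is read off a fitted model, here is a small sketch using scikit-learn's `feature_importances_`. The data is synthetic and the label is built from the first two columns on purpose, so the ranking is known in advance; the names `Noise_A` and `Noise_B` are hypothetical fillers, and none of the numbers it prints are project results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 1000
names = ["Vintage_CR", "Delta_Balance", "Noise_A", "Noise_B"]

# Synthetic features (assumption): four independent standard-normal columns.
X = rng.normal(size=(n, len(names)))

# Synthetic label: depends only on the first two columns, by construction.
score = 1.5 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
y = (score > 1.0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Impurity-based importances; they are normalized to sum to 1.
ranking = sorted(zip(names, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, importance in ranking:
    print(f"{name:15s} {importance:.3f}")
```

Importances like these say how much a feature is used for splitting, not in which direction it moves the risk, so they are best read as a ranking rather than an effect size.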

The Challenge: Gradient Boosting reached the 60% Recall target on the test data (60.6%), but fell to about 45% (44.9%) on the validation set. The model does not yet generalize to unseen data, which points to possible overfitting.
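The gap can be checked directly from the recall figures in the results table. The snippet below only redoes that arithmetic; the dictionary layout and the `TARGET` name are mine, while the numbers are copied from the table.

```python
# Recall figures (percent) copied from the Model Performance Results table.
recalls = {
    "Logistic Regression":       {"test": 43.5, "validation": 26.2},
    "Gradient Boosting (Exp 1)": {"test": 60.6, "validation": 44.9},
    "Random Forest":             {"test": 34.0, "validation": 35.0},
}
TARGET = 60.0  # objective: Recall > 60%

gaps = {}
for name, r in recalls.items():
    # Positive gap = recall lost when moving from test to validation data.
    gaps[name] = round(r["test"] - r["validation"], 1)
    on_test = r["test"] > TARGET
    on_validation = r["validation"] > TARGET
    print(f"{name:27s} gap={gaps[name]:+5.1f} pts  "
          f"target on test={on_test}  on validation={on_validation}")
```

Gradient Boosting loses 15.7 points between the two sets, and no model clears the target on validation data.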

Optimization Plan: To improve robustness, the next iteration will focus on:

Oversampling (SMOTE): To handle the class imbalance (defaulters are the minority class).
Feature Selection: Removing low-impact variables to reduce noise.
Extending Data Horizon: Using more than 1 year of historical data for better trend capture.
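As a rough illustration of the first item, the core idea behind SMOTE is to create new minority rows by interpolating between a real minority row and one of its nearest minority neighbours. The NumPy sketch below shows only that idea; the function name `smote_sketch` and its parameters are hypothetical, and it is not the implementation planned for the project.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority neighbours.

    X_min: array of minority-class rows (needs at least 2 rows).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)

    # Pairwise squared distances among minority rows.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a row is never its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]

    # Pick a base row, one of its k neighbours, and a point between them.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    step = rng.random((n_new, 1))
    return X_min[base] + step * (X_min[picked] - X_min[base])


# Toy minority class (assumption): 20 rows, 3 features.
X_minority = np.random.default_rng(1).normal(size=(20, 3))
synthetic = smote_sketch(X_minority, n_new=30)
print(synthetic.shape)  # (30, 3)
```

Oversampling should be applied to the training split only, so that test and validation recall are still measured on real customers. In practice a maintained implementation such as `SMOTE` from the imbalanced-learn package would be the safer choice.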

View Project on GitHub / Colab

Wanda's Portfolio
