HDB resale price predictor
Supervised regression on Singapore HDB resale transactions from data.gov.sg (resale flat prices from January 2017 onwards). The pipeline trains and compares several regressors; the exported model bundle powers a Streamlit app and a Flask HTML form, which give indicative price estimates from town, flat type, flat model, storey range, floor area, lease, and valuation year.
Overview
Data is fetched from the official dataset, preprocessed (scaling, one-hot encoding), and augmented with features such as remaining lease and flat age. Models include linear regression, Random Forest, XGBoost, and LightGBM, evaluated with metrics such as RMSE, MAE, and R². The offline pipeline also produces plots and SHAP-style explanations. The hosted Streamlit app uses the exported model bundle and takes the same inputs as the local Flask UI.
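As a rough illustration of the derived features and metrics mentioned above, here is a minimal standard-library sketch. It assumes remaining lease is measured against the standard 99-year HDB lease and that flat age is the valuation year minus the lease commencement year; the project's own definitions may differ. The function and parameter names (`lease_features`, `regression_metrics`, `lease_commence_year`) are hypothetical and not taken from the repo.

```python
import math

LEASE_YEARS = 99  # assumption: standard 99-year HDB lease


def lease_features(lease_commence_year, valuation_year):
    """Return flat age and remaining lease in whole years (hypothetical helper)."""
    flat_age = valuation_year - lease_commence_year
    if flat_age < 0:
        raise ValueError("valuation year precedes lease commencement")
    remaining_lease = max(LEASE_YEARS - flat_age, 0)
    return {"flat_age": flat_age, "remaining_lease": remaining_lease}


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, and R² for two equal-length lists of prices."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot  # undefined if all true values are identical
    return {"rmse": rmse, "mae": mae, "r2": r2}
```

For example, `lease_features(1990, 2024)` gives a flat age of 34 years and a remaining lease of 65 years. RMSE penalises large misses more heavily than MAE, which is one reason to report both when comparing regressors.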
Tech & Tools
- Python, Pandas
- scikit-learn, XGBoost, LightGBM
- Streamlit; Flask for the optional local HTML form
- Pipeline and bundle export under hdb_ml/
Links
Predictions are indicative only. Markets drift, and the README discusses retraining, time-based splits, and ethical use. See the repo for instructions on fetching data from data.gov.sg, running run_pipeline.py, and exporting models/model_bundle.joblib for Streamlit (Community Cloud expects the bundle to be committed under models/).
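The idea behind a time-based split can be sketched as follows: train on earlier transactions and evaluate on later ones, so the test score reflects how the model copes with market drift. This is only a sketch, not the repo's implementation. It assumes each record carries a `month` field in `YYYY-MM` form (so string comparison matches chronological order); the function name `time_based_split` and the sample rows are made up for illustration.

```python
def time_based_split(rows, cutoff_month):
    """Split records into train (before cutoff) and test (cutoff onwards).

    Assumes row["month"] is a "YYYY-MM" string, which sorts chronologically.
    """
    train = [r for r in rows if r["month"] < cutoff_month]
    test = [r for r in rows if r["month"] >= cutoff_month]
    return train, test


# Illustrative records only; the prices are not real transactions.
sample = [
    {"month": "2023-11", "resale_price": 500000},
    {"month": "2024-01", "resale_price": 520000},
    {"month": "2023-12", "resale_price": 510000},
    {"month": "2024-02", "resale_price": 530000},
]
train_rows, test_rows = time_based_split(sample, "2024-01")
```

Unlike a random split, no record in the test set predates a record in the training set, which avoids leaking future price levels into training.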