Ensemble-cross-validation
`sklearn_ensemble_cv` is a Python module ([Github]) that implements accurate and efficient ensemble cross-validation methods from various projects.
Features
- The module builds on scikit-learn (`sklearn`) to offer maximum flexibility in the choice of base predictors.
- The module includes functions for creating ensembles of models, training the ensembles using cross-validation, and making predictions with the ensembles.
- The module also includes utilities for evaluating the performance of the ensembles and the individual models that make up the ensembles.
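```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn_ensemble_cv import ECV

# Toy training data for illustration; replace with your own X_train, y_train
rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 5))
y_train = X_train[:, 0] + 0.5 * rng.standard_normal(200)

# Hyperparameters for the base regressor
grid_regr = {
    'max_depth': np.array([6, 7], dtype=int),
}
# Hyperparameters for the ensemble
grid_ensemble = {
    'max_features': np.array([0.9, 1.]),
    'max_samples': np.array([0.6, 0.7]),
}
# Fit 50 trees and estimate the risk for ensemble sizes up to 100 trees
res_ecv, info_ecv = ECV(
    X_train, y_train, DecisionTreeRegressor, grid_regr, grid_ensemble,
    M=50, M_max=100, return_df=True
)
```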
The module currently supports bagging- and subagging-type ensembles under squared loss.
In this example, the hyperparameters of the base predictor are those listed for `sklearn.tree.DecisionTreeRegressor`, and the hyperparameters of the ensemble are those listed for `sklearn.ensemble.BaggingRegressor`.
Other scikit-learn regressors (estimators for which `sklearn.base.is_regressor(regr)` returns `True`) can also be used as base predictors.
Cross-validation methods
This project is currently in development. More CV methods will be added shortly.
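- split CV
- K-fold CV
- ECV (extrapolated cross-validation)
- GCV (generalized cross-validation)
- CGCV (corrected generalized cross-validation)
- CGCV for non-square losses
- ALOCV (approximate leave-one-out cross-validation)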
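Extrapolation in the ensemble size, as with the `M_max` argument in the ECV example, relies on a property of squared loss: if the members are exchangeable (fit on i.i.d. subsamples), the risk of an M-member average depends on M only through the risks of one- and two-member ensembles, `R_M = 2(1 - 1/M) R_2 - (1 - 2/M) R_1`. The sketch below checks this identity numerically on held-out predictions of toy members. It is an illustration under that assumption, not the package's out-of-bag estimator, and the helper names `extrapolate_risk` and `one_and_two_member_risks` are hypothetical.

```python
import itertools

import numpy as np


def extrapolate_risk(r1, r2, M):
    """Squared-loss risk of an M-member average from the 1- and 2-member risks."""
    return 2.0 * (1.0 - 1.0 / M) * r2 - (1.0 - 2.0 / M) * r1


def one_and_two_member_risks(preds, y):
    """preds: array of shape (n_members, n_samples) with each member's predictions."""
    # R_1: average MSE of the individual members
    r1 = np.mean([np.mean((y - p) ** 2) for p in preds])
    # R_2: average MSE over all two-member averages
    r2 = np.mean([
        np.mean((y - (preds[i] + preds[j]) / 2) ** 2)
        for i, j in itertools.combinations(range(len(preds)), 2)
    ])
    return r1, r2


rng = np.random.default_rng(0)
y = rng.standard_normal(500)
# Toy members: the truth plus a shared error plus independent per-member noise
shared = 0.3 * rng.standard_normal(500)
preds = y + shared + rng.standard_normal((10, 500))

r1, r2 = one_and_two_member_risks(preds, y)
for M in (1, 10, 100):
    print(M, extrapolate_risk(r1, r2, M))
```

With 10 members, extrapolating to M = 10 reproduces the MSE of the 10-member average exactly; larger values of M give an estimate of the risk of a bigger ensemble without fitting it.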
Usage
Check out the Jupyter notebooks in the `tutorials` folder:
| Name | Description |
|---|---|
| basics.ipynb | Basics of applying ECV/CGCV to risk estimation and hyperparameter tuning for ensemble learning. |
| cgcv_l1_huber.ipynb | Custom CGCV for M-estimators: ℓ1-regularized Huber ensembles. |
| multitask.ipynb | Applying ECV to risk estimation and hyperparameter tuning for multi-task ensemble learning. |
The code has been tested with `scikit-learn==1.3.1`.
The documentation is available.
The module can be installed via PyPI:
```bash
pip install sklearn-ensemble-cv
```