sklearn_ensemble_cv

Ensemble-cross-validation

sklearn_ensemble_cv is a Python module (available on GitHub) that implements accurate and efficient cross-validation methods for ensemble learning, collected from several projects.

Features

  • The module builds on scikit-learn (sklearn), so it works with a wide range of base predictors.
  • The module includes functions for creating ensembles of models, training the ensembles using cross-validation, and making predictions with the ensembles.
  • The module also includes utilities for evaluating the performance of the ensembles and the individual models that make up the ensembles.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn_ensemble_cv import ECV

# X_train, y_train: training features and responses, assumed to be defined already.

# Hyperparameters for the base regressor
grid_regr = {
    'max_depth': np.array([6, 7], dtype=int),
}
# Hyperparameters for the ensemble
grid_ensemble = {
    'max_features': np.array([0.9, 1.]),
    'max_samples': np.array([0.6, 0.7]),
}

# Build 50 trees and get estimates for ensemble sizes up to 100 trees
res_ecv, info_ecv = ECV(
    X_train, y_train, DecisionTreeRegressor, grid_regr, grid_ensemble,
    M=50, M_max=100, return_df=True
)

The module currently supports bagging- and subagging-type ensembles under square loss. In the DecisionTreeRegressor example, the hyperparameters of the base predictor are those listed at sklearn.tree.DecisionTreeRegressor, and the hyperparameters of the ensemble are those listed at sklearn.ensemble.BaggingRegressor. Other sklearn regressors (regr.is_regressor = True) can also be used as base predictors.
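To make the distinction between the two ensemble types concrete, here is a minimal standard-library sketch of how the training rows for one ensemble member might be drawn. It is not the module's code: the helper draw_indices and its arguments are hypothetical, and it only assumes the usual definitions, where bagging samples rows with replacement, subagging samples them without replacement, and max_samples is the fraction of rows given to each member.

```python
import random


def draw_indices(n, max_samples, replace, rng):
    """Draw row indices for one ensemble member (hypothetical helper).

    n           -- number of training rows
    max_samples -- fraction of rows drawn for each member, e.g. 0.6
    replace     -- True for bagging, False for subagging
    rng         -- a random.Random instance, for reproducibility
    """
    k = max(1, round(max_samples * n))
    if replace:
        # Bagging: rows are drawn with replacement, so duplicates can occur.
        return [rng.randrange(n) for _ in range(k)]
    # Subagging: rows are drawn without replacement, so all are distinct.
    return rng.sample(range(n), k)


rng = random.Random(0)
n = 10
bag = draw_indices(n, 0.6, replace=True, rng=rng)
subbag = draw_indices(n, 0.6, replace=False, rng=rng)
print("bagging rows:  ", sorted(bag))
print("subagging rows:", sorted(subbag))
```

Both draws contain 6 of the 10 row indices; only the subagging draw is guaranteed to contain no repeats.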

Cross-validation methods

This project is currently in development. More CV methods will be added shortly.

  • split CV
  • K-fold CV
  • ECV
  • GCV
  • CGCV
  • CGCV non-square loss
  • ALOCV
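For intuition on how a method such as ECV can report risk estimates for ensemble sizes larger than the number of fitted predictors, here is a minimal sketch. It assumes that ECV stands for extrapolated cross-validation and uses a standard identity for square loss: when ensemble members are drawn independently given the data, the risk of an M-member average is R_M = R_inf + (R_1 - R_inf) / M, so the risks at M = 1 and M = 2 determine the whole curve. The function name and the input numbers are hypothetical, and how the module estimates the one- and two-member risks is not shown.

```python
def extrapolate_risk(risk_1, risk_2, M):
    """Extrapolate square-loss risk to an M-member ensemble.

    risk_1, risk_2 -- estimated risks of 1-member and 2-member ensembles
    M              -- target ensemble size (>= 1)

    Uses R_M = R_inf + (R_1 - R_inf) / M with R_inf = 2 * R_2 - R_1.
    """
    if M < 1:
        raise ValueError("M must be a positive integer")
    risk_inf = 2.0 * risk_2 - risk_1  # limiting risk as M grows
    return risk_inf + 2.0 * (risk_1 - risk_2) / M


# Hypothetical risk estimates, for illustration only.
risk_1, risk_2 = 1.0, 0.75
path = {M: extrapolate_risk(risk_1, risk_2, M) for M in (1, 2, 50, 100)}
for M, risk in path.items():
    print(f"M={M:3d}  estimated risk={risk:.4f}")
```

The extrapolated curve reproduces the two inputs at M = 1 and M = 2 and decreases toward the limiting value 2 * risk_2 - risk_1 as M grows.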

Usage

Check out Jupyter Notebooks in the tutorials folder:

| Name | Description |
| --- | --- |
| basics.ipynb | Basics of applying ECV/CGCV to risk estimation and hyperparameter tuning for ensemble learning. |
| cgcv_l1_huber.ipynb | Custom CGCV for an M-estimator: l1-regularized Huber ensembles. |
| multitask.ipynb | Applying ECV to risk estimation and hyperparameter tuning for multi-task ensemble learning. |

The code is tested with scikit-learn == 1.3.1.

The documentation is available online.

The module can be installed via PyPI:

pip install sklearn-ensemble-cv
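After installing, it can be useful to check which versions are present, since the code is tested against a specific scikit-learn release. The sketch below uses only the standard library; the helper installed_version is a hypothetical name, and the distribution names are taken from the pip command above and from scikit-learn's PyPI name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("sklearn-ensemble-cv", "scikit-learn"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

If the reported scikit-learn version differs from 1.3.1, the module may still work, but that combination is not the one the code is stated to be tested with.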