OGBoost

class ordinal_xai.models.ogboost.OGBoost(base_learner=None, n_estimators: int = 100, learning_rate: float = 0.1, learning_rate_thresh: float = 0.001, validation_fraction: float = 0.1, n_iter_no_change: int | None = None, tol: float = 0.0001, link_function: str = 'probit', subsample: float = 1.0, verbose: int = 0, random_state: int | None = None, cv_early_stopping_splits: int | None = None)

Bases: BaseEstimator, BaseOrdinalModel

Ordinal Gradient Boosting Model for ordinal regression.

This class wraps the GradientBoostingOrdinal model from the ogboost package. Gradient boosting fits an ensemble of base learners to a latent function. The latent values are then mapped to ordinal classes through learned thresholds and a link function. This lets the model capture non-linear relationships between the features and the ordinal target.

Parameters:
  • base_learner (estimator or None, default=None) – The base learner used to update the latent function. If None, DecisionTreeRegressor(max_depth=3) is used

  • n_estimators (int, default=100) – Maximum number of boosting iterations

  • learning_rate (float, default=0.1) – Learning rate for the latent function updates

  • learning_rate_thresh (float, default=0.001) – Learning rate for the threshold updates

  • validation_fraction (float, default=0.1) – Fraction of data to use as a holdout set for early stopping

  • n_iter_no_change (int or None, default=None) – Number of iterations with no improvement to wait before stopping early

  • tol (float, default=1e-4) – Tolerance for measuring improvement in early stopping

  • link_function ({'probit', 'logit', 'loglog', 'cloglog', 'cauchit'}, default='probit') – Link function used to transform latent scores to probabilities

  • subsample (float, default=1.0) – Fraction of samples used to fit each base learner

  • verbose (int, default=0) – Verbosity level

  • random_state (int, RandomState instance or None, default=None) – Seed or random state for reproducibility

  • cv_early_stopping_splits (int or None, default=None) – If an integer > 1, uses K-fold cross-validation for early stopping
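Each link_function option names a cumulative distribution function F that turns a real-valued score into a probability. The pure-Python definitions below are a sketch that uses the textbook forms of these five CDFs. The dictionary keys mirror the option strings above, but the functions are illustrative assumptions, not ogboost's implementation. Naming conventions for "loglog" and "cloglog" also differ between packages; here loglog is exp(-exp(-z)).

```python
import math
from typing import Callable, Dict

# Textbook inverse-link CDFs keyed by the link_function option strings.
# Illustrative only: ogboost's own implementation may differ in detail.


def probit_cdf(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_cdf(z: float) -> float:
    """Logistic CDF, branched so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loglog_cdf(z: float) -> float:
    """Log-log (Gumbel maximum) CDF exp(-exp(-z)); argument clamped to avoid overflow."""
    return math.exp(-math.exp(min(-z, 700.0)))


def cloglog_cdf(z: float) -> float:
    """Complementary log-log (Gumbel minimum) CDF 1 - exp(-exp(z)); argument clamped."""
    return 1.0 - math.exp(-math.exp(min(z, 700.0)))


def cauchit_cdf(z: float) -> float:
    """Standard Cauchy CDF."""
    return 0.5 + math.atan(z) / math.pi


LINK_CDFS: Dict[str, Callable[[float], float]] = {
    "probit": probit_cdf,
    "logit": logit_cdf,
    "loglog": loglog_cdf,
    "cloglog": cloglog_cdf,
    "cauchit": cauchit_cdf,
}

for name, cdf in LINK_CDFS.items():
    print(f"{name:8s} F(-1) = {cdf(-1.0):.3f}  F(0) = {cdf(0.0):.3f}  F(1) = {cdf(1.0):.3f}")
```

At z = 0, the three symmetric links (probit, logit, cauchit) all give 0.5. By contrast, cloglog gives 1 - e^(-1) ≈ 0.632 and loglog gives e^(-1) ≈ 0.368, so these two asymmetric links are skewed in opposite directions.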

feature_names_

Names of features used during training

Type:

list

n_features_in_

Number of features seen during training

Type:

int

ranks_

Unique ordinal class labels

Type:

ndarray

_encoder

Encoder for categorical features

Type:

OneHotEncoder

_scaler

Scaler for numerical features

Type:

StandardScaler

_model

The fitted ogboost GradientBoostingOrdinal model

Type:

GradientBoostingOrdinal

is_fitted_

Whether the model has been fitted

Type:

bool

Notes

  • The model handles both categorical and numerical features automatically

  • Categorical features are one-hot encoded

  • Numerical features are standardized

  • The model assumes ordinal classes are consecutive integers starting from 0
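The model assumes classes are consecutive integers starting from 0. Targets with other ordered labels, such as strings "low" < "medium" < "high" or ratings from 1 to 5, therefore need to be mapped first. The following is a minimal sketch that assumes you supply the category order explicitly. The helper names encode_ordinal and decode_ordinal are hypothetical and are not part of ordinal_xai.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def encode_ordinal(
    labels: Sequence[Hashable], order: Sequence[Hashable]
) -> Tuple[List[int], Dict[int, Hashable]]:
    """Map ordered labels to 0..K-1 and return the codes plus an inverse mapping."""
    to_code = {label: code for code, label in enumerate(order)}
    if len(to_code) != len(order):
        raise ValueError("order contains duplicate labels")
    unknown = set(labels) - set(to_code)
    if unknown:
        raise ValueError(f"labels not present in order: {sorted(map(str, unknown))}")
    codes = [to_code[label] for label in labels]
    inverse = {code: label for label, code in to_code.items()}
    return codes, inverse


def decode_ordinal(codes: Sequence[int], inverse: Dict[int, Hashable]) -> List[Hashable]:
    """Map integer predictions (e.g. from predict) back to the original labels."""
    return [inverse[int(code)] for code in codes]


codes, inverse = encode_ordinal(["low", "high", "medium", "low"], order=["low", "medium", "high"])
print(codes, decode_ordinal(codes, inverse))
```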

__init__(base_learner=None, n_estimators: int = 100, learning_rate: float = 0.1, learning_rate_thresh: float = 0.001, validation_fraction: float = 0.1, n_iter_no_change: int | None = None, tol: float = 0.0001, link_function: str = 'probit', subsample: float = 1.0, verbose: int = 0, random_state: int | None = None, cv_early_stopping_splits: int | None = None)

Initialize the Ordinal Gradient Boosting Model.

Parameters:
  • base_learner (estimator, default=None) – The base learner used to update the latent function. If None, uses DecisionTreeRegressor(max_depth=3)

  • n_estimators (int, default=100) – Maximum number of boosting iterations

  • learning_rate (float, default=0.1) – Learning rate for the latent function updates

  • learning_rate_thresh (float, default=0.001) – Learning rate for the threshold updates

  • validation_fraction (float, default=0.1) – Fraction of data to use as a holdout set for early stopping

  • n_iter_no_change (int or None, default=None) – Number of iterations with no improvement to wait before stopping early

  • tol (float, default=1e-4) – Tolerance for measuring improvement in early stopping

  • link_function ({'probit', 'logit', 'loglog', 'cloglog', 'cauchit'}, default='probit') – Link function used to transform latent scores to probabilities

  • subsample (float, default=1.0) – Fraction of samples used to fit each base learner

  • verbose (int, default=0) – Verbosity level

  • random_state (int, RandomState instance or None, default=None) – Seed or random state for reproducibility

  • cv_early_stopping_splits (int or None, default=None) – If an integer > 1, uses K-fold cross-validation for early stopping
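The following sketch shows how n_iter_no_change and tol typically interact. It applies a common patience rule to a sequence of validation losses: an iteration counts as an improvement only if it beats the best loss so far by more than tol, and training stops after n_iter_no_change consecutive non-improving iterations. The exact criterion ogboost uses, and how it is combined with validation_fraction or cv_early_stopping_splits, is an assumption here. The helper n_iterations_run is hypothetical.

```python
from typing import Optional, Sequence


def n_iterations_run(
    val_losses: Sequence[float], n_iter_no_change: Optional[int], tol: float = 1e-4
) -> int:
    """Number of boosting iterations run before a patience rule stops training.

    None disables early stopping, so every iteration runs.
    """
    if n_iter_no_change is None:
        return len(val_losses)
    best = float("inf")
    stale = 0  # consecutive iterations without a sufficient improvement
    for i, loss in enumerate(val_losses):
        if loss < best - tol:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= n_iter_no_change:
                return i + 1
    return len(val_losses)


losses = [1.0, 0.8, 0.7, 0.69995, 0.7, 0.71, 0.5]
print(n_iterations_run(losses, n_iter_no_change=3, tol=1e-4))
```

With these losses, the drop from 0.7 to 0.69995 is smaller than tol, so it does not reset the counter. Training therefore stops before the late improvement to 0.5 is seen, which is the usual trade-off of a small patience value.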

get_params(deep: bool = True) → Dict[str, any]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

Parameter names mapped to their values

Return type:

dict

set_params(**params: any) → OGBoost

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters

Returns:

self – The estimator instance

Return type:

OGBoost
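OGBoost inherits from scikit-learn's BaseEstimator, and get_params and set_params follow that library's convention. Parameters are the constructor arguments, stored as attributes of the same name, and set_params returns self. With OGBoost itself, you would write, for example, model.set_params(learning_rate=0.05). The following is a stripped-down illustration of the convention in plain Python. TinyEstimator is hypothetical and is not part of ordinal_xai or scikit-learn.

```python
import inspect
from typing import Any, Dict, List


class TinyEstimator:
    """Hypothetical estimator following the scikit-learn parameter convention."""

    def __init__(self, n_estimators: int = 100, learning_rate: float = 0.1) -> None:
        # Constructor arguments are stored unchanged under the same names.
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    @classmethod
    def _param_names(cls) -> List[str]:
        # Parameters are discovered from the __init__ signature, minus self.
        params = inspect.signature(cls.__init__).parameters
        return sorted(name for name in params if name != "self")

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        # deep would recurse into nested estimators; there are none here.
        return {name: getattr(self, name) for name in self._param_names()}

    def set_params(self, **params: Any) -> "TinyEstimator":
        valid = set(self._param_names())
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r} for {type(self).__name__}")
            setattr(self, name, value)
        return self  # returning self allows chaining, e.g. est.set_params(...).fit(...)


est = TinyEstimator().set_params(learning_rate=0.05)
print(est.get_params())
```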

fit(X: DataFrame, y: Series) → OGBoost

Fit the Ordinal Gradient Boosting Model.

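This method first preprocesses the training data: categorical features are one-hot encoded and numerical features are standardized, as described in Notes. It then fits the underlying GradientBoostingOrdinal model.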

Parameters:
  • X (pd.DataFrame of shape (n_samples, n_features)) – Training data

  • y (pd.Series of shape (n_samples,)) – Target values

Returns:

self – The fitted model

Return type:

OGBoost

Raises:

ValueError – If the input data contains invalid values

predict(X: DataFrame) → ndarray

Predict ordinal class labels.

Parameters:

X (pd.DataFrame of shape (n_samples, n_features)) – Samples to predict

Returns:

Predicted ordinal class labels

Return type:

ndarray of shape (n_samples,)

Raises:

NotFittedError – If the model has not been fitted
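predict returns one label per row. A common way to derive these labels, assumed here rather than confirmed for this wrapper, is to pick the most probable class from each row of predict_proba. The following is a plain-Python sketch. The helper labels_from_proba is hypothetical.

```python
from typing import List, Sequence


def labels_from_proba(proba: Sequence[Sequence[float]]) -> List[int]:
    """Return the most probable class index for each row.

    Assumes column k holds P(y = k) with classes encoded as 0..K-1.
    Ties resolve to the lowest class index, because max() keeps the first maximum.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in proba]


proba = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
print(labels_from_proba(proba))
```

With a NumPy array, such as the output of predict_proba, the equivalent is proba.argmax(axis=1).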

predict_proba(X: DataFrame) → ndarray

Predict class probabilities.

Parameters:

X (pd.DataFrame of shape (n_samples, n_features)) – Samples to predict probabilities for

Returns:

Predicted class probabilities

Return type:

ndarray of shape (n_samples, n_classes)

Raises:

NotFittedError – If the model has not been fitted
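This sketch assumes the standard cumulative link formulation, in which a latent function f(x) is combined with K - 1 increasing thresholds θ_0 < ... < θ_{K-2}. Under that formulation, class probabilities are differences of the link CDF F:

P(y = k | x) = F(θ_k - f(x)) - F(θ_{k-1} - f(x)), with θ_{-1} = -∞ and θ_{K-1} = +∞.

The code applies this rule to one observation with the default probit link. The threshold values are made up for illustration, and the helper class_probabilities is hypothetical.

```python
import math
from typing import List, Sequence


def probit_cdf(z: float) -> float:
    """Standard normal CDF (the default 'probit' link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def class_probabilities(latent: float, thresholds: Sequence[float]) -> List[float]:
    """P(y = k) for k = 0..K-1, given a latent score and K-1 increasing thresholds."""
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(y <= k), padded with 0 and 1 at the ends.
    cumulative = [0.0] + [probit_cdf(t - latent) for t in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]


probs = class_probabilities(latent=0.3, thresholds=[-1.0, 0.0, 1.5])
print([round(p, 3) for p in probs])
```

Each row of predict_proba sums to 1 by construction, because the differences telescope from 0 to 1.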

transform(X: DataFrame, fit: bool = False, no_scaling: bool = False) → DataFrame

Transform input data into the format expected by the model.

This method handles both categorical and numerical features:
  • Categorical features are one-hot encoded

  • Numerical features are standardized (unless no_scaling=True)

Parameters:
  • X (pd.DataFrame of shape (n_samples, n_features)) – Input data to transform

  • fit (bool, default=False) – If True, fit a new encoder and scaler on X; if False, reuse the ones that are already fitted

  • no_scaling (bool, default=False) – Whether to skip scaling of numerical features

Returns:

Transformed data

Return type:

pd.DataFrame

Raises:

ValueError – If the input data has different features than training data
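According to the attribute list, the wrapper's _encoder and _scaler are scikit-learn's OneHotEncoder and StandardScaler. The hypothetical TinyPreprocessor below only mirrors the fit and no_scaling switches of transform, in plain Python, so it runs without pandas or scikit-learn. It takes columns as a dict of lists. It standardizes with the population standard deviation, as StandardScaler does. It also encodes unseen categories as all-zero indicators, whereas OneHotEncoder raises an error for them by default.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[float, str]


class TinyPreprocessor:
    """Hypothetical stand-in for the encoder/scaler pair used by transform().

    Columns containing strings are one-hot encoded; other columns are standardized
    using the mean and population standard deviation learned when fit=True.
    """

    def __init__(self) -> None:
        self.categories: Dict[str, List[str]] = {}
        self.stats: Dict[str, Tuple[float, float]] = {}

    def transform(
        self, data: Dict[str, Sequence[Value]], fit: bool = False, no_scaling: bool = False
    ) -> Dict[str, List[float]]:
        if fit:
            self.categories, self.stats = {}, {}
            for name, col in data.items():
                if any(isinstance(v, str) for v in col):
                    self.categories[name] = sorted(set(col))
                else:
                    mean = sum(col) / len(col)
                    # Zero-variance columns get scale 1, as in StandardScaler.
                    std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
                    self.stats[name] = (mean, std)
        if set(data) != set(self.categories) | set(self.stats):
            raise ValueError("input columns differ from the columns seen at fit time")
        out: Dict[str, List[float]] = {}
        for name, col in data.items():
            if name in self.categories:
                # One indicator column per known category; unseen values give all zeros.
                for cat in self.categories[name]:
                    out[f"{name}_{cat}"] = [1.0 if v == cat else 0.0 for v in col]
            else:
                mean, std = self.stats[name]
                out[name] = [float(v) if no_scaling else (v - mean) / std for v in col]
        return out


pre = TinyPreprocessor()
train = pre.transform({"age": [20.0, 30.0, 40.0], "color": ["red", "blue", "red"]}, fit=True)
new = pre.transform({"age": [30.0], "color": ["green"]})
print(train)
print(new)
```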

decision_function(X: DataFrame) → ndarray

Compute the latent function values for input samples.

This method returns the scalar value of the latent function for each observation, which can be used as a high-resolution alternative to class labels for comparing and ranking observations.

Parameters:

X (pd.DataFrame of shape (n_samples, n_features)) – Samples to compute decision function for

Returns:

Latent function values

Return type:

ndarray of shape (n_samples,)

Raises:

NotFittedError – If the model has not been fitted
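The latent value is continuous, so it can order observations that predict places in the same class. The following sketch ranks rows by latent score. It assumes, as in the usual cumulative link formulation, that larger latent values correspond to higher classes. The scores and labels are made up; in practice they would come from model.decision_function(X) and model.predict(X). The helper rank_by_latent is hypothetical.

```python
from typing import List, Sequence


def rank_by_latent(scores: Sequence[float], descending: bool = True) -> List[int]:
    """Return row indices sorted by latent score (highest first by default)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=descending)


# Hypothetical decision_function output and predicted labels for five rows.
scores = [0.12, 1.35, 0.48, 1.02, 0.40]
labels = [0, 2, 1, 2, 1]

order = rank_by_latent(scores)
for i in order:
    # Rows 1 and 3 share label 2, but their latent scores still separate them.
    print(f"row {i}: label={labels[i]}, latent={scores[i]:.2f}")
```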

feature_importances_() → ndarray

Get feature importances from the fitted model.

Note: This method may not be available for all base learners.

Returns:

Feature importances if available

Return type:

ndarray of shape (n_features,)

Raises:
  • NotFittedError – If the model has not been fitted

  • AttributeError – If the base learner doesn’t support feature importances

get_booster_params() → Dict[str, any]

Get parameters of the underlying boosting model.

Returns:

Parameters of the underlying GradientBoostingOrdinal model

Return type:

dict

Raises:

NotFittedError – If the model has not been fitted

_abc_impl = <_abc._abc_data object>
classmethod _build_request_for_signature(router, method)

Build the MethodMetadataRequest for a method using its signature.

This method takes all arguments from the method signature and uses None as their default request value, except X, y, Y, Xt, yt, *args, and **kwargs.

Parameters:
  • router (MetadataRequest) – The parent object for the created MethodMetadataRequest.

  • method (str) – The name of the method.

Returns:

method_request – The prepared request using the method’s signature.

Return type:

MethodMetadataRequest
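The rule described above can be reproduced with Python's inspect module. This is a simplified reimplementation for illustration, not scikit-learn's code. The helper default_requests and the stand-in transform function are hypothetical. Applied to the transform signature documented above, the rule yields exactly two parameters, fit and no_scaling.

```python
import inspect
from typing import Callable, Dict, Optional

# Parameter names the docstring lists as excluded from default requests.
EXCLUDED = {"X", "y", "Y", "Xt", "yt"}


def default_requests(method: Callable) -> Dict[str, Optional[bool]]:
    """Map each remaining parameter of `method` to a default request of None."""
    requests: Dict[str, Optional[bool]] = {}
    for name, param in inspect.signature(method).parameters.items():
        if name == "self" or name in EXCLUDED:
            continue
        if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            continue  # *args and **kwargs are skipped too
        requests[name] = None
    return requests


def transform(self, X, fit=False, no_scaling=False):
    """Stand-in with the same signature as OGBoost.transform."""


print(default_requests(transform))
```

The result matches the fit and no_scaling arguments that set_transform_request exposes.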

_doc_link_module = 'sklearn'
property _doc_link_template
_doc_link_url_param_generator = None
classmethod _get_default_requests()

Collect default request values.

This method combines the information present in __metadata_request__* class attributes, as well as determining request keys from method signatures.

_get_doc_link()

Generates a link to the API documentation for a given estimator.

This method generates the link to the estimator’s documentation page by using the template defined by the attribute _doc_link_template.

Returns:

url – The URL to the API documentation for this estimator. If the estimator does not belong to module _doc_link_module, the empty string (i.e. “”) is returned.

Return type:

str

_get_metadata_request()

Get requested metadata for the instance.

See the User Guide for how the routing mechanism works.

Returns:

request – A MetadataRequest instance.

Return type:

MetadataRequest

classmethod _get_param_names()

Get parameter names for the estimator

_get_params_html(deep=True)

Get parameters for this estimator with a specific HTML representation.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values. We return a ParamsDict dictionary, which renders a specific HTML representation in table form.

Return type:

ParamsDict

_html_repr()

Build an HTML representation of the estimator.

Read more in the User Guide.

Parameters:

estimator (estimator object) – The estimator to visualize.

Returns:

html – HTML representation of estimator.

Return type:

str

Examples

>>> from sklearn.utils._repr_html.estimator import estimator_html_repr
>>> from sklearn.linear_model import LogisticRegression
>>> estimator_html_repr(LogisticRegression())
'<style>#sk-container-id...'
property _repr_html_

HTML representation of the estimator. This is redundant with the logic of _repr_mimebundle_. The latter should be favored in the long term; _repr_html_ is only implemented for consumers who do not interpret _repr_mimebundle_.

_repr_html_inner()

This function is returned by the @property _repr_html_ to make hasattr(estimator, "_repr_html_") return True or False depending on get_config()["display"].

_repr_mimebundle_(**kwargs)

MIME bundle used by Jupyter kernels to display the estimator.

_validate_params()

Validate types and values of constructor parameters

The expected type and values must be defined in the _parameter_constraints class attribute, which is a dictionary param_name: list of constraints. See the docstring of validate_parameter_constraints for a description of the accepted constraints.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

set_transform_request(*, fit: bool | None | str = '$UNCHANGED$', no_scaling: bool | None | str = '$UNCHANGED$') → OGBoost

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • fit (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the fit parameter in transform.

  • no_scaling (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the no_scaling parameter in transform.

Returns:

self (object) – The updated object.