OGBoost

class ordinal_xai.models.ogboost.OGBoost(base_learner=None, n_estimators: int = 100, learning_rate: float = 0.1, learning_rate_thresh: float = 0.001, validation_fraction: float = 0.1, n_iter_no_change: int | None = None, tol: float = 0.0001, link_function: str = 'probit', subsample: float = 1.0, verbose: int = 0, random_state: int | None = None, cv_early_stopping_splits: int | None = None)[source]

Bases: BaseEstimator, BaseOrdinalModel

Ordinal Gradient Boosting Model for ordinal regression.

This class wraps the GradientBoostingOrdinal model from the ogboost package. The model uses gradient boosting to fit a latent function together with class thresholds, which makes it suited to non-linear patterns in ordinal data.

Parameters:

base_learner (estimator, default=None) – The base learner used to update the latent function. If None, uses DecisionTreeRegressor(max_depth=3)
n_estimators (int, default=100) – Maximum number of boosting iterations
learning_rate (float, default=0.1) – Learning rate for the latent function updates
learning_rate_thresh (float, default=0.001) – Learning rate for the threshold updates
validation_fraction (float, default=0.1) – Fraction of data to use as a holdout set for early stopping
n_iter_no_change (int or None, default=None) – Number of iterations with no improvement to wait before stopping early
tol (float, default=1e-4) – Tolerance for measuring improvement in early stopping
link_function ({'probit', 'logit', 'loglog', 'cloglog', 'cauchit'}, default='probit') – Link function used to transform latent scores to probabilities
subsample (float, default=1.0) – Fraction of samples used to fit each base learner
verbose (int, default=0) – Verbosity level
random_state (int, RandomState instance or None, default=None) – Seed or random state for reproducibility
cv_early_stopping_splits (int or None, default=None) – If an integer > 1, uses K-fold cross-validation for early stopping
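To make the role of link_function and the thresholds concrete, here is a minimal sketch of how a cumulative-link model with a probit link can turn one latent score into class probabilities. It illustrates the general technique and is not taken from the ogboost source; the helper name probit_class_probabilities and the threshold values are assumptions chosen for illustration.

```python
from statistics import NormalDist


def probit_class_probabilities(latent, thresholds):
    """Class probabilities for one sample under a cumulative probit model.

    Assumes `thresholds` is strictly increasing. With K-1 thresholds there
    are K classes, and P(y <= k) = Phi(thresholds[k] - latent).
    """
    phi = NormalDist().cdf  # standard normal CDF (the probit link inverse)
    cumulative = [phi(t - latent) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        # Each class probability is the gap between consecutive
        # cumulative probabilities.
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical latent score and thresholds for a 4-class problem.
probs = probit_class_probabilities(latent=0.3, thresholds=[-1.0, 0.0, 1.5])
print([round(p, 3) for p in probs])
```

Swapping the normal CDF for another distribution function (for example the logistic CDF) would give the corresponding alternative link under the same construction.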

feature_names_

Names of features used during training

Type: list

n_features_in_

Number of features seen during training

Type: int

ranks_

Unique ordinal class labels

Type: ndarray

_encoder

Encoder for categorical features

Type: OneHotEncoder

_scaler

Scaler for numerical features

Type: StandardScaler

_model

The fitted ogboost GradientBoostingOrdinal model

Type: GradientBoostingOrdinal

is_fitted_

Whether the model has been fitted

Type: bool

Notes

The model handles both categorical and numerical features automatically
Categorical features are one-hot encoded
Numerical features are standardized
The model assumes ordinal classes are consecutive integers starting from 0
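Given the last note, labels such as 1..5 or ordered strings may need to be remapped before training. The following is a minimal sketch of such a remapping; encode_ordinal_labels is a hypothetical helper, not part of ordinal_xai, and it assumes the caller knows the intended class order.

```python
def encode_ordinal_labels(labels, order=None):
    """Map ordinal labels to consecutive integers 0..K-1.

    `order` lists the labels from lowest to highest. If omitted, the sorted
    unique labels are used, which is only appropriate for numeric labels.
    """
    if order is None:
        order = sorted(set(labels))
    mapping = {label: code for code, label in enumerate(order)}
    return [mapping[label] for label in labels], mapping


codes, mapping = encode_ordinal_labels(
    ["low", "high", "medium", "low"], order=["low", "medium", "high"]
)
print(codes)  # [0, 2, 1, 0]
```

Keeping the returned mapping makes it possible to translate predicted codes back to the original labels.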

__init__(base_learner=None, n_estimators: int = 100, learning_rate: float = 0.1, learning_rate_thresh: float = 0.001, validation_fraction: float = 0.1, n_iter_no_change: int | None = None, tol: float = 0.0001, link_function: str = 'probit', subsample: float = 1.0, verbose: int = 0, random_state: int | None = None, cv_early_stopping_splits: int | None = None)[source]

Initialize the Ordinal Gradient Boosting Model.

Parameters:

base_learner (estimator, default=None) – The base learner used to update the latent function. If None, uses DecisionTreeRegressor(max_depth=3)
n_estimators (int, default=100) – Maximum number of boosting iterations
learning_rate (float, default=0.1) – Learning rate for the latent function updates
learning_rate_thresh (float, default=0.001) – Learning rate for the threshold updates
validation_fraction (float, default=0.1) – Fraction of data to use as a holdout set for early stopping
n_iter_no_change (int or None, default=None) – Number of iterations with no improvement to wait before stopping early
tol (float, default=1e-4) – Tolerance for measuring improvement in early stopping
link_function ({'probit', 'logit', 'loglog', 'cloglog', 'cauchit'}, default='probit') – Link function used to transform latent scores to probabilities
subsample (float, default=1.0) – Fraction of samples used to fit each base learner
verbose (int, default=0) – Verbosity level
random_state (int, RandomState instance or None, default=None) – Seed or random state for reproducibility
cv_early_stopping_splits (int or None, default=None) – If an integer > 1, uses K-fold cross-validation for early stopping
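The interaction between n_iter_no_change and tol can be sketched as a patience counter over validation losses. This is a common early-stopping rule shown under stated assumptions, not the exact criterion in ogboost, which may differ; early_stopping_iteration and the loss values are hypothetical.

```python
def early_stopping_iteration(val_losses, n_iter_no_change, tol=1e-4):
    """Return how many boosting iterations run before stopping.

    Stops once the validation loss has failed to beat the best value seen
    so far by more than `tol` for `n_iter_no_change` consecutive
    iterations. With n_iter_no_change=None, every iteration runs.
    """
    if n_iter_no_change is None:
        return len(val_losses)
    best = float("inf")
    stale = 0  # consecutive iterations without sufficient improvement
    for i, loss in enumerate(val_losses, start=1):
        if loss < best - tol:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= n_iter_no_change:
                return i
    return len(val_losses)


# Hypothetical validation losses, one per boosting iteration.
losses = [1.00, 0.80, 0.70, 0.70, 0.71, 0.69, 0.70]
stopped_at = early_stopping_iteration(losses, n_iter_no_change=2)
print(stopped_at)  # 5
```

A larger n_iter_no_change tolerates longer plateaus: with a value of 3, the same loss sequence runs all seven iterations.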

get_params(deep: bool = True) → Dict[str, any][source]

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: Parameter names mapped to their values
Return type: dict

set_params(**params: any) → OGBoost[source]

Set the parameters of this estimator.

Parameters: **params (dict) – Estimator parameters
Returns: self – The estimator instance
Return type: OGBoost

fit(X: DataFrame, y: Series) → OGBoost[source]

Fit the Ordinal Gradient Boosting Model.

This method fits the model to the training data, handling both categorical and numerical features appropriately.

Parameters:

X (pd.DataFrame of shape (n_samples, n_features)) – Training data
y (pd.Series of shape (n_samples,)) – Target values

Returns:

self – The fitted model

Return type:

OGBoost

Raises:

ValueError – If the input data contains invalid values

predict(X: DataFrame) → ndarray[source]

Predict ordinal class labels.

Parameters: X (pd.DataFrame of shape (n_samples, n_features)) – Samples to predict
Returns: Predicted ordinal class labels
Return type: ndarray of shape (n_samples,)
Raises: NotFittedError – If the model has not been fitted

predict_proba(X: DataFrame) → ndarray[source]

Predict class probabilities.

Parameters: X (pd.DataFrame of shape (n_samples, n_features)) – Samples to predict probabilities for
Returns: Predicted class probabilities
Return type: ndarray of shape (n_samples, n_classes)
Raises: NotFittedError – If the model has not been fitted

transform(X: DataFrame, fit: bool = False, no_scaling: bool = False) → DataFrame[source]

Transform input data into the format expected by the model.

This method handles both categorical and numerical features:

Categorical features are one-hot encoded
Numerical features are standardized (unless no_scaling=True)

Parameters:

X (pd.DataFrame of shape (n_samples, n_features)) – Input data to transform
fit (bool, default=False) – Whether to fit new encoder/scaler or use existing ones
no_scaling (bool, default=False) – Whether to skip scaling of numerical features

Returns:

Transformed data

Return type:

pd.DataFrame

Raises:

ValueError – If the input data has different features than training data

decision_function(X: DataFrame) → ndarray[source]

Compute the latent function values for input samples.

This method returns the scalar value of the latent function for each observation, which can be used as a high-resolution alternative to class labels for comparing and ranking observations.

Parameters: X (pd.DataFrame of shape (n_samples, n_features)) – Samples to compute decision function for
Returns: Latent function values
Return type: ndarray of shape (n_samples,)
Raises: NotFittedError – If the model has not been fitted
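The ranking use case can be illustrated without a fitted model. In the sketch below, the labels and latent values are made-up stand-ins for the outputs of predict and decision_function; rank_descending and count_tied_pairs are hypothetical helpers.

```python
def rank_descending(scores):
    """Return sample indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def count_tied_pairs(scores):
    """Count pairs of samples that share exactly the same score."""
    n = len(scores)
    return sum(
        1 for i in range(n) for j in range(i + 1, n) if scores[i] == scores[j]
    )


# Illustrative values only, not produced by a real model.
labels = [1, 2, 1, 2]            # stand-in for predict() output
latent = [-0.4, 0.9, 0.1, 0.6]   # stand-in for decision_function() output

print(count_tied_pairs(labels))  # 2
print(count_tied_pairs(latent))  # 0
print(rank_descending(latent))   # [1, 3, 2, 0]
```

Class labels leave samples within the same class tied, while the latent values order every sample, which is what makes them useful for ranking.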

feature_importances_() → ndarray[source]

Get feature importances from the fitted model.

Note: This method may not be available for all base learners.

Returns:

Feature importances if available

Return type:

ndarray of shape (n_features,)

Raises:

NotFittedError – If the model has not been fitted
AttributeError – If the base learner doesn’t support feature importances

get_booster_params() → Dict[str, any][source]

Get parameters of the underlying boosting model.

Returns: Parameters of the underlying GradientBoostingOrdinal model
Return type: dict
Raises: NotFittedError – If the model has not been fitted

classmethod _build_request_for_signature(router, method)

Build the MethodMetadataRequest for a method using its signature.

This method takes all arguments from the method signature and uses None as their default request value, except X, y, Y, Xt, yt, *args, and **kwargs.

Parameters:

router (MetadataRequest) – The parent object for the created MethodMetadataRequest.
method (str) – The name of the method.

Returns:

method_request – The prepared request using the method’s signature.

Return type:

MethodMetadataRequest

_doc_link_module = 'sklearn'

property _doc_link_template

_doc_link_url_param_generator = None

classmethod _get_default_requests()

Collect default request values.

This method combines the information present in __metadata_request__* class attributes, as well as determining request keys from method signatures.

_get_doc_link()

Generates a link to the API documentation for a given estimator.

This method generates the link to the estimator’s documentation page by using the template defined by the attribute _doc_link_template.

Returns: url – The URL to the API documentation for this estimator. If the estimator does not belong to module _doc_link_module, the empty string (i.e. “”) is returned.
Return type: str

_get_metadata_request()

Get requested metadata for the instance.

Please check the User Guide on how the routing mechanism works.

Returns: request – A MetadataRequest instance.
Return type: MetadataRequest

classmethod _get_param_names(): Get parameter names for the estimator

_get_params_html(deep=True)

Get parameters for this estimator with a specific HTML representation.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values. We return a ParamsDict dictionary, which renders a specific HTML representation in table form.
Return type: ParamsDict

_html_repr()

Build an HTML representation of an estimator.

Read more in the User Guide.

Parameters: estimator (estimator object) – The estimator to visualize.
Returns: html – HTML representation of estimator.
Return type: str

Examples

>>> from sklearn.utils._repr_html.estimator import estimator_html_repr
>>> from sklearn.linear_model import LogisticRegression
>>> estimator_html_repr(LogisticRegression())
'<style>#sk-container-id...'

property _repr_html_: HTML representation of estimator. This is redundant with the logic of _repr_mimebundle_. The latter should be favored in the long term; _repr_html_ is only implemented for consumers who do not interpret _repr_mimebundle_.

_repr_html_inner(): This function is returned by the @property _repr_html_ to make hasattr(estimator, "_repr_html_") return True or False depending on get_config()["display"].

_repr_mimebundle_(**kwargs): Mime bundle used by jupyter kernels to display estimator

_validate_params()

Validate types and values of constructor parameters

The expected type and values must be defined in the _parameter_constraints class attribute, which is a dictionary param_name: list of constraints. See the docstring of validate_parameter_constraints for a description of the accepted constraints.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns: routing – A MetadataRequest encapsulating routing information.
Return type: MetadataRequest

set_transform_request(*, fit: bool | None | str = '$UNCHANGED$', no_scaling: bool | None | str = '$UNCHANGED$') → OGBoost

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

False: metadata is not requested and the meta-estimator will not pass it to transform.

None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

fit (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit parameter in transform.
no_scaling (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for no_scaling parameter in transform.

Returns:

self – The updated object.

Return type:

object