OBD
- class ordinal_xai.models.obd.OBD(base_classifier='logistic', decomposition_type='one-vs-following', **kwargs)
Bases:
BaseEstimator, BaseOrdinalModel
Ordinal Binary Decomposition (OBD) Model.
This model implements ordinal classification by decomposing the problem into a series of binary classification tasks. It supports two decomposition strategies and various base classifiers.
- Parameters:
base_classifier (str, default='logistic') – The base classifier to use for each binary classification task. Options are:
- 'logistic': LogisticRegression with default parameters
- 'svm': SVC with probability=True and balanced class weights
- 'rf': RandomForestClassifier with conservative defaults
- 'xgb': XGBClassifier with conservative defaults
decomposition_type (str, default='one-vs-following') – The type of binary decomposition to use. Options are:
- 'one-vs-following': Each class is compared against all following classes
- 'one-vs-next': Each class is compared only against the next class
**kwargs (dict) – Additional parameters to pass to the base classifier. These will override the default parameters set for each classifier type.
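The two decomposition types can be illustrated with a small pure-Python sketch. It shows one possible reading of the option descriptions, not the library's actual code: `binary_targets` is a hypothetical helper, and dropping rows from earlier ranks in the 'one-vs-following' case is an assumption this page does not confirm.

```python
def binary_targets(y, ranks, decomposition_type="one-vs-following"):
    """Build one (row_indices, labels) pair per binary task.

    Hypothetical helper for illustration only; not part of ordinal_xai.
    `ranks` must be sorted in ascending order, like the ranks_ attribute.
    """
    tasks = []
    for k in range(len(ranks) - 1):
        if decomposition_type == "one-vs-following":
            # Rank k (label 0) against every later rank (label 1).
            # Assumption: rows from earlier ranks are left out of task k.
            rows = [i for i, v in enumerate(y) if v >= ranks[k]]
        elif decomposition_type == "one-vs-next":
            # Rank k (label 0) against rank k+1 only (label 1).
            rows = [i for i, v in enumerate(y) if v in (ranks[k], ranks[k + 1])]
        else:
            raise ValueError(f"Unknown decomposition_type: {decomposition_type!r}")
        labels = [int(y[i] > ranks[k]) for i in rows]
        tasks.append((rows, labels))
    return tasks


y = [0, 1, 2, 1, 0]
following = binary_targets(y, ranks=[0, 1, 2])
adjacent = binary_targets(y, ranks=[0, 1, 2], decomposition_type="one-vs-next")
```

With K ranks, both strategies produce K-1 binary tasks; they differ only in which rows each task sees.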
- feature_names_
Names of the features used during training
- Type:
list
- n_features_in_
Number of features seen during training
- Type:
int
- ranks_
Unique ordinal class labels in ascending order
- Type:
ndarray
- _models
List of trained binary classifiers
- Type:
list
- _encoder
Feature encoder used for categorical variables
- Type:
object
- _scaler
Feature scaler used for numerical variables
- Type:
object
- is_fitted_
Flag indicating whether the model has been fitted
- Type:
bool
Notes
The model automatically handles categorical and numerical features through the transform_features utility
For each binary classification task, if only one class is present in the training data, a DummyClassifier is used instead of the specified base classifier
The predict_proba method returns class probabilities that sum to 1 for each sample
The model is compatible with scikit-learn’s cross-validation and grid search
Examples
>>> from ordinal_xai.models.obd import OBD
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create sample data
>>> X = pd.DataFrame(np.random.randn(100, 5))
>>> y = pd.Series(np.random.randint(0, 3, 100))
>>>
>>> # Initialize model with SVM base classifier
>>> model = OBD(base_classifier='svm', decomposition_type='one-vs-next')
>>>
>>> # Train the model
>>> model.fit(X, y)
>>>
>>> # Make predictions
>>> predictions = model.predict(X)
>>> probabilities = model.predict_proba(X)
- __init__(base_classifier='logistic', decomposition_type='one-vs-following', **kwargs)
Initialize the OBD model.
- Parameters:
base_classifier (str, default='logistic') – The base classifier to use. Options are:
- 'logistic': LogisticRegression
- 'svm': SVC with probability=True
- 'rf': RandomForestClassifier
- 'xgb': XGBClassifier
decomposition_type (str, default='one-vs-following') – The type of binary decomposition to use. Options are:
- 'one-vs-following': Each class is compared against all following classes
- 'one-vs-next': Each class is compared only against the next class
**kwargs (dict) – Additional parameters to pass to the base classifier.
- _get_base_classifier()
Get the appropriate base classifier instance with sensible defaults.
- Returns:
estimator – An instance of the specified base classifier with appropriate default parameters.
- Return type:
object
- Raises:
ValueError – If an unknown base classifier is specified.
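As a rough illustration of how per-classifier defaults and user-supplied **kwargs might be combined, the sketch below merges two dictionaries so that keyword arguments win. `DEFAULTS` and `resolve_params` are hypothetical names, not part of ordinal_xai. Only the 'svm' defaults are stated on this page; the "conservative defaults" for 'rf' and 'xgb' are not listed, so they are left empty here rather than guessed.

```python
# Hypothetical defaults table. Only the 'svm' entries are documented on this
# page; the 'rf' and 'xgb' defaults are not listed, so they are left empty.
DEFAULTS = {
    "logistic": {},
    "svm": {"probability": True, "class_weight": "balanced"},
    "rf": {},
    "xgb": {},
}


def resolve_params(base_classifier, **kwargs):
    """Return constructor parameters, with kwargs overriding the defaults."""
    if base_classifier not in DEFAULTS:
        raise ValueError(f"Unknown base classifier: {base_classifier!r}")
    # Later keys win in a dict merge, so user kwargs replace defaults.
    return {**DEFAULTS[base_classifier], **kwargs}


params = resolve_params("svm", C=2.0, class_weight=None)
```

The resulting dictionary could then be unpacked into the chosen estimator's constructor, which is presumably where the real method differs from this sketch.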
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- set_params(**params)
Set the parameters of this estimator.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
object
- fit(X: DataFrame, y: Series) → OBD
Fit the ordinal binary decomposition model.
- Parameters:
X (pd.DataFrame) – Training data of shape (n_samples, n_features)
y (pd.Series) – Target values of shape (n_samples,)
- Returns:
self – Returns self.
- Return type:
object
- Raises:
TypeError – If X is not a DataFrame or y is not a Series
ValueError – If X and y have different numbers of samples, if X or y contains missing values, or if decomposition_type is not 'one-vs-following' or 'one-vs-next'
- predict(X: DataFrame) → ndarray
Predict ordinal class labels.
- Parameters:
X (pd.DataFrame) – Samples of shape (n_samples, n_features)
- Returns:
y_pred – Predicted class labels of shape (n_samples,)
- Return type:
ndarray
- predict_proba(X: DataFrame) → ndarray
Predict class probabilities.
- Parameters:
X (pd.DataFrame) – Samples of shape (n_samples, n_features)
- Returns:
proba – Class probabilities of shape (n_samples, n_classes)
- Return type:
ndarray
- Raises:
TypeError – If X is not a DataFrame
ValueError – If X contains missing values, or if X has a different number of features than the training data
Notes
The probabilities are computed differently based on the decomposition type:
For 'one-vs-following':
- P(class 0) = 1 - P(class > 0)
- P(class i) = P(class > i-1) - P(class > i)
- P(class K-1) = P(class > K-2)
For 'one-vs-next':
- P(class 0) = 1 - P(class > 0)
- P(class i) = P(class > i-1) * (1 - P(class > i))
- P(class K-1) = P(class > K-2)
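The formulas above can be traced for a single sample with the following sketch. `class_probabilities` is a hypothetical helper, and `p_greater[k]` stands for the k-th binary model's estimate of P(class > k). The final clipping and renormalisation step is an assumption: the class notes say the returned probabilities sum to 1, but this page does not describe how that is enforced.

```python
def class_probabilities(p_greater, decomposition_type="one-vs-following"):
    """Combine K-1 binary outputs for one sample into K class probabilities.

    Hypothetical helper; p_greater[k] is the k-th model's P(class > k).
    """
    n_classes = len(p_greater) + 1
    proba = [1.0 - p_greater[0]]                      # P(class 0)
    for i in range(1, n_classes - 1):                 # middle classes
        if decomposition_type == "one-vs-following":
            proba.append(p_greater[i - 1] - p_greater[i])
        elif decomposition_type == "one-vs-next":
            proba.append(p_greater[i - 1] * (1.0 - p_greater[i]))
        else:
            raise ValueError(f"Unknown decomposition_type: {decomposition_type!r}")
    proba.append(p_greater[-1])                       # P(class K-1)
    # Assumption: clip negative values (possible when the binary outputs are
    # not monotone) and rescale so the result sums to 1.
    proba = [max(p, 0.0) for p in proba]
    total = sum(proba)
    return [p / total for p in proba]


following = class_probabilities([0.8, 0.3])
adjacent = class_probabilities([0.8, 0.3], decomposition_type="one-vs-next")
```

For 'one-vs-following' the raw terms telescope to exactly 1, so the rescaling only matters after clipping; for 'one-vs-next' the raw terms generally do not sum to 1, which is why some normalisation step is assumed here.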
- transform(X: DataFrame, fit=False, no_scaling=False) → DataFrame
Transform input data into the format expected by the model.
- Parameters:
X (pd.DataFrame) – Input data of shape (n_samples, n_features)
fit (bool, default=False) – Whether this is being called during fit or predict
no_scaling (bool, default=False) – Whether to skip feature scaling
- Returns:
X_transformed – Transformed data ready for model input
- Return type:
pd.DataFrame
- set_transform_request(*, fit: bool | None | str = '$UNCHANGED$', no_scaling: bool | None | str = '$UNCHANGED$') → OBD
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
fit (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the fit parameter in transform.
no_scaling (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the no_scaling parameter in transform.
- Returns:
self – The updated object.
- Return type:
object