Evaluation Metrics
Evaluation metrics for ordinal regression and classification.
This module provides a comprehensive set of metrics specifically designed for evaluating ordinal regression and classification models. It includes both hard-label metrics (based on predicted class labels) and probability-based metrics (based on predicted probabilities).
The metrics are designed to account for the ordinal nature of the data, where classes have a natural ordering and misclassification costs increase with the distance between predicted and true classes.
Available Metrics:
Hard Label Metrics:
- accuracy: Standard classification accuracy
- adjacent_accuracy: Proportion of predictions within one class of the true label
- mze: Mean Zero-One Error (1 - accuracy)
- mae: Mean Absolute Error
- mse: Mean Squared Error
- weighted_kappa: Cohen’s Kappa with linear or quadratic weights
- cem: Closeness Evaluation Measure
- spearman_correlation: Spearman’s rank correlation
- kendall_tau: Kendall’s Tau correlation
Probability-Based Metrics:
- ranked_probability_score: RPS for probabilistic predictions
- ordinal_weighted_ce: Ordinal weighted cross-entropy loss (Ordinal Log Loss)
- ordinal_xai.utils.evaluation_metrics.accuracy(y_true, y_pred)
Calculate accuracy for ordinal regression.
This is the standard classification accuracy, measuring the proportion of correct predictions. While simple, it doesn’t account for the ordinal nature of the data.
- Parameters:
y_true (array-like of shape (n_samples,)) – True ordinal labels
y_pred (array-like of shape (n_samples,)) – Predicted ordinal labels
- Returns:
Accuracy score between 0 and 1, where 1 indicates perfect predictions
- Return type:
float
- ordinal_xai.utils.evaluation_metrics.mze(y_true, y_pred)
Calculate Mean Zero-One Error (MZE) for ordinal regression.
MZE is the complement of accuracy (1 - accuracy). It measures the proportion of incorrect predictions, treating all misclassifications equally regardless of their distance from the true class.
- Parameters:
y_true (array-like of shape (n_samples,)) – True ordinal labels
y_pred (array-like of shape (n_samples,)) – Predicted ordinal labels
- Returns:
Mean Zero-One Error between 0 and 1, where 0 indicates perfect predictions
- Return type:
float
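The relationship between accuracy and MZE can be sketched in a few lines of plain Python. This is an illustration only, not the library's implementation; the helper names `accuracy_sketch` and `mze_sketch` are hypothetical.

```python
def accuracy_sketch(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def mze_sketch(y_true, y_pred):
    """Mean zero-one error: the fraction of incorrect predictions."""
    return 1.0 - accuracy_sketch(y_true, y_pred)


y_true = [1, 2, 3, 4, 5]
y_pred = [1, 2, 3, 5, 1]  # the last two are wrong, by 1 and by 4 classes
acc = accuracy_sketch(y_true, y_pred)
err = mze_sketch(y_true, y_pred)
print(acc, err)
```

Both errors count the same here, even though one is a near miss and the other is four classes away.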
- ordinal_xai.utils.evaluation_metrics.mae(y_true, y_pred)
Calculate Mean Absolute Error (MAE) for ordinal regression.
MAE measures the average absolute difference between predicted and true labels. Unlike accuracy, it accounts for the ordinal nature of the data by penalizing predictions based on their distance from the true class.
- Parameters:
y_true (array-like of shape (n_samples,)) – True ordinal labels
y_pred (array-like of shape (n_samples,)) – Predicted ordinal labels
- Returns:
Mean Absolute Error, where 0 indicates perfect predictions
- Return type:
float
- ordinal_xai.utils.evaluation_metrics.mse(y_true, y_pred)
Calculate Mean Squared Error (MSE) for ordinal regression.
MSE measures the average squared difference between predicted and true labels. It penalizes larger errors more heavily than MAE due to the squaring operation.
- Parameters:
y_true (array-like of shape (n_samples,)) – True ordinal labels
y_pred (array-like of shape (n_samples,)) – Predicted ordinal labels
- Returns:
Mean Squared Error, where 0 indicates perfect predictions
- Return type:
float
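A small sketch may help show how MAE and MSE respond to error distance. It assumes integer-coded labels; `mae_sketch` and `mse_sketch` are hypothetical names, not the library's functions.

```python
def mae_sketch(y_true, y_pred):
    """Average absolute distance between true and predicted labels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse_sketch(y_true, y_pred):
    """Average squared distance between true and predicted labels."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 2, 3, 4, 5]
near_miss = [1, 2, 3, 5, 4]  # two errors, each off by one class
far_miss = [1, 2, 3, 4, 1]   # one error, off by four classes

mae_near, mae_far = mae_sketch(y_true, near_miss), mae_sketch(y_true, far_miss)
mse_near, mse_far = mse_sketch(y_true, near_miss), mse_sketch(y_true, far_miss)
print(mae_near, mae_far, mse_near, mse_far)
```

In this toy data `far_miss` has the higher accuracy (4 of 5 correct versus 3 of 5) yet the worse MAE and MSE, because its single error is four classes away. The gap is wider under MSE, which squares that distance.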
- ordinal_xai.utils.evaluation_metrics.weighted_kappa(y_true, y_pred, weights='quadratic')
Calculate weighted kappa for ordinal regression.
Weighted kappa extends Cohen’s kappa to account for the ordinal nature of the data by applying weights to the confusion matrix (Cohen (1968)). The weights can be linear or quadratic, with quadratic weights penalizing larger misclassifications more heavily.
- Parameters:
y_true (array-like of shape (n_samples,)) – True ordinal labels
y_pred (array-like of shape (n_samples,)) – Predicted ordinal labels
weights ({'linear', 'quadratic', 'none'}, default='quadratic') – Weighting scheme for the confusion matrix:
  - ‘linear’: Linear weights based on distance
  - ‘quadratic’: Quadratic weights (squared distance)
  - ‘none’: No weights (standard kappa)
- Returns:
Weighted kappa score between -1 and 1, where:
  - 1 indicates perfect agreement
  - 0 indicates agreement equivalent to chance
  - -1 indicates perfect disagreement
- Return type:
float
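The weighting schemes can be illustrated with the textbook formula, kappa = 1 - (weighted observed disagreement) / (weighted disagreement expected by chance). The sketch below is an assumption-laden illustration rather than the library's code: `weighted_kappa_sketch` is a hypothetical name, and class distance is taken as the difference in rank among the labels that actually occur.

```python
def weighted_kappa_sketch(y_true, y_pred, weights="quadratic"):
    # Assumption: distance between classes is their rank difference among
    # the labels observed in y_true and y_pred.
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    k, n = len(labels), len(y_true)

    # Observed confusion matrix (rows: true class, columns: predicted class).
    observed = [[0.0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[index[t]][index[p]] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights == "linear":
            return abs(i - j)
        if weights == "quadratic":
            return (i - j) ** 2
        if weights == "none":
            return 0.0 if i == j else 1.0
        raise ValueError(f"unknown weighting scheme: {weights!r}")

    observed_cost = sum(
        disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    # Under chance agreement, cell (i, j) is expected to hold row_i * col_j / n items.
    expected_cost = sum(
        disagreement(i, j) * row_totals[i] * col_totals[j] / n
        for i in range(k)
        for j in range(k)
    )
    if expected_cost == 0:
        return float("nan")  # only one class present: kappa is undefined
    return 1.0 - observed_cost / expected_cost


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
kappa_quadratic = weighted_kappa_sketch(y_true, y_pred, "quadratic")
kappa_linear = weighted_kappa_sketch(y_true, y_pred, "linear")
kappa_none = weighted_kappa_sketch(y_true, y_pred, "none")
print(kappa_quadratic, kappa_linear, kappa_none)
```

In this example the quadratic score is the lowest of the three, because one prediction lies two classes away from its true label and quadratic weights penalize that error most.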
- ordinal_xai.utils.evaluation_metrics._get_class_counts(y)
Calculate the count of items per class.
- Parameters:
y (array-like of shape (n_samples,)) – Array of class labels
- Returns:
Dictionary mapping class labels to their counts
- Return type:
dict
- ordinal_xai.utils.evaluation_metrics._calculate_proximity(c1, c2, class_counts, total_items)
Calculate proximity between two classes.
This is a helper function for the CEM metric that calculates the proximity between two classes based on their positions and the distribution of classes in the dataset.
- Parameters:
c1 (int) – First class label
c2 (int) – Second class label
class_counts (dict) – Dictionary mapping class labels to their counts
total_items (int) – Total number of items in the dataset
- Returns:
Proximity value between the two classes
- Return type:
float
- ordinal_xai.utils.evaluation_metrics.cem(y_true, y_pred, class_counts=None)
Calculate Closeness Evaluation Measure (CEM) for ordinal classification.
CEM is a metric proposed by Amigo et al. (2020) that evaluates the performance of ordinal classifiers based on measure and information theory. It uses a proximity-based approach that penalizes misclassifications based on their distance from the true class and the distribution of classes in the dataset.
- Parameters:
y_true (array-like of shape (n_samples,)) – True ordinal labels
y_pred (array-like of shape (n_samples,)) – Predicted ordinal labels
class_counts (dict, optional) – Dictionary mapping class labels to their counts. If None, calculated from y_true. Useful for local explanations where class distribution might differ from training.
- Returns:
CEM score between 0 and 1, where:
  - 1 indicates perfect predictions
  - 0 indicates worst possible predictions
- Return type:
float
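The proximity idea can be sketched as follows. The code follows the definition of CEM as commonly stated for Amigo et al. (2020): prox(c1, c2) = -log2((n_c1 / 2 + number of items in classes from c1 (exclusive) to c2 (inclusive)) / N), and the score is the summed proximity of predicted to true classes divided by the summed proximity of true classes to themselves. It assumes integer class labels; `proximity_sketch` and `cem_sketch` are hypothetical names and may differ from the library's helpers in detail.

```python
import math
from collections import Counter


def proximity_sketch(c1, c2, class_counts, total_items):
    """Information-theoretic proximity of class c1 to class c2."""
    if c1 <= c2:
        between = range(c1 + 1, c2 + 1)  # classes after c1, up to and including c2
    else:
        between = range(c2, c1)          # classes from c2 up to, but excluding, c1
    mass = class_counts.get(c1, 0) / 2 + sum(class_counts.get(c, 0) for c in between)
    return -math.log2(mass / total_items)


def cem_sketch(y_true, y_pred, class_counts=None):
    """Ratio of achieved proximity to the proximity of a perfect prediction."""
    if class_counts is None:
        class_counts = Counter(y_true)  # default: class distribution of y_true
    total = sum(class_counts.values())
    achieved = sum(
        proximity_sketch(p, t, class_counts, total) for t, p in zip(y_true, y_pred)
    )
    ideal = sum(proximity_sketch(t, t, class_counts, total) for t in y_true)
    return achieved / ideal


y_true = [1, 1, 1, 2, 3]
y_pred = [1, 1, 2, 2, 1]
score = cem_sketch(y_true, y_pred)
perfect = cem_sketch(y_true, y_true)
print(score, perfect)
```

Because proximity depends on class counts, an error involving a rare class costs more than an equally distant error between frequent classes.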
- ordinal_xai.utils.evaluation_metrics.spearman_correlation(y_true, y_pred)
Calculate Spearman rank correlation for ordinal regression.
Spearman’s rank correlation (Spearman, 1904) measures the monotonic relationship between predicted and true labels. It is particularly useful for ordinal data because it considers only the ranking of values, not their absolute differences.
- Parameters:
y_true (array-like of shape (n_samples,)) – True ordinal labels
y_pred (array-like of shape (n_samples,)) – Predicted ordinal labels
- Returns:
Spearman rank correlation coefficient between -1 and 1, where:
  - 1 indicates perfect positive correlation
  - 0 indicates no correlation
  - -1 indicates perfect negative correlation
- Return type:
float
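One standard way to compute Spearman's coefficient is to replace the values by their ranks (averaging ranks within ties) and take the Pearson correlation of the ranks. The following is a minimal pure-Python sketch of that approach; `average_ranks` and `spearman_sketch` are hypothetical names, and the library may well delegate to an existing statistics package instead.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_sketch(y_true, y_pred):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(y_true), average_ranks(y_pred)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Note: undefined (division by zero) if either input is constant.
    return cov / math.sqrt(var_x * var_y)


y_true = [1, 2, 3, 4, 5]
rho_same = spearman_sketch(y_true, [1, 2, 3, 4, 5])
rho_reversed = spearman_sketch(y_true, [5, 4, 3, 2, 1])
rho_ties = spearman_sketch(y_true, [1, 1, 2, 2, 5])
print(rho_same, rho_reversed, rho_ties)
```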
- ordinal_xai.utils.evaluation_metrics.kendall_tau(y_true, y_pred)
Calculate Kendall’s Tau correlation coefficient for ordinal data.
Kendall’s Tau-b (Kendall, 1945) measures the ordinal association between two rankings. It is particularly suitable for ordinal data because it is based on the concordance of pairs of observations and adjusts for tied ranks.
- Parameters:
y_true (array-like of shape (n_samples,)) – True ordinal labels
y_pred (array-like of shape (n_samples,)) – Predicted ordinal labels
- Returns:
Kendall’s Tau correlation coefficient between -1 and 1, where:
  - 1 indicates perfect agreement in rankings
  - 0 indicates no association between rankings
  - -1 indicates perfect disagreement in rankings
- Return type:
float
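The pair-counting behind Tau-b can be sketched directly. This O(n^2) illustration uses the usual formula tau_b = (C - D) / sqrt((C + D + Tx)(C + D + Ty)), where C and D count concordant and discordant pairs and Tx, Ty count pairs tied only in the first or only in the second input. `kendall_tau_b_sketch` is a hypothetical name, not the library's function.

```python
import math


def kendall_tau_b_sketch(x, y):
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both inputs: contributes to no term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    # Note: undefined (division by zero) if either input is constant.
    denominator = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denominator


y_true = [1, 2, 3, 4, 5]
tau_same = kendall_tau_b_sketch(y_true, [1, 2, 3, 4, 5])
tau_reversed = kendall_tau_b_sketch(y_true, [5, 4, 3, 2, 1])
tau_ties = kendall_tau_b_sketch(y_true, [1, 1, 2, 2, 5])
print(tau_same, tau_reversed, tau_ties)
```

The tied predictions in the last call lower the score below 1 even though no pair is discordant, which is the tie adjustment that distinguishes Tau-b.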
- ordinal_xai.utils.evaluation_metrics._create_one_hot_encoding(y_true, n_classes=None, zero_indexed=False)
Create one-hot encoding for ordinal labels.
This helper function converts ordinal labels to one-hot encoded format, handling arbitrary label ranges by shifting to 0-based indexing.
- Parameters:
y_true (array-like of shape (n_samples,)) – True ordinal labels
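n_classes (int, optional) – Number of classes. If None, inferred from unique labels.
zero_indexed (bool, default=False) – Whether the labels are already zero-indexed. If False, the labels are shifted to zero-based indexing.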
- Returns:
(one_hot_matrix, min_label, n_classes)
  - one_hot_matrix: 2D array of shape (n_samples, n_classes)
  - min_label: Minimum label value in the original data
  - n_classes: Number of unique classes
- Return type:
tuple
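The label shift described above can be sketched as below. Several details are assumptions made for illustration: the name `one_hot_sketch` is hypothetical, labels are taken to be consecutive integers, nested lists stand in for a 2D array, and `zero_indexed=True` is interpreted as "apply no shift".

```python
def one_hot_sketch(y_true, n_classes=None, zero_indexed=False):
    # Assumption: with zero_indexed=True the labels already start at 0.
    min_label = 0 if zero_indexed else min(y_true)
    if n_classes is None:
        # Assumption: labels are consecutive, so the number of unique labels
        # equals the number of classes.
        n_classes = len(set(y_true))
    matrix = []
    for label in y_true:
        column = label - min_label  # shift to 0-based column index
        if not 0 <= column < n_classes:
            raise ValueError(f"label {label} falls outside {n_classes} classes")
        row = [0] * n_classes
        row[column] = 1
        matrix.append(row)
    return matrix, min_label, n_classes


one_hot, min_label, n_classes = one_hot_sketch([3, 4, 5, 3])
print(one_hot, min_label, n_classes)
```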
- ordinal_xai.utils.evaluation_metrics.ranked_probability_score(y_true, y_pred_proba, zero_indexed=False)
Calculate Ranked Probability Score (RPS) for ordinal regression.
The Ranked Probability Score (Epstein, 1969) evaluates probabilistic predictions for ordinal data by comparing the cumulative predicted probabilities with the cumulative observed probabilities. It penalizes predictions that deviate from the true class distribution.
- Parameters:
y_true (array-like of shape (n_samples,)) – True ordinal labels
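y_pred_proba (array-like of shape (n_samples, n_classes)) – Predicted probabilities for each class
zero_indexed (bool, default=False) – Whether the labels are already zero-indexed. If False, the labels are shifted to zero-based indexing.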
- Returns:
Ranked Probability Score, where:
  - 0 indicates perfect predictions
  - Higher values indicate worse predictions
- Return type:
float
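The cumulative comparison can be sketched as follows. This is the unnormalized form of the score (the sum of squared differences between cumulative distributions, averaged over samples); whether the library divides by the number of classes minus one is not stated here, so treat that as an open assumption. `rps_sketch` and its `min_label` argument are hypothetical.

```python
def rps_sketch(y_true, y_pred_proba, min_label=0):
    total = 0.0
    for label, probs in zip(y_true, y_pred_proba):
        true_index = label - min_label
        cumulative_pred = 0.0
        sample_score = 0.0
        for k, p in enumerate(probs):
            cumulative_pred += p
            # Cumulative one-hot target: 0 before the true class, 1 from it onward.
            cumulative_true = 1.0 if k >= true_index else 0.0
            sample_score += (cumulative_pred - cumulative_true) ** 2
        total += sample_score
    return total / len(y_true)


# True class is 2 in every case; both imperfect forecasts give it probability 0.5.
rps_perfect = rps_sketch([2], [[0.0, 0.0, 1.0]])
rps_near = rps_sketch([2], [[0.0, 0.5, 0.5]])  # remaining mass on the adjacent class
rps_far = rps_sketch([2], [[0.5, 0.0, 0.5]])   # remaining mass two classes away
print(rps_perfect, rps_near, rps_far)
```

The two imperfect forecasts would receive the same ordinary log loss, but the score is higher when the misplaced probability mass lies farther from the true class.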
- ordinal_xai.utils.evaluation_metrics.ordinal_weighted_ce(y_true, y_pred_proba, alpha=1, zero_indexed=False)
Calculate ordinal weighted cross-entropy loss.
This loss function extends standard cross-entropy to account for the ordinal nature of the data by weighting the loss by the distance between predicted and true classes (Polat et al., 2025). It is also known as ordinal log loss (Castagnos et al., 2022).
- Parameters:
y_true (array-like of shape (n_samples,)) – True ordinal labels
y_pred_proba (array-like of shape (n_samples, n_classes)) – Predicted probabilities for each class
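alpha (float, default=1) – Exponent for the absolute difference. Higher values increase the penalty for predictions far from the true class.
zero_indexed (bool, default=False) – Whether the labels are already zero-indexed. If False, the labels are shifted to zero-based indexing.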
- Returns:
Loss value, where:
  - Lower values indicate better predictions
  - The loss is always non-negative
- Return type:
float
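A minimal sketch of the distance weighting, assuming the ordinal log loss form -sum_k |k - y|^alpha * log(1 - p_k) averaged over samples. The exact formula, clipping constant, and averaging used by the library are not given in this reference, so all of those are assumptions; `ordinal_log_loss_sketch`, `min_label`, and `eps` are hypothetical names.

```python
import math


def ordinal_log_loss_sketch(y_true, y_pred_proba, alpha=1, min_label=0, eps=1e-15):
    # Assumes alpha > 0, so the true class itself contributes no penalty.
    total = 0.0
    for label, probs in zip(y_true, y_pred_proba):
        true_index = label - min_label
        for k, p in enumerate(probs):
            distance = abs(k - true_index) ** alpha
            # Probability placed on class k is penalized in proportion to its
            # distance from the true class; clip to avoid log(0).
            total += -distance * math.log(max(1.0 - p, eps))
    return total / len(y_true)


# True class is 2 in every case; both imperfect forecasts give it probability 0.5.
loss_perfect = ordinal_log_loss_sketch([2], [[0.0, 0.0, 1.0]])
loss_near = ordinal_log_loss_sketch([2], [[0.0, 0.5, 0.5]])
loss_far = ordinal_log_loss_sketch([2], [[0.5, 0.0, 0.5]])
print(loss_perfect, loss_near, loss_far)
```

With `alpha=1` the far forecast costs exactly twice the near one in this example, since the misplaced mass sits two classes away instead of one.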
- ordinal_xai.utils.evaluation_metrics.adjacent_accuracy(y_true, y_pred)
Calculate Adjacent Accuracy for ordinal regression.
Adjacent accuracy measures the proportion of predictions that are either correct or off by one class. This is particularly useful for ordinal data where predictions close to the true class are more acceptable than those far away.
- Parameters:
y_true (array-like of shape (n_samples,)) – True ordinal labels
y_pred (array-like of shape (n_samples,)) – Predicted ordinal labels
- Returns:
Adjacent accuracy score between 0 and 1, where:
  - 1 indicates all predictions are either correct or off by one class
  - 0 indicates all predictions are off by more than one class
- Return type:
float
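As a quick sketch of the off-by-one tolerance, assuming integer-coded labels with adjacent classes differing by 1 (`adjacent_accuracy_sketch` is a hypothetical name):

```python
def adjacent_accuracy_sketch(y_true, y_pred):
    """Fraction of predictions at most one class away from the true label."""
    close = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= 1)
    return close / len(y_true)


y_true = [1, 2, 3, 4, 5]
y_pred = [1, 3, 3, 5, 1]  # distances from the true labels: 0, 1, 0, 1, 4
adjacent = adjacent_accuracy_sketch(y_true, y_pred)
print(adjacent)
```

Only the last prediction falls outside the tolerance, so four of five predictions count, whereas plain accuracy would credit just two.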
- ordinal_xai.utils.evaluation_metrics.evaluate_ordinal_model(y_true, y_pred, y_pred_proba=None, metrics=None, class_counts=None, zero_indexed=False)
Evaluate an ordinal regression model using multiple metrics.
This function computes a comprehensive set of evaluation metrics for ordinal regression models, including both hard-label metrics and probability-based metrics if probability predictions are available.
- Parameters:
y_true (array-like of shape (n_samples,)) – True ordinal labels
y_pred (array-like of shape (n_samples,)) – Predicted ordinal labels
y_pred_proba (array-like of shape (n_samples, n_classes), optional) – Predicted class probabilities
metrics (list of str, optional) – List of metric names to compute. If None, all available metrics are used.
class_counts (dict, optional) – Dictionary mapping class labels to their counts. If None, calculated from y_true. Useful for local explanations where class distribution might differ from training.
zero_indexed (bool, optional) – Whether the labels are already zero-indexed. If False, the labels are shifted to zero-based indexing.
- Returns:
Dictionary containing evaluation results for each metric
- Return type:
dict
Notes
The function automatically selects appropriate metrics based on the available predictions. Probability-based metrics are only computed if y_pred_proba is provided.
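The selection behaviour described in the note can be sketched with two registries, one for hard-label metrics and one for probability-based metrics. Everything in this sketch is hypothetical (function names, registry layout, the three metrics chosen, the choice to skip rather than fail when probabilities are missing, and the assumption of zero-indexed labels); it illustrates the dispatch pattern only, not the library's code.

```python
def _accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def _mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def _rps(y_true, y_pred_proba):
    # Unnormalized ranked probability score; assumes labels 0 .. n_classes - 1.
    total = 0.0
    for label, probs in zip(y_true, y_pred_proba):
        cumulative = 0.0
        for k, p in enumerate(probs):
            cumulative += p
            total += (cumulative - (1.0 if k >= label else 0.0)) ** 2
    return total / len(y_true)


HARD_LABEL_METRICS = {"accuracy": _accuracy, "mae": _mae}
PROBABILITY_METRICS = {"ranked_probability_score": _rps}


def evaluate_sketch(y_true, y_pred, y_pred_proba=None, metrics=None):
    names = metrics if metrics is not None else (
        list(HARD_LABEL_METRICS) + list(PROBABILITY_METRICS)
    )
    results = {}
    for name in names:
        if name in HARD_LABEL_METRICS:
            results[name] = HARD_LABEL_METRICS[name](y_true, y_pred)
        elif name in PROBABILITY_METRICS:
            # Probability-based metrics are skipped when no probabilities are given.
            if y_pred_proba is not None:
                results[name] = PROBABILITY_METRICS[name](y_true, y_pred_proba)
        else:
            raise ValueError(f"unknown metric: {name!r}")
    return results


y_true = [0, 1, 2]
y_pred = [0, 1, 1]
proba = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.5, 0.5]]

labels_only = evaluate_sketch(y_true, y_pred)
with_proba = evaluate_sketch(y_true, y_pred, y_pred_proba=proba)
selected = evaluate_sketch(y_true, y_pred, metrics=["mae"])
print(labels_only, with_proba, selected)
```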
- ordinal_xai.utils.evaluation_metrics.print_evaluation_results(results)
Print evaluation results in a formatted way.
This function provides a clear, formatted output of the evaluation metrics, grouping them into hard label metrics and probability-based metrics.
- Parameters:
results (dict) – Dictionary containing evaluation metrics as returned by evaluate_ordinal_model
Notes
Metrics are printed with 4 decimal places
Hard label metrics are printed first, followed by probability-based metrics
Metric names are formatted for better readability
This module contains functions for evaluating ordinal regression models:
- evaluate_ordinal_model: Comprehensive evaluation of ordinal models
- print_evaluation_results: Display evaluation metrics in a readable format