Data Utilities

ordinal_xai.utils.data_utils.load_data(data_path: str, target: int | str = -1, sep: str = ';', label_map: dict | None = None, drop: list | None = None, handle_nan: str = 'drop') → Tuple[DataFrame, Series]

Load and preprocess a dataset from a file.

Parameters:

data_path (str) – Full path to the data file
target (Union[int, str], default=-1) – Target variable specification. Can be:
- int: Index of target column (e.g., -1 for last column)
- str: Name of target column
sep (str, default=';') – Delimiter to use when reading the file
label_map (Optional[dict], default=None) – Optional mapping to convert target labels to numeric values. If None, labels are mapped to consecutive 0-based integer indices.
drop (Optional[list], default=None) – List of feature indices or names to drop from the features DataFrame.
handle_nan (str, default='drop') – How to handle NaN values. Options are:
- 'drop': Drop rows containing any NaN values
- 'error': Raise an error if NaN values are found
- 'warn': Print a warning if NaN values are found but continue
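
The target and label_map semantics described above can be sketched in plain Python. This is an illustration only, not the library's implementation: resolve_target and map_labels are hypothetical helpers, and ordering the default mapping by sorted unique labels is an assumption, since the documentation does not state which order is used.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Union


def resolve_target(columns: Sequence[str], target: Union[int, str]) -> str:
    """Return the target column name for an int position or a str name."""
    if isinstance(target, bool):
        # bool is a subclass of int; reject it explicitly.
        raise ValueError("target must be an int index or a str name")
    if isinstance(target, int):
        if not -len(columns) <= target < len(columns):
            raise ValueError(f"target index {target} is out of range")
        return columns[target]
    if isinstance(target, str):
        if target not in columns:
            raise ValueError(f"target column {target!r} not found")
        return target
    raise ValueError("target must be an int index or a str name")


def map_labels(
    labels: Sequence[Hashable],
    label_map: Optional[Dict[Hashable, int]] = None,
) -> List[int]:
    """Map labels to integers, building a 0-based map if none is given."""
    if label_map is None:
        # Assumption: sorted unique labels define the default order.
        label_map = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [label_map[label] for label in labels]


columns = ["age", "income", "grade"]
print(resolve_target(columns, -1))       # last column by position
print(resolve_target(columns, "grade"))  # same column by name
print(map_labels(["B", "A", "C", "A"]))  # default 0-based mapping
```

For ordinal targets whose natural order differs from the sorted order (for example "low", "medium", "high"), passing an explicit label_map is the safer choice.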

Returns:

X: Features DataFrame
y: Target Series with mapped labels

Return type:

Tuple[pd.DataFrame, pd.Series]

Raises:

FileNotFoundError – If the data file doesn’t exist
ValueError –
- If target specification is invalid
- If handle_nan is not one of ['drop', 'error', 'warn']
- If handle_nan='error' and NaN values are found
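
The three handle_nan modes and the errors they raise can be illustrated with a small standalone sketch. It is not the library's code: apply_nan_policy is a hypothetical helper that works on a list of dict rows instead of a DataFrame, it treats None as missing, and it uses warnings.warn where the documentation says a warning is printed.

```python
import math
import warnings
from typing import Any, Dict, List

Row = Dict[str, Any]


def _has_nan(row: Row) -> bool:
    """True if any value in the row is None or a float NaN."""
    return any(
        v is None or (isinstance(v, float) and math.isnan(v))
        for v in row.values()
    )


def apply_nan_policy(rows: List[Row], handle_nan: str = "drop") -> List[Row]:
    """Apply a 'drop' / 'error' / 'warn' policy to rows with missing values."""
    if handle_nan not in ("drop", "error", "warn"):
        raise ValueError("handle_nan must be one of ['drop', 'error', 'warn']")

    n_bad = sum(1 for row in rows if _has_nan(row))
    if n_bad == 0:
        return list(rows)
    if handle_nan == "error":
        raise ValueError(f"NaN values found in {n_bad} row(s)")
    if handle_nan == "warn":
        # Keep every row, including the incomplete ones.
        warnings.warn(f"NaN values found in {n_bad} row(s)")
        return list(rows)
    # 'drop': keep only complete rows.
    return [row for row in rows if not _has_nan(row)]


sample = [{"a": 1.0, "y": 0}, {"a": float("nan"), "y": 1}]
print(len(apply_nan_policy(sample, "drop")))  # one complete row remains
```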

ordinal_xai.utils.data_utils.transform_features(X: DataFrame, fit: bool = False, no_scaling: bool = False, encoder: OneHotEncoder | None = None, scaler: StandardScaler | None = None, categorical_columns: list | None = None, handle_nan: str = 'drop') → Tuple[DataFrame, OneHotEncoder, StandardScaler | None]

Transform input data using one-hot encoding for categoricals and scaling for numericals.

Parameters:

X (pd.DataFrame) – Input data to transform
fit (bool, default=False) – If True, fit a new encoder and scaler on X; if False, reuse the encoder and scaler passed in
no_scaling (bool, default=False) – Whether to skip scaling of numerical features
encoder (Optional[OneHotEncoder], default=None) – Existing encoder to use if fit=False
scaler (Optional[StandardScaler], default=None) – Existing scaler to use if fit=False
categorical_columns (Optional[list], default=None) – List of categorical column names. If None, inferred from data types
handle_nan (str, default='drop') – How to handle NaN values. Options are:
- 'drop': Drop rows containing any NaN values
- 'error': Raise an error if NaN values are found
- 'warn': Print a warning if NaN values are found but continue

Returns:

X_transformed: Transformed DataFrame
encoder: Fitted OneHotEncoder
scaler: Fitted StandardScaler (None if no_scaling=True)

Return type:

Tuple[pd.DataFrame, OneHotEncoder, Optional[StandardScaler]]

Raises:

ValueError –
- If handle_nan is not one of ['drop', 'error', 'warn']
- If handle_nan='error' and NaN values are found
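
The fit parameter implies a fit-once, reuse-later pattern: statistics and categories are learned on training data and then applied unchanged to new data. The sketch below illustrates that contract with the standard library only. It is not how transform_features is implemented: transform_rows and its state dict are hypothetical stand-ins for the encoder and scaler objects, and encoding an unseen category as all zeros is an assumption of this sketch.

```python
from statistics import fmean, pstdev
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
State = Dict[str, Any]  # learned categories and scaling statistics


def transform_rows(
    rows: List[Row], fit: bool = False, state: Optional[State] = None
) -> Tuple[List[Dict[str, float]], State]:
    """One-hot encode str columns and standardize numeric columns."""
    if fit:
        state = {"categories": {}, "stats": {}}
        for col in rows[0]:
            values = [row[col] for row in rows]
            if isinstance(values[0], str):
                state["categories"][col] = sorted(set(values))
            else:
                # Population standard deviation; fall back to 1.0 for
                # constant columns to avoid dividing by zero.
                state["stats"][col] = (fmean(values), pstdev(values) or 1.0)
    elif state is None:
        raise ValueError("state is required when fit=False")

    out: List[Dict[str, float]] = []
    for row in rows:
        new_row: Dict[str, float] = {}
        for col, (mu, sigma) in state["stats"].items():
            new_row[col] = (row[col] - mu) / sigma
        for col, cats in state["categories"].items():
            for cat in cats:
                new_row[f"{col}_{cat}"] = 1.0 if row[col] == cat else 0.0
        out.append(new_row)
    return out, state


train = [{"x": 1.0, "c": "a"}, {"x": 3.0, "c": "b"}]
test = [{"x": 5.0, "c": "a"}]

train_t, state = transform_rows(train, fit=True)          # learn from train
test_t, _ = transform_rows(test, fit=False, state=state)  # reuse on test
print(test_t)
```

Reusing the fitted objects keeps test features on the same scale and in the same column layout as the training features, which is why the function returns the encoder and scaler alongside the transformed data.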

This module provides functions for data loading and preprocessing:
- load_data: Load and prepare datasets for ordinal regression
- transform_features: Transform features for model training