skfair.datasets

from skfair.datasets import load_adult, load_german, load_heart_disease, load_compas, load_ricci

skfair.datasets.load_adult(*, return_X_y=True, as_frame=True, preprocessed=True, target_column='Probability')

Load the Adult dataset from the installed package data.

Parameters:
  • return_X_y (bool, default: True) –

    If True, returns (X, y). If False, returns a Bunch object.

  • as_frame (bool, default: True) –

    If True, returns pandas objects (DataFrame / Series). If False, returns NumPy arrays.

  • preprocessed (bool, default: True) –

    If True, one-hot encodes categorical columns and standardizes numerical columns before returning. The sensitive column is encoded in place and kept in X.

  • target_column (str, default: "Probability") –

    Name of the target column in the CSV.

Returns:
  • data (Bunch or (X, y)) –

    If return_X_y is True, returns (X, y) where X is a DataFrame (or ndarray when as_frame=False) that includes the sensitive column.

    If return_X_y is False, returns a Bunch with fields:

    • data : features including sensitive column (DataFrame or ndarray)
    • target : target (Series or ndarray)
    • frame : full DataFrame with features and target
    • feature_names : list of feature column names
    • DESCR : short description string
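
The two return modes can be compared with a small helper. This is a minimal sketch, not part of skfair.datasets: `summarize_loader` is a hypothetical name, and it assumes only the documented keyword arguments and the Bunch fields listed above.

```python
def summarize_loader(loader):
    """Summarize a dataset through both documented return modes.

    `loader` is any callable with the documented keyword-only signature,
    for example skfair.datasets.load_adult. This helper is hypothetical.
    """
    # Tuple mode: features (sensitive column included) and target.
    X, y = loader(return_X_y=True)

    # Bunch mode: the same data plus metadata fields.
    bunch = loader(return_X_y=False)

    return {
        "n_samples": len(X),
        "n_targets": len(y),
        "n_features": len(bunch.feature_names),
        "description": bunch.DESCR,
    }
```

With the package installed, `summarize_loader(load_adult)` should return the row count, the number of feature columns, and the DESCR string, provided the loader behaves as documented.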

skfair.datasets.load_german(*, return_X_y=True, as_frame=True, preprocessed=True, target_column='credit')

Load the German Credit dataset from the installed package data.

Parameters:
  • return_X_y (bool, default: True) –

    If True, returns (X, y). If False, returns a Bunch object.

  • as_frame (bool, default: True) –

    If True, returns pandas objects (DataFrame / Series). If False, returns NumPy arrays.

  • preprocessed (bool, default: True) –

    If True, one-hot encodes categorical columns and standardizes numerical columns before returning. The sensitive column is encoded in place (male=1, female=0) and kept in X.

  • target_column (str, default: "credit") –

    Name of the target column.

Returns:
  • data (Bunch or (X, y)) –

    If return_X_y is True, returns (X, y) where X is a DataFrame (or ndarray when as_frame=False) that includes the sensitive column.

    If return_X_y is False, returns a Bunch with fields:

    • data : features including sensitive column (DataFrame or ndarray)
    • target : target (Series or ndarray)
    • frame : full DataFrame with features and target
    • feature_names : list of feature column names
    • DESCR : short description string
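
Because the sensitive column stays in X as a 0/1 code (male=1, female=0), per-group statistics can be computed directly from the returned objects. The sketch below is illustrative only: `positive_rate_by_group` is a hypothetical helper, and it assumes the target uses 1 for the positive class, which this page does not state.

```python
from collections import defaultdict


def positive_rate_by_group(sensitive_values, y):
    """Return {group_value: share of samples whose target equals 1}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    # Works on lists, NumPy arrays, or pandas Series of equal length.
    for group, label in zip(sensitive_values, y):
        totals[group] += 1
        positives[group] += int(label == 1)  # assumption: 1 marks the positive class
    return {group: positives[group] / totals[group] for group in totals}
```

Given `X, y = load_german()`, a call such as `positive_rate_by_group(X["sex"], y)` would compare the two groups; the column name "sex" is an assumption and should be checked against `feature_names`.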

skfair.datasets.load_heart_disease(*, return_X_y=True, as_frame=True, preprocessed=True, target_column='heart_disease')

Load the Heart Disease dataset (Statlog) from the installed package data.

Parameters:
  • return_X_y (bool, default: True) –

    If True, returns (X, y). If False, returns a Bunch object.

  • as_frame (bool, default: True) –

    If True, returns pandas objects (DataFrame / Series). If False, returns NumPy arrays.

  • preprocessed (bool, default: True) –

    If True, one-hot encodes categorical columns and standardizes numerical columns before returning. The sensitive column is encoded in place and kept in X.

  • target_column (str, default: "heart_disease") –

    Name of the target column.

Returns:
  • data (Bunch or (X, y)) –

    If return_X_y is True, returns (X, y) where X is a DataFrame (or ndarray when as_frame=False) that includes the sensitive column.

    If return_X_y is False, returns a Bunch with fields:

    • data : features including sensitive column (DataFrame or ndarray)
    • target : target (Series or ndarray)
    • frame : full DataFrame with features and target
    • feature_names : list of feature column names
    • DESCR : short description string
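
To make the preprocessed=True description concrete, the following standard-library sketch shows one-hot encoding and standardization on plain Python lists. It is not the package's implementation: the helper names are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def one_hot(values):
    """One-hot encode categorical values into {category: [0 or 1, ...]}."""
    categories = sorted(set(values))
    return {cat: [int(v == cat) for v in values] for cat in categories}


def standardize(values):
    """Rescale numeric values to zero mean and unit population variance."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population (not sample) deviation
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Each categorical column becomes one 0/1 column per category, while each numerical column keeps its shape and is only rescaled.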

skfair.datasets.load_compas(*, return_X_y=True, as_frame=True, preprocessed=True, target_column='two_year_recid')

Load the COMPAS recidivism dataset from the installed package data.

Parameters:
  • return_X_y (bool, default: True) –

    If True, returns (X, y). If False, returns a Bunch object.

  • as_frame (bool, default: True) –

    If True, returns pandas objects (DataFrame / Series). If False, returns NumPy arrays.

  • preprocessed (bool, default: True) –

    If True, one-hot encodes categorical columns and standardizes numerical columns before returning. The sensitive columns are encoded in place and kept in X.

  • target_column (str, default: "two_year_recid") –

    Name of the target column in the CSV.

Returns:
  • data (Bunch or (X, y)) –

    If return_X_y is True, returns (X, y) where X is a DataFrame (or ndarray when as_frame=False) that includes the sensitive columns.

    If return_X_y is False, returns a Bunch with fields:

    • data : features including sensitive columns (DataFrame or ndarray)
    • target : target (Series or ndarray)
    • frame : full DataFrame with features and target
    • feature_names : list of feature column names
    • DESCR : short description string
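
Since this loader keeps more than one sensitive column in X, it can be useful to count samples per combination of sensitive values. The helper below is a hypothetical sketch that accepts any equal-length sequences and does not depend on the package.

```python
from collections import Counter


def intersectional_counts(*sensitive_columns):
    """Count samples per combination of sensitive-attribute values.

    Each argument is one column (list, array, or Series) of equal length.
    """
    # zip(*columns) yields one tuple of sensitive values per sample.
    return Counter(zip(*sensitive_columns))
```

A call might look like `intersectional_counts(X["sex"], X["race"])`; both column names are assumptions, as this page does not list the sensitive columns.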

skfair.datasets.load_ricci(*, return_X_y=True, as_frame=True, preprocessed=True, target_column='Combine')

Load the Ricci v. DeStefano firefighter promotions dataset.

Parameters:
  • return_X_y (bool, default: True) –

    If True, returns (X, y). If False, returns a Bunch object.

  • as_frame (bool, default: True) –

    If True, returns pandas objects (DataFrame / Series). If False, returns NumPy arrays.

  • preprocessed (bool, default: True) –

    If True, encodes categorical columns as numeric: Race (W=1, otherwise 0) and Position (Captain=1, Lieutenant=0).

  • target_column (str, default: "Combine") –

    Name of the target column in the CSV.

Returns:
  • data (Bunch or (X, y)) –

    If return_X_y is True, returns (X, y) where X is a DataFrame (or ndarray when as_frame=False) that includes the sensitive columns.

    If return_X_y is False, returns a Bunch with fields:

    • data : features including sensitive columns (DataFrame or ndarray)
    • target : target (Series or ndarray)
    • frame : full DataFrame with features and target
    • feature_names : list of feature column names
    • DESCR : short description string
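
The encoding described for preprocessed=True can be written out as a small sketch. `encode_ricci_row` and `POSITION_CODES` are hypothetical names, and raising on an unexpected Position value is a design choice of this example rather than documented behavior.

```python
# Documented mapping for Position: Captain=1, Lieutenant=0.
POSITION_CODES = {"Captain": 1, "Lieutenant": 0}


def encode_ricci_row(race, position):
    """Encode one (Race, Position) pair with the documented mapping."""
    # Race: W=1, any other value=0.
    race_code = 1 if race == "W" else 0
    # Position: raises KeyError on a rank outside the documented two.
    position_code = POSITION_CODES[position]
    return race_code, position_code
```

For example, `encode_ricci_row("W", "Captain")` yields `(1, 1)`, and any non-"W" race label paired with "Lieutenant" yields `(0, 0)`.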