Experimentation

The skfair.experimentation module provides the Experiment class for running automated dataset × method × classifier comparison experiments with cross-validation.


Python API

Basic usage

from skfair.experimentation import Experiment

exp = Experiment(
    datasets=["adult", "compas"],
    methods=["Massaging", "FairSmote", "ReweighingClassifier"],
    n_splits=5,
)
results = exp.run(verbose=True)
print(results)

run() returns a DataFrame with one row per (dataset, method, classifier) combination and one column per metric. Pass std=True to the constructor to include {metric}_std columns.
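
For example, with std=True each metric column gains a matching standard-deviation column (column selection illustrative):

from skfair.experimentation import Experiment

exp = Experiment(datasets=["adult"], methods=["Massaging"], std=True)
results = exp.run()
print(results[["accuracy", "accuracy_std"]])  # mean and std across CV folds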

Constructor parameters

| Parameter | Default | Description |
|---|---|---|
| datasets | ["adult"] | List of dataset names (str) from DATASET_REGISTRY, or dicts for custom datasets (see below) |
| methods | All registered | List of method names from METHOD_REGISTRY |
| classifiers | LogisticRegression | Dict {"name": estimator} or list of dotted paths |
| metrics | All registered | List of metric keys from METRIC_REGISTRY |
| n_splits | 5 | Number of CV folds (1 = single train/test split) |
| random_state | 42 | Random seed |
| dataset_config | None | Per-dataset overrides, e.g. {"adult": {"sens_attr": "race"}} |
| method_config | None | Per-method parameter overrides |
| audit_bias | False | Create a BiasAuditor per dataset |
| audit_fairness | False | Store out-of-fold predictions for FairnessAuditor |
| save_results_csv | False | Write results CSV after run() |
| save_object_pkl | False | Pickle full Experiment after run() |
| save_report_html | False | Generate HTML report after run() |
| save_path | "experiment" | Base path for saved files |
| save_models | None | Dict to save fitted models (see below) |
| std | False | Include {metric}_std columns in results |
| config | None | Path to YAML config (overrides all other arguments) |
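
A sketch combining several of these options; the parameter values are illustrative, and LogisticRegression / RandomForestClassifier stand in for any scikit-learn estimators:

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

from skfair.experimentation import Experiment

exp = Experiment(
    datasets=["adult", "german"],
    methods=["Baseline", "Massaging"],
    # Dict form: display name -> estimator instance
    classifiers={
        "LogReg": LogisticRegression(max_iter=1000),
        "RandomForest": RandomForestClassifier(random_state=0),
    },
    metrics=["accuracy", "spd", "disparate_impact"],
    n_splits=10,
    random_state=0,
    # Override the default sensitive attribute for one dataset
    dataset_config={"adult": {"sens_attr": "race"}},
)
results = exp.run()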

Custom datasets

You can pass user-provided datasets alongside (or instead of) registry names. Each custom entry is a dict with keys name, data, sens_attr, and optionally priv_group (default 1):

import pandas as pd
from skfair.experimentation import Experiment

# Your own data
X = pd.DataFrame({"feat1": [1, 2, 3, 4], "feat2": [5, 6, 7, 8], "group": [0, 1, 0, 1]})
y = [0, 1, 0, 1]

exp = Experiment(
    datasets=[
        "ricci",                                          # registry dataset
        {"name": "my_data", "data": (X, y),               # custom dataset
         "sens_attr": "group", "priv_group": 1},
    ],
    methods=["Baseline"],
    n_splits=2,
)
results = exp.run()

Use exp.dataset_names to get a clean list of names (without the internal dict details).
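
For the experiment above, this prints the registry name alongside the custom one (output illustrative):

print(exp.dataset_names)  # e.g. ['ricci', 'my_data']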


YAML configuration

Experiments can also be defined via YAML configuration files:

exp = Experiment.from_config("config.yaml")
results = exp.run()

Or pass the YAML path directly to the constructor:

exp = Experiment(config="config.yaml")
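
A minimal config.yaml, using only keys from the schema reference below, might look like:

datasets:
  - name: adult
methods:
  - Baseline
  - Massaging
cv:
  n_splits: 5
  random_state: 42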

YAML save section

The save: block controls automatic output after run() completes:

save:
  results_csv: true     # write results DataFrame to {path}.csv
  object_pkl: true      # pickle full Experiment to {path}.pkl
  report_html: true     # generate HTML report to {path}.html
  path: outputs/my_experiment   # base path (without extension)

These map 1:1 to the Python constructor flags save_results_csv, save_object_pkl, save_report_html, and save_path.
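
For example, the save: block above corresponds to this constructor call (the datasets argument is just a placeholder):

exp = Experiment(
    datasets=["adult"],
    save_results_csv=True,
    save_object_pkl=True,
    save_report_html=True,
    save_path="outputs/my_experiment",
)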

YAML save_models section

The save_models: block (separate from save:) controls model persistence:

save_models:
  full_data_retrain: true   # retrain on full data before saving (default)
  models: all               # save all combinations

# or save specific combinations:
save_models:
  full_data_retrain: true
  models:
    - method: FairSmote
      classifier: LogReg
    - method: Massaging
      classifier: LogReg

Models are saved as .pkl files in a {save_path}_models/ directory.
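
Assuming the YAML structure carries over directly to the Python save_models dict, the specific-combination form would look roughly like this sketch:

exp = Experiment(
    datasets=["adult"],
    methods=["FairSmote", "Massaging"],
    save_models={
        "full_data_retrain": True,
        # Structure assumed to mirror the YAML list above
        "models": [
            {"method": "FairSmote", "classifier": "LogReg"},
            {"method": "Massaging", "classifier": "LogReg"},
        ],
    },
)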

YAML schema reference

| Top-level key | Sub-keys | Notes |
|---|---|---|
| datasets | name, sens_attr, priv_group | List of dataset entries; name is required |
| methods | — | List of method name strings |
| classifiers | path, name, plus any constructor kwargs | path is a dotted import path (e.g. sklearn.linear_model.LogisticRegression) |
| cv | n_splits, random_state | Cross-validation settings |
| audit | bias, fairness | Boolean flags for auditing |
| metrics | — | List of metric keys (omit to use all registered) |
| save | results_csv, object_pkl, report_html, path | Auto-save options |
| save_models | full_data_retrain, models | Model persistence options |

End-to-end: YAML to HTML report

A complete pipeline in three steps:

1. Define — write a YAML config with save.report_html: true:

datasets:
  - name: adult
  - name: compas

methods:
  - Baseline
  - FairSmote
  - Massaging

classifiers:
  - path: sklearn.linear_model.LogisticRegression
    name: LogReg
    solver: liblinear
    max_iter: 1000

cv:
  n_splits: 5

save:
  results_csv: true
  object_pkl: true
  report_html: true
  path: outputs/my_experiment

2. Run — two lines of Python:

from skfair.experimentation import Experiment

exp = Experiment.from_config("config.yaml")
results = exp.run()

3. Result — three files are auto-generated:

  • outputs/my_experiment.csv — results DataFrame
  • outputs/my_experiment.pkl — full Experiment object
  • outputs/my_experiment.html — interactive HTML report (see Comparison — HTML report for report contents)

Post-run analysis

ComparisonReport

Convert results to a visual comparison report (see Comparison):

report = exp.to_report()
report.plot_metric_bar(metric="accuracy")
report.plot_tradeoff(fairness_metric="spd", performance_metric="accuracy")

FairnessAuditor

Get a FairnessAuditor for a specific (dataset, method, classifier) combination (requires audit_fairness=True):

exp = Experiment(
    datasets=["adult"],
    methods=["Massaging"],
    audit_fairness=True,
)
exp.run()

# Default: fairness metrics are averaged across CV folds
fa = exp.get_fairness_auditor("adult", "Massaging", "LogisticRegression")
print(fa.fairness_metrics())
fa.plot_fairness_radar()

# aggregate=True: compute metrics on all concatenated out-of-fold predictions
fa_agg = exp.get_fairness_auditor("adult", "Massaging", "LogisticRegression",
                                   aggregate=True)
print(fa_agg.fairness_metrics())

Save and load

Saving

# Via constructor flags
exp = Experiment(
    datasets=["adult"],
    save_results_csv=True,   # saves {save_path}.csv
    save_object_pkl=True,    # saves {save_path}.pkl
    save_path="my_experiment",
)
exp.run()

# Or manually after run
exp.save(path="my_experiment", results_csv=True, object_pkl=True)

Saving fitted models

# Save all models, retrained on full data
exp = Experiment(
    datasets=["adult"],
    methods=["FairSmote", "Massaging"],
    save_models={"models": "all", "full_data_retrain": True},
    save_path="my_experiment",
)
exp.run()

# Access in-memory
pipe = exp.models_[("Adult", "FairSmote", "LogReg")]
pipe.predict(X_new)

# Load from disk
import joblib
pipe = joblib.load("my_experiment_models/Adult_FairSmote_LogReg.pkl")

Loading

exp = Experiment.load("my_experiment.pkl")
print(exp.results_)

Registries

Three registries define what is available by name in experiments:

DATASET_REGISTRY

from skfair.experimentation import DATASET_REGISTRY
print(list(DATASET_REGISTRY.keys()))
# ['adult', 'compas', 'german', 'heart_disease', 'ricci']

Each entry specifies the loader function, default sens_attr, and priv_group.
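
You can inspect an entry directly; its exact structure is library-internal, so this simply prints whatever is registered for a dataset:

print(DATASET_REGISTRY["adult"])  # loader, default sens_attr, priv_group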

METHOD_REGISTRY

from skfair.experimentation import METHOD_REGISTRY
print(list(METHOD_REGISTRY.keys()))
# ['Baseline', 'Massaging', 'FairSmote', 'FairOversampling', 'FAWOS',
#  'HeterogeneousFOS', 'FairwayRemover', 'DisparateImpactRemover',
#  'LearningFairRepresentations', 'ReweighingClassifier',
#  'FairBalanceClassifier', 'FairMask']

METRIC_REGISTRY

from skfair.experimentation import METRIC_REGISTRY
print(list(METRIC_REGISTRY.keys()))
# ['accuracy', 'balanced_accuracy', 'disparate_impact', 'spd', 'eod', 'aod']

Each entry maps a short key to a metric function and its type (performance or fairness).
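
Since the metrics constructor argument accepts any subset of these keys, you can restrict an experiment to, say, one performance and one fairness metric:

from skfair.experimentation import Experiment

exp = Experiment(
    datasets=["adult"],
    methods=["Baseline"],
    metrics=["accuracy", "spd"],
)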