Comparison

The skfair.comparison module provides ComparisonReport, a visualization tool for comparing multiple fairness preprocessing methods across datasets and classifiers.


Creating a report

ComparisonReport takes a results DataFrame with columns dataset, method, classifier, plus one column per metric (e.g. accuracy, spd). Optional {metric}_std columns are preserved but not used in plots.
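
A minimal hand-built DataFrame in this layout looks like the following (the numbers are illustrative placeholders, not real results):

import pandas as pd

# One row per (dataset, method, classifier) combination,
# one column per metric; the values below are placeholders.
results = pd.DataFrame(
    {
        "dataset":    ["adult", "adult", "compas", "compas"],
        "method":     ["Massaging", "FairSmote", "Massaging", "FairSmote"],
        "classifier": ["LogReg", "LogReg", "LogReg", "LogReg"],
        "accuracy":   [0.84, 0.83, 0.67, 0.66],
        "spd":        [-0.05, -0.03, -0.09, -0.07],
    }
)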

The easiest way to get this DataFrame is from an Experiment (see Experimentation):

from skfair.experimentation import Experiment
from skfair.comparison import ComparisonReport

exp = Experiment(
    datasets=["adult", "compas"],
    methods=["Massaging", "FairSmote", "ReweighingClassifier"],
    n_splits=5,
)
results = exp.run()

report = ComparisonReport(results)
# or equivalently:
report = exp.to_report()

On construction, ComparisonReport auto-detects which columns are performance metrics and which are fairness metrics.

You can also filter the report at construction time:

report = ComparisonReport(
    results,
    datasets=["adult"],
    methods=["Massaging", "FairSmote"],
    classifiers=["LogReg"],
)

Summary tables

tables = report.summary_tables()

Returns a dictionary of DataFrames — one per metric — with method means averaged over classifiers, pivoted by dataset.
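
Since each value is a plain pandas DataFrame keyed by metric name (assuming the keys match the metric columns, e.g. "accuracy" and "spd"), individual tables can be inspected or exported with the usual pandas tooling:

print(tables["accuracy"])                    # methods as rows, datasets as columns
tables["spd"].to_csv("spd_by_dataset.csv")   # export a single metric table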

The classifier parameter controls how classifiers are aggregated:

  • None or "average" (default) — average over all classifiers.
  • "best" — keep only the best-performing classifier per method (by accuracy).
  • A specific name (e.g. "LogReg") — filter to that classifier only.

# Only show the best classifier per method
tables = report.summary_tables(classifier="best")

# Filter to a specific classifier
tables = report.summary_tables(classifier="LogReg")

Plot methods

All plot methods return (fig, axes) tuples.
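
Because the underlying matplotlib objects are returned, any plot can be customised or saved with standard matplotlib calls; for example:

import matplotlib.pyplot as plt

fig, axes = report.plot_metric_bar(metric="accuracy")
fig.savefig("accuracy_bar.png", dpi=150, bbox_inches="tight")
plt.close(fig)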

Metric bar chart

report.plot_metric_bar(metric="accuracy")   # performance
report.plot_metric_bar(metric="spd")        # fairness (auto reference line)

Grouped bar chart for any single metric across datasets. For fairness metrics, a reference line is added automatically.

Fairness–performance tradeoff

report.plot_tradeoff(fairness_metric="spd", performance_metric="accuracy")

Scatter plot of |fairness metric| vs. performance metric for each method, faceted by dataset. Helps identify methods that strike a good balance between fairness and predictive performance.

Method ranking

report.plot_ranking()

Heatmap of method rankings per dataset across all metrics. Lower rank (closer to 1) is better.

The classifier parameter works the same way as in summary_tables():

# Rank using the best classifier per method
report.plot_ranking(classifier="best")

# Rank using a specific classifier
report.plot_ranking(classifier="LogReg")

All plots at once

report.plot_all(fairness_metric="spd")

Runs all plot methods (one per metric, plus tradeoff and ranking) and returns a list of (fig, axes) tuples.
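
A small sketch for writing every generated chart to disk (the file names here are purely illustrative; the order is whatever plot_all produces):

figures = report.plot_all(fairness_metric="spd")
for i, (fig, axes) in enumerate(figures):
    fig.savefig(f"comparison_{i:02d}.png", dpi=150, bbox_inches="tight")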


HTML report

report.to_html("report.html")

Generates a self-contained interactive HTML file with embedded matplotlib charts (base64-encoded), tab navigation, filtering checkboxes, and a PDF export button.
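
Since the file is fully self-contained, one way to view it right away is to open it from Python with the standard library (nothing here is specific to skfair):

import webbrowser
from pathlib import Path

report.to_html("report.html")
webbrowser.open(Path("report.html").resolve().as_uri())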

Report tabs

The HTML report is organized into five tabs:

Tab           Contents
Tables        Summary tables for every metric, with methods as rows and datasets as columns
Performance   Bar charts for each performance metric, grouped by dataset and method
Fairness      Bar charts for each fairness metric, with automatic reference lines at zero
Rankings      Heatmap of method rankings across datasets and metrics (lower rank = better)
Tradeoff      Scatter plot of fairness vs. performance, faceted by dataset

Interactive controls

Each tab includes filtering checkboxes for datasets, classifiers, and metrics. Use the Select All / Deselect All buttons to quickly toggle visibility. Filtering is applied client-side — no re-generation needed.

PDF export

Click the Export PDF button in the top-right corner to produce a print-friendly version. Collapsed sections stay collapsed in the PDF output.

to_html() parameters

Parameter            Type        Default      Description
path                 str         –            Output file path (e.g. "report.html")
datasets             list[str]   None         Filter to specific datasets
methods              list[str]   None         Filter to specific methods
classifiers          list[str]   None         Filter to specific classifiers
metrics              list[str]   None         Metrics to include (defaults to all detected)
fairness_metric      str         "spd"        Fairness metric for tradeoff and detail charts
performance_metric   str         "accuracy"   Performance metric for tradeoff chart
classifier           str         None         Classifier aggregation: None/"average", "best", or a specific name

# Example combining several parameters
report.to_html(
    "report.html",
    datasets=["adult"],
    methods=["Massaging", "FairSmote"],
    fairness_metric="spd",
    performance_metric="accuracy",
    classifier="best",
)

Future benchmark

The comparison module will be expanded as the authors design their own comprehensive fairness preprocessing benchmark. Future versions will include additional analysis tools and visualization options.