Quick Start¶
This guide walks through a minimal end-to-end example: load a dataset, apply a fairness preprocessor, train a classifier, and evaluate both performance and fairness.
Load data¶
from skfair.datasets import load_adult
X, y = load_adult(return_X_y=True, as_frame=True)
print(X.shape) # (48842, 14)
print(X["sex"].value_counts())
The Adult census dataset contains a binary sex attribute (1 = male / privileged, 0 = female / unprivileged) and a binary income label.
Baseline: no preprocessing¶
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from skfair.metrics import disparate_impact, statistical_parity_difference, accuracy
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
sens = X_test["sex"].values
print(f"Accuracy: {accuracy(y_test.values, y_pred):.3f}")
print(f"DI: {disparate_impact(y_test.values, y_pred, sens):.3f}") # ideally 1.0
print(f"SPD: {statistical_parity_difference(y_test.values, y_pred, sens):.3f}") # ideally 0.0
Apply Massaging¶
Massaging is a label-modification technique that promotes unprivileged positive candidates and demotes privileged negative ones until the discrimination is minimised.
from skfair.preprocessing import Massaging
sampler = Massaging(sens_attr="sex", priv_group=1)
X_fair, y_fair = sampler.fit_resample(X_train, y_train)
clf_fair = LogisticRegression(max_iter=1000)
clf_fair.fit(X_fair, y_fair)
y_pred_fair = clf_fair.predict(X_test)
print(f"Accuracy: {accuracy(y_test.values, y_pred_fair):.3f}")
print(f"DI: {disparate_impact(y_test.values, y_pred_fair, sens):.3f}")
print(f"SPD: {statistical_parity_difference(y_test.values, y_pred_fair, sens):.3f}")
Use Reweighing in a Pipeline¶
Reweighing does not change samples — it returns per-sample weights. Use the ReweighingClassifier wrapper to fit it inside any sklearn-compatible workflow:
from skfair.preprocessing import ReweighingClassifier
from sklearn.linear_model import LogisticRegression
clf = ReweighingClassifier(
estimator=LogisticRegression(max_iter=1000),
sens_attr="sex",
priv_group=1,
)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
Next steps¶
- Preprocessing guide — all algorithms explained
- Metrics guide — fairness and performance metrics
- Audit guide — data-level and prediction-level fairness analysis
- Comparison guide — compare multiple preprocessing methods
- Experimentation guide — automated experiments with cross-validation
- API Reference