How (Not) To Evaluate Fraud Models

It's a situation we've run into several times: we discover a great feature that substantially improves one of our fraud models. A model that includes this feature flags more fraud, with much higher precision, at a threshold we care about, which translates into a real reduction in dollars lost. And yet the same change may correspond to improvements in AUC or KS that, while statistically significant, appear relatively minor.
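To make the mismatch concrete, here is a minimal sketch on invented toy data (not our production data or models). It assumes two hypothetical models that score 20 fraud and 200 legitimate transactions identically, except that the second model pushes three high-scoring legitimate transactions down and one fraud case up. The helper names `expand`, `auc`, `ks`, and `precision_recall_at`, the score values, and the 0.8 threshold are all illustrative assumptions.

```python
# Toy illustration only: the groups below are invented, not real fraud data.

def expand(groups):
    """Turn (score, n_fraud, n_legit) groups into parallel label/score lists."""
    labels, scores = [], []
    for score, n_fraud, n_legit in groups:
        labels += [1] * n_fraud + [0] * n_legit
        scores += [score] * (n_fraud + n_legit)
    return labels, scores

def split(labels, scores):
    """Separate scores into fraud (label 1) and legitimate (label 0)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return pos, neg

def auc(labels, scores):
    """Chance that a random fraud case outscores a random legitimate one (ties count half)."""
    pos, neg = split(labels, scores)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks(labels, scores):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos, neg = split(labels, scores)
    gaps = []
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        gaps.append(abs(cdf_neg - cdf_pos))
    return max(gaps)

def precision_recall_at(labels, scores, threshold):
    """Precision and recall when everything scoring >= threshold is flagged."""
    flagged = [y for y, s in zip(labels, scores) if s >= threshold]
    true_pos = sum(flagged)
    precision = true_pos / len(flagged) if flagged else 0.0
    return precision, true_pos / sum(labels)

# Model A: three legitimate transactions sit in the high-score head.
labels_a, scores_a = expand([(0.9, 2, 3), (0.6, 1, 0), (0.5, 12, 40), (0.1, 5, 157)])
# Model B: the hypothetical new feature cleans up the head; the bulk is unchanged.
labels_b, scores_b = expand([(0.9, 3, 0), (0.5, 12, 43), (0.1, 5, 157)])

THRESHOLD = 0.8  # assumed operating point
for name, labels, scores in [("A", labels_a, scores_a), ("B", labels_b, scores_b)]:
    precision, recall = precision_recall_at(labels, scores, THRESHOLD)
    print(f"model {name}: AUC={auc(labels, scores):.3f} KS={ks(labels, scores):.3f} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, precision at the 0.8 threshold rises from 0.40 to 1.00 and recall from 0.10 to 0.15, while AUC moves only from roughly 0.778 to 0.784 and KS does not move at all. The rank-based metrics average over the entire score range, so a change confined to the small, high-scoring head barely registers in them.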

