
Fraud Model Feedback

Prerequisites

Before optimizing feedback loops, you should understand threshold setting and cost trade-offs (see Risk Scoring) and rule-based detection (see Building Fraud Rules).

TL;DR
  • Your vendor's ML model improves when you give it outcomes: chargebacks, fraud reports, and false positive flags
  • Chargebacks arrive 30-120 days late. Send fraud reports as soon as you confirm fraud to give the model faster signals
  • Score drift is real. The same score means different things over time. Check your score distribution weekly
  • Monitor four things: score distribution (weekly), block rate trend (weekly), fraud-in-approved rate (monthly), false positive sample (monthly)
  • At minimum, verify that chargebacks flow to your fraud vendor. This is the baseline. Everything else is optimization

Your fraud vendor's ML model is only as good as the data it learns from. Every fraud score you see is based on patterns the model learned from past transactions and their outcomes. If you don't feed outcomes back, the model stagnates.

This page covers how the feedback loop works, what breaks it, and what to monitor to keep your model accurate. For threshold setting and cost calculations, see Risk Scoring. For rule-based detection, see Building Fraud Rules.


What Your Fraud Score Actually Means

Every fraud vendor gives you a number. The number represents the model's estimate that a transaction is fraudulent.

| Vendor | Score Range | Higher Means |
| --- | --- | --- |
| Stripe Radar | 0-100 | Higher risk |
| Sift | 0-100 | Higher risk |
| Forter | Confidence % | Higher confidence in approve/decline decision |
| Signifyd | 0-1000 | Higher confidence the order is good |
| Adyen | 0-100+ | Higher risk |

The same score means different things at different businesses. A Stripe Radar score of 65 on a $20 digital download is a different bet than a score of 65 on a $2,000 electronics order. Your thresholds should reflect your margins, fraud rate, and tolerance for false positives.
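
To make that trade-off concrete, here is a minimal sketch of the expected-cost comparison. The mapping from score to fraud probability and the margin rates are hypothetical; calibrate both from your own outcome data.

```python
# Compare the expected cost of approving vs. declining one order.
# The score -> probability mapping and margin rates are hypothetical.
def expected_loss(p_fraud: float, order_value: float, margin_rate: float) -> dict:
    approve_cost = p_fraud * order_value  # fraud loss if you approve a bad order
    decline_cost = (1 - p_fraud) * order_value * margin_rate  # margin lost blocking a good one
    return {"approve": approve_cost, "decline": decline_cost}

# Suppose a score of 65 maps to roughly a 5% fraud probability (assumed):
print(expected_loss(0.05, 20, 0.30))    # $20 download:  approve ~$1,   decline ~$5.70
print(expected_loss(0.05, 2000, 0.10))  # $2,000 order:  approve ~$100, decline ~$190
```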

For setting thresholds and calculating the cost trade-off, see Risk Scoring: Finding Your Thresholds.


How the Model Learns: The Feedback Loop

Your vendor's ML model improves when you tell it what actually happened after a transaction was scored. This is the feedback loop.

The Three Types of Feedback

1. Chargebacks (automatic in most setups)

When a customer disputes a charge with their bank, the chargeback flows from the card network through your processor. If your fraud vendor is your processor (Stripe, Adyen), this happens automatically. If you use a third-party vendor (Sift, Forter), verify the integration is passing dispute data.

  • Signal strength: High. A chargeback is a confirmed bad outcome.
  • Delay: 30-120 days after the transaction. This is the core problem (see below).

2. Fraud reports (manual or semi-automated)

You flag a transaction as confirmed fraud before a chargeback arrives. Maybe a customer contacted you ("I didn't make this purchase"), or your review team identified a fraud pattern.

  • Signal strength: High, and much faster than waiting for a chargeback.
  • Delay: As fast as you flag it. Days instead of months.

3. False positive flags (manual)

You tell the model that a blocked or declined transaction was actually legitimate. This is the signal most merchants forget to send.

  • Signal strength: Medium. Helps the model learn what "good" looks like, not just what "bad" looks like.
  • Delay: Depends on when you identify the false positive (e.g., customer contacts support about a block).

How Feedback Reaches Your Vendor

| Vendor | Chargebacks | Fraud Reports | False Positive Flags |
| --- | --- | --- | --- |
| Stripe | Automatic from disputes | Via Dashboard or API (Report as fraudulent) | Mark review items as legitimate |
| Sift | Requires $chargeback event via API | Requires $label event via API | Requires $label with $is_bad: false |
| Forter | Requires feedback API call | Requires feedback API call | Requires feedback API call |
| Signifyd | Automatic from guarantee claims | Via case management | Via case management |
| Adyen | Automatic from disputes | Via dispute management or API | Via risk rule feedback |
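
If you integrate feedback yourself, the call is typically a small JSON POST. Below is a hedged sketch of Sift's $chargeback event from the table above; the v205 endpoint and field names follow Sift's reserved-event conventions as best we know them, so confirm against Sift's current API reference before relying on this.

```python
# Sketch: report a chargeback to Sift's Events API so the model learns the outcome.
# Endpoint version and field names are assumptions; check Sift's API docs.
import requests

SIFT_API_KEY = "YOUR_SIFT_REST_API_KEY"  # placeholder

def report_chargeback(user_id: str, order_id: str, transaction_id: str) -> None:
    payload = {
        "$type": "$chargeback",
        "$api_key": SIFT_API_KEY,
        "$user_id": user_id,
        "$order_id": order_id,
        "$transaction_id": transaction_id,
        "$chargeback_reason": "$fraud",
    }
    resp = requests.post("https://api.sift.com/v205/events", json=payload, timeout=10)
    resp.raise_for_status()  # surface integration failures instead of silently dropping feedback
```
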
At Minimum, Verify Chargebacks Flow

Make sure chargebacks reach your fraud vendor. This happens automatically with most processor-integrated tools (Stripe Radar, Adyen RevenueProtect). If you use a third-party tool like Sift or Forter, verify the integration is sending chargeback events. Check your vendor's documentation for integration status.

This is the baseline. Everything else on this page is optimization on top of it.


The Feedback Delay Problem

Chargebacks arrive 30-120 days after the transaction. This means the model is always learning from stale data.

Timeline:
Day 0: Transaction scored (model uses what it knows NOW)
Day 1-30: Fraud happens, goes undetected
Day 30: Customer notices unauthorized charge
Day 45: Customer files dispute with bank
Day 60: Chargeback reaches your processor
Day 75: Chargeback data feeds back to model
Day 90+: Model updates (next training cycle)

The model is learning from data that's 2-3 months old.

What This Means

The model is always one fraud cycle behind. If a new fraud pattern emerges in January, the model won't learn about it from chargebacks until March or April. Meanwhile, you're exposed.

What You Can Do

Send fraud reports immediately. Don't wait for the chargeback. When you confirm fraud through any channel, report it to your vendor right away.

| Signal | Speed | How to Send |
| --- | --- | --- |
| Chargeback | 30-120 days | Automatic (verify integration) |
| Fraud report | 1-7 days | Manual via dashboard or API |
| Customer complaint ("I didn't buy this") | 1-3 days | Flag in review tool, report to vendor |
| Refund-before-chargeback | 1-14 days | Some vendors auto-learn from refund reason codes |
| Manual review decline | Same day | Review queue decision feeds back |

The faster you send outcomes, the faster the model adapts. A business that sends fraud reports within 48 hours gives its model a 2-3 month head start over one that waits for chargebacks.
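
One way to operationalize this is a small daily job that reports everything your team confirmed since the last run. The sketch below assumes hypothetical `find_confirmed_fraud`, `report_fraud`, and `mark_reported` helpers standing in for your own data store and your vendor's feedback API.

```python
# Daily job: push confirmed fraud to the vendor within ~48 hours
# instead of waiting 30-120 days for the chargeback.
from datetime import datetime, timedelta, timezone

def send_daily_fraud_reports(db, vendor_client) -> int:
    cutoff = datetime.now(timezone.utc) - timedelta(hours=48)
    # Transactions confirmed as fraud recently but not yet reported.
    unreported = db.find_confirmed_fraud(confirmed_after=cutoff, reported=False)
    for txn in unreported:
        vendor_client.report_fraud(transaction_id=txn.id)
        db.mark_reported(txn.id)
    return len(unreported)
```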

Refund Reason Codes

Some processors (Stripe, Adyen) can learn from refund reason codes. If you refund a transaction with reason "fraudulent," the processor may treat this as a fraud signal. Check your processor's documentation to see if this applies.
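
With the official Stripe Python library, for example, refunding with reason="fraudulent" looks like the sketch below. Whether Radar treats it as a fraud signal is the processor-specific behavior described above, so verify in Stripe's documentation.

```python
# Refund a confirmed-fraud charge with the "fraudulent" reason code.
# The charge ID and API key are placeholders.
import stripe

stripe.api_key = "sk_live_..."  # placeholder

stripe.Refund.create(
    charge="ch_123_placeholder",
    reason="fraudulent",  # valid reasons: duplicate, fraudulent, requested_by_customer
)
```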


When Models Go Wrong

ML models are not set-and-forget. They degrade over time if you don't monitor them.

Score Drift

What it is: The distribution of fraud scores shifts over time. Transactions that scored 60 last quarter might score 45 now, even though the underlying fraud risk hasn't changed.

Why it happens: The model retrains on new data, new fraud patterns emerge, seasonal traffic changes the population mix, or the vendor updates their model.

How to spot it: Check your score distribution weekly. If the median score is shifting or the tails are changing shape, your thresholds may need adjustment.

Last quarter:  Median score = 25  |  5% of transactions scored > 70
This quarter:  Median score = 32  |  8% of transactions scored > 70

If your decline threshold is 70, your block rate just went from 5% to 8%.
Did fraud actually increase, or did the scores shift?

What to do: Re-test your thresholds quarterly. Run the threshold sweep experiment from Risk Scoring to recalibrate.
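
A minimal sketch of that sweep is below, run over last quarter's transactions that already have outcomes. The simple cost model (missed fraud loses the order value; false positives lose the margin) and the 15% margin rate are illustrative assumptions, not the full experiment from Risk Scoring.

```python
# Find the decline threshold with the lowest total cost on labeled history.
# txns: list of (score, was_fraud, order_value) tuples (hypothetical schema).
def threshold_sweep(txns, thresholds=range(40, 96, 5), margin_rate=0.15):
    results = []
    for t in thresholds:
        missed_fraud = sum(v for s, fraud, v in txns if fraud and s < t)
        blocked_margin = sum(v * margin_rate for s, fraud, v in txns
                             if not fraud and s >= t)
        results.append((t, missed_fraud + blocked_margin))
    return min(results, key=lambda r: r[1])  # (best_threshold, total_cost)
```

If the optimal threshold moves more than about 10 points quarter over quarter (the red flag in the monitoring summary below), treat it as likely score drift and investigate before concluding that fraud itself changed.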

Fraud Pattern Changes

What it is: Fraudsters change tactics. A new attack type hits your business that the model hasn't seen before.

Why it happens: Fraudsters adapt. If your model catches one pattern, they try another. New fraud techniques emerge industry-wide.

What to do: Use rules to catch newly identified patterns while the model catches up. When you identify a new fraud pattern:

  1. Write a rule to catch it (see Building Fraud Rules)
  2. Report the fraud to your vendor (speeds model learning)
  3. The model will eventually learn the pattern from outcomes

This is why you need both rules and ML. Rules are your fast response. ML is your long-term learner.
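
As an illustration, a stopgap rule can be as simple as a predicate on the attributes the new pattern shares. Everything in this sketch (the field names, the BIN list, the pattern itself) is hypothetical.

```python
# Hypothetical stopgap rule: route a newly observed pattern to manual review
# while the ML model accumulates outcomes on it.
from typing import Optional

HOT_BINS = {"123456", "654321"}  # card BINs seen in the new attack (made up)

def stopgap_rule(txn) -> Optional[str]:
    new_pattern = (
        txn.card_bin in HOT_BINS
        and txn.shipping_country != txn.billing_country
    )
    return "review" if new_pattern else None  # None = let the model decide
```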

Cold Start

What it is: New business, new vertical, or new geography. The model has no history to learn from and is essentially guessing.

Strategy:

  1. Lean heavily on rules early (they work without training data)
  2. Use conservative thresholds (more reviews, fewer auto-approvals)
  3. Weight the model higher as data accumulates (3-6 months)
  4. Send all outcomes (fraud and legitimate) to accelerate learning
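
A hedged sketch of step 3: one common shape is a linear ramp on the model's weight as labeled history accumulates. The 6-month ramp and the blend itself are illustrative choices, not a vendor-prescribed formula.

```python
# Blend rule-based and ML scores, trusting the model more as data accumulates.
def blended_score(rule_score: float, model_score: float, months_of_data: float) -> float:
    model_weight = min(1.0, months_of_data / 6.0)  # full model weight after ~6 months
    return model_weight * model_score + (1.0 - model_weight) * rule_score
```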

For a detailed cold start strategy, see Risk Scoring: Cold Start Strategy.

Seasonal Shifts

What it is: Holiday shopping, back-to-school, end-of-year patterns look different from normal traffic. The model may flag legitimate holiday behavior as anomalous.

What to do:

  • November-December: Relax thresholds slightly (higher volume, more gift purchases, more new shipping addresses). Move borderline declines to review
  • January: Tighten thresholds back (holiday fraud chargebacks start arriving, legitimate volume drops)
  • Your seasonal peaks: If your business has specific peak periods (tax season, semester start, etc.), adjust thresholds around those dates

Don't Relax Too Much

"Relax for the holidays" doesn't mean "turn off fraud detection." Fraudsters know you're relaxing. Adjust by 10-15%, not 50%. And monitor daily during peak periods.


What to Monitor

Weekly Checks

Score distribution: Is it shifting?

Pull a histogram of fraud scores for the past week. Compare to the previous 4-week average.

| What to Look For | What It Means | Action |
| --- | --- | --- |
| Median score increasing | Model is scoring more transactions as risky | Check if fraud is actually rising, or if scores drifted |
| More transactions in the "review" band | Review queue will grow | Verify you have capacity, or adjust thresholds |
| Scores clustering at extremes | Model is more "decisive" (fewer gray areas) | Usually fine, but check false positive rate |

Block rate trend: Did something break?

| Signal | Possible Cause | Action |
| --- | --- | --- |
| Sudden increase in blocks | New rule deployed too aggressively, model update, fraud spike | Investigate immediately |
| Gradual increase in blocks | Score drift, traffic mix change | Re-test thresholds |
| Sudden decrease in blocks | Rule disabled accidentally, model update | Check fraud-in-approved rate |
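
Both weekly checks are a few lines against your transaction log. This sketch uses numpy and the red-flag limits from the monitoring summary below (5-point median shift, 20% block rate change); data loading is left as a stub, and it assumes a nonzero baseline block rate.

```python
import numpy as np

# scores: this week's fraud scores; blocked: booleans, True if declined/blocked.
def weekly_check(scores, blocked, baseline_median, baseline_block_rate):
    flags = []
    median = float(np.median(scores))
    if abs(median - baseline_median) > 5:
        flags.append(f"score drift: median {baseline_median:.0f} -> {median:.0f}")
    block_rate = float(np.mean(blocked))
    if abs(block_rate - baseline_block_rate) / baseline_block_rate > 0.20:
        flags.append(f"block rate moved: {baseline_block_rate:.1%} -> {block_rate:.1%}")
    return flags
```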

Monthly Checks

Fraud-in-approved rate: Are you missing fraud?

Fraud-in-approved = Chargebacks on approved transactions / Total approved transactions

If this rate is rising, your detection is getting worse. Either thresholds are too loose, or a new fraud pattern is getting through.

False positive sample: Are you blocking good customers?

Pull a random sample of 10-20 declined or blocked transactions. Investigate each one:

  • Was it actually fraud?
  • Was it a legitimate customer?
  • What rule or score triggered the block?

If more than 50% of your sample are false positives, your detection is too aggressive.

Quarterly Checks

Threshold re-calibration: Re-run the threshold sweep experiment. Scores drift, and thresholds that were optimal last quarter may not be optimal now.

Monitoring Summary

| Cadence | What to Check | Red Flag |
| --- | --- | --- |
| Weekly | Score distribution | Median shifted > 5 points |
| Weekly | Block rate | Changed > 20% from baseline |
| Monthly | Fraud-in-approved rate | Rising for 2+ consecutive months |
| Monthly | False positive sample (10-20 blocked txns) | Over 50% are legitimate |
| Quarterly | Threshold sweep | Optimal thresholds shifted > 10 points |

Next Steps

Just getting started with ML scoring?

  1. Verify chargebacks flow to your vendor (the baseline)
  2. Set initial thresholds using Risk Scoring
  3. Start weekly score distribution checks

Already have ML but want to improve it?

  1. Start sending fraud reports within 48 hours of confirmation
  2. Flag false positives in your review tool
  3. Run the threshold sweep experiment quarterly

Want the full operational picture?

  1. Running Fraud Operations - Daily/weekly/monthly cadence
  2. Building Fraud Rules - Rules as fast response to new patterns
  3. Processor Rules Configuration - Platform-specific setup