Fraud Model Feedback
Before optimizing feedback loops, understand:
- Risk scoring fundamentals and thresholds
- Rules vs. ML and when ML adds value
- The fraud metrics you're tracking
- How your processor or vendor scores transactions (see Processor Rules Configuration)

Key takeaways:
- Your vendor's ML model improves when you give it outcomes: chargebacks, fraud reports, and false positive flags
- Chargebacks arrive 30-120 days late. Send fraud reports as soon as you confirm fraud to give the model faster signals
- Score drift is real. The same score means different things over time. Check your score distribution weekly
- Monitor four things: score distribution (weekly), block rate trend (weekly), fraud-in-approved rate (monthly), false positive sample (monthly)
- At minimum, verify that chargebacks flow to your fraud vendor. This is the baseline. Everything else is optimization
Your fraud vendor's ML model is only as good as the data it learns from. Every fraud score you see is based on patterns the model learned from past transactions and their outcomes. If you don't feed outcomes back, the model stagnates.
This page covers how the feedback loop works, what breaks it, and what to monitor to keep your model accurate. For threshold setting and cost calculations, see Risk Scoring. For rule-based detection, see Building Fraud Rules.
What Your Fraud Score Actually Means
Every fraud vendor gives you a number, but the number doesn't mean the same thing everywhere: most score the model's estimate that a transaction is fraudulent, while some (Forter, Signifyd) score confidence that a decision or order is good.
| Vendor | Score Range | Higher Means |
|---|---|---|
| Stripe Radar | 0-100 | Higher risk |
| Sift | 0-100 | Higher risk |
| Forter | Confidence % | Higher confidence in approve/decline decision |
| Signifyd | 0-1000 | Higher confidence the order is good |
| Adyen | 0-100+ | Higher risk |
The same score means different things at different businesses. A Stripe Radar score of 65 on a $20 digital download is a different bet than a score of 65 on a $2,000 electronics order. Your thresholds should reflect your margins, fraud rate, and tolerance for false positives.
For setting thresholds and calculating the cost trade-off, see Risk Scoring: Finding Your Thresholds.
How the Model Learns: The Feedback Loop
Your vendor's ML model improves when you tell it what actually happened after a transaction was scored. This is the feedback loop.
The Three Types of Feedback
1. Chargebacks (automatic in most setups)
When a customer disputes a charge with their bank, the chargeback flows from the card network through your processor. If your fraud vendor is your processor (Stripe, Adyen), this happens automatically. If you use a third-party vendor (Sift, Forter), verify the integration is passing dispute data.
- Signal strength: High. A chargeback is a confirmed bad outcome.
- Delay: 30-120 days after the transaction. This is the core problem (see below).
2. Fraud reports (manual or semi-automated)
You flag a transaction as confirmed fraud before a chargeback arrives. Maybe a customer contacted you ("I didn't make this purchase"), or your review team identified a fraud pattern.
- Signal strength: High, and much faster than waiting for a chargeback.
- Delay: As fast as you flag it. Days instead of months.
3. False positive flags (manual)
You tell the model that a blocked or declined transaction was actually legitimate. This is the signal most merchants forget to send.
- Signal strength: Medium. Helps the model learn what "good" looks like, not just what "bad" looks like.
- Delay: Depends on when you identify the false positive (e.g., customer contacts support about a block).
How Feedback Reaches Your Vendor
| Vendor | Chargebacks | Fraud Reports | False Positive Flags |
|---|---|---|---|
| Stripe | Automatic from disputes | Via Dashboard or API (Report as fraudulent) | Mark review items as legitimate |
| Sift | Requires $chargeback event via API | Requires $label event via API | Requires $label with is_fraud: false |
| Forter | Requires feedback API call | Requires feedback API call | Requires feedback API call |
| Signifyd | Automatic from guarantee claims | Via case management | Via case management |
| Adyen | Automatic from disputes | Via dispute management or API | Via risk rule feedback |
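To make the table concrete: Sift is the case where nothing is automatic, so both chargebacks and labels must be sent explicitly. A hedged sketch of building those payloads, using field names from Sift's public Events and Labels APIs — verify them against your API version and integration before relying on this:

```python
import json

SIFT_API_KEY = "YOUR_SIFT_REST_API_KEY"  # placeholder, not a real key

def chargeback_event(user_id: str, order_id: str) -> dict:
    """Payload for Sift's Events API ($chargeback event)."""
    return {
        "$type": "$chargeback",
        "$api_key": SIFT_API_KEY,
        "$user_id": user_id,
        "$order_id": order_id,
        "$chargeback_state": "$received",
        "$chargeback_reason": "$fraud",
    }

def fraud_label(is_fraud: bool) -> dict:
    """Body for Sift's Labels API (POST /v205/users/{user_id}/labels).
    is_fraud=False is the false positive flag most merchants forget."""
    return {
        "$api_key": SIFT_API_KEY,
        "$is_fraud": is_fraud,
        "$abuse_type": "payment_abuse",
    }

# To send (not executed here):
# requests.post("https://api.sift.com/v205/events",
#               data=json.dumps(chargeback_event("user_1", "order_1")))
```

The same shape applies to Forter's feedback API; only the endpoint and field names change.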
Make sure chargebacks reach your fraud vendor. This happens automatically with most processor-integrated tools (Stripe Radar, Adyen RevenueProtect). If you use a third-party tool like Sift or Forter, verify the integration is sending chargeback events. Check your vendor's documentation for integration status.
This is the baseline. Everything else on this page is optimization on top of it.
The Feedback Delay Problem
Chargebacks arrive 30-120 days after the transaction. This means the model is always learning from stale data.
Timeline:
- Day 0: Transaction scored (model uses what it knows NOW)
- Day 1-30: Fraud happens, goes undetected
- Day 30: Customer notices unauthorized charge
- Day 45: Customer files dispute with bank
- Day 60: Chargeback reaches your processor
- Day 75: Chargeback data feeds back to model
- Day 90+: Model updates (next training cycle)
The model is learning from data that's 2-3 months old.
What This Means
The model is always one fraud cycle behind. If a new fraud pattern emerges in January, the model won't learn about it from chargebacks until March or April. Meanwhile, you're exposed.
What You Can Do
Send fraud reports immediately. Don't wait for the chargeback. When you confirm fraud through any channel, report it to your vendor right away.
| Signal | Speed | How to Send |
|---|---|---|
| Chargeback | 30-120 days | Automatic (verify integration) |
| Fraud report | 1-7 days | Manual via dashboard or API |
| Customer complaint ("I didn't buy this") | 1-3 days | Flag in review tool, report to vendor |
| Refund-before-chargeback | 1-14 days | Some vendors auto-learn from refund reason codes |
| Manual review decline | Same day | Review queue decision feeds back |
The faster you send outcomes, the faster the model adapts. A business that sends fraud reports within 48 hours gives its model a 2-3 month head start over one that waits for chargebacks.
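One way to operationalize the 48-hour target is a daily sweep for confirmed fraud that hasn't been reported yet. A minimal sketch — the record fields (`confirmed_fraud`, `reported_to_vendor`, `confirmed_at`) are hypothetical names for whatever your transaction store calls them:

```python
from datetime import datetime, timedelta, timezone

REPORT_SLA = timedelta(hours=48)  # target from the guidance above

def overdue_fraud_reports(transactions, now=None):
    """Return confirmed-fraud transactions that were never reported
    to the vendor and are already past the 48-hour reporting SLA."""
    now = now or datetime.now(timezone.utc)
    return [
        t for t in transactions
        if t["confirmed_fraud"]
        and not t["reported_to_vendor"]
        and now - t["confirmed_at"] > REPORT_SLA
    ]
```

Run it from a daily cron and push the results into your review queue or straight to the vendor's feedback API.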
Some processors (Stripe, Adyen) can learn from refund reason codes. If you refund a transaction with reason "fraudulent," the processor may treat this as a fraud signal. Check your processor's documentation to see if this applies.
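For Stripe specifically, the refund reason is a documented parameter on the Refunds API (`"fraudulent"`, alongside `"duplicate"` and `"requested_by_customer"`). A sketch of building that request — the helper name is ours, and whether Radar treats the reason as a fraud signal is for Stripe's docs to confirm:

```python
def fraud_refund_params(payment_intent_id: str) -> dict:
    """Parameters for POST /v1/refunds marking the refund as fraud.
    "fraudulent" is one of Stripe's documented refund reason values."""
    return {
        "payment_intent": payment_intent_id,
        "reason": "fraudulent",
    }

# With the official client (not executed here):
# stripe.Refund.create(**fraud_refund_params("pi_123"))
```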
When Models Go Wrong
ML models are not set-and-forget. They degrade over time if you don't monitor them.
Score Drift
What it is: The distribution of fraud scores shifts over time. Transactions that scored 60 last quarter might score 45 now, even though the underlying fraud risk hasn't changed.
Why it happens: The model retrains on new data, new fraud patterns emerge, seasonal traffic changes the population mix, or the vendor updates their model.
How to spot it: Check your score distribution weekly. If the median score is shifting or the tails are changing shape, your thresholds may need adjustment.
- Last quarter: median score = 25; 5% of transactions scored > 70
- This quarter: median score = 32; 8% of transactions scored > 70
If your decline threshold is 70, your block rate just went from 5% to 8%.
Did fraud actually increase, or did the scores shift?
What to do: Re-test your thresholds quarterly. Run the threshold sweep experiment from Risk Scoring to recalibrate.
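The weekly check above is easy to script. A minimal sketch that compares this week's scores to a baseline window, using the red-flag levels from the Monitoring Summary (median shift > 5 points) plus a tail-share check; the exact thresholds are assumptions you should tune:

```python
from statistics import median

def drift_check(current_scores, baseline_scores, decline_threshold=70,
                max_median_shift=5, max_tail_shift=0.02):
    """Compare this period's score distribution to a baseline.
    Returns a list of drift warnings (empty list = no drift detected)."""
    warnings = []

    med_delta = median(current_scores) - median(baseline_scores)
    if abs(med_delta) > max_median_shift:
        warnings.append(f"median shifted {med_delta:+.1f} points")

    def tail_share(scores):
        # Fraction of transactions scoring above the decline threshold,
        # i.e. the effective block rate at that threshold.
        return sum(s > decline_threshold for s in scores) / len(scores)

    tail_delta = tail_share(current_scores) - tail_share(baseline_scores)
    if abs(tail_delta) > max_tail_shift:
        warnings.append(f"share above {decline_threshold} shifted {tail_delta:+.1%}")

    return warnings
```

On the example above (median 25 → 32, tail 5% → 8%) both warnings fire, which is exactly the prompt to ask: did fraud rise, or did scores drift?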
Fraud Pattern Changes
What it is: Fraudsters change tactics. A new attack type hits your business that the model hasn't seen before.
Why it happens: Fraudsters adapt. If your model catches one pattern, they try another. New fraud techniques emerge industry-wide.
What to do: Rules catch known new patterns while the model catches up. When you identify a new fraud pattern:
- Write a rule to catch it (see Building Fraud Rules)
- Report the fraud to your vendor (speeds model learning)
- The model will eventually learn the pattern from outcomes
This is why you need both rules and ML. Rules are your fast response. ML is your long-term learner.
Cold Start
What it is: New business, new vertical, or new geography. The model has no history to learn from and is essentially guessing.
Strategy:
- Lean heavily on rules early (they work without training data)
- Use conservative thresholds (more reviews, fewer auto-approvals)
- Weight the model higher as data accumulates (3-6 months)
- Send all outcomes (fraud and legitimate) to accelerate learning
For a detailed cold start strategy, see Risk Scoring: Cold Start Strategy.
Seasonal Shifts
What it is: Holiday shopping, back-to-school, end-of-year patterns look different from normal traffic. The model may flag legitimate holiday behavior as anomalous.
What to do:
- November-December: Relax thresholds slightly (higher volume, more gift purchases, more new shipping addresses). Move borderline declines to review
- January: Tighten thresholds back (holiday fraud chargebacks start arriving, legitimate volume drops)
- Your seasonal peaks: If your business has specific peak periods (tax season, semester start, etc.), adjust thresholds around those dates
"Relax for the holidays" doesn't mean "turn off fraud detection." Fraudsters know you're relaxing. Adjust by 10-15%, not 50%. And monitor daily during peak periods.
What to Monitor
Weekly Checks
Score distribution: Is it shifting?
Pull a histogram of fraud scores for the past week. Compare to the previous 4-week average.
| What to Look For | What It Means | Action |
|---|---|---|
| Median score increasing | Model is scoring more transactions as risky | Check if fraud is actually rising, or if scores drifted |
| More transactions in the "review" band | Review queue will grow | Verify you have capacity, or adjust thresholds |
| Scores clustering at extremes | Model is more "decisive" (fewer gray areas) | Usually fine, but check false positive rate |
Block rate trend: Did something break?
| Signal | Possible Cause | Action |
|---|---|---|
| Sudden increase in blocks | New rule deployed too aggressively, model update, fraud spike | Investigate immediately |
| Gradual increase in blocks | Score drift, traffic mix change | Re-test thresholds |
| Sudden decrease in blocks | Rule disabled accidentally, model update | Check fraud-in-approved rate |
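The block rate check in the table above can be automated against the 20%-from-baseline red flag in the Monitoring Summary. A sketch; the message wording and tolerance are ours:

```python
def block_rate_alert(blocked, total, baseline_rate, tolerance=0.20):
    """Flag when the block rate deviates more than 20% (relative)
    from its baseline. Returns an alert string, or None if healthy."""
    rate = blocked / total
    change = (rate - baseline_rate) / baseline_rate
    if change > tolerance:
        return (f"block rate up {change:.0%}: check for an aggressive "
                f"new rule, a model update, or a fraud spike")
    if change < -tolerance:
        return (f"block rate down {change:.0%}: check for a disabled "
                f"rule and watch the fraud-in-approved rate")
    return None
```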
Monthly Checks
Fraud-in-approved rate: Are you missing fraud?
Fraud-in-approved = Chargebacks on approved transactions / Total approved transactions
If this rate is rising, your detection is getting worse. Either thresholds are too loose, or a new fraud pattern is getting through.
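The formula above, plus the "rising for 2+ consecutive months" red flag from the Monitoring Summary, as a sketch:

```python
def fraud_in_approved_rate(chargebacks_on_approved, total_approved):
    """Chargebacks on approved transactions / total approved."""
    return chargebacks_on_approved / total_approved

def rising_for_months(monthly_rates, months=2):
    """True if the rate rose in each of the last `months`
    month-over-month steps (the red-flag condition)."""
    recent = monthly_rates[-(months + 1):]
    return len(recent) == months + 1 and all(
        later > earlier for earlier, later in zip(recent, recent[1:]))
```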
False positive sample: Are you blocking good customers?
Pull a random sample of 10-20 declined or blocked transactions. Investigate each one:
- Was it actually fraud?
- Was it a legitimate customer?
- What rule or score triggered the block?
If more than 50% of your sample are false positives, your detection is too aggressive.
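The sampling step is worth doing reproducibly (a fixed seed lets you re-pull the same sample during an investigation). A sketch; the `legitimate` flag is whatever your review tool records for each investigated transaction:

```python
import random

def review_sample(blocked_transactions, k=20, seed=None):
    """Pull a reproducible random sample of blocked transactions
    for manual investigation."""
    rng = random.Random(seed)
    return rng.sample(blocked_transactions, min(k, len(blocked_transactions)))

def false_positive_share(labeled_sample):
    """Fraction of the investigated sample that turned out to be
    legitimate customers. Over 0.5 means detection is too aggressive."""
    return sum(t["legitimate"] for t in labeled_sample) / len(labeled_sample)
```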
Quarterly Checks
Threshold re-calibration: Re-run the threshold sweep experiment. Scores drift, and thresholds that were optimal last quarter may not be optimal now.
Monitoring Summary
| Cadence | What to Check | Red Flag |
|---|---|---|
| Weekly | Score distribution | Median shifted > 5 points |
| Weekly | Block rate | Changed > 20% from baseline |
| Monthly | Fraud-in-approved rate | Rising for 2+ consecutive months |
| Monthly | False positive sample (10-20 blocked txns) | Over 50% are legitimate |
| Quarterly | Threshold sweep | Optimal thresholds shifted > 10 points |
Next Steps
Just getting started with ML scoring?
- Verify chargebacks flow to your vendor (the baseline)
- Set initial thresholds using Risk Scoring
- Start weekly score distribution checks
Already have ML but want to improve it?
- Start sending fraud reports within 48 hours of confirmation
- Flag false positives in your review tool
- Run the threshold sweep experiment quarterly
Want the full operational picture?
- Running Fraud Operations - Daily/weekly/monthly cadence
- Building Fraud Rules - Rules as fast response to new patterns
- Processor Rules Configuration - Platform-specific setup
Related
- Risk Scoring - Thresholds, cost calculations, cold start
- Rules vs. ML - When to use each approach
- Building Fraud Rules - Starter rules, allow/block lists, shadow mode
- Velocity Rules - Rate-based detection
- Data Enrichment - IP, email, phone features for ML models
- Running Fraud Operations - Operational cadence playbook
- Processor Rules Configuration - Vendor-specific setup
- Manual Review - Review queue as feedback source
- Fraud Metrics - Measuring detection performance
- Fraud Vendors - Vendor ML capabilities
- Fraud Economics - Cost of fraud decisions
- Experimentation - Testing threshold changes