Fraud Model Feedback
Before optimizing feedback loops, understand:
- Risk scoring fundamentals and thresholds
- Rules vs. ML and when ML adds value
- The fraud metrics you're tracking
- How your processor or vendor scores transactions (see Processor Rules Configuration)

Key takeaways:
- Your vendor's ML model improves when you give it outcomes: chargebacks, fraud reports, and false positive flags
- Chargebacks arrive 30-120 days late. Send fraud reports as soon as you confirm fraud to give the model faster signals
- Score drift is real. The same score means different things over time. Check your score distribution weekly
- Monitor four things: score distribution (weekly), block rate trend (weekly), fraud-in-approved rate (monthly), false positive sample (monthly)
- At minimum, verify that chargebacks flow to your fraud vendor. This is the baseline. Everything else is optimization
Your fraud vendor's ML model is only as good as the data it learns from. Every fraud score you see is based on patterns the model learned from past transactions and their outcomes. If you don't feed outcomes back, the model stagnates.
This page covers how the feedback loop works, what breaks it, and what to monitor to keep your model accurate. For threshold setting and cost calculations, see Risk Scoring. For rule-based detection, see Building Fraud Rules.
What Your Fraud Score Actually Means
Every fraud vendor gives you a number, but the number doesn't mean the same thing everywhere: most score the model's estimate that a transaction is fraudulent, while some (Forter, Signifyd) score confidence that a decision or order is good.
| Vendor | Score Range | Higher Means |
|---|---|---|
| Stripe Radar | 0-100 | Higher risk |
| Sift | 0-100 | Higher risk |
| Forter | Confidence % | Higher confidence in approve/decline decision |
| Signifyd | 0-1000 | Higher confidence the order is good |
| Adyen | 0-100+ | Higher risk |
The same score means different things at different businesses. A Stripe Radar score of 65 on a $20 digital download is a different bet than a score of 65 on a $2,000 electronics order. Your thresholds should reflect your margins, fraud rate, and tolerance for false positives.
For setting thresholds and calculating the cost trade-off, see Risk Scoring: Finding Your Thresholds.
How the Model Learns: The Feedback Loop
Your vendor's ML model improves when you tell it what actually happened after a transaction was scored. This is the feedback loop.
The Three Types of Feedback
1. Chargebacks (automatic in most setups)
When a customer disputes a charge with their bank, the chargeback flows from the card network through your processor. If your fraud vendor is your processor (Stripe, Adyen), this happens automatically. If you use a third-party vendor (Sift, Forter), verify the integration is passing dispute data.
- Signal strength: High. A chargeback is a confirmed bad outcome.
- Delay: 30-120 days after the transaction. This is the core problem (see below).
2. Fraud reports (manual or semi-automated)
You flag a transaction as confirmed fraud before a chargeback arrives. Maybe a customer contacted you ("I didn't make this purchase"), or your review team identified a fraud pattern.
- Signal strength: High, and much faster than waiting for a chargeback.
- Delay: As fast as you flag it. Days instead of months.
3. False positive flags (manual)
You tell the model that a blocked or declined transaction was actually legitimate. This is the signal most merchants forget to send.
- Signal strength: Medium. Helps the model learn what "good" looks like, not just what "bad" looks like.
- Delay: Depends on when you identify the false positive (e.g., customer contacts support about a block).
How Feedback Reaches Your Vendor
| Vendor | Chargebacks | Fraud Reports | False Positive Flags |
|---|---|---|---|
| Stripe | Automatic from disputes | Via Dashboard or API (Report as fraudulent) | Mark review items as legitimate |
| Sift | Requires $chargeback event via API | Requires $label event via API | Requires $label with is_fraud: false |
| Forter | Requires feedback API call | Requires feedback API call | Requires feedback API call |
| Signifyd | Automatic from guarantee claims | Via case management | Via case management |
| Adyen | Automatic from disputes | Via dispute management or API | Via risk rule feedback |
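To make the table concrete: Sift is the case where nothing is automatic, so both chargebacks and labels must be sent explicitly. A hedged sketch of building those payloads, using field names from Sift's public Events and Labels APIs — verify them against your API version and integration before relying on this:

```python
import json

SIFT_API_KEY = "YOUR_SIFT_REST_API_KEY"  # placeholder, not a real key

def chargeback_event(user_id: str, order_id: str) -> dict:
    """Payload for Sift's Events API ($chargeback event)."""
    return {
        "$type": "$chargeback",
        "$api_key": SIFT_API_KEY,
        "$user_id": user_id,
        "$order_id": order_id,
        "$chargeback_state": "$received",
        "$chargeback_reason": "$fraud",
    }

def fraud_label(is_fraud: bool) -> dict:
    """Body for Sift's Labels API (POST /v205/users/{user_id}/labels).
    is_fraud=False is the false positive flag most merchants forget."""
    return {
        "$api_key": SIFT_API_KEY,
        "$is_fraud": is_fraud,
        "$abuse_type": "payment_abuse",
    }

# To send (not executed here):
# requests.post("https://api.sift.com/v205/events",
#               data=json.dumps(chargeback_event("user_1", "order_1")))
```

The same shape applies to Forter's feedback API; only the endpoint and field names change.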
Make sure chargebacks reach your fraud vendor. This happens automatically with most processor-integrated tools (Stripe Radar, Adyen RevenueProtect). If you use a third-party tool like Sift or Forter, verify the integration is sending chargeback events. Check your vendor's documentation for integration status.
This is the baseline. Everything else on this page is optimization on top of it.
The Feedback Delay Problem
Chargebacks arrive 30-120 days after the transaction. This means the model is always learning from stale data.
Timeline:
- Day 0: Transaction scored (model uses what it knows NOW)
- Day 1-30: Fraud happens, goes undetected
- Day 30: Customer notices unauthorized charge
- Day 45: Customer files dispute with bank
- Day 60: Chargeback reaches your processor
- Day 75: Chargeback data feeds back to model
- Day 90+: Model updates (next training cycle)
The model is learning from data that's 2-3 months old.
What This Means
The model is always one fraud cycle behind. If a new fraud pattern emerges in January, the model won't learn about it from chargebacks until March or April. Meanwhile, you're exposed.
What You Can Do
Send fraud reports immediately. Don't wait for the chargeback. When you confirm fraud through any channel, report it to your vendor right away.
| Signal | Speed | How to Send |
|---|---|---|
| Chargeback | 30-120 days | Automatic (verify integration) |
| Fraud report | 1-7 days | Manual via dashboard or API |
| Customer complaint ("I didn't buy this") | 1-3 days | Flag in review tool, report to vendor |
| Refund-before-chargeback | 1-14 days | Some vendors auto-learn from refund reason codes |
| Manual review decline | Same day | Review queue decision feeds back |
The faster you send outcomes, the faster the model adapts. A business that sends fraud reports within 48 hours gives its model a 2-3 month head start over one that waits for chargebacks.
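One way to operationalize the 48-hour target is a daily sweep for confirmed fraud that hasn't been reported yet. A minimal sketch — the record fields (`confirmed_fraud`, `reported_to_vendor`, `confirmed_at`) are hypothetical names for whatever your transaction store calls them:

```python
from datetime import datetime, timedelta, timezone

REPORT_SLA = timedelta(hours=48)  # target from the guidance above

def overdue_fraud_reports(transactions, now=None):
    """Return confirmed-fraud transactions that were never reported
    to the vendor and are already past the 48-hour reporting SLA."""
    now = now or datetime.now(timezone.utc)
    return [
        t for t in transactions
        if t["confirmed_fraud"]
        and not t["reported_to_vendor"]
        and now - t["confirmed_at"] > REPORT_SLA
    ]
```

Run it from a daily cron and push the results into your review queue or straight to the vendor's feedback API.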
Some processors (Stripe, Adyen) can learn from refund reason codes. If you refund a transaction with reason "fraudulent," the processor may treat this as a fraud signal. Check your processor's documentation to see if this applies.
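For Stripe specifically, the refund reason is a documented parameter on the Refunds API (`"fraudulent"`, alongside `"duplicate"` and `"requested_by_customer"`). A sketch of building that request — the helper name is ours, and whether Radar treats the reason as a fraud signal is for Stripe's docs to confirm:

```python
def fraud_refund_params(payment_intent_id: str) -> dict:
    """Parameters for POST /v1/refunds marking the refund as fraud.
    "fraudulent" is one of Stripe's documented refund reason values."""
    return {
        "payment_intent": payment_intent_id,
        "reason": "fraudulent",
    }

# With the official client (not executed here):
# stripe.Refund.create(**fraud_refund_params("pi_123"))
```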
When Models Go Wrong
ML models are not set-and-forget. They degrade over time if you don't monitor them.
Score Drift
What it is: The distribution of fraud scores shifts over time. Transactions that scored 60 last quarter might score 45 now, even though the underlying fraud risk hasn't changed.
Why it happens: The model retrains on new data, new fraud patterns emerge, seasonal traffic changes the population mix, or the vendor updates their model.
How to spot it: Check your score distribution weekly. If the median score is shifting or the tails are changing shape, your thresholds may need adjustment.
- Last quarter: median score = 25; 5% of transactions scored > 70
- This quarter: median score = 32; 8% of transactions scored > 70
If your decline threshold is 70, your block rate just went from 5% to 8%.
Did fraud actually increase, or did the scores shift?
What to do: Re-test your thresholds quarterly. Run the threshold sweep experiment from Risk Scoring to recalibrate.
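The weekly check above is easy to script. A minimal sketch that compares this week's scores to a baseline window, using the red-flag levels from the Monitoring Summary (median shift > 5 points) plus a tail-share check; the exact thresholds are assumptions you should tune:

```python
from statistics import median

def drift_check(current_scores, baseline_scores, decline_threshold=70,
                max_median_shift=5, max_tail_shift=0.02):
    """Compare this period's score distribution to a baseline.
    Returns a list of drift warnings (empty list = no drift detected)."""
    warnings = []

    med_delta = median(current_scores) - median(baseline_scores)
    if abs(med_delta) > max_median_shift:
        warnings.append(f"median shifted {med_delta:+.1f} points")

    def tail_share(scores):
        # Fraction of transactions scoring above the decline threshold,
        # i.e. the effective block rate at that threshold.
        return sum(s > decline_threshold for s in scores) / len(scores)

    tail_delta = tail_share(current_scores) - tail_share(baseline_scores)
    if abs(tail_delta) > max_tail_shift:
        warnings.append(f"share above {decline_threshold} shifted {tail_delta:+.1%}")

    return warnings
```

On the example above (median 25 → 32, tail 5% → 8%) both warnings fire, which is exactly the prompt to ask: did fraud rise, or did scores drift?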
Fraud Pattern Changes
What it is: Fraudsters change tactics. A new attack type hits your business that the model hasn't seen before.
Why it happens: Fraudsters adapt. If your model catches one pattern, they try another. New fraud techniques emerge industry-wide.
What to do: Rules catch known new patterns while the model catches up. When you identify a new fraud pattern:
- Write a rule to catch it (see Building Fraud Rules)
- Report the fraud to your vendor (speeds model learning)
- The model will eventually learn the pattern from outcomes
This is why you need both rules and ML. Rules are your fast response. ML is your long-term learner.
Cold Start
What it is: New business, new vertical, or new geography. The model has no history to learn from and is essentially guessing.
Strategy:
- Lean heavily on rules early (they work without training data)
- Use conservative thresholds (more reviews, fewer auto-approvals)
- Weight the model higher as data accumulates (3-6 months)
- Send all outcomes (fraud and legitimate) to accelerate learning
For a detailed cold start strategy, see Risk Scoring: Cold Start Strategy.
Seasonal Shifts
What it is: Holiday shopping, back-to-school, end-of-year patterns look different from normal traffic. The model may flag legitimate holiday behavior as anomalous.
What to do:
- November-December: Relax thresholds slightly (higher volume, more gift purchases, more new shipping addresses). Move borderline declines to review
- January: Tighten thresholds back (holiday fraud chargebacks start arriving, legitimate volume drops)
- Your seasonal peaks: If your business has specific peak periods (tax season, semester start, etc.), adjust thresholds around those dates
"Relax for the holidays" doesn't mean "turn off fraud detection." Fraudsters know you're relaxing. Adjust by 10-15%, not 50%. And monitor daily during peak periods.
What to Monitor
Weekly Checks
Score distribution: Is it shifting?
Pull a histogram of fraud scores for the past week. Compare to the previous 4-week average.
| What to Look For | What It Means | Action |
|---|---|---|
| Median score increasing | Model is scoring more transactions as risky | Check if fraud is actually rising, or if scores drifted |
| More transactions in the "review" band | Review queue will grow | Verify you have capacity, or adjust thresholds |
| Scores clustering at extremes | Model is more "decisive" (fewer gray areas) | Usually fine, but check false positive rate |
Block rate trend: Did something break?
| Signal | Possible Cause | Action |
|---|---|---|
| Sudden increase in blocks | New rule deployed too aggressively, model update, fraud spike | Investigate immediately |
| Gradual increase in blocks | Score drift, traffic mix change | Re-test thresholds |
| Sudden decrease in blocks | Rule disabled accidentally, model update | Check fraud-in-approved rate |
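The block rate check in the table above can be automated against the 20%-from-baseline red flag in the Monitoring Summary. A sketch; the message wording and tolerance are ours:

```python
def block_rate_alert(blocked, total, baseline_rate, tolerance=0.20):
    """Flag when the block rate deviates more than 20% (relative)
    from its baseline. Returns an alert string, or None if healthy."""
    rate = blocked / total
    change = (rate - baseline_rate) / baseline_rate
    if change > tolerance:
        return (f"block rate up {change:.0%}: check for an aggressive "
                f"new rule, a model update, or a fraud spike")
    if change < -tolerance:
        return (f"block rate down {change:.0%}: check for a disabled "
                f"rule and watch the fraud-in-approved rate")
    return None
```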
Monthly Checks
Fraud-in-approved rate: Are you missing fraud?
Fraud-in-approved = Chargebacks on approved transactions / Total approved transactions
If this rate is rising, your detection is getting worse. Either thresholds are too loose, or a new fraud pattern is getting through.
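The formula above, plus the "rising for 2+ consecutive months" red flag from the Monitoring Summary, as a sketch:

```python
def fraud_in_approved_rate(chargebacks_on_approved, total_approved):
    """Chargebacks on approved transactions / total approved."""
    return chargebacks_on_approved / total_approved

def rising_for_months(monthly_rates, months=2):
    """True if the rate rose in each of the last `months`
    month-over-month steps (the red-flag condition)."""
    recent = monthly_rates[-(months + 1):]
    return len(recent) == months + 1 and all(
        later > earlier for earlier, later in zip(recent, recent[1:]))
```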
False positive sample: Are you blocking good customers?
Pull a random sample of 10-20 declined or blocked transactions. Investigate each one:
- Was it actually fraud?
- Was it a legitimate customer?
- What rule or score triggered the block?
If more than 50% of your sample are false positives, your detection is too aggressive.
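The sampling step is worth doing reproducibly (a fixed seed lets you re-pull the same sample during an investigation). A sketch; the `legitimate` flag is whatever your review tool records for each investigated transaction:

```python
import random

def review_sample(blocked_transactions, k=20, seed=None):
    """Pull a reproducible random sample of blocked transactions
    for manual investigation."""
    rng = random.Random(seed)
    return rng.sample(blocked_transactions, min(k, len(blocked_transactions)))

def false_positive_share(labeled_sample):
    """Fraction of the investigated sample that turned out to be
    legitimate customers. Over 0.5 means detection is too aggressive."""
    return sum(t["legitimate"] for t in labeled_sample) / len(labeled_sample)
```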
Quarterly Checks
Threshold re-calibration: Re-run the threshold sweep experiment. Scores drift, and thresholds that were optimal last quarter may not be optimal now.
Monitoring Summary
| Cadence | What to Check | Red Flag |
|---|---|---|
| Weekly | Score distribution | Median shifted > 5 points |
| Weekly | Block rate | Changed > 20% from baseline |
| Monthly | Fraud-in-approved rate | Rising for 2+ consecutive months |
| Monthly | False positive sample (10-20 blocked txns) | Over 50% are legitimate |
| Quarterly | Threshold sweep | Optimal thresholds shifted > 10 points |
Next Steps
Just getting started with ML scoring?
- Verify chargebacks flow to your vendor (the baseline)
- Set initial thresholds using Risk Scoring
- Start weekly score distribution checks
Already have ML but want to improve it?
- Start sending fraud reports within 48 hours of confirmation
- Flag false positives in your review tool
- Run the threshold sweep experiment quarterly
Want the full operational picture?
- Running Fraud Operations - Daily/weekly/monthly cadence
- Building Fraud Rules - Rules as fast response to new patterns
- Processor Rules Configuration - Platform-specific setup
Related
- Risk Scoring - Thresholds, cost calculations, cold start
- Rules vs. ML - When to use each approach
- Building Fraud Rules - Starter rules, allow/block lists, shadow mode
- Velocity Rules - Rate-based detection
- Data Enrichment - IP, email, phone features for ML models
- Running Fraud Operations - Operational cadence playbook
- Processor Rules Configuration - Vendor-specific setup
- Manual Review - Review queue as feedback source
- Fraud Metrics - Measuring detection performance
- Fraud Vendors - Vendor ML capabilities
- Fraud Economics - Cost of fraud decisions
- Experimentation - Testing threshold changes