Statistical Guidance for Reporting Diagnostic Test Results with Qualitative Outcomes

This guidance addresses the reporting of results from studies evaluating diagnostic devices with qualitative outcomes (positive/negative) in PMAs and 510(k)s. It focuses on appropriate statistical practices for reporting results and identifies common inappropriate practices, with special attention to discrepant resolution issues.

Posted Mar 12, 2007

By ReguVirta

What You Need to Know 👇

What is the difference between sensitivity/specificity and positive/negative percent agreement in diagnostic test evaluation?

Sensitivity and specificity apply when a new test is compared to a reference standard (gold standard); they measure diagnostic accuracy. Positive and negative percent agreement apply when the comparator is not a reference standard; they measure only how often the two methods agree, not whether either one is correct.
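
A minimal sketch of that distinction: the arithmetic is the same in both cases, and only the comparator (and therefore the name and interpretation of the result) changes. The function names and all counts below are hypothetical, chosen for illustration.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Accuracy measures; meaningful only when the comparator is a reference standard.

    tp/fn: reference-positive subjects the new test called positive/negative.
    fp/tn: reference-negative subjects the new test called positive/negative.
    """
    return tp / (tp + fn), tn / (tn + fp)


def percent_agreement(a, b, c, d):
    """Agreement measures for a non-reference comparator.

    a: both positive; b: new test positive, comparator negative;
    c: new test negative, comparator positive; d: both negative.
    """
    ppa = a / (a + c)  # positive percent agreement
    npa = d / (b + d)  # negative percent agreement
    return ppa, npa


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=44, fp=1, fn=7, tn=168)
ppa, npa = percent_agreement(a=40, b=5, c=4, d=171)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
print(f"PPA={ppa:.1%}, NPA={npa:.1%}")
```

Keeping two separately named functions is a deliberate choice here: it makes it harder to label an agreement figure as "sensitivity" by accident.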

When should I use discrepant resolution in my diagnostic test study?

FDA does not recommend discrepant resolution because it produces biased estimates. If you must resolve discrepancies, use a reference standard as the resolver and apply it to both concordant and discrepant results, not just the discrepancies, to obtain valid performance estimates.
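
One way to see where the bias comes from is the sketch below, which uses hypothetical counts and hypothetical function names. When only the discordant cells are retested, subjects can move from a discordant cell into a concordant cell but never the other way, so the revised agreement figures can only rise or stay the same.

```python
def agreement(a, b, c, d):
    """Return (PPA, NPA) for a 2x2 table of new test vs. comparator."""
    return a / (a + c), d / (b + d)


def resolve_discrepants(a, b, c, d, b_resolved_pos, c_resolved_neg):
    """Revise the table using resolver results on the discordant cells only.

    b_resolved_pos: of the b subjects (new test +, comparator -), how many
        the resolver called positive, i.e. sided with the new test.
    c_resolved_neg: of the c subjects (new test -, comparator +), how many
        the resolver called negative, i.e. sided with the new test.
    """
    if not (0 <= b_resolved_pos <= b and 0 <= c_resolved_neg <= c):
        raise ValueError("resolved counts must lie within the discordant cells")
    return (a + b_resolved_pos, b - b_resolved_pos,
            c - c_resolved_neg, d + c_resolved_neg)


# Hypothetical counts: (a, b, c, d) before resolution.
original = (40, 5, 4, 171)
revised = resolve_discrepants(*original, b_resolved_pos=4, c_resolved_neg=3)

print("before:", agreement(*original))
print("after: ", agreement(*revised))
```

The concordant cells (a and d) are never challenged in this procedure, which is why the sketch has no way to lower either figure.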

How do I handle equivocal or indeterminate test results in my statistical analysis?

Don’t discard equivocal results, because doing so biases the estimates. Report two sets of performance measures: one that counts equivocal results as positive and one that counts them as negative. Consult FDA statisticians for guidance on your specific situation.
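
The two-set reporting can be sketched as follows, assuming a reference standard is available and the new test returns one of three outcomes. The dictionary keys, function name, and counts are hypothetical.

```python
def sens_spec(ref_pos, ref_neg, equivocal_as):
    """Sensitivity and specificity with equivocal results kept in the denominator.

    ref_pos / ref_neg: new-test counts among reference-positive and
        reference-negative subjects, keyed by 'pos', 'eqv', 'neg'.
    equivocal_as: 'pos' or 'neg', the outcome equivocal results are grouped with.
    """
    if equivocal_as not in ("pos", "neg"):
        raise ValueError("equivocal_as must be 'pos' or 'neg'")
    true_pos = ref_pos["pos"] + (ref_pos["eqv"] if equivocal_as == "pos" else 0)
    true_neg = ref_neg["neg"] + (ref_neg["eqv"] if equivocal_as == "neg" else 0)
    # Every subject, equivocal or not, stays in the denominator.
    return true_pos / sum(ref_pos.values()), true_neg / sum(ref_neg.values())


# Hypothetical counts, for illustration only.
ref_pos = {"pos": 40, "eqv": 5, "neg": 5}
ref_neg = {"pos": 10, "eqv": 20, "neg": 170}

as_positive = sens_spec(ref_pos, ref_neg, equivocal_as="pos")
as_negative = sens_spec(ref_pos, ref_neg, equivocal_as="neg")
print("equivocal counted as positive:", as_positive)
print("equivocal counted as negative:", as_negative)
```

The gap between the two pairs shows how much the reported performance depends on how equivocal results are handled.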

What constitutes an appropriate reference standard for my diagnostic device study?

A reference standard is the best available method for establishing presence/absence of the target condition, established by medical community consensus. It can be a single test, combination of methods, or clinical criteria. Consult FDA early if uncertain.

How can I avoid spectrum bias in my diagnostic test evaluation study?

Include subjects across the entire disease spectrum, relevant confounding conditions, and different demographic groups. Avoid studies with only healthy subjects and severe cases while omitting intermediate, difficult-to-diagnose cases that represent real-world use.
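
A small worked example of why the case mix matters, using made-up numbers: if the test detects severe disease more readily than intermediate or early disease, a study that enrolls only severe cases will overstate sensitivity relative to a full-spectrum population.

```python
def pooled_sensitivity(strata):
    """strata: list of (diseased_subjects, detected_by_test), one pair per disease stage."""
    total = sum(n for n, _ in strata)
    detected = sum(k for _, k in strata)
    return detected / total


# Hypothetical study populations.
severe_only = [(100, 95)]
full_spectrum = [(40, 38), (40, 28), (20, 10)]  # severe, intermediate, early

print(f"severe cases only: {pooled_sensitivity(severe_only):.0%}")
print(f"full spectrum:     {pooled_sensitivity(full_spectrum):.0%}")
```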

What statistical measures should I report for diagnostic test performance in my FDA submission?

Report sensitivity/specificity pairs with 95% confidence intervals when using a reference standard, or positive/negative percent agreement with 95% confidence intervals when using a non-reference standard. Include 2x2 tables and study population descriptions, and do not report overall agreement on its own.
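
The summary above does not say which interval method to use. As an assumption, the sketch below uses the Wilson score interval, one common choice for a binomial proportion; confirm the appropriate method with FDA statisticians. The function name and counts are illustrative.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Two-sided Wilson score interval for a proportion (z=1.959964 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical example: 44 of 51 reference-positive subjects test positive.
low, high = wilson_interval(44, 51)
print(f"sensitivity = {44 / 51:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

The same function applies to specificity, PPA, or NPA: pass the numerator and denominator of whichever proportion is being reported.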

What You Need to Do 👇

Recommended Actions

Consult with FDA early to discuss study design and statistical analysis plans
Select appropriate benchmark:
- Use reference standard if available
- If applying the reference standard to every subject is impractical, apply it to a subset of subjects
- If no reference standard, consider constructing one
- If no reference standard possible, use agreement measures
Design study population carefully:
- Include full spectrum of disease states
- Include confounding conditions
- Include diverse demographics
- Use final device version and instructions
- Include multiple qualified users
Document and report:
- Complete study context and methods
- Raw data in 2x2 tables
- Appropriate performance measures with confidence intervals
- Results by site and subgroups
- Full accounting of all subjects
Avoid inappropriate practices:
- Do not use discrepant resolution
- Do not eliminate equivocal results
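- Do not use test outcomes altered by discrepant resolution
- Do not compare the new test to an algorithm that uses the new test's own outcome
- Do not pool results from the intended use population with results from other populations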
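
The reporting items above (raw 2x2 tables, results by site, full accounting of subjects) can be sketched from subject-level records as shown below. The record layout, site names, and result codes are hypothetical; a real submission would follow the study's own data specification.

```python
from collections import Counter, defaultdict

VALID = {"pos", "neg"}


def tabulate_by_site(records):
    """records: iterable of (site, new_test_result, reference_result) tuples.

    Returns (tables, unclassified). tables maps each site to a Counter keyed
    by (new_test_result, reference_result). unclassified lists records with
    any other result code so they are reported instead of silently dropped.
    """
    tables = defaultdict(Counter)
    unclassified = []
    for site, new, ref in records:
        if new in VALID and ref in VALID:
            tables[site][(new, ref)] += 1
        else:
            unclassified.append((site, new, ref))
    return dict(tables), unclassified


# Hypothetical subject-level records.
records = [
    ("site_A", "pos", "pos"), ("site_A", "neg", "neg"),
    ("site_A", "pos", "neg"), ("site_B", "neg", "pos"),
    ("site_B", "pos", "pos"), ("site_B", "eqv", "neg"),
]
tables, unclassified = tabulate_by_site(records)

# Full accounting: every enrolled subject is either in a table or listed.
tabulated = sum(sum(t.values()) for t in tables.values())
assert tabulated + len(unclassified) == len(records)
print(tables)
print("not in any 2x2 cell:", unclassified)
```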

Key Considerations

Clinical testing

Study population must include subjects across entire range of disease states
Must include subjects with relevant confounding medical conditions
Must include subjects across different demographic groups
Must use final version of device according to final instructions
Must include multiple users with relevant training and expertise
Must cover range of expected use conditions

Labelling

Must clearly describe designated reference standard if constructed
Must define condition of interest and intended use population
Must describe conditions of use (operator experience, facility, controls, specimen criteria)
Must report performance measures with confidence intervals
Must report results separately for intended use population

Other considerations

Must use appropriate benchmark (reference standard when available)
Must avoid discrepant resolution practices
Must avoid elimination of equivocal results
Must avoid using test outcomes altered by discrepant resolution
Must avoid comparing to algorithms that use the new test outcome
Must report complete accounting of all subjects and test results
Must report results by clinical site and demographic subgroups

Relevant Guidances 🔗

CLSI EP12-A: User protocol for evaluation of qualitative test performance
CLSI GP10-A: Assessment of clinical accuracy using ROC plots

Original guidance

Statistical Guidance for Reporting Diagnostic Test Results with Qualitative Outcomes
Issue date: 2007-03-12
Last changed date: 2020-03-19
Status: FINAL
Official FDA topics: Medical Devices, Biostatistics
ReguVirta ID: c4fbbcad300c6d5161c028857324d965

This post is licensed under CC BY 4.0 by the author.