Statistical Guidance for Reporting Diagnostic Test Results with Qualitative Outcomes
This guidance addresses the reporting of results from studies evaluating diagnostic devices with qualitative outcomes (positive/negative) in PMAs and 510(k)s. It focuses on appropriate statistical practices for reporting results and identifies common inappropriate practices, with special attention to discrepant resolution issues.
What You Need to Know
What is the difference between sensitivity/specificity and positive/negative percent agreement in diagnostic test evaluation?
Sensitivity and specificity are used when comparing a new test to a reference standard (gold standard), measuring true diagnostic accuracy. Positive/negative percent agreement are used when comparing to a non-reference standard, measuring only agreement between methods, not correctness.
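The arithmetic is the same in both cases; what changes is the comparator and therefore the name and interpretation of the result. The following is a minimal sketch of that distinction. The class name `TwoByTwo`, the function `report`, and the counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing a new test against a comparator method.

    tp: new test positive, comparator positive
    fp: new test positive, comparator negative
    fn: new test negative, comparator positive
    tn: new test negative, comparator negative
    """
    tp: int
    fp: int
    fn: int
    tn: int


def report(table: TwoByTwo, comparator_is_reference_standard: bool) -> dict:
    """Return the pair of measures, labeled according to the comparator."""
    positive_fraction = table.tp / (table.tp + table.fn)
    negative_fraction = table.tn / (table.fp + table.tn)
    if comparator_is_reference_standard:
        # Comparator establishes the target condition: accuracy measures.
        names = ("sensitivity", "specificity")
    else:
        # Comparator is only another method: agreement measures.
        names = ("positive percent agreement", "negative percent agreement")
    return {names[0]: positive_fraction, names[1]: negative_fraction}


# Hypothetical counts, for illustration only.
table = TwoByTwo(tp=44, fp=1, fn=7, tn=168)
print(report(table, comparator_is_reference_standard=True))
print(report(table, comparator_is_reference_standard=False))
```

The two calls print identical numbers under different labels, which is the point: agreement with a non-reference standard says nothing about which method was correct.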
When should I use discrepant resolution in my diagnostic test study?
FDA does not recommend discrepant resolution as it produces biased estimates. If you must resolve discrepancies, use a reference standard resolver and test both concordant and discrepant results, not just discrepancies, to obtain valid performance estimates.
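One way to see the bias is to note that retesting only the discordant cells can move subjects into the concordant cells but never out of them. The sketch below assumes the usual discrepant-resolution bookkeeping (a discordant result is reclassified as concordant when the resolver sides with the new test) and uses hypothetical counts; the function names are illustrative.

```python
def agreement(a: int, b: int, c: int, d: int) -> tuple:
    """Return (positive, negative) agreement fractions.

    a: new test +, comparator +     b: new test +, comparator -
    c: new test -, comparator +     d: new test -, comparator -
    """
    return a / (a + c), d / (b + d)


def apply_discrepant_resolution(a, b, c, d, b_resolved_pos, c_resolved_neg):
    """Reclassify discordant results that the resolver decides in favor
    of the new test. Concordant results are never retested, so they can
    never be reclassified against the new test."""
    if not (0 <= b_resolved_pos <= b and 0 <= c_resolved_neg <= c):
        raise ValueError("cannot resolve more results than are discordant")
    return (a + b_resolved_pos, b - b_resolved_pos,
            c - c_resolved_neg, d + c_resolved_neg)


def resolution_never_lowers_agreement(a, b, c, d) -> bool:
    """Check every possible resolver outcome for the discordant cells."""
    before = agreement(a, b, c, d)
    for b_pos in range(b + 1):
        for c_neg in range(c + 1):
            after = agreement(
                *apply_discrepant_resolution(a, b, c, d, b_pos, c_neg))
            if after[0] < before[0] or after[1] < before[1]:
                return False
    return True


# Hypothetical counts, for illustration only.
original = (40, 5, 4, 51)
print(agreement(*original))
print(resolution_never_lowers_agreement(*original))
```

Because the revised figures can only stay the same or rise, whatever the resolver reports, they are not valid estimates of sensitivity and specificity.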
How do I handle equivocal or indeterminate test results in my statistical analysis?
Don't discard equivocal results, as this creates bias. Report two sets of performance measures: one counting equivocal results as positive, the other counting them as negative. Consult FDA statisticians for guidance on your specific situation.
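A minimal sketch of the two-scenario report follows, assuming the comparator is a reference standard so that the measures are sensitivity and specificity. The dictionary layout, function names, and counts are hypothetical.

```python
def performance(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity and specificity from 2x2 counts."""
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (fp + tn)}


def equivocal_scenarios(counts: dict) -> dict:
    """Compute both sets of measures from a 3x2 table.

    counts maps (test_result, reference_result) to a count, where
    test_result is "pos", "equiv", or "neg" and reference_result is
    "pos" or "neg". No subject is dropped in either scenario.
    """
    as_positive = performance(
        tp=counts[("pos", "pos")] + counts[("equiv", "pos")],
        fp=counts[("pos", "neg")] + counts[("equiv", "neg")],
        fn=counts[("neg", "pos")],
        tn=counts[("neg", "neg")],
    )
    as_negative = performance(
        tp=counts[("pos", "pos")],
        fp=counts[("pos", "neg")],
        fn=counts[("neg", "pos")] + counts[("equiv", "pos")],
        tn=counts[("neg", "neg")] + counts[("equiv", "neg")],
    )
    return {"equivocal_as_positive": as_positive,
            "equivocal_as_negative": as_negative}


# Hypothetical counts, for illustration only.
counts = {
    ("pos", "pos"): 80, ("pos", "neg"): 5,
    ("equiv", "pos"): 6, ("equiv", "neg"): 10,
    ("neg", "pos"): 14, ("neg", "neg"): 185,
}
for scenario, measures in equivocal_scenarios(counts).items():
    print(scenario, measures)
```

Reporting both scenarios brackets the performance the device could show once equivocal results are acted on in practice.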
What constitutes an appropriate reference standard for my diagnostic device study?
A reference standard is the best available method for establishing presence/absence of the target condition, established by medical community consensus. It can be a single test, combination of methods, or clinical criteria. Consult FDA early if uncertain.
How can I avoid spectrum bias in my diagnostic test evaluation study?
Include subjects across the entire disease spectrum, relevant confounding conditions, and different demographic groups. Avoid studies with only healthy subjects and severe cases while omitting intermediate, difficult-to-diagnose cases that represent real-world use.
What statistical measures should I report for diagnostic test performance in my FDA submission?
Report sensitivity and specificity as a pair, each with a 95% confidence interval, when the comparator is a reference standard; report positive and negative percent agreement when it is a non-reference standard. Include the 2x2 tables and a description of the study population, and do not report overall agreement alone.
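The text above does not say how the 95% confidence intervals should be computed. As one possible approach, the sketch below uses the Wilson score interval for a binomial proportion; that choice, the function name `wilson_interval`, and the example counts are assumptions made for illustration, so confirm the method with FDA statisticians before relying on it.

```python
import math


def wilson_interval(successes: int, n: int,
                    z: float = 1.959963984540054) -> tuple:
    """Two-sided Wilson score interval for a proportion.

    The default z is the standard normal quantile for a 95% interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts: 44 of 51 reference-positive subjects test positive,
# 168 of 169 reference-negative subjects test negative.
sens_lo, sens_hi = wilson_interval(44, 51)
spec_lo, spec_hi = wilson_interval(168, 169)
print(f"sensitivity {44 / 51:.3f} (95% CI {sens_lo:.3f} to {sens_hi:.3f})")
print(f"specificity {168 / 169:.3f} (95% CI {spec_lo:.3f} to {spec_hi:.3f})")
```

The same function applies unchanged to positive and negative percent agreement, since each is also a simple proportion.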
What You Need to Do
Recommended Actions
- Consult with FDA early to discuss study design and statistical analysis plans
- Select an appropriate benchmark:
  - Use a reference standard if one is available
  - If applying the reference standard to every subject is impractical, apply it to a subset of subjects
  - If no reference standard exists, consider constructing one
  - If no reference standard is possible, report agreement measures
- Design the study population carefully:
  - Include the full spectrum of disease states
  - Include confounding conditions
  - Include diverse demographics
  - Use the final device version and instructions
  - Include multiple qualified users
- Document and report:
  - Complete study context and methods
  - Raw data in 2x2 tables
  - Appropriate performance measures with confidence intervals
  - Results by site and subgroups
  - Full accounting of all subjects
- Avoid inappropriate practices:
  - Do not use discrepant resolution
  - Do not eliminate equivocal results
  - Do not use altered outcomes
  - Do not compare to algorithms using the test
  - Do not mix intended use and other populations
Key Considerations
Clinical testing
- Study population must include subjects across entire range of disease states
- Must include subjects with relevant confounding medical conditions
- Must include subjects across different demographic groups
- Must use final version of device according to final instructions
- Must include multiple users with relevant training and expertise
- Must cover range of expected use conditions
Labelling
- Must clearly describe designated reference standard if constructed
- Must define condition of interest and intended use population
- Must describe conditions of use (operator experience, facility, controls, specimen criteria)
- Must report performance measures with confidence intervals
- Must report results separately for intended use population
Other considerations
- Must use appropriate benchmark (reference standard when available)
- Must avoid discrepant resolution practices
- Must avoid elimination of equivocal results
- Must avoid using test outcomes altered by discrepant resolution
- Must avoid comparing to algorithms that use the new test outcome
- Must report complete accounting of all subjects and test results
- Must report results by clinical site and demographic subgroups
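The last two items (complete accounting and results by site) can be sketched together. The example below assumes a reference standard comparator and one record per subject in the form `(site, new_test_positive, reference_positive)`; that record layout, the function names, and the data are hypothetical.

```python
from collections import Counter


def tables_by_site(records) -> dict:
    """Build one 2x2 table (as a Counter of tp/fp/fn/tn) per site."""
    tables = {}
    for site, test_pos, ref_pos in records:
        if test_pos:
            cell = "tp" if ref_pos else "fp"
        else:
            cell = "fn" if ref_pos else "tn"
        tables.setdefault(site, Counter())[cell] += 1
    return tables


def summarize(tables: dict) -> dict:
    """Per-site subject count, sensitivity, and specificity."""
    summary = {}
    for site, t in tables.items():
        n_ref_pos = t["tp"] + t["fn"]
        n_ref_neg = t["fp"] + t["tn"]
        summary[site] = {
            "n": n_ref_pos + n_ref_neg,
            # None marks a measure that cannot be estimated at this site.
            "sensitivity": t["tp"] / n_ref_pos if n_ref_pos else None,
            "specificity": t["tn"] / n_ref_neg if n_ref_neg else None,
        }
    return summary


# Hypothetical data for two sites, for illustration only.
records = (
    [("A", True, True)] * 9 + [("A", False, True)] * 1
    + [("A", True, False)] * 1 + [("A", False, False)] * 19
    + [("B", True, True)] * 6 + [("B", False, True)] * 4
    + [("B", True, False)] * 2 + [("B", False, False)] * 18
)
summary = summarize(tables_by_site(records))
# Accounting check: every enrolled subject appears in exactly one table.
assert sum(s["n"] for s in summary.values()) == len(records)
for site, s in sorted(summary.items()):
    print(site, s)
```

In this made-up data the two sites differ noticeably in sensitivity, a difference that a single pooled table would hide.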
Relevant Guidances
- CLIA Waiver Applications for In Vitro Diagnostic Devices - Demonstrating Simplicity and Insignificant Risk of Erroneous Results
- Dual 510k and CLIA Waiver by Application for In Vitro Diagnostic Tests
- Content and Decision-Making Process for 510k Submissions: Determining Substantial Equivalence
- Design Considerations for Medical Device Pivotal Clinical Studies
Related references and norms
- CLSI EP12-A: User protocol for evaluation of qualitative test performance
- CLSI GP10-A: Assessment of clinical accuracy using ROC plots