- What Should Forensic Psychologists Know About the Divide Between Research Reliability and Field Reliability in Violence Risk Assessment?
- How Can Adversarial Allegiance Compound Scoring Variability in Violence Risk Assessments?
- What Are the Benefits and Limits of PCL-R Training for Forensic Psychologists?
- What Are Some Practical Strategies for Defensible PCL-R Use in Violence Risk Assessments?
- Conclusion
- Additional Resources
What Should Forensic Psychologists Know About the Divide Between Research Reliability and Field Reliability in Violence Risk Assessment?
The PCL-R manual reports strong interrater reliability figures drawn from validation studies: .89 for the total score and .78 to .88 for individual factors. These numbers reflect conditions in which raters receive standardized PCL-R training, score independently, and often have their work checked against a criterion. In research contexts, raters who are not reliable are retrained or replaced. The incentive structure favors accuracy.
Field conditions introduce variables that no research protocol can fully replicate. Forensic psychologists working in practice may score the PCL-R under time pressure, with variable access to collateral records, and in the context of adversarial proceedings where the retaining party has a stake in the outcome. A study of PCL-R scoring in a Belgian forensic psychiatric sample found an ICC of just .42 for total scores when the same individuals were scored by different raters across prison and hospital settings. That figure falls well below what is considered adequate for an instrument used in high-stakes decision-making.
This pattern is visible in multiple studies. An analysis of 558 offenders evaluated for sexually violent predator civil commitment in Texas found that approximately 32% of the variance in PCL-R total scores was attributable to differences among evaluators rather than differences among the offenders themselves. This evaluator effect persisted even after controlling for offenders' self-reported antisocial traits using the Personality Assessment Inventory. Factor and facet scores showed similar evaluator-driven variability, with Facet 3 (Lifestyle) showing the largest proportion of variance attributable to the evaluator. For a forensic psychologist conducting a violence risk assessment, these findings mean that who does the scoring accounts for a substantial share of the score, in addition to who is being scored.
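To make the idea of "variance attributable to evaluators" concrete, the following is a minimal sketch of a one-way random-effects intraclass correlation, with the evaluator as the grouping factor. It is an illustration under stated assumptions, not the analysis used in the Texas study, which may have relied on a different model. The function name and the scores are hypothetical.

```python
from statistics import mean

def evaluator_variance_share(scores_by_evaluator):
    """Estimate the share of score variance lying between evaluators.

    Uses the one-way random-effects ICC(1) from ANOVA mean squares.
    Assumes each offender is scored by exactly one evaluator.
    The estimate can be negative when evaluators barely differ.
    """
    groups = [g for g in scores_by_evaluator.values() if g]
    k = len(groups)                        # number of evaluators
    n_total = sum(len(g) for g in groups)  # number of scored offenders
    grand_mean = mean(x for g in groups for x in g)

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted average caseload, which handles unequal caseloads.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)

# Hypothetical PCL-R totals, invented for illustration only.
example_scores = {
    "evaluator_a": [20, 22, 24],
    "evaluator_b": [26, 28, 30],
    "evaluator_c": [14, 16, 18],
}
print(round(evaluator_variance_share(example_scores), 3))
```

Because evaluators in the field rarely see the same offenders, a high value here can also reflect differences in caseload mix. That is why the study's control for self-reported antisocial traits matters.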
How Can Adversarial Allegiance Compound Scoring Variability in Violence Risk Assessments?
Field reliability challenges with the PCL-R become especially acute in adversarial legal proceedings. Research by Murrie, Boccaccini, and colleagues has documented what they term "adversarial allegiance," the tendency for forensic experts' scores to shift in the direction favored by the retaining party. In an early field study examining sexually violent predator trials, opposing evaluators who scored the same offenders produced an ICC of just .39 for PCL-R total scores, and score differences were consistently in the direction that supported the party who retained them.
The effect sizes are striking. In adversarial SVP proceedings, prosecution-retained experts assigned PCL-R scores averaging around 24, while defense-retained experts assigned scores averaging around 18 for the same individuals. A six-point difference can move an individual from below to above the conventional diagnostic threshold of 30 (for example, from 26 to 32), or shift a score that might have been characterized as moderate into the high range. It can also alter the output of actuarial instruments that incorporate PCL-R scores, such as the VRAG, cascading the effect into broader violence risk assessment conclusions.
To rule out the possibility that these patterns merely reflected attorneys selecting experts with pre-existing scoring tendencies, Murrie and colleagues conducted a controlled experiment. They paid 108 forensic psychologists and psychiatrists to review identical offender case files but led some to believe they were consulting for the prosecution and others for the defense. Even under these conditions, participants who believed they were working for the prosecution assigned higher scores, while those who believed they were working for the defense assigned lower scores. The allegiance effect was stronger for the PCL-R than for the Static-99R, a more structured instrument requiring less subjective judgment. The finding suggests that the PCL-R's reliance on clinical inference, particularly for Factor 1 items involving interpersonal and affective traits, makes it especially susceptible to context effects.
Importantly, adversarial allegiance appears to be largely unconscious. Research on cognitive bias in forensic evaluations identifies multiple mechanisms through which retaining-party affiliation can shape scoring, including confirmation bias, in-group allegiance, and the tendency to seek information that supports a favored hypothesis. The implication for forensic psychologists is that good intentions alone do not protect against scoring drift. Structured debiasing strategies are necessary.
What Are the Benefits and Limits of PCL-R Training for Forensic Psychologists?
Formal PCL-R training workshops are widely recommended and, in some jurisdictions, practically required for evaluators who use the instrument. The evidence suggests that training does help. In the Texas SVP evaluator study, analyses limited to the 11 evaluators who reported completing a PCL-R training workshop showed less evidence of evaluator differences than analyses including all 14 evaluators. These results provide indirect support for the idea that formal training reduces, though does not eliminate, scoring variability.
The variability in PCL-R scoring does not vanish with more experience. In fact, greater clinical experience can coincide with lower scoring reliability. A study comparing PCL-R scores assigned by trained graduate students (using file review) and licensed clinicians (using file review and interview) found that clinicians showed more variability in their scores than the graduate students. However, the clinicians' scores were stronger predictors of future violence. The researchers proposed that experienced evaluators may engage in more diagnostically skilled inquiry, picking up on valid indicators that less experienced raters miss, while also being more affected by non-diagnostic information. This pattern complicates the assumption that experience and training uniformly improve reliability. They may improve validity while simultaneously introducing greater variability.
For forensic psychologists navigating these findings, the practical takeaway is that PCL-R training is necessary but not sufficient. Training establishes a baseline of competence in item scoring, familiarizes evaluators with common scoring pitfalls, and provides calibration against expert-scored cases. But training alone does not inoculate against the contextual pressures, cognitive biases, and interpretive judgment calls that drive field reliability problems. Ongoing calibration, whether through peer consultation, scoring comparisons with colleagues, or periodic refresher training, is part of what sustained proficiency in violence risk assessment requires.
What Are Some Practical Strategies for Defensible PCL-R Use in Violence Risk Assessments?
Given the documented gap between research and field reliability, forensic psychologists who use the PCL-R in violence risk assessment should adopt practices that acknowledge uncertainty, promote transparency, and reduce the influence of contextual bias. Several strategies are supported by the research literature.
Report confidence intervals and the standard error of measurement. The PCL-R's standard error of measurement means that any individual score represents a range rather than a precise value. A group of forensic psychology researchers, including DeMatteo, Hart, Heilbrun, and others, has emphasized that interrater reliability is a property of scores obtained in a particular context, not a stable property of the test itself. Reporting scores with confidence intervals communicates appropriate uncertainty to courts and reduces the risk that a single number will be treated as more precise than the science supports.
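As a rough illustration of how reliability translates into a score range, the sketch below applies the classical test theory formula SEM = SD × √(1 − reliability) and builds an approximate 95% interval around an observed score. The standard deviation of 8 and the observed score of 25 are placeholder assumptions, not values from the PCL-R manual. Treating an interrater ICC as the reliability coefficient is also a simplifying assumption. The two reliability values are the research and field figures quoted earlier in this article.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement under classical test theory."""
    return sd * math.sqrt(1.0 - reliability)

def score_interval(score, sd, reliability, z=1.96, lo=0, hi=40):
    """Approximate confidence interval around an observed score.

    The interval is clipped to the PCL-R total score range of 0 to 40.
    z=1.96 corresponds to a 95% interval under a normality assumption.
    """
    half_width = z * sem(sd, reliability)
    return max(lo, score - half_width), min(hi, score + half_width)

ASSUMED_SD = 8.0        # placeholder; substitute the relevant normative SD
observed_score = 25     # hypothetical observed PCL-R total

for label, reliability in [("research (.89)", 0.89), ("field (.42)", 0.42)]:
    low, high = score_interval(observed_score, ASSUMED_SD, reliability)
    print(f"{label}: {low:.1f} to {high:.1f}")
```

With the same observed score, the interval computed from a field-level reliability estimate is much wider than the one computed from the manual's figure. This is the practical reason to state which reliability estimate a reported interval rests on.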
Report factor and facet scores, not just totals. Research consistently shows that Factor 2 (Lifestyle/Antisocial Behavior) scores are more reliable and more predictive of recidivism than Factor 1 (Interpersonal/Affective) scores. Yet Factor 1 traits tend to exert a disproportionate influence on decision-makers' perceptions of risk. Providing the full scoring profile, including factor and facet breakdowns, allows consumers of the report to weigh the more reliable and predictive components appropriately. Some researchers have suggested that evaluators consider not reporting Factor 1 scores at all unless there is a compelling reason for their inclusion, given their lower reliability and weaker predictive validity.
Implement debiasing strategies to counter adversarial allegiance. The research on adversarial allegiance is not a reason to stop using the PCL-R, but it is a reason to build structural safeguards into one's practice. Recommended approaches include the "consider-the-opposite" technique, in which evaluators deliberately ask themselves what reasons might support an alternative conclusion. Forensic psychologists can also maintain personal databases of their own scoring patterns and compare their mean scores across cases where they were retained by the prosecution versus the defense. When systematic patterns emerge, they signal a need for recalibration. Where possible, blinding oneself to the referral source during the initial scoring phase can reduce the pull of allegiance effects.
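The personal scoring database described above can be as simple as a list of cases tagged by retaining party. The following sketch, with hypothetical field names and invented scores, compares an evaluator's mean PCL-R totals by side and reports a standardized difference (Cohen's d with a pooled standard deviation). It is one possible way to run such a check, not a validated procedure.

```python
from math import sqrt
from statistics import mean, stdev

def allegiance_check(records):
    """Compare mean scores by retaining party.

    records: iterable of (retained_by, pcl_r_total) tuples, where
    retained_by is "prosecution" or "defense" (hypothetical labels).
    """
    by_side = {"prosecution": [], "defense": []}
    for side, score in records:
        by_side[side].append(score)
    pros, defense = by_side["prosecution"], by_side["defense"]
    if len(pros) < 2 or len(defense) < 2:
        raise ValueError("need at least two cases per side")

    difference = mean(pros) - mean(defense)
    n1, n2 = len(pros), len(defense)
    pooled_sd = sqrt(
        ((n1 - 1) * stdev(pros) ** 2 + (n2 - 1) * stdev(defense) ** 2)
        / (n1 + n2 - 2)
    )
    return {
        "mean_prosecution": mean(pros),
        "mean_defense": mean(defense),
        "difference": difference,
        # None when every score on both sides is identical.
        "cohens_d": difference / pooled_sd if pooled_sd else None,
    }

# Invented case log for illustration only.
case_log = [
    ("prosecution", 24), ("prosecution", 26), ("prosecution", 22),
    ("defense", 18), ("defense", 20), ("defense", 16),
]
print(allegiance_check(case_log))
```

A difference in this kind of log is a prompt for review rather than proof of bias. The cases an evaluator sees for each side are different people, so part of any gap may reflect referral patterns rather than scoring drift.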
Use the PCL-R as one component of a comprehensive assessment. The PCL-R was designed as a measure of psychopathic personality traits. It was not designed as a standalone violence risk assessment instrument, although it has been widely used as one. Research has repeatedly emphasized that violence risk assessment is context-specific and that no single instrument captures the full picture. Structured professional judgment tools such as the HCR-20 V3 offer a broader framework that incorporates dynamic risk factors, situational variables, and risk management planning. The HCR-20 V3 guides professionals through the conceptualization of violence with an emphasis on intervention and risk management rather than prediction alone. Using the PCL-R alongside SPJ tools and actuarial instruments provides convergent data points and reduces overreliance on any single score.
Prepare for cross-examination on reliability limitations. Forensic psychologists who testify about PCL-R results should expect opposing counsel to raise the field reliability literature. Rather than treating this challenge as a threat to credibility, evaluators can strengthen their testimony by demonstrating awareness of these limitations and explaining the steps they took to mitigate them. Defensible forensic assessments depend on the use of established methods applied with transparency about their boundaries. An evaluator who can articulate why field reliability differs from research reliability, and what they did about it, is in a stronger position than one who claims the manual's ICC values represent the reliability of their own scores.
Conclusion
The PCL-R remains a valuable instrument for assessing psychopathic traits, and psychopathy remains a relevant construct in violence risk assessment. But the instrument's strong research reliability does not automatically transfer to the courtroom or the forensic hospital. Forensic psychologists who use the PCL-R in high-stakes evaluations bear the responsibility of bridging that gap through rigorous training, transparent reporting, structured debiasing, and integration with broader assessment frameworks. The field reliability challenges documented in the research are not reasons to abandon the PCL-R. They are reasons to use it with the care, humility, and methodological discipline that high-stakes forensic evaluations demand.
Additional Resources
Training
- Limited-Time Specially Priced Risk Assessment Training Bundle
- Violence Risk Assessment Certificate
- Assessing Psychopathy using the Hare Scales (PCL-R and PCL-SV)
- AAFP: Case Law - Criminal Responsibility
- AAFP: Case Law Series: Competence to Stand Trial
- AAFP: Case Law Series: Juvenile Justice
- AAFP: Case Law: Disability & Worker's Compensation
- Legal Issues and Violence Risk
- MDLPA: Landmark Criminal Law and Procedure Cases Involving Persons with Mental Disabilities
- An Introduction to Violence Risk/Threat Assessment: Legal Issues
Blog Posts
- Evaluator Differences in PCL-R Scores Suggest Need for Training
- PCL-R Demonstrates Inadequate Field Reliability and Validity
- Clinical Experience May Affect the Predictive Validity of the PCL-R
- What Not to Use in Assessing Risk for Institutional Violence: The Case Against the PCL-R in Capital Cases
- An Introduction to Violence Risk Assessments
- General Violence Risk: A Structured Professional Judgment Approach (HCR-20 V3)
- Fighting for Objectivity: Cognitive Bias in Forensic Examinations
- Why Do Forensic Experts Disagree? Suggestions for Policy and Practice Changes
- Why Must I, as a Forensic Psychologist, Stay Proficient in Violence Risk Assessment Tools?
- What Should a Forensic Psychologist Know About Expert Testimony When Conducting High-Stakes Criminal Forensic Assessments?



