The Business of Practice

What Should Forensic Psychologists Know About PCL-R Training for Field Reliability Challenges in Violence Risk Assessments?

The Psychopathy Checklist-Revised (PCL-R) remains one of the most widely used instruments in forensic psychology, routinely introduced in legal proceedings to inform violence risk assessment, sexually violent predator commitments, and parole determinations. Research conducted under controlled conditions has consistently demonstrated strong interrater reliability, with intraclass correlation coefficients for total scores ranging from .86 to .94. Yet a growing body of field research tells a different story. When the PCL-R moves from the research lab to the courtroom, reliability drops substantially, scoring variability increases, and the consequences of those inconsistencies land on real people facing real decisions about their liberty. For forensic psychologists who use this instrument in high-stakes evaluations, understanding the gap between research reliability and field reliability is a professional and ethical imperative that shapes how scores should be obtained, reported, and defended.

What Should Forensic Psychologists Know About the Divide Between Research Reliability and Field Reliability in Violence Risk Assessment?

The PCL-R manual reports strong interrater reliability figures drawn from validation studies: .89 for the total score and .78 to .88 for individual factors. These numbers reflect conditions in which raters receive standardized PCL-R training, score independently, and often have their work checked against a criterion. In research contexts, raters who are not reliable are retrained or replaced. The incentive structure favors accuracy.

Field conditions introduce variables that no research protocol can fully replicate. Forensic psychologists working in practice may score the PCL-R under time pressure, with variable access to collateral records, and in the context of adversarial proceedings where the retaining party has a stake in the outcome. A study of PCL-R scoring in a Belgian forensic psychiatric sample found an ICC of just .42 for total scores when the same individuals were scored by different raters across prison and hospital settings. That figure falls well below what is considered adequate for an instrument used in high-stakes decision-making.

This pattern is visible in multiple studies. An analysis of 558 offenders evaluated for sexually violent predator civil commitment in Texas found that approximately 32% of the variance in PCL-R total scores was attributable to differences among evaluators rather than differences among the offenders themselves. This discrepancy held even after controlling for offenders' self-reported antisocial traits using the Personality Assessment Inventory. Factor and facet scores showed similar evaluator-driven variability, with Facet 3 (Lifestyle) showing the largest proportion of variance attributable to the evaluator. For a forensic psychologist conducting a violence risk assessment, these findings mean that the person doing the scoring may matter nearly as much as the person being scored.
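
To make the variance-attribution idea concrete, the sketch below estimates what share of score variance is attributable to evaluators using a one-way random-effects ANOVA decomposition (the ICC(1) logic). The evaluator names and scores are invented for illustration; the printed percentage reflects only this toy data, not the Texas study's figure.

```python
# Illustrative sketch: how much PCL-R score variance sits "between
# evaluators" rather than "between offenders"? Hypothetical data only.
from statistics import mean

# scores[evaluator] = PCL-R totals that evaluator assigned (invented)
scores = {
    "evaluator_A": [31, 25, 34, 28],
    "evaluator_B": [25, 19, 28, 22],
    "evaluator_C": [28, 22, 31, 25],
}

groups = list(scores.values())
k = len(groups)                      # number of evaluators
n = len(groups[0])                   # cases per evaluator (balanced design)
grand = mean(x for g in groups for x in g)

# Between-evaluator and within-evaluator mean squares
ss_between = n * sum((mean(g) - grand) ** 2 for g in groups)
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
ms_between = ss_between / (k - 1)
ms_within = ss_within / (k * (n - 1))

# Variance component for evaluators, and their share of total variance
var_evaluator = max(0.0, (ms_between - ms_within) / n)
evaluator_share = var_evaluator / (var_evaluator + ms_within)
print(f"Share of variance attributable to evaluators: {evaluator_share:.0%}")
```

A share near zero would mean evaluators are interchangeable; the larger the share, the more the identity of the scorer, rather than the person scored, drives the number.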

How Can Adversarial Allegiance Compound Scoring Variability in Violence Risk Assessments?

Field reliability challenges with the PCL-R become especially acute in adversarial legal proceedings. Research by Murrie, Boccaccini, and colleagues has documented what they term "adversarial allegiance," the tendency for forensic experts' scores to shift in the direction favored by the retaining party. In an early field study examining sexually violent predator trials, opposing evaluators who scored the same offenders produced an ICC of just .39 for PCL-R total scores, and score differences were consistently in the direction that supported the party who retained them.

The effect sizes are striking. In adversarial SVP proceedings, prosecution-retained experts assigned PCL-R scores averaging around 24, while defense-retained experts assigned scores averaging around 18 for the same individuals. A six-point swing can place the same individual on either side of the conventional diagnostic threshold of 30, with one evaluator scoring 27 and another 33, or shift a score that one evaluator would characterize as moderate into the high range. It can also alter the output of actuarial instruments that incorporate PCL-R scores, such as the VRAG, cascading the effect into broader violence risk assessment conclusions.

To rule out the possibility that these patterns merely reflected attorneys selecting experts with pre-existing scoring tendencies, Murrie and colleagues conducted a controlled experiment. They paid 108 forensic psychologists and psychiatrists to review identical offender case files but led some to believe they were consulting for the prosecution and others for the defense. Even under these conditions, participants who believed they were working for the prosecution assigned higher scores, while those who believed they were working for the defense assigned lower scores. The allegiance effect was stronger for the PCL-R than for the Static-99R, a more structured instrument requiring less subjective judgment. The finding suggests that the PCL-R's reliance on clinical inference, particularly for Factor 1 items involving interpersonal and affective traits, makes it especially susceptible to context effects.

Importantly, adversarial allegiance appears to be largely unconscious. Research on cognitive bias in forensic evaluations identifies multiple mechanisms through which retaining-party affiliation can shape scoring, including confirmation bias, in-group allegiance, and the tendency to seek information that supports a favored hypothesis. The implication for forensic psychologists is that good intentions alone do not protect against scoring drift. Structured debiasing strategies are necessary.

What Are the Benefits and Limits of PCL-R Training for Forensic Psychologists?

Formal PCL-R training workshops are widely recommended and, in some jurisdictions, practically required for evaluators who use the instrument. The evidence suggests that training does help. In the Texas SVP evaluator study, analyses limited to the 11 evaluators who reported completing a PCL-R training workshop showed less evidence of evaluator differences than analyses including all 14 evaluators. These results provide indirect support for the idea that formal training reduces, though does not eliminate, scoring variability.

Scoring variability does not vanish with experience; in fact, clinician experience and reliability can even be inversely related. A study comparing PCL-R scores assigned by trained graduate students (using file review alone) and licensed clinicians (using file review plus interview) found that the clinicians showed more variability in their scores than the graduate students, yet the clinicians' scores were stronger predictors of future violence. The researchers proposed that experienced evaluators may engage in more diagnostically skilled inquiry, picking up valid indicators that less experienced raters miss, while also being more affected by non-diagnostic information. This pattern complicates the assumption that experience and training uniformly improve reliability: they may improve validity while simultaneously introducing greater variability.

For forensic psychologists navigating these findings, the practical takeaway is that PCL-R training is necessary but not sufficient. Training establishes a baseline of competence in item scoring, familiarizes evaluators with common scoring pitfalls, and provides calibration against expert-scored cases. But training alone does not inoculate against the contextual pressures, cognitive biases, and interpretive judgment calls that drive field reliability problems. Ongoing calibration, whether through peer consultation, scoring comparisons with colleagues, or periodic refresher training, is part of what sustained proficiency in violence risk assessment requires.

What Are Some Practical Strategies for Defensible PCL-R Use in Violence Risk Assessments?

Given the documented gap between research and field reliability, forensic psychologists who use the PCL-R in violence risk assessment should adopt practices that acknowledge uncertainty, promote transparency, and reduce the influence of contextual bias. Several strategies are supported by the research literature.

Report confidence intervals and the standard error of measurement. The PCL-R's standard error of measurement means that any individual score represents a range rather than a precise value. Forensic psychology researchers including DeMatteo, Hart, and Heilbrun have emphasized that interrater reliability is a property of scores obtained in a particular context, not a stable property of the test itself. Reporting scores with confidence intervals communicates appropriate uncertainty to courts and reduces the risk that a single number will be treated as more precise than the science supports.
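
The arithmetic behind this recommendation is simple: the interval is the score plus or minus z times the SEM. The sketch below assumes an illustrative SEM of 3.0 points; in practice, the SEM reported for the relevant normative reference group should be used.

```python
# Sketch: reporting a PCL-R total as a range rather than a bare number.
# The SEM value of 3.0 is an illustrative assumption, not the manual's
# figure for any particular reference group.

def score_interval(total: float, sem: float = 3.0, z: float = 1.96):
    """Return (lower, upper) bounds of a z-level confidence interval."""
    margin = z * sem
    # PCL-R totals are bounded between 0 and 40
    return max(0.0, total - margin), min(40.0, total + margin)

low, high = score_interval(27)
print(f"PCL-R total = 27, 95% CI ~ [{low:.1f}, {high:.1f}]")
```

With these assumptions, a total of 27 spans roughly 21 to 33, an interval that straddles the conventional diagnostic threshold of 30, which is exactly the uncertainty a bare score conceals.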

Report factor and facet scores, not just totals. Research consistently shows that Factor 2 (Lifestyle/Antisocial Behavior) scores are more reliable and more predictive of recidivism than Factor 1 (Interpersonal/Affective) scores. Yet Factor 1 traits tend to exert a disproportionate influence on decision-makers' perceptions of risk. Providing the full scoring profile, including factor and facet breakdowns, allows consumers of the report to weigh the more reliable and predictive components appropriately. Some researchers have suggested that evaluators consider not reporting Factor 1 scores at all unless there is a compelling reason for their inclusion, given their lower reliability and weaker predictive validity.

Implement debiasing strategies to counter adversarial allegiance. The research on adversarial allegiance is not a reason to stop using the PCL-R, but it is a reason to build structural safeguards into one's practice. Recommended approaches include the "consider-the-opposite" technique, in which evaluators deliberately ask themselves what reasons might support an alternative conclusion. Forensic psychologists can also maintain personal databases of their own scoring patterns and compare their mean scores across cases where they were retained by the prosecution versus the defense. When systematic patterns emerge, they signal a need for recalibration. Where possible, blinding oneself to the referral source during the initial scoring phase can reduce the pull of allegiance effects.
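
The self-audit idea above can be as simple as a running log of one's own totals keyed to the retaining party, with the means compared periodically. The sketch below uses invented records; a spreadsheet or case-management export would serve equally well.

```python
# Sketch of a personal scoring-pattern audit. All records are invented
# for illustration; "retained_by" and "pcl_r_total" are assumed field
# names, not part of any standard schema.
from statistics import mean

case_log = [
    {"retained_by": "prosecution", "pcl_r_total": 29},
    {"retained_by": "prosecution", "pcl_r_total": 31},
    {"retained_by": "defense", "pcl_r_total": 22},
    {"retained_by": "defense", "pcl_r_total": 25},
    {"retained_by": "prosecution", "pcl_r_total": 27},
    {"retained_by": "defense", "pcl_r_total": 24},
]

by_party = {}
for case in case_log:
    by_party.setdefault(case["retained_by"], []).append(case["pcl_r_total"])

means = {party: mean(totals) for party, totals in by_party.items()}
gap = means["prosecution"] - means["defense"]
print(f"Mean PCL-R total by retaining party: {means}; gap = {gap:+.1f}")
```

A persistent gap in the direction of the retaining party is the signal for recalibration that the adversarial-allegiance research warns about; a gap near zero across many cases is modest evidence of calibration.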

Use the PCL-R as one component of a comprehensive assessment. The PCL-R was designed as a measure of psychopathic personality traits. It was not designed as a standalone violence risk assessment instrument, although it has been widely used as one. Research has repeatedly emphasized that violence risk assessment is context-specific and that no single instrument captures the full picture. Structured professional judgment tools such as the HCR-20 V3 offer a broader framework that incorporates dynamic risk factors, situational variables, and risk management planning. The HCR-20 V3 guides professionals through the conceptualization of violence with an emphasis on intervention and risk management rather than prediction alone. Using the PCL-R alongside SPJ tools and actuarial instruments provides convergent data points and reduces overreliance on any single score.

Prepare for cross-examination on reliability limitations. Forensic psychologists who testify about PCL-R results should expect opposing counsel to raise the field reliability literature. Rather than treating this challenge as a threat to credibility, evaluators can strengthen their testimony by demonstrating awareness of these limitations and explaining the steps they took to mitigate them. Defensible forensic assessments depend on the use of established methods applied with transparency about their boundaries. An evaluator who can articulate why field reliability differs from research reliability, and what they did about it, is in a stronger position than one who claims the manual's ICC values represent the reliability of their own scores.

Conclusion

The PCL-R remains a valuable instrument for assessing psychopathic traits, and psychopathy remains a relevant construct in violence risk assessment. But the instrument's strong research reliability does not automatically transfer to the courtroom or the forensic hospital. Forensic psychologists who use the PCL-R in high-stakes evaluations bear the responsibility of bridging that gap through rigorous training, transparent reporting, structured debiasing, and integration with broader assessment frameworks. The field reliability challenges documented in the research are not reasons to abandon the PCL-R. They are reasons to use it with the care, humility, and methodological discipline that high-stakes forensic evaluations demand.
