Assessing Intimate Partner Abuse Risk in Female Perpetrators

In This Post

Featured Article
Article Title
Authors
Abstract
Keywords
Summary of Research
Translating Research into Practice
Other Interesting Tidbits for Researchers and Clinicians

Featured Article

Psychology, Crime & Law | 2026, Vol. 32, No. 2, p. 367-393

Article Title

Assessing risk among women who perpetrate intimate partner abuse

Authors

Julia Nazarewicz; Centre for Forensic Behavioural Science, Swinburne University of Technology and the Victorian Institute for Forensic Mental Health (Forensicare), Melbourne, Australia

Michael D. Trood; Centre for Forensic Behavioural Science, Swinburne University of Technology and the Victorian Institute for Forensic Mental Health (Forensicare), Melbourne, Australia

Troy E. McEwan; Centre for Forensic Behavioural Science, Swinburne University of Technology and the Victorian Institute for Forensic Mental Health (Forensicare), Melbourne, Australia; Centre for Education and Research in Forensic Psychology, University of Kent, Canterbury, UK

Susanne Strand; Centre for Forensic Behavioural Science, Swinburne University of Technology and the Victorian Institute for Forensic Mental Health (Forensicare), Melbourne, Australia; School of Behavioural, Social and Legal Sciences, Örebro University, Örebro, Sweden

Stefan Luebbers; Centre for Forensic Behavioural Science, Swinburne University of Technology and the Victorian Institute for Forensic Mental Health (Forensicare), Melbourne, Australia; Youth Forensic Specialist Service, Alfred Health, Melbourne, Australia

Benjamin L. Spivak; Centre for Forensic Behavioural Science, Swinburne University of Technology and the Victorian Institute for Forensic Mental Health (Forensicare), Melbourne, Australia

Abstract

This paper presents a prospective evaluation of the predictive validity of three risk assessment instruments in a sample of Australian women identified by police as intimate partner abuse (IPA) perpetrators. Using a subsample from Spivak et al. (2020), 410 female IPA perpetrators were screened using the Victoria Police Screening Assessment for Family Violence Risk (VP-SAFvR) and evaluated alongside two samples of 60 and 229 female IPA perpetrators assessed using the Brief Spousal Assault Form for the Evaluation of Risk (B-SAFER) and a modified version of the Lethality Screen respectively. Of the three instruments, the VP-SAFvR possessed indicators of effective discrimination (i.e. sensitivity, specificity, area under the curve) and predictive validity (i.e. positive predictive value, negative predictive value) on general IPA recidivism and its intended outcome of family or intimate partner abuse. The B-SAFER risk judgement similarly predicted its intended outcome of physical IPA recidivism, with notable indicators of discrimination and predictive validity. The results of the Modified Lethality Screen were conversely mixed on measures of discrimination and prediction for its intended outcome of severe IPA. The current findings suggest that these instruments function consistently for women and men who are identified by police as perpetrating family or intimate partner abuse.

Keywords

Intimate partner abuse; actuarial assessment; structured professional judgement; violence risk assessment; female perpetrators

Summary of Research

Over the past two decades, “police have been increasingly expected to effectively assess and manage risks associated with intimate partner abuse (IPA),” often using structured risk assessment tools to guide decisions about future harm. While “multiple existing instruments have been shown to provide valid assessments” for male perpetrators, “it remains unclear whether existing IPA risk assessments… can accurately assess risk when the identified perpetrator is a woman.” At the same time, “more women than ever before are being identified as perpetrators of IPA,” and there is “considerable evidence that some women do instigate or predominantly commit abuse.” Despite this, women have been largely excluded from validation research, leaving uncertainty about whether these tools generalize across genders. The present study, therefore, aimed to evaluate “the predictive validity of three police administered risk assessment instruments” in women identified as IPA perpetrators, examining their ability to predict different forms of recidivism and to determine whether these tools function effectively for this population (p. 367-370).

The study used police data from “two police divisions in metropolitan Melbourne,” where family violence incidents were recorded regardless of whether charges were laid. Three samples of female IPA perpetrators were analyzed: a VP-SAFvR sample (N = 410), a B-SAFER sample (N = 60), and a Modified Lethality Screen sample (N = 229), each followed for approximately 13–15 months. The VP-SAFvR is described as “an actuarial screening tool… to identify which FV cases are at increased risk of any future police reported family violence,” while the B-SAFER is a “15-item SPJ guideline designed to assist in the assessment and management of perpetrators… based on the risk of future spousal assault.” The Lethality Screen is an “11-question screening tool… intended to… estimate the risk of future severe or lethal partner violence.” Outcomes included general IPA recidivism, physical IPA recidivism, and severe IPA recidivism. Analyses focused on discrimination and predictive validity using ROC curves, sensitivity, specificity, and predictive values, as well as examining associations between individual risk factors and outcomes (p. 371-376).
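
The discrimination and predictive-validity indices named above can be made concrete with a small sketch. The Python below is illustrative only: the class and function names are hypothetical, the counts and scores used with it are invented for demonstration and are not the study's data, and the AUC is computed by simple pairwise comparison, which may differ from the software the authors used.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical 2x2 table for a 'high risk' flag against observed recidivism."""
    tp: int  # flagged high risk, recidivated
    fp: int  # flagged high risk, did not recidivate
    fn: int  # not flagged, recidivated
    tn: int  # not flagged, did not recidivate


def sensitivity(c: ConfusionCounts) -> float:
    # Proportion of recidivists who were flagged.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Proportion of non-recidivists who were not flagged.
    return c.tn / (c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    # Proportion of flagged cases that went on to recidivate.
    return c.tp / (c.tp + c.fp)


def npv(c: ConfusionCounts) -> float:
    # Proportion of unflagged cases that did not recidivate.
    return c.tn / (c.tn + c.fn)


def auc(recidivist_scores: Sequence[float],
        non_recidivist_scores: Sequence[float]) -> float:
    """Probability that a randomly chosen recidivist scores higher than a
    randomly chosen non-recidivist; ties count as half. Assumes both
    groups are non-empty."""
    wins = 0.0
    for r in recidivist_scores:
        for n in non_recidivist_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(recidivist_scores) * len(non_recidivist_scores))
```

With invented counts of 40 true positives, 30 false positives, 10 false negatives and 120 true negatives, sensitivity and specificity are both 0.80, PPV is about 0.57 and NPV about 0.92. An AUC of 0.50 corresponds to chance-level discrimination and 1.00 to perfect separation of the two groups.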

Across instruments, recidivism was relatively common, with “22.68%” of women in the VP-SAFvR sample engaging in general IPA recidivism and “40%” of the B-SAFER sample identified as recidivists. The VP-SAFvR showed that women who reoffended had “significantly higher” scores than those who did not, and total scores “effectively discriminated between female perpetrators who did and did not engage in general IPA recidivism” with a “moderate and significant effect size.” Sensitivity was high, indicating that many individuals who reoffended were correctly identified, while specificity was lower, reflecting some false positives. The B-SAFER results showed mixed performance across outcomes. It “was not effective at discriminating” general IPA recidivism, but for its intended outcome, physical IPA, “police judgements accurately categorised recidivists and non-recidivists 70% of the time,” with a “large and significant effect size.” The findings suggest that “the majority of general IPA recidivists… engaged in non-physical IPA,” which likely contributed to the tool’s weaker performance for broader outcomes. The Modified Lethality Screen demonstrated weaker performance overall, with “relatively lower” sensitivity and predictive power for general IPA, though it performed somewhat better for severe IPA, albeit with very low base rates (p. 376-380).

Overall, the findings indicate that “the B-SAFER and VP-SAFvR demonstrated good predictive and discriminant validity… for their associated outcomes,” whereas “the Modified Lethality Screen did not effectively predict future general IPA.” The VP-SAFvR showed “reasonable utility in discriminating between women with and without general IPA and FIPA outcomes,” suggesting that tools developed for broader definitions of family violence can generalize to female perpetrators. Similarly, the B-SAFER “may be effective in predicting physical IPA recidivism in women,” supporting its use across genders when focused on physical violence (p. 380-383).

At the same time, the results highlight important limitations in applying tools across different forms of abuse. The B-SAFER’s weaker performance for general IPA suggests “the instrument’s inability to identify non-physical IPA recidivism,” reflecting that women may engage more frequently in non-physical forms of abuse. The Modified Lethality Screen’s “relatively unreliable” performance underscores the difficulty of predicting “low base rate outcomes” like severe violence and suggests that risk factors for severe IPA “may differ between men and women” (p. 380-383).
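
The difficulty with low base rate outcomes follows directly from Bayes' rule, and a short calculation shows it. The sketch below is a hypothetical illustration: the function name, the fixed sensitivity and specificity of 0.80, and the two base rates are assumptions chosen for arithmetic clarity, not values reported in the study.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              base_rate: float) -> float:
    """Share of 'high risk' flags that are correct, via Bayes' rule."""
    true_positive_share = sensitivity * base_rate
    false_positive_share = (1.0 - specificity) * (1.0 - base_rate)
    return true_positive_share / (true_positive_share + false_positive_share)


if __name__ == "__main__":
    # Same hypothetical instrument, two hypothetical outcome base rates.
    for base_rate in (0.20, 0.02):
        value = positive_predictive_value(0.80, 0.80, base_rate)
        print(f"base rate {base_rate:.2f}: PPV {value:.3f}")
```

Holding the instrument's accuracy constant, PPV in this example falls from 0.50 at a 20% base rate to roughly 0.08 at a 2% base rate, so most flagged cases are false positives when the outcome is rare.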

More broadly, the study emphasizes that risk assessment tools vary in purpose and effectiveness depending on their design. Instruments like the VP-SAFvR may be useful for “high-volume” screening of general risk, while the B-SAFER may be more appropriate for “a more careful and targeted approach” to assessing physical violence among higher-risk individuals. The findings also suggest that “these tools can effectively predict differing forms of IPA involving women” when used within their intended context, but that continued research is needed to refine risk assessment approaches for female perpetrators (p. 380-383).

Translating Research into Practice

“The results of this research highlight the importance of the intended applications and outcomes of tools designed to assess the risk of future IPA. Of the instruments evaluated, the strongest results were found among those that were developed on representative samples or specifically for the prediction of more common forms of IPA. On one hand, the results of the VP-SAFvR illustrate the utility of developing risk assessment instruments to match a broad definition of IPA that constitutes both physical violence and psychological abuse. On the other, the B-SAFER’s results demonstrate how a more careful and targeted approach to risk assessment can be used within a tiered policing intervention to assess high-risk female perpetrators’ risk of physical violence with reasonable accuracy.

The present study therefore underscores how these tools could serve different functions within a broader response to IPA. An actuarial instrument like the VP-SAFvR can be applied by frontline officers to assess the risk of future IPA on a large scale, rapidly categorising those who are low-risk for future incidents of physical or psychological abuse within the same dyad or family unit whilst referring high-risk cases for further assessment and service linkage. Therefore, it complements a dual strategy of victim safety and perpetrator accountability in the context of a high-volume outcome. Conversely, the B-SAFER in the current study required additional resources (i.e. a specialist team and support from a clinician) so would be impractical for police officers to administer for every FVI call-out. Nevertheless, when applied to perpetrators already assessed as being at an elevated risk for IPA, the B-SAFER can provide a sound assessment of the risk of physical IPA to any victim. Hence, on the current results the B-SAFER may be more appropriate within a perpetrator accountability framework where there is an elevated risk of violence.

The poor predictive performance of the Modified Lethality Screen – a modified version of a tool developed on male samples for the prediction of severe/lethal violence – illustrates the pitfalls of predicting low base rate outcomes and suggests the need for further research to determine whether markers of severe IPA are consistent across men and women. This last point has been considered by other authors, who have highlighted how risk factors for IPA recidivism may differ between men and women (Henning et al., 2009), not only in content but the extent to which they are relevant to risk (De Vogel & Nicholls, 2016). It also reinforces the importance of validating and modifying risk assessment instruments for use across distinct populations.

On the other hand, this research highlights the utility of certain global factors for predicting IPA recidivism. That is, the VP-SAFvR and B-SAFER were designed for use with both sexes and have demonstrated here such tools can effectively predict differing forms of IPA involving women.

The results also echo those of another general tool, the DVSI-R, when predicting female to male IPV (AUC = .63, [.60, .66]) in a study by Gerstenberger et al. (2019). However, the DVSI-R possessed poor measures of discrimination when used to predict IPV among same-sex female couples (AUC = .56, [.47, .65]) despite same-sex females having significantly greater odds of re-arrest. Hence, the predictive utility of IPA risk factors might only generalise between men and women when the latter are abusive in heterosexual relationships. Furthermore, several non-gendered factors, such as items indicative of prior police contact for FV or health problems, were shown to meaningfully predict differing IPA outcomes within both the VP-SAFvR and B-SAFER samples. Nonetheless, the predictive ability of individual factors was not consistent across outcomes nor instruments and thereby directs further research to investigate whether and when specific factors predict future IPA, and to determine whether these differ across sexes.

The high rates of false positives observed among the severe IPA outcomes underline the importance of aligning police departments’ aims for risk assessment and the instrument’s rationale. The nature of low base rate outcomes means that instruments such as the Lethality Screen will inevitably misidentify a disproportionate number of individuals as high risk who will not go on to perpetrate severe and near lethal FV, even with further fine-tuning of their predictive and discriminant abilities (see Rosen, 1954; Trood et al., 2023). Indeed, the Lethality Screen was developed to ensure that people thought to be at increased risk of lethal/near lethal FV recidivism are connected to an advocate for safety planning and linkage to services, accepting that a large proportion of individuals who will not go on to perpetrate near lethal recidivism will also receive the intervention (Messing et al., 2017). It may not be appropriate for policing responses to perceived risk of lethal/near lethal violence to be directed with such imprecision, not only because of resource constraints but also because of the highly punitive and restrictive nature of many police interventions intended to prevent severe harm. The VP-SAFvR and B-SAFER results in the present study suggest that it may be possible to implement a more graduated police response to perpetrators with greater precision if a more general IPA outcome is assessed. This is consistent with the focussed deterrence strategy described by Sechrist and Weil (2018), though using a more complex method of categorising offenders than just police history of IPA. Of course, such a response would need to be in conjunction with social and health services where appropriate (Spivak et al., 2020).

In addition to using tools that consider the risks associated with IPA for both male and female perpetrators, the current study also supports the use of SPJ when assessing female IPA perpetration. Although the assessment of risk and identification of treatment needs among male and female offenders may be subject to bias (Coontz et al., 1994; Skeem et al., 2005), the results of the current study suggest police discretion in assessing the risk of female IPA perpetration may be particularly useful. Considering that the B-SAFER total scores only explained about 19% of the variance in the B-SAFER risk judgements, it’s likely the SPJ approach allowed police more flexibility in how they applied the risk factors and what risk judgements they made. This appears to be supported by existing research indicating that police consider additional factors alongside established risk markers when assessing women’s risk of IPA recidivism (Storey & Strand, 2012, 2013, 2017), suggesting opportunities to incorporate professional judgement may be particularly important in cases involving female perpetrators (see De Vogel & Nicholls, 2016). Interestingly, the current study found the B-SAFER case prioritisation judgement to be particularly useful in the prediction of future physical IPA. This might suggest that other factors external to the summative component of the tool may have contributed to police officers’ risk judgements or alternatively that they are giving stronger weight to certain items. Limited evidence from the broader literature indicates that risk judgements derived from SPJ instruments tend to outperform total scores from the same instruments in the prediction of general violence (see Guy et al., 2015). The present findings appear to support this trend and suggest professional judgements informed by scored instruments may be especially useful in the assessment of physical IPA in high-risk women.

We are hesitant to suggest any practical implications associated with the current results for the Lethality Screen, given the method by which it was administered in this study. However, we note that in the Australian context, the quite specific purpose of the Lethality Screen may mean that it is not suitable as a risk assessment instrument. Australian jurisdictions do not focus on physical violence when legally defining IPA, and in many jurisdictions police and legal responses relate to all types of FV, regardless of whether it involves physical abuse or an intimate partner” (p. 383-385).

Other Interesting Tidbits for Researchers and Clinicians

“Our results are limited by a low base rate severe IPA recidivism. This concern has been identified in forensic research on female populations more broadly (e.g. Helmus & Bourgon, 2011; Nicholls et al., 2013), and future research would ideally include measures with higher fidelity to recidivism base rates (e.g. multiple sources of administrative data and victim reports). This is an important point considering the socio-political context of IPA and female perpetration, and the factors that might influence identified recidivism rates. For instance, many IPA victims – especially men – do not report their victimisation experiences to the police (Archer, 2000; Cho & Wilke, 2010; Felson & Paré, 2005; MacQueen & Norris, 2016), often because of social stigma and gender stereotypes (Walker et al., 2018), which threatens the accuracy of female IPA perpetration and recidivism measurement when using formal data.

In a similar vein, women may not be as readily identified as perpetrators of IPA as men. Social constructions of gender and relationships perhaps skew the perceptions of police officers recording FVI cases, resulting in an ongoing male-perpetrator/female-victim bias (Henning et al., 2009; Walker et al., 2018) and the inaccurate recording of victims and perpetrators in ambiguous FVIs. During the collection of the VP-SAFvR and B-SAFER data, for example, police members described this being an issue when the perpetrator and victim roles were unclear, although it was not possible to control for or measure to what extent this occurred during the research period. Given there is evidence a significant proportion of couples also engage in bidirectional violence (i.e. where both parties are the perpetrator/victim, or have a history of swapping victim and perpetrator roles; Langhinrichsen-Rohling et al., 2012), police recording bias may be particularly problematic and result in low numbers of women accurately identified as perpetrators. Ergo, we recommend future research on frontline risk assessments additionally collect information that might reveal misidentification. For example, the instruments themselves could include a rating of the officer’s confidence they have identified the primary aggressor which may provide insights into when and how the misidentification of a perpetrator occurred.

The results of this study could equally be interpreted in the context of women being misidentified as perpetrators of abuse when they are the victims (Women’s Legal Service Victoria, 2015). While few studies have examined rates of misidentification at FVIs, the available evidence very tentatively suggests that up to 10% of female FV victims in Victoria are misidentified as respondents (i.e. aggressors) in police applications for Family Violence Intervention Orders (Ulbrick & Jago, 2018). If a large proportion of the samples examined here were misidentified as the perpetrator at index, then we might expect a drop in both the predictive and discriminant abilities of the instruments examined. Yet the VP-SAFvR and B-SAFER risk judgements in the current research produced reasonably strong measures of discriminant and predictive ability on measures of general and physical IPA recidivism among women respectively. Further, the VP-SAFvR’s discriminant and predictive ability parallels the same instrument’s performance among the broader population where most perpetrators are male (Spivak et al., 2023). Hence, the VP-SAFvR and B-SAFER appear to have withstood any possible drop in performance associated with misidentification of women as perpetrators and might only improve alongside a reduction in this phenomenon. Overall, the conclusions of this research are preliminary and require replication in larger samples of women who perpetrate IPA using a combination of police data and other measures.

Lastly, the present study was limited by its comparison of three different instruments on three different samples. Due to data availability, the Modified Lethality Screen could not be applied to every member of the VP-SAFvR sample while the B-SAFER was administered by police in a subset assessed as being at a higher recidivism risk. Ideally, all three instruments would be compared on the same sample of cases to remove the influence of sampling variation. This would also allow for a sharper contrast of the performance of included instruments against their differing aims. This limitation is a consequence of the field environment of this research, which also comes with considerable strengths, not least indicating the feasibility of use in practice of the VP-SAFvR and B-SAFER” (p. 385-386).