Annual Crime and Justice Forum webinar 23 February 2022 - workshop 2A
1. Workshop 2A:
Using natural language
processing tools in crime
statistics
Welcome
Billy Gazard
Centre for Crime and Justice
Office for National Statistics
@ONSfocus #ONSCrimeJustice
2. Agenda
14:15 to 14:20 – Welcome, Billy Gazard, Centre for Crime and Justice, ONS
14:20 to 14:35 – Kevin Smith – Crime Analysis Unit, Home Office
14:35 to 14:50 – Dr Angus Roberts and Dr Giouliana Kadra – King's College London
14:50 to 15:15 – Discussant Billy Gazard, Centre for Crime and Justice, ONS
@ONSfocus #ONSCrimeJustice
3. The National Data Quality Improvement Service - NDQIS
OFFICIAL SENSITIVE
4. Police recorded crime and ‘flags’ – background
The main police recorded crime collection is offence-based: police forces notify the Home Office each
month of how many crimes were recorded, by offence. For example, in month X there were 15
personal robbery offences, 2 attempted murders, and so on.
In order to get more information about these offences, the Home Office also collects supplementary
information via so-called ‘flagged’ collections – where the police add information to a crime record.
05/04/2022 OFFICIAL SENSITIVE
‘Flagged’ collections
However, these collections tend to be reliant
on an officer or police staff correctly tagging
an offence with the appropriate marker.
We know this doesn’t always happen, and
that how well these flags are applied varies
between police forces.
So flagged collections tend to be an
undercount of the true picture.
Examples of flagged collections:
• Offences involving knives
• Hate crime
• Domestic abuse
• Metal theft
• Online crime
• Offences involving corrosive substances
• So-called honour-based abuse
• Child sexual abuse / exploitation
5. So what is NDQIS?
The National Data Quality Improvement Service – NDQIS – is a tool to improve the quality of these
flagged data by removing the reliance on police staff manually adding the flags.
There are three broad aims of NDQIS:
• To improve data quality
• To increase comparability of data between forces
• To reduce burden on the forces
This project is simply about improving police recorded crime data quality.
NDQIS DOES NOT tell forces what crimes they should be investigating / how they should be
investigated, or what outcome should be assigned to an offence.
6. How does NDQIS work?
The NDQIS software looks at fields held within the Record Management System (RMS) in each force,
such as MO Text and Occurrence Summary.
Other fields are also examined – depending on the collection.
Crime records are examined using semantic analysis, drawing on a bespoke data dictionary for the
collection and on the information from the extra fields.
Each crime record is then assigned a confidence rating:
• High Confidence – NDQIS determines that the record meets the criteria (i.e. involved a knife)
• Low Confidence – NDQIS cannot determine for sure whether the record meets criteria – so record
needs to be manually reviewed / left as original decision in the force (depending on collection).
• Rejected – NDQIS determines that criteria not met (i.e. did not involve a knife).
Once the data are processed, they are supplied to the Home Office.
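The dictionary lookup and three-way confidence rating described above can be sketched as a simple rule-based classifier. This is a hypothetical illustration, not the actual NDQIS software: the dictionary terms, the example record texts and the matching logic are all invented for the example.

```python
# Hypothetical illustration of the confidence-rating step (not the actual
# NDQIS software): scan free-text record fields against a small "data
# dictionary" and assign one of the three NDQIS ratings.
import re

# Invented dictionary entries for a knife-enabled crime collection.
STRONG_TERMS = [r"\bstabbed\b", r"\bknife\b", r"\bmachete\b"]
AMBIGUOUS_TERMS = [r"\bblade\b", r"\bsharp\b"]    # needs manual review
EXCLUSION_TERMS = [r"\bknife crime leaflet\b"]    # known false positives

def classify(record_text: str) -> str:
    """Return an NDQIS-style confidence rating for one crime record."""
    text = record_text.lower()
    if any(re.search(p, text) for p in EXCLUSION_TERMS):
        return "Rejected"
    if any(re.search(p, text) for p in STRONG_TERMS):
        return "High Confidence"
    if any(re.search(p, text) for p in AMBIGUOUS_TERMS):
        return "Low Confidence"
    return "Rejected"

print(classify("Victim was stabbed with a knife during a robbery"))
# High Confidence
print(classify("Suspect seen carrying a blade"))   # Low Confidence
```

In practice a record rated Low Confidence would be routed for manual review, or left with the force's original decision, as on the slide.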
7. NDQIS – offences involving knives or sharp instruments
The NDQIS concept was first tested on the offences involving knives or sharp instruments
collection (knife-enabled crime) – a high profile data collection and of key ministerial interest.
A data dictionary was created for the collection and tested on synthetic and then real data from forces.
Home Office statisticians audited thousands of records to ensure NDQIS was working. After the second
iteration of the ruleset, for 12 forces:
• We agreed with the High Confidence classification 98% of the time
• We agreed with the Rejected classification 99% of the time
• Around a quarter of records were assigned to Low Confidence – for manual review. Of these, 51%
were knife crimes and 49% were not.
Home Office, police forces, the National Police Chiefs’ Council and Office for National Statistics agreed
to proceed to implementation.
8. NDQIS knife-enabled crime results
NDQIS data published for 33 forces in January 2022.
These forces account for 9 in 10 knife offences recorded.
Nationally, offences for 2019/20 were 11% higher under the new method and guidance.
Increases due to the new methodology identifying more knife crimes were offset by changes in
coverage.
However, at the force level there were some big changes in 2019/20 levels:
• West Midlands 46% higher
• South Yorkshire 66% higher
• Metropolitan 8% lower
[Charts: knife-enabled offence counts under the NDQIS method versus the old method, for West
Midlands, South Yorkshire, the Metropolitan Police and England and Wales, 2010/11 to 2019/20.]
9. NDQIS – offences involving knives or
sharp instruments
Impact can be seen on the rates per population.
West Midlands now the PFA with the highest rate per
population.
South Yorkshire moved from 13th to 4th (next to West
Yorkshire Police).
Kent the biggest mover – from 35th to 12th.
Lancashire has fallen from 11th to 21st.
Rates per population by PFA, ranked highest to lowest:

Pre-NDQIS | Post-NDQIS
Metropolitan Police 179 | West Midlands 172
Greater Manchester 131 | Metropolitan Police 165
West Midlands 118 | Cleveland 125
West Yorkshire 104 | South Yorkshire 115
Merseyside 100 | Greater Manchester 113
Bedfordshire 97 | Merseyside 109
Cleveland 92 | West Yorkshire 109
Northamptonshire 85 | Essex 98
Derbyshire 83 | Bedfordshire 97
Leicestershire 81 | Northamptonshire 96
Lancashire 76 | Humberside 94
Sussex 73 | Kent 88
South Yorkshire 69 | Cambridgeshire 86
Humberside 69 | Derbyshire 81
Nottinghamshire 67 | Leicestershire 78
South Wales 66 | Avon and Somerset 78
Cambridgeshire 66 | Nottinghamshire 77
Thames Valley 66 | Hertfordshire 77
Hertfordshire 62 | Sussex 67
Essex 61 | Hampshire 67
Warwickshire 61 | Lancashire 65
West Mercia 57 | Thames Valley 65
Avon and Somerset 56 | Warwickshire 61
Northumbria 55 | Cheshire 61
Staffordshire 53 | South Wales 61
Gloucestershire 51 | Norfolk 59
Norfolk 49 | Suffolk 57
Lincolnshire 46 | West Mercia 57
Suffolk 43 | Lincolnshire 56
Cumbria 42 | North Wales 55
Dyfed-Powys 41 | Northumbria 55
North Yorkshire 40 | Staffordshire 53
Wiltshire 40 | Durham 52
North Wales 40 | Gloucestershire 51
Kent 39 | Devon and Cornwall 44
Hampshire 38 | Surrey 42
Surrey 38 | Cumbria 41
Cheshire 38 | North Yorkshire 40
Gwent 37 | Wiltshire 40
Dorset 35 | Gwent 37
Devon and Cornwall 29 | Dorset 35
Durham 26 | Dyfed-Powys 34
10. NDQIS – Next steps
We’re now testing NDQIS on three further “flagged” collections:
• Domestic abuse
• Child sexual abuse / exploitation
• Hate crime
Different fields required for different collections. Domestic abuse also requires age of victim / offender
and relationship between victim and offender.
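A minimal sketch of what an extended record might look like for the domestic abuse collection. The field names and layout here are hypothetical illustrations; the slides do not show the real NDQIS or RMS schema.

```python
# Hypothetical field layout (not the NDQIS schema): the extra record
# fields a domestic abuse collection might need alongside the free-text
# fields used for the knife collection.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CrimeRecord:
    mo_text: str                # free text examined by semantic analysis
    occurrence_summary: str
    victim_age: Optional[int] = None        # extra fields for the
    offender_age: Optional[int] = None      # domestic abuse collection
    victim_offender_relationship: Optional[str] = None  # e.g. "partner"

record = CrimeRecord(
    mo_text="Suspect assaulted partner at home address",
    occurrence_summary="Domestic incident",
    victim_age=34,
    offender_age=36,
    victim_offender_relationship="partner",
)
print(record.victim_offender_relationship)
```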
Issues raised by forces around safeguarding / investigation of domestic abuse / child sexual abuse
offences.
These collections discussed at NDQIS Steering Group yesterday.
11. Using natural language processing to extract and
classify instances of interpersonal violence in
mental healthcare electronic records
Dr Giouliana Kadra-Scalzo and Dr Angus Roberts
Institute of Psychiatry, Psychology and Neuroscience, King’s College London
12. Background
• People with mental illness are more likely to experience violent victimisation than the
general population: 15–45% of female patients report victimisation in the past year, and
40–90% report lifetime victimisation.
• Similar patterns have been observed for domestic violence, sexual
violence, violence perpetration, and witnessing violence.
• There is potential to use electronic health records kept by mental health services to study this.
13. King’s College London (KCL)
Coverage – Lambeth, Southwark, Lewisham, Croydon
Base population – c. 1.4m
Records – since 2007 (updated every 24 hours)
EHRs – c. 500,000
Approvals: Oxford Research Ethics Committee C (reference 08/H606/71+5)
Clinical services – specialist MH Trust
• CAMHS
• General adult psychiatry
• Older adult services
• Learning difficulties
• Addictions
• National
• IAPT
• Forensic
South London and Maudsley (SLAM)
SLAM Biomedical Research Centre (BRC)
15. Method
Keywords
Nouns
% abus%
% attack%
% beat%
% violenc%
% hit%
% rape%
% assault%
Verbs:
% fight%
% fought%
% slap%
% chok%
% push%
% punch%
% strangul%
% strangl%
% threw%
% struck%
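The `%` wildcards above are SQL LIKE-style patterns: `% abus%` matches any text containing a word beginning with the stem "abus". A sketch of how such stems could be applied to free text in Python, assuming case-insensitive matching on word boundaries (the exact query the authors ran is not shown on the slide):

```python
# Sketch: applying SQL-LIKE keyword stems (e.g. "% abus%") to clinical
# free text to pull candidate sentences for annotation.
import re

# Stems taken from the slide's keyword list.
KEYWORD_STEMS = ["abus", "attack", "beat", "violenc", "hit", "rape",
                 "assault", "fight", "fought", "slap", "chok", "push",
                 "punch", "strangul", "strangl", "threw", "struck"]

# "% abus%" becomes the regex \babus: a word boundary followed by the stem.
PATTERN = re.compile(r"\b(" + "|".join(KEYWORD_STEMS) + r")", re.IGNORECASE)

def has_violence_keyword(sentence: str) -> bool:
    """True if the sentence contains any keyword stem."""
    return PATTERN.search(sentence) is not None

print(has_violence_keyword("They were abused in their childhood"))  # True
print(has_violence_keyword("No concerns raised at review"))         # False
```

Stem matching is deliberately over-inclusive (it retrieves candidates for human annotation), which is why the annotation and NLP classification steps that follow are needed.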
Figure 1. The process of annotation, development and evaluation of NLP models
16. Method
Table 1: Examples of text fragments extracted for annotation in this study, alongside corresponding
labels and assigned annotations.

Example of text fragment | Label | Annotation
"They were abused in their childhood" | Violence presence; victim | Affirmed
"Patient used to hit her partner" | Violence presence; perpetrator; physical, domestic | Affirmed
"Patient stabbed his roommate" | Violence presence; perpetrator; physical, domestic | Affirmed
"Expressed a lot of interest in violence, nazism" | Violence presence | Irrelevant
"No violence or aggression noted" | Violence presence | Negated

Possible labels:
• violence presence – affirmed, negated or irrelevant
• patient status – victim, perpetrator and/or witness
• violence type – domestic, physical, sexual
17. NLP model development
• We used a pre-trained BioBERT model and fine-tuned it on the annotated dataset. Three datasets
were used, each generated independently (ensuring no overlap):
1) model training and testing (development stage, 3,771 sentences);
2) model fine-tuning (1,411 sentences);
3) blind testing (100 newly annotated sentences not used for training or fine-tuning).
• Our aim was to produce a binary classification model for each of the seven annotated labels.
• We evaluated the models with 10-fold cross-validation, using 10% of annotated text extracts
for testing and 90% for training in each fold.
• We report standard markers of NLP performance: precision (positive predictive value), recall
(sensitivity) and F1 score (the harmonic mean of precision and recall), using weighted averages to
take into account the dataset's imbalance (i.e. differing numbers of extracts generated for each
keyword). Reported scores are the mean across the 10 test sets.
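The weighted averaging described above can be sketched as follows. The per-label scores and support counts are invented for illustration; only the 3,771-extract total matches the slides.

```python
# Sketch (my illustration, not the authors' code): weighted-average
# metric across labels, weighting each label's score by its number of
# text extracts to reflect the dataset's imbalance.

def weighted_average(scores, supports):
    """Average per-label scores weighted by label support (extract counts)."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total

# Invented per-keyword F1 scores and extract counts (sum to 3,771).
f1_scores = [0.95, 0.88, 0.91]
supports = [2000, 1200, 571]

print(round(weighted_average(f1_scores, supports), 3))  # 0.922
```

The effect is that labels with many extracts dominate the headline score, which is the intended correction for differing extract counts per keyword.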
18. Results
Table 2. NLP model performances on the training and testing dataset (3,771 text extracts), and on
the blind test set with a 90% probability threshold (100 sentences), for the six labels.

• Inter-annotator agreement was high: 82–96% (Cohen's kappa 60–85%) for the six annotation
labels.
• For one annotation label (witness) we were unable to generate a model due to insufficient
data.

Annotation label | Precision (training) | Recall (training) | F1-score (training) | F1-score (blind test)
Violence presence | 93% | 93% | 93% | 95%
Patient status: Perpetrator | 89% | 89% | 89% | 85%
Patient status: Victim | 91% | 89% | 91% | 90%
Violence type: Domestic | 94% | 94% | 94% | 93%
Violence type: Physical | 91% | 92% | 91% | 98%
Violence type: Sexual | 98% | 97% | 97% | 93%
(Training scores are averages over 10-fold cross-validation.)
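Cohen's kappa, quoted above for inter-annotator agreement, corrects raw percentage agreement for the agreement two annotators would reach by chance. A minimal sketch with invented counts, chosen so that raw agreement is 85% (within the slide's 82–96% range):

```python
# Sketch: Cohen's kappa for two annotators on one binary label, from a
# 2x2 agreement table. Counts below are invented for illustration.

def cohens_kappa(both_yes, both_no, a_yes_b_no, a_no_b_yes):
    n = both_yes + both_no + a_yes_b_no + a_no_b_yes
    p_observed = (both_yes + both_no) / n
    # Chance agreement from each annotator's marginal "yes" rate.
    a_yes = (both_yes + a_yes_b_no) / n
    b_yes = (both_yes + a_no_b_yes) / n
    p_chance = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (p_observed - p_chance) / (1 - p_chance)

# 85% raw agreement on 100 items gives kappa ~0.70 here.
print(round(cohens_kappa(40, 45, 8, 7), 2))
```

This is why the slide reports both figures: 82–96% raw agreement shrinks to 60–85% once chance agreement is removed.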
20. Future work: VISION – violence, health and society
£7.1m over 5 years. Project lead: Walby. NLP expertise.

[Diagram: structure of the VISION programme.
Stages: (1) Coordination and theory of change; (2) Improve measurement; (3) Integrate and link
data; (4) Connections and causal pathways; (5) Evaluation and cost-benefit.
Workstreams and universities: Health (Bristol, City, KCL); Justice (City, Lancaster, UCL);
Specialised services (City, Warwick); Inequalities (City, Lancaster); Integrated theory and data
(Bristol, City, Lancaster, UCL, Warwick).
Data providers include: ONS (CSEW), CRIS, PHW, NHSD (APMS), police, MoJ (PNC), solicitors,
NCDV, Rape Crisis, Women's Aid, Refuge, Safe Lives, Respect, Imkaan, UNODC, WB, OECD, WHO,
UK BHPS.
Users and advisors, spanning health services, UK government, justice, the third sector and
international bodies: Imkaan, Inquest, Council of Europe, Lancashire Const., MoJ, ONS, MHCLG,
Home Office, DHSC, PHE, VAMHNW, Monash GREVIO, Respect, Mind, Samaritans, Women's Budget
Group, Anti-slavery Commissioner's Office, National Police Chiefs' Council.]
22. Key References
• Khalifeh H, Moran P, Borschmann R, et al. Domestic and sexual violence against patients with severe mental illness. Psychol Med 2015;45:875–86.
doi:10.1017/S0033291714001962
• Khalifeh H, Johnson S, Howard LM, et al. Violent and non-violent crime against adults with severe mental illness. British Journal of Psychiatry
2015;206:275–82. doi:10.1192/bjp.bp.114.147843
• Fazel S, Gulati G, Linsell L, et al. Schizophrenia and violence: Systematic review and meta-analysis. PLoS Medicine 2009;6.
doi:10.1371/journal.pmed.1000120
• Kadra G, Dean K, Hotopf M, et al. Investigating Exposure to Violence and Mental Health in a Diverse Urban Community Sample: Data from the South
East London Community Health (SELCoH) Survey. PLOS ONE 2014;9:e93660. doi:10.1371/journal.pone.0093660
• Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics
2020;36:1234–40. doi:10.1093/bioinformatics/btz682
• Mascio A, Kraljevic Z, Bean D, et al. Comparative Analysis of Text Classification Approaches in Electronic Health Records. arXiv:2005.06624 [cs].
Published Online First: 8 May 2020. http://arxiv.org/abs/2005.06624 (accessed 18 Jan 2021).
• Jackson RG, Patel R, Jayatilleke N, et al. Natural language processing to extract symptoms of severe mental illness from clinical text: the Clinical Record
Interactive Search Comprehensive Data Extraction (CRIS-CODE) project. BMJ Open 2017;7:e012012. doi:10.1136/bmjopen-2016-012012
• Perera G, Broadbent M, Callard F, et al. Cohort profile of the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM
BRC) Case Register: current status and recent enhancement of an Electronic Mental Health Record-derived data resource. BMJ Open 2016;6:e008721.
doi:10.1136/bmjopen-2015-008721
• Clinical Record Interactive Search (CRIS). https://www.slam.nhs.uk/quality-and-research/clinical-record-interactive-search-cris/ (accessed 2 Feb 2021).