Workshop 2A: Using natural language processing tools in crime statistics
Welcome
Billy Gazard
Centre for Crime and Justice
Office for National Statistics
@ONSfocus #ONSCrimeJustice
Agenda
14:15 to 14:20 – Welcome, Billy Gazard, Centre for Crime and Justice, ONS
14:20 to 14:35 – Kevin Smith – Crime Analysis Unit, Home Office
14:35 to 14:50 – Dr Angus Roberts and Dr Giouliana Kadra-Scalzo – King’s College London
14:50 to 15:15 – Discussant Billy Gazard, Centre for Crime and Justice, ONS
The National Data Quality Improvement Service - NDQIS
OFFICIAL SENSITIVE
Police recorded crime and ‘flags’ – background
The main police recorded crime collection is offence-based. Each month, police forces notify the Home Office
how many crimes they recorded for each offence. For example, in month X there were 15 personal robbery
offences, 2 attempted murders, etc.
In order to get more information about these offences, the Home Office also collects supplementary
information via so-called ‘flagged’ collections – where the police add information to a crime record.
05/04/2022 OFFICIAL SENSITIVE
‘Flagged’ collections
However, these collections tend to be reliant on an officer or police staff correctly tagging an offence
with the appropriate marker.
We know this doesn’t always happen, and that how well these flags are applied varies between police forces.
So flagged collections tend to be an undercount of the true picture.
Flagged collections shown on the slide:
• Offences involving knives
• Hate crime
• Domestic abuse
• Metal theft
• Online crime
• Offences involving corrosive substances
• So-called honour based abuse
• Child sexual abuse / exploitation
So what is NDQIS?
The National Data Quality Improvement Service – NDQIS – is a tool to improve the quality of these
flagged data by removing the reliance on police staff manually adding the flags.
There are three broad aims of NDQIS:
• To improve data quality
• To increase comparability of data between forces
• To reduce burden on the forces
This project is simply about improving police recorded crime data quality.
NDQIS DOES NOT tell forces what crimes they should be investigating / how they should be
investigated, or what outcome should be assigned to an offence.
How does NDQIS work?
The NDQIS software looks at fields held within the Record Management System (RMS) in each force,
such as MO Text and Occurrence Summary.
Other fields are also examined – depending on the collection.
Crime records are examined using semantic analysis – using a bespoke data dictionary for the
collection, and using the information from extra fields.
Each crime record is then assigned a confidence rating:
• High Confidence – NDQIS determines that the record meets the criteria (e.g. involved a knife).
• Low Confidence – NDQIS cannot determine for sure whether the record meets the criteria, so the record
needs to be manually reviewed or left as the force’s original decision (depending on collection).
• Rejected – NDQIS determines that the criteria are not met (e.g. did not involve a knife).
Once data are processed, data are supplied to the Home Office.
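The three confidence ratings can be illustrated with a minimal Python sketch. This is not the NDQIS ruleset, which the slides do not describe in detail: the term lists, the three-token negation window and the function name `classify_record` are all assumptions made up for illustration.

```python
import re

# Hypothetical data dictionary for illustration only (NOT the real NDQIS ruleset).
WEAPON_TERMS = {"knife", "knives", "machete", "blade", "stabbed"}
AMBIGUOUS_TERMS = {"sharp", "cut", "weapon"}
NEGATION_TERMS = {"no", "not", "without"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify_record(mo_text: str, occurrence_summary: str) -> str:
    """Assign a confidence rating from two free-text fields."""
    tokens = tokenize(f"{mo_text} {occurrence_summary}")
    weapon_hits = [i for i, tok in enumerate(tokens) if tok in WEAPON_TERMS]
    # Assumed rule: a weapon term is treated as uncertain if a negation word
    # appears within the three tokens before it.
    negated = any(
        any(tok in NEGATION_TERMS for tok in tokens[max(0, i - 3):i])
        for i in weapon_hits
    )
    if weapon_hits and not negated:
        return "High Confidence"
    if weapon_hits or any(tok in AMBIGUOUS_TERMS for tok in tokens):
        return "Low Confidence"  # would go to manual review
    return "Rejected"


print(classify_record("Suspect threatened victim with a knife", ""))
print(classify_record("No knife seen", "victim reported a sharp object"))
print(classify_record("Shoplifting from store", "goods taken"))
```

A real ruleset would also draw on the extra fields mentioned above, which this sketch ignores.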
NDQIS – offences involving knives or sharp instruments
The NDQIS concept was first tested on the offences involving knives or sharp instruments
collection (knife-enabled crime) – a high profile data collection and of key ministerial interest.
A data dictionary was created for the collection and tested on synthetic and then real data from forces.
Home Office statisticians audited thousands of records to ensure NDQIS was working. After the second
iteration of the ruleset, for 12 forces:
• We agreed with the High Confidence classification 98% of the time
• We agreed with the Rejected classification 99% of the time
• Around a quarter of records were assigned to Low Confidence – for manual review. Of these, 51%
were knife crimes and 49% were not.
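Audit figures of this kind can be derived from a table of audited records, as in the Python sketch below. The record counts are invented to reproduce the percentages quoted above, and the function name `knife_share_by_rating` is hypothetical; the slides do not say how many records fell into each rating.

```python
from collections import defaultdict


def knife_share_by_rating(audit_rows):
    """Return, for each NDQIS rating, the share of audited records that the
    auditor judged to be knife crimes.

    audit_rows: iterable of (ndqis_rating, auditor_says_knife) pairs.
    """
    totals = defaultdict(int)
    knife = defaultdict(int)
    for rating, auditor_says_knife in audit_rows:
        totals[rating] += 1
        knife[rating] += int(auditor_says_knife)
    return {rating: knife[rating] / totals[rating] for rating in totals}


# Invented counts, chosen only so the shares match the slide.
rows = (
    [("High Confidence", True)] * 98 + [("High Confidence", False)] * 2
    + [("Rejected", False)] * 99 + [("Rejected", True)] * 1
    + [("Low Confidence", True)] * 51 + [("Low Confidence", False)] * 49
)
shares = knife_share_by_rating(rows)

# Agreement with High Confidence is the knife share; agreement with
# Rejected is the non-knife share.
print(f"High Confidence agreement: {shares['High Confidence']:.0%}")
print(f"Rejected agreement: {1 - shares['Rejected']:.0%}")
print(f"Low Confidence knife share: {shares['Low Confidence']:.0%}")
```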
Home Office, police forces, the National Police Chiefs’ Council and Office for National Statistics agreed
to proceed to implementation.
NDQIS knife-enabled crime results
NDQIS data published for 33 forces in January 2022.
These forces account for 9 in 10 knife offences recorded.
Nationally, offences were 11% higher on the new method / guidance
for 2019/20.
Increases due to new methodology identifying more knife crimes
offset by the changes in coverage.
HOWEVER – at the force level, some big changes in 2019/20 levels:
West Midlands 46% higher
South Yorkshire 66% higher
Metropolitan 8% lower
[Figure: charts comparing offences involving knives or sharp instruments under the old method and the NDQIS method for West Midlands, South Yorkshire, Metropolitan, and England and Wales; the time axis runs from 2010/11 to 2019/20.]
NDQIS – offences involving knives or
sharp instruments
Impact can be seen on the rates per population.
West Midlands is now the police force area (PFA) with the highest rate per population.
South Yorkshire moved from 13th to 4th (next to West Yorkshire Police).
Kent is the biggest mover, from 35th to 12th.
Lancashire has fallen from 11th to 21st.
Rate per population by police force area, ranked from highest to lowest, Pre-NDQIS and Post-NDQIS

Rank  Pre-NDQIS              Rate   Post-NDQIS             Rate
1     Metropolitan Police    179    West Midlands          172
2     Greater Manchester     131    Metropolitan Police    165
3     West Midlands          118    Cleveland              125
4     West Yorkshire         104    South Yorkshire        115
5     Merseyside             100    Greater Manchester     113
6     Bedfordshire           97     Merseyside             109
7     Cleveland              92     West Yorkshire         109
8     Northamptonshire       85     Essex                  98
9     Derbyshire             83     Bedfordshire           97
10    Leicestershire         81     Northamptonshire       96
11    Lancashire             76     Humberside             94
12    Sussex                 73     Kent                   88
13    South Yorkshire        69     Cambridgeshire         86
14    Humberside             69     Derbyshire             81
15    Nottinghamshire        67     Leicestershire         78
16    South Wales            66     Avon and Somerset      78
17    Cambridgeshire         66     Nottinghamshire        77
18    Thames Valley          66     Hertfordshire          77
19    Hertfordshire          62     Sussex                 67
20    Essex                  61     Hampshire              67
21    Warwickshire           61     Lancashire             65
22    West Mercia            57     Thames Valley          65
23    Avon and Somerset      56     Warwickshire           61
24    Northumbria            55     Cheshire               61
25    Staffordshire          53     South Wales            61
26    Gloucestershire        51     Norfolk                59
27    Norfolk                49     Suffolk                57
28    Lincolnshire           46     West Mercia            57
29    Suffolk                43     Lincolnshire           56
30    Cumbria                42     North Wales            55
31    Dyfed-Powys            41     Northumbria            55
32    North Yorkshire        40     Staffordshire          53
33    Wiltshire              40     Durham                 52
34    North Wales            40     Gloucestershire        51
35    Kent                   39     Devon and Cornwall     44
36    Hampshire              38     Surrey                 42
37    Surrey                 38     Cumbria                41
38    Cheshire               38     North Yorkshire        40
39    Gwent                  37     Wiltshire              40
40    Dorset                 35     Gwent                  37
41    Devon and Cornwall     29     Dorset                 35
42    Durham                 26     Dyfed-Powys            34
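The rank movements quoted for this slide can be checked from the two ranked lists above. The Python sketch below takes each force's rank from its position in the list as presented, so ties are resolved in slide order; the names `PRE_ORDER`, `POST_ORDER` and `rank_changes` are illustrative.

```python
# Forces in the order listed on the slide (highest rate first).
PRE_ORDER = [
    "Metropolitan Police", "Greater Manchester", "West Midlands", "West Yorkshire",
    "Merseyside", "Bedfordshire", "Cleveland", "Northamptonshire", "Derbyshire",
    "Leicestershire", "Lancashire", "Sussex", "South Yorkshire", "Humberside",
    "Nottinghamshire", "South Wales", "Cambridgeshire", "Thames Valley",
    "Hertfordshire", "Essex", "Warwickshire", "West Mercia", "Avon and Somerset",
    "Northumbria", "Staffordshire", "Gloucestershire", "Norfolk", "Lincolnshire",
    "Suffolk", "Cumbria", "Dyfed-Powys", "North Yorkshire", "Wiltshire",
    "North Wales", "Kent", "Hampshire", "Surrey", "Cheshire", "Gwent", "Dorset",
    "Devon and Cornwall", "Durham",
]
POST_ORDER = [
    "West Midlands", "Metropolitan Police", "Cleveland", "South Yorkshire",
    "Greater Manchester", "Merseyside", "West Yorkshire", "Essex", "Bedfordshire",
    "Northamptonshire", "Humberside", "Kent", "Cambridgeshire", "Derbyshire",
    "Leicestershire", "Avon and Somerset", "Nottinghamshire", "Hertfordshire",
    "Sussex", "Hampshire", "Lancashire", "Thames Valley", "Warwickshire",
    "Cheshire", "South Wales", "Norfolk", "Suffolk", "West Mercia", "Lincolnshire",
    "North Wales", "Northumbria", "Staffordshire", "Durham", "Gloucestershire",
    "Devon and Cornwall", "Surrey", "Cumbria", "North Yorkshire", "Wiltshire",
    "Gwent", "Dorset", "Dyfed-Powys",
]


def rank_changes(pre, post):
    """Map each force to its (pre_rank, post_rank), with ranks starting at 1."""
    return {force: (pre.index(force) + 1, post.index(force) + 1) for force in pre}


changes = rank_changes(PRE_ORDER, POST_ORDER)
# Largest rise = largest drop in rank number.
biggest_riser = max(changes, key=lambda f: changes[f][0] - changes[f][1])

print(biggest_riser, changes[biggest_riser])
print("South Yorkshire", changes["South Yorkshire"])
print("Lancashire", changes["Lancashire"])
```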
NDQIS – Next steps
We’re now testing NDQIS on three further “flagged” collections:
• Domestic abuse
• Child sexual abuse / exploitation
• Hate crime
Different fields required for different collections. Domestic abuse also requires age of victim / offender
and relationship between victim and offender.
Issues raised by forces around safeguarding / investigation of domestic abuse / child sexual abuse
offences.
These collections discussed at NDQIS Steering Group yesterday.
Using natural language processing to extract and classify instances of interpersonal violence in mental healthcare electronic records
Dr Giouliana Kadra-Scalzo and Dr Angus Roberts
Institute of Psychiatry, Psychology and Neuroscience, King’s College London
Background
• People with mental illness are more likely to experience violent victimisation than the general
population: 15–45% of female patients report experiences of victimisation in the past year, and
40–90% report lifetime victimisation.
• Similar patterns have been observed for domestic violence, sexual
violence, violence perpetration, and witnessing violence.
• Potential for electronic health records kept by mental health services
King’s College London (KCL)
Coverage – Lambeth, Southwark, Lewisham, Croydon
Base population – c.1.4m
Records – since 2007 (updated every 24hrs)
EHRs – c. 500,000
Approvals: Oxford Research Ethics Committee C (reference 08/H606/71+5)
Clinical services – specialist MH Trust
• CAMHS
• General adult psychiatry
• Older adult services
• Learning difficulties
• Addictions
• National
• IAPT
• Forensic
South London and Maudsley (SLAM)
SLAM Biomedical Research Centre (BRC)
CRIS data
Structured
• Ethnicity
• Diagnosis
• HoNOS
Free-text
• Inpatient progress notes
• Clinical summaries
NLP
• Smoking
• Antipsychotic medication
• Psychotherapy
Method
Keywords
Nouns:
% abus%
% attack%
% beat%
% violenc%
% hit%
% rape%
% assault%
Verbs:
% fight%
% fought%
% slap%
% chok%
% push%
% punch%
% strangul%
% strangl%
% threw%
% struck%
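The keyword patterns above resemble SQL LIKE wildcards, where `%` matches any run of characters. Assuming that reading, the Python sketch below shows how such patterns could select candidate text fragments. The conversion to regular expressions, the case-insensitive matching and the function names are assumptions, not the authors' implementation.

```python
import re

# Patterns copied from the slide; "%" is read as the SQL LIKE wildcard.
KEYWORD_PATTERNS = [
    "% abus%", "% attack%", "% beat%", "% violenc%", "% hit%", "% rape%",
    "% assault%", "% fight%", "% fought%", "% slap%", "% chok%", "% push%",
    "% punch%", "% strangul%", "% strangl%", "% threw%", "% struck%",
]


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE-style pattern into a compiled regular expression."""
    parts = (".*" if ch == "%" else re.escape(ch) for ch in pattern)
    # Case-insensitive matching is an assumption; in SQL it depends on collation.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


COMPILED = [(pattern, like_to_regex(pattern)) for pattern in KEYWORD_PATTERNS]


def matching_keywords(text: str) -> list[str]:
    """Return every pattern that the whole text matches."""
    return [pattern for pattern, regex in COMPILED if regex.fullmatch(text)]


print(matching_keywords("Patient used to hit her partner"))
print(matching_keywords("No violence or aggression noted"))
print(matching_keywords("Abused in childhood"))
```

Under this reading, the leading space means a keyword at the very start of a text is not matched, and a negated mention such as "No violence" is still retrieved, which is why the retrieved fragments need annotating.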
Figure 1. The process of annotation, development and evaluation of NLP models
Method
Table 1: Example of text fragments extracted for annotation in this study, alongside corresponding labels and assigned annotations.

Example of text fragment                             Label                                                Annotation
“They were abused in their childhood”                Violence presence, victim                            Affirmed
“Patient used to hit her partner”                    Violence presence, perpetrator; physical, domestic   Affirmed
“Patient stabbed his roommate”                       Violence presence, perpetrator; physical, domestic   Affirmed
“Expressed a lot of interest in violence, nazism”    Violence presence                                    Irrelevant
“No violence or aggression noted”                    Violence presence                                    Negated

Possible labels:
• Violence presence: affirmed, negated or irrelevant
• Patient status: victim, perpetrator, and/or witness
• Violence type: domestic, physical, sexual
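One way to represent this label scheme in code is a small record type that expands into seven yes/no targets, one per label. The Python sketch below is illustrative only: the class and field names are hypothetical, and treating only "affirmed" as a positive for violence presence is an assumption about how the annotations were binarised.

```python
from dataclasses import dataclass

PATIENT_STATUS = ("victim", "perpetrator", "witness")
VIOLENCE_TYPE = ("domestic", "physical", "sexual")


@dataclass
class Annotation:
    text: str
    presence: str  # "affirmed", "negated" or "irrelevant"
    patient_status: frozenset = frozenset()
    violence_type: frozenset = frozenset()


def to_binary_targets(annotation: Annotation) -> dict[str, int]:
    """Expand one annotation into seven binary targets."""
    # Assumption: negated and irrelevant mentions both count as 0.
    targets = {"violence_presence": int(annotation.presence == "affirmed")}
    for status in PATIENT_STATUS:
        targets[f"patient_status_{status}"] = int(status in annotation.patient_status)
    for vtype in VIOLENCE_TYPE:
        targets[f"violence_type_{vtype}"] = int(vtype in annotation.violence_type)
    return targets


example = Annotation(
    text="Patient used to hit her partner",
    presence="affirmed",
    patient_status=frozenset({"perpetrator"}),
    violence_type=frozenset({"physical", "domestic"}),
)
print(to_binary_targets(example))
```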
NLP model development
• We used a pre-trained BioBERT model and fine-tuned it on the annotated dataset. Three datasets were
used, each generated independently (ensuring no overlap):
1) model training and testing (development stage, 3,771 sentences);
2) model fine-tuning (1,411 sentences);
3) model blind testing (100 newly annotated sentences not used for model training or fine-tuning).
• Our aim was to produce seven binary classification models, one for each annotated label.
• We evaluated the models with 10-fold cross-validation: in each fold, 10% of the annotated text extracts
were used for testing and 90% for training.
• We report standard markers of NLP performance: precision (or positive predictive value), recall (or
sensitivity) and F1 score (the harmonic mean of precision and recall), using weighted averages to
take into account the dataset’s imbalance (i.e. differing numbers of extracts generated for each
keyword). Reported scores correspond to the mean across the 10 test sets.
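The evaluation set-up can be sketched in plain Python. The slides do not say how the weighted averages were computed; the sketch assumes the common convention of weighting each class's score by its share of the true labels. The fold splitter is unshuffled for simplicity, and both function names are illustrative.

```python
from collections import Counter


def k_fold_indices(n_items: int, k: int = 10):
    """Yield (train_indices, test_indices); each item is tested exactly once."""
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]  # every k-th item, starting at `fold`
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


def weighted_prf(y_true, y_pred):
    """Precision, recall and F1, averaged over classes weighted by support."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        p_label = tp / n_pred if n_pred else 0.0
        r_label = tp / n_true
        f_label = (2 * p_label * r_label / (p_label + r_label)
                   if p_label + r_label else 0.0)
        weight = n_true / total
        precision += weight * p_label
        recall += weight * r_label
        f1 += weight * f_label
    return precision, recall, f1


folds = list(k_fold_indices(3771, k=10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))

# Toy labels, invented for illustration.
print(weighted_prf([1, 1, 1, 0], [1, 1, 0, 0]))
```

In a full run, a model would be trained and scored once per fold and the ten scores averaged, as described above.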
Results
Table 2. NLP model performances on the training and testing dataset (3,771 text extracts); and on
blind test set with a 90% probability threshold (100 sentences) for the six labels.

                               Training set (average score on
                               10-fold cross-validation)          Blind test set
Annotation label               Precision   Recall   F1-score      F1-score
Violence presence              93%         93%      93%           95%
Patient status: Perpetrator    89%         89%      89%           85%
Patient status: Victim         91%         89%      91%           90%
Violence type: Domestic        94%         94%      94%           93%
Violence type: Physical        91%         92%      91%           98%
Violence type: Sexual          98%         97%      97%           93%

• Inter-annotator agreement was high: 82-96% (60-85% Cohen’s kappa) for the six annotation labels.
• For one annotation label (witness) we were unable to generate a model due to insufficient data size.
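Inter-annotator agreement is reported both as raw agreement and as Cohen’s kappa, which corrects for the agreement expected by chance. The Python sketch below shows why kappa comes out lower than raw agreement. The two annotators' labels are invented for illustration and are not the study data.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which the two annotators gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's own label frequencies.
    p_chance = sum(counts_a[lab] * counts_b[lab]
                   for lab in set(a) | set(b)) / (n * n)
    if p_chance == 1:
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Invented labels for ten text extracts.
annotator_1 = ["affirmed"] * 5 + ["negated"] * 5
annotator_2 = ["affirmed"] * 4 + ["negated"] * 5 + ["affirmed"]

print(percent_agreement(annotator_1, annotator_2))
print(cohens_kappa(annotator_1, annotator_2))
```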
Reflections
Strengths Limitations Implications
Future work
VISION – violence, health and society
£7.1m
5 years
Project lead (Walby)
NLP expertise
Stages:
• Stage 1: Coordination and theory of change
• Stage 2: Improve measurement
• Stage 3: Integrate and link data
• Stage 4: Connections and causal pathways
• Stage 5: Evaluation and cost-benefit
Research strands and universities:
• Health: Bristol, City, KCL
• Justice: City, Lancaster, UCL
• Specialised services: City, Warwick
• Inequalities: City, Lancaster
• Integrated theory and data: Bristol, City, Lancaster, UCL, Warwick
Data providers (in slide order): ONS (CSEW), CRIS, PHW, NHSD (APMS), Police, MoJ: PNC, Solicitors, NCDV, Justice, Health, Specialised services, Inequalities, Rape Crisis, Women’s Aid, Refuge, Safe Lives, Respect, Imkaan, UNODC, WB, OECD, WHO, UK BHPS
Users and advisors (Health Services, UK government, Justice, Third sector, International): Imkaan, Inquest, Council of Europe, Lancashire Const., MoJ, ONS, MHCLG, Home Office, DHSC, PHE, VAMHNW, Monash, GREVIO, Respect, Mind, Samaritans, Women’s Budget Group, Anti-slavery Commissioner’s Office, National Police Chiefs Council
Thank you
giouliana.kadra@kcl.ac.uk
angus.roberts@kcl.ac.uk
Key References
• Khalifeh H, Moran P, Borschmann R, et al. Domestic and sexual violence against patients with severe mental illness. Psychol Med 2015;45:875–86. doi:10.1017/S0033291714001962
• Khalifeh H, Johnson S, Howard LM, et al. Violent and non-violent crime against adults with severe mental illness. British Journal of Psychiatry 2015;206:275–82. doi:10.1192/bjp.bp.114.147843
• Fazel S, Gulati G, Linsell L, et al. Schizophrenia and violence: Systematic review and meta-analysis. PLoS Medicine 2009;6. doi:10.1371/journal.pmed.1000120
• Kadra G, Dean K, Hotopf M, et al. Investigating Exposure to Violence and Mental Health in a Diverse Urban Community Sample: Data from the South East London Community Health (SELCoH) Survey. PLOS ONE 2014;9:e93660. doi:10.1371/journal.pone.0093660
• Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020;36:1234–40. doi:10.1093/bioinformatics/btz682
• Mascio A, Kraljevic Z, Bean D, et al. Comparative Analysis of Text Classification Approaches in Electronic Health Records. arXiv:2005.06624 [cs]. Published Online First: 8 May 2020. http://arxiv.org/abs/2005.06624 (accessed 18 Jan 2021).
• Jackson RG, Patel R, Jayatilleke N, et al. Natural language processing to extract symptoms of severe mental illness from clinical text: the Clinical Record Interactive Search Comprehensive Data Extraction (CRIS-CODE) project. BMJ Open 2017;7:e012012. doi:10.1136/bmjopen-2016-012012
• Perera G, Broadbent M, Callard F, et al. Cohort profile of the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register: current status and recent enhancement of an Electronic Mental Health Record-derived data resource. BMJ Open 2016;6:e008721. doi:10.1136/bmjopen-2015-008721
• Clinical Record Interactive Search (CRIS). https://www.slam.nhs.uk/quality-and-research/clinical-record-interactive-search-cris/ (accessed 2 Feb 2021).

An Atoll Futures Research Institute? Presentation for CANCCAn Atoll Futures Research Institute? Presentation for CANCC
An Atoll Futures Research Institute? Presentation for CANCCNAP Global Network
 
Our nurses, our future. The economic power of care.
Our nurses, our future. The economic power of care.Our nurses, our future. The economic power of care.
Our nurses, our future. The economic power of care.Christina Parmionova
 
Managing large-scale outbreaks at Farrow-to-Weaner Farms
Managing large-scale outbreaks at Farrow-to-Weaner FarmsManaging large-scale outbreaks at Farrow-to-Weaner Farms
Managing large-scale outbreaks at Farrow-to-Weaner FarmsHarm Kiezebrink
 
2024: The FAR, Federal Acquisition Regulations, Part 31
2024: The FAR, Federal Acquisition Regulations, Part 312024: The FAR, Federal Acquisition Regulations, Part 31
2024: The FAR, Federal Acquisition Regulations, Part 31JSchaus & Associates
 
Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...gajnagarg
 
Adajan < Russian Call Girls Ahmedabad | Starting ₹,5K To @25k with A/C 800573...
Adajan < Russian Call Girls Ahmedabad | Starting ₹,5K To @25k with A/C 800573...Adajan < Russian Call Girls Ahmedabad | Starting ₹,5K To @25k with A/C 800573...
Adajan < Russian Call Girls Ahmedabad | Starting ₹,5K To @25k with A/C 800573...gragfaguni
 
Tuvalu Coastal Adaptation Project (TCAP)
Tuvalu Coastal Adaptation Project (TCAP)Tuvalu Coastal Adaptation Project (TCAP)
Tuvalu Coastal Adaptation Project (TCAP)NAP Global Network
 
2024: The FAR, Federal Acquisition Regulations, Part 32
2024: The FAR, Federal Acquisition Regulations, Part 322024: The FAR, Federal Acquisition Regulations, Part 32
2024: The FAR, Federal Acquisition Regulations, Part 32JSchaus & Associates
 
Time, Stress & Work Life Balance for Clerks with Beckie Whitehouse
Time, Stress & Work Life Balance for Clerks with Beckie WhitehouseTime, Stress & Work Life Balance for Clerks with Beckie Whitehouse
Time, Stress & Work Life Balance for Clerks with Beckie Whitehousesubs7
 
Scaling up coastal adaptation in Maldives through the NAP process
Scaling up coastal adaptation in Maldives through the NAP processScaling up coastal adaptation in Maldives through the NAP process
Scaling up coastal adaptation in Maldives through the NAP processNAP Global Network
 
The NAP process & South-South peer learning
The NAP process & South-South peer learningThe NAP process & South-South peer learning
The NAP process & South-South peer learningNAP Global Network
 
Item # 7-8 - 6900 Broadway P&Z Case # 438
Item # 7-8 - 6900 Broadway P&Z Case # 438Item # 7-8 - 6900 Broadway P&Z Case # 438
Item # 7-8 - 6900 Broadway P&Z Case # 438ahcitycouncil
 

Recently uploaded (20)

Call Girls in North Sikkim 9332606886 HOT & SEXY Models beautiful and charmi...
Call Girls in North Sikkim  9332606886 HOT & SEXY Models beautiful and charmi...Call Girls in North Sikkim  9332606886 HOT & SEXY Models beautiful and charmi...
Call Girls in North Sikkim 9332606886 HOT & SEXY Models beautiful and charmi...
 
Lorain Road Business District Revitalization Plan Final Presentation
Lorain Road Business District Revitalization Plan Final PresentationLorain Road Business District Revitalization Plan Final Presentation
Lorain Road Business District Revitalization Plan Final Presentation
 
Call Girl Service in West Tripura 9332606886Call Girls Advance Cash On Deliv...
Call Girl Service in West Tripura  9332606886Call Girls Advance Cash On Deliv...Call Girl Service in West Tripura  9332606886Call Girls Advance Cash On Deliv...
Call Girl Service in West Tripura 9332606886Call Girls Advance Cash On Deliv...
 
2024 UN Civil Society Conference in Support of the Summit of the Future.
2024 UN Civil Society Conference in Support of the Summit of the Future.2024 UN Civil Society Conference in Support of the Summit of the Future.
2024 UN Civil Society Conference in Support of the Summit of the Future.
 
Call Girls Radhanpur - 8250092165 Our call girls are sure to provide you with...
Call Girls Radhanpur - 8250092165 Our call girls are sure to provide you with...Call Girls Radhanpur - 8250092165 Our call girls are sure to provide you with...
Call Girls Radhanpur - 8250092165 Our call girls are sure to provide you with...
 
YHRGeorgetown Spring 2024 America should Take Her Share
YHRGeorgetown Spring 2024 America should Take Her ShareYHRGeorgetown Spring 2024 America should Take Her Share
YHRGeorgetown Spring 2024 America should Take Her Share
 
An Atoll Futures Research Institute? Presentation for CANCC
An Atoll Futures Research Institute? Presentation for CANCCAn Atoll Futures Research Institute? Presentation for CANCC
An Atoll Futures Research Institute? Presentation for CANCC
 
BioandPicforRepKendrick_LastUpdatedMay2024
BioandPicforRepKendrick_LastUpdatedMay2024BioandPicforRepKendrick_LastUpdatedMay2024
BioandPicforRepKendrick_LastUpdatedMay2024
 
Our nurses, our future. The economic power of care.
Our nurses, our future. The economic power of care.Our nurses, our future. The economic power of care.
Our nurses, our future. The economic power of care.
 
Managing large-scale outbreaks at Farrow-to-Weaner Farms
Managing large-scale outbreaks at Farrow-to-Weaner FarmsManaging large-scale outbreaks at Farrow-to-Weaner Farms
Managing large-scale outbreaks at Farrow-to-Weaner Farms
 
2024: The FAR, Federal Acquisition Regulations, Part 31
2024: The FAR, Federal Acquisition Regulations, Part 312024: The FAR, Federal Acquisition Regulations, Part 31
2024: The FAR, Federal Acquisition Regulations, Part 31
 
AHMR volume 10 number 1 January-April 2024
AHMR volume 10 number 1 January-April 2024AHMR volume 10 number 1 January-April 2024
AHMR volume 10 number 1 January-April 2024
 
Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...
 
Adajan < Russian Call Girls Ahmedabad | Starting ₹,5K To @25k with A/C 800573...
Adajan < Russian Call Girls Ahmedabad | Starting ₹,5K To @25k with A/C 800573...Adajan < Russian Call Girls Ahmedabad | Starting ₹,5K To @25k with A/C 800573...
Adajan < Russian Call Girls Ahmedabad | Starting ₹,5K To @25k with A/C 800573...
 
Tuvalu Coastal Adaptation Project (TCAP)
Tuvalu Coastal Adaptation Project (TCAP)Tuvalu Coastal Adaptation Project (TCAP)
Tuvalu Coastal Adaptation Project (TCAP)
 
2024: The FAR, Federal Acquisition Regulations, Part 32
2024: The FAR, Federal Acquisition Regulations, Part 322024: The FAR, Federal Acquisition Regulations, Part 32
2024: The FAR, Federal Acquisition Regulations, Part 32
 
Time, Stress & Work Life Balance for Clerks with Beckie Whitehouse
Time, Stress & Work Life Balance for Clerks with Beckie WhitehouseTime, Stress & Work Life Balance for Clerks with Beckie Whitehouse
Time, Stress & Work Life Balance for Clerks with Beckie Whitehouse
 
Scaling up coastal adaptation in Maldives through the NAP process
Scaling up coastal adaptation in Maldives through the NAP processScaling up coastal adaptation in Maldives through the NAP process
Scaling up coastal adaptation in Maldives through the NAP process
 
The NAP process & South-South peer learning
The NAP process & South-South peer learningThe NAP process & South-South peer learning
The NAP process & South-South peer learning
 
Item # 7-8 - 6900 Broadway P&Z Case # 438
Item # 7-8 - 6900 Broadway P&Z Case # 438Item # 7-8 - 6900 Broadway P&Z Case # 438
Item # 7-8 - 6900 Broadway P&Z Case # 438
 

Annual Crime and Justice Forum webinar 23 February 2022 - workshop 2A

  • 1. Workshop 2A: Using natural language processing tools in crime statistics Welcome Billy Gazard Centre for Crime and Justice Office for National Statistics @ONSfocus #ONSCrimeJustice
  • 2. Agenda 14:15 to 14:20 – Welcome, Billy Gazard, Centre for Crime and Justice, ONS; 14:20 to 14:35 – Kevin Smith, Crime Analysis Unit, Home Office; 14:35 to 14:50 – Dr Angus Roberts and Dr Giouliana Kadra, King’s College London; 14:50 to 15:15 – Discussant Billy Gazard, Centre for Crime and Justice, ONS
  • 3. The National Data Quality Improvement Service - NDQIS OFFICIAL SENSITIVE
  • 4. Police recorded crime and ‘flags’ – background The main police recorded crime collection is offence-based. Police forces notify the Home Office how many crimes were recorded by offence each month. For example, in month X there were 15 personal robbery offences, 2 attempted murders, etc. To get more information about these offences, the Home Office also collects supplementary information via so-called ‘flagged’ collections, where the police add information to a crime record. 05/04/2022 OFFICIAL SENSITIVE
    ‘Flagged’ collections However, these collections tend to rely on an officer or police staff correctly tagging an offence with the appropriate marker. We know this doesn’t always happen, and that how well these flags are applied varies between police forces. So flagged collections tend to undercount the true picture.
    Collections shown on the slide: offences involving knives; hate crime; domestic abuse; metal theft; online crime; offences involving corrosive substances; so-called honour-based abuse; child sexual abuse / exploitation.
  • 5. So what is NDQIS? The National Data Quality Improvement Service (NDQIS) is a tool to improve the quality of these flagged data by removing the reliance on police staff manually adding the flags. There are three broad aims of NDQIS:
      • To improve data quality
      • To increase comparability of data between forces
      • To reduce burden on the forces
    This project is simply about improving police recorded crime data quality. NDQIS DOES NOT tell forces which crimes they should be investigating, how they should be investigated, or what outcome should be assigned to an offence.
  • 6. How does NDQIS work? The NDQIS software looks at fields held within the Record Management System (RMS) in each force, such as MO Text and Occurrence Summary. Other fields are also examined, depending on the collection. Crime records are examined using semantic analysis, with a bespoke data dictionary for the collection and the information from the extra fields. Each crime record is then assigned a confidence rating:
      • High Confidence – NDQIS determines that the record meets the criteria (e.g. involved a knife).
      • Low Confidence – NDQIS cannot determine for sure whether the record meets the criteria, so the record needs to be manually reviewed or left as the force’s original decision (depending on the collection).
      • Rejected – NDQIS determines that the criteria are not met (e.g. did not involve a knife).
    Once the data are processed, they are supplied to the Home Office.
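The three-way confidence rating on slide 6 can be illustrated with a minimal sketch. The slides do not give the real NDQIS data dictionary or ruleset, so the term lists, field names (`mo_text`, `occurrence_summary`) and rules below are hypothetical and stand in for a far richer semantic analysis.

```python
# Hypothetical mini "data dictionary"; the real NDQIS dictionary is not in the slides.
STRONG_TERMS = ("stabbed", "knife", "machete", "blade")
AMBIGUOUS_TERMS = ("sharp", "cut", "weapon")
NEGATIONS = ("no knife", "no weapon", "knife not")

def rate_record(record):
    """Return a confidence rating for one crime record (a dict of text fields)."""
    # Combine the free-text fields named on the slide (field keys are assumptions).
    text = " ".join(
        record.get(field, "") for field in ("mo_text", "occurrence_summary")
    ).lower()
    if any(phrase in text for phrase in NEGATIONS):
        return "Low Confidence"   # contradictory wording: send for manual review
    if any(term in text for term in STRONG_TERMS):
        return "High Confidence"  # clear evidence that the criteria are met
    if any(term in text for term in AMBIGUOUS_TERMS):
        return "Low Confidence"   # possible match: send for manual review
    return "Rejected"             # no evidence that the criteria are met

if __name__ == "__main__":
    print(rate_record({"mo_text": "Suspect stabbed victim with a kitchen knife"}))
    print(rate_record({"mo_text": "Victim cut on hand, cause unknown"}))
    print(rate_record({"occurrence_summary": "Shoplifting of groceries"}))
```

Routing negated or ambiguous wording to manual review, rather than deciding automatically, mirrors the role the slide gives to the Low Confidence rating.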
  • 7. NDQIS – offences involving knives or sharp instruments The NDQIS concept was first tested on the offences involving knives or sharp instruments collection (knife-enabled crime), a high-profile data collection of key ministerial interest. A data dictionary was created for the collection and tested on synthetic and then real data from forces. Home Office statisticians audited thousands of records to ensure NDQIS was working. After the second iteration of the ruleset, for 12 forces:
      • We agreed with the High Confidence classification 98% of the time
      • We agreed with the Rejected classification 99% of the time
      • Around a quarter of records were assigned to Low Confidence, for manual review. Of these, 51% were knife crimes and 49% were not.
    The Home Office, police forces, the National Police Chiefs’ Council and the Office for National Statistics agreed to proceed to implementation.
  • 8. NDQIS knife-enabled crime results NDQIS data were published for 33 forces in January 2022. These forces account for 9 in 10 knife offences recorded. Nationally, offences were 11% higher on the new method / guidance for 2019/20. The increase from the new methodology identifying more knife crimes is offset by the changes in coverage. HOWEVER, at the force level there are some big changes in 2019/20 levels:
      • West Midlands: 46% higher
      • South Yorkshire: 66% higher
      • Metropolitan: 8% lower
    [Charts: NDQIS vs old method for West Midlands, South Yorkshire and Metropolitan; England and Wales series, 2010/11 to 2019/20, old method vs NDQIS method]
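The "% higher" and "% lower" figures on slide 8 are relative differences between the two methods. The slide gives only the percentages, not the underlying counts, so the counts in this short sketch are hypothetical and chosen only to reproduce percentages of the same size.

```python
def percent_difference(old_count, new_count):
    """Relative difference of the new method against the old one, in percent."""
    return 100 * (new_count - old_count) / old_count

if __name__ == "__main__":
    # Hypothetical counts, not taken from the slides.
    print(percent_difference(1000, 1460))  # new method 46% higher
    print(percent_difference(1000, 920))   # new method 8% lower
```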
  • 9. NDQIS – offences involving knives or sharp instruments The impact can be seen in the rates per population. West Midlands is now the PFA with the highest rate per population. South Yorkshire moved from 13th to 4th (next to West Yorkshire Police). Kent is the biggest mover, from 35th to 12th. Lancashire has fallen from 11th to 21st.
    Rank | Pre-NDQIS force | Rate | Post-NDQIS force | Rate
    1 | Metropolitan Police | 179 | West Midlands | 172
    2 | Greater Manchester | 131 | Metropolitan Police | 165
    3 | West Midlands | 118 | Cleveland | 125
    4 | West Yorkshire | 104 | South Yorkshire | 115
    5 | Merseyside | 100 | Greater Manchester | 113
    6 | Bedfordshire | 97 | Merseyside | 109
    7 | Cleveland | 92 | West Yorkshire | 109
    8 | Northamptonshire | 85 | Essex | 98
    9 | Derbyshire | 83 | Bedfordshire | 97
    10 | Leicestershire | 81 | Northamptonshire | 96
    11 | Lancashire | 76 | Humberside | 94
    12 | Sussex | 73 | Kent | 88
    13 | South Yorkshire | 69 | Cambridgeshire | 86
    14 | Humberside | 69 | Derbyshire | 81
    15 | Nottinghamshire | 67 | Leicestershire | 78
    16 | South Wales | 66 | Avon and Somerset | 78
    17 | Cambridgeshire | 66 | Nottinghamshire | 77
    18 | Thames Valley | 66 | Hertfordshire | 77
    19 | Hertfordshire | 62 | Sussex | 67
    20 | Essex | 61 | Hampshire | 67
    21 | Warwickshire | 61 | Lancashire | 65
    22 | West Mercia | 57 | Thames Valley | 65
    23 | Avon and Somerset | 56 | Warwickshire | 61
    24 | Northumbria | 55 | Cheshire | 61
    25 | Staffordshire | 53 | South Wales | 61
    26 | Gloucestershire | 51 | Norfolk | 59
    27 | Norfolk | 49 | Suffolk | 57
    28 | Lincolnshire | 46 | West Mercia | 57
    29 | Suffolk | 43 | Lincolnshire | 56
    30 | Cumbria | 42 | North Wales | 55
    31 | Dyfed-Powys | 41 | Northumbria | 55
    32 | North Yorkshire | 40 | Staffordshire | 53
    33 | Wiltshire | 40 | Durham | 52
    34 | North Wales | 40 | Gloucestershire | 51
    35 | Kent | 39 | Devon and Cornwall | 44
    36 | Hampshire | 38 | Surrey | 42
    37 | Surrey | 38 | Cumbria | 41
    38 | Cheshire | 38 | North Yorkshire | 40
    39 | Gwent | 37 | Wiltshire | 40
    40 | Dorset | 35 | Gwent | 37
    41 | Devon and Cornwall | 29 | Dorset | 35
    42 | Durham | 26 | Dyfed-Powys | 34
  • 10. NDQIS – Next steps We’re now testing NDQIS on three further “flagged” collections:
      • Domestic abuse
      • Child sexual abuse / exploitation
      • Hate crime
    Different fields are required for different collections. Domestic abuse also requires the age of the victim / offender and the relationship between victim and offender. Forces have raised issues around safeguarding and the investigation of domestic abuse and child sexual abuse offences. These collections were discussed at the NDQIS Steering Group yesterday.
  • 11. Using natural language processing to extract and classify instances of interpersonal violence in mental healthcare electronic records Dr Giouliana Kadra-Scalzo and Dr Angus Roberts Institute of Psychiatry, Psychology and Neuroscience, King’s College London
  • 12. Background
      • People with mental illness are more likely to experience violent victimisation than the general population: 15–45% of female patients report experiences of victimisation in the past year, and 40–90% report lifetime victimisation.
      • Similar patterns have been observed for domestic violence, sexual violence, violence perpetration, and witnessing violence.
      • Potential for electronic health records kept by mental health services
  • 13. King’s College London (KCL); South London and Maudsley (SLAM); SLAM Biomedical Research Centre (BRC)
      • Coverage: Lambeth, Southwark, Lewisham, Croydon
      • Base population: c. 1.4m
      • Records: since 2007 (updated every 24 hrs)
      • EHRs: c. 500,000
      • Approvals: Oxford Research Ethics Committee C (reference 08/H606/71+5)
      • Clinical services (specialist MH Trust): CAMHS; general adult psychiatry; older adult services; learning difficulties; addictions; national; IAPT; forensic
  • 14. CRIS data
      • Structured: ethnicity; diagnosis; HoNOS
      • Free-text: inpatient progress notes; clinical summaries
      • NLP: smoking; antipsychotic medication; psychotherapy
  • 15. Method: keywords
      • Nouns: %abus%, %attack%, %beat%, %violenc%, %hit%, %rape%, %assault%
      • Verbs: %fight%, %fought%, %slap%, %chok%, %push%, %punch%, %strangul%, %strangl%, %threw%, %struck%
    Figure 1. The process of annotation, development and evaluation of NLP models
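A minimal sketch of how the wildcard keywords on slide 15 could be used to pull candidate sentences. It assumes the % signs are SQL LIKE-style wildcards, so each keyword is treated as a case-insensitive substring; the slides do not say how the search was actually run, and the function name is hypothetical.

```python
# Keyword stems from slide 15, with the % wildcards removed.
STEMS = [
    "abus", "attack", "beat", "violenc", "hit", "rape", "assault",
    "fight", "fought", "slap", "chok", "push", "punch",
    "strangul", "strangl", "threw", "struck",
]

def matched_stems(sentence):
    """Return the stems that occur anywhere in the sentence, ignoring case."""
    lowered = sentence.lower()
    return [stem for stem in STEMS if stem in lowered]

if __name__ == "__main__":
    print(matched_stems("Patient used to hit her partner"))      # ['hit']
    print(matched_stems("No violence or aggression noted"))      # ['violenc']
    print(matched_stems("Feels the therapeutic sessions help"))  # ['rape']
```

The last example is a false positive ("therapeutic" contains "rape"). Substring matching is deliberately broad, which is one reason the extracted fragments are then annotated as affirmed, negated or irrelevant.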
  • 16. Method
    Example of text fragment | Label | Annotation
    “They were abused in their childhood” | Violence presence, victim | Affirmed
    “Patient used to hit her partner” | Violence presence, perpetrator; physical, domestic | Affirmed
    “Patient stabbed his roommate” | Violence presence, perpetrator; physical, domestic | Affirmed
    “Expressed a lot of interest in violence, nazism” | Violence presence | Irrelevant
    “No violence or aggression noted” | Violence presence | Negated
    Possible labels:
      • Violence presence: affirmed, negated or irrelevant
      • Patient status: victim, perpetrator, and/or witness
      • Violence type: domestic, physical, sexual
    Table 1: Example of text fragments extracted for annotation in this study, alongside corresponding labels and assigned annotations.
  • 17. Method: NLP model development
      • We used a pre-trained BioBERT model and fine-tuned it on the annotated dataset. Three datasets were used, each generated independently (ensuring no overlap): 1) model training and testing (development stage, 3771 sentences); 2) model fine-tuning (1411 sentences); 3) model blind testing (100 newly annotated sentences not used for model training or fine-tuning).
      • Our aim was to produce seven binary classification models, one for each annotation label.
      • We evaluated the models with 10-fold cross-validation, with 10% of the annotated text extracts used for testing and 90% for training in each fold.
      • We report standard markers of NLP performance: precision (or positive predictive value), recall (or sensitivity) and F1 score (the harmonic mean of precision and recall), using weighted averages to take into account the dataset’s imbalance (i.e. differing numbers of extracts generated for each keyword). Reported scores correspond to the mean across the 10 test sets.
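The evaluation metrics on slide 17 can be sketched in plain Python. The slide does not say how the weighted averages were computed, so this sketch assumes support-weighted averaging (each class's score weighted by its share of the true labels). The fold splitter is a simple interleaved split, not necessarily the one the authors used, and all function names are hypothetical.

```python
def prf(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def weighted_prf(y_true, y_pred):
    """Average per-class scores, weighting each class by its share of y_true."""
    n = len(y_true)
    totals = [0.0, 0.0, 0.0]
    for label in sorted(set(y_true)):
        weight = y_true.count(label) / n
        for i, score in enumerate(prf(y_true, y_pred, label)):
            totals[i] += weight * score
    return tuple(totals)

def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists; each item is tested exactly once."""
    for fold in range(k):
        test = list(range(fold, n_items, k))
        held_out = set(test)
        train = [i for i in range(n_items) if i not in held_out]
        yield train, test

if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
    print(weighted_prf(y_true, y_pred))
    print(next(k_fold_indices(20, k=10)))
```

In a 10-fold evaluation, a model would be trained on each `train` list and scored on the matching `test` list, and the reported figure would be the mean of the 10 scores.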
  • 18. Results
      • Inter-annotator agreement was high: 82-96% (60-85% Cohen’s kappa) for the six annotation labels.
      • For one annotation label (witness) we were unable to generate a model due to insufficient data size.
    Table 2. NLP model performances on the training and testing dataset (3,771 text extracts), and on the blind test set with a 90% probability threshold (100 sentences), for the six labels. Precision, recall and the first F1-score are average scores on 10-fold cross-validation of the training set.
    Annotation label | Precision | Recall | F1-score | Blind test set F1-score
    Violence presence | 93% | 93% | 93% | 95%
    Patient status: Perpetrator | 89% | 89% | 89% | 85%
    Patient status: Victim | 91% | 89% | 91% | 90%
    Violence type: Domestic | 94% | 94% | 94% | 93%
    Violence type: Physical | 91% | 92% | 91% | 98%
    Violence type: Sexual | 98% | 97% | 97% | 93%
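Slide 18 reports inter-annotator agreement both as a raw percentage and as Cohen's kappa, which corrects the raw figure for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch, using made-up annotations rather than the study's data:

```python
from collections import Counter

def cohen_kappa(annotator_a, annotator_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(annotator_a)
    # Observed agreement: share of items given the same label by both.
    p_o = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n
    # Chance agreement: from each annotator's own label frequencies.
    counts_a, counts_b = Counter(annotator_a), Counter(annotator_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)

if __name__ == "__main__":
    # Made-up annotations: 1 = affirmed, 0 = not affirmed.
    a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
    print(cohen_kappa(a, b))
```

Here the annotators agree on 8 of 10 items (80%), but half of that agreement would be expected by chance, so kappa is about 0.6. This is why the kappa range on the slide sits below the raw agreement range.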
  • 20. Future work: VISION – violence, health and society (£7.1m, 5 years; project lead: Walby)
      • Stages: Stage 1: Coordination and theory of change; Stage 2: Improve measurement; Stage 3: Integrate and link data; Stage 4: Connections and causal pathways; Stage 5: Evaluation and cost-benefit
      • Health – universities: Bristol, City, KCL
      • Justice – universities: City, Lancaster, UCL
      • Specialised services – universities: City, Warwick
      • Inequalities – universities: City, Lancaster
      • Integrated theory and data – universities: Bristol, City, Lancaster, UCL, Warwick
      • Data providers (across the Justice, Health, Specialised services and Inequalities strands): ONS (CSEW); CRIS; PHW; NHSD (APMS); Police; MoJ: PNC; Solicitors; NCDV; Rape Crisis; Women’s Aid; Refuge; Safe Lives; Respect; Imkaan; UNODC; WB; OECD; WHO; UK BHPS
      • Users and advisors (Health Services, UK government, Justice, Third sector, International): Imkaan; Inquest; Council of Europe; Lancashire Const.; MoJ; ONS; MHCLG; Home Office; DHSC; PHE; VAMHNW; Monash; GREVIO; Respect; Mind; Samaritans; Women’s Budget Group; Anti-slavery Commissioner’s Office; National Police Chiefs Council
      • NLP expertise
  • 22. Key References
      • Khalifeh H, Moran P, Borschmann R, et al. Domestic and sexual violence against patients with severe mental illness. Psychol Med 2015;45:875–86. doi:10.1017/S0033291714001962
      • Khalifeh H, Johnson S, Howard LM, et al. Violent and non-violent crime against adults with severe mental illness. British Journal of Psychiatry 2015;206:275–82. doi:10.1192/bjp.bp.114.147843
      • Fazel S, Gulati G, Linsell L, et al. Schizophrenia and violence: Systematic review and meta-analysis. PLoS Medicine 2009;6. doi:10.1371/journal.pmed.1000120
      • Kadra G, Dean K, Hotopf M, et al. Investigating Exposure to Violence and Mental Health in a Diverse Urban Community Sample: Data from the South East London Community Health (SELCoH) Survey. PLOS ONE 2014;9:e93660. doi:10.1371/journal.pone.0093660
      • Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020;36:1234–40. doi:10.1093/bioinformatics/btz682
      • Mascio A, Kraljevic Z, Bean D, et al. Comparative Analysis of Text Classification Approaches in Electronic Health Records. arXiv:2005.06624 [cs]. Published Online First: 8 May 2020. http://arxiv.org/abs/2005.06624 (accessed 18 Jan 2021).
      • Jackson RG, Patel R, Jayatilleke N, et al. Natural language processing to extract symptoms of severe mental illness from clinical text: the Clinical Record Interactive Search Comprehensive Data Extraction (CRIS-CODE) project. BMJ Open 2017;7:e012012. doi:10.1136/bmjopen-2016-012012
      • Perera G, Broadbent M, Callard F, et al. Cohort profile of the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register: current status and recent enhancement of an Electronic Mental Health Record-derived data resource. BMJ Open 2016;6:e008721. doi:10.1136/bmjopen-2015-008721
      • Clinical Record Interactive Search (CRIS). https://www.slam.nhs.uk/quality-and-research/clinical-record-interactive-search-cris/ (accessed 2 Feb 2021).