17 March 2023
In our prior article, Breaking down health inequalities, one ‘place’ at a time | 91探花, we discussed the importance of using data analytics to reduce health inequalities at different levels (system, neighbourhood and place). Our analysis highlighted how data bias can undermine population health intelligence, and the importance of good-quality data in producing it. In this article, we consider the common data issues that can introduce bias into advanced analytics, such as artificial intelligence (AI) and machine learning (ML), in health and social care.
Bias can have a significant impact on population health, leading to incorrect diagnoses and treatment. One prominent recent example was the case of pulse oximetry during Covid-19, where reading accuracy was affected by skin colour.1 Bias based on protected characteristics in medical technology is not surprising: routine NHS hospital data in the Hospital Episode Statistics database has shown discordance with self-reported ethnicity for 20-30% of minority ethnic patients, compared with only 2.2% of White British patients.2
It is essential to understand how and when data bias occurs in order to prevent and address it. This will help ensure that new AI algorithms and ML models do not perpetuate or even exacerbate inequities in health and social care. There are several common data issues that we often face working with clients.
Data collection
The lack of standardisation in data collection across health and social care is a widespread issue. Different data definitions, techniques and coding practices make it difficult to compare data from various sources.
Example 1
Different providers may use varying techniques and instruments to record blood pressure measurements. The auscultatory method can be subject to observer bias and measurement error. While the oscillometric method is less dependent on the observer, it can still lead to variations in recorded values based on the device and cuff size used.3
Differing data collection techniques can lead to variability in blood pressure measurements and, in turn, disparities in the diagnosis and treatment of hypertension for certain population groups. We recommend developing specifications for data sets based on agreed collection practices, such as using standardised measurement techniques and validated instruments, to support consistency in the recording.
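One practical way to support such a specification is to validate each record against it before analysis. The sketch below is illustrative only: the field names, permitted values and plausibility thresholds are hypothetical, not drawn from any NHS data specification.

```python
# Hypothetical sketch: checking blood pressure records against an
# agreed data specification before they enter an analysis pipeline.

AGREED_SPEC = {
    "method": {"auscultatory", "oscillometric"},   # permitted techniques
    "cuff_size": {"small", "standard", "large"},   # validated cuff sizes
    "systolic_range": (60, 260),                   # plausible mmHg values
    "diastolic_range": (30, 160),
}

def validate_bp_record(record: dict) -> list:
    """Return a list of specification violations for one record."""
    issues = []
    if record.get("method") not in AGREED_SPEC["method"]:
        issues.append("unrecognised measurement method")
    if record.get("cuff_size") not in AGREED_SPEC["cuff_size"]:
        issues.append("cuff size missing or not validated")
    lo, hi = AGREED_SPEC["systolic_range"]
    if not lo <= record.get("systolic", -1) <= hi:
        issues.append("systolic value outside plausible range")
    lo, hi = AGREED_SPEC["diastolic_range"]
    if not lo <= record.get("diastolic", -1) <= hi:
        issues.append("diastolic value outside plausible range")
    return issues

clean = {"method": "oscillometric", "cuff_size": "standard",
         "systolic": 128, "diastolic": 82}
suspect = {"method": "manual", "systolic": 128, "diastolic": 82}
```

Records that fail validation can then be queried with the submitting provider rather than silently flowing into the analysis.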
Missing data
Missing data is another major issue in health and social care. This occurs when key data points are not collected or are incomplete, due to system limitations, human error or other factors. We recently evaluated a national NHS disease prevention programme and were faced with a large proportion of ‘unknown ethnicities’ in the data, and had to adjust our approach accordingly.
Example 2
Ethnicity data may be grouped into broad ethnic categories, such as Black or Asian. This can mask differences between ethnic subgroups.
By overlooking differences between ethnic subgroups, healthcare services will be unable to address the specific needs of communities, leading to unequal access to care and poorer health outcomes. To maximise the usefulness of the data collected, we recommend aligning collection processes as closely as possible with common practices and data flows.
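As a simple illustration with entirely fabricated figures, an uptake rate computed for a broad category can look acceptable while the subgroups behind it diverge sharply:

```python
# Fabricated, illustrative figures only: screening uptake for three
# subgroups that would be aggregated into one broad 'Asian' category.
screened = {"Indian": 420, "Pakistani": 180, "Bangladeshi": 90}
eligible = {"Indian": 500, "Pakistani": 400, "Bangladeshi": 300}

# Uptake per subgroup: 84%, 45% and 30% respectively.
subgroup_uptake = {g: screened[g] / eligible[g] for g in screened}

# Uptake for the broad category: 690 / 1200 = 57.5%, masking the
# three-fold gap between the highest and lowest subgroups.
broad_uptake = sum(screened.values()) / sum(eligible.values())
```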
Data quality
Data quality is critical in ensuring accurate analysis. Data entry errors, inconsistencies in coding, and inaccuracies in measurements can all lead to biased analysis.
Example 3
Some healthcare providers may use the imperial system (eg pounds), while others may use the metric system (eg kilograms). If values are combined without conversion, the resulting BMI calculation may be inaccurate, potentially leading to misclassification of a patient’s weight status.4
In our evaluations we have also found issues with self-reported weights, and with the timing of when measurements are taken. This, in turn, can lead to biased analyses and incorrect assumptions about the prevalence of obesity and related conditions in certain populations. Ultimately, these issues exacerbate health inequalities by affecting the quality of care and resources allocated to certain groups on the basis of inaccurate data. Creating agreed terminologies and data submission requirements often helps to drive data quality improvements.
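The scale of the distortion is easy to show. The sketch below (illustrative values) computes BMI, defined as weight in kilograms divided by height in metres squared, once with the weight correctly converted and once with a pounds value mistakenly treated as kilograms:

```python
# BMI = weight (kg) / height (m)^2. Mixing units without conversion
# moves a patient from the healthy range into apparent severe obesity.
LB_PER_KG = 2.20462

def bmi(weight_kg: float, height_m: float) -> float:
    return weight_kg / height_m ** 2

weight_lb = 154.0                  # recorded in pounds by one provider
weight_kg = weight_lb / LB_PER_KG  # approx 69.9 kg after conversion
height_m = 1.75

correct = bmi(weight_kg, height_m)  # approx 22.8 - healthy range
wrong = bmi(weight_lb, height_m)    # approx 50.3 - pounds fed in as kg
```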
Algorithm design
Patient data is often used in advanced analytical methods, such as risk prediction models, to predict the likelihood of future health outcomes, for example, hospitalisation. However, these models can be biased if they are trained on a dataset that is not representative of the population.
Example 4
Using historical data that reflects past disparities in the access to, and quality of, health and social care can result in a biased risk prediction model with lower sensitivity towards certain population groups, producing inaccurate predictions for them.
Using biased models can exacerbate health inequities, as previously disadvantaged population groups will continue to receive inadequate care or under-treatment. To improve the accuracy and effectiveness of risk prediction models and other algorithms, it is important to address bias at the pre-processing stage by using balancing techniques.
How 91探花 can help
Addressing common data issues such as the above is crucial in reducing bias and ensuring accurate analysis to achieve equitable health outcomes. Our analytics team can provide both short-term and long-term help to improve data-driven health solutions and deal with data bias. 91探花 can help health organisations to improve their data collection processes and evaluate any advanced analytical technologies.
Our tools include:
- AI assurance: providing assurance over algorithms and AI technology;
- data processing: including data cleansing, data assessment and model reviews;
- model development: our Data Science team can develop AI models using R, Python and Alteryx;
- assistance with data sourcing from different systems;
- creating common data models and data templates;
- automating processes: including standardising models and data collection; and
- balancing techniques: including methods of dealing with data bias, such as oversampling.
References
1
2 Saunders CL, Abel GA, El Turabi A, Ahmed F, Lyratzopoulos G. Accuracy of routinely recorded ethnic group information compared with self-reported ethnicity: evidence from the English Cancer Patient Experience Survey. BMJ Open. 2013;3(6).
3 The auscultatory method involves using a stethoscope, whereas the oscillometric method uses an electronic device.
4 The body mass index (BMI) is a measure that uses your height and weight to work out if your weight is healthy.