

A team from Population Data Science at Swansea University, the University of Liverpool and the Cystic Fibrosis Trust spoke to us about their recent collaborative paper, published in the International Journal of Population Data Science, and its significance for future treatment opportunities in cystic fibrosis.
Cystic fibrosis (CF) is one of the UK’s most common life-threatening inherited diseases and primarily affects the respiratory and digestive systems. CF affects everyone differently, but for many it involves a rigorous daily treatment regime including physiotherapy, oral, nebulised and occasionally intravenous antibiotics, and taking enzyme tablets with food. About 12 to 14 children are born with CF in Wales every year, and around 10,500 people in the UK are currently living with the condition.
The UK CF Registry – Rebecca Cosgriff, Cystic Fibrosis Trust

The UK CF Registry records health data on consenting people with CF in England, Northern Ireland, Scotland and Wales and is managed by the Cystic Fibrosis Trust. The UK CF Registry captures 99% of the UK CF population.
The data held by the Registry have been collected since 1996 at annual review visits at specialist CF care centres across the UK. They include demographic, treatment and health outcome data for people with CF.
The purpose of the UK CF Registry is to improve the health of people with CF. It provides an overview of CF, helping people to understand their condition and giving clinical teams evidence to improve the quality of care. Data from the UK CF Registry are used for a range of purposes, including helping commissioners provide appropriate funding to NHS centres and monitoring the safety and effectiveness of new treatments. Researchers can also use these anonymised data, once their proposed research has passed a rigorous assessment, to identify trends and patterns that tell us new things about CF and potentially better ways to support people in the future.
In 2015, the Cystic Fibrosis Trust awarded a £750,000 ‘CF-EpiNet’ Strategic Research Centre (SRC) grant to a multidisciplinary group of investigators. The programme aimed to enhance and harness the data within the UK CF Registry, making it a richer resource to inform the future care of people with CF; to apply state-of-the-art statistical modelling to data collected on the same group of people over time; to identify changes in care that could make a difference to people with CF; and to assess the health economics of CF care.
One way to enhance the data collected in the UK CF Registry was to link it with population-scale data stored and accessed within the SAIL Databank (SAIL), which has 100% coverage of secondary care data and 80% coverage of primary care data for the population of Wales.
Creating a richer source of data – Daniela Schlueter, University of Liverpool and Rowena Griffiths, Swansea University – Co-authors of the paper.
The UK CF Registry is an incredible resource for understanding the natural progression of CF, identifying factors that may impact on the disease course, and for evaluating the effectiveness of treatments and the quality of care people receive.
However, it is not possible to capture everything about an individual’s health in a register. For example, in an earlier study we wanted to determine whether babies with CF were born lighter on average than babies without CF and, if so, to what extent this could be explained by their being born earlier. When we started that study, the UK CF Registry data were not available in SAIL and birth weight was not collected in the register. We therefore used electronic health records (eHR) in the SAIL Databank to identify a cohort of children with CF born in Wales and to study the effect of CF on birth weight. But we were aware that the number of children we had identified was larger than the expected number of children with CF in Wales, so some children must have been misclassified as having CF when in fact they did not have it.
By acquiring the UK CF Registry and linking it into the SAIL Databank, we were able to identify which children in our eHR cohort truly had CF and which had been misclassified. We found that 257 of the 352 children were true cases, and only 11 children with CF were missing from our eHR cohort. Children with CF codes in more than one eHR data source (primary care, secondary care or the congenital anomaly register) were more likely to be true cases, but some true cases had a CF code in only one data source. These findings matter for anyone who wants to use eHR data alone for CF research, for two reasons:
1. Even for a well-defined disease such as CF, it is difficult to correctly identify people with CF in eHR data.
2. Using only one data source, such as secondary care (hospital) records, will likely misclassify some people and not identify others.
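The counts above can be turned into two standard validation measures. The short Python sketch below is an illustration, not the paper's reported analysis: it assumes the UK CF Registry is treated as the reference standard and that the 11 missed children are the only false negatives, so "sensitivity" here means the share of registry-confirmed children that the eHR definition found. The function name `validation_summary` is hypothetical.

```python
# Counts quoted in the article. Treating the UK CF Registry as the reference
# standard is an assumption made for this illustration.
flagged_in_ehr = 352        # children flagged as having CF in the eHR data
confirmed_true_cases = 257  # flagged children confirmed by registry linkage
missed_cases = 11           # children with CF absent from the eHR cohort


def validation_summary(flagged: int, true_positives: int, false_negatives: int) -> dict:
    """Hypothetical helper: compare an eHR case definition with a reference register."""
    false_positives = flagged - true_positives
    # Positive predictive value: share of flagged children who truly had CF.
    ppv = true_positives / flagged
    # Sensitivity: share of reference cases that the eHR definition found.
    sensitivity = true_positives / (true_positives + false_negatives)
    return {"false_positives": false_positives, "ppv": ppv, "sensitivity": sensitivity}


summary = validation_summary(flagged_in_ehr, confirmed_true_cases, missed_cases)
print(f"Misclassified as CF: {summary['false_positives']}")
print(f"Positive predictive value: {summary['ppv']:.1%}")
print(f"Sensitivity: {summary['sensitivity']:.1%}")
```

Under these assumptions the sketch reports 95 misclassified children, a positive predictive value of about 73% and a sensitivity of about 96%. The two reasons listed above are both visible in these numbers: roughly a quarter of the children flagged from eHR codes did not have CF, and a small number of true cases were missed.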
The anonymised, linked UK CF Registry and eHR data in the SAIL Databank are available for other researchers to access and use. This allows detailed future research into the health and health care utilisation of people with CF beyond what is possible with the register alone, while protecting against misclassification of cases, one of the common pitfalls of eHR research.
Data available for researchers to use – Ashley Akbari, Senior Research Manager and Data Scientist at Population Data Science, Swansea University

Linking the UK CF Registry with population-scale data in the SAIL Databank has created an enhanced, combined resource for CF intelligence and research. It provides more complete, quality-assured eHR and register data, from which CF researchers, health care professionals and, most importantly, people with CF will benefit in the future.
The lessons learned from this study can also be applied to other rare diseases. We hope to build on these foundations and, working with our partners, to link these data to more data sources with wider coverage across the UK, providing a richer, fuller picture of CF care and services.
Data is held within the privacy-protecting Secure Research Platform (SeRP) based at Population Data Science at Swansea University. SAIL has a strict set of policies, structures and control practices in place to protect privacy through a matching, anonymisation and encryption process.
If you are interested in speaking with the authors of the paper, or about accessing the UK CF Registry and other linked eHR data in the SAIL Databank, please contact the SAIL Team.
Visit IJPDS to read the full paper, ‘Identifying children with Cystic Fibrosis in population-scale routinely collected data in Wales’.
A note on the SAIL Databank:
The research was enabled by the Secure Anonymised Information Linkage (SAIL) Databank at Swansea University, a safe haven for billions of person-based records. It can be used to shed new light on important areas of public health and public service. The SAIL Databank holds linkable datasets, and all data that can be accessed for research are anonymous, meaning that the identities of the individuals represented are not known. This is particularly important when working with vulnerable and young people. Information is also available for the entire population of Wales, so it can be used to understand more about people who may not be easily reached through surveys or other research methods. https://saildatabank.com/