The aim of pharmacoepidemiology is to estimate the effects of medications on clinical outcomes. For results to be valid, it is therefore critical that exposures and outcomes are accurately captured in the data sources used. Electronic healthcare databases are increasingly being utilised, with UK primary care data the most widely used. However, these data are not collected for research purposes and relate only to health as observed, understood and documented in one part of the healthcare system. Researchers therefore face potential misclassification bias, which traditional validation methods may not adequately identify or address. Linking multiple databases may offer new options for validation and potentially reduce misclassification, but these benefits do not come without challenges.
This thesis aimed to explore and demonstrate the use of Clinical Practice Research Datalink linked data for capturing exposures and outcomes in pharmacoepidemiological research. The specific objectives were to: (1) demonstrate the challenges of interpretation when external data sources are used to validate unlinked data, (2) evaluate agreement between primary care, secondary care, registry and mortality data in capturing cancer diagnoses, and explore the determinants of non-concordance, (3) conduct case studies demonstrating the use of linked data for optimal outcome ascertainment and increased case ascertainment, and the added value of additional variables, and (4) conduct case studies demonstrating the use of linked data to describe drug treatment patterns and assess their impact.
In the discussion, it is recommended that researchers consider whether a gold standard data source exists for their event of interest. Five questions, to be asked of each available data source, are proposed to guide the selection of a gold standard, or the selection of sources to compare and combine to increase case ascertainment. The impact of linkage on sample size, study period and the resulting patient populations is considered. Practical issues of data access and reporting guidelines are discussed. Recommendations are made to data providers on how to support researchers considering the use of linked data. Finally, it is recommended that policy makers, when developing information and technology solutions, enable high-quality research for the benefit of the populations they serve.
In conclusion, linked data offer opportunities to assess agreement in recording across sources, enabling misclassification to be identified and reduced. Identifying a gold standard is challenging and may not be possible. Investigating data provenance and the determinants of non-concordance will guide how data should be combined for a given study. Outcome ascertainment will be optimised when a gold standard exists, but it can also be improved through data linkage that broadens case-mix. Additional variables made available through linkage can be used to internally validate exposures and outcomes, to adjust investigated associations for a broader range of covariates, and to extend research questions beyond those that can be addressed using a single data source. Given the benefits that linked data can offer, it is critical that policy makers, data providers, the research community and the public work together to ensure the success of national linked data initiatives. Only by each playing their part will the jigsaw puzzle be solved.