This project comprises a set of R packages to assist in epidemiological studies using electronic health record databases.
CALIBER (http://caliberresearch.org/) is led from the Farr Institute @ London. CALIBER investigators represent a collaboration between epidemiologists, clinicians, statisticians, health informaticians and computer scientists with initial funding from the Wellcome Trust and the National Institute for Health Research.
The goal of CALIBER is to provide evidence across different stages of translation, from discovery through evaluation to implementation, where electronic health records provide new scientific opportunities.
Download R packages here: http://r-forge.r-project.org/R/?group_id=1598
Identifying patients with particular medical diagnoses in electronic health record data requires an algorithm to select the appropriate diagnostic codes. Research groups may accumulate a large number of codelists for different medical conditions. The CALIBERcodelists package contains functions for creating codelists and storing them in a standardised format with metadata such as the authors and version number.
CALIBER investigators can use this package in conjunction with the CALIBERlookups package, which contains the source dictionaries; other researchers can use the scripts provided to create lookup tables from the official sources of the Read, ICD-10, OPCS and CPRD dictionaries.
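To make the idea of "a codelist stored with metadata" concrete, here is a minimal Python sketch. It is not the CALIBERcodelists storage format or API: the `Codelist` class, its fields and the `patients_with_condition` helper are hypothetical, and the ICD-10 codes I21 and I22 (myocardial infarction) are shown only for illustration, not as a complete or validated codelist. It assumes exact code matching; real codelists often rely on the code hierarchy as well.

```python
from dataclasses import dataclass, field


@dataclass
class Codelist:
    """Hypothetical codelist record: a set of diagnostic codes plus metadata.

    The field names are illustrative assumptions, not the
    CALIBERcodelists storage format.
    """
    name: str
    version: str
    author: str
    codes: set = field(default_factory=set)

    def matches(self, code):
        # Exact match only; no hierarchy or prefix matching in this sketch.
        return code in self.codes


def patients_with_condition(records, codelist):
    """Return sorted IDs of patients with at least one matching code.

    `records` is an iterable of (patient_id, code) pairs, an assumed
    stand-in for a table of coded clinical events.
    """
    return sorted({pid for pid, code in records if codelist.matches(code)})


# Illustrative codes only, not a validated clinical codelist.
mi = Codelist(name="myocardial_infarction_example", version="0.1",
              author="A. N. Other", codes={"I21", "I22"})
events = [(1, "I21"), (2, "E11"), (3, "I22"), (1, "I22")]
print(patients_with_condition(events, mi))  # [1, 3]
```

Keeping the version and author alongside the codes means any extraction can record exactly which codelist definition was used.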
The CALIBER data management package includes functions to:
The CALIBER drug dose algorithm converts unstructured text dosage instructions into a structured format.
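The CALIBER algorithm itself is not reproduced here. As a rough illustration of what converting free-text instructions into a structured format involves, the toy Python sketch below uses regular expressions to extract a quantity, unit and daily frequency from a few fixed phrasings. The vocabulary, the `parse_dose` function and its output fields (`qty`, `unit`, `freq`, `daily_dose`) are hypothetical assumptions, not the package's interface or method. A realistic algorithm needs a far larger lexicon and rules for ranges, choices and changes in dose.

```python
import re

# Hypothetical, deliberately tiny vocabulary for this toy parser.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4}
FREQUENCY_PER_DAY = {
    "once a day": 1, "once daily": 1, "daily": 1,
    "twice a day": 2, "twice daily": 2,
    "three times a day": 3, "four times a day": 4,
}

# A number (digits or word) immediately followed by a unit word.
QTY_PATTERN = re.compile(
    r"\b(\d+(?:\.\d+)?|one|two|three|four)\s+(tablets?|capsules?|ml)\b")


def parse_dose(text):
    """Parse a simple dosage instruction into structured fields.

    Returns a dict with 'qty' (units per dose), 'unit', 'freq'
    (doses per day) and 'daily_dose'; fields not found are None.
    """
    text = text.lower()
    qty = unit = None
    m = QTY_PATTERN.search(text)
    if m:
        word = m.group(1)
        qty = float(NUMBER_WORDS.get(word, word))
        unit = m.group(2).removesuffix("s")  # "tablets" -> "tablet"
    freq = None
    # Try longer phrases first so "daily" does not pre-empt "twice daily".
    for phrase in sorted(FREQUENCY_PER_DAY, key=len, reverse=True):
        if phrase in text:
            freq = FREQUENCY_PER_DAY[phrase]
            break
    daily = qty * freq if qty is not None and freq is not None else None
    return {"qty": qty, "unit": unit, "freq": freq, "daily_dose": daily}


print(parse_dose("Take 2 tablets twice a day"))
# {'qty': 2.0, 'unit': 'tablet', 'freq': 2, 'daily_dose': 4.0}
```

Instructions outside the vocabulary, such as "as directed", yield `None` for every field rather than a guessed value.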
The Freetext Matching Algorithm is a natural language processing system for analysing clinical free text, and is available from the freetext-matching-algorithm GitHub repository. It uses lookup tables from the freetext-matching-algorithm-lookups GitHub repository. This R package provides an interface to the program, as well as tools to help manipulate the lookup tables. It is currently available only for Linux systems, as it requires Wine and Git.
Missing data are frequently handled by multiple imputation, but parametric imputation methods may lead to biased results if the imputation model is misspecified. Random Forest is a non-parametric prediction method that can handle non-linearities and interactions in a flexible way.
The CALIBERrfimpute package contains novel imputation functions that use Random Forest within the framework of Multivariate Imputation by Chained Equations (MICE).
An alternative Random Forest imputation algorithm, developed by Doove et al., is available in the mice package; the CALIBERrfimpute vignette compares the two methods.
Link to project summary page: http://r-forge.r-project.org/projects/caliberanalysis/