
CALIBER health records research toolkit

This project comprises a set of R packages to assist in epidemiological studies using electronic health records databases.

CALIBER (http://caliberresearch.org/) is led from the Farr Institute @ London. CALIBER investigators represent a collaboration between epidemiologists, clinicians, statisticians, health informaticians and computer scientists with initial funding from the Wellcome Trust and the National Institute for Health Research.

The goal of CALIBER is to provide evidence across the stages of translation, from discovery through evaluation to implementation, wherever electronic health records provide new scientific opportunities.

Download R packages here: http://r-forge.r-project.org/R/?group_id=1598


Identifying patients with particular medical diagnoses in electronic health record data requires an algorithm to select the appropriate diagnostic codes. Research groups may accumulate a large number of code lists for different medical conditions, and the CALIBERcodelists package contains functions for creating codelists and storing them in a standardised format with metadata such as the authors and version number.

CALIBER investigators can use this package in conjunction with the CALIBERlookups package which contains the source dictionaries; other researchers can use the scripts provided to create lookup tables from the official sources of the Read, ICD-10, OPCS and CPRD dictionaries.
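The idea of a codelist stored in a standardised format with metadata can be pictured with a small, language-neutral sketch. The Python below is not the CALIBERcodelists interface: the `Codelist` class, its field names, and the text layout are all hypothetical, chosen only to show codes and metadata (authors, version number) travelling together.

```python
from dataclasses import dataclass, field


@dataclass
class Codelist:
    """Diagnostic codes plus the metadata needed to cite and version them.

    Hypothetical structure for illustration; not the CALIBERcodelists format.
    """
    name: str
    version: int
    authors: list
    # Maps (dictionary, code) -> human-readable term.
    codes: dict = field(default_factory=dict)

    def add(self, dictionary, code, term):
        """Add one code from a named source dictionary (e.g. 'icd10')."""
        self.codes[(dictionary, code)] = term

    def matches(self, dictionary, code):
        """Return True if the code is part of this codelist."""
        return (dictionary, code) in self.codes

    def to_text(self):
        """Serialise as metadata header lines followed by sorted rows."""
        lines = [
            f"# name: {self.name}",
            f"# version: {self.version}",
            f"# authors: {'; '.join(self.authors)}",
            "dictionary,code,term",
        ]
        for (dictionary, code), term in sorted(self.codes.items()):
            lines.append(f"{dictionary},{code},{term}")
        return "\n".join(lines)


# Example content: two ICD-10 categories, with a placeholder author.
ihd = Codelist(name="ihd_example", version=1, authors=["A. Researcher"])
ihd.add("icd10", "I20", "Angina pectoris")
ihd.add("icd10", "I25", "Chronic ischaemic heart disease")
print(ihd.to_text())
```

Keeping the metadata in the same object as the codes means that a saved codelist always records who wrote it and which version was used in an analysis.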


The CALIBER data management package includes functions to:

  1. Import data - Import single or multiple files to data.table or ffdf objects in R, with automatic unzipping of compressed files and conversion of dates, and decoding using lookup tables in the CALIBERlookups package.
  2. Build cohorts - A 'cohort' S3 class to store information about a cohort, and functions for generating analysis variables from multiple-row-per-patient data.
  3. Create presentation tables - Produce summary tables in LaTeX or plain text, and format numbers and percentages.
  4. Make forest plots - Produce forest plots using a spreadsheet template, with the facility to include several plots side by side, and specify the formatting of text.
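As a rough illustration of items 2 and 3, the sketch below derives one analysis variable per patient from multiple-row-per-patient event data and then formats a count with its percentage for a summary table. It is written in plain Python rather than with the package's R functions; the record layout and the helpers `add_first_event` and `n_percent` are assumptions made for this example only.

```python
from datetime import date

# Hypothetical event records: one row per recorded diagnosis,
# so a patient may appear on several rows.
events = [
    {"patid": 1, "eventdate": date(2005, 3, 1), "code": "I20"},
    {"patid": 1, "eventdate": date(2003, 7, 15), "code": "I20"},
    {"patid": 2, "eventdate": date(2010, 1, 9), "code": "I25"},
]

# Hypothetical cohort: exactly one entry per patient.
cohort = {1: {}, 2: {}, 3: {}}


def add_first_event(cohort, events, varname):
    """Add the earliest event date per patient; None if no event is recorded."""
    for row in cohort.values():
        row[varname] = None
    for ev in events:
        row = cohort.get(ev["patid"])
        if row is None:
            continue  # event belongs to a patient outside the cohort
        if row[varname] is None or ev["eventdate"] < row[varname]:
            row[varname] = ev["eventdate"]
    return cohort


def n_percent(n, total, digits=1):
    """Format a count with its percentage for a summary table cell."""
    if total == 0:
        return f"{n} (-)"
    return f"{n} ({100 * n / total:.{digits}f}%)"


add_first_event(cohort, events, "first_ihd")
n_with_event = sum(1 for row in cohort.values() if row["first_ihd"] is not None)
summary = n_percent(n_with_event, len(cohort))
print(summary)
```

Patients with no matching events keep a missing value rather than being dropped, so the cohort retains one row per patient whatever the event data contain.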


The CALIBER drug dose algorithm converts unstructured text dosage instructions into a structured format.
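To make "unstructured text into a structured format" concrete, here is a deliberately tiny Python sketch. It is not the CALIBER algorithm, which handles far more varied instructions: the regular expression, the frequency table, and the output field names are all assumptions for illustration.

```python
import re

# Hypothetical mapping from frequency phrases to doses per day.
FREQ = {"once": 1, "twice": 2, "three times": 3, "four times": 4}

# Toy pattern: optional "take", a quantity, an optional unit, then a frequency.
PATTERN = re.compile(
    r"(?:take\s+)?(?P<qty>\d+(?:\.\d+)?)\s*(?P<unit>tablets?|capsules?|ml)?\s*"
    r"(?P<freq>once|twice|three times|four times)\s+(?:a|per)\s+day",
    re.IGNORECASE,
)


def parse_dose(text):
    """Return a structured dose record, or None if the text is not recognised."""
    m = PATTERN.search(text)
    if m is None:
        return None
    qty = float(m.group("qty"))
    freq = FREQ[m.group("freq").lower()]
    # Normalise the unit to singular lower case; None when no unit was given.
    unit = (m.group("unit") or "").lower().rstrip("s") or None
    return {
        "quantity": qty,
        "unit": unit,
        "frequency_per_day": freq,
        "daily_dose": qty * freq,
    }


print(parse_dose("Take 2 tablets twice a day"))
print(parse_dose("as directed"))
```

Returning `None` for unrecognised text keeps uninterpretable instructions distinguishable from a genuine dose of zero.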


  1. Shah AD, Martinez C. Pharmacoepidemiology and Drug Safety 2006; 15: 161-166. doi: 10.1002/pds.1151


The Freetext Matching Algorithm is a natural language processing system for analysing clinical free text, and is available from the freetext-matching-algorithm GitHub repository. It uses lookup tables from the freetext-matching-algorithm-lookups GitHub repository. This R package provides an interface to the program, as well as tools to help manipulate the lookup tables. It is currently available only for Linux systems, as it requires Wine and Git.


Missing data are frequently handled by multiple imputation, but parametric imputation methods may lead to biased results if the imputation method is incorrectly specified. Random Forest is a non-parametric prediction method which can handle non-linearities and interactions in a flexible way.

The CALIBERrfimpute package contains novel imputation functions using Random Forest within the framework of Multivariate Imputation by Chained Equations.

An alternative Random Forest imputation algorithm was developed by Doove et al. and is available in the MICE package; our vignette in CALIBERrfimpute compares the two methods.


  1. Shah AD, Bartlett JW, Carpenter J, Nicholas O, Hemingway H. Comparison of Random Forest and parametric imputation models for imputing missing data using MICE: a CALIBER study. American Journal of Epidemiology 2014. doi: 10.1093/aje/kwt312
  2. Doove LL, van Buuren S, Dusseldorp E. Recursive partitioning for missing data imputation in the presence of interaction effects. Computational Statistics and Data Analysis 2014; 72: 92-104. doi: 10.1016/j.csda.2013.10.025

Link to project summary page: http://r-forge.r-project.org/projects/caliberanalysis/
