Objective: We describe an approach for modeling temporal relationships in a large-scale association analysis of electronic health record data. Adding temporal information can inform hypothesis generation and help explain the observed relationships. We applied this approach to a dataset containing 41.2 million time-stamped International Classification of Diseases, Ninth Revision (ICD-9) codes from 1.6 million patients.
Methods: We performed two independent analyses: a pairwise association analysis using a χ² test and a temporal analysis using a binomial test. Data were visualized using network diagrams and reviewed for clinical significance.
Results: We found nearly 400 000 highly associated pairs of ICD-9 codes, with varying numbers of strong temporal associations at intervals ranging from ≥1 day to ≥10 years apart. Most of the findings were not considered clinically novel, although some, such as an association between Helicobacter pylori infection and diabetes, have recently been reported in the literature. The temporal analysis in our large cohort, however, revealed that diabetes usually preceded the diagnosis of H pylori infection, raising questions about possible cause and effect.
Discussion: Such analyses have significant limitations, some due to known problems with ICD-9 codes and others to data that are potentially incomplete even at the health system level. Nevertheless, large-scale association analyses with temporal modeling can provide a mechanism for novel discovery in support of hypothesis generation.
Conclusions: Temporal relationships can provide an additional layer of meaning in identifying and interpreting clinical associations.
Author keywords: Data collection; Electronic health records; International classification of diseases; Databases, factual; Data mining/methods; Medical records systems, computerized
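The two tests named in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of each patient's first occurrence of a code, the `min_gap_days` parameter, the null probability of 0.5 for the ordering test, and the toy data are all assumptions made for illustration. The χ² test here is the uncorrected Pearson statistic on a 2×2 table, and the binomial test is an exact two-sided test of whether one code precedes the other more often than chance.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence of association
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    return stat, math.erfc(math.sqrt(stat / 2))


def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than observing k successes in n trials."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def analyze_pair(first_seen, code_a, code_b, min_gap_days=1):
    """first_seen maps patient id -> {code: day of first occurrence}.

    min_gap_days is a hypothetical knob mirroring the ">= 1 day apart"
    style of threshold; orderings closer than this are not counted.
    """
    both = only_a = only_b = neither = 0
    a_first = b_first = 0
    for codes in first_seen.values():
        has_a, has_b = code_a in codes, code_b in codes
        if has_a and has_b:
            both += 1
            gap = codes[code_b] - codes[code_a]
            if gap >= min_gap_days:
                a_first += 1
            elif -gap >= min_gap_days:
                b_first += 1
        elif has_a:
            only_a += 1
        elif has_b:
            only_b += 1
        else:
            neither += 1
    stat, p_assoc = chi2_2x2(both, only_a, only_b, neither)
    ordered = a_first + b_first
    p_temporal = binomial_two_sided(a_first, ordered) if ordered else 1.0
    return {"chi2": stat, "p_assoc": p_assoc, "a_first": a_first,
            "b_first": b_first, "p_temporal": p_temporal}


# Synthetic toy cohort (not real patient data): 100 patients, codes "A" and "B".
toy = {}
for i in range(8):
    toy[f"ab{i}"] = {"A": 0, "B": 30}   # A recorded 30 days before B
for i in range(2):
    toy[f"ba{i}"] = {"A": 30, "B": 0}   # B recorded 30 days before A
for i in range(5):
    toy[f"a{i}"] = {"A": 0}
    toy[f"b{i}"] = {"B": 0}
for i in range(80):
    toy[f"n{i}"] = {}

result = analyze_pair(toy, "A", "B")
print(result)
```

At the scale described in the abstract, every code pair would be tested, so some correction for multiple comparisons would be needed; this sketch omits it.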