Listen to the Podcast
Prefer audio? Listen to this article as a conversational podcast (~18 min):
Download MP3 View Transcript
Picture this: You’re a hospital administrator who just received a troubling report. Three patients developed aortic aneurysms after taking a common antibiotic. Is this a coincidence, or have you stumbled onto a serious safety signal?
To answer this question, you’d need to compare your data with hundreds of other hospitals. But here’s the problem—every hospital stores data differently. One uses ICD-10 codes, another SNOMED-CT. One stores medications by brand name, another by generic. It’s like trying to compare apples and oranges, except the apples are labeled in French and the oranges in Mandarin.
This is where OHDSI changes everything.
In this post, we’ll explore how a global community of 4,751+ collaborators across 88 countries built a “universal translator” for healthcare data, enabling researchers to answer questions that would be impossible to tackle alone.
What is OHDSI?
Think of OHDSI like the United Nations of healthcare data. Just as the UN brings countries together with a common framework for diplomacy, OHDSI brings healthcare organizations together with a common framework for data analysis.
Ecosystem)) Data Standards OMOP CDM Person Table Visit Table Condition Table Drug Table Procedure Table Measurement Table Vocabularies SNOMED-CT RxNorm LOINC ICD-10 CPT Methods Characterization Patient Profiles Disease Natural History Estimation Comparative Effectiveness Safety Studies Prediction Risk Models ML/AI Tools Athena Browse Concepts Download Vocabularies Atlas Cohort Builder Study Design HADES R Packages Analysis Pipeline Strategus Orchestration Reproducibility Community Working Groups CDM Vocabularies HADES Phenotypes Events Symposium Tutorials Chapters
Key Facts
| Metric | Value |
|---|---|
| Founded | 2014 |
| Collaborators | 4,751+ |
| Countries | 88 |
| Time Zones | 21 |
| Continents | 6 |
| Central Hub | Columbia University |
The Mission
Mission: Improve health by empowering a community to collaboratively generate the evidence that promotes better health decisions and better care.
The Data Standardization Problem
Here’s the thing that confused me for weeks when I first started learning about healthcare interoperability: why can’t hospitals just share data?
The answer is painfully simple once you understand it.
The Electrical Outlet Analogy
Imagine traveling internationally with your laptop. Your laptop (the analytical method) works the same everywhere—it’s designed to run identically regardless of location. But every country has different electrical outlets (the data).
Without OMOP, you need a different adapter for every outlet. With OMOP, you have a universal adapter that works everywhere.
(Epic, Cerner)"] Claims["Claims Data"] Registry["Registries"] end subgraph ETL["ETL Process"] Extract["Extract"] Transform["Transform"] Load["Load"] end subgraph Vocab["Standardized Vocabularies"] Athena["Athena
12M+ Concepts"] Mapping["Source → Standard
Concept Mapping"] end subgraph CDM["OMOP CDM (Common Format)"] Person["Person"] Visit["Visit"] Condition["Condition"] Drug["Drug"] Procedure["Procedure"] Measurement["Measurement"] end subgraph Analysis["Analysis Tools"] Atlas["Atlas
(Web UI)"] HADES["HADES
(R Packages)"] end subgraph Output["Evidence"] Characterization["Characterization"] Estimation["Estimation"] Prediction["Prediction"] end EHR --> Extract Claims --> Extract Registry --> Extract Extract --> Transform Transform --> Load Athena --> Mapping Mapping --> Load Load --> Person Load --> Visit Load --> Condition Load --> Drug Load --> Procedure Load --> Measurement Person --> Atlas Visit --> Atlas Condition --> Atlas Drug --> Atlas Atlas --> HADES HADES --> Characterization HADES --> Estimation HADES --> Prediction
Real-World Mapping Example
Let me show you what this actually looks like. Say Hospital A and Hospital B both want to analyze their use of Ciprofloxacin (a common antibiotic).
Code: 2551"] end subgraph HospB["Hospital B (Uses HMOG)"] CodeB["Ciprofloxacin
Code: 104"] end subgraph OMOP["OMOP Standard"] Standard["Standard Concept ID
1797513
(Ciprofloxacin)"] end subgraph Shared["Shared Analysis"] Compare["✓ Can Compare
✓ Can Share
✓ Can Analyze"] end CodeA -->|"Map"| Standard CodeB -->|"Map"| Standard Standard --> Compare
The same drug has code “2551” in Hospital A and code “104” in Hospital B. But after mapping to OMOP, both become Concept ID 1797513. Now they can be analyzed together.
The OMOP Common Data Model
Okay, now things get interesting. Let’s look at how the OMOP CDM actually works.
Core Principles
The CDM was built around these foundational ideas:
| Principle | What It Means | Why It Matters |
|---|---|---|
| Patient-Centric | Everything revolves around the patient | Not for billing—purely for research |
| Standard Vocabulary | Common terminology for all data | Enables cross-institutional comparison |
| Domain-Oriented | Concepts organized by clinical domain | Intuitive navigation |
| Source Preservation | Original codes are kept | Can trace back if needed |
| Extensible | Can evolve and add features | Future-proof design |
| Database Independent | Works with any database | PostgreSQL, SQL Server, Oracle, etc. |
The Table Structure
The PERSON table is central—everything connects back to it. This patient-centric design is what makes OMOP different from billing-focused data models.
Standardized Vocabularies
This is where OHDSI gets really powerful. The vocabularies are a massive “Rosetta Stone” for healthcare data.
By the Numbers
| Metric | Count |
|---|---|
| Total Concepts | 12+ million |
| Vocabularies | 142 |
| Domains | 44 |
| Concept Relationships | 80+ million |
Key Vocabulary Sources
| Vocabulary | Domain | Examples |
|---|---|---|
| SNOMED-CT | Conditions | Diseases, findings |
| RxNorm | Drugs | Medications |
| LOINC | Measurements | Lab tests |
| ICD-10 | Conditions | Diagnosis codes |
| CPT | Procedures | Procedure codes |
Concept ID vs Concept Code
This distinction tripped me up at first. Let me explain:
| Attribute | Concept ID | Concept Code |
|---|---|---|
| Source | OHDSI-assigned | External (SNOMED, RxNorm, etc.) |
| Uniqueness | Globally unique | Unique within vocabulary only |
| Format | Integer | String |
| Example | 1797513 | “2551” |
Why this matters: The same concept code “2551” might mean different things in different vocabularies. The Concept ID ensures uniqueness across the entire OMOP ecosystem.
Study Types
OHDSI supports three main types of studies, each answering a different kind of question:
14 OMOP Sites"] Phenotype["🎯 Phenotypes
Cohort Definitions"] Design["📐 Study Design
Estimation"] Methods["📈 Methods
Propensity Scores"] Tools["🔧 Tools
Atlas, HADES"] end subgraph Study["Fluoroquinolone Study"] Question["Does fluoroquinolone
increase aortic
aneurysm risk?"] Results["📋 Results"] end DB --> Question Phenotype --> Question Design --> Question Methods --> Question Tools --> Question Question --> Results
1. Characterization
Question: “Who are these patients?”
Examples:
- What are the demographics of diabetic patients?
- What medications do heart failure patients take?
- What’s the natural history of Parkinson’s disease?
2. Estimation (Population-Level Effect)
Question: “Does treatment A cause outcome B?”
Template: “Does exposure to [TREATMENT] have different risk of [OUTCOME] within [TIME] vs [COMPARATOR]?”
Examples:
- Do ACE inhibitors reduce stroke risk vs ARBs?
- Does metformin cause lactic acidosis?
- Do fluoroquinolones increase aortic aneurysm risk?
3. Prediction (Patient-Level)
Question: “What will happen to this specific patient?”
Examples:
- What’s this patient’s 5-year heart attack risk?
- Will this patient be readmitted within 30 days?
- What’s the probability of treatment response?
The OHDSI Tool Ecosystem
Here’s where it gets fun—the tools that make all of this possible.
athena.ohdsi.org
───────────
• Browse Concepts
• Download Vocabs
• 12M+ Concepts"] end subgraph Design["Design Layer"] Atlas["🎨 Atlas
atlas.ohdsi.org
───────────
• Cohort Builder
• Study Design
• Concept Sets"] end subgraph Analysis["Analysis Layer"] HADES["📦 HADES
R Packages
───────────
• CohortGenerator
• CohortMethod
• PatientLevelPred"] Strategus["⚙️ Strategus
Pipeline
───────────
• Orchestration
• Reproducibility
• Automation"] end subgraph QA["Quality Layer"] DQD["📊 Data Quality
Dashboard
───────────
• ETL Validation
• Issue Detection
• Quality Metrics"] end end Athena --> Atlas Atlas --> HADES HADES --> Strategus DQD --> Atlas
Tool Summary
| Tool | Purpose | URL |
|---|---|---|
| Athena | Vocabulary browser and download | athena.ohdsi.org |
| Atlas | Web-based study design and cohort building | atlas.ohdsi.org |
| HADES | R packages for analysis | ohdsi.github.io/Hades |
| Strategus | Study execution pipeline | Part of HADES |
| Data Quality Dashboard | ETL validation | ohdsi.github.io/DataQualityDashboard |
Real-World Example: The Fluoroquinolones Study
Let me bring this all together with a real example.
The Question: Do fluoroquinolones (common antibiotics like Ciprofloxacin) increase the risk of aortic aneurysm or dissection?
The Approach:
- 14 databases converted to OMOP CDM participated
- Phenotypes defined patients exposed to fluoroquinolones and patients with UTIs
- Study design compared fluoroquinolone users to other UTI treatments
- Methods used propensity score matching to control for confounding
- Tools: Atlas for cohort definition, HADES for analysis
This kind of study would be nearly impossible without OHDSI. Each hospital would need its own study team, and results couldn’t be directly compared.
Getting Started
Ready to join the community? Here’s how:
Ways to Participate
| Level | Description | Time Commitment |
|---|---|---|
| Observer | Join working groups, listen, learn | 1-2 hours/month |
| Consumer | Use OHDSI tools and data | As needed |
| Contributor | Convert data, run studies | 10+ hours/month |
| Developer | Build tools, improve packages | Ongoing |
| Leader | Lead working groups, mentor others | Significant |
Essential Resources
| Resource | URL |
|---|---|
| Main Website | ohdsi.org |
| Vocabularies | athena.ohdsi.org |
| Web Tools | atlas.ohdsi.org |
| R Packages | ohdsi.github.io/Hades |
| Documentation | ohdsi.github.io/CommonDataModel |
| Community | forums.ohdsi.org |
| Book of OHDSI | ohdsi.github.io/TheBookOfOhdsi |
Key Takeaways
- OHDSI is a global community of 4,751+ collaborators working to standardize healthcare data analysis
- The OMOP CDM is the universal “adapter” that allows different healthcare systems to share and compare data
- Standardized vocabularies (12M+ concepts) provide a common language for clinical concepts
- Three study types (Characterization, Estimation, Prediction) cover the major research questions
- Open-source tools (Athena, Atlas, HADES) make analysis accessible to everyone
Learning Resources
Study Materials
Test your understanding and reinforce key concepts:
- Flashcard Deck - 30 cards for spaced repetition learning
- Self-Assessment Quiz - 30 questions (multiple choice, true/false, short answer)
- Presentation Slides - 27 slides for teaching or review
References & Resources
Official OHDSI Resources
| Resource | Description | Link |
|---|---|---|
| OHDSI Website | Main community hub | ohdsi.org |
| OHDSI 2025 Symposium | Annual global conference | ohdsi.org/ohdsi2025 |
| OHDSI GitHub | Open-source code repositories | github.com/OHDSI |
| The Book of OHDSI | Comprehensive guide (free online) | ohdsi.github.io/TheBookOfOhdsi |
| OHDSI Forums | Community Q&A and discussions | forums.ohdsi.org |
| OHDSI YouTube | Tutorials and presentations | youtube.com/@OHDSI |
Tools & Documentation
| Tool | Purpose | Link |
|---|---|---|
| Athena | Browse and download vocabularies | athena.ohdsi.org |
| Atlas | Web-based cohort building and study design | atlas-demo.ohdsi.org |
| HADES | Health Analytics Data-to-Evidence Suite (R packages) | ohdsi.github.io/Hades |
| OMOP CDM | Common Data Model documentation | ohdsi.github.io/CommonDataModel |
| Data Quality Dashboard | Validate your OMOP CDM conversion | ohdsi.github.io/DataQualityDashboard |
| ACHILLES | Database characterization and visualization | ohdsi.github.io/Achilles |
Standardized Vocabularies
| Vocabulary | Domain | Link |
|---|---|---|
| SNOMED-CT | Clinical findings, diseases, procedures | snomed.org |
| RxNorm | Medications and drug products | nlm.nih.gov/rxnorm |
| LOINC | Laboratory tests and measurements | loinc.org |
| ICD-10 | Diagnosis classification | who.int/icd |
| CPT | Procedure codes | ama-assn.org/cpt |
Getting Involved
| Activity | Description | Link |
|---|---|---|
| Working Groups | Join specialized community groups | ohdsi.org/working-groups |
| Regional Chapters | Connect with local OHDSI communities | ohdsi.org/regional-chapters |
| Study-a-thons | Collaborative research events | ohdsi.org/studyathon |
| OHDSI Network Studies | Participate in global research | ohdsi.org/network-studies |
Key GitHub Repositories
| Repository | Description |
|---|---|
| OHDSI/CommonDataModel | OMOP CDM specifications and DDL scripts |
| OHDSI/Athena | Vocabulary download and management |
| OHDSI/Atlas | Web application for cohort building |
| OHDSI/WebAPI | Backend services for Atlas |
| OHDSI/Hades | R package ecosystem for analysis |
| OHDSI/CohortMethod | Comparative effectiveness studies |
| OHDSI/PatientLevelPrediction | Machine learning for patient outcomes |
| OHDSI/ETL-Synthea | Convert Synthea data to OMOP CDM |
The next time you see a medication order or lab result in a patient chart, you’ll know there’s an entire global infrastructure working behind the scenes to make that data meaningful—not just for one patient, but for millions of patients around the world.
Source: OHDSI Introduction Tutorial (YouTube)
Generated using the Feynman Technique: Complex concepts explained simply, with analogies and practical examples.