Over time, several research scientists have asked me whether there are any open access databases of de-identified laboratory data coded with LOINC. Clinical laboratory data are used in a majority of medical decisions. They are also valuable in measuring the quality of care, for public health and surveillance, and in cost-effectiveness studies.

Large, openly available data sources could enable more of these kinds of analyses. Yet medical privacy is a core tenet of healthcare. De-identification of protected health information according to HIPAA can be a challenging task. And the rapid growth of genetic laboratory testing raises additional questions and challenges for de-identification.

The underlying research question will drive whether any particular data set is useful, and scientists may have different collaboration possibilities available to them that can open doors. Here are some opportunities and directions for researchers to pursue when looking for de-identified laboratory data.

MIMIC

The first resource to mention is the MIMIC database, an openly available critical care dataset. Now in its third iteration, MIMIC-III (Medical Information Mart for Intensive Care) is a large, single-center database that holds information about patients admitted to critical care units at a large hospital. The database contains a wealth of health information, including laboratory tests, vital signs, medications, procedure codes, imaging reports, and fluid balance. It is available for use in research, quality improvement, or education.

MIMIC is a truly unique resource. I’m not aware of anything similar. In its original form, the catalog of test variables contained only local observation codes. While she was at the NLM, my LOINC colleague Swapna Abhyankar led an effort to standardize the MIMIC data to LOINC. The LOINC mappings are now available in the MIMIC distribution.
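To give a sense of how such mappings get used, here is a minimal sketch of pulling all results for one LOINC code by joining a lab results table to a lab item dictionary. The table and column names (d_labitems, labevents, itemid, loinc_code, valuenum, valueuom) are assumptions modeled on the MIMIC-III schema, and the rows are invented toy data, so check the MIMIC documentation before relying on any of them.

```python
import sqlite3

# Toy stand-ins for two MIMIC-style tables. Table and column names are
# assumptions modeled on the MIMIC-III schema; every row is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE d_labitems (itemid INTEGER PRIMARY KEY, label TEXT, loinc_code TEXT);
CREATE TABLE labevents (subject_id INTEGER, itemid INTEGER, valuenum REAL, valueuom TEXT);
INSERT INTO d_labitems VALUES
    (1, 'Glucose', '2345-7'),
    (2, 'Local test without mapping', NULL);
INSERT INTO labevents VALUES
    (100, 1, 95.0, 'mg/dL'),
    (101, 1, 140.0, 'mg/dL'),
    (100, 2, 1.2, NULL);
""")


def results_for_loinc(conn, loinc_code):
    """Return (subject_id, valuenum, valueuom) rows for one LOINC code."""
    query = """
        SELECT e.subject_id, e.valuenum, e.valueuom
        FROM labevents AS e
        JOIN d_labitems AS d ON d.itemid = e.itemid
        WHERE d.loinc_code = ?
        ORDER BY e.subject_id
    """
    return conn.execute(query, (loinc_code,)).fetchall()


# 2345-7 is the LOINC code for glucose mass concentration in serum or plasma.
glucose_rows = results_for_loinc(conn, "2345-7")
print(glucose_rows)
```

Because the query filters on the LOINC code rather than a local item number, the same function could in principle be pointed at any data set that carries a LOINC mapping.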

Research Networks

Large-scale research networks have blossomed in the last few years. One reason is technological advances in informatics tools and common data models. Another is that funding opportunities have spurred them on. Examples include the Clinical and Translational Science Awards (CTSA) Program, the FDA’s Mini-Sentinel project, PCORnet, and Observational Health Data Sciences and Informatics (OHDSI).

If your institution has joined one of these initiatives, you may already have access to more resources than you were aware of. In these networks, the data providers fulfill the role of a “safe harbor” or “honest broker” for the data, so researchers only have access to de-identified data.

The secret to the success of these networks is that the participating institutions commit to adopting a common technology platform. Many CTSA institutions use i2b2; Mini-Sentinel uses its own data model and the open source PopMedNet query tool. PCORnet created a common data model based on (but not the same as) the Mini-Sentinel approach.

OHDSI

OHDSI has a similar approach to these other networks in that its participants use a common data model (based on the OMOP data model) and a shared set of technologies for data analysis. Hripcsak et al. have also written a nice paper describing OHDSI’s vision and the opportunities available to researchers.

What’s unique about OHDSI is that it is an open community. You don’t need to have been selected by the grant/contract process. Their core objectives include, among others, these three great principles that match very closely to how we’re developing LOINC:

Community: Everyone is welcome to actively participate in OHDSI, whether you are a patient, a health professional, a researcher, or someone who simply believes in our cause.
Collaboration: We work collectively to prioritize and address the real world needs of our community’s participants.
Openness: We strive to make all our community’s proceeds open and publicly accessible, including the methods, tools and the evidence that we generate.

In addition to their common technology stack for running queries at distributed sites, OHDSI has developed an impressive set of software tools for data analytics, including clinical characterization, population-level estimation, and patient prediction. All these tools are made freely available on GitHub under an open source license.

As a participant in OHDSI, it is possible to run the same analysis at multiple sites, over many data sets. The scope of data available is far greater than most scientists have access to within just their home institution. The Data Network page gives an overview of the data from participating institutions.

OHDSI uses vocabulary standards like LOINC to accomplish the data normalization needed for such analyses. You can read more about the approach they’ve taken here.
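As a rough illustration of why that normalization matters, the sketch below has each site translate its own local test codes to LOINC, summarize its results locally, and share only counts and sums that are then combined. This is not OHDSI’s actual tooling or the OMOP vocabulary tables; the site names, local codes, mapping dictionaries, and values are all hypothetical.

```python
# Hypothetical local-code-to-LOINC maps for two sites (invented local codes).
SITE_MAPS = {
    "site_a": {"GLU": "2345-7", "K": "2823-3"},
    "site_b": {"LAB0042": "2345-7"},
}


def site_summary(site, rows):
    """Run locally at one site: (count, sum) of values per LOINC code."""
    mapping = SITE_MAPS[site]
    summary = {}
    for local_code, value in rows:
        loinc_code = mapping.get(local_code)
        if loinc_code is None:
            continue  # unmapped local codes are left out of the analysis
        count, total = summary.get(loinc_code, (0, 0.0))
        summary[loinc_code] = (count + 1, total + value)
    return summary


def combine(summaries):
    """Merge per-site summaries into an overall mean per LOINC code."""
    merged = {}
    for summary in summaries:
        for code, (count, total) in summary.items():
            prev_count, prev_total = merged.get(code, (0, 0.0))
            merged[code] = (prev_count + count, prev_total + total)
    return {code: total / count for code, (count, total) in merged.items()}


# Invented results keyed by each site's local codes.
results = {
    "site_a": [("GLU", 90.0), ("K", 4.0), ("XYZ", 1.0)],
    "site_b": [("LAB0042", 110.0)],
}
overall = combine(site_summary(site, rows) for site, rows in results.items())
print(overall)
```

Without the shared code, the glucose results from the two sites could not be recognized as the same test, and the unmapped "XYZ" result shows what happens to data that never gets mapped.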

De-identifying your own data sets

You may also be considering de-identifying your own data to better support re-use. Much laboratory data is sent (and stored) as discrete results, which makes de-identification slightly easier. Identifying information in key fields (e.g. patient name) can be excluded.
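Excluding key fields from discrete results might look something like the sketch below. The field names and the study ID format are hypothetical, and this covers only the field-exclusion step: it is not a complete HIPAA de-identification, which also has to deal with dates, free-text fields, and other identifying elements.

```python
# Hypothetical field names; a real data set will have its own.
DIRECT_IDENTIFIERS = {"patient_name", "mrn"}


def deidentify(records):
    """Drop direct identifier fields and attach an arbitrary study ID.

    Returns the cleaned records plus the MRN-to-study-ID key, which is
    meant to stay with the data provider and never travel with the data.
    """
    study_ids = {}
    cleaned = []
    for record in records:
        # Reuse the ID if this patient was seen before; otherwise assign
        # the next sequential one, which carries no patient information.
        study_id = study_ids.setdefault(
            record["mrn"], f"S{len(study_ids) + 1:04d}"
        )
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        row["study_id"] = study_id
        cleaned.append(row)
    return cleaned, study_ids


# Invented example records.
raw = [
    {"patient_name": "Doe, Jane", "mrn": "A1", "loinc_code": "2345-7", "value": 95.0, "unit": "mg/dL"},
    {"patient_name": "Roe, Rick", "mrn": "B2", "loinc_code": "2345-7", "value": 140.0, "unit": "mg/dL"},
    {"patient_name": "Doe, Jane", "mrn": "A1", "loinc_code": "2823-3", "value": 4.1, "unit": "mmol/L"},
]
cleaned, key = deidentify(raw)
print(cleaned)
```

Keeping a stable study ID per patient lets results for the same person still be linked within the de-identified set.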

However, some laboratory test results are still sent as narrative text. This is especially true in the emerging field of genetic testing, which presents another set of challenges for protecting privacy.

If you are interested in de-identifying clinical text, check out the NLM Scrubber software program. You could use such a program on a clinical data set in order to more easily justify to the IRB its use for research purposes.

Wrap-up

The promise of large-scale observational health databases is now a reality for medical researchers. With standardized laboratory data coded with LOINC, researchers can advance health by generating scientific evidence about disease history, healthcare delivery, the effects of interventions, and countless other questions.

You can’t use “but I don’t have access to…” as an excuse any longer. Go forth and analyze!


Sources of de-identified laboratory data coded with LOINC