The most powerful health data standards are openly developed public goods. In crafting the Manifesto for Open Terminology Development we selected openness and accessibility as guiding principles. Yet in my conversations with standards development organizations who embrace these ideals, many still struggle with how to optimally license their standards. For example, there is discussion now within the UCUM governance committee on using a “familiar” license such as one from Creative Commons (CC). Similar discussions have percolated in many international standards communities.


These are critical decisions, because the license is the fundamental means of establishing how users can implement a standard. Open licensing of biomedical terminologies is crucial for open science. Wilson et al. and Carbon et al. both highlighted the importance of clear, open licensing for maximizing the benefits of data sharing.

Some biomedical ontology communities have embraced Creative Commons licenses. For example, the OBO Foundry requires included ontologies to be licensed with either the Creative Commons Attribution license (CC-BY) or placed into the public domain (CC0).

Yet the use of the Creative Commons licenses for biomedical terminology standards presents important issues. Here I detail the limitations, rationale, and recommendations for developing a fit-for-purpose model license for biomedical terminologies.

Important Note: I am not a lawyer. I am a fellow sojourner on the path towards an interoperable ecosystem where data is available with open standards that unlock the potential for information systems and applications to improve health decision-making and care. I’ve wrestled with these issues long enough to have some perspective to share. This post is for informational purposes only, and nothing is intended as legal advice in any capacity.

Common Principles

There are quite a few definitions of “open standards”. I’m not here to quibble about their differences. My purpose is to summarize a few shared principles that influence licensing decisions, as I’ve observed them among organizations that take an open approach to developing biomedical terminology standards.

These principles need not all be addressed through licensing, though licensing provides a legal basis for enforcement. Among the other approaches that can help accomplish these aims are governance of the standards development process, branding and marketing, certification, and facilitating the community of practice.

Integrity

Terminology standards are intended to create a shared, universal meaning of the concepts they represent. Therefore, codes, descriptions, and other normative content of the standard should not be changed by end users. Changes (updates, additions, etc.) should be managed through the open (community-driven) standards development process for the public good.

Cooperation trumps competition

The point of working together to develop an open standard is to have one shared way of doing things. It really defeats the whole purpose of standardization if people take the standard, modify it, and then promulgate those variants for the same uses as the original.

Low barriers to use

Standards demonstrate the network effect and become more valuable as more people and systems use them. Therefore, biomedical terminology standards developers aim for low barriers to use, including incorporation into software products (whether commercial or noncommercial) and annotation of health data in datasets and exchanges of health information between systems.

Ecosystem

No single biomedical terminology standard covers everything, so they are often used together with others for different purposes. For example, one terminology for diagnoses, another for genes, another for tests and measures, another for units of measure, and another for medications. Further, terminologies are used in the context of other exchange and syntax standards. Therefore, the global ecosystem needs to be able to use combinations of different standards.

Special concern for language translations

Language translations can facilitate use of standards across jurisdictions. Language translations are a special case of derivatives that require less “creativity” but more detailed content knowledge and skill in both languages. Standards developers may wish to ensure that translations (but not other kinds of derivatives) are made available under the same terms as the original standard.

Background Considerations

Use of a “general” or familiar license

Presently, many widely-used biomedical terminologies have their own proprietary licenses. While understandable, this can present challenges for the many users who need to use several of them. This is true for “primary” uses, such as health IT software used by clinicians, as well as “secondary” or “downstream” uses such as shareable data sets that are annotated with codes from these terminologies.

I believe there is high value in general, or familiar, licenses that have been encountered and understood outside of just biomedical ontologies. This is a laudable goal, and Creative Commons has made remarkable progress in advancing the current state of open content licensing. Scientists and developers who encounter custom licenses may not have the energy or resources to interpret the legal implications. Encountering that barrier, they are likely to just move on or develop their own approach rather than building on the standard.

If you’re not already familiar with the Creative Commons license suite, now would be a good time to review the About CC Licenses overview of the six different license types. You’ve seen these licenses for content on YouTube, Wikipedia, Flickr, etc.

Generally speaking, these licenses answer the user’s question: what can I do with this work? The different licenses give options for the licensor to specify whether attribution is required, commercial use is permitted, creating derivatives is allowed, and, if so, whether those derivatives must be shared under the same license terms as the original. Creative Commons has a very handy “License Chooser” that helps you find the license that best suits your needs.

If a standards developer intends to make their standards available in the public domain, or with attribution only (allowing the free range of adaptations and derivatives), then the Creative Commons licenses (and/or in combination with the Open Data Commons licenses) are well suited for this. For example, HL7 decided to declare the FHIR specification as public domain by applying the CC0 license.

However, not all standards developers are willing to dedicate their standards to the public domain and thereby waive all rights to the work worldwide under copyright law.

Note: I’ve often heard people use the term “public domain” as if it only means “publicly accessible” (i.e. published on the web). What public domain really means is unprotected by intellectual property rights (both copyright and patent) and free for anyone to use, modify, and build on.

When your primary goal in standards development is consistency, it can be hard to reconcile allowing any user to change any part of the standard. With the set of ideal principles (described above) in mind, standards developers may review the other Creative Commons licenses and realize this is a more complex situation, especially as it relates to derivatives. In addition, standards may include data content that is copyrightable, database or data model attributes (which may have separate copyright), and occasionally reference software that would have its own copyright. With these considerations, the Creative Commons licenses are often not a good fit, as we’ll discuss further.

Contributor Permissions

Before digging into detailed license provisions, I would be remiss not to mention the issue of contributor permissions. Openly developed standards incorporate the contributions of many people (who are often doing so on behalf of an organization). You see this same feature of collective contributions reflected in many open source software licenses, e.g. MPL and Apache, that are framed in terms of “Contributors” whose work is incorporated into the whole software product.

Biomedical terminology standards are developed by collaborations or organizations of varying maturity, some of which may not have considered how important it is to secure appropriate permissions from all of the contributors/authors. Securing such permissions is typically handled via a copyright transfer agreement of some kind. For example, LOINC has a Submitters Policy to make clear that Contributors grant Regenstrief the right to incorporate their content/ideas into LOINC and that they do not gain any rights over LOINC because of it (except via the public license).

Standards developers would not want someone coming back later claiming certain rights (whether copyright or patent) over portions of the content that would disrupt the integrity of the whole or impair the standards development organization’s ability to act independently in regard to licensing decisions. Many (but not all) organizations have a process for obtaining contributor permissions, but I wanted to highlight this as an important issue.

Databases vs Data content

There are special intellectual property issues about databases, with substantial variation across jurisdictions. Wikipedia provides a succinct definition:

database right is a sui generis property right, comparable to but distinct from copyright, that exists to recognize the investment that is made in compiling a database, even when this does not involve the “creative” aspect that is reflected by copyright.[1]

The idea here is that copyright protects databases and other information collections if they are “works of authorship”. U.S. Copyright Law indicates that for a database or collection to qualify as a work of authorship, it must exhibit at least a modest amount of original creative expression reflected in the selection, organization, or overall coordination of the data elements. The data elements (i.e. the contents of the database) may themselves be original works of authorship or may be uncopyrightable facts or similar items.

In contrast, database rights confer protection based on the effort and resources it takes to compile information (regardless of the degree of creativity involved). The European Union Directive 96/9/EC on the legal protection of databases articulates principles for the treatment of databases under copyright law and establishes a separate sui generis right that protects the investment in a database even where copyright does not apply. The U.S. does not presently have a similar overarching law, but has an interesting evolution of case law on “originality”, and there have been some efforts to adopt sui generis protection.

Biomedical terminology standards are typically published as databases (in various forms) or have database features. Again, I’m not a legal expert, and the extent to which such protections exist in different jurisdictions is beyond this summary. I mention database rights here because I suspect many in the global research community lack understanding of these concepts.

The idea that the data model and relationships may have rights separate from the content (e.g. codes, names, descriptions) makes sense to me. To the extent that they apply to any particular terminology standard, it would be prudent to consider addressing them in the license.

One of the important updates in Version 4.0 of the CC license suite was to address sui generis database rights in addition to copyright and the other copyright-like rights. (See historical discussion and a FAQ). From my perspective, the CC license provisions around database rights make sense according to the broad purposes of each license variant (BY, BY-ND, etc).

Yet, as I hope to illustrate, there are additional considerations that I believe make the CC licenses less desirable for biomedical terminologies.

Licensing Issues

Trouble with prohibiting (or allowing) “Adapt/Remix” via the No Derivatives clause

Standards developers may wish to prevent users from altering published content (e.g. prohibit users from changing the identifier, description, or other normative attributes), because such alterations go against the very purpose of having a standard. So the CC Attribution, No Derivatives licenses are attractive. This is the clause from the CC BY-ND (No Derivatives) license that prohibits adaptations (or conversely, the one that other CC licenses allow):

“Adaptation” means a work derived from or based upon the Work, or upon the Work and other pre-existing works. Adaptations may include works such as translations, derivative works, or any alterations and arrangements of any kind involving the Work. For purposes of this License, where the Work is a musical work, performance, or phonogram, the synchronization of the Work in timed-relation with a moving image is an Adaptation. For the avoidance of doubt, including the Work in a Collection is not an Adaptation.

For reference, here is how Copyright Law of the United States defines a derivative work:

“derivative work” is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. A work consisting of editorial revisions, annotations, elaborations, or other modifications, which, as a whole, represent an original work of authorship, is a “derivative work”.

The CC Attribution, No Derivatives license (CC BY-ND) clause prohibits more than what most standards developers had in mind and prevents use in ways they would want to support. For example, a terminology licensed this way could not be included in a larger set of terminology artifacts or augmented with additional knowledge via linkages, mappings, etc. Similarly, under this license, content from this terminology could not be incorporated into other technical specifications such as HL7 FHIR Implementation Guides.

Annotating clinical data

The CC licenses generally shine in making plain to non-legal experts what they can and can’t do with the licensed material. Yet, this is not the case for the primary purpose of most biomedical terminologies: annotating clinical data in health and research systems.

As noted, the newer CC version 4.0 licenses now helpfully include a section on Sui Generis Database Rights. But I believe that typical users, including software developers, would have a hard time understanding what constitutes “substantial portion of the database contents” of a particular biomedical terminology.

How many terms, codes, relationships, etc would that be?

Further, the CC licenses are based on the concept of Sharing, which is about making content public. For example, if an ND license is applied, you may produce an adapted database but cannot share it publicly. This may or may not match the standards developer’s intent. I also find it difficult to understand how the license terms would apply to sharing (e.g. developing and then selling or distributing) software that uses standard terminologies and how they would apply to sharing or exchanging health data that is annotated with the licensed standard terminology. The latter case of sharing health data annotated with licensed content is so crucial because health data may be recorded for a primary use (e.g. clinical care) and then exchanged in various ways for secondary uses (e.g. public health reporting, quality measurement, clinical research, etc).

But really, this is just the tip of the iceberg. A quick brainstorm of other related questions that users ask but that the CC BY-ND (or CC BY) licenses leave unclear:

  1. Are there limits to the extent and manner of subsetting or extraction from the terminology content as distributed from the standards developer?
  2. For a conventional use in an electronic health record system (EHR), must that system always include certain attributes (e.g. identifier, description, and coding system identifier)?
    1. Would it be acceptable to just have a label (description) and no identifier?
    2. Must an identifier (code or URI) always be accompanied by a human readable description/name?
  3. What usage is granted in the context of information technology “systems” (e.g. EHRs) versus health records (in various formats) for individuals or populations? Are there any terms of use differences if you are the “sender” versus the “receiver” of data (who may not know in advance what the record contains)?
  4. How should the proper attribution and copyright notice be handled in the context of annotated data (e.g. a patient’s blood pressure result instance that is annotated with a LOINC code to identify the measurement and a UCUM code to identify the unit of measure) versus software applications that produce such data?
  5. What use, if anything, does the standards developer promote (or require) related to the codes, descriptions, URIs in various use contexts?

Again, standards developers may not consider licensing to be the best means of addressing these issues. But the license is where users go to understand what they are allowed to do with the terminology content, and the place to specify requirements that can be legally enforced.
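To make the annotated-data case in item 4 concrete, the sketch below represents a single systolic blood pressure result as it might travel between systems: a LOINC code (8480-6, Systolic blood pressure) identifies the measurement and a UCUM code (mm[Hg]) identifies the unit, each tagged with the code system URI used for it in HL7 FHIR. The `Coding` class and both helper functions are hypothetical and deliberately simplified (this is not the FHIR Observation resource), and the rule that every identifier must travel with a description is an assumed policy illustrating question 2, not a requirement of any actual license.

```python
from dataclasses import dataclass
from typing import List, Optional

# Code system URIs used for LOINC and UCUM in HL7 FHIR.
LOINC = "http://loinc.org"
UCUM = "http://unitsofmeasure.org"


@dataclass(frozen=True)
class Coding:
    """Hypothetical, simplified coded element: which system, which code, what label."""
    system: str                    # coding system identifier
    code: str                      # identifier assigned by the standards developer
    display: Optional[str] = None  # human-readable description, which may be absent


# One patient's systolic blood pressure result, annotated with two terminologies.
observation = {
    "code": Coding(LOINC, "8480-6", "Systolic blood pressure"),
    "value": 120,
    "unit": Coding(UCUM, "mm[Hg]", "mmHg"),
}


def missing_attributes(coding: Coding) -> List[str]:
    """Assumed policy (question 2): each coded element must carry system, code, and display."""
    required = {"system": coding.system, "code": coding.code, "display": coding.display}
    return [name for name, value in required.items() if not value]


def code_systems_used(record: dict) -> List[str]:
    """List the terminologies a record draws on, e.g. to decide which notices apply (question 4)."""
    return sorted({v.system for v in record.values() if isinstance(v, Coding)})


for field in ("code", "unit"):
    print(field, missing_attributes(observation[field]))
print(code_systems_used(observation))
```

Even this toy record shows why the questions matter: a receiving system cannot know in advance which terminologies a record will contain, and a sender could just as easily transmit a `Coding` with no `display` at all.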

Preventing unintended use

Many standards developers have the perspective that cooperation is better than competition. It really defeats the whole purpose of standardization if people take the standard and use it to create a competitor product. I understand this perspective and recognize the difficult balance of allowing free (in a sense broader than just money) use and preventing someone from “stealing” your content and using it against your aims. There are several possible ways to address this concern.

The first, as we’ve discussed, would be to use a license with a “No Derivatives” clause. Because derivatives have such a broad meaning in the legal sense, that approach is too restrictive. Creative Commons also has licenses that prohibit commercial use, but that would not be acceptable in the health data space, where much software (e.g. electronic health record systems) is sold as vended products. And of course you could “compete” without being commercial.

Another approach would be the “copyleft” or “share-alike” licensing model, whereby any derivatives making use of the content must be made available under a similar license. As the software world has hotly debated, this approach is problematic for commercial applications (e.g. health IT software vendors) and would severely limit the standard’s potential use.

In its development of LOINC and UCUM, Regenstrief has not been satisfied with these approaches. The LOINC license has a special clause (clause 1 actually) that addresses the issue head on. Basically, it says that you cannot use LOINC to make a competing standard. When I present about LOINC, I always say that we added Clause 1 because an alternate coding system defeats the purpose of having a standard in the first place. It would undermine the central goal of developing LOINC: to have a consistent, universal way to identify observations. I have not found a similar clause in other general purpose licenses.

With Grahame Grieve’s leadership, HL7 took a different approach with the (now wildly successful) HL7® FHIR® standard by placing it in the public domain (CC0). In essence, HL7 decided that they would address these issues not by protecting the intellectual property, but rather by a) protecting the “brand” of FHIR (via trademark) and b) attempting to thwart competitors by outperforming on community engagement and governance. In practice, nothing prevents another group from taking a copy of the FHIR specification, calling it “En Fuego” (or whatever), and trying to get everyone to use it.

What HL7 is betting on is that if they manage the community processes of developing the standard, engaging participants, and evolving the specification and its community well, there will be little reason for another group to fork a competitor. And I think you can argue that they have been quite successful at this with FHIR. It is similar to the approach of other successful, long-running open source projects like Ruby on Rails. Similarly, RxNorm is a widely used biomedical vocabulary developed by the U.S. National Library of Medicine that is made available in the public domain (because it is created by the U.S. Government). However, other standards developers have different assets, resources, competencies, and community trust than HL7 or a federal entity.

So there is much to consider. Intellectual property protections are just one ingredient.

User-created extensions

No standard biomedical terminology has full coverage of its domain, especially considering how medicine and science progress. More frequent update releases can partially address this, but they cannot satisfy all user-specific or context-specific needs. Whether and how standards development organizations allow users to address these issues by creating extensions is contentious. For example, the idea of user extensions is not well suited to the historical usage of some terminologies as statistical classifications.

The CC “No Derivatives” licenses would prohibit sharing of any user-created content that was “based upon” the overall framework and structure of the licensed terminology. But the Creative Commons licenses are not designed to distinguish between adaptations (derivatives) of various kinds. In the context of biomedical terminologies, standards development organizations may well want to have different rights granted (or prohibited) for altering published content, subsetting, augmenting by incorporating additional relationships or mappings, or adding new concepts that “fill in the gaps” of the terminology for a particular purpose.

LOINC was one of the first “open” terminologies to begin addressing this issue. The LOINC license (Clause 2) prohibits users from editing (altering) any published content but allows additions of new fields (attributes) to be created. Clause 3 allows users to delete records (a “record” corresponds to an entry for a concept and its attributes) or add new records, but prescribes a specific (albeit somewhat rudimentary) way that such new content be identified so that it cannot be confused with official LOINC content. In that same clause, the LOINC license also includes a stipulation that users “make reasonable efforts” to submit requests for new content in order to minimize the need for such local extensions. The intent was to capture the true spirit of a global commons where everyone contributes to the public good.

I accept that international standards will not ever cover all local needs. Although we are always limited by resource constraints in developing standards, Regenstrief set a relatively low bar for justifying the use case for needing a LOINC term. And the LOINC experience surprised me in how often concepts that first appeared to only have project or regional relevance later had much broader usage. Certainly we see this playing out with the COVID-19 pandemic.

SNOMED International has developed rather elaborate mechanisms for users to create, manage, and share SNOMED CT extensions. Central to this approach is the assignment of Namespace, Module, Extension, and Edition identifiers to delineate the extension content from the core International release content on which the extension depends and to create packages that group these elements together. The SNOMED licensing model allows both member countries and affiliates in member countries to create extensions.
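The delineation between core and extension content is visible in the SNOMED CT identifiers (SCTIDs) themselves. Assuming the layout described in SNOMED International's identifier guidance, a core International release component uses the short format (item identifier, a two-digit partition identifier starting with 0, and a Verhoeff check digit), while an extension component uses the long format, which inserts a seven-digit namespace identifier before a partition identifier starting with 1. The sketch below parses that structure. The function names are hypothetical, and it only inspects the identifier: it does not check whether a namespace is actually registered or which module or edition a component belongs to.

```python
# Verhoeff dihedral-group tables used for SCTID check digits.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6], [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8], [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2], [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2], [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0], [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5], [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]

# Second digit of the partition identifier gives the component type.
COMPONENT = {"0": "concept", "1": "description", "2": "relationship"}


def verhoeff_is_valid(number: str) -> bool:
    """True if the last digit is a correct Verhoeff check digit for the rest."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def verhoeff_check_digit(payload: str) -> str:
    """Compute the Verhoeff check digit to append to payload."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return str(INV[c])


def parse_sctid(sctid: str) -> dict:
    """Split an SCTID into format, namespace (extensions only), item, and component type."""
    if not (sctid.isdigit() and 6 <= len(sctid) <= 18):
        raise ValueError("SCTIDs are 6 to 18 digits")
    if not verhoeff_is_valid(sctid):
        raise ValueError("bad check digit")
    partition = sctid[-3:-1]
    kind = COMPONENT.get(partition[1], "other")
    if partition[0] == "0":  # short format: International release content
        return {"format": "short", "namespace": None, "item": sctid[:-3], "component": kind}
    if partition[0] == "1" and len(sctid) >= 11:  # long format: extension content
        return {"format": "long", "namespace": sctid[-10:-3], "item": sctid[:-10], "component": kind}
    raise ValueError("unrecognized partition identifier " + partition)


print(parse_sctid("138875005"))  # the SNOMED CT root concept, from the International release
ext = "1000" + "1234567" + "10"   # hypothetical item 1000 in hypothetical namespace 1234567
print(parse_sctid(ext + verhoeff_check_digit(ext)))
```

Because the namespace is embedded in every extension identifier, content from different extensions can never collide with each other or with the core release; the trade-off, as discussed next, is that nothing in the identifier encourages that content to flow back into the shared International release.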

In my opinion, the SNOMED approach is innovative, but in practice also facilitates fragmentation. Many factors contribute, including the technical complexity, the multi-layer structure (affiliate + national release center + SNOMED International), the lack of an integrated platform for accessing content across the plethora of extensions, and the lack of incentives to promote sharing (in part due to their overall licensing approach), among others. My opinion is that having a robust and “fully sanctioned” approach to extensions with these dynamics enables people to get the brand benefits of saying they use SNOMED without the hard work of cooperating and collaborating together to build an international public good.

I see continued opportunities for innovation in how open biomedical terminologies create capability for users to adapt to local requirements while contributing to the international commons. Fostering an active, passionate, and cohesive community around ongoing development will remain central to success. One significant gap that I see in most current approaches is providing proper attribution, at the level of concept attributes, when many individuals contribute to a public good. Such attribution can be a strong motivator for community participation, particularly for those in the biomedical sciences.

Other considerations

Pay attention to what exactly is being licensed

Many terminology standards are distributed as a bundle with database files, documentation, software, etc. The licensor (e.g. standards developer) should mark which elements of the work are subject to a given license and which are not. For those elements that are not subject to the license, users may need to obtain separate permission. Some custom licenses, like the LOINC license, cover all elements in one license. Having the legal complexity of these different elements in one document may overwhelm the intended convenience of a single license.

Creative Commons licenses are not designed for use on software. So if the standards bundle includes software, those elements would need a separate license. There are many open software licenses to choose from, and a discussion about the different features and restrictions is outside the scope of this piece.

Health-specific disclaimers

For some health IT software released under open source licenses, the IP owners have elected to add specific medical disclaimers. For example, OpenMRS is licensed under the MPLv2 license with a specific Healthcare Disclaimer. Standards developers should consider whether the clauses in general licenses such as Creative Commons Section 5 – Disclaimer of Warranties and Limitation of Liability are comprehensive enough for their needs.

Grace periods and arbitration

Special versions, called “ports”, of the Creative Commons version 3.0 licenses have been created for Intergovernmental Organizations (IGOs). IGOs have privileges and immunities from national legal processes that may make it difficult to bring a legal suit in a national forum. Therefore IGOs typically use mediation and arbitration as the preferred means of resolving legal disputes.

The CC version 3.0 IGO ported licenses included two special provisions. First, unless otherwise mutually agreed, disputes are resolved by mediation or, if that is unsuccessful, through arbitration. Second is a grace period (a.k.a. cure period) that automatically reinstates the license if a violation is rectified within 30 days. This grace period is also included in all variants of the newer version 4.0 licenses.

Whether using the CC licenses or not, standards developers may wish to consider including grace periods and arbitration options in their licenses.

Conclusions

In summary, I see significant challenges with using the CC licenses for biomedical terminology standards.

I wish that I could point to another suitable general license as an alternative, but I do not believe that one exists. I do believe that open licensing of biomedical terminologies is crucial for open science and that the current model licenses have significant issues. Having fit-for-purpose model licenses (analogous to those from Creative Commons) for standard biomedical terminologies will accelerate open terminology development and use. Furthermore, such model licenses could serve as a framework to evaluate and clarify existing terminology licenses.

From my conversations I know that there are other existing and emerging vocabularies that would be interested in co-pioneering such an approach. I would strongly encourage funders and standards developers to pursue research and innovation in this area as I believe it has transformational potential for open science and better health.

Disclosure of Material Connection

Some of the links in the post above are “affiliate links.” This means if you click on the link and purchase the item, I will receive an affiliate commission. Regardless, I only recommend products or services I use personally and believe will add value to my readers. I am disclosing this in accordance with the Federal Trade Commission’s 16 CFR, Part 255: “Guides Concerning the Use of Endorsements and Testimonials in Advertising.”