Vocabulary Policy
Purpose of This Document
This document describes the policies for both external and internal vocabularies adopted by LINCS, and the responsibilities associated with vocabulary use and development. All LINCS datasets must adhere to these policies when selecting and using vocabularies to describe their entities.
The policies outlined in this document are intended to ensure that vocabularies used in LINCS datasets are well-structured, interoperable, and maintainable, while also maximizing the semantic specificity required by each dataset.
Vocabularies
Generally speaking, a vocabulary is a set or collection of terms. It is a generic term for a set of concepts that may be concretely described in an ontology, a taxonomy, or a thesaurus. LINCS employs a specific definition of vocabulary and provides a set of policies for the adoption and development of vocabularies within the project.
First of all, LINCS adopts the CIDOC Conceptual Reference Model (CRM) as its upper-level ontology, providing classes and properties to model the data. While the CIDOC CRM provides a high-level classification of entity types such as Person, Place, or Information Object, it does not provide enough semantic specificity to describe each entity in detail.
For example, the CIDOC CRM class E22 Human-Made Object represents any physical object made by humans, but the class alone provides no way to distinguish a painting from a chair. By adopting terms from external vocabularies, LINCS datasets can specify the type of each individual object. This allows for much more detailed classification and enables a richer semantic description.
The example above shows how the artwork Hommage à Alanis Obomsawin by Atikamekw artist Meky Ottawa is classified in the MONA Public Art dataset. The artwork is modelled as an instance of CIDOC CRM class E22 Human-Made Object, but its type is further specified by using the terms mural painting and public art from the Getty Art & Architecture Thesaurus.
Definition of Vocabulary in LINCS
LINCS adopts an operational definition of vocabulary based on term usage rather than on a specific set of structural requirements or implementation technologies. For our purposes, a vocabulary term <X> is any Uniform Resource Identifier (URI) that appears in at least one of the following triple statements in the LINCS triplestore:
- <X> rdf:type crm:E55_Type → The term is explicitly declared as a type in CIDOC CRM.
- <X> rdf:type skos:Concept → The term is explicitly declared as a SKOS concept.
- <Y> crm:P2_has_type <X> → The term is used as a type for another entity.
In practice, any entity that is declared to be a type or that is directly used as a type for another entity is considered to be a vocabulary term, and the source of that term qualifies as a vocabulary for the purposes of LINCS. The image below displays the three possible conditions.
It is important to note that the vocabulary term <X> does not need to fulfill all three conditions at the same time. Just one of the triple statements above is sufficient for <X> to be considered a vocabulary term.
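The three-condition definition above can be expressed as a simple check over a set of triples. The following is a minimal sketch in plain Python, assuming triples are held in memory as (subject, predicate, object) tuples of prefixed-name strings; the example identifiers (such as `ex:artwork1` and `ex:muralPainting`) are hypothetical and not taken from any LINCS dataset. In practice, the same check would be run as a query against the LINCS triplestore.

```python
# Prefixed names for the predicates and classes named in the LINCS definition.
RDF_TYPE = "rdf:type"
CRM_E55_TYPE = "crm:E55_Type"
SKOS_CONCEPT = "skos:Concept"
CRM_P2_HAS_TYPE = "crm:P2_has_type"


def vocabulary_terms(triples):
    """Return every node that satisfies at least one of the three conditions."""
    terms = set()
    for s, p, o in triples:
        # Conditions 1 and 2: <X> is explicitly declared as a CRM type or a SKOS concept.
        if p == RDF_TYPE and o in (CRM_E55_TYPE, SKOS_CONCEPT):
            terms.add(s)
        # Condition 3: <X> is used as the type of another entity <Y>.
        elif p == CRM_P2_HAS_TYPE:
            terms.add(o)
    return terms


# Hypothetical example data.
triples = [
    ("ex:artwork1", RDF_TYPE, "crm:E22_Human-Made_Object"),
    ("ex:artwork1", CRM_P2_HAS_TYPE, "ex:muralPainting"),
    ("ex:publicArt", RDF_TYPE, SKOS_CONCEPT),
    ("ex:nameType", RDF_TYPE, CRM_E55_TYPE),
]

print(sorted(vocabulary_terms(triples)))
# → ['ex:muralPainting', 'ex:nameType', 'ex:publicArt']
```

Here `ex:muralPainting` qualifies through the third condition alone, even though nothing declares it as a type, while `ex:artwork1` is not a vocabulary term because E22 Human-Made Object is neither E55 Type nor a SKOS concept.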
Examples of Vocabularies
Vocabularies come in many different forms, and, as stated above, LINCS adopts a very broad definition based on usage. Any collection of terms that is used as a vocabulary is considered to be a vocabulary.
A Proper Controlled Vocabulary: Getty AAT
For example, the University of Saskatchewan Art Collection (USask Art) dataset describes over 6,000 instances of artworks. All these artworks are classified as instances of the CIDOC CRM class E22 Human-Made Object, but this is not sufficient to distinguish, for example, a photograph from a painting. By applying terms from the Getty Art & Architecture Thesaurus, the dataset can refine the classification by typing specific artworks as photographs or paintings.
A Knowledge Base Used as a Vocabulary: Wikidata
As another example, the Yellow Nineties dataset introduces specificity to the CRM class E74 Group by way of the Wikidata terms membership organization or publisher. Wikidata itself is a general-purpose knowledge base and was not designed as a controlled vocabulary, but given that it contains extensive type hierarchies, it can be used as such. Wikidata provides a large number of terms that can be used to type entities in LINCS datasets, and its terms are widely reused by the Linked Open Data community.
LINCS-hosted Vocabularies
In some cases, new vocabularies have been created and hosted within LINCS to provide domain-specific terms not readily available in other external vocabularies. These vocabularies are made accessible and documented through the LINCS Vocabulary Browser.
For example, most LINCS datasets make use of terms from the LINCS Biography vocabulary to describe personal names. In the figure below, the name Abishabis from the Historical Canadians dataset is typed as a personal name using the term biography:personalName.
As another example, three projects with datasets pertaining to early modern London (ARK, MoEML, and REED London) created their own vocabulary of early modern place types, such as Bailiwick and Hall, to ensure their data was typed using temporally and spatially specific definitions that could not be found in existing vocabularies. This work resulted in the Early Modern London Place Type Vocabulary.
General Vocabulary Policies
The following policies apply to all LINCS vocabularies, whether external or hosted by LINCS itself.
- All LINCS datasets must link their entity types to existing vocabularies wherever possible.
- When selecting a vocabulary, interoperability should be maximized by reusing vocabularies adopted in previous LINCS datasets as much as possible. For recommendations about which vocabularies to adopt based on current usage in LINCS datasets, see the Recommended Vocabularies section below.
- While the use of external vocabulary terms should be prioritized, interoperability should not come at the expense of the semantic specificity required by the dataset. For example, if a dataset requires the representation of the type public artwork, a vocabulary providing only the term artwork would not be sufficient, and efforts should be made to find a more suitable vocabulary or to add terms to an existing one.
- Data stewards should make an effort to adopt external vocabularies whenever possible. When no external vocabulary satisfies the requirements of a dataset, the creation of a LINCS-hosted vocabulary can be proposed.
For more detailed information about the workflow for proposing new vocabulary terms or an entire vocabulary, see below.
Policy for External Vocabularies
LINCS allows the adoption of any external vocabulary provided that it satisfies the following set of conditions. All external vocabularies should:
- Identify each vocabulary term through a unique, persistent, and dereferenceable URI, so that the term can be unambiguously identified and accessed on the web.
- Describe each vocabulary term through labels and definitions that fulfill semantic requirements, and make them accessible through the term URI so that they can be easily retrieved and understood by both humans and machines.
- Exhibit evidence of ongoing maintenance and of adoption in the broader Linked Open Data community.
- Be compatible with LINCS ethics policies.
LINCS does not mandate the use of a specific implementation technology, such as the Simple Knowledge Organization System (SKOS), for external vocabularies. However, the implementation should be compliant with the requirements outlined above.
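One way to spot-check the first two conditions for a candidate term is to request its URI with an RDF media type and confirm that the returned description includes a label and a definition. The sketch below uses Python's standard `urllib.request` only to build the request, and checks metadata that is assumed to have been parsed already. The requested formats (`text/turtle`, `application/rdf+xml`), the properties looked for (`skos:prefLabel`, `rdfs:label`, `skos:definition`), and the helper names are assumptions, since LINCS does not mandate SKOS or any particular format for external vocabularies.

```python
import urllib.request

# Assumption: the vocabulary supports content negotiation for RDF serializations.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_term_request(term_uri):
    """Build (but do not send) a request asking the term URI for RDF data."""
    return urllib.request.Request(term_uri, headers={"Accept": RDF_ACCEPT})


def missing_descriptions(term_metadata,
                         label_props=("skos:prefLabel", "rdfs:label"),
                         definition_props=("skos:definition",)):
    """Report which label/definition requirements a term fails.

    `term_metadata` maps property names to lists of values, for example as
    parsed from the dereferenced RDF (parsing itself is out of scope here).
    """
    problems = []
    if not any(term_metadata.get(p) for p in label_props):
        problems.append("no label")
    if not any(term_metadata.get(p) for p in definition_props):
        problems.append("no definition")
    return problems


req = build_term_request("https://example.org/vocab/term123")  # hypothetical URI
print(req.get_header("Accept"))
print(missing_descriptions({"skos:prefLabel": ["mural painting"]}))
# → ['no definition']
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup. A term whose URI does not resolve, or whose description reports any problems, would be a sign that the vocabulary may not meet the conditions above.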
Finding External Vocabularies
The Linked Open Vocabularies (LOV) project is a good starting point for searching for terms from over 700 vocabularies. The Canadian Heritage Information Network (CHIN) provides a useful guide to vocabularies for cultural heritage; see Vocabulary (data value standards).
LINCS recommends that data stewards consider vocabularies that have already been adopted in previous LINCS datasets, as this promotes interoperability and reuse. For recommendations about which vocabularies to adopt based on current usage in LINCS datasets, see the Recommended Vocabularies section below.
For more information about obtaining URIs for dereferenceable vocabulary terms, see Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources.
Adding Terms to External Vocabularies
Researchers may create vocabulary terms by contributing them to any existing external vocabulary that accepts additions. Wikidata is particularly suitable for this purpose since it allows any user to create new terms without prior approval. Once a term has been added and is accessible through the web, it can immediately be used within a LINCS dataset. Data stewards are responsible for ensuring that any terms they create are compliant with the policies of the organization that manages the vocabulary and with the LINCS requirements outlined above.
Creating New External Vocabularies
If no suitable external vocabularies are found, data stewards may opt to create and host their own external vocabularies. LINCS recommends that any new external vocabularies created by data stewards adhere to the specifications of the Simple Knowledge Organization System (SKOS), since these guarantee a consistent and interoperable vocabulary structure.
LINCS does not contribute to the creation, maintenance, or hosting of external vocabularies by data stewards. Publishing a vocabulary is a significant undertaking that requires long-term support; therefore, LINCS does not recommend this option unless data stewards have the resources and a clear commitment to keep the vocabulary accessible and well-maintained in the long term.
Policy for LINCS-hosted Vocabularies
Vocabularies hosted by LINCS must adhere to a stricter set of requirements to ensure that they are accessible, well-structured, interoperable, and maintainable.
Responsibilities
This section defines who is responsible for the creation, maintenance, and modification of vocabularies hosted by LINCS. The responsibilities are distributed among LINCS staff, data stewards, and external users as described in the table below.
| Responsibility | Responsible Party |
|---|---|
| Hosting of the vocabulary on the LINCS website. | LINCS |
| Technical maintenance of the vocabulary, such as keeping the Vocabulary Browser software up to date, or resolving technical malfunctions. | LINCS |
| Definition of vocabulary terms. | Data stewards |
| Semantic maintenance of the vocabulary, including updates and corrections to labels, definitions, and taxonomic structure. | Data stewards |
| Proposal of new terms for inclusion in the vocabulary, or changes to existing terms. Anyone may propose changes to a LINCS vocabulary through the Vocabulary Browser feedback form. | External users |
| Approval of proposed changes to the vocabulary. LINCS will notify the responsible data stewards that a proposal has been made and request a response. If a response is not received within 3 months, LINCS will decide whether to accept or reject the proposal. | Data stewards & LINCS |
| Publication of the vocabulary on an external website. Data stewards may decide to provide vocabulary data elsewhere, but LINCS is not responsible for creating, maintaining, or synchronizing the external website. | Data stewards |
| Monitoring of biases and compliance with LINCS ethics policies. Any party introducing new vocabulary terms acknowledges the risk of coding bias into LINCS digital infrastructures, and commits to the rejection of terms that are at odds with LINCS guiding principles and values. | Everyone |
| [A couple more rows to be added] |
Accessibility & Persistence
Vocabularies hosted by LINCS are subject to accessibility and persistence requirements to ensure that they remain available and usable over time.
Each vocabulary is made accessible in multiple locations:
- They are available for browsing and exploring within the LINCS Vocabulary Browser.
- The source files are stored within the Vocabularies repository on the LINCS GitLab.
- They are uploaded to the LINCS Blazegraph triplestore and appear as named graphs within ResearchSpace. They are also available from the Fuseki triplestore.
- Optionally, data stewards may choose to publish vocabularies on their own project websites.
All LINCS-hosted vocabularies must be persistent resources, as defined by the W3C persistence policy: “Publishers must recognize their responsibility in maintaining data once it is published. Key to both access and reuse is ensuring that the dataset(s) your organization publishes remains available where you say it will be and is maintained over time.” Outdated vocabulary terms should never be deleted; they should be deprecated instead.
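Deprecating rather than deleting a term can be sketched as adding statements while leaving existing ones untouched. The minimal example below assumes the vocabulary is handled as in-memory triples and uses `owl:deprecated` and `dct:isReplacedBy`, which are common conventions for this purpose; whether LINCS uses exactly these properties is an assumption, and the term identifiers and the `deprecate_term` helper are hypothetical.

```python
def deprecate_term(triples, old_term, replacement=None):
    """Return a new triple list in which old_term is marked as deprecated.

    No existing triple is removed, so the old URI keeps resolving to its
    original labels and definitions.
    """
    updated = list(triples)
    # The string "true" stands in for the typed literal "true"^^xsd:boolean.
    updated.append((old_term, "owl:deprecated", "true"))
    if replacement is not None:
        # Point users of the outdated term to its successor.
        updated.append((old_term, "dct:isReplacedBy", replacement))
    return updated


vocab = [
    ("ex:oldTerm", "rdf:type", "skos:Concept"),
    ("ex:oldTerm", "skos:prefLabel", "old label"),
]
vocab = deprecate_term(vocab, "ex:oldTerm", replacement="ex:newTerm")
for triple in vocab:
    print(triple)
```

Because the original label and type statements survive, datasets that already cite the outdated term continue to resolve, while new data can follow the replacement link.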
Implementation Requirements
LINCS-hosted vocabularies must meet the following implementation requirements:
- New vocabulary terms for hosting by LINCS must be declared in a SKOS vocabulary and follow the declaration requirements provided in the [SKOS Vocabulary Guidelines and Instructions]. All vocabularies must be declared using SKOS. For more information about required and recommended properties for vocabulary declaration, see SKOS requirements.
- Definitions should be provided for every vocabulary term. When defining terms, avoid absolute wording so that the definition is not too rigid and can accommodate edge cases. For example, the term “painting” could be defined as “Usually a work of art created by applying pigment to a surface, but may also include works created through other methods such as digital painting or collage.”
- LINCS is a bilingual project, so labels should be provided in both English and French whenever possible, and definitions in both languages are highly desirable.
- Projects’ metadata forms should list the vocabularies they have used, as well as any internally created vocabulary terms or datasets that are important to their content.
Updating LINCS-hosted Vocabularies
Where existing vocabulary terms are unable to provide the semantic specificity required by a dataset, new terms may be proposed for inclusion in LINCS provided that:
- Data stewards have searched for existing vocabularies but were unable to find terms or definitions that suit their needs.
- Data stewards are willing to undertake the labour of developing terms, labels, and definitions for their vocabulary. Both data stewards and LINCS staff need to have the resources to support this work.
- Projects may require additional funding to create larger vocabularies specific to their needs. Creation, implementation, and maintenance of vocabularies is contingent on LINCS resources. If LINCS cannot host proposed vocabulary terms, projects may create their own vocabulary terms in Wikidata. To propose new vocabulary terms, or an entire vocabulary, please see the [Vocabulary Workflow].
Any party introducing new vocabulary terms is responsible, first, for assessing whether LINCS already has a relevant term available to datasets, and second, for linking new terms to existing terms where this is reasonably possible and useful to other datasets.
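In SKOS, links from a new term to existing terms are usually expressed with mapping properties such as `skos:exactMatch` or `skos:closeMatch`. As a hedged sketch, the helper below lists the new terms that carry no outgoing mapping at all, so that they can be reviewed before submission; the triple representation, the `unlinked_terms` helper, and the term identifiers are hypothetical.

```python
# SKOS mapping properties that link a concept to concepts in other schemes.
MAPPING_PROPERTIES = {
    "skos:exactMatch", "skos:closeMatch", "skos:broadMatch",
    "skos:narrowMatch", "skos:relatedMatch",
}


def unlinked_terms(new_terms, triples):
    """Return the new terms that have no outgoing SKOS mapping to another term."""
    # Only mappings where the new term is the subject are counted.
    linked = {s for s, p, _ in triples if p in MAPPING_PROPERTIES}
    return sorted(set(new_terms) - linked)


triples = [
    ("ex:termA", "skos:closeMatch", "ext:existingTerm"),
    ("ex:termB", "rdf:type", "skos:Concept"),
]
print(unlinked_terms(["ex:termA", "ex:termB"], triples))
# → ['ex:termB']
```

An unlinked term is not necessarily an error, since a mapping is only expected where it is reasonably possible and useful, but the list gives reviewers a starting point.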
Any proposed changes to vocabularies used by LINCS should be communicated through the feedback form available on the LINCS Vocabulary Browser or via email. This includes suggestions for new terms, edits to existing terms, and any other modifications. LINCS staff will review all feedback and respond to users in a timely manner.
Recommended Vocabularies
The table below lists the recommended vocabularies for each entity type in LINCS datasets. These recommendations are based on the vocabularies currently used in LINCS datasets, as well as on the availability of terms and definitions that meet the semantic requirements of each entity type. Data stewards are encouraged to use these vocabularies whenever possible, but they may also choose to use other vocabularies that meet the requirements outlined in this document.
NOTE: The table is incomplete but this is to give an idea. The Recommended Vocabularies column will include up to three most-used vocabularies for each entity type (no numbers, just names). Data coming soon.
| Entity Type | Recommended Vocabularies (based on current LINCS usage) |
|---|---|
| Appellation | LINCS Biography, ... |
| Conceptual Object | |
| Identifier | |
| Information Object | |
| Linguistic Object | |
| Person | |
| Physical Object | |
| Place | |
Current Usage in LINCS → to be moved
NOTE: This section was supposed to include statistics about actual usage for each vocabulary and dataset (see here), but it will be moved outside of this document and into Vocabulary Documentation. Replaced by Recommended Vocabularies above.
The table below provides more detailed statistics about vocabularies used in LINCS datasets. It lists the vocabularies currently used in each dataset and the number of terms from each vocabulary that are used in each dataset. This information is based on the data available in the LINCS Blazegraph triplestore.
| Dataset | Getty AAT | Homosaurus | Library of Congress | LINCS Biography | LINCS Context | LINCS Edit | LINCS EML | LINCS Event | LINCS Genre | LINCS Identity | LINCS Injuries & Illnesses | LINCS Occupation | LINCS Personal Relations | Nomenclature | OCLC FAST | Schema.org | VIAF | Wikidata | DBpedia |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AdArchive | |||||||||||||||||||
| Anthologia Graeca | |||||||||||||||||||
| Cabinet Conclusions | |||||||||||||||||||
| Canadians Read | |||||||||||||||||||
| CWRC | |||||||||||||||||||
| Ethnomusicology | |||||||||||||||||||
| Heresies | |||||||||||||||||||
| Historical Canadian Persons | |||||||||||||||||||
| Historical Indian Affairs Agents | |||||||||||||||||||
| Orlando | |||||||||||||||||||
References and Resources
- CIDOC CRM Special Interest Group, "CIDOC CRM 7.3.1 Reference Document". Available at https://www.cidoc-crm.org/
- [More to be added]
Document Details
Version: 1.0
Authors: ...
Contributors: the LINCS Project team
Last Updated: 2026-03-03
Released: 2026-03-03