
On Friday, 9 January 2026, at 10:30, Michal Med talked about effective publishing of RDF-based metadata, i.e. metadata built on top-level vocabularies for metadata description such as SKOS, Dublin Core, and DCAT.

Problem definition

The current approach to research data leads to the need for a catalogue solution able to consume RDF-based metadata regardless of its nature, whether it describes a book, a map, a manuscript, a material data observation, or a chemical experiment protocol. In one of the previous open mic sessions we introduced the Czech Core Metadata Model (CCMM), a DCAT extension metadata profile designed to describe the common properties of research data, with the possibility of being extended for the needs of specific scientific fields.

Research data jungle

But as researchers provide their metadata in RDF-based serializations of CCMM and DCAT(-AP), there is still a need to catalogue that metadata in an RDF-based catalogue, one that can exploit the capabilities of the RDF approach, such as searching over domain-specific metadata properties. Many ‘catalogues’ are currently built on top of dataset repositories (!), which flatten the graph metadata model and are therefore very ineffective for discovery, search, and statistical dashboards.

The problem may thus be defined as finding an effective way to publish machine-readable metadata in a catalogue that allows lossless discovery of all relevant information for various purposes: from finding relevant data, through overall statistics about published data and dashboards, to reusing and interconnecting data.
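To make “lossless discovery” concrete: when the metadata is kept as a graph, a domain-specific question can be answered directly with a SPARQL query such as the sketch below (the keyword and the graph shape beyond DCAT/Dublin Core basics are assumptions for illustration):

```sparql
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Find datasets tagged with a domain-specific keyword, together
# with their titles and publishers -- a join that a flattened,
# non-graph metadata store cannot express losslessly.
SELECT ?dataset ?title ?publisherName
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title ;
           dcat:keyword "crystallography"@en ;
           dct:publisher ?publisher .
  ?publisher foaf:name ?publisherName .
}
```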

The Piveau catalogue as a solution

The basic assumption is that the data are properly described with metadata based on concepts from top-level metadata ontologies such as SKOS [1], Dublin Core [2], and DCAT [3]. This is already taken into account in the process of creating metadata profiles. But it is equally important to publish the metadata through catalogues that allow users to fully exploit the benefits of graph metadata.
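As an illustration, a minimal dataset description reusing those vocabularies might look as follows in Turtle (the IRI and all property values are invented for the example; a real CCMM record would carry more properties):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Hypothetical dataset IRI, for illustration only
<https://example.org/dataset/42> a dcat:Dataset ;
    dct:title "Material observation data"@en ;
    dct:description "Example record reusing DCAT and Dublin Core terms."@en ;
    dcat:keyword "materials"@en, "observation"@en ;
    dct:publisher [ a foaf:Agent ; foaf:name "Example Institute" ] .
```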

In this talk, we looked at a specific data management ecosystem called Piveau [4] as a metadata catalogue for publishing and harvesting metadata from various providers. The tool is used (among others) as the metadata catalogue behind European data portals, and it is one of the candidates for a metadata management tool and catalogue in building the Czech node of the European Open Science Cloud [5].

The talk described the multilevel architecture of the environment, consisting of a storage part (Hub), a harvesting part (Consus), and validation and reporting tools (Metrics). The practical part showed how to create a catalogue, import metadata, or harvest it from a remote catalogue. Piveau is built on the Virtuoso triple store and DCAT-based metadata profiles, supports user management via Keycloak, and indexes metadata in Elasticsearch. The typical flow starts with the original resources, which are harvested, analysed, and discovered, and finally reused in other tools such as statistical dashboards. Most of the tools provide a UI for end users only; catalogue and metadata management is usually done through the API, described in the Piveau hub-repo service.
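As a sketch of that API-driven workflow, the snippet below builds (without sending) an HTTP request that would upsert one dataset's RDF metadata into a catalogue. The base URL, API key header, and endpoint layout are assumptions for illustration; the actual paths and authentication are defined in the piveau hub-repo API documentation.

```python
import urllib.request

# Hypothetical deployment details -- assumptions, not real values;
# consult the piveau hub-repo API documentation for actual endpoints.
BASE_URL = "https://catalogue.example.org"
API_KEY = "changeme"

def put_dataset_request(catalogue_id: str, dataset_id: str,
                        turtle: str) -> urllib.request.Request:
    """Build a PUT request that would upsert one dataset's RDF
    metadata into a catalogue (endpoint layout is an assumption)."""
    url = f"{BASE_URL}/datasets/{dataset_id}?catalogue={catalogue_id}"
    return urllib.request.Request(
        url,
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={
            # Send RDF as-is instead of a flattened representation.
            "Content-Type": "text/turtle",
            "X-API-Key": API_KEY,
        },
    )

turtle = """\
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<https://example.org/dataset/1> a dcat:Dataset ;
    dct:title "Example observation data"@en .
"""

req = put_dataset_request("research-data", "1", turtle)
# req can then be sent with urllib.request.urlopen(req) against a live instance.
```

Keeping the request construction separate from sending makes the sketch testable without a running catalogue; against a real deployment only the constants above would change.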

Presentation slides can be found here.

Related links:

  1. SKOS
  2. Dublin Core
  3. Data Catalog Vocabulary (DCAT) Version 3
  4. piveau
  5. EOSC CZ