Using LLMs to build glossaries of Czech legislation
On Friday 12 December 2025, Martin Ledvinka held an Open Mic session on the topic "Using LLMs to build glossaries of Czech legislation."
Abstract
Glossaries disambiguate the meaning of words in a given context by defining terms: pairs of a label and a definition that describes the label's semantics in that context. However, creating such glossaries manually is an arduous and time-consuming process. In this talk, experiments with using LLMs to extract glossaries from Czech legislative documents were presented.
The approach builds a pipeline that downloads legislative documents, pre-processes them, and then uses an LLM to extract a glossary of the terms explicitly defined by each document. The output is then post-processed into a SKOS glossary. Finally, the glossary can be imported into TermIt, a terminology manager developed by the KBSS.
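As a rough illustration of the post-processing step, the following Python sketch turns already extracted label/definition pairs into a SKOS glossary serialized as Turtle. It is not the code from the talk: the `Term` class, the helper functions, the `http://example.org/glossary/` namespace, and the sample term with its placeholder definition are all assumptions made for this example, and the LLM extraction step itself is left out. Only the SKOS properties (`skos:Concept`, `skos:prefLabel`, `skos:definition`, `skos:inScheme`) come from the SKOS vocabulary.

```python
import re
import unicodedata
from dataclasses import dataclass

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/glossary/"  # hypothetical namespace, not from the talk


@dataclass(frozen=True)
class Term:
    """One extracted glossary entry: a label and its definition."""
    label: str
    definition: str


def slugify(label: str) -> str:
    """Strip diacritics and build an ASCII fragment usable as a local name."""
    decomposed = unicodedata.normalize("NFKD", label)
    ascii_label = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")


def escape_literal(text: str) -> str:
    """Escape characters that are special inside a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_skos_turtle(terms, scheme_id="glossary", lang="cs"):
    """Serialize terms as SKOS concepts belonging to one concept scheme."""
    lines = [
        f"@prefix skos: <{SKOS}> .",
        f"@prefix ex: <{BASE}> .",
        "",
        f"ex:{scheme_id} a skos:ConceptScheme .",
    ]
    for term in terms:
        lines += [
            "",
            f"ex:{slugify(term.label)} a skos:Concept ;",
            f'    skos:prefLabel "{escape_literal(term.label)}"@{lang} ;',
            f'    skos:definition "{escape_literal(term.definition)}"@{lang} ;',
            f"    skos:inScheme ex:{scheme_id} .",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample entry; the definition is a placeholder, not legal text.
    sample = [Term("Stavební úřad", 'Placeholder definition with a "quoted" word.')]
    print(to_skos_turtle(sample))
```

The sketch writes Turtle by hand to stay dependency-free; an RDF library would be the more robust choice for real data, since it handles IRI validation and escaping. Labels that differ only in diacritics or punctuation would collide after `slugify`, so a real pipeline would need a deduplication step, and whether TermIt accepts exactly this shape on import has not been checked here.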

While the steps of the proof-of-concept pipeline are still largely manual, they have great potential for automation, and the pipeline's output can serve as a seed for creating more complex glossaries that also include terms defined implicitly by the document.
The presentation slides are available at this link.
Further reading: