Heidelberg Center for Digital Humanities

Research Projects

On this page you will find information about various projects that are currently supported by the HCDH or that have presented themselves within the Center's event series and would like to network further.

Naval Kishore Press - Digital

The Naval Kishore Press (NKP) was founded in 1858 in the northern Indian city of Lucknow (Lakhnau) by Munshi Naval Kishore (1836-1895) and developed over the next four decades into one of India's most important publishing companies. It published works in Hindi, Urdu, Arabic, Persian, Sanskrit, and English. Its portfolio covered a wide range of content: fiction, textbooks, guidebooks, religious works, classical Sanskrit literature, literature on Islam, Indian medicine, Quran editions, and translations of English classics. With its Naval Kishore Press collection of approximately 2,200 titles (of which 742 are on microfilm), the CATS Library / South Asia Department of Heidelberg University holds a representative cross-section of the output of this important publishing house.

Selected Hindi and Sanskrit works from the NKP collection have been digitized and made available online as editable full-text versions in Devanāgarī and in transliteration as part of the DFG-funded project Fachinformationsdienst Asien (2016-2021). A passage found via the full-text search is highlighted in the facsimile.

Transkribus is used for text recognition. Several models for recognizing Devanāgarī works have been trained on ground truth (GT) transcriptions. They give very good results, with a character error rate (CER) of about 2%.
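The CER figure can be made concrete with a minimal sketch. It assumes the common definition of CER as the edit distance between the recognized text and the ground truth, divided by the length of the ground truth. The function names `levenshtein` and `cer` are illustrative and not part of Transkribus; the tool's own evaluation may normalize text or count characters differently. Note that this sketch counts Unicode code points, so a Devanāgarī syllable with a vowel sign or virama counts as several characters.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn hyp into ref (classic dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance / length of the reference.
    Assumption: both strings are NFC-normalized and compared per code point."""
    reference = unicodedata.normalize("NFC", reference)
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        raise ValueError("reference (ground truth) must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: the recognizer dropped the final vowel sign.
    print(cer("नमस्ते", "नमस्त"))
```

Under this definition, a CER of about 2% corresponds to roughly two character-level errors per 100 characters of ground truth.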

A ground truth data archive for South Asian scripts has been established on heiDATA. GT data from the Naval Kishore Press - Digital project and from collaborating partners are available there for reuse.

In addition, the Naval Kishore Press Bibliography, established by the South Asia Institute Library, serves as a central reference tool for the printed works of the publishing house. The aim of the bibliography is to provide a central database of the holdings distributed across libraries worldwide. Besides the Heidelberg collection, it also lists the Naval Kishore Press works held at the Bodleian Library in Oxford.

Further information

Prosopography - Database of Middle Assyrian Texts

The project “Database for the Personal Names of Middle Assyrian Texts” aims to create a functional, web-based database that meets the requirements of a project in the field of onomastics and the international standards for comparable projects. It is the database for the DFG project “The Prosopography of Middle Assyrian Texts (PMA)”, led since September 2019 by Prof. Dr. Ariel M. Bagg (Department of Languages and Cultures of the Near East/Assyriology). The goal of that project is to compile a “bibliographic” lexicon of the c. 5,000 personal names (c. 12,500 individuals) that occur in the Middle Assyrian text corpus. The corpus consists of about 3,000 cuneiform texts from the second half of the second millennium BCE, written in the Middle Assyrian dialect of Akkadian. According to the data management concept submitted to the DFG, a user-friendly database will be developed, which the applicant will fill with data during the course of the project and which will be converted into a web-based database after the end of the project (August 2025). The database will not be an essential tool for the project work itself, but will allow for updates and further study after the project ends. To ensure the sustainable public availability of the project results, the web-based software easydb will be used to create the project database, following consultation with the UB Heidelberg.

Further information

Object and provenance - blog project

The blog project “Objekt und Provenienz” (Object and Provenance) aims to make provenance research transparent and public. Since 2021, historical documents from the so-called Old Inventory of the Collection of Classical Antiquities that provide information about the acquisition or donation of objects have been successively published on the blog together with transcriptions. Thanks to high-resolution scans from Heidelberg University Library, these documents can be digitally annotated in a second step, i.e. references to identified objects and photos of the objects can be linked directly (work in progress). In the longer term, further historical documents on the collection will also be made available on the blog.

The project combines provenance research on the Heidelberg Collection of Antiquities with a Citizen Science approach, through which interested parties have participated in the transcription of historical sources on the history of the collection. It thus sees itself as a pilot project among archaeological university collections in Germany and aims to set a good example to help create awareness not only of provenance issues, but also of the often lengthy research required to address them.

Annotation of moralizing practices

In the project “Annotation of Moralizing Practices” we are creating a dataset of texts in different languages (German, English, French, Italian) and from different text genres (online forums, political debates, newspaper texts, non-fiction...) in which moralizing speech acts are annotated.

By moralizing speech acts, we mean discourse-strategic procedures in which contentious issues and required actions are described in narrowly moral terms. Vocabulary referring to moral values (e.g., “freedom,” “security,” or “credibility”) is used to enforce a demand, which in this way appears unassailable and requires no further justification.

The resulting dataset will be used in the future for automated research on the phenomenon of moralization - a diffuse everyday language concept that will be operationalized as a term of descriptive linguistics.

OCR technologies in comparison

The project is anchored in digital linguistics. A basis for the computer-aided study of pre-modern lexicographic works and historical language contacts in the field of lexis was developed. In order to develop a technical infrastructure for the digital recording of multilingual dictionaries (manuscripts and old prints), lexicographic data were processed with a view to linking them in a database. At the same time, the data set to be examined was expanded with the help of the HTR tools Transkribus and eScriptorium. In this context, HTR models were trained and applied for further automatic transcriptions. In parallel, different OCR engines (CITlab HTR+, PyLaia, kraken) were evaluated and their advantages and disadvantages weighed up. In addition, international contacts and cooperation were established with other projects that merge lexicographic data as well as entire dictionaries (Gorazd, LiLa, Logeion, MLW digital).

To the project

Materials for individual projects

PP on the 'Annotation of Moralising Practices' project (pdf, 1.5 MB)
OCR technologies in comparison: from manuscripts and old prints to database structures and HTR models (pdf, 13 MB)