Callimachus

A Regest of Greek Papyri

Presentation

Callimachus (still in beta) is a regest of Greek and Latin papyri (and of Coptic papyri containing Greek words). At present it contains information from the documentary papyri collected by the partners of the Papyri.info project. Literary papyri will be included soon.

What is Callimachus

Callimachus is an automated regest of published papyri and ostraka, i.e. a processed extract of the formal contents of the texts of the papyri hosted at the Papyri.info site. Additional information about the date, origin, material, etc. of the papyri (from the HGV database) is included in order to enrich the queries. Currently Callimachus (in beta) contains only the data on documentary papyri. Literary papyri will be added in the coming weeks. Lexical information about the papyri is contained in the sibling Anagnostes database, soon to be released here.

Callimachus contains three kinds of information.

The first kind covers several countable features of the text as encoded by the Papyri.info project: for instance, how many words, letters, gaps, letters per line, scribal hands, etc. are found in each document. These data were extracted while parsing the documents from the Integrating Digital Papyrology Papyri.info GitHub repository. The lexical information belongs to another project, Anagnostes, soon to appear here.

The second kind of information is an automated calculation of the state of the text of the papyrus (the Callimachus number): in other words, how much (and how well) of the original text of the papyrus can be read in the edition used by Papyri.info. This calculation is provided as two decimal numbers (CRN and CCN) from 0 to 1 (1 means all the text is perfectly readable).


Callimachus Readability Number (CRN) is a measure of the readability of the part of the text that was edited (the extent to which the editor was able to read or conjecture the text of the papyrus).
Callimachus Conservation Number (CCN) is a measure of the conservation of the papyrus' text. CRN (center) and CCN (center) refer only to the "center" of the papyrus, defined as the part of the text after the first full word preserved and before the last full word preserved. Here you may find how this number is obtained.
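The definition of the "center" can be illustrated with a minimal sketch. It is not the code used by Callimachus: the `Token` class and the labels "word", "fragment" and "gap" are hypothetical, and the sketch assumes that the first and last full words themselves belong to the center.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    kind: str  # hypothetical labels: "word", "fragment" or "gap"


def center(tokens):
    """Return the tokens from the first full word through the last full word.

    Assumption: the boundary words are included in the center.
    """
    word_positions = [i for i, t in enumerate(tokens) if t.kind == "word"]
    if not word_positions:
        return []  # no full word preserved, so there is no center
    return tokens[word_positions[0]:word_positions[-1] + 1]


# Invented example: the damaged beginning and end fall outside the center.
sample = [
    Token("[...]", "gap"),
    Token("ου", "fragment"),
    Token("και", "word"),
    Token("[...]", "gap"),
    Token("τον", "word"),
    Token("λ", "fragment"),
]
print([t.text for t in center(sample)])
```

Under these assumptions the leading gap and fragment and the trailing fragment are dropped, while the gap between the two full words stays inside the center.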

There is also a second variety of the CRN and CCN (namely CRN2 and CCN2), which to some extent amplifies the differences between states of preservation: it is obtained by squaring the value of each letter and then taking the square root of the total. Whether this or the simple number is more useful remains an open question.
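As a rough illustration of the difference between the simple and the squared variety, here is a sketch under stated assumptions. The per-letter values (1.0 for a letter read with certainty, 0.5 for a doubtful one, 0.0 for a lost one) are invented for the example, and both functions divide by the number of letters so that the result stays between 0 and 1; the normalisation actually used by Callimachus is not described here and may differ.

```python
import math


def simple_number(values):
    """Plain average of the per-letter values (assumed normalisation)."""
    if not values:
        return 0.0
    return sum(values) / len(values)


def squared_number(values):
    """Square each value, average the squares, then take the square root.

    Dividing by the number of letters is an assumption made here so that
    the result stays in the range 0 to 1.
    """
    if not values:
        return 0.0
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical per-letter values for a four-letter stretch of text.
letters = [1.0, 1.0, 0.5, 0.0]
print(simple_number(letters), squared_number(letters))
```

With these invented values the two functions give different results for the same text, which is the kind of contrast the two varieties are meant to expose.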


The third kind of information is mainly data about the papyrus (or ostrakon) itself, as provided by the Papyri.info project: date, origin, material, content, etc. This information comes from the metadata included in the XML documents or from the HGV database. All of it (and much more) can also be consulted on the Papyri.info site.

What is Callimachus for?

Callimachus can be used in papyrology research as well as in linguistics-related projects.

You can use Callimachus to search for papyri containing a specific feature or a combination of features. For example: are there coronides in non-literary papyri? Where can I find examples of papyri using a specific fraction, or a type of deletion mark? This may help you find parallels to your object of study. Using Callimachus together with Anagnostes (soon to be uploaded) will allow you to combine lexical information with all the data types of Callimachus.

You can use Callimachus to help you build your corpus, using, for example, mainly papyri from a certain date and origin with high Callimachus number (meaning better preservation) and with a minimum of words.
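The corpus-building scenario above amounts to a filter over a few fields. The following sketch assumes you have already exported the relevant records; the `Record` fields, the identifiers and the values are all hypothetical and do not reflect the actual Callimachus schema or real papyri.

```python
from dataclasses import dataclass


@dataclass
class Record:
    identifier: str  # hypothetical identifier
    year: int        # negative values stand for years BCE (assumption)
    origin: str
    crn: float       # Callimachus Readability Number, from 0 to 1
    words: int


def select_corpus(records, *, origin, year_from, year_to, min_crn, min_words):
    """Keep records from one origin and date range that are well preserved
    and long enough."""
    return [
        r for r in records
        if r.origin == origin
        and year_from <= r.year <= year_to
        and r.crn >= min_crn
        and r.words >= min_words
    ]


# Invented sample data.
records = [
    Record("example-1", 150, "Oxyrhynchos", 0.92, 210),
    Record("example-2", 160, "Oxyrhynchos", 0.41, 350),
    Record("example-3", 155, "Oxyrhynchos", 0.88, 12),
    Record("example-4", 155, "Tebtynis", 0.95, 400),
]
corpus = select_corpus(records, origin="Oxyrhynchos", year_from=100,
                       year_to=200, min_crn=0.8, min_words=50)
print([r.identifier for r in corpus])
```

Only the first invented record passes every condition: the second is too poorly preserved, the third too short, and the fourth comes from a different origin.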

Some features (like a high number of regularizations) can be meaningful for the linguist interested in phonetic traits of the koiné.

How the sausage is made

In order to build Callimachus (and Anagnostes), all the digital editions of the texts from the DDB EpiDoc section currently in the GitHub repository were parsed: full words were identified and lemmatized using the Madrid List of Greek words, and differentiated from groups of letters with no grammatical value. All the pieces of the text (words, gaps, meaningless groups of letters) were counted and annotated. We are still refining the algorithms to detect and count things in a sensible way.

The Callimachus numbers are calculated after all this information has been processed. For example, to count the number of letters per line, only the original reading of the words is counted, not the editorial corrections; text deleted by the copyist is omitted, and only the text added after the deletion is counted.
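The counting rule just described can be sketched as follows. This is an illustration, not the parser used by Callimachus: it assumes the usual TEI elements (`lb` for line breaks, `choice` with `orig`/`reg` or `sic`/`corr` for editorial regularization and correction, `subst` with `add` and `del` for scribal corrections), and the sample fragment is invented.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
# Elements whose content is left out of the count: editorial
# regularizations and corrections, and text deleted by the copyist.
SKIPPED = {TEI + "reg", TEI + "corr", TEI + "del"}


def letters_per_line(xml_text):
    """Count alphabetic characters per <lb/>-delimited line."""
    root = ET.fromstring(xml_text)
    counts = {}
    current = None  # value of n on the most recent <lb/>

    def add(text):
        if current is not None and text:
            counts[current] += sum(ch.isalpha() for ch in text)

    def walk(elem, skipping):
        nonlocal current
        skip_here = skipping or elem.tag in SKIPPED
        if elem.tag == TEI + "lb":
            current = elem.get("n")
            counts.setdefault(current, 0)
        if not skip_here:
            add(elem.text)
        for child in elem:
            walk(child, skip_here)
            if not skip_here:
                add(child.tail)  # text following the child, inside elem

    walk(root, False)
    return counts


# Invented fragment: one regularization on line 1, one scribal
# correction on line 2.
SAMPLE = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0"><lb n="1"/>αβγ '
    '<choice><reg>δε</reg><orig>δη</orig></choice><lb n="2"/>'
    '<subst><add place="inline">ζ</add><del rend="corrected">ηθ</del></subst>'
    ' ικ</ab>'
)
print(letters_per_line(SAMPLE))  # {'1': 5, '2': 3}
```

In line 1 the original reading inside `orig` is counted and the regularized form is ignored; in line 2 the deleted letters are skipped and only the added letter and the following word are counted.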

Most of the elements present in the documents (figures, glyphs, etc.) were counted as well, whenever such information was encoded in the digital edition using TEI-EpiDoc. Editorial annotations on several features of the text (like the number of hands and changes of hand) are also counted. Very soon, the literary papyri will be added to Callimachus.

What kind of information is available via Callimachus? »

Who makes Callimachus?

Callimachus is made within the Grupo de Lingüística Griega by Daniel Riaño Rufilanchas.

Many thanks to Silvia Martínez Valero, José Antonio Berenguer, Juan Arboleya, Laura Salas Morellón and Antonio Revuelta for their help with the project and with beta testing.

This database and the code used to create & publish it are supported by projects FFI2017-89110-P and PGC2018-096171-B-C21 from Spain's Ministerio de Ciencia, Innovación y Universidades.

Contact with Callimachus