I have compiled here a list of Ancient Greek words, lemmatized, morphologically analysed, and glossed, to be used in all kinds of projects related to the Ancient Greek world, scholarly or educational, as well as in linguistic and Digital Humanities projects of every kind. Today, the list contains more than 1,300,000 different forms of Ancient Greek words (and variant spellings for words found in papyri) and about 3,800,000 different possible morphological analyses for them. The words come from every kind of literary text, in prose or verse, from Homer (c. 8th century BC) to the 6th century AD, plus many non-literary papyri from the Papyrological Navigator.
The list contains the proper names (PN) identified as such (through capitalisation) in the sources. Some effort has been put into automatically identifying the nature of the referent (a person, a building, an event…). A much more complete list of personal names, though, is to be found in our list of Ancient Greek Personal Names.
This page was set up to collect reactions and ideas before the final release of the full MAGWL. The list will be uploaded in a way that allows any student or researcher to contribute corrections and to download the entire list.
You can see the first 10K entries of the list here.
A MAJOR REVISION OF THE WORDLIST IS CURRENTLY UNDER WAY. THOUSANDS OF GHOST FORMS AND FALSE FORM-LEMMA PAIRINGS ARE BEING REMOVED, AND SEVERAL THOUSAND NEW FORMS ARE BEING ENTERED. PLEASE BE PATIENT!
Many thanks to Silvia, Antonio and José Antonio for their help!
This page is a work in progress by the Grupo de Lingüística Griega del ILC (GLG).
The MAGWL consists of a series of lines, each containing lexical and morphological information on a Greek word; or rather, on any morphological form of a word, or even on spelling variants of a given morphological form, in the case of the papyrological documentation.
Here is an example:
Βουκολίδα Βουκολίδης ø Βουκολιδα n-d---ma-;n-d---mn-;n-d---mv- ø Bucolides, PN of a man Morph
μετεστρατοπεδεύσατο μεταστρατοπεδεύω μετά-στρατοπεδεύω μεταστρατοπεδευω v3saim--- ø shift oneʼs ground; camp Diorisis;Morph
A morphological form (or variant spelling) of a Greek word.
The lemma to which that form belongs.
If the word has a prefix, the parts of the word (prefixes + lexeme, no further Wortbildung description is given).
The form of the word, without diacritics.
The possible morphological parsings of the form, using the compact Perseus annotation schema. (You can see the meaning of the letters at each position here).
Occasional, and always incomplete, information on the dialects where the form is attested.
A gloss of the word (in English, or in Spanish if no English translation was found).
The corpora where any of the above items were found.
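The eight items above can be read into a small record. This is only a minimal sketch, not the tooling used to build the list: it assumes that the items are separated by tabs (the delimiter is not stated on this page), that `ø` marks an empty item and `;` separates alternatives, as the sample lines suggest, and that the parts of a prefixed word are joined with a hyphen. The names `Entry` and `parse_entry` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

EMPTY = "ø"  # placeholder the sample lines use for an empty item


@dataclass
class Entry:
    form: str            # morphological form or variant spelling
    lemma: str
    parts: List[str]     # prefixes + lexeme; empty when there is no prefix
    plain_form: str      # form without diacritics
    parsings: List[str]  # compact Perseus tags
    dialects: str
    gloss: str
    corpora: List[str]


def _split(value: str, sep: str) -> List[str]:
    """Split an item on sep, treating the ø placeholder as 'no value'."""
    return [] if value == EMPTY else value.split(sep)


def parse_entry(line: str, delimiter: str = "\t") -> Entry:
    """Parse one line of the list; the tab delimiter is an assumption."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != 8:
        raise ValueError(f"expected 8 items, got {len(fields)}")
    form, lemma, parts, plain, parsings, dialects, gloss, corpora = fields
    return Entry(
        form=form,
        lemma=lemma,
        parts=_split(parts, "-"),
        plain_form=plain,
        parsings=_split(parsings, ";"),
        dialects="" if dialects == EMPTY else dialects,
        gloss="" if gloss == EMPTY else gloss,  # the gloss is kept whole
        corpora=_split(corpora, ";"),
    )


# The second sample line from this page, rebuilt with the assumed tab delimiter.
sample = "\t".join([
    "μετεστρατοπεδεύσατο", "μεταστρατοπεδεύω", "μετά-στρατοπεδεύω",
    "μεταστρατοπεδευω", "v3saim---", "ø",
    "shift oneʼs ground; camp", "Diorisis;Morph",
])
entry = parse_entry(sample)
```

The gloss is deliberately not split on `;`, since a semicolon there separates senses rather than alternatives of the item.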
Data in this list come from four kinds of open-data sources:
The resulting list (with its many errors) is the output of processing all the above material and consolidating it automatically. All the lemmatization is the work of the people responsible for the wordlist and the treebanks, but chiefly of those in charge of the Morph lists.
Some features of the list depend on the original sources. For instance, the original Morph list lemmatised many verbal compounds by separating the first preverb from all the rest, but the results were sometimes problematic. MorpheusU fixed some of the impossible analyses.
The assignment of a part of speech to each form is guesswork based on the grammatical information provided by Morph, or on the cues that may be obtained from the lexica. (Ancient Greek lexica usually do not say explicitly whether a word is, say, a verb, an adjective or a noun, but the information they provide should give a human user enough cues to infer it.)
All the treebanks (except Riaño's) depend in some way on the Morph list, or a corrected version thereof, for lemmatization and POS tagging. However, they have corrected many of its errors, most notably Gorman.
Before processing, Keersmaekers' corpus was cleaned of several thousand numerals and a fair number of artifacts (sometimes the digital result of the differing conventions used for editing papyri over the last two centuries). In the process I may have deleted some legitimate words.
This list contains many non-Greek names (names without a Greek etymology). I made no attempt to delete them or to separate them from the rest of the names. Often, such names appear without inflection. Quite often, the editors did not try to accentuate the forms or the lemma.
I have used LSJ and DGE to correct some apparent anomalies in the treebanks. I used LGPN and the Trismegistos repertory of names to do the same with the proper names. These changes are not noted in the list, except that the name of the corpus (LGPN or Trism) will appear in the line corresponding to the nominative singular of that PN.
Some of the treebanks may have lacunae in proper names, especially (for reasons unknown to me) for names starting with rho and omega.
The glosses come from the first two definitions in LSJ and DGE. Because of that procedure, the gloss is not always the most common meaning of the word in all periods (sometimes lexica start with the meaning assumed to be the oldest, or the first attested, or the intransitive before the transitive, or the other way around, etc.). When I couldn't extract a definition from LSJ (or when a word is not present in that dictionary) I used the DGE (with the gloss preceded by "Sp." for Spanish).
For the personal names, I first provide an automatic transliteration of the name (most of the time it should correspond to at least one possible transliteration in English). Then, if I was able to teach the program to recognise what kind of entity it is (based on the clues provided by the dictionary's definition), I give a tag that classifies the name among persons, events and places. The tags you will find in such cases are "PN of a festivity", "PN of a woman", "PN of a man", etc. (PN is of course "Proper Name"). This kind of tagging is intended for students of Greek and for automatic linguistic analysis.
Because of all the above, you may assume that: a) Whenever LSJ and DGE on one side disagree with the rest of the sources on the other, the information from the two lexica is usually the safer bet. b) Whenever Morph disagrees with the treebanks that depend on it, the data from the human-curated source are generally the better (this is especially true of Gorman).
For some reason, the English convention of capitalising verbs and adjectives derived from proper names was extended to the modern printing of Ancient Greek. Inconsistencies in the capitalisation of nominals and of adjectives and verbs derived from personal names have been resolved in accordance with this convention. For that purpose I have used data taken, or inferred, from LGPN, Trismegistos or the lexica. Likewise, inconsistencies in accentuation have been carefully (though incompletely) dealt with. In the same vein, the lexica and the list of personal names have been used to identify the nature of the referent of PNs from other sources, and to identify the right POS tag for some entries. In most cases, I have not used the information from the lexica and LGPN on the sex of the bearer of a proper name to rectify the gender of nouns in other corpora, since Greek names are often used for both sexes.
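A gloss such as "Bucolides, PN of a man" can be split back into its transliteration and its tag. The following is a minimal sketch under assumptions that are not confirmed by this page: that the transliteration always comes first, followed by a comma and a tag beginning with "PN of", and that some glosses may carry a parenthesised common meaning after the tag. The function name `parse_pn_gloss` and the second test gloss are hypothetical.

```python
import re
from typing import Dict, Optional

# Transliteration, comma, "PN of ..." tag, optional meaning in parentheses.
PN_GLOSS = re.compile(
    r"^(?P<name>[^,]+),\s*"
    r"(?P<tag>PN of [^()]+?)\s*"
    r"(?:\((?P<meaning>.*)\))?$"
)


def parse_pn_gloss(gloss: str) -> Optional[Dict[str, Optional[str]]]:
    """Return name, tag, referent and meaning, or None if the gloss
    does not follow the assumed 'Name, PN of ...' pattern."""
    match = PN_GLOSS.match(gloss.strip())
    if match is None:
        return None
    tag = match.group("tag")
    return {
        "name": match.group("name").strip(),
        "tag": tag,
        "referent": tag[len("PN of "):],   # e.g. "a man", "a festivity"
        "meaning": match.group("meaning"),  # None when no parentheses
    }


# Gloss taken from the sample line on this page.
bucolides = parse_pn_gloss("Bucolides, PN of a man")

# Invented gloss, used only to exercise the optional parenthesis.
invented = parse_pn_gloss("Nike, PN of a woman (victory)")

# An ordinary gloss is not a PN gloss and yields None.
ordinary = parse_pn_gloss("shift oneʼs ground; camp")
```

Returning `None` for non-matching glosses keeps ordinary dictionary glosses from being mistaken for proper names.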
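Anyone merging data from several of these sources could turn the two rules of thumb above into an explicit ranking. This is a hypothetical sketch, not part of the list's own pipeline: the numeric ranks, the function names and the treatment of every unnamed source as a human-curated treebank are all assumptions made for illustration.

```python
from typing import Dict, Tuple

LEXICA = {"LSJ", "DGE"}


def trust_rank(source: str) -> int:
    """Lower rank = more trusted. The numbers are illustrative only."""
    if source in LEXICA:
        return 0   # rule (a): the lexica are usually the safer bet
    if source == "Gorman":
        return 1   # rule (b): human-curated data, especially Gorman
    if source == "Morph":
        return 3   # automatic analyses come last
    return 2       # assumption: any other source is a curated treebank


def pick_reading(candidates: Dict[str, str]) -> Tuple[str, str]:
    """Given {source: value}, return the (source, value) pair that
    the ranking above trusts most."""
    if not candidates:
        raise ValueError("no candidate readings")
    best = min(candidates, key=trust_rank)
    return best, candidates[best]


# Placeholder values; real use would compare lemmas, tags or glosses.
morph_vs_gorman = pick_reading({"Morph": "reading-A", "Gorman": "reading-B"})
lexicon_wins = pick_reading({"Gorman": "reading-B", "LSJ": "reading-C"})
```

Since both rules are stated as tendencies ("usually", "generally"), a ranking like this is best used to flag a preferred reading for review rather than to overwrite data silently.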
When a name has a common meaning (as many Greek names do) I added that meaning in parentheses, after the transliteration and the identification tag, for instance:
During the process of building this list I have occasionally deleted several impossible forms or non-words.
I have used the Perseus nine-position schema for the morphological description of Greek and Latin words. Although it is not complete (e.g., there is no way to indicate verbal adjectives) and somewhat problematic (are the infinitive and the participle moods?), it is a compact way to describe the greater part of basic Greek morphology.
The meanings of the characters used for the morphological description (5th item of each line on the list) are these:
1 part of speech
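A nine-character tag of this kind can be decoded position by position. The sketch below is hedged in two ways: only the first position (part of speech) is confirmed by this page, and the remaining position names and the code tables reflect my understanding of the Perseus treebank tagset. The tables are deliberately partial, so any code not listed is passed through unchanged. The names `POSITIONS`, `VALUES` and `decode_tag` are hypothetical.

```python
from typing import Dict

# Position names, in order. Positions 2-9 are assumed, not taken from this page.
POSITIONS = (
    "part of speech", "person", "number", "tense", "mood",
    "voice", "gender", "case", "degree",
)

# Partial code tables (assumed); unknown codes are left as they are.
VALUES = {
    "part of speech": {"n": "noun", "v": "verb", "a": "adjective", "d": "adverb"},
    "person": {"1": "first", "2": "second", "3": "third"},
    "number": {"s": "singular", "p": "plural", "d": "dual"},
    "tense": {"p": "present", "i": "imperfect", "f": "future",
              "a": "aorist", "r": "perfect", "l": "pluperfect"},
    "mood": {"i": "indicative", "s": "subjunctive", "o": "optative",
             "m": "imperative", "n": "infinitive", "p": "participle"},
    "voice": {"a": "active", "m": "middle", "p": "passive"},
    "gender": {"m": "masculine", "f": "feminine", "n": "neuter"},
    "case": {"n": "nominative", "g": "genitive", "d": "dative",
             "a": "accusative", "v": "vocative"},
    "degree": {"c": "comparative", "s": "superlative"},
}


def decode_tag(tag: str) -> Dict[str, str]:
    """Expand a nine-character tag into {position name: value},
    skipping positions filled with '-'."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters, got {len(tag)}")
    decoded = {}
    for name, code in zip(POSITIONS, tag):
        if code != "-":
            decoded[name] = VALUES[name].get(code, code)
    return decoded


# Tags taken from the two sample lines on this page.
verb = decode_tag("v3saim---")
noun = decode_tag("n-d---ma-")
```

An item holding several parsings, such as `n-d---ma-;n-d---mn-;n-d---mv-`, would first be split on `;` and each tag decoded separately.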
Please send your comments and ideas to Daniel
The contents of this site are CC by Daniel Riaño Rufilanchas