Tagging the Preder Geriadur ar Stlenneg to comply with the TEI guidelines
This page describes how I tagged the Preder Geriadur ar Stlenneg.
It is a French-English-Breton dictionary of computing, written by Guy Etienne and published by Preder in 1995. It is not a free resource, so it will probably not be possible to redistribute it easily.
Why tag this dictionary?
There are few resources in Breton that cover computing vocabulary, and the first online dictionary people think of doesn't contain the vocabulary used in free software (OOo, Firefox, Gimp or even Gnome applications). So I collected some words and translations in order to write a glossary.
I wanted a dictionary that could be eng(<>fra<)>bre, to be useful for software translation, and bre(<>eng<)>fra, for software use.
Because it keeps a lot of information (even information that is not displayed), I chose to write this dictionary in the TEI format.
That's probably not the easiest way, and I had never done it before. I needed an exercise...
The Geriadur ar Stlenneg was the kind of resource I wanted to see in the dictionary of my dreams.
Since November 2009, Preder has let people download this dictionary with a Windows applet. (No licence given...)
I can't use this applet on my Linux OS, but the dictionary database was in abs format and it can be converted to csv.
From the csv file, I tagged this dictionary to get a TEI P5 compliant file.
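The csv-to-TEI step could be sketched in Python roughly as follows. This is only an illustration, not the script that was actually used: the column names (headword, fra, eng, bre) and the helper names are assumptions, and the sample row reuses the placeholder texts from the templates further down.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical csv layout: the real export's column names are not known here.
SAMPLE_CSV = (
    "headword,fra,eng,bre\n"
    "nom,nom en français,name in English,anv e brezhoneg\n"
)


def row_to_entry(number, row):
    """Build one <entry> element from a csv row (assumed column names)."""
    entry = ET.Element("entry", {"n": str(number)})
    form = ET.SubElement(entry, "form")
    ET.SubElement(form, "orth").text = row["headword"]
    sense = ET.SubElement(entry, "sense")
    for lang in ("fra", "eng", "bre"):
        text = (row.get(lang) or "").strip()
        if text:  # skip languages with no translation in this row
            cit = ET.SubElement(
                sense, "cit", {"type": "translation", XML_LANG: lang}
            )
            cit.text = text
    return entry


def csv_to_entries(csv_text):
    """Read csv text and return a list of <entry> elements, numbered from 1."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row_to_entry(i, row) for i, row in enumerate(reader, start=1)]


entries = csv_to_entries(SAMPLE_CSV)
print(ET.tostring(entries[0], encoding="unicode"))
```

Building the elements with a library rather than by string concatenation means special characters such as & or < in the csv cells are escaped automatically.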
Thanks to Piotr Banski for the advice; the swa-eng dictionary was a helpful example.
How?
There are two types of entry template. The first one contains only one or more examples, as follows:
<entry n="nb">
<form><orth>nom</orth></form>
<sense>
<cit type="example">
<quote>Un exemple en français</quote>
<cit type="translation" xml:lang="eng">An example in English</cit>
<cit type="translation" xml:lang="bre">Ur skouer e brezhoneg</cit>
<usg type="dom">domain in French</usg>
</cit>
<cit type="example">
<quote>Un autre exemple en français</quote>
<cit type="translation" xml:lang="eng">Another example in English</cit>
<cit type="translation" xml:lang="bre">Ur skouer all e brezhoneg</cit>
<usg type="dom">domain in French</usg>
</cit>
</sense>
</entry>
The second type contains the translations and a definition, and sometimes a note:
<entry n="nb">
<form><orth>nom</orth></form>
<sense>
<cit type="translation" xml:lang="fra">nom en français</cit>
<cit type="translation" xml:lang="eng">name in English</cit>
<cit type="translation" xml:lang="bre">anv e brezhoneg<gen>gender</gen><form type="infl">plural suffix or plural form</form></cit>
<def>A definition of the headword</def>
<note>Something about the headword, in French</note>
</sense>
</entry>
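Entries of this second type are what make a dictionary usable in several directions (eng to bre, bre to fra, and so on). As a rough sketch, assuming entries shaped like the template above, a lookup table for any language pair could be built in Python like this; the function names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The second entry template, with its placeholder texts.
SAMPLE_ENTRY = """<entry n="nb">
<form><orth>nom</orth></form>
<sense>
<cit type="translation" xml:lang="fra">nom en français</cit>
<cit type="translation" xml:lang="eng">name in English</cit>
<cit type="translation" xml:lang="bre">anv e brezhoneg<gen>gender</gen><form type="infl">plural suffix or plural form</form></cit>
<def>A definition of the headword</def>
</sense>
</entry>"""


def plain_text(cit):
    """Return the words of a <cit>, leaving out the inline grammar tags."""
    parts = [cit.text or ""] + [child.tail or "" for child in cit]
    return " ".join("".join(parts).split())


def translations(entry):
    """Map each language code to the translation found in the entry."""
    result = {}
    for cit in entry.findall("sense/cit"):
        if cit.get("type") == "translation":
            result[cit.get(XML_LANG)] = plain_text(cit)
    return result


def build_index(entries, source, target):
    """Build a source-language to target-language lookup table."""
    index = {}
    for entry in entries:
        found = translations(entry)
        if source in found and target in found:
            index.setdefault(found[source], []).append(found[target])
    return index


entry = ET.fromstring(SAMPLE_ENTRY)
print(build_index([entry], "eng", "bre"))
print(build_index([entry], "bre", "fra"))
```

The helper plain_text keeps the text before, between and after the child elements, so a two-word translation with grammar tags in the middle is still read as two words.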
Please note that only the Breton translation contains grammatical information. Number (ls. for liester, plural) and part-of-speech (aa. for adjective) information was not a problem. Some Breton translations contain two words. For example, the Breton translation of 'desk accessory' is 'prest burev', which would have been tagged as:
<cit type="translation" xml:lang="bre">prest burev</cit>
The plural form is 'prestoù burev', and it usually appears in dictionaries as: prest g. -où burev
where g. is the abbreviation for gourel (masculine) and -où is the suffix of the inflected plural form.
Here is how I tagged it:
<cit type="translation" xml:lang="bre">prest <gen>g</gen><form type="infl">-où</form> burev</cit>
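This inline tagging can be produced mechanically from the usual dictionary notation. The Python sketch below is one possible way to do it, not necessarily how the file was actually tagged: the function name tag_breton and the GENDERS table are made up for the illustration, and it only handles a gender abbreviation and suffixes written with a plain hyphen.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Gender abbreviations as written in the dictionary, mapped to <gen> values.
GENDERS = {"g.": "g", "b.": "b"}


def tag_breton(translation):
    """Turn a string like 'prest g. -où burev' into a tagged <cit> element."""
    cit = ET.Element("cit", {"type": "translation", XML_LANG: "bre"})
    cit.text = ""
    last = None  # last child element added, if any
    for token in translation.split():
        if token in GENDERS:
            last = ET.SubElement(cit, "gen")
            last.text = GENDERS[token]
        elif token.startswith("-"):
            last = ET.SubElement(cit, "form", {"type": "infl"})
            last.text = token
        elif last is None:
            # Plain word before any grammar tag: part of the leading text.
            cit.text += token + " "
        else:
            # Plain word after a grammar tag: stored as that tag's tail.
            last.tail = (last.tail or "") + " " + token
    if last is None:
        cit.text = cit.text.rstrip()
    return cit


print(ET.tostring(tag_breton("prest g. -où burev"), encoding="unicode"))
# <cit type="translation" xml:lang="bre">prest <gen>g</gen><form type="infl">-où</form> burev</cit>
```

A translation with no grammatical information, such as 'prest burev', comes out as a plain cit element with no children.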
I tagged the feminine inflected forms in the same way. Here is the XML code for
oberataer g. -ion (ez b. -ed)
<cit type="translation" xml:lang="bre">oberataer
<gen>g</gen>
<form type="infl">-ion</form>
<form type="infl">ez
<gen>b</gen>
<form type="infl">-ed</form>
</form>
</cit>
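To check that the nested structure can be read back, here is a small Python sketch that walks such a cit element and lists each form with its gender and plural suffix. It assumes that a nested form carrying its own gen is a derived form and that a form without gen is a plural suffix; the function name read_forms is invented for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE_CIT = """<cit type="translation" xml:lang="bre">oberataer
<gen>g</gen>
<form type="infl">-ion</form>
<form type="infl">ez
<gen>b</gen>
<form type="infl">-ed</form>
</form>
</cit>"""


def read_forms(node):
    """Return a list of (text, gender, plural suffix) for node and nested forms."""
    word = (node.text or "").strip()
    gender = node.findtext("gen")
    plural = None
    derived = []
    for form in node.findall("form"):
        if form.find("gen") is None:
            # No gender of its own: treated as the plural suffix of this node.
            plural = (form.text or "").strip()
        else:
            # Has its own gender: treated as a derived form, read recursively.
            derived.append(form)
    rows = [(word, gender, plural)]
    for form in derived:
        rows.extend(read_forms(form))
    return rows


for row in read_forms(ET.fromstring(SAMPLE_CIT)):
    print(row)
# ('oberataer', 'g', '-ion')
# ('ez', 'b', '-ed')
```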
There were probably different ways to tag it. Let me know if you think this is not a correct way, or if you have a better idea...
Denis
Results
Here is the P4 tagged file. I still have a problem with my tagging and the P4 DTD, and I have not worked on this version since I discovered the P5 guidelines.
Here is the P5 tagged file. To make it portable I used the swa-eng DTD (thanks again, Mister Banski!). It seems to be OK. Here is a P5 pack with the dictionary in Dict format and the teientry2text.xsl I used to convert it...
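For readers who prefer Python to XSLT, the idea of turning an entry into plain text can be sketched as below. This is not a port of teientry2text.xsl and the output layout is not the real Dict format; it is only an assumed, simplified rendering, with a placeholder headword and the 'desk accessory' example from this page.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_ENTRY = """<entry n="nb">
<form><orth>nom</orth></form>
<sense>
<cit type="translation" xml:lang="eng">desk accessory</cit>
<cit type="translation" xml:lang="bre">prest <gen>g</gen><form type="infl">-où</form> burev</cit>
<def>A definition of the headword</def>
</sense>
</entry>"""


def entry_to_text(entry):
    """Render one <entry> as indented plain text (simplified, assumed layout)."""
    lines = [entry.findtext("form/orth") or ""]
    for cit in entry.findall("sense/cit"):
        if cit.get("type") != "translation":
            continue
        # Keep every text fragment, including gender and suffix, in order.
        pieces = [t.strip() for t in cit.itertext() if t.strip()]
        lines.append("  %s: %s" % (cit.get(XML_LANG), " ".join(pieces)))
    definition = entry.findtext("sense/def")
    if definition:
        lines.append("  " + definition.strip())
    return "\n".join(lines)


print(entry_to_text(ET.fromstring(SAMPLE_ENTRY)))
```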
More information
Preder page of the Geriadur ar Stlenneg
The Text Encoding Initiative (TEI) site
Freedict.org, free bilingual dictionaries
TEI on Wikipedia
TEI on Wikipedia [fr]
My main dico page [fr]