Tagging the Preder Geriadur ar Stlenneg to comply with the TEI guidelines

This page describes how I tagged the Preder Geriadur ar Stlenneg.
It is a French-English-Breton dictionary of computing, written by Guy Etienne and published by Preder in 1995. It is not a free resource, so it will probably not be possible to redistribute it easily.

Why tag this dictionary?

There are few resources in Breton covering computing vocabulary, and the first online dictionary people think of does not contain the vocabulary used in free software (OOo, Firefox, Gimp, or even GNOME applications). So I collected some words and translations in order to write a glossary.
I wanted a dictionary that could work as eng(<>fra<)>bre, to be useful for software translation, and as bre(<>eng<)>fra, for software use.

Because it preserves a lot of information (even information that is not displayed), I chose to write this dictionary in the TEI format.

That is probably not the easiest way, and I had never done it before. I needed an exercise...

The Geriadur ar Stlenneg was the kind of resource I wanted to see in the dictionary of my dreams.
Since November 2009, Preder has let people download this dictionary with a Windows applet. (No licence is given...)

I can't use this applet on my Linux OS, but the dictionary database is in abs format and it can be converted to CSV.

Starting from the CSV file, I tagged the dictionary to get a TEI P5-compliant file.

Thanks to Piotr Banski for the advice; the swa-eng dictionary was a helpful example.

How?

There are two types of entry template. The first one contains only example(s), as follows:

<entry n="nb">
   <form><orth>nom</orth></form>
   <sense>
      <cit type="example">
          <quote>Un exemple en français</quote>
          <cit type="translation" xml:lang="eng">An example in english</cit>
          <cit type="translation" xml:lang="bre">Ur skouer e brezhoneg</cit>
          <usg type="dom">domain in french</usg>
      </cit>
      <cit type="example">
          <quote>Un autre exemple en français</quote>
          <cit type="translation" xml:lang="eng">Another example in english</cit>
          <cit type="translation" xml:lang="bre">Ur skouer all e brezhoneg</cit>
          <usg type="dom">domain in french</usg>
      </cit>
   </sense>
</entry>

The second type contains the translations and a definition, and sometimes a note:

<entry n="nb">
   <form><orth>nom</orth></form>
   <sense>
      <cit type="translation" xml:lang="fra">nom en français</cit>
      <cit type="translation" xml:lang="eng">name in english</cit>
      <cit type="translation" xml:lang="bre">anv e brezhoneg<gen>gender</gen><form type="infl">plural suffix or plural form</form></cit>
      <def>A definition of the headword</def>
      <note>Something about the headword, in french</note>
   </sense>
</entry>
Please note that only the Breton translation carries grammatical information. Number (ls. for liester, plural) and part-of-speech (aa. for adjective) information was not a problem.
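As a rough illustration of the step from CSV to this second entry layout, here is a minimal Python sketch. It is not the script I used: the column names (headword, fra, eng, bre, definition, note) and the sample row are assumptions made up for the example, and the real CSV export may be laid out differently. It only uses the standard library, and it leaves out the gender and plural tagging.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def row_to_entry(row, number):
    """Build an <entry> in the second layout from one CSV row (a dict).

    The keys used here are hypothetical column names, not the real export.
    """
    entry = ET.Element("entry", {"n": str(number)})
    form = ET.SubElement(entry, "form")
    ET.SubElement(form, "orth").text = row["headword"]
    sense = ET.SubElement(entry, "sense")
    # One <cit type="translation"> per language that has a value.
    for lang in ("fra", "eng", "bre"):
        if row.get(lang):
            cit = ET.SubElement(sense, "cit", {"type": "translation", XML_LANG: lang})
            cit.text = row[lang]
    # Definition and note are optional.
    if row.get("definition"):
        ET.SubElement(sense, "def").text = row["definition"]
    if row.get("note"):
        ET.SubElement(sense, "note").text = row["note"]
    return entry


# A made-up one-row CSV standing in for the converted database.
sample = io.StringIO(
    "headword,fra,eng,bre,definition,note\n"
    "nom,nom en français,name in english,anv e brezhoneg,A definition of the headword,\n"
)
entries = [row_to_entry(row, i) for i, row in enumerate(csv.DictReader(sample), start=1)]
print(ET.tostring(entries[0], encoding="unicode"))
```

Building the elements with ElementTree instead of pasting strings together means special characters such as & or < in the data are escaped automatically.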
Some Breton translations contain two words. For example, the Breton translation of 'desk accessory' is 'prest burev', which would have been tagged as:

<cit type="translation" xml:lang="bre">prest burev</cit>

The plural form is 'prestoù burev', and it usually appears in dictionaries as:

prest g. -où burev

where g. is the abbreviation for gourel (masculine) and -où is the suffix of the inflected plural form.
Here is how I tagged it:

<cit type="translation" xml:lang="bre">prest <gen>g</gen><form type="infl">-où</form> burev</cit>

I tagged the feminine inflected forms in the same way. Here is the XML code for oberataer g. -ion (ez b. -ed):

<cit type="translation" xml:lang="bre">oberataer
  <gen>g</gen>
  <form type="infl">-ion</form>
  <form type="infl">ez
    <gen>b</gen>
    <form type="infl">-ed</form>
  </form>
</cit>
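The mapping from the printed notation to this markup can be sketched in a few lines of Python. This is only an illustration under assumptions, not the tool I used: the regular expression and the helper names (NOTATION, infl_form, tag_breton) are hypothetical, the suffixes are expected to be written with a plain hyphen, and only the two shapes shown above are handled (word, gender, plural suffix, then either an optional feminine form in parentheses or a trailing second word).

```python
import re
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# word, gender (g. or b.), plural suffix, optional "(feminine b. -suffix)",
# optional remaining words of a multi-word translation.
NOTATION = re.compile(
    r"^(?P<word>\S+)\s+(?P<gen>[gb])\.\s*(?P<pl>-\S+)"
    r"(?:\s+\((?P<fem>\S+)\s+(?P<fgen>[gb])\.\s*(?P<fpl>-\S+)\))?"
    r"(?:\s+(?P<rest>.+))?$"
)


def infl_form(text):
    """Return a <form type="infl"> element holding the given text."""
    form = ET.Element("form", {"type": "infl"})
    form.text = text
    return form


def tag_breton(printed):
    """Turn e.g. 'prest g. -où burev' into a <cit> element tagged as above."""
    m = NOTATION.match(printed.strip())
    if m is None:
        raise ValueError("unsupported notation: " + printed)
    cit = ET.Element("cit", {"type": "translation", XML_LANG: "bre"})
    cit.text = m["word"] + " "
    ET.SubElement(cit, "gen").text = m["gen"]
    plural = infl_form(m["pl"])
    cit.append(plural)
    last = plural
    if m["fem"]:
        # Feminine form: its own gender and plural suffix are nested inside it.
        fem = infl_form(m["fem"])
        ET.SubElement(fem, "gen").text = m["fgen"]
        fem.append(infl_form(m["fpl"]))
        cit.append(fem)
        last = fem
    if m["rest"]:
        # The second word of the translation follows the last inflected form.
        last.tail = " " + m["rest"]
    return cit


print(ET.tostring(tag_breton("prest g. -où burev"), encoding="unicode"))
print(ET.tostring(tag_breton("oberataer g. -ion (ez b. -ed)"), encoding="unicode"))
```

Anything the pattern does not recognise raises an error instead of being tagged wrongly, so unusual entries can be collected and handled by hand.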

There are probably other ways to tag it. Let me know if you think this is not a correct way, or if you have a better idea...

Denis

Results

Here is the P4-tagged file. I still have a problem with my tagging and the P4 DTD, and I have not worked on this version since I discovered the P5 guidelines.

Here is the P5-tagged file. To make it portable, I used the swa-eng DTD (thanks again, Mr. Banski!). It seems to be OK. Here is a P5 pack with the dictionary in Dict format and the teientry2text.xsl I used to convert it...

More information

Preder page of the Geriadur ar Stlenneg
The Text Encoding Initiative (TEI) site
Freedict.org, free bilingual dictionaries
TEI on Wikipedia
TEI on Wikipedia [fr]
My main dictionary page [fr]