
Thesis online: Méthodes pour informatiser des langues et des groupes de langues « peu dotées »


Contact name: Vincent Berment, Equipe GETA, laboratoire CLIPS, IMAG, 385 rue de la Bibliothèque, BP 53, 38041 GRENOBLE CEDEX 9
Area of Research: Asia
Date: 01-07-2004 to 31-08-2005

Dear all,

My thesis is now available on the two sites listed below.

http://tel.ccsd.cnrs.fr/documents/archives0/00/00/63/13/index_fr.html
http://bibliotheque.imag.fr/publications/theses/2004/Berment.Vincent/notice-francais.html


Its title and abstract are given again below.

Best regards,

Vincent Berment

PS: The passages written in non-Latin scripts all appear to have survived the conversion to PDF, including the one in vertical writing (traditional Chinese). If you have any trouble displaying them, please let me know.

-------------------------------------------------------

Thesis title:

Méthodes pour informatiser des langues et des groupes de langues « peu dotées » (Methods for computerizing "poorly resourced" languages and groups of languages)


Abstract

In 2004, fewer than 1% of the world's 6809 languages benefit from a high level of computerization, covering a broad range of services from text processing to machine translation. This thesis focuses on the other languages, the pi-languages, and proposes solutions to remedy their digital underdevelopment.

The first part, intended to show the complexity of the problem, presents the diversity of languages, the technologies used, and the approaches of the various actors: linguistic populations, software publishers, the United Nations, States and others. It proposes a technique for measuring the degree of computerization of a language, the sigma-index, together with several optimization methods.
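The abstract does not say how the sigma-index is computed. As a purely illustrative sketch, one could assume it is a weighted mean of per-service scores on the 0-20 scale used later in the abstract. The service names, weights and scores below are hypothetical and are not taken from the thesis.

```python
from typing import NamedTuple


class ServiceScore(NamedTuple):
    name: str      # service or resource being rated (hypothetical label)
    weight: float  # assumed importance of the service for the population
    score: float   # assumed rating of the service on a 0-20 scale


def weighted_index(services: list[ServiceScore]) -> float:
    """Return the weighted mean of the scores, on the same 0-20 scale.

    Assumption: the index is a weighted average. The abstract only says
    that the index measures the degree of computerization of a language.
    """
    total_weight = sum(s.weight for s in services)
    if total_weight <= 0:
        raise ValueError("at least one service must have a positive weight")
    return sum(s.weight * s.score for s in services) / total_weight


# Illustrative values only; they do not describe any real language.
example = [
    ServiceScore("text input", 10, 14),
    ServiceScore("sorting", 5, 8),
    ServiceScore("machine translation", 5, 0),
]
print(round(weighted_index(example), 2))
```

Under this assumption, raising the score of a heavily weighted service moves the index more than the same gain on a lightly weighted one, which is one way to read the "optimization methods" mentioned above.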

The second part deals with the computerization of the Laotian language and presents the concrete results obtained by applying the methods described in the first part. This work helped raise the sigma-index of Laotian by approximately 4 points; the index is currently evaluated at 8.7/20.

The third part shows that an approach by groups of languages can reduce computerization costs through a modular architecture that combines existing general-purpose software with language-specific complements. For the most language-dependent parts, complementary generic lingware tools allow populations to computerize their languages themselves. We validated this method by applying it to the syllabic segmentation of Southeast Asian languages with unsegmented writing systems, such as Burmese, Khmer, Laotian and Siamese (Thai).

Keywords: Language Computerization, Poorly Computerized Languages, Word Processing, Virtual Keyboard, Sort, Phonetic Transcription, Electronic Dictionary, South-East Asian Languages, Unsegmented Writing Systems, Segmentation, Natural Language Processing, Unicode






