The Sinhala Collation Sequence and its Representation in UNICODE

Title: The Sinhala Collation Sequence and its Representation in UNICODE
Publication Type: Journal Article
Year of Publication: 2006
Authors: Weerasinghe, AR; Herath, DL; Gamage, K
Journal: Localisation Focus - The International Journal of Localisation
Keywords: Collation, Internationalisation, Localisation, Sinhala, UNICODE, UNICODE Collation Algorithm

The alphabet of a language is perhaps the first thing we learn as its users, and the alphabet of our mother tongue is the first alphabet we ever learn. Yet a closer look reveals that much about such an alphabet has never been explicitly specified anywhere. The Sinhala alphabetical order is a prime example: we use it and recite it, yet would be hard-pressed to define it explicitly.

Sinhala is spoken by approximately 20 million people in all parts of Sri Lanka except some districts in the north, east and centre. An additional 30,000 people (1993 figure) speak it in Canada, the Maldives, Singapore, Thailand and the United Arab Emirates. Sinhala is classified as an Indo-European language and is used as an official language.

The UNICODE Collation Algorithm (UCA) is an attempt to make explicit the collation sequence of any language expressed in the UNICODE (or any other) coding system. In order to express the Sinhala collation sequence (alphabetical order) using UCA, the authors undertook the task of identifying the unresolved issues that stand in the way of an unambiguous definition of that order. This paper first describes the issues identified in this study, suggests alternative solutions for each and recommends one of them. Finally, it sets out the recommended collation sequence for Sinhala in the form of a UNICODE collation specification. The outcome of this process is a unique and unambiguous expression of the Sinhala collation sequence, which can be tested using existing tools and software environments.
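
To illustrate what "making a collation sequence explicit" means, the following is a minimal Python sketch of sorting by an explicitly specified alphabet order instead of by raw code point. It is not the UCA itself and does not reproduce the Sinhala sequence recommended in the paper: the helper name `make_sort_key` and the three-letter toy order `"bac"` are assumptions chosen only for illustration, and the sketch models a single level of weights, whereas the UCA uses multi-level weights.

```python
def make_sort_key(order):
    """Build a sort key from an explicit alphabet order (hypothetical helper)."""
    # Primary weight of each character is its position in the stated order.
    weights = {ch: i for i, ch in enumerate(order)}
    fallback = len(weights)

    def sort_key(word):
        # Characters missing from the order sort after all listed ones,
        # and among themselves by code point.
        return [
            (weights[ch], 0) if ch in weights else (fallback, ord(ch))
            for ch in word
        ]

    return sort_key


# Toy alphabet order, NOT the Sinhala sequence: 'b' < 'a' < 'c'.
toy_order = "bac"
words = ["cab", "abc", "bca", "ab"]

code_point_sorted = sorted(words)
explicit_sorted = sorted(words, key=make_sort_key(toy_order))

print(code_point_sorted)  # ['ab', 'abc', 'bca', 'cab']
print(explicit_sorted)    # ['bca', 'ab', 'abc', 'cab']
```

The two results differ because the default sort follows code point order, while the second follows the stated sequence. An explicit specification removes the dependence on how the characters happen to be encoded.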

