Sumero-Akkadian Recognized Here
Unicode 5.0 adds scripts from ancient languages
By P. J. Connolly

October 15, 2006 - Fifteen years after its first publication, the Unicode standard has reached a milestone with Unicode 5.0.0, the latest version of the character encoding scheme. The new version includes 1,369 new character assignments and adds five scripts: Balinese, N’Ko and Phags-Pa, along with the ancient Phoenician and Sumero-Akkadian Cuneiform.

The cuneiform characters represent the effort of a multidisciplinary team based at Johns Hopkins University, known as the Digital Hammurabi project. Much of the project’s effort, and of its National Science Foundation grant, went to hardware that addresses the problems of scanning three-dimensional clay tablets and displaying them in a format that lets users magnify, pan, rotate and tilt the images, and generate three-dimensional models as well as two-dimensional drawings that represent the precious originals.

But software concerns also played a part: The first cuneiform e-mail was sent in 2001, and in 2004, both the Unicode Consortium and the ISO 10646 WG2 working group approved an encoding standard, which incorporated characters from Akkadian, Eblaite, Elamite, Hittite, Hurrian and Sumerian.

Unicode is important in the internationalization and localization of applications. Ideally, translatable strings such as dialog box text and menu items are separated into resource files, while variable formatting, searching, sorting and other processing are designed to be language-independent. The internationalized application is then packaged with the appropriate resource files to produce localized versions, which cost less than versions built by translating the entire application into other languages, one at a time.
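The separation of translatable strings from program logic can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's localization API: the `RESOURCES` table, the key names and the `translate` helper are all hypothetical, and a real application would load its catalogs from per-locale resource files rather than from an in-memory dictionary.

```python
# Hypothetical in-memory stand-in for per-locale resource files.
RESOURCES = {
    "en": {
        "menu.open": "Open",
        "menu.quit": "Quit",
        "dialog.saved": "Saved {count} files.",
    },
    "fr": {
        "menu.open": "Ouvrir",
        "menu.quit": "Quitter",
    },
}

DEFAULT_LOCALE = "en"


def translate(key, locale, **params):
    """Return the string for `key` in `locale`, falling back to the default."""
    catalog = RESOURCES.get(locale, {})
    template = catalog.get(key)
    if template is None:
        # Missing locale or missing key: use the default-language string.
        template = RESOURCES[DEFAULT_LOCALE][key]
    return template.format(**params)


print(translate("menu.open", "fr"))
print(translate("dialog.saved", "fr", count=3))
```

The application code only ever refers to keys such as `menu.open`, so adding a language means adding a catalog, not touching the logic.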

Mark Davis, president of the Unicode Consortium, explained, “Companies tended to toss their products across the wall to some subsidiary in Japan or France or someplace, and that group would have to make sense of what all this code was.” He observed that “you’d end up with something that was difficult to maintain because you had multiple versions of code floating around,” which raised expensive barriers to doing business in foreign markets. Although the market for software in Phoenician or Sumerian is virtually nil, the Unicode standard includes archaic scripts in support of academic and antiquarian research.

The bulk of the new characters are from the added scripts; the cuneiform entries alone account for 982 additions. A number of minor additions to Western and Asian character and symbol sets make up the rest of the changes to the character database.
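For developers, one practical consequence of the cuneiform additions is that the signs are assigned code points above U+FFFF, so each one takes four bytes in UTF-8 and a surrogate pair in UTF-16. The sketch below assumes a Python 3 interpreter, whose bundled `unicodedata` tables include these characters; the helper name `utf16_code_units` is made up for the illustration.

```python
import unicodedata

# U+12000 is the first code point of the Cuneiform block.
sign = "\U00012000"


def utf16_code_units(ch):
    """Split a string into its big-endian UTF-16 code units."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


name = unicodedata.name(sign)
utf8_bytes = sign.encode("utf-8")
units = utf16_code_units(sign)

print(name)                      # character name from the Unicode database
print(utf8_bytes.hex())          # four UTF-8 bytes
print([hex(u) for u in units])   # high and low surrogate
```

Code that assumes one 16-bit unit per character will miscount or split such signs, which is worth checking before handling text in the newly added scripts.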

The files that constitute the Unicode Character Database are already available online at the Unicode Consortium’s Web site (www.unicode.org/versions/Unicode5.0.0). A hard copy edition, titled “The Unicode Standard, Version 5.0” (ISBN 0-321-48091-0), will be published by Addison-Wesley in the fourth quarter of this year; the text will be available online in the early part of 2007.

Changes in the book’s physical format and paper stock will result in a lighter, easier-to-use publication. Nevertheless, there’s actually more content than ever: The book will provide the full text of the Unicode standard, including the complete Unicode Standard Annexes, for the first time.

Unicode 5.0 tightens the conformance requirements for bidirectional implementations, which handle right-to-left scripts such as Arabic and Hebrew. A number of behavioral specifications and property values for character, word, line and sentence separation were adjusted for accuracy; case-folding stability is improved over Unicode 4.1, and support for pattern syntax characters and stable identifiers is now included.
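Two of the properties mentioned above, case folding and bidirectional class, are easy to inspect from Python's standard library. The sketch reflects whatever Unicode version ships with the interpreter, not Unicode 5.0 specifically, and the helper names `same_ignoring_case` and `bidi_classes` are illustrative only.

```python
import unicodedata


def same_ignoring_case(a, b):
    """Compare two strings using Unicode case folding instead of lower()."""
    return a.casefold() == b.casefold()


def bidi_classes(text):
    """Return the bidirectional class of each character in `text`."""
    return [unicodedata.bidirectional(ch) for ch in text]


# German sharp s folds to "ss", so these compare equal.
print(same_ignoring_case("Straße", "STRASSE"))

# Latin A, Hebrew alef, Arabic alef: left-to-right, right-to-left,
# and right-to-left Arabic letter.
print(bidi_classes("A\u05d0\u0627"))
```

A bidirectional layout engine uses these per-character classes as its input when it decides the visual order of mixed-direction text.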





Copyright © 1999-2008 BZ Media LLC, all rights reserved.
Phone: +1 (631) 421-4158 • E-mail: info@bzmedia.com