International Language Support in XML
Because XML is essentially a text document format with features for validation and for representing structured, hierarchical metadata, nothing restricts its use to particular geographies. The W3C and other standards organizations have therefore gone to great pains to make sure XML can support the various internationalization and localization needs that have hampered the adoption of other document formats. In particular, XML can support a number of languages, data formats, character sets, and localization peculiarities, which allows the format to cross not only geographic boundaries but logical boundaries as well.
Developed prior to the emergence of XML, the Unicode standard is a universal character set whose goal is to provide an unambiguous encoding of plain text written in any of the world's languages. The latest version of Unicode as of this writing, version 3.0, covers almost all the languages and dialects used in the world, including languages that are no longer actively spoken. Unicode 3.0 contains all the characters needed by these languages as well as additional characters used for interoperability with older character encodings and for control functions.
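To make the idea of an unambiguous universal character set concrete, the following minimal sketch prints the code point, official name, and UTF-8 bytes of a few characters from different scripts. The sample characters are chosen purely for illustration, and Python's unicodedata module reflects a newer Unicode version than 3.0, so treat this as a sketch of the principle and not of any particular release.

```python
import unicodedata

# Illustrative sample only: one character each from the Latin, Greek,
# Cyrillic, and CJK ranges.
samples = ["A", "Ω", "я", "日"]

for ch in samples:
    # Every character has exactly one code point, whatever its script.
    code_point = f"U+{ord(ch):04X}"
    # UTF-8 is one of several byte encodings of the same code point.
    utf8_hex = ch.encode("utf-8").hex()
    print(code_point, unicodedata.name(ch), utf8_hex)

code_points = [ord(ch) for ch in samples]
```

The code point identifies the character independently of the bytes used to store it, which is why a single XML document can mix text from all four scripts.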
Because XML is a text-based language, it depends on characters and on the representation of those characters. It therefore relies on Unicode to encode its elements, attributes, and data content. As a result, XML supports most major languages and character sets as part of its native specification, which enables the encoding of almost any text document. Note, however, that Unicode adoption in XML is not entirely consistent. For example, a wider range of Unicode characters is permitted in general XML content than in element and attribute names. A movement is underway to correct this shortfall and to allow names to draw on the full range of characters available in Unicode 3.0 and future versions.
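The difference between what is allowed in content and what is allowed in names can be sketched with a short experiment. The example below assumes Python's standard xml.etree.ElementTree parser, and the element names and the choice of the copyright sign are hypothetical, picked only because that symbol is valid character data but is not a name character in XML 1.0.

```python
import xml.etree.ElementTree as ET

# Character data may mix scripts and symbols freely.
good = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<greeting lang="ja">こんにちは © 2001</greeting>'
).encode("utf-8")
root = ET.fromstring(good)
print(root.text)

# The same symbol used in an element name is outside the set of
# characters XML 1.0 permits in names, so the parser should reject it.
bad = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<©note>text</©note>'
).encode("utf-8")
try:
    ET.fromstring(bad)
    name_accepted = True
except ET.ParseError as err:
    name_accepted = False
    print("rejected:", err)
```

The first document parses because the restriction applies only to names. Other parsers may report the error with a different message, but a conforming XML 1.0 parser should refuse the second document as not well-formed.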