Turkish Java Needs Special Brewing @ JDJ
Author: unknown  Time: 2005-08-10 22:59  Source: Java channel  Editor: My FAQ

On a recent trip to Turkey to meet with a customer, I heard a comment that one of the reasons Java is being held back in that country is an almost ubiquitous local bug.

In the Turkish alphabet there are two letters for "i," dotless and dotted. The problem is that the dotted lowercase "i" becomes a dotted letter in uppercase, and the dotless lowercase "i" becomes a dotless letter in uppercase. At first glance this wouldn't appear to be a problem; however, the problem lies in what programmers do with upper- and lowercase in their code.

The two lowercase letters are \u0069 "i" and \u0131 (dotless "ı"), and they are totally unrelated. Their uppercase versions are, respectively, \u0130 (capital letter "İ" with a dot above it) and \u0049 "I". This behavior does not occur in English, where the single lowercase dotted "i" becomes an uppercase dotless "I."
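The mappings described above can be checked directly. The following is a minimal sketch, not code from the original editorial; the class name TurkishCaseMapping and the helper codePoints are hypothetical, and Locale.forLanguageTag("tr") is assumed as the way to obtain a Turkish locale (it requires Java 7 or later).

```java
import java.util.Locale;

public class TurkishCaseMapping {
    // Hypothetical helper: renders each char of s as a "U+XXXX" label.
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Locale.ROOT keeps the formatting itself independent of the default locale.
            sb.append(String.format(Locale.ROOT, "U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // English rules: dotted lowercase i becomes dotless capital I.
        System.out.println(codePoints("i".toUpperCase(Locale.ENGLISH))); // U+0049

        // Turkish rules: dotted stays dotted, dotless stays dotless.
        System.out.println(codePoints("i".toUpperCase(turkish)));        // U+0130
        System.out.println(codePoints("\u0131".toUpperCase(turkish)));   // U+0049
        System.out.println(codePoints("I".toLowerCase(turkish)));        // U+0131
    }
}
```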

With the statement String.toUpperCase(), most Java programmers try to effectively neutralize case. Consider a HashMap with string keys in which you want to look up a key. If you want to ignore case, you'll probably uppercase everything going into the map, its entries, and the string you're doing the lookup with. This works fine for English, but not for Turkish, where dotted stays dotted and dotless stays dotless. I was shown an example of this bug in a popular HTML editor where a developer had done this with the set of HTML tags, so that <title>, <TITLE>, and all variants in between would be indistinguishable to the program. The code probably looked like:

if (tagEnteredByUser.toUpperCase().equals("TITLE")) {
    doTitleTagStuff();
}

In Turkish, when "title" is entered, the resulting uppercase string has a dotted uppercase "İ" (not the English dotless "I"), so the comparison fails and the program doesn't work as desired. This is just one example of where the bug has occurred. Another popular Java application failed with a similar bug that was traced back to the following code:

if (System.getProperty("os.name").toUpperCase().equals("WINDOWS")) {
    doStuffSpecificForWindows();
}
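Both failures can be reproduced without changing any system settings by passing the locale explicitly. This is only an illustrative sketch; the class TitleTagBug and the method isTitleTag are hypothetical stand-ins for the editor's code, which the article does not show in full.

```java
import java.util.Locale;

public class TitleTagBug {
    // Same comparison as in the article, but with the locale made a parameter
    // so that the effect of Turkish casing rules can be seen in isolation.
    public static boolean isTitleTag(String tag, Locale locale) {
        return tag.toUpperCase(locale).equals("TITLE");
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        System.out.println(isTitleTag("title", Locale.ENGLISH)); // true
        System.out.println(isTitleTag("title", turkish));        // false: contains U+0130
        System.out.println(isTitleTag("TITLE", turkish));        // true: already uppercase
    }
}
```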

The current locale is set from the user's country, and the implementation of the String methods uses this default locale:

public String toUpperCase() {
    return toUpperCase(java.util.Locale.getDefault());
}

Given that this works for English (where \u0069 uppercases to \u0049 correctly), why doesn't it hold true for Turkish? The developer did find special code that deliberately maps dotted to dotted and dotless to dotless, complete with a comment ironically stating:

// special code for turkey

The solution is to specify an explicit English locale when uppercasing for programmatic purposes, so the first line of buggy code would become:

if (tagEnteredByUser.toUpperCase(java.util.Locale.ENGLISH).equals("TITLE")) {
    doTitleTagStuff();
}
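The same fix applies to the HashMap scenario mentioned earlier: normalize keys with an explicit locale on both insertion and lookup. The sketch below is one possible shape for this; the class TagRegistry and its methods are hypothetical, and the handler is represented by a plain string for brevity.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class TagRegistry {
    private final Map<String, String> handlers = new HashMap<>();

    // Single normalization point: an explicit locale makes the key
    // independent of whatever the default locale happens to be.
    private static String key(String tag) {
        return tag.toUpperCase(Locale.ENGLISH);
    }

    public void register(String tag, String handlerName) {
        handlers.put(key(tag), handlerName);
    }

    public String lookup(String tag) {
        return handlers.get(key(tag));
    }

    public static void main(String[] args) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(Locale.forLanguageTag("tr")); // simulate a Turkish user
        try {
            TagRegistry registry = new TagRegistry();
            registry.register("TITLE", "doTitleTagStuff");
            System.out.println(registry.lookup("title")); // doTitleTagStuff
        } finally {
            Locale.setDefault(previous);
        }
    }
}
```

Keeping the normalization in one private method means a later change of policy only has to be made in one place.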

Even if this were diligently done by everyone developing your code, you'll still encounter a problem when using something written by someone else whose source you don't have access to. For this the current workaround by Tamar Sezgin and others is to switch the locale of the program before the buggy code, make the call, and then switch back.

Locale.setDefault(Locale.ENGLISH);
// Use incorrectly written code
Locale.setDefault(new Locale("tr", "", ""));
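A slightly more defensive variant of this workaround remembers the previous default instead of hard-coding Turkish, and restores it in a finally block so that an exception in the third-party code cannot leave the wrong locale in place. This is a sketch under those assumptions; LocaleGuard, withEnglishDefault, and legacyIsTitleTag are hypothetical names, with legacyIsTitleTag standing in for code you cannot change.

```java
import java.util.Locale;
import java.util.function.Supplier;

public class LocaleGuard {
    // Runs the task with the default locale temporarily set to English,
    // then restores whatever default was in effect before the call.
    public static synchronized <T> T withEnglishDefault(Supplier<T> task) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(Locale.ENGLISH);
        try {
            return task.get();
        } finally {
            Locale.setDefault(previous);
        }
    }

    // Stand-in for incorrectly written code whose source is unavailable.
    public static boolean legacyIsTitleTag(String tag) {
        return tag.toUpperCase().equals("TITLE");
    }

    public static void main(String[] args) {
        Locale original = Locale.getDefault();
        Locale.setDefault(Locale.forLanguageTag("tr")); // simulate a Turkish user
        try {
            boolean direct = legacyIsTitleTag("title");
            boolean guarded = withEnglishDefault(() -> legacyIsTitleTag("title"));
            System.out.println(direct);                            // false
            System.out.println(guarded);                           // true
            System.out.println(Locale.getDefault().getLanguage()); // tr
        } finally {
            Locale.setDefault(original);
        }
    }
}
```

Note that Locale.setDefault affects the whole JVM, so other threads running at the same moment will also see the temporary English default; the synchronized keyword here only serializes callers of this helper.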

The problem with this is that it fails to follow the principle of least astonishment. It's only there because Java supports locale-sensitive case conversion. However, this isn't offered by alternatives such as VB, C++, or Delphi, where case conversion follows English rules and if you want to do dotless "correctly" you have to implement it yourself. The only case where you would actually want to do it "correctly" would be for a user-visible string accepting a Turkish name (such as a surname), and the developers who want to do this would be those who were more likely to be aware of locale issues. The exception would then be:

Locale turkishLocale = new Locale("tr", "", "");
String s1 = userVisibleString.toUpperCase(turkishLocale);
String s2 = anotherUserVisibleString.toUpperCase(turkishLocale);
if (s1.equals(s2)) {
    doSomethingFunWithTwoEqualsStrings();
}

However, even better would be:

if (s1.equalsIgnoreCase(s2)) {
    doSomethingFunWithTwoEqualsStrings();
}

so the only real case of wanting to uppercase a user-visible string to compare against another user-visible string is left to developers of database indexes and doesn't need to be tackled at all by most Java programmers.
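Part of the appeal of equalsIgnoreCase is that it takes no locale and does not consult the default one, so a comparison of ASCII identifiers behaves the same for every user. A small sketch to illustrate, with the hypothetical names IgnoreCaseCompare and sameIgnoringCase:

```java
import java.util.Locale;

public class IgnoreCaseCompare {
    // Thin wrapper so the comparison can be exercised from main.
    public static boolean sameIgnoringCase(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(Locale.forLanguageTag("tr")); // simulate a Turkish user
        try {
            // The locale-sensitive route fails under a Turkish default...
            System.out.println("title".toUpperCase().equals("TITLE")); // false
            // ...while equalsIgnoreCase is unaffected by it.
            System.out.println(sameIgnoringCase("title", "TITLE"));    // true
        } finally {
            Locale.setDefault(previous);
        }
    }
}
```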

There is a PMR 53119 open to try to get Java changed so the default logic is to assume the string is not user visible. However, because this would be a breaking change to the current behavior, it can't be done. In the meantime, I would urge all developers who ever find themselves converting a string into upper- or lowercase to think about whether these are user-visible strings. If not, make sure you explicitly use the English locale, otherwise you're going to serve up Java that tastes great everywhere except Turkey.

.  .  .

I would like to thank Tamar Sezgin of IBM Turkey for explaining this problem to me and helping with this editorial.

 