Unicode and Text
...or if you think that XML is somehow easy to use, you're wrong!
In fact, look at these articles by Michael Kaplan:
- "Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)" or "How Notepad did something correctly that the entire RSS 2.0-cheerleader crowd still doesn't get in their head"
- "The jury will give this string no weight" or "You may not be able to compare two strings in an XML document, even by using the right APIs, as soon as you leave the US"

Don't you dare believe that

    while (*c != 0) { if (*c != *d) strEqual=FALSE; c++; d++; }

is valid code in any way!
Michael really gets "it". Most of the RSS-supporting crowd doesn't. Text is HARD! (HARD! HARD!). Writing a specification that takes these things into account is even harder.
Now all of that wouldn't be a problem if we could trust our educational institutions to actually teach their engineering students about these things. But we can't. Instead, hundreds of computer science undergrads at the University of Regensburg today believe that you can't use German umlauts in C.


