July 28th, 2006

Typo3, PHP, MySQL connections and Unicode

Filed under: Attitude,Cutting the crap,PHP,Technology — jm @ 15:30

I have said it before and now I have to say it again.

Over the last three months I've spent a ridiculous amount of time on a Typo3 project for the Chair for Business Engineering at the University of Regensburg. We built a full-featured DMS based on Subversion, Typo3's DAM and WebSVN. We got hit by PHP's miserable Unicode support so hard that it isn't even funny anymore.

I always assumed that using Unicode strings with PHP could derail a project badly as soon as you run into one of the thousands of possible problem scenarios, but now that it has happened to me, I can prove it. What happened was that while Subversion, interestingly, fully supports UTF-8-encoded log messages and URLs, it also gives an error and exits if it detects encoding errors. So on multiple occasions, when we built a log message in a PHP script by concatenating two strings like $str = $strFromWebBrowser . $strFromPHP; we'd end up with an error message, because $strFromWebBrowser was UTF-8-encoded by the browser, but $strFromPHP was not. We were able to fix this by enforcing internal UTF-8 encoding using PHP's mbstring extension and limiting ourselves to the few string functions that mbstring does overload.
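A minimal sketch of that failure and one way to guard against it. It is not the project's actual code: toUtf8() is a hypothetical helper, the sample strings are invented, and it assumes that anything which is not already valid UTF-8 is ISO-8859-1. It needs the mbstring extension and a PHP version with scalar type hints.

```php
<?php
// Hypothetical helper: returns $s as UTF-8.
// Assumption: a string that is not valid UTF-8 is ISO-8859-1.
function toUtf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

// Invented sample data standing in for the two sources.
$strFromWebBrowser = "Gr\xC3\xBC\xC3\x9Fe"; // "Grüße", UTF-8 bytes
$strFromPHP        = " f\xFCr alle";        // " für alle", ISO-8859-1 bytes

// Plain concatenation mixes two encodings in one byte string.
$naive = $strFromWebBrowser . $strFromPHP;

// Normalizing each part first yields a consistent UTF-8 string.
$safe = toUtf8($strFromWebBrowser) . toUtf8($strFromPHP);

var_dump(mb_check_encoding($naive, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($safe, 'UTF-8'));  // bool(true)
```

The check is only a heuristic: a Latin-1 string whose bytes happen to form valid UTF-8 would pass through unconverted, so knowing the encoding of each source is still the more reliable approach.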

The second and really hard to debug problem that hit us was that Typo3 4.0 (Typo3's total crappiness is a topic for another post) still opens database connections using PHP's old mysql extension. This means that if you want to work with MySQL 4.1 or later, you don't get support for all the fancy new binary protocol features, like the client character set support in the connection packet header. So now you have a database that's fully UTF-8-encoded in MySQL 5.0, but every connection that Typo3 opens in its central database class class.t3lib_db.php has character_set_client set to "latin1". This means that if PHP sends UTF-8 data across this database connection, the string is double-encoded: the server treats every incoming byte as a latin1 character and converts it to UTF-8 again before storing it. This also happens when Typo3's "force_utf8" option is set. It took a long night with hexdump and an ISO 8859-1 to UTF-8 table to figure this out, and we fixed it literally at the last minute.
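The double encoding can be reproduced without a database. The sketch below only simulates the conversion with mbstring and makes no MySQL calls; it assumes that latin1 behaves like ISO-8859-1 for the bytes involved, and the sample character is chosen purely for illustration.

```php
<?php
// "ä" (U+00E4) encoded as UTF-8 is the two bytes C3 A4.
$utf8 = "\xC3\xA4";

// Simulate a connection with character_set_client = latin1:
// every incoming byte is read as one Latin-1 character and re-encoded as UTF-8.
$stored = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // c3a4
echo bin2hex($stored), "\n"; // c383c2a4

// The damage is reversible as long as no bytes were lost along the way.
$repaired = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $utf8); // bool(true)
```

Four bytes where two were expected, starting with c3 83, is the kind of pattern that shows up in a hexdump of affected data.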

You might think that enforcing a client character set using MySQL's init_connect configuration option could solve the problem, but unfortunately MySQL seems to ignore this setting in the current release. So the only option was to patch Typo3's database code, inserting these two lines:

@mysql_query('SET NAMES utf8', $this->link);
@mysql_query('SET CHARACTER SET utf8', $this->link);

I still think that PHP's new mysqli extension is crap, too, because the parameter binding is as bad as Java's (no named parameter support) and the result set handling just sucks, but at least it prevents most SQL injection attacks by supporting prepared statements. That Typo3 is still tied to MySQL as a database is just sad, but I can't decide what they should fix first: the ugly admin interface or their database code. I lean toward the latter, because you can't install Typo3's DAM extensions if you install their database abstraction layer (which nobody uses anyway). That is an undocumented dependency, by the way, and it hit us during the first few weeks of development.

So there you have it: please please please don't use PHP for a project if you have any other choice.
