
Last update: 02/21/2008

I’m sorry, but PHP sucks.

Tim Bray says: “based on my limited experience [...] all the PHP code I’ve seen in that experience has been messy, unmaintainable crap. Spaghetti SQL wrapped in spaghetti PHP wrapped in spaghetti HTML, replicated in slightly-varying form in dozens of places”.

Right off the bat: there are things to like about PHP. I will list those first, to get them out of the way. I think this is important so that I don’t get mail that basically says: “yeah, but X”.

  1. “PHP makes it easy to get things done for a beginner”

    This statement is absolutely, positively true. I taught a course on PHP, so I know. While any Programming 101 course that starts with Java will spend a lot of time explaining things that beginners have a hard time understanding, like OOP principles, PHP lets you connect to a database in no time. MySQL support? It’s filed right under “mysql”. At first, a PHP page corresponds to exactly one HTML page. This concept makes you feel powerful, which most other programming languages, at least at first, don’t.

    At the start, PHP gets out of your way more than anything else does. That is exactly what people who already know OOP like about RoR.

  2. “PHP is easy to install”

    In fact, there are several packages, like XAMPP, that get you started in no time.

Unfortunately, that’s where the fun ends. Here’s what makes PHP a toy and not a tool. I can’t stress that enough. If you’re planning to develop a software system and you have a choice of which technology to implement it with, there is almost NO reason for implementing it with PHP, but a lot of reasons against it. I already used PHP when it was called PHP/FI. I understand that today’s PHP has organically grown out of that CGI binary. But unfortunately PHP is now being marketed as an “enterprise-class” technology.

This is why it sucks.

  1. There are 3 incompatible versions of PHP.

    There have been two forward-incompatible changes in the codebase that make a certain amount of code unportable, both backward and forward. Backward portability (running new code on an old runtime) is usually not supported, since new versions of the runtime introduce new features. Forward portability, that is, backwards compatibility of the runtime with old code, is a must, especially for software systems that required big investments to develop.

    In detail, the changes were:

    • from PHP 4.3 to PHP 4.4 the behavior of references has been changed to avoid a memory leak.

      This means that the perfectly reasonable code:

      function &myFactoryMethod() {
          return &new ProtectedClass();
      }

      does not work anymore. Instead, to do the same thing you now have to write:

      function &myFactoryMethod() {
          return $var = &new ProtectedClass();
      }

      While small scripts tend not to use a factory pattern, or any pattern for that matter, every properly designed medium-sized system will.

    • from PHP 4.x to PHP 5.x many core concepts of the PHP language have been changed to better support OOP.

      Not only were new features introduced, which would be expected, but a lot of existing behavior changed as well. Objects are now passed by handle instead of being copied, exceptions were introduced, new keywords have been reserved, and attribute and method visibility has been added. While the Zend Corporation assures that “most scripts will work flawlessly with the new object model”, the fact is that only a few systems have been ported to PHP 5.x as of today (02/12/2006). Many libraries remain unavailable. Still, PHP 5 seems to be a major push from the developers to get their act together. Unfortunately, everybody still has PHP 4 installed, as most bigger software systems don’t work with PHP 5, even though Zend asserts otherwise.

      However, one of the most pressing problems, the incredibly bad character-set support, has not been addressed.

      A smaller problem that should not go unnoticed is that PHP’s generally good idea of making things easy by providing direct access to functions like the mysql_*-family also broke compatibility when MySQL 4.1+ introduced new binary protocol features. Any script that doesn’t use the mysqli_* functions or a proper database abstraction layer for PHP (like ADODB or PEAR-DB) is broken and needs to be updated for the new database version.

      I didn’t realize how badly broken these scripts are until I recently finished a project that ran into exactly this problem.

      It has been pointed out to me that PHP 6 will finally introduce Unicode support, which will be another incompatible change. So there will be nothing but mutually incompatible versions of PHP. I’m guessing that we’ll see a repeat of the <sarcasm>landslide that is the adoption of PHP 5</sarcasm>.
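    For comparison, here is a minimal sketch of the factory from the first bullet written for PHP 5 and later, where objects are passed around as handles, so no reference syntax is needed. It also uses two of the PHP 5 additions listed above, visibility modifiers and exceptions. The constructor argument and its validation are hypothetical additions for illustration; the real ProtectedClass is never shown in this article.

```php
<?php
// Hypothetical stand-in for ProtectedClass from the PHP 4 examples.
class ProtectedClass
{
    private $name;                       // PHP 5 visibility: hidden from outside code

    public function __construct($name)
    {
        if (!is_string($name) || $name === '') {
            // PHP 5 exceptions instead of error-code return values.
            throw new InvalidArgumentException('name must be a non-empty string');
        }
        $this->name = $name;
    }

    public function getName()
    {
        return $this->name;
    }
}

// PHP 5+: a plain return is enough; the caller receives a handle to the object.
function myFactoryMethod($name)
{
    return new ProtectedClass($name);
}

$a = myFactoryMethod('first');
$b = $a;                                 // copies the handle, not the object
```

    Because $a and $b hold the same handle, they refer to one object; no &new and no helper variable are required.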

  2. PHP’s character-set support is so bad, I want to scratch my eyes out

    Parsing text, like HTTP form input, with PHP is incredibly hard to get right. Even if you’ve compiled the mbstring extension and configured PHP to override its string functions (most likely breaking the often-used mail()-function on the way), the character-set support is incredibly bad. I’ll first give you some pointers before I start the head-shaking:

    • PHP can supposedly parse XML. It really can’t. It can pattern-match properly structured text files, in certain character-sets that are only used in a small part of the world.

    • Some security-critical functions in PHP break if used with certain character sets. A list of supported character sets is here, but unfortunately this doesn’t include for example CP850.
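    A minimal sketch of defensive input handling, assuming the application uses UTF-8 throughout: reject byte sequences that are not valid UTF-8 before they reach any other string function, and count characters instead of bytes. It relies only on the PCRE /u modifier, so it does not need the mbstring extension; both function names are hypothetical.

```php
<?php
// Hypothetical helper: true if $input is well-formed UTF-8.
// With the /u modifier, preg_match() returns false on malformed UTF-8.
function is_valid_utf8($input)
{
    return is_string($input) && preg_match('//u', $input) === 1;
}

// Hypothetical helper: number of characters (code points), not bytes.
// Returns false for malformed UTF-8, just like preg_match_all() does.
function utf8_length($input)
{
    return preg_match_all('/./us', $input, $matches);
}

$name = "M\xC3\xBCller";    // "Müller" encoded as UTF-8: 6 characters, 7 bytes
$latin1 = "M\xFCller";      // the same name encoded as ISO-8859-1
```

    strlen($name) reports 7 because it counts bytes, while utf8_length($name) reports 6; the ISO-8859-1 variant fails validation and can be rejected up front.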

    Abetting this problem is that many developers, especially PHP beginners, have absolutely no idea what a character-set really is, so it’s hard to explain the problem. In fact, most of the people teaching courses at the University of Regensburg have no idea what a character-set is. The quote I use most often comes from the C 101 course that I had to take in 2004: “German Umlaute don’t work in C, so don’t use them”.

    If you are one of those people, read at least this article by Joel Spolsky, then go to this post by Mark Pilgrim, and don’t stop researching the net until you know what it means.

    The problem is compounded by the fact that the database most PHP applications are built on (MySQL) had notoriously bad character-set support up to version 5.0 (the support was phased in with the 4.1 release, but there it only broke lots of Java applications because the default collation was Swedish).

    While MySQL’s character-set support is not exceptionally bad anymore, it’s still bad!

    Interestingly, people with mostly western character-sets never stumble over this problem until their web application gets so popular that someone from Russia tries to leave a comment in Russian (and their forum software suddenly falls apart in that thread). However, people like the makers of ezPublish, who come from Norway, recognized PHP’s problems right from the start.
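    To illustrate how forum software can fall apart on such a comment, here is a sketch assuming UTF-8 throughout. Byte-based substr() can cut a Cyrillic letter in half and leave malformed UTF-8 in the page, while a /u regular expression truncates by characters. The helper name is hypothetical; mb_substr() from the mbstring extension is the usual alternative where that extension is installed.

```php
<?php
// Hypothetical helper: keep at most $max characters of a UTF-8 string.
// The /u modifier makes "." match one code point instead of one byte.
function utf8_truncate($text, $max)
{
    if (preg_match('/^.{0,' . (int)$max . '}/us', $text, $m) !== 1) {
        return '';                      // malformed UTF-8 input: refuse it
    }
    return $m[0];
}

$comment = 'Привет, мир';              // "Hello, world" in Russian, 2 bytes per letter

$bytes = substr($comment, 0, 5);       // 5 bytes: two letters plus half of a third
$chars = utf8_truncate($comment, 5);   // 5 characters: "Приве"
```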

  3. (string)"false" == (int)0 is true

    Now that’s just nasty. Please note the explicit casts. Even in a dynamically-typed language this shouldn’t happen, because you’ve just explicitly “statically-typed it”. Of course the opposite of this flawed creation also evaluates to true: (boolean)false == (string)"0".

    Please note that this problem has nothing to do with the use of double-quotes.

    This is PHP’s type coercion gone haywire: compared with an integer, “false” is converted to the number 0, yet as a boolean every non-empty string other than “0”, including “false”, is interpreted as true.

    But of course this problem has a solution: enter PHP’s flawed identity concept. Take the following code:

    if ("false" == true) echo "true\n";
    // => true
    
    if ("false" == false) echo "true\n";
    // => false
    
    if ("false" == 0) echo "true\n";
    // => true, wtf
    
    if (false == 0) echo "true\n";
    // => true, as expected
    
    // so "false" = true && "false" = 0, so "false" is true and 
    // false is false and we haven't even discussed identity, yet
    
    if ((string)"false" === (int)0) echo "true\n";
    // => false, ...ok...
    
    if ("0" === 0) echo "true\n";
    // => false
    
    if ("false" === false) echo "true\n";
    // => false
    
    if ((int)"0" === 0) echo "true\n";
    // => true, with type coercion

    This means: if you want to compare two variables predictably and you don’t control their input range (for example, because one of them contains user input), you have to compare them by identity. But then you have to convert them explicitly and know what type their contents have in order to get a predictable result. So, because of the strange conversion mentioned above, PHP essentially has two type systems that behave differently: “false” is 0 or false in one of them, but true in the other (or false… sometimes).
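    A minimal sketch of that approach, assuming the input arrives as a string from a form: convert it explicitly with a validating function, then compare with ===. filter_var() with FILTER_VALIDATE_BOOLEAN and FILTER_VALIDATE_INT belongs to the filter extension, bundled since PHP 5.2; the wrapper names are hypothetical. Unlike a plain (bool) cast, the filter rejects values it does not recognize instead of coercing them.

```php
<?php
// Hypothetical wrapper: "true"/"false", "yes"/"no", "on"/"off", "1"/"0"
// become real booleans; anything else becomes null instead of being coerced.
function parse_bool_input($raw)
{
    return filter_var($raw, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
}

// Hypothetical wrapper: only integer-looking strings become ints;
// anything else (including "false" or "12abc") becomes null.
function parse_int_input($raw)
{
    $value = filter_var($raw, FILTER_VALIDATE_INT);
    return $value === false ? null : $value;
}

// A plain cast treats every non-empty string other than "0" as true.
$cast = (bool)"false";

$flag = parse_bool_input("false");
$number = parse_int_input("0");
```

    After conversion, $flag === false and $number === 0 are ordinary identity checks on values of a known type, so the result no longer depends on which comparison operator happens to be used.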

  4. References are syntactically complex and implemented, let’s be nice, “badly” (PHP 5 solves some of this with an incompatible change)

    The first part of the sentence means that any kind of high-performance object-handling, be it arrays, hashes or object instances, requires a lot of syntactic sugar that invites syntax errors and produces bugs just to avoid the “copy-by-default pass-by-value” handling that PHP 4 normally uses. It’s just like C, but C knew when you tried to return a pointer (or a reference in C++, for that matter), so you still needed less syntactic sugar than you need in PHP.

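    In PHP 5, objects are passed by handle by default, which for most purposes behaves like passing them by reference; arrays and scalar values are still passed by value.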

    The second part means that references aren’t references like you know them from any other language. References are aliases in PHP. The reason why you have to assign a new object reference to a variable in PHP 4.4? Otherwise there exists no name for the returned variable to point to and the garbage collector never finds it.

    function &myFactoryMethod() {
      return $var = &new ProtectedClass();
    }
    $xyz =& myFactoryMethod();
    // now $xyz is not a reference to the returned object, it is AN ALIAS for $var

    Why don’t they just automatically create names for anonymous variables? I don’t know… <sarcasm>perhaps they thought that would have been bad design</sarcasm>.
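    To make the difference between handles and aliases concrete, here is a sketch assuming PHP 5 or later. An object parameter receives a copy of the handle, so property changes are visible to the caller, but assigning a new object to the parameter is not; only an & parameter is a true alias for the caller’s variable. Arrays are still copied. All class and function names are hypothetical.

```php
<?php
class Box
{
    public $value = 0;
}

// The handle is copied: property changes reach the caller's object.
function setValue($box)
{
    $box->value = 42;
}

// Reassigning a by-value parameter only replaces the local copy of the handle.
function replaceByValue($box)
{
    $box = new Box();
    $box->value = 1;
}

// A reference parameter is an alias: reassignment reaches the caller's variable.
function replaceByReference(&$box)
{
    $box = new Box();
    $box->value = 2;
}

// Arrays are passed by value (copy-on-write), unlike objects.
function appendItem($list)
{
    $list[] = 'x';
    return count($list);
}

$box = new Box();
setValue($box);              // $box->value is now 42
replaceByValue($box);        // no effect on $box
$original = $box;
replaceByReference($box);    // $box now holds a different object
$items = array('a');
$countInside = appendItem($items);
```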

  5. PHP isn’t thread-safe

    PHP 4 and 5 can only be used with Apache 2’s mpm_prefork model, not with mpm_worker. That means that PHP limits your performance choices with the Apache HTTP server. It seems that it can be used with mpm_worker by not using the mod_php plug-in directly, but by using FastCGI. However, currently (11/11/2006) there’s at least one bug in PHP’s fastcgi support that could smack you in the face when you least expect it.

    In a presentation by Cal Henderson about Flickr’s use of PHP that I found recently, he says “PHP leaks memory like a sieve”, which is one of the main problems for making PHP work with FastCGI, SCGI or a threaded execution model.

    PHP’s session handling is single-threaded, too

    Another real caveat here is PHP’s session handling. By default, PHP will use disk-based sessions (i.e. files) and will acquire a lock on the file as soon as you call session_start(). This essentially makes sure that multiple requests that use the same session will run serialized, one after the other. While session handling in concurrent scenarios is not a trivial task, it is still important to note that with PHP’s default settings, if you call session_start() at the top of every PHP page, users will experience strange behavior on your site. For example, if they open multiple links in tabs or windows, or your site uses frames, these will load sequentially, not in parallel. You can alleviate this problem by closing your sessions early with session_write_close() or by using a solution like Sharedance.
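    A minimal sketch of the session_write_close() approach, assuming the default file-based session handler: start the session, copy what the page needs, and release the lock before any slow work, so other requests from the same user are not queued behind this one. The helper name is hypothetical, and session_status() requires PHP 5.4 or later. If the page has to write to the session afterwards, it must call session_start() again.

```php
<?php
// Hypothetical helper: read the session data, then release the session lock.
function read_session_snapshot()
{
    if (session_status() !== PHP_SESSION_ACTIVE) {
        session_start();                  // opens the session file and locks it
    }
    $snapshot = isset($_SESSION) ? $_SESSION : array();
    session_write_close();                // saves the data and releases the lock
    return $snapshot;
}

$session = read_session_snapshot();
// ... slow work (queries, remote calls) runs here without holding the lock ...
```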

  6. The quality of PHP’s compiled-in libraries differs widely

    While, of course, the quality of all libraries differs widely, in most other systems, the core libraries are designed to a certain set of standards. PHP’s internal XML support just sucks. At the same time PHP’s developers couldn’t decide if they prefer the underscore method of function_naming() or Java-style functionNames(). So they just used both… sometimes… when they didn’t use C-style fncnms().

    Also, as Jason pointed out, some functions (like money_format()) are only defined on some platforms and not on others. Please note: money_format() is not a function that is provided by an extension, it’s part of PHP’s standard library!
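    One way to cope with functions that exist only on some platforms is to check for them at runtime and fall back to something portable. A sketch, assuming a plain two-decimal format is an acceptable fallback; format_price() is hypothetical. money_format() was removed entirely in PHP 8, so on current versions the fallback branch is the one that runs.

```php
<?php
// Hypothetical helper: use money_format() where the platform provides it,
// otherwise fall back to number_format(), which is available everywhere.
function format_price($amount)
{
    if (function_exists('money_format')) {
        return money_format('%.2n', $amount);    // output depends on the locale
    }
    return number_format($amount, 2, '.', ',');
}

$price = format_price(1234.5);
```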

  7. PHP’s support for large integers is exactly as idiotic as its unicode handling and its type system

    PHP’s integers are 32-bit signed integers on 32-bit platforms and usually 64-bit signed integers on 64-bit platforms. The only built-in way to tell what the size of an int is, is the PHP_INT_SIZE constant (available since PHP 5.0.5), which is easy to overlook. Theoretically, if you were to load a 32-bit unsigned number from a binary stream, you could use unpack(). That would convert a really big number into a float if it didn’t fit into 32 bits, a problem in its own right. Unless you’re running PHP version 5.2.1, because that version had a bug in unpack(), but I digress.
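    A quick sketch of what happens at the boundary, assuming PHP 5.0.5 or later, where the PHP_INT_SIZE and PHP_INT_MAX constants exist: adding 1 to the largest integer silently produces a float instead of failing. The checked_add() helper is hypothetical; it expects two ints and refuses the silent conversion.

```php
<?php
// Hypothetical helper: how many bits a PHP int has in this build.
function int_bits()
{
    return PHP_INT_SIZE * 8;               // 32 on 32-bit builds, 64 on most 64-bit builds
}

// Hypothetical helper: add two ints, throwing instead of overflowing into a float.
function checked_add($a, $b)
{
    $sum = $a + $b;
    if (!is_int($sum)) {
        throw new OverflowException("integer overflow: $a + $b");
    }
    return $sum;
}

$bits = int_bits();
$overflowed = PHP_INT_MAX + 1;             // silently becomes a float
```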

    So if you want to write a PHP program that can handle integers larger than 2^31 on any platform, you can’t use normal operators (+ - * /), you’ll need to use the bcmath library. Oh, you also can’t use unpack() because of one of my favorite notes in PHP’s documentation:

    “Note that PHP internally stores integral values as signed. If you unpack a large unsigned long and it is of the same size as PHP internally stored values the result will be a negative number even though unsigned unpacking was specified.”

    So you want an unsigned integer? Ummm… no, sorry, can’t expect us to return one. Hopefully you’re not writing software for an electronic voting box! You can read more on this in an excellent post titled “Integers in PHP, running with scissors and portability” on the excellent MySQL performance blog. They also came up with a portable function that allows you to sidestep this issue, it reads like this (they could have chosen better variable names, though :-) ):

    function _Make64 ( $hi, $lo ) {
        // on x64, we can just use int
        if ( ((int)4294967296)!=0 )
            return (((int)$hi)<<32) + ((int)$lo);
    
        // workaround signed/unsigned braindamage on x32
        $hi = sprintf ( "%u", $hi );
        $lo = sprintf ( "%u", $lo );
    
        // use GMP or bcmath if possible
        if ( function_exists("gmp_mul") )
            return gmp_strval ( gmp_add ( gmp_mul ( $hi, "4294967296" ), $lo ) );
     
        if ( function_exists("bcmul") )
            return bcadd ( bcmul ( $hi, "4294967296" ), $lo );
    
        // compute everything manually
        $a = substr ( $hi, 0, -5 );
        $b = substr ( $hi, -5 );
        $ac = $a*42949; // hope that float precision is enough
        $bd = $b*67296;
        $adbc = $a*67296+$b*42949;
        $r4 = substr ( $bd, -5 ) + substr ( $lo, -5 );
        $r3 = substr ( $bd, 0, -5 ) + substr ( $adbc, -5 ) + substr ( $lo, 0, -5 );
        $r2 = substr ( $adbc, 0, -5 ) + substr ( $ac, -5 );
        $r1 = substr ( $ac, 0, -5 );
        // carry into the next 5-digit group; a group equal to 100000 must carry too
        while ( $r4>=100000 ) { $r4-=100000; $r3++; }
        while ( $r3>=100000 ) { $r3-=100000; $r2++; }
        while ( $r2>=100000 ) { $r2-=100000; $r1++; }
    
        $r = sprintf ( "%d%05d%05d%05d", $r1, $r2, $r3, $r4 );
        $l = strlen($r);
    
        $i = 0;
        while ( $r[$i]=="0" && $i<$l-1 )
            $i++;
    
        return substr ( $r, $i );         
    }

    Yes, this is the kind of code you need to write to create PHP code that is portable and predictable. Want to go back to C++? I won’t blame you.
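    For the narrower case of reading a single unsigned 32-bit value, a shorter workaround is the same sprintf("%u") trick the function above relies on: unpack() may return a negative int on 32-bit builds, and formatting it with %u yields the unsigned decimal digits on both 32-bit and 64-bit builds. A sketch; the helper name is hypothetical, and the result is returned as a string so it can be passed to bcmath or GMP when arithmetic is needed.

```php
<?php
// Hypothetical helper: read a big-endian unsigned 32-bit integer from $bytes
// (starting at $offset) and return it as a decimal string, whatever the int size.
function read_uint32_be($bytes, $offset = 0)
{
    $chunk = substr($bytes, $offset, 4);
    if (strlen($chunk) !== 4) {
        throw new LengthException('need 4 bytes');
    }
    $parts = unpack('N', $chunk);        // may be negative on 32-bit builds
    return sprintf('%u', $parts[1]);     // reinterpret the bits as unsigned
}

$max = read_uint32_be("\xFF\xFF\xFF\xFF");
$one = read_uint32_be("\x00\x00\x00\x01");
```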

Conclusion

I maintain that, of all the dynamically-typed (or freedom) languages, PHP has the most approachable syntax, especially for developers coming from the typical statically-typed languages. I guess that this plays a huge part in PHP’s popularity right now. The adoption of PHP by Oracle and the availability of many polished and sophisticated applications are also important. At least from my point of view, there seem to be far more content management systems and other tools written in PHP than in Python or Ruby. Some of the most popular tools in the developer community, phpPgAdmin and phpMyAdmin, are written in PHP.

That said, I am a huge advocate for choosing “the right tool for the job” and that, of course, means that you might want to choose PHP under the right circumstances. I’ll give you a few examples:

  1. You have found the ideal framework or base for your software and it’s written in PHP

    This website runs on WordPress. It has by far the best UI of any weblogging application. It also means that I have to navigate around all the problems I mentioned in this document. My guess is that in the long run that will take more time than using a weblogging application that isn’t based on PHP, but for now, I’m waiting for PyBlosxom to catch up.

  2. You already have a huge investment in PHP technology
  3. Your time-constraints do not allow you to learn something else

For me, PHP is now an unacceptable solution to all but the simplest problems. At least, if you’ve made it this far through this article, you now know what the most common pitfalls are and can avoid them. But please, please don’t believe that writing stable, maintainable and, most of all, predictable code in PHP is easy, even if you use PHP 5 and an MVC web framework (like symfony). In my opinion, doing anything with PHP is almost as hard as with C. You’re far better off if you invest elsewhere.

Have fun!

I hope this article gave you some pointers. By the way, if you have any questions or want to leave a comment, you’re free to do so on the comments page for this article. Also, take a look at the external links; they might prove quite helpful.

Now, if you’re already invested in PHP, there are ways to make it work. Every problem mentioned in this article has a solution, and some of them are listed right here in this article. The problem is making these solutions maintainable, making them work predictably on your platform and documenting all shortcomings and risks. In Cal Henderson’s Flickr presentation, he says that Unicode with PHP is easy: “Just set the right headers” and “don’t use htmlentities()”, because that will mess up your strings. But that’s not the whole truth: if you do it that way, you’ll also need a very good HTML filter to prevent XSS attacks. However, not knowing about these problems, or worse, trying to talk them away, will not help you build stable software.
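A minimal sketch of the output side of that advice, assuming the whole application uses UTF-8: declare the charset in the Content-Type header and escape user-supplied text with htmlspecialchars(), passing ENT_QUOTES and the charset explicitly, instead of htmlentities(). Escaping only covers text placed into HTML; letting users submit markup still needs a real HTML filter, as noted above. The helper name is hypothetical.

```php
<?php
// Hypothetical helper: escape text for HTML element content or quoted attributes.
// htmlspecialchars() only touches & < > " ', so UTF-8 letters pass through intact.
function html_escape($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Declare the encoding before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

$comment = '<script>alert("Привет")</script>';
$safe = html_escape($comment);
```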

So I hope I got you thinking, and please don’t think that Python, Java and Ruby don’t have any problems; it’s just that PHP has a great many of them :-). Knowing about the shortcomings of different languages and their implementations can only make you a better programmer. So good luck, have fun and thank you.

Changelog

02/21/2006: As a response to the first version of this article, Harry Fuecks posted A pro PHP rant on SitePoint. PHP’s XML support, he says, became better with PHP 5, and Unicode support is finally planned for PHP6. He also says that PHP scales very well and that its database support might be better than that of Ruby or Python.

The PHP developers tried to speed up adoption by ending support for PHP4 in 2007. They have now conceded to continue supporting PHP4 in 2008. I guess making PHP5 not backwards-compatible was not such a good idea, even if it was necessary to bring the language forward.

So my response to Harry Fuecks is: at the rate that people are adopting PHP 5, I won’t hold my breath for PHP 6’s Unicode support, simply because of all the compatibility problems that it will doubtlessly bring for all the software that expects ISO-8859-1 byte-sized characters everywhere. But what do you know… with the release of PHP 6, PHP might really become a viable enterprise development platform. 8 years after everybody else had these features and years after Zend said that it was… but progress is progress, isn’t it? (right?)

07/28/2006: I have a new post up about PHP. After working with PHP’s mysqli extension for a while, I also fully disagree with Harry Fuecks’ assertion that PHP’s database support is better than anybody else’s… read the post ;-)

07/18/2006: I added some clarifying remarks about PHP’s identity model in the sidebar.

11/11/2006: I added a link to a PHP/fastcgi bug that broke WordPress 2.0.5 for people using fastcgi. I also removed Ambivalence as an example for a PHP MVC web framework at the end of the article, because it wasn’t updated in a long time. Instead I now link to symfony, which seems to have a big community and support. If you need to do something in PHP, you might as well do it with the best tools available.

02/22/2008: I’ve rewritten parts of the article based on comments and new developments. I wanted to clear up some confusion about PHP’s type system, which the original article didn’t cover very well, and to add sections on the brain-dead handling of 64-bit integers and on the pitfall that is PHP’s session handling. I also tried to restructure the notes in the article, moving many of them to this section at the bottom, as they lost relevance over time. Additionally, I restructured the site itself because I want to expand on the theme of informing people about shortcomings in programming languages. You can take a look at that effort here.

02/25/2008: Another update to the identity thing. Somehow it is hard for me to find the right examples to make my point… I hope it works now :-).