Yet Another Programming Blog
PHP, Drupal, Zend Framework. Done.
Fri, 7 Oct, 2011
A few months ago a colleague mentioned Skybound's Stylizer for CSS development. It looked awesome in the video demo so I was quite eager to try it out. About one month ago I was tasked with re-theming my employer's web site (which is, of course, powered by Drupal), so I figured that would be a great opportunity to test out Stylizer.
At first glance, Stylizer looks quite beautiful, especially with the dark theme and the fine graphics on the instrument panel. However, flicking constantly between Firefox and Eclipse (mostly white) and Stylizer (mostly black) almost made my retinas explode, so it was a good thing I found the option to switch Stylizer to a mostly white theme, which solved the problem.
My overall plan for the project was to use the Zen Starter Kit to create a new Zen sub-theme and build up from there. One of the things I discovered about Zen (and, indeed, many other Drupal themes) is that it takes "separation of concerns" to a whole new level, resulting in a very large number of CSS files (17 if you strip it back to the bare minimum). Drupal also likes to produce quite a few levels of nested divs. Both of these factors make learning Stylizer a bit more difficult, as it complicates the navigation of the CSS files, even with the "bullseye" functionality. As a result, I would definitely not recommend trying to learn Stylizer on a Drupal theme.
One of the great things I noticed about Stylizer when watching the demo was the code reformatting. Stylizer provides some great options, but unfortunately, not reformatting the code is not one of them. Although this isn't a huge deal, it does mean you'll somewhat lose the ability to make small mods to your sub-theme and then compare your updated files with the original version later to see what you changed. I suppose you could save all your files with Stylizer first, check those into version control and then work from there, but you will still lose the ability to easily compare your updated version against the original version of Zen (or any future versions).
The Stylizer positioning system looks excellent too, but it does rely on some "initialisation" code being included in one of your stylesheets. It also means that some effects that could be achieved in one line (eg. float: right) may be implemented with three lines instead. This bothered me a little, but not hugely.
The image replacement functionality also looked awesome, but does require a separate piece of JavaScript to be included on your page, which means integrating it into your theme. Also, I recall seeing CSS expressions being used as well, which are quite bad for performance, so I wasn't too thrilled about that either.
In terms of general usage, the only really annoying thing that I noticed was the need to separately refresh Stylizer's embedded browser (which is Chrome on OS X) whenever you wanted to fully reload the page. There is a Command-R shortcut, but this only re-loads the stylesheets. You have to click the browser window's Refresh icon if you want to actually re-render the page. This is hugely annoying, especially if you're a command-line junkie like me.
And that, honestly, is probably the key trait that will determine whether or not you fall in love with Stylizer. If you are a command-line junkie, by definition you prefer to type, not click, slide, wiggle and jiggle your way through an "instrument panel". You will probably also object to having three lines of CSS code generated where one line would suffice. And you'll definitely object to having to click an icon to re-render the page! If you're anything like me, you'll eventually start to wonder what value Stylizer is providing above and beyond just using Firebug in conjunction with your favourite IDE. And although some people swear by Stylizer for Drupal theme development, the huge number of CSS files in themes like Zen does, IMHO, make using a tool like Stylizer quite difficult. For the record, I ended up ditching Zen and modifying the Danland theme instead.
Having said all that, Stylizer is still a very solid product and I could definitely see it being of great use in certain situations, e.g. whipping up a static 10-page brochure site for a client with lots of photos, call-outs, testimonials, etc. on the different pages, requiring lots of different positioning, font sizes, etc. In a scenario like this, Stylizer would be extremely handy.
Ultimately, my advice would be to download it and check it out for yourself. Whether or not it adds value to your workflow will probably depend on what your workflow currently looks like and what your overall development mindset is like.
Mon, 15 Aug, 2011
Recently, whilst trying to make an old PHP application W3C compliant, I realised that none of the ampersands separating the GET parameters in the links that the application was generating were written as &amp;amp; - they were all just plain ampersands.
This results in the following error from the validator:
Warning: unescaped & or unknown entity "&id"
The solution is pretty straightforward - find all instances in the code where URLs are being generated and replace each ampersand with the proper &amp;amp; notation.
However, in this particular application, there were 502 instances! Although I would be happy with a manual search and replace solution, the problem with just searching for the ampersands is that there are obviously many others in the code that are unrelated to URL generation.
So after a short amount of Googling I came up with the following solution:
Search for: \&+([a-zA-Z]+)
Replace with: &amp;amp;$1
This will find all strings that consist of an ampersand followed by some alpha characters and replace the ampersand with the proper code. The good thing is that this way of searching excludes logic operators, single ampersands in comments, etc. The bad thing is that it picks up ampersands at the front of things that are already HTML entities. If anybody knows of a way to exclude those using regular expressions, please feel free to drop a comment below.
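One possible way to skip ampersands that already start an entity is a negative lookahead. The sketch below shows the idea in PHP; the function name escape_bare_ampersands() is made up for illustration, and the pattern assumes that anything of the form "&name;" is an existing entity. The same pattern should work in any search-and-replace tool whose regex engine supports lookaheads.

```php
<?php
// Hypothetical helper: escapes ampersands that are followed by a letter
// (as GET parameter separators are) but that do not already start a
// named entity such as &amp; or &nbsp;.
function escape_bare_ampersands($text)
{
    // (?=[a-zA-Z])               the next character must be a letter
    // (?![a-zA-Z][a-zA-Z0-9]*;)  ...but must not begin "name;" (an entity)
    return preg_replace('/&(?=[a-zA-Z])(?![a-zA-Z][a-zA-Z0-9]*;)/', '&amp;', $text);
}

echo escape_bare_ampersands('page.php?a=1&id=2&amp;x=3&nbsp;'), "\n";
```

Numeric entities such as &#169; are left alone because "#" is not a letter, and "&&" in code is left alone for the same reason. A GET parameter that happens to be followed directly by a semicolon would be mistaken for an entity, so the results are still worth a quick review.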
Sat, 12 Dec, 2009
Over the last few months I've had the dubious pleasure of converting an older PHP application to Spanish, Chinese and Hebrew. Although painful at times, it was, in the end, a very rewarding experience that taught me a great deal about web application design in general and internationalization (I18n) in particular. Although there are thousands of resources on the web about I18n, as usual there are often multiple solutions to each problem and it took a lot of trial and error to figure out which solutions provided the best cross-browser support and overall usability/appeal for the end user.
The aim of this article is not to provide a comprehensive "I18n How-To" but more to record all of the hints and tips that I discovered during my project in the one place. I'm sure that most of what is included will be relevant for pretty much any other PHP developer who has to take an older PHP application and add I18n support.
If you are internationalizing a PHP application, I would also highly recommend this article by Paul Reinheimer. This site was also an extremely useful resource that I referred to very frequently throughout the course of the project.
The first thing I checked was the database's ability to store UTF8 characters. It turned out that I was already using the UTF8 encoding of the Unicode character set with the utf8_unicode_ci collation. So far this has worked fine for English, Spanish, Simplified Chinese and Hebrew characters. I'm not sure about Traditional Chinese though - I think I read somewhere that you need UTF16 for that. See this page for more information about MySQL and Unicode.
For some strange reason the previous developer of the application had failed to include a DOCTYPE declaration in the pages, so I added one that suited the HTML that the application was already generating:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional" "http://www.w3.org/TR/html4/loose.dtd">
Another fairly strange omission was a content type declaration in the HTML header, so I added this one:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
More information on these things can be found here.
There is a whole GNU project called gettext that automates the process of extracting the strings out of an existing PHP application and also provides tools for developers and translators to help them work with these files. It's a real pity, then, that I didn't discover gettext until I was too far down the road with my own solution! Thankfully, the web app I was working on didn't have too many strings, so I just went through manually and added a function called t() around each string (this will look familiar to Drupal coders :-). This approach also afforded me the opportunity to identify places in the code where sentences were being constructed based on English syntax and grammar, as these required special treatment when converting to Hebrew and Chinese.
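For anyone curious what a home-grown t() might look like, here is a minimal sketch. It is not the code from the application: the $GLOBALS['translations'] table, the $lang parameter and the sample strings are all assumptions made for illustration. Numbered placeholders are one way to deal with sentences whose word order differs between languages, since the translated string can use the arguments in any order.

```php
<?php
// Hypothetical translation table; a real application would load one
// array per language from a file or the database.
$GLOBALS['translations'] = array(
    'es' => array(
        'Welcome' => 'Bienvenido',
        'Hello %1$s, you have %2$d new reports.' => 'Hola %1$s, tiene %2$d informes nuevos.',
    ),
);

// Looks up $string for language $lang and falls back to the original
// (English) string when no translation exists. Any extra arguments are
// substituted into the result with vsprintf().
function t($string, $lang = 'en', array $args = array())
{
    $table = isset($GLOBALS['translations'][$lang]) ? $GLOBALS['translations'][$lang] : array();
    $translated = isset($table[$string]) ? $table[$string] : $string;
    return $args ? vsprintf($translated, $args) : $translated;
}

echo t('Welcome', 'es'), "\n";
echo t('Hello %1$s, you have %2$d new reports.', 'es', array('Ana', 3)), "\n";
```

Falling back to the English string means a missing translation shows up as untranslated text rather than as an error, which makes gaps easy to spot during testing.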
The one part of the application that did contain loads of text was the downloadable PDF reports. Thankfully these slabs of text had been externalized in the form of XML files which, it turns out, can be edited reasonably well in Microsoft Word 2003 for Windows. This version of Word will actually retain the XML format and, as an added benefit, will allow the use of the spell checker, which was a big advantage for our translators. Yes, I know, I'm normally bashing Microsoft for one thing or another, but hey, credit where credit's due: Word 2003 for Windows rocks!
The user's preferred language (as per their browser settings) is available in the $_SERVER['HTTP_ACCEPT_LANGUAGE'] variable. This was used to figure out which language should be used to render the login screen, which also contained a language selector that allowed the user to select a different language.
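The header value is a comma-separated list of language tags with optional q-values (for example "he-IL,he;q=0.9,en;q=0.8"), so it needs a little parsing. The following is a rough sketch of how that could be done, not the code from the application; the function name pick_language() and the list of supported language codes are assumptions.

```php
<?php
// Hypothetical helper: returns the best match between an Accept-Language
// header value and the languages the application supports.
function pick_language($header, array $supported, $default = 'en')
{
    $candidates = array();
    foreach (explode(',', $header) as $position => $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '') {
            continue;
        }
        $q = 1.0; // quality defaults to 1 when no q-value is given
        foreach (array_slice($pieces, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        if ($q > 0) { // q=0 means "not acceptable"
            $candidates[] = array($tag, $q, $position);
        }
    }
    // Highest q first; ties keep the order in which the browser listed them.
    usort($candidates, function ($a, $b) {
        if ($a[1] == $b[1]) {
            return $a[2] - $b[2];
        }
        return ($a[1] > $b[1]) ? -1 : 1;
    });
    foreach ($candidates as $candidate) {
        $parts = explode('-', $candidate[0]);
        if (in_array($candidate[0], $supported, true)) {
            return $candidate[0];
        }
        if (in_array($parts[0], $supported, true)) {
            return $parts[0]; // e.g. "he-il" falls back to "he"
        }
    }
    return $default;
}

$header = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '';
echo pick_language($header, array('en', 'es', 'zh', 'he')), "\n";
```

The result is only a starting point for the login screen; the language selector still lets the user override it.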
Something that comes up fairly soon in the I18n process is entering characters from different languages for testing purposes. This page proved to be invaluable in this regard. And then you get to Chinese and Hebrew and realise you need to type in Unicode characters occasionally. Thankfully, OS X has awesome support for this.
It was great to implement Spanish first, as that allowed me to get the basic infrastructure in place before attempting Hebrew, which presents three major difficulties:
That first one - working with RTL text - is a major PITA. Although Eclipse does a great job of working with RTL text, it just constantly messes with your head when the left and right cursor keys and backspace key start operating backwards. If you don't believe me, try copying the text below into your favourite text editor and see what happens:
שלום העולם
Once you've got your head around that little challenge, then there's the problem of figuring out the best approach for designing your bi-directional HTML. The best information I found was here. Although there are several options for telling the browser that the content should be rendered RTL, the option I went with was as follows:
Include dir="rtl" in the head element of the page and, if you are specifically targeting Hebrew, include lang="he" as well
Do not include dir="rtl" in the body element of the page, as this will result in the scroll bar being rendered on the left side of the browser window, which apparently is not the standard approach (JavaScript pop-ups can also be affected by setting the rtl attribute on the body element)
Inside the body element, include a new div element that has the dir="rtl" attribute
If you follow the above approach and your layout is fairly basic, then this may be all you need to do!
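The wrapping div can be added in one place if the page content passes through a single function. Here is a small sketch of that idea; the helper name wrap_for_direction() and the list of RTL language codes are assumptions for illustration, and the list is not exhaustive.

```php
<?php
// Hypothetical helper: wraps page content in a direction-setting div when
// the language is written right-to-left, and returns it unchanged otherwise.
function wrap_for_direction($content, $lang)
{
    $rtlLanguages = array('he', 'ar', 'fa', 'ur'); // assumed list, not exhaustive
    if (!in_array($lang, $rtlLanguages, true)) {
        return $content;
    }
    $safeLang = htmlspecialchars($lang, ENT_QUOTES, 'UTF-8');
    return '<div dir="rtl" lang="' . $safeLang . '">' . $content . '</div>';
}

echo '<body>', wrap_for_direction('<p>שלום העולם</p>', 'he'), '</body>', "\n";
```

Keeping the attribute on a div rather than on the body element leaves the scroll bar where users expect it.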
However, if the existing layout does something unfortunate, like, say, rendering a rounded-corners look that utilizes a whole bunch of little jpg files for the corners, then you may be in trouble (like I was). The solution in my case was to spend three days re-writing all the code that produced the HTML (which was sprinkled throughout the entire code base) and developing a new look and feel based on these awesome stylesheets from Matthew James Taylor. In the end, there were only eight CSS hacks needed for RTL rendering and four for RTL rendering on Internet Explorer 6 and they were mostly due to application-specific issues.
Although it was great that the text for the PDF reports was stored in XML files, there was still the issue of generating PDF files that rendered the text correctly. Unfortunately, FPDF, which I had been using for over two years, does not support Unicode characters, so it was time to find a new PDF class. After investigating a few, I settled on TCPDF, which is based on FPDF, which meant that my existing code worked with it pretty much straight out of the box. TCPDF supports Unicode, RTL and lots of other good stuff.
Once I was producing PDF files in different languages, along came the RFC 2183 problem, which is basically that HTTP headers are only supposed to be encoded in US-ASCII. This includes the filename parameter of the Content-Disposition header. As it happens, you can set the file name to a string that contains Unicode characters and Firefox, for example, will actually decode it correctly. However, downloading the same file with Internet Explorer results in the file name in the dialog box becoming garbled junk.
After searching high and low, I eventually found the solution here. The executive summary is:
urlencode the entire filename (including the extension) and it will save the file name correctly
Here is some sample code:
<?php
// File on disk to send (hypothetical path; point this at the generated PDF)
$path = '/path/to/report.pdf';
// Name to show in the browser's download dialog
$filename = '世界您好';
if ( mb_detect_encoding( $filename ) != 'ASCII' && strpos( $_SERVER['HTTP_USER_AGENT'], 'MSIE' ) !== false )
{
    // Internet Explorer: urlencode the whole name, including the extension
    $filename = urlencode( $filename . '.pdf' );
}
else
{
    $filename .= '.pdf';
}
header( 'Pragma: public' );
header( 'Expires: 0' );
header( 'Cache-Control: must-revalidate, post-check=0, pre-check=0' );
header( 'Cache-Control: private', false ); // required for certain browsers
header( 'Content-Type: application/pdf' );
header( 'Content-Disposition: attachment; filename="' . $filename . '"' );
header( 'Content-Transfer-Encoding: binary' );
// Size and contents come from the file on disk, not from the display name
header( 'Content-Length: ' . filesize( $path ));
readfile( $path );
exit;
?>
The aforementioned approach works on Internet Explorer 6, 7 & 8.
Our web application also has a function where multiple PDF files can be delivered in a single zip archive. Although this has worked splendidly for some time, Windows, once again, had difficulties with Unicode file names.
Actually, the problem is related to the zip file format itself, which, until recently didn't contain an indicator as to which encoding was being used for storing the names of the files inside the archive. As a result, Windows native zip support will simply interpret the file names according to the current code page. But if the file names are in Unicode then it won't matter what code page Windows is currently set to, the file names will come out garbled.
However, there is a partial workaround in Windows. If you go to the "Advanced" tab of the "Regional and Language Options" part of Control Panel you will find an option called "Language for non-Unicode programs". If you are trying to unzip an archive that has Unicode file names that contain, say, Chinese characters, then selecting one of the Chinese options will probably enable the zip files to be extracted with the file names correctly converted. I say "partial" workaround only because it is unlikely that all users will be willing and/or able to alter this setting, which leaves us back at square one.
I should point out that there are no such workarounds required on OS X. It just works. Of course. :-)
One aspect of our web application is that it stores names as two separate fields in the database: first_name and last_name. This allows us to do some really simple things, like say "Dear first_name" at the top of e-mail messages. Chinese names, however, are typically rendered with the last name first and one of the business requirements for our translation was to say "Dear LastnameFirstname" at the start of e-mails.
In order to do this, the code that generated the e-mails needed to figure out if the name fields were in Chinese or not. This resulted in yet another extended session of intense Googling, only to find that the solution can be implemented in one line:
<?php
function is_Chinese( $str )
{
    // Matches any character from U+4E00 to U+9FA5 (common Chinese ideographs)
    return preg_match("/[\x{4e00}-\x{9fa5}]/u", $str );
}
?>
The basic approach is to build a regular expression that looks for one or more characters within a certain range of Unicode code points.
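To show how the check feeds into the greeting, here is a minimal sketch of the name-ordering logic described above. The function names contains_chinese() and salutation_name() are made up for this example, and the check is repeated under a different name only so that the snippet runs on its own.

```php
<?php
// Same test as is_Chinese(), repeated so this sketch is self-contained.
function contains_chinese($str)
{
    return preg_match('/[\x{4e00}-\x{9fa5}]/u', $str) === 1;
}

// Hypothetical helper: returns the name to use after "Dear" in an e-mail.
// Chinese names are rendered last name first with no space in between;
// everything else just uses the first name.
function salutation_name($firstName, $lastName)
{
    if (contains_chinese($firstName) || contains_chinese($lastName)) {
        return $lastName . $firstName;
    }
    return $firstName;
}

echo 'Dear ', salutation_name('John', 'Smith'), "\n";
echo 'Dear ', salutation_name('小明', '王'), "\n";
```

Checking both fields covers the case where only one of them was entered in Chinese characters.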
The alternative title for this section could well be "Don't Believe Everything You Read On The Web"! It is kind of ironic that after so much success Googling for I18n solutions that I could be let down so horrendously on the HTML e-mail construction front. Pretty much every single article I read on HTML e-mails said to remove "unnecessary" e-mail tags such as DOCTYPE, HTML tag, HEAD tag, BODY tag, etc. Although the rationale for this approach makes sense, I think the information might simply be out-of-date. My reasons for this assertion are as follows:
So the code for generating the HTML part of the e-mails within the application looks like this:
<?php
$body = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional" "http://www.w3.org/TR/html4/loose.dtd">' .
'<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">' .
'<title>' . $subject . '</title>' .
'<style type="text/css">body, p, td { font-family:Verdana,Arial,Helvetica;font-size:12px; }</style></head>' .
'<body>' . $content . '</body></html>';
?>
In the case of Hebrew e-mails, the content was also wrapped in an additional div tag:
<div dir="rtl" lang="he" style="font-family:arial; font-size:larger;">
I'll add to this page as I find better solutions for some of the challenges I encountered and if you have a specific challenge that you've been unable to solve (or if you find an error in any of the solutions above), drop a comment below.