Recently, whilst trying to make an old PHP application W3C compliant, I realised that none of the ampersands separating the GET parameters in the links that the application was generating were written as &amp; - they were all just plain ampersands.
This results in the following error from the validator:
Warning: unescaped & or unknown entity “&id”
The solution is pretty straightforward - find all instances in the code where URLs are being generated and replace each ampersand with the proper &amp; notation.
However, in this particular application, there were 502 instances! Although I would be happy with a manual search and replace solution, the problem with just searching for the ampersands is that there are obviously many others in the code that are unrelated to URL generation.
So after a short amount of Googling I came up with the following solution:
Search for: \&+([a-zA-Z]+)
Replace with: &amp;$1
This will find all strings that consist of one or more ampersands followed by one or more letters and replace the ampersands with the proper code. The good thing is that this way of searching excludes logical operators, single ampersands in comments, etc. The bad thing is that it also picks up the ampersand at the front of anything that is already an HTML entity, so an existing &amp; would become &amp;amp;. If anybody knows of a way to exclude those using regular expressions, please feel free to drop a comment below.
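One possible way to skip existing entities is a negative lookahead, which refuses to match an ampersand when it is followed by letters or digits and then a semicolon. The sketch below is an assumption rather than something I ran against the application: it uses Python's re module purely to demonstrate the pattern, the function names and the sample link are made up for illustration, and Python writes the backreference as \1 where many editors use $1. Whether your editor's search and replace supports lookaheads is something to check first.

```python
import re

# The pattern from the post, unchanged.
ORIGINAL = re.compile(r"\&+([a-zA-Z]+)")

# Hypothetical variant: the negative lookahead (?!...) rejects an ampersand
# that is followed by letters/digits and then ";", i.e. something that
# already looks like a named entity such as &amp; or &frac12;.
ENTITY_SAFE = re.compile(r"&+(?![a-zA-Z0-9]+;)([a-zA-Z]+)")


def escape_original(text):
    """Apply the search and replace from the post."""
    return ORIGINAL.sub(r"&amp;\1", text)


def escape_entity_safe(text):
    """Same replacement, but leave existing named entities alone."""
    return ENTITY_SAFE.sub(r"&amp;\1", text)


# Made-up sample: one raw ampersand, one that is already escaped,
# and an entity in the link text.
link = '<a href="page.php?id=1&cat=2&amp;sort=asc">R&amp;D</a>'

print(escape_original(link))
# <a href="page.php?id=1&amp;cat=2&amp;amp;sort=asc">R&amp;amp;D</a>

print(escape_entity_safe(link))
# <a href="page.php?id=1&amp;cat=2&amp;sort=asc">R&amp;D</a>
```

The lookahead has a limitation of its own: a raw parameter that happens to be followed directly by a semicolon (for example &id;) looks like an entity and would be left unescaped, so a quick scan of the results is still worthwhile.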