How to Make Drupal 6 Valid HTML 4.01 Strict

May 6, 2009

Note: This article covers Drupal version 6.11.

Drupal 6 is built to validate as XHTML but I’ve lately been leaning towards HTML standards. I also enjoy making things – Drupal in particular – do what I want them to do, so I just had to see if I could get Drupal to validate as HTML 4.01 Strict.

Since Drupal has a lot of theme override functions, I naturally presumed that it would be a simple task and it was. Well, except for the fact that Drupal core spits out a content-type meta short-tag in the <head> section of the website. But let’s start from the beginning.

The Basic theme from Raincity Studios is the starting point for all my projects and I dearly recommend it. The problem I had here was mainly in the <head> section, where Drupal outputs all the tags as short-tags, ending with “/>”. These tags are printed with the help of the PHP variables $head and $styles.

So basically, all we need to do is put the following two lines into the theme_preprocess_page() function of our template.php file (where “theme” stands for the name of your theme), to remove the slash from the end of the tags.

[php]
$vars['head'] = str_replace(" />", ">", $vars['head']);
$vars['styles'] = str_replace(" />", ">", $vars['styles']);
[/php]
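For context, here is a minimal sketch of how those two lines might sit inside the preprocess function. The name mytheme is a placeholder for your theme's machine name, and the sample markup at the bottom is made up purely so the snippet can be run on its own; in a real site Drupal fills $vars for you.

```php
<?php
// Sketch of a page preprocess function for template.php.
// "mytheme" is a placeholder; use your own theme's machine name.
function mytheme_preprocess_page(&$vars) {
  // Turn XHTML-style short-tags (" />") into plain HTML 4.01 tags (">").
  $vars['head'] = str_replace(" />", ">", $vars['head']);
  $vars['styles'] = str_replace(" />", ">", $vars['styles']);
}

// Stand-alone demonstration with invented markup (assumption: Drupal
// normally supplies these values before calling the function).
$vars = array(
  'head' => '<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" />',
  'styles' => '<link type="text/css" rel="stylesheet" media="all" href="/style.css" />',
);
mytheme_preprocess_page($vars);
print $vars['head'] . "\n" . $vars['styles'] . "\n";
```

Note that the replacement only matches a slash preceded by a space, so any tag written as “<br/>” without the space would slip through.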

Remaining is the content-type problem mentioned above. It turns out there are two Drupal core functions, called drupal_get_html_head() and drupal_final_markup(), in the file /includes/common.inc, which make sure a content-type meta tag is prepended to the <head> section.

The code looks like this and, as far as I can tell, there’s no way of getting rid of the tag other than changing this file.

[php]
function drupal_get_html_head() {
  $output = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n";
  return $output . drupal_set_html_head();
}

function drupal_final_markup($content) {
  // Make sure that the charset is always specified as the first element of the
  // head region to prevent encoding-based attacks.
  return preg_replace('/<head>/i', "\$0\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />", $content, 1);
}
[/php]

So just removing the slash from the meta tag string in each of those two functions will solve the problem. Changing core code is a bad, bad thing, but at least it’s a very minor change. We can still update the site; the only downside is that the update will overwrite the change and the site may stop validating again.
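As a rough sketch, the edited drupal_final_markup() might look like the following once the slash is gone. This is only my reading of the one-character change described above, based on the Drupal 6.11 code quoted in this article; the sample page string at the bottom is invented so the snippet runs without Drupal. The same slash would be removed from the string in drupal_get_html_head().

```php
<?php
// Edited copy of the core function from /includes/common.inc.
// The only difference from the original is the missing " /" before
// the closing ">" of the meta tag.
function drupal_final_markup($content) {
  // Make sure that the charset is always specified as the first element of the
  // head region to prevent encoding-based attacks.
  return preg_replace('/<head>/i', "\$0\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">", $content, 1);
}

// Stand-alone demonstration with made-up page markup.
$page = "<html><head><title>Example</title></head><body></body></html>";
print drupal_final_markup($page);
```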

After doing this, the only remaining problem was contributed modules that sometimes output XHTML, and rightfully so, since Drupal is based on it. This can often be solved with theme functions. ImageCache, for example, is used by calling a theme function, so I Googled its API, found the original code and put the following in my template.php...

[php]
function mytheme_valid_imagecache($namespace, $path, $alt = '', $title = '', $attributes = NULL) {
  if (is_null($attributes)) {
    $attributes['class'] = 'imagecache imagecache-'. $namespace;
  }
  $attributes = drupal_attributes($attributes);
  $imagecache_url = imagecache_create_url($namespace, $path);
  return '<img src="'. $imagecache_url .'" alt="'. check_plain($alt) .'" title="'. check_plain($title) .'" '. $attributes .'>';
}
[/php]

...which is an exact replica of the original code but without the slash in the <img> tag. Now I just call this function instead of the ImageCache theme function, and I get valid HTML 4.01 Strict.

I wouldn’t recommend this at all if you’re using more than just a few modules. With more modules, a lot of ugly changes may have to be made – changes that may also break something if you ever decide to update the site. However, it was a nice experiment for this particular project and I enjoy seeing the wonderful green color when I run the site through the validator.

Maybe it’s time to write an HTMLify module.