How To Validate Your Website’s HTML
By: Dave Taylor
June 15th, 2010
I searched for HTML CODE FIX and it was all pretty much self-help. Can you recommend a company, SEO I guess, that can just fix HTML code and check to see why two products of mine don't get submitted to Google. It seems to be not meeting their guidelines. I have over 100 html errors according to htmlvalidator.
I'm not entirely convinced that the problem on your site is that your HTML isn't validating - there are a number of other reasons why a page might not be promptly included in the Google index - but it's a darn interesting question, the implied "how do I fix the HTML on my site to validate", so that's what we'll run with here.
Why bother with valid HTML? Well, Google's own Webmaster Guidelines explain that a best practice is to always "Check for broken links and correct HTML". More generally, though, the chances of your page rendering properly on a wide variety of different Web browsers go up quite a bit if your HTML is as clean as possible.
Fortunately, there are a number of free HTML validation services online that will scan your Web page and report any and all errors or warnings they encounter.
The questioner generously allowed us to pull his page, from the site Aquasana For Life, for purposes of illustrating how to fix HTML validation errors. To accomplish that, I saved the page as HTML and moved it to our own server, so we can tweak and modify it to see what happens.
First off, the untouched source given to the W3C Validator reports:
171 Errors! That's a TON!
Fortunately, I know a secret about the validator: there are a lot of cascading errors that can appear in HTML and fixing one can often remove a dozen or more reported errors. For example, the first error reported:
It is basically saying "if you have S&H in the source code, you need to escape the ampersand". To escape an ampersand is to make it a character entity: as a first step I will replace all occurrences of & with &amp;. I found 13 occurrences.
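For anyone who would rather script that substitution than do it by hand, here is a minimal Python sketch. The function name and the regular expression are my own illustration, not part of any validator tool. It assumes that an ampersand already followed by something like "amp;" or "#38;" is a character entity and leaves it alone, and it does not try to skip script or style blocks, so check the result before you trust it.

```python
import re

# Match "&" only when it does not already start a character reference
# such as &amp;, &#38; or &#x26;.
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)

def escape_bare_ampersands(html):
    """Return (fixed_text, number_of_ampersands_escaped)."""
    return BARE_AMP.subn("&amp;", html)

fixed, count = escape_bare_ampersands("<td>S&H is free &amp; fast</td>")
print(fixed)   # the bare "&" in S&H is escaped, the existing &amp; is untouched
print(count)
```

The count that comes back is a handy cross-check against the number of ampersand errors the validator reported.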
The next error shown is a bit trickier:
The page uses XHTML syntax, including unpaired tags in the form <foo />, but since the document itself is declared as HTML 4.01 Transitional (it says so in the first line, as all HTML documents should indicate what format they use), the " />" is overkill and is confusing the validator. Easily fixed: a global substitution of " />" with ">".
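Any text editor can do that global substitution, but as a rough sketch it looks like this in Python. The function name is hypothetical, and the pattern blindly rewrites every "/>" it finds, including any that happen to sit inside visible text or an attribute value, so it assumes the sequence only ever appears at the end of a tag.

```python
import re

# Optional whitespace followed by "/>" at the end of an XHTML-style tag.
SELF_CLOSE = re.compile(r"\s*/>")

def strip_xhtml_self_closing(html):
    """Turn <br /> into <br>, <meta ... /> into <meta ...>, and so on."""
    return SELF_CLOSE.sub(">", html)

print(strip_xhtml_self_closing('<meta name="robots" content="all" /><br />'))
```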
The next error is interesting:
Read the associated small print and you should have an "a ha!" moment:
"One common cause for this error is the use of XHTML syntax in HTML documents. Due to HTML's rules of implicitly closed elements, this error can create cascading effects. For instance, using XHTML's "self-closing" tags for "meta" and "link" in the "head" section of a HTML document may cause the parser to infer the end of the "head" section and the beginning of the "body" section (where "link" and "meta" are not allowed; hence the reported error)."
It might be that the XHTML closing tag sequences " />" cause this, but looking at the source, there's also a mess of meta tag, style, meta tag, style, in the head of the document. Since the first two meta tags weren't flagged as being illegal or in the wrong place, we'll reorganize the document so that every meta tag appears before the first style block too.
Let's save this and rerun the validator just to see what it reports: 160 Errors, 7 warning(s). That's a bit better. Time to look a bit more closely at the errors again:
Look closely and you'll see that this is indeed an HTML error. The sequence here is <a><font><u>word</u></a></font> but that's wrong. You need to close things in the reverse order to how you open them, so the last two close tags are swapped and the sequence should end </u></font></a>. Easily fixed, but it's a pervasive error so it'll be tedious to find every occurrence...
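Hunting for every out-of-order close tag by eye is tedious, so here is a small sketch that does the looking with Python's standard html.parser module and a stack of open tags. The class and function names are made up for this example. It is deliberately strict: HTML lets you omit some close tags (a </p> or </td>, for instance), and this checker will flag those too, so treat its output as a list of places to inspect rather than a verdict.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML 4.01.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}

class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, innermost last
        self.problems = []   # (line number, message) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Treat <foo /> as a plain open tag, the way an HTML parser would.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        line, _ = self.getpos()
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        elif tag in self.stack:
            self.problems.append((line, f"</{tag}> closed before </{self.stack[-1]}>"))
            # Drop the nearest matching open tag so later checks stay in sync.
            idx = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[idx]
        else:
            self.problems.append((line, f"</{tag}> has no matching open tag"))

def find_misnested(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems

for line, message in find_misnested('<a href="#"><font><u>word</u></a></font>'):
    print(line, message)
```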
The next error is pretty straightforward: if you're going to write good HTML code, then every single IMG tag you have should include an ALT attribute. Since there are a ton of these, a fast solution is to add alt="" to each and every img tag, then delete the dupes if we bump into them on the next validator run. On the other hand, if most of the IMG tags already have an alt attribute, then that approach just makes more work, and the fix might be better done by hand.
Here's an interesting error: in HTML 4.01 Transitional there is no "<nobr>" tag. Solution? Delete them all. There's no container in HTML I'm aware of that stops the browser from using white space as a line break as needed. The way you can do that is to replace each space with &nbsp; (a non-breaking space), but that's more work than I'm going to get into for this particular task.
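A middle path between the blanket edit and doing it by hand is to add the empty alt only where one is missing, which avoids creating duplicates in the first place. The sketch below is one way to do that in Python. The names are hypothetical, and the regular expressions assume no attribute value contains a ">" character or the text " alt=", so spot-check the output.

```python
import re

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
HAS_ALT = re.compile(r"\salt\s*=", re.IGNORECASE)

def add_missing_alt(html):
    """Insert alt="" into every img tag that has no alt attribute yet."""
    def fix(match):
        tag = match.group(0)
        if HAS_ALT.search(tag):
            return tag                        # already has alt, leave as-is
        return tag[:4] + ' alt=""' + tag[4:]  # insert right after "<img"
    return IMG_TAG.sub(fix, html)

print(add_missing_alt('<img src="a.gif"><IMG SRC="b.gif" ALT="Logo">'))
```

An empty alt is enough to satisfy the validator, though a short description is better for any image that actually conveys something.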
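If you do want to keep the no-wrap behavior rather than just deleting the tags, the space-to-&nbsp; swap can be scripted. This is only a sketch with names of my own choosing: it assumes nobr elements are not nested inside each other and are written without attributes, and it leaves any tags inside the nobr untouched so their attributes don't get mangled.

```python
import re

NOBR = re.compile(r"<nobr>(.*?)</nobr>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"(<[^>]*>)")

def replace_nobr(html):
    """Remove <nobr> wrappers and make the spaces inside them non-breaking."""
    def fix(match):
        # Splitting on a captured group alternates text, tag, text, tag, ...
        parts = TAG.split(match.group(1))
        return "".join(
            part if i % 2 else part.replace(" ", "&nbsp;")
            for i, part in enumerate(parts)
        )
    return NOBR.sub(fix, html)

print(replace_nobr('<nobr>Free S&amp;H today</nobr>'))
```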
Look at that particular line and it is indeed wrong: <tr> </tr> is not valid HTML: there needs to be a <td> </td> container within. I add that and this error goes away.
There are a lot more sloppy HTML coding errors, including more than one occurrence of width= attributes in a tag, a border attribute in an hr tag (where none exists), and so on. They're tedious, but fixable...
This is an interesting error and a common point of confusion with CSS. Here's the key: a given "id" value can appear only once in a document, but the same "class" can appear as many times as you like. More likely than not, every time you reuse an "id" you really mean to use "class", so I'll tweak this appropriately.
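To find out which id values are repeated before you start switching them to classes, a few lines of Python with the standard html.parser module will list them. Again, this is a sketch with hypothetical names, not something the validator provides.

```python
from collections import Counter
from html.parser import HTMLParser

class IdCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.ids = Counter()   # id value -> number of times it appears

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1

def duplicate_ids(html):
    """Return {id_value: count} for every id used more than once."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return {value: n for value, n in collector.ids.items() if n > 1}

print(duplicate_ids('<div id="box"></div><p id="box"></p><span id="solo"></span>'))
```

Remember that if you change an id to a class in the HTML, the matching stylesheet rule has to change from "#name" to ".name" as well.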
... much time passes, and finally ...
Soooooo close! Let's see what that final error actually is:
Ah, we can deal with that. It's an illegal attribute value within a "td" tag. Instead of "center", we need to use "middle" here. I do that, resubmit the newly fixed file, and:
Phew! That's a lot of work, I have to say, but it's worth it. Now we can add a "Valid HTML 4.01 Transitional" badge to the page, if desired, but, more importantly, we're now matching what Google wants to see on our home page.
Moral of this story: don't be sloppy, especially on nested open/close sequences. If you open a b c d then you need to close d c b a, not in some other random order.
Good luck with your own validation efforts. It's a pain, no question, but it's worth it.
About the Author: Dave Taylor has been involved with the Internet since 1980 and is internationally known as an expert on both business and technology issues. Holder of an MSEd and MBA, author of twenty books and founder of four startups, he also runs a strategic marketing company and consults with firms seeking the best approach to working with weblogs and social networks. Dave is an award-winning speaker and frequent guest on radio and podcast programs. AskDaveTaylor.com http://www.intuitive.com/blog/