Html Tidy fix for anchor tag - already defined in entity, line…

I was having some problems with an XHTML document that contained a named anchor tag. The document was being cleaned up by HTML Tidy before being passed to PHP DOM’s loadHTML() method.

The document was getting parsed perfectly by DOM until I added an anchor:

<a name="MyAnchor" id="MyAnchor"></a>

After adding it to the document, DOM threw a warning along the lines of:
DOMDocument::loadHTML(): ID MyAnchor already defined in Entity, line: 349

I isolated the issue and discovered that PHP DOM’s loadHTML() method does not like an anchor tag whose name and id attributes have the same value (at least in PHP version 5.2.9).

I found that removing one of those attributes from the anchor element made DOM happy, and the error message disappeared.

This was OK:

<a name="MyAnchor"></a>
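
To check the two cases outside the full pipeline, something like the following minimal sketch can help. It assumes a reasonably recent PHP (7.1 or later, for the type hints) with the DOM extension; loadHtmlMessages() is a hypothetical helper name, not part of PHP DOM. Whether the "already defined" message actually shows up depends on the PHP and libxml2 versions in use, so the script only prints whatever the parser reports.

```php
<?php
// Hypothetical helper: returns the libxml messages raised while
// DOMDocument::loadHTML() parses the given markup.
function loadHtmlMessages(string $html): array
{
    // Buffer libxml errors instead of letting PHP emit them as warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
    }

    // Restore the previous error handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

$both     = '<html><body><a name="MyAnchor" id="MyAnchor"></a></body></html>';
$nameOnly = '<html><body><a name="MyAnchor"></a></body></html>';

// Print what the parser reports for each variant.
print_r(loadHtmlMessages($both));
print_r(loadHtmlMessages($nameOnly));
```

Comparing the two printed lists makes it easy to see whether the duplicate name/id pair is really what DOM is complaining about.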

However, when I added HTML Tidy back into the mix, the error re-appeared.

HTML Tidy was ‘fixing’ the tag by adding the missing id attribute before the document was loaded into DOM.

Luckily there is a config option for Tidy that can suppress this behaviour during tidy::parseFile (and, I assume, other parsing routines).

It’s called anchor-as-name. This option controls the deletion or addition of the name attribute in elements where it can serve as an anchor. When it is set to false or “no”, any existing name attribute is removed if an id attribute exists or has been added.
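
As a rough sketch of how that option might be passed from PHP, assuming the tidy extension is installed and linked against a Tidy build recent enough to know anchor-as-name: the helper name tidyWithAnchorAsNameOff() is hypothetical, and the example uses parseString() rather than parseFile() only so that it is self-contained.

```php
<?php
// Hypothetical helper: runs markup through Tidy with anchor-as-name
// switched off. Returns null when the tidy extension is not loaded.
function tidyWithAnchorAsNameOff(string $html): ?string
{
    if (!extension_loaded('tidy')) {
        return null;
    }

    $config = [
        'output-xhtml'   => true,
        // Assumption: the linked Tidy library recognises this option;
        // older builds may not.
        'anchor-as-name' => false,
    ];

    $tidy = new tidy();
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();

    return tidy_get_output($tidy);
}

$clean = tidyWithAnchorAsNameOff('<a name="MyAnchor" id="MyAnchor"></a>');
if ($clean !== null) {
    echo $clean;
}
```

The cleaned markup can then be handed to loadHTML() as before.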

Unfortunately, my version of Tidy is a bit old, so I need to either upgrade to take advantage of that feature or resort to writing a little function that re-parses the document myself and fixes the ‘anchor tag’ issue before handing it to DOM.
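
A ‘little function’ of that kind could look roughly like the sketch below. This is not something I ran in the original setup; it is a regex-based illustration (with the usual caveats about regexes and HTML), and dropDuplicateAnchorIds() is a hypothetical name. It removes the id attribute from an anchor only when its value is identical to the name attribute on the same tag.

```php
<?php
// Hypothetical helper: strips id="X" from <a ...> tags that also
// carry name="X" with the same value. Other tags are left alone.
function dropDuplicateAnchorIds(string $html): string
{
    return preg_replace_callback(
        '/<a\s[^>]*>/i', // each opening anchor tag that has attributes
        function (array $match): string {
            $tag = $match[0];

            // Find the name attribute; without one there is nothing to do.
            if (!preg_match('/\sname\s*=\s*(["\'])(.*?)\1/i', $tag, $name)) {
                return $tag;
            }

            // Remove an id attribute whose value equals the name value.
            $value = preg_quote($name[2], '/');
            $pattern = '/\s(?i:id)\s*=\s*(["\'])' . $value . '\1/';

            return preg_replace($pattern, '', $tag) ?? $tag;
        },
        $html
    ) ?? $html;
}

echo dropDuplicateAnchorIds('<a name="MyAnchor" id="MyAnchor"></a>');
// prints: <a name="MyAnchor"></a>
```

It would be called on Tidy’s output, just before the string is passed to loadHTML().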

As a really quick and evil fix, I simply specified different values for each of the attributes:

<a name="MyAnchor" id="MyAnchorSuppressingTidy"></a>

So, there we have it.

I just wrote this up because it was one of those issues that can be quite pesky to debug. So, when an anchor tag in HTML breaks DOM after a pass through Tidy, have a play around with the values of id and name, just to see if this might be the cause.

I found the advice in Chapter 23 of Steve McConnell’s book ‘Code Complete’ to be most useful during this debugging episode. That book is worth its weight in gold.

One Response to “Html Tidy fix for anchor tag - already defined in entity, line…”

  1. katsuo11 Says:

    Hi,

    Thank you for this post.
    I just walked through the same problem and this post considerably saved me some debugging time.

    Much appreciated.
