The HTML parser is based on the HTML 3.2 and 4.0 specifications. It does not check for strict conformance to these specifications. Instead, it is tolerant and accepts the incorrect input found on many HTML pages. The parser builds a syntax tree.
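The tolerant, tree-building behaviour described above can be illustrated with a minimal sketch. It is not this parser's implementation: it uses Python's standard `html.parser` module as a stand-in, and the names `Node`, `TolerantTreeBuilder`, `parse`, and `VOID_TAGS` are hypothetical, chosen only for this example. The recovery rule shown (ignore stray end tags, let an end tag close any elements still open inside it) is one possible strategy, assumed here for illustration.

```python
from html.parser import HTMLParser

# Elements that never take an end tag (a partial list, for illustration only).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One element of the syntax tree."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # child Node objects and text strings, in order


class TolerantTreeBuilder(HTMLParser):
    """Builds a tree without rejecting malformed input."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, together with
        # anything left open inside it. An end tag that matches no open
        # element is silently ignored instead of raising an error.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        # Skip whitespace-only text to keep the example tree small.
        if data.strip():
            self.stack[-1].children.append(data)


def parse(html):
    """Parse a string and return the root Node of the resulting tree."""
    builder = TolerantTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

For example, `parse("<b>bold <i>both</b> text</i>")` accepts the mis-nested tags: `</b>` closes both `<b>` and the `<i>` still open inside it, and the trailing `</i>` is dropped.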
References
- HTML 3.2 Reference Specification, W3C Recommendation, 14-Jan-1997, http://www.w3.org/TR/REC-html32.html
- HTML 4.0 Specification, W3C Recommendation, revised on 24-Apr-1998, http://www.w3.org/TR/1998/REC-html40-19980424/
- HTML 4.01 Specification, W3C Recommendation, 24-Dec-1999, http://www.w3.org/TR/REC-html40/