Class HtmlTokenizer
- Namespace
- SimpleSign.HtmlToPdf.Parsing
- Assembly
- SimpleSign.HtmlToPdf.dll
Tolerant HTML parser that produces a lightweight DOM tree. Handles well-formed document HTML (headings, paragraphs, tables, lists, images). Not a full HTML5 parser -- designed for structured documents, not arbitrary web pages.
public static class HtmlTokenizer
- Inheritance
- object
HtmlTokenizer
- Inherited Members
Remarks
Architecture: a single-pass, character-by-character parser that uses no regular expressions. It supports self-closing tags, quoted and unquoted attributes, entities, comments, DOCTYPE declarations, implicit closing of p and li elements, and whitespace normalization (except inside pre).
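The single-pass approach and the implicit closing of p/li can be illustrated with a much smaller parser. The following is only a sketch of the general technique, not the library's implementation: SketchNode and SketchParser are hypothetical names, attributes are skipped rather than parsed (so a '>' inside an attribute value would confuse it), and entities and whitespace normalization are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical node type for this sketch; it is not the library's HtmlNode.
public sealed class SketchNode
{
    public string Tag = "";      // element name, "#root", or "#text"
    public string Text = "";     // content of a text node
    public List<SketchNode> Children = new List<SketchNode>();
}

public static class SketchParser
{
    static readonly HashSet<string> VoidTags = new HashSet<string> { "br", "hr", "img" };

    public static SketchNode Parse(string html)
    {
        var root = new SketchNode { Tag = "#root" };
        var open = new Stack<SketchNode>();   // currently open elements, innermost on top
        open.Push(root);
        var text = new StringBuilder();
        int i = 0;

        while (i < html.Length)
        {
            if (html[i] != '<') { text.Append(html[i++]); continue; }

            // Find the end of the markup; comments end at "-->", everything else at '>'.
            bool isComment = string.CompareOrdinal(html, i, "<!--", 0, 4) == 0;
            int end = isComment
                ? html.IndexOf("-->", i + 4, StringComparison.Ordinal)
                : html.IndexOf('>', i + 1);
            if (end < 0) { text.Append(html, i, html.Length - i); break; } // unterminated: keep as text

            Flush(text, open.Peek());
            string inner = isComment ? "!" : html.Substring(i + 1, end - i - 1).Trim();
            i = end + (isComment ? 3 : 1);
            if (inner.Length == 0 || inner[0] == '!') continue;   // comment or DOCTYPE: no node

            if (inner[0] == '/')
            {
                // Closing tag: pop back to the nearest matching open element, if there is one.
                string closeName = inner.Substring(1).Trim().ToLowerInvariant();
                bool found = false;
                foreach (var n in open) if (n.Tag == closeName) { found = true; break; }
                while (found && open.Count > 1 && open.Pop().Tag != closeName) { }
                continue;
            }

            bool selfClosing = inner.EndsWith("/", StringComparison.Ordinal);
            int nameEnd = 0;
            while (nameEnd < inner.Length && !char.IsWhiteSpace(inner[nameEnd]) && inner[nameEnd] != '/') nameEnd++;
            string name = inner.Substring(0, nameEnd).ToLowerInvariant();

            // Implicit closing: a new <p> or <li> ends a still-open element of the same kind.
            if ((name == "p" || name == "li") && open.Peek().Tag == name) open.Pop();

            var node = new SketchNode { Tag = name };
            open.Peek().Children.Add(node);
            if (!selfClosing && !VoidTags.Contains(name)) open.Push(node);
        }

        Flush(text, open.Peek());
        return root;
    }

    // Emits the buffered text as a text node unless it is only whitespace.
    static void Flush(StringBuilder text, SketchNode parent)
    {
        if (text.ToString().Trim().Length > 0)
            parent.Children.Add(new SketchNode { Tag = "#text", Text = text.ToString() });
        text.Clear();
    }
}
```

With this sketch, SketchParser.Parse("<ul><li>one<li>two</ul>") produces a ul element with two li children, even though neither li is explicitly closed. A stack of open elements is one common way to get this behavior; the real tokenizer may track open elements differently.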
Methods
Parse(string)
Parses an HTML string into a DOM tree.
public static HtmlNode Parse(string html)
Parameters
- html string
The HTML content to parse.
Returns
- HtmlNode
The root node of the parsed DOM tree.
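A minimal usage sketch, assuming the project references SimpleSign.HtmlToPdf.dll. Only the Parse(string) signature and the HtmlNode return type are taken from this page. The members of HtmlNode are not documented here, so the sketch does not touch them, and ParseExample is a hypothetical class name.

```csharp
using System;
using SimpleSign.HtmlToPdf.Parsing;

class ParseExample
{
    static void Main()
    {
        // Structured document markup, the kind of input the class summary describes.
        string html = "<h1>Invoice</h1><p>Total due: 42 EUR</p>";

        // Parse is a static method on a static class, so no instance is created.
        // 'var' is used because this page does not state which namespace HtmlNode lives in.
        var root = HtmlTokenizer.Parse(html);

        // Whether Parse can return null is not documented here; the check is purely defensive.
        Console.WriteLine(root != null ? "Parsed a DOM tree." : "No tree returned.");
    }
}
```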