Class HtmlTokenizer

Namespace
SimpleSign.HtmlToPdf.Parsing
Assembly
SimpleSign.HtmlToPdf.dll

Tolerant HTML parser that produces a lightweight DOM tree. It handles well-formed document HTML (headings, paragraphs, tables, lists, images). It is not a full HTML5 parser: it is designed for structured documents, not arbitrary web pages.

public static class HtmlTokenizer
Inheritance
HtmlTokenizer

Remarks

Architecture: a single-pass, character-by-character parser that uses no regular expressions. It supports self-closing tags, quoted and unquoted attributes, entities, comments, DOCTYPE declarations, implicit closing of p and li elements, and whitespace normalization (except inside pre).
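
To make the single-pass, regex-free approach concrete, here is a minimal sketch of a character-by-character tokenizer. It is not the library's implementation: SketchTokenizer, SketchTokenKind, and Demo are hypothetical names invented for illustration, and the sketch covers only tags, text runs, quoted attribute values, comments, and DOCTYPE skipping. It does not decode entities, close p/li implicitly, or normalize whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical types for illustration only; not part of SimpleSign.HtmlToPdf.
public enum SketchTokenKind { OpenTag, CloseTag, SelfClosingTag, Text }

public static class SketchTokenizer
{
    // Walks the input once, left to right, without regular expressions.
    public static List<(SketchTokenKind Kind, string Value)> Tokenize(string html)
    {
        var tokens = new List<(SketchTokenKind Kind, string Value)>();
        var text = new StringBuilder();
        int i = 0;

        while (i < html.Length)
        {
            if (html[i] != '<') { text.Append(html[i]); i++; continue; }

            // A '<' ends any pending text run.
            Flush(text, tokens);

            // Skip comments of the form <!-- ... -->.
            if (string.CompareOrdinal(html, i, "<!--", 0, 4) == 0)
            {
                int end = html.IndexOf("-->", i + 4, StringComparison.Ordinal);
                i = end < 0 ? html.Length : end + 3;
                continue;
            }

            // Scan to the closing '>', ignoring any '>' inside a quoted value.
            int start = ++i;
            char quote = '\0';
            while (i < html.Length)
            {
                char t = html[i];
                if (quote != '\0') { if (t == quote) quote = '\0'; }
                else if (t == '"' || t == '\'') quote = t;
                else if (t == '>') break;
                i++;
            }
            string inner = html.Substring(start, i - start).Trim();
            if (i < html.Length) i++; // consume '>'

            // Empty tags, DOCTYPE, and processing instructions produce no token.
            if (inner.Length == 0 || inner[0] == '!' || inner[0] == '?') continue;

            if (inner[0] == '/')
                tokens.Add((SketchTokenKind.CloseTag, ReadName(inner, 1)));
            else if (inner[inner.Length - 1] == '/')
                tokens.Add((SketchTokenKind.SelfClosingTag, ReadName(inner, 0)));
            else
                tokens.Add((SketchTokenKind.OpenTag, ReadName(inner, 0)));
        }

        Flush(text, tokens);
        return tokens;
    }

    // Emits the buffered text as a token unless it is only whitespace.
    private static void Flush(StringBuilder text, List<(SketchTokenKind Kind, string Value)> tokens)
    {
        string value = text.ToString();
        text.Clear();
        if (!string.IsNullOrWhiteSpace(value)) tokens.Add((SketchTokenKind.Text, value));
    }

    // Reads a lower-cased tag name starting at the given offset.
    private static string ReadName(string inner, int start)
    {
        int end = start;
        while (end < inner.Length && !char.IsWhiteSpace(inner[end]) && inner[end] != '/') end++;
        return inner.Substring(start, end - start).ToLowerInvariant();
    }
}

public static class Demo
{
    public static void Main()
    {
        string html = "<!DOCTYPE html><h1 class=\"a>b\">Title</h1><br/><!-- note --><p>Body";
        foreach (var token in SketchTokenizer.Tokenize(html))
            Console.WriteLine($"{token.Kind}: {token.Value}");
    }
}
```

For the sample input, the sketch yields six tokens in order: an open h1, the text Title, a close h1, a self-closing br, an open p, and the text Body. A real parser would go on to build tree nodes from such tokens and apply the implicit-closing and whitespace rules listed above.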

Methods

Parse(string)

Parses an HTML string into a DOM tree.

public static HtmlNode Parse(string html)

Parameters

html string

The HTML content to parse.

Returns

HtmlNode

The root node of the parsed DOM tree.
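
A minimal usage sketch follows. It assumes the project references SimpleSign.HtmlToPdf.dll and that HtmlNode lives in the same SimpleSign.HtmlToPdf.Parsing namespace; neither is confirmed on this page. ParseUsageSketch is a hypothetical name. Because HtmlNode's members are not documented here, the sketch only calls Parse and checks the returned reference.

```csharp
using System;
using SimpleSign.HtmlToPdf.Parsing;

public static class ParseUsageSketch
{
    public static void Main()
    {
        // Sample input limited to element types this page lists as supported;
        // the li elements are left unclosed to exercise implicit closing.
        string html = "<h1>Report</h1><p>First paragraph</p><ul><li>One<li>Two</ul>";

        // Signature taken from this page: public static HtmlNode Parse(string html)
        HtmlNode root = HtmlTokenizer.Parse(html);

        // How the tree is traversed depends on HtmlNode's API, which is not shown here.
        Console.WriteLine(root != null ? "Parsed a root node." : "Parse returned null.");
    }
}
```

How Parse treats a null or empty string is not stated on this page, so callers may want to guard the input until that behavior is confirmed.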