Html2Text.Net

Just fast HTML -> plain text.

Lightweight, hand rolled, high-performance HTML to plain text conversion for .NET.

Check out the LIVE DEMO!

Use cases
Usage
Install, build, test
How it works
- Goals
Performance notes
Regression tests

Use cases

Search / indexing pipelines: Strip HTML down to text for full-text search, indexing, classification, or deduping.
- Example: convert HTML to text before indexing in Elasticsearch / OpenSearch
Batch processing: Convert large archives of HTML (docs, KB articles, CMS exports) into text efficiently.
Email & notification processing: Get a readable text version of HTML emails for previews, logs, or plain-text fallbacks.
LLM / NLP preprocessing: Normalize HTML into clean text before chunking, embedding, or extraction.
Logging / auditing: Store a text representation of HTML content for review or compliance.

Usage

Simple as possible:

using Html2Text;

string html = "<h1>Hello</h1><p>World</p>";

string text = Html2Text.Convert(html);

Output:

Hello

World

Install, build, test, contribute

Install using NuGet (recommended):

dotnet add package Html2Text.Net

Supported frameworks

.Net 8+
.Net Framework 4.6.2+
.Net Standard 2.0 for compatibility with other frameworks, including .Net 5/6/7

For .Net Framework users, PackageReference style dependencies are recommended. Also ensure binding redirects are enabled.

Contributing

Contributions and pull requests are welcome! With .Net 10 SDK installed, to build locally:

dotnet build

To run unit and regression tests:

(windows): dotnet test
(linux/mac): dotnet test -f net10.0

To run the example console app:

dotnet build
dotnet run --project Html2Text.Example Samples/scottallen.html

How it works

Pipeline

HTML document -> Lexer (tokens) -> Parser (AST nodes) -> Renderer (string text)

Text nodes are emitted in document order.
Basic block separation is preserved (e.g., paragraphs/headings insert newlines).
Whitespace is normalized to produce readable plain text.

Minimal formatting is added to make the plain text output readable in only 4 cases:

HTML tables are given cell separators | and horizontal lines --- under column headers:

| Chart                  | Record Holder     | Record       |
| ---------------------- | ----------------- | ------------ |
| Opening Days           | Avengers: Endgame | $157,461,641 |
| Top Single Day Grosses | Avengers: Endgame | $157,461,641 |

Lists and nested lists are indented and given a leading - like so:

 - 1 Early life
 - 2 Enigma machine
 - 3 Solving the wiring
 - Toggle Solving the wiring subsection
   - 3.1 French help
 - 4 Solving daily settings
 - Toggle Solving daily settings subsection
   - 4.1 Early methods
   - 4.2 Bomba and sheets
   - 4.3 Allies informed

In preformatted areas <pre> whitespace is preserved:

private int GetSmallestNonNegative(int x, int y) {
    return x < 0 && y < 0 ? 0
        : x < 0 ? y
        : y < 0 ? x
        : Math.Min(x, y);
}

The <hr/> element adds a horizontal line of dashes ---.

Goals

This project is focused on:

High performance: designed for low allocations and fast throughput.
Text extraction only: get the words from the page/document.
No dependencies: Lightweight, not an embedded browser engine. No dependencies other than .NET itself.

Non-goals (by design)

The following are intentionally out of scope so the library can excel at the goals above:

Respecting CSS, computed styles, display:none, or visibility.
Pixel-accurate layout, whitespace mirroring, or browser-equivalent rendering.
Executing JavaScript or loading remote resources.

Performance notes

High performance is a goal of this project. This library:

designed for converting many documents quickly (batch processing, indexing, search pipelines).
avoids DOM dependencies.
uses a lightweight, hand rolled lexer/parser/renderer pipeline.

Benchmarks are in Html2Text.PerfTests and can be run locally with:

dotnet run -c Release --project Html2Text.PerfTests

Or check out the latest automated perf test results here: https://pavlosmcg.github.io/Html2Text.Net/dev/bench/

Regression tests

Each file in the Samples/ directory acts as an acceptance/regression test. The results of converting these HTML files to plain text are saved in Html2Text.RegressionTests/*.verified.txt:

Samples/<file-name>.html -> Html2Text.Convert(<file-contents>) -> <file-name>.verified.txt

For example scottallen.html -> scottallen.verified.txt

Html2Text.RegressionTests uses Verify to make test assertions against verified output snapshots. If you need to update the outputs please see the Verify docs for snapshot management.

Projects in this repository

Html2Text/: core library
Html2Text.Example/: small example app
Html2Text.Tests/: unit tests
Html2Text.RegressionTests/: regression/acceptance tests
Html2Text.PerfTests/: performance benchmarking console app
Samples/: sample HTML files used during development and automated regression testing

Distributed under MPL-2.0 see LICENSE.txt

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
.github/workflows		.github/workflows
Html2Text.Example		Html2Text.Example
Html2Text.PerfTests		Html2Text.PerfTests
Html2Text.RegressionTests		Html2Text.RegressionTests
Html2Text.Tests		Html2Text.Tests
Html2Text		Html2Text
Samples		Samples
.gitignore		.gitignore
Html2Text.sln		Html2Text.sln
LICENSE.txt		LICENSE.txt
README.md		README.md
global.json		global.json
icon.png		icon.png
perftests-chart.png		perftests-chart.png
perftests-console.png		perftests-console.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Html2Text.Net

Use cases

Usage

Install, build, test, contribute

Supported frameworks

Contributing

How it works

Pipeline

Goals

Non-goals (by design)

Performance notes

Regression tests

Projects in this repository

About

Uh oh!

Contributors 2

Uh oh!

Languages

License

pavlosmcg/Html2Text.Net

Folders and files

Latest commit

History

Repository files navigation

Html2Text.Net

Use cases

Usage

Install, build, test, contribute

Supported frameworks

Contributing

How it works

Pipeline

Goals

Non-goals (by design)

Performance notes

Regression tests

Projects in this repository

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors 2

Uh oh!

Languages