How HTML entity encoding works
An HTML entity is a character reference that the browser parses back into a single character. The five reserved HTML characters (<, >, &, ", ') always need encoding when text is rendered as HTML; everything else is optional and depends on the document encoding.
- Pick a mode and scope. Encode mode walks your input character by character. Decode mode walks the input looking for entity patterns. The scope toggle decides whether only the five reserved characters get encoded, or whether every non-ASCII code point is also rewritten.
- Pick an entity style. Named entities (`&copy;`) read well in source. Decimal references (`&#169;`) and hex references (`&#xA9;`) carry every Unicode code point without needing a name. Older email clients and XML parsers prefer the numeric forms.
- Walk the input. On encode, we read each code point and look it up against a built-in table of about 200 common named entities. Misses fall back to numeric. On decode, we scan with a single regex that matches `&name;`, `&#NNN;`, and `&#xHH;` in one pass.
- Map to characters. Named matches resolve through a reverse table. Numeric matches are parsed as base 10 or base 16 and passed to `String.fromCodePoint`. Unknown named entities are left untouched so partial input round-trips without loss.
- Live mode. Toggle live mode and every keystroke re-runs the conversion with a 150 ms debounce. Helpful when you are tweaking a snippet and want immediate feedback before pasting it into a template.
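The encode walk described above can be sketched roughly as follows. This is an illustration, not the tool's actual source: the names `encodeEntities`, `numericRef`, `NAMED`, and `RESERVED` are hypothetical, the table is a tiny stand-in for the roughly 200-entry one, and falling back to decimal when a name is missing is an assumption.

```typescript
// Hypothetical sketch of the encode walk; all names and table contents are assumptions.
type Style = "named" | "decimal" | "hex";
type Scope = "minimal" | "all";

// Tiny stand-in for the built-in named-entity table.
const NAMED: Record<string, string> = {
  "<": "lt",
  ">": "gt",
  "&": "amp",
  '"': "quot",
  "'": "apos",
  "\u00A9": "copy",
  "\u00E9": "eacute",
};

// The five reserved characters that are always encoded.
const RESERVED = new Set(["<", ">", "&", '"', "'"]);

// Build a numeric reference; anything that is not "hex" is written in decimal.
function numericRef(codePoint: number, style: Style): string {
  return style === "hex"
    ? "&#x" + codePoint.toString(16).toUpperCase() + ";"
    : "&#" + codePoint.toString(10) + ";";
}

function encodeEntities(input: string, style: Style, scope: Scope): string {
  let out = "";
  // Array.from splits by code point, so characters outside the BMP stay whole.
  for (const ch of Array.from(input)) {
    const cp = ch.codePointAt(0) as number;
    const mustEncode = RESERVED.has(ch) || (scope === "all" && cp > 0x7f);
    if (!mustEncode) {
      out += ch;
      continue;
    }
    const name = style === "named" ? NAMED[ch] : undefined;
    // A miss in the named table falls back to a numeric reference.
    out += name !== undefined ? "&" + name + ";" : numericRef(cp, style);
  }
  return out;
}
```

With this stand-in table, `encodeEntities("<b>", "hex", "minimal")` yields `&#x3C;b&#x3E;`, and an emoji under the `"all"` scope falls back to a decimal reference because it has no entry in the table.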
Why encode HTML entities
- Stop user input from breaking layout. When a user types a stray `<` into a comment box, dropping that text straight into HTML rewrites the rest of the page. Encoding the reserved characters first means the browser renders the character instead of parsing it as the start of a tag.
- Keep attribute values valid. Embedding a quoted string inside an HTML attribute needs the embedded quote replaced with `&quot;` (for double-quoted attrs) or `&#39;` (for single-quoted). Otherwise the parser closes the attribute early and the rest of the line becomes stray markup.
- Defuse accidental HTML in stored data. Logs, bug reports, and chat exports often contain real angle brackets and ampersands. Entity-encoding the dump before pasting it into a documentation page keeps that copy visible as text instead of triggering the renderer or the link auto-detector.
- Share code snippets safely. Posting an example tag like `<script>alert(1)</script>` in a blog post, an email, or a Slack message needs the brackets encoded so the snippet displays rather than runs. The same technique covers RSS feed bodies and JSON-LD `description` fields.
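As a small illustration of the first two points, the sketch below escapes the five reserved characters before text is placed in element content and in a double-quoted attribute. The helper name `escapeHtml` and the sample `comment` string are hypothetical, and using `&#39;` for the apostrophe is one common choice rather than the only valid one.

```typescript
// Hypothetical helper: replaces the five reserved characters with entities.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => {
    switch (ch) {
      case "&": return "&amp;";
      case "<": return "&lt;";
      case ">": return "&gt;";
      case '"': return "&quot;";
      default: return "&#39;"; // the only remaining match is the apostrophe
    }
  });
}

// Sample user input containing a stray bracket, quotes, and an ampersand.
const comment = 'I <3 "quotes" & tags';

// The same escaped text is safe in element content and in a quoted attribute.
const escaped = escapeHtml(comment);
const html = '<p title="' + escaped + '">' + escaped + "</p>";
```

Here `escaped` is `I &lt;3 &quot;quotes&quot; &amp; tags`, so the embedded double quotes can no longer close the `title` attribute early.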
Common applications
Entity encoding shows up wherever raw text gets composed into HTML at runtime — even when the framework usually handles it for you, the manual tool is useful for the moments it doesn't.
- Server-rendered templates: Twig and Handlebars auto-escape by default, and Jinja2 and ERB do so when a framework such as Flask or Rails enables it, but raw blocks and `safe` markers turn that off. The codec lets you confirm what the escape would have produced.
- Email and newsletter authoring: many ESP templating engines do not auto-escape merge fields, so smart quotes and copyright glyphs in user-supplied names need pre-encoding.
- Documentation and code samples: pasting an example HTML tag into a Markdown blog post or a static-site snippet needs the brackets encoded so the renderer treats it as visible text.
A worked example
Paste `<script>alert('hi')</script>` into the input with mode set to Encode, style Named, scope Minimal. The output reads `&lt;script&gt;alert(&apos;hi&apos;)&lt;/script&gt;`. Switch style to Numeric hex and the same input produces `&#x3C;script&#x3E;alert(&#x27;hi&#x27;)&#x3C;/script&#x3E;`. Flip mode to Decode, paste the encoded string back in, and the original tag comes back intact.
FAQ
What are HTML entities?
HTML entities are character references the browser substitutes back into single characters when it parses the page. They come in three forms: named (like `&amp;` for &), decimal numeric (`&#38;`), and hex numeric (`&#x26;`). The five reserved HTML characters (<, >, &, ", ') need encoding any time the text gets dropped into HTML. The other roughly 2,225 named entities cover symbols, accents, and Greek letters but are optional once the document encoding is UTF-8.
When should I use named vs numeric entities?
Use named entities when you want the source to read clearly (a human reviewing `&copy;` in a template gets it immediately). Use numeric (decimal or hex) when the consumer is older or stricter: XML parsers, legacy email clients, and some feed readers recognise only a small subset of HTML5 named entities, and they all recognise the numeric forms. Hex tends to win in security-focused contexts because it lines up one-for-one with the Unicode code-point notation used in spec documents.
Does decoding handle hex entities like `&#x26;`?
Yes. The decoder uses a single regex that matches all three entity forms in one pass: &name;, &#NNN;, and &#xHH;. Numeric matches are resolved with String.fromCodePoint using base 10 or base 16. Mixed input (named and numeric in the same string) decodes correctly, and unknown names are left as literal text so partial input round-trips without loss.
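A single-pass decoder along these lines might look like the sketch below. The pattern, the names `decodeEntities`, `ENTITY`, and `NAMED_TO_CHAR`, and the small reverse table are assumptions made for illustration; the tool's real regex and table are not shown in this page. Leaving out-of-range numeric references untouched is also an assumption.

```typescript
// Small stand-in for the reverse table; the real one would be larger.
const NAMED_TO_CHAR: Record<string, string> = {
  lt: "<",
  gt: ">",
  amp: "&",
  quot: '"',
  apos: "'",
  copy: "\u00A9",
};

// One pattern, three alternatives: hex digits, decimal digits, or a name.
const ENTITY = /&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));/g;

function decodeEntities(input: string): string {
  return input.replace(
    ENTITY,
    (match: string, hex?: string, dec?: string, name?: string) => {
      if (name !== undefined) {
        // Unknown names are returned unchanged so partial input round-trips.
        const known = Object.prototype.hasOwnProperty.call(NAMED_TO_CHAR, name);
        return known ? NAMED_TO_CHAR[name] : match;
      }
      const cp = hex !== undefined ? parseInt(hex, 16) : parseInt(dec as string, 10);
      // String.fromCodePoint throws above U+10FFFF, so such input is left as-is.
      return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
    }
  );
}
```

Because `replace` makes one pass over the input, `decodeEntities("&amp;lt;")` returns `&lt;` rather than `<`: the output of one substitution is never rescanned.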
Is this safe for use with untrusted input?
The codec itself is browser-only and does not send your input anywhere. Whether the output is safe to embed depends on context. Entity encoding handles HTML body and quoted attribute-value contexts, which covers the OWASP Rule #1 case. JavaScript contexts (inline event handlers, `<script>` blocks), CSS contexts, and URL contexts each need their own encoding rules, and entity encoding alone is not sufficient there. For defence in depth, pair this with an HTML sanitizer such as DOMPurify or your framework's context-aware auto-escape.
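To make the context point concrete, here is a hedged sketch of a value that passes through two contexts: a URL query parameter inside an `href` attribute. The function names `escapeHtml` and `searchLink` and the `/search` path are hypothetical. The untrusted term is first percent-encoded with the standard `encodeURIComponent`, and only then is the finished URL entity-encoded for the attribute.

```typescript
// Hypothetical helper: entity-encodes the five reserved characters.
function escapeHtml(text: string): string {
  const map: Record<string, string> = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  return text.replace(/[&<>"']/g, (ch) => map[ch]);
}

// Hypothetical example: an untrusted search term used in a link.
function searchLink(term: string): string {
  // URL context: percent-encode the value so it cannot add extra parameters.
  const url = "/search?q=" + encodeURIComponent(term) + "&lang=en";
  // HTML contexts: entity-encode the whole URL for the attribute,
  // and the raw term for the visible link text.
  return '<a href="' + escapeHtml(url) + '">' + escapeHtml(term) + "</a>";
}
```

For the term `a&b "c"` this produces `<a href="/search?q=a%26b%20%22c%22&amp;lang=en">a&amp;b &quot;c&quot;</a>`. Entity encoding alone would have left the `&` in the term free to start a new query parameter once the browser decoded the attribute.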
Browser-side entity encoding sits at the boundary between user input and rendered HTML. Doing the conversion locally means you can sanity-check what your framework would have emitted, without ever sending the original text to a third-party tool.