ftfy ("fixes text for you") repairs mojibake and other Unicode glitches after the fact — the garbled text you get when bytes were decoded with the wrong encoding, when HTML entities were left unescaped, when curly quotes and ligatures crept in, and so on.
import { fix_text } from "@photostructure/ftfy";
fix_text("✔ No problems"); // → "✔ No problems"
fix_text("¯\\_(ã\x83\x84)_/¯"); // → "¯\_(ツ)_/¯"
fix_text("été"); // → "été"

This package is a straight, faithful TypeScript port of python-ftfy (v6.3.1, Apache-2.0) by Robyn Speer. All of the cleverness (the encoding-detection heuristics, the badness model, the sloppy codecs, the CESU-8/Java UTF-8 variants) is theirs.
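For readers new to the term, the simplest kind of mojibake can be reproduced in a few lines. The sketch below is not how ftfy works internally (the library uses heuristics to decide which mix-up occurred, if any); it only shows one known mix-up, UTF-8 bytes read as Latin-1, and its manual reversal. The helper names `decodeLatin1` and `encodeLatin1` are made up for this illustration and are not part of the package.

```typescript
// Hypothetical helper: decode bytes as Latin-1, where every byte maps to the
// code point with the same numeric value.
function decodeLatin1(bytes: Uint8Array): string {
  let out = "";
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

// Hypothetical helper: encode a string as Latin-1. Throws if any UTF-16 code
// unit does not fit in a single byte.
function encodeLatin1(text: string): Uint8Array {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0xff) throw new RangeError(`not Latin-1 at index ${i}`);
    bytes[i] = code;
  }
  return bytes;
}

const original = "\u00e9t\u00e9"; // "été"

// Wrong decode: the UTF-8 bytes C3 A9 74 C3 A9 are read one byte per character.
const garbled = decodeLatin1(new TextEncoder().encode(original));

// Manual repair: turn the characters back into bytes, then decode as UTF-8.
const repaired = new TextDecoder("utf-8").decode(encodeLatin1(garbled));

console.log(garbled); // "Ã©tÃ©"
console.log(repaired); // "été"
```

The manual repair only works here because the mix-up is known in advance. Text of unknown origin has no such guarantee, which is the gap the heuristics are meant to fill.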
The guiding rule of this port is parity, not innovation: it mirrors the Python library module-for-module and function-for-function, and is validated against python-ftfy's own test suite. Where you see something that looks unusual, it almost certainly encodes a Python edge-case we are deliberately preserving. If you find a behavioral difference from python-ftfy that isn't documented as an intentional divergence, that's a bug — please report it.
- explain_unicode is async. Node has no built-in Unicode name database, so the names table is lazy-loaded on first call (await explain_unicode(...)). This keeps it out of the heap entirely for the common case where you only call fix_text.
- Length semantics. JavaScript strings are UTF-16 code units; Python strings are codepoints. This only affects the maximum-segment-length cap in fix_text, never the fix output.
- Zero runtime dependencies. CPython's html.unescape and the wcwidth package are ported in-tree, and all lookup tables are embedded. (Existing npm equivalents diverge from CPython on edge cases the test suite pins.)
- Dual ESM + CJS, with full TypeScript types. Requires Node.js ≥ 22.
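
The length-semantics point can be seen directly in plain TypeScript. This is a minimal sketch of the underlying language difference only; it does not show how the port measures segments internally, and `countCodePoints` is a hypothetical helper, not part of the package's API.

```typescript
// Hypothetical helper: count Unicode code points, which is what Python's
// len() reports for a str. String iteration in JavaScript walks code points,
// so a surrogate pair is counted once.
function countCodePoints(text: string): number {
  let count = 0;
  for (const _ of text) count++;
  return count;
}

// One ASCII letter followed by U+1F600, which lies outside the Basic
// Multilingual Plane and so needs two UTF-16 code units.
const sample = "a\u{1F600}";

const codeUnits = sample.length; // what JavaScript's .length reports
const codePoints = countCodePoints(sample); // what Python's len() would report

console.log(codeUnits); // 3
console.log(codePoints); // 2
```

The two counts differ only for text containing characters outside the Basic Multilingual Plane, such as most emoji.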
npm install @photostructure/ftfy

The public API mirrors python-ftfy, including its snake_case names:
import {
fix_text, // the main entry point
fix_and_explain, // → { text, explanation } so you can see/replay each step
fix_encoding, // just the encoding-repair step
apply_plan, // replay an explanation onto the original text
guess_bytes, // best-effort decode of unknown bytes
fix_file, // fix a stream/file line by line
TextFixerConfig, // per-call options (snake_case keys)
__version__, // "6.3.1" — the upstream version this mirrors
} from "@photostructure/ftfy";

ftfy somefile.txt # fix a file, write to stdout
ftfy -g somefile.txt # guess the encoding
ftfy -e sloppy-windows-1252 in.txt -o out.txt
cat somefile.txt | ftfy # read from stdin

This package follows its own semver line, independent of
python-ftfy's version numbers. The npm version describes the stability of this
package's API; the exported __version__ separately records the upstream python-ftfy
release this port mirrors (currently 6.3.1). The two move independently — a bugfix
release here bumps the npm version without changing __version__.

- 0.x while the port is still reaching full parity with python-ftfy's test suite.
- 1.0.0 once parity is complete and the public API is stable.
- After that, ordinary semver: patch for fixes, minor for additive API, major for breaking changes, regardless of which version upstream is on.

Releases are cut entirely by CI (.github/workflows/build.yml
via workflow_dispatch); the version field in package.json is never edited by hand.
This is an in-progress, parity-driven port. Correctness is defined by python-ftfy's test suite, which is ported alongside the implementation. See docs/DESIGN.md for the technical decisions and CLAUDE.md for the architecture.
This is a derivative work of python-ftfy by Robyn Speer, used and redistributed under the Apache License 2.0. See LICENSE and NOTICE. The TypeScript port is maintained by PhotoStructure; the upstream design and algorithms are credited to the original author.