- BOM fast path: UTF-8 with BOM now reports `encoding="utf_8_sig"` (aligned with `codecs` / `CandidateSelector`), not `utf_8`.
- Streaming: In-band HTML/XML hints are probed on every `feed()` until a hint is found (scan limited to the first 4 KB inside `_probe_inband_hint`); shared regex patterns are imported from `hints.py`; `codecs` is imported at module level.
- `detect_multi`: Adjacent same-encoding merges no longer re-run `from_bytes` on the full merged span; `byte_count` on the shared `DetectionResult` is updated with `dataclasses.replace`.
- `repair_bytes`: Explicit `max_iterations` and `chains` parameters instead of `**kwargs: object`.
- Coverage: `fail_under` raised from 50 to 75 (current suite ~75%; 85% remains a stretch goal with more `api` / fingerprint tests).
- `DocumentSegment` / `MultiEncodingResult` now use `slots` on Python 3.10+, like other dataclasses in the package.
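
The BOM and merge changes above can be illustrated with a minimal standard-library sketch. It is not the package's implementation: `SketchResult`, `bom_fast_path`, and `merge_adjacent` are hypothetical names introduced here, and only `codecs.BOM_UTF8`, the `utf_8_sig` codec, and `dataclasses.replace` are taken from the standard library as-is.

```python
import codecs
import sys
from dataclasses import dataclass, replace
from typing import Optional

# `slots=True` is only accepted by @dataclass on Python 3.10+,
# so it is passed conditionally.
_SLOTS = {"slots": True} if sys.version_info >= (3, 10) else {}


@dataclass(frozen=True, **_SLOTS)
class SketchResult:
    """Hypothetical stand-in for the package's DetectionResult."""

    encoding: str
    byte_count: int


def bom_fast_path(data: bytes) -> Optional[SketchResult]:
    """Hypothetical helper: report utf_8_sig when a UTF-8 BOM is present."""
    if data.startswith(codecs.BOM_UTF8):
        return SketchResult(encoding="utf_8_sig", byte_count=len(data))
    return None


def merge_adjacent(left: SketchResult, right: SketchResult) -> SketchResult:
    """Merge two same-encoding neighbours without re-running detection."""
    if left.encoding != right.encoding:
        raise ValueError("only same-encoding segments can be merged")
    # Copy the result and change only byte_count.
    return replace(left, byte_count=left.byte_count + right.byte_count)


payload = codecs.BOM_UTF8 + "héllo".encode("utf-8")
result = bom_fast_path(payload)

text_sig = payload.decode("utf_8_sig")  # the BOM is stripped
text_plain = payload.decode("utf_8")    # the BOM survives as U+FEFF

merged = merge_adjacent(result, SketchResult("utf_8_sig", 10))
```

Reporting `utf_8_sig` rather than `utf_8` matters to callers that decode with the reported name: only the former removes the leading U+FEFF from the decoded text.
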
- `examples/` scripts (basic detection, `detect()`, streaming, repair, HTTP hints, multi-encoding)
- MkDocs documentation site and GitHub Actions workflow Docs Pages → GitHub Pages
- `[project.optional-dependencies]` group `docs` (`mkdocs`, `mkdocs-material`, `pymdown-extensions`)
- `project.urls.Documentation` in `pyproject.toml`
- README logo on PyPI: use `raw.githubusercontent.com` URL (sdist has no `assets/` for the image)
- Source distribution: include `LICENSE` in maturin `include` so PyPI accepts `License-File` metadata
First published release.
- `from_bytes()`, `from_path()`, `from_fp()`, `is_binary()` API
- `detect()` chardet / charset-normalizer drop-in compatibility
- `detect_stream()` for iterator-based streaming detection
- `StreamDetector` for incremental detection; `snapshot()`, `is_stable`, auto-stop when encoding is stable (configurable)
- In-band hints from HTML meta tags and XML declarations in `StreamDetector`
- Byte-distribution fingerprinting with pre-computed lookup table
- `repair()` and `repair_bytes()` for mojibake detection and repair; `is_mojibake()`; `RepairResult`
- `hint_from_http_headers()`, `hint_from_content()`, `best_hint()` for standalone hint extraction
- `detect_multi()` for multi-encoding documents; `MultiEncodingResult` and `DocumentSegment`
- Optional Rust core extension (`pip install "bytesense[fast]"`)
- CLI: `bytesense <file>`
- `py.typed` marker for PEP 561 compliance
- Full type annotations, mypy strict mode
- GitHub Actions CI across 3 OS × multiple Python versions
- `LANGUAGE_ENCODINGS` aligned with languages in `CHAR_FREQUENCIES`
- Candidate shortlist tuned for speed while preserving accuracy targets
- Benchmark and packaging fixes (e.g. UTF-8 BOM test fixture, build backend configuration)
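
As a rough illustration of the in-band hint feature listed above, the sketch below scans the first 4 KB of a stream for an XML declaration or an HTML `<meta>` charset and normalises the label with `codecs.lookup`. The regex patterns, `probe_inband_hint`, and `HintProbe` are assumptions made for this example; the package's own patterns and `StreamDetector` internals may differ.

```python
import codecs
import re
from typing import Optional

PROBE_LIMIT = 4096  # only the first 4 KB are scanned

# Hypothetical patterns, not the ones shipped with the package.
# The lookahead requires a delimiter after the label, so a label cut
# off at a chunk boundary is not accepted prematurely.
_LABEL = rb"([A-Za-z0-9._:-]+)(?=[\"'\s/>;])"
_XML_DECL = re.compile(rb"<\?xml[^>]*?encoding\s*=\s*[\"']" + _LABEL, re.I)
_META_CHARSET = re.compile(rb"<meta[^>]*?charset\s*=\s*[\"']?\s*" + _LABEL, re.I)


def probe_inband_hint(data: bytes) -> Optional[str]:
    """Return a normalised codec name declared in the head of `data`."""
    head = data[:PROBE_LIMIT]
    for pattern in (_XML_DECL, _META_CHARSET):
        match = pattern.search(head)
        if match is None:
            continue
        try:
            # codecs.lookup resolves aliases such as "windows-1252".
            return codecs.lookup(match.group(1).decode("ascii")).name
        except LookupError:
            continue  # unknown label: ignore it
    return None


class HintProbe:
    """Hypothetical helper: probe on every feed() until a hint is found."""

    def __init__(self) -> None:
        self._head = bytearray()
        self.hint: Optional[str] = None

    def feed(self, chunk: bytes) -> None:
        # Stop once a hint exists or the 4 KB window has been scanned.
        if self.hint is not None or len(self._head) >= PROBE_LIMIT:
            return
        self._head += chunk[: PROBE_LIMIT - len(self._head)]
        self.hint = probe_inband_hint(bytes(self._head))


probe = HintProbe()
probe.feed(b"<html><head><me")       # declaration not complete yet
hint_after_first = probe.hint
probe.feed(b'ta charset="windows-1252"></head>')
hint_after_second = probe.hint
```

Buffering only the head keeps memory bounded, and re-probing on each `feed()` lets a declaration that straddles two chunks still be found.
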