Changelog

All notable changes to notepad-cleanup will be documented in this file.

[0.2.4] - 2026-04-10

Fixed

compare now finds new-format nc-* session folders (#14). Since v0.2.2, extract saves sessions using the nc-YYYY-MM-DD__hh-mm-ss naming format, but session discovery still only matched the legacy notepad-cleanup-* pattern. As a result, any nc-* folders in search directories were silently skipped during compare, leading to missed duplicates
find_session_dirs() now iterates both patterns (notepad-cleanup-* and nc-*) and validates each candidate by checking for manifest.json. This rejects false positives like nc-backups, nc-scratch, or any folder starting with nc- that isn't a real extraction
_get_session_dir() helper now matches both formats when walking up from a file

Changed

DEFAULT_SESSION_PATTERN (single) is now DEFAULT_SESSION_PATTERNS (list). The singular constant is kept for backward compatibility
Test helper make_session() now creates a manifest.json marker so test fixtures match the validation logic

Tests

test_find_sessions_both_formats: verifies both old and new formats discovered
test_find_sessions_rejects_false_positives: verifies nc-backups and similar folders without manifest.json are not treated as sessions

[0.2.3] - 2026-03-16

Added

Link-aware organize: organize now creates symlinks in organized/ for dedup-linked files instead of copying data. Preserves the connection network so linked files point back to their canonical provenance root. Fallback chain: symlink -> hardlink -> dazzlelink -> copy
links command with separate and join actions:
- links separate --last: moves symlinked files from organized/ into organized-links/ preserving category structure. Shows only new files
- links join --last: moves them back, restoring the full collection
- Both support --dry-run and --dir-name for custom directory names
Organized link manifest (organized/_organized_links.json): tracks which files in organized/ are symlinks vs copies for reliable detection
Previous session reference in AI prompt: when linked files exist, Claude receives a reference section with category names from previous sessions for naming consistency
load_link_manifest() and get_linked_paths() in dedup.py as shared data layer for link-aware operations

Changed

execute_plan() now accepts linked_paths parameter; checks each file against the dedup link manifest before deciding to copy or symlink
generate_prompt() now accepts linked_paths for reference section
Organize summary shows separate counts for copied vs linked files
Prompt template (organize.md) gains {skip_section} and {reference_section} template variables

[0.2.2] - 2026-03-16

Added

docs/parameters.md: full command reference with all options, flags, and examples
docs/install.md: installation guide (pip, venv, source, Claude CLI)
Backfill script for ghtraf daily history (tests/one-offs/backfill_ghtraf_history.py)

Changed

README slimmed down: moved per-command details to docs/parameters.md, installation details to docs/install.md. Kept How It Works, Output Structure, and Features
Restored tree-style output structure with visual hierarchy indicators

[0.2.1] - 2026-03-16

Added

GitHub Traffic Tracker (ghtraf): badge gists, archive gist, traffic-badges workflow with CI trigger, stats dashboard at docs/stats/
PyPI publishing via Trusted Publisher (OIDC): publish.yml workflow triggers on GitHub Release, builds and uploads automatically
pyproject.toml (modern Python packaging metadata, replaces setup.py as primary)
README badges: PyPI version, Release Date, Installs (via ghtraf endpoint)

Changed

setup.py updated with long_description, project_urls, additional classifiers
README updated with v0.2.0 features, new workflow section, links to docs

[0.2.0] - 2026-03-16

Added

Deduplication system (compare command): detect exact and near-duplicate files across historical extraction sessions before organizing with AI
- Heuristic fuzzy matching with log-quadratic threshold curve (3.5% fit error across anchor points). See docs/fuzzy-matching.md for derivation
- Configurable fuzzy modes: --fuzzy small (default, <50KB), --fuzzy all, --fuzzy "lte 100KB", --no-fuzzy
- Progress bar with per-file and per-candidate detail showing fuzzy pipeline stage ([vs: filename [chk:4]])
- Hash caching for fast repeat scans (mtime + size invalidation)
- Compare results caching (_compare_results.json) with staleness detection
- Historical session indexing: prefers organized/ files over raw window*/ when both exist; only indexes known text file extensions
Filesystem linking (--link flag on compare): replace duplicates with hardlinks, symlinks, or DazzleLink JSON descriptors
- Auto-detect best strategy per platform (--link auto)
- Backup originals as .orig before linking
- Confirmation prompt before modifying files
- Link manifest (_dedup_links.json) tracks all operations
Diff script generation: compare auto-generates _compare_diffs.cmd (Windows) / _compare_diffs.sh (Unix) to spot-check each matched pair in Beyond Compare, WinMerge, VS Code, or other configured diff tool
diff command: find and launch the generated diff script (diff --last)
Configuration system (config command, ~/.notepad-cleanup.json):
- Unified folder registry with ... notation (... = output, ...1/...2 = other folders, ...-1/...-2 = recent extractions MRU)
- ConfigManager class in dedicated config.py module
- config show, config add, config remove, config set, config unset
- Folder roles: output and search are independent assignments
- Persistent diff tool, MRU depth, search dirs
- ... expansion in all path arguments (resolved at runtime, never stored)
- config show <...ref> resolves any ... reference for scripting
- Windows case-insensitive path comparison (_paths_equal)
- Environment variable expansion (%USERPROFILE%, $HOME)
- Stray quote stripping for trailing-backslash shell escaping issues
- Too-broad path detection (warns on home dir, drive roots)
--last flag on compare, organize, and diff commands: auto-uses most recent extraction from MRU without copy-pasting paths
MRU (Most Recently Used) extraction history: configurable depth (default 10), referenced as ...-1, ...-2, etc.
Search dir composition: -s for explicit-only search, -ss for additive (includes saved dirs), -nsp to exclude parent folder
docs/fuzzy-matching.md: threshold formula derivation, customization via environment variables, fitting script reference
docs/config.md: full configuration reference covering folders, roles, MRU, settings, ... notation, and search behavior
Path shortening in display (~\Desktop instead of C:\Users\...\Desktop)

Changed

Default output directory: ~/Desktop/notepad-cleanup/nc-TIMESTAMP (was ~/Desktop/notepad-cleanup-TIMESTAMP). Consolidates extractions into one folder
Extract now auto-registers output parent as a search dir in folder registry
Extract hints now show both compare --last and organize --last as next steps
Help text updated across all commands to reflect new workflow: extract -> compare -> organize
Config functions extracted from dedup.py into dedicated config.py module

[0.1.4] - 2026-02-19

Added

--dry-run flag on extract — preview what would be extracted without saving files
-h alias for --help on all commands
-V alias for --version
Detailed help text with examples for all commands (extract, organize, run)
Auto-versioning system (ported from wingather): _version.py as canonical version source, pre-commit/post-commit hooks auto-stamp branch, build number, date, and commit hash into version string
Version scripts: scripts/update-version.sh, scripts/install-hooks.sh, scripts/paths.sh, scripts/hooks/ (pre-commit, post-commit, pre-push)
CHANGELOG.md
GitHub Discussions enabled

Changed

setup.py reads version from _version.py via get_pip_version() (PEP 440)
__init__.py imports version from _version.py (single source of truth)
README badges: added Discussions, Platform

Fixed

Phase 2 now correctly identifies newly loaded RichEditD2DPT controls by tracking handle snapshots before/after each tab.select(), instead of blindly reading the last handle (which often re-read an already-loaded tab)
Increased Phase 2 tab switch delay from 0.08s to 0.15s for more reliable control loading

[0.1.3] - 2026-02-14

Added

README with features, installation, usage, and architecture docs
GPL-3.0 license
FUNDING.yml (GitHub, Ko-fi, Buy Me A Coffee)
Issue templates for bug reports and feature requests
CONTRIBUTING.md with development setup guide

Changed

CI workflow switched to Windows runners (lint + build)
CODEOWNERS updated to @djdarcy
setup.py: added GPL-3.0 classifier, updated author

[0.1.2] - 2026-02-14

Fixed

Phase 2 duplicate extraction: global dedup across all windows using normalized text hashing (line endings + trailing whitespace stripped)
UIA cross-window bleed: use app.window(handle=) instead of app.top_window() since all Notepad instances share one PID
Phase 2 reads via WM_GETTEXT (same as Phase 1) instead of UIA Document.window_text() — eliminates hash mismatch between methods
Ctrl+C during Claude CLI: use time.sleep() + process.poll() instead of thread.join() which swallows KeyboardInterrupt on Windows

Changed

Each tab preserved as individual file — removed quick-notes.md compaction
Output folder renamed from _reorganized/ to organized/

[0.1.1] - 2026-02-14

Fixed

get_tab_count() rewritten to use UIA descendants(control_type="TabItem") instead of NotepadTextBox child count, which only counted loaded tabs and prevented Phase 2 from triggering
Phase 2 tab enumeration: use descendants() instead of children() chain since WinUI TabItems aren't direct children of the Tab control

Changed

Organizer switched from inline content embedding to Claude Read tool approach: short prompt with --allowedTools Read,Grep, Claude reads files from disk
Removed build_file_listing() and stdin piping (no longer needed)
Added threaded stdout reader for Ctrl+C support during Claude CLI subprocess

[0.1.0] - 2026-02-14

Added

Two-phase extraction: silent WM_GETTEXT (Phase 1) + UIA tab switching (Phase 2)
CLI with extract, organize, run commands (Click + Rich)
AI organization via Claude Code CLI: returns JSON plan, Python executes file ops
Manifest.json tracking for all extracted files
Spike scripts in tests/one-offs/ for UIA exploration

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Changelog

[0.2.4] - 2026-04-10

Fixed

Changed

Tests

[0.2.3] - 2026-03-16

Added

Changed

[0.2.2] - 2026-03-16

Added

Changed

[0.2.1] - 2026-03-16

Added

Changed

[0.2.0] - 2026-03-16

Added

Changed

[0.1.4] - 2026-02-19

Added

Changed

Fixed

[0.1.3] - 2026-02-14

Added

Changed

[0.1.2] - 2026-02-14

Fixed

Changed

[0.1.1] - 2026-02-14

Fixed

Changed

[0.1.0] - 2026-02-14

Added

Uh oh!

Uh oh!

FilesExpand file tree

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

Changelog

[0.2.4] - 2026-04-10

Fixed

Changed

Tests

[0.2.3] - 2026-03-16

Added

Changed

[0.2.2] - 2026-03-16

Added

Changed

[0.2.1] - 2026-03-16

Added

Changed

[0.2.0] - 2026-03-16

Added

Changed

[0.1.4] - 2026-02-19

Added

Changed

Fixed

[0.1.3] - 2026-02-14

Added

Changed

[0.1.2] - 2026-02-14

Fixed

Changed

[0.1.1] - 2026-02-14

Fixed

Changed

[0.1.0] - 2026-02-14

Added