feat(git-sync): one-way GitHub → Wiki sync#660
Conversation
Add the tracer-bullet spec for one-way GitHub->Wiki sync (read-only synced spaces) plus a headless runner that implements one tracer bullet per iteration, tracking progress in the spec's Progress section. - specs/github_one_way_sync.md: TB1-TB6 plan + Progress checklist - scripts/spec-loop.sh: single-branch loop, one bullet per iteration - scripts/prompt.md: per-iteration policy (build/test/agent-browser, PR) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Backend walking skeleton for one-way GitHub sync (spec specs/github_one_way_sync.md, TB1 carved into TB1a/TB1b). What shipped: - Wiki Space: new "Git Sync" tab — git_synced (immutable after insert, enforced in validate), repo_full_name, branch, docs_subdir, and last_sync_status/commit/time/error (all read_only in desk). Whitelisted sync_now() enqueues the engine on the long queue. - Wiki Document: read_only source_path (repo-relative path); cross-sync identity is (wiki_space, source_path). - New wiki/wiki/git_sync.py: transport-agnostic engine. Module-level _fetch_* GitHub helpers (monkeypatched in tests, optional token for TB4). build_nodes() infers structure (folder→group, .md→leaf, README/index→ folder landing, root landing→root group, H1-or-humanized title, alphabetical sort). _sync_to_live() snapshots the live tree, synthesizes a target Wiki Revision keyed by reused doc_key, and drives the existing _apply_merge_changes_only under in_apply_merge_revision, then stamps source_path back. Head-SHA short-circuit = idempotent no-op syncs. Key decisions: - Reuse the merge applier (external-driven target revision) — the riskiest seam — instead of writing Wiki Document directly. Validated here. - Path-keyed identity: a file path change is delete+add, not a move. Deferred: read-only enforcement + create-space UI (TB1b); changed-SHA-only blob fetch + error rollback/sync log (TB5). Tests: 9 unit tests (mock GitHub HTTP) — inference, add/update/delete/ path-change, source_path/doc_key stability, no-op, git_synced immutability. Demoed live against public BuildWithHussain/giki. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A git-synced Wiki Space is read-only in the wiki — the GitHub repo is the source of truth. Block every content-mutation entry point server-side (defense in depth), letting only the sync engine through. Key decisions: - New shared helpers in wiki/permissions.py: is_git_synced_space(space) (cached git_synced lookup) and assert_space_writable(space), which throws PermissionError unless frappe.flags.in_apply_merge_revision is set — the same flag the sync engine already uses to suppress revision side-effects. - Wired assert_space_writable into the four mutation entry points: get_or_create_draft_change_request, create_change_request, apply_cr_operations (via cr.wiki_space), and reorder_wiki_documents (via _get_wiki_space_for_document). - wiki_document_has_permission now denies write ptypes on a synced space's documents except under the merge flag — covers desk + any path the four entry points miss. The public reader (get_wiki_tree) never creates a CR, so browsing/reading synced content is unaffected. TB1b split: this is TB1b-i. TB1b-ii ships the frontend read-only render (non-CR tree path) + create-space dialog. See specs/github_one_way_sync.md. Tests: 6 new unit tests in TestGitSyncReadOnly (wiki/wiki/test_git_sync.py), each verified to fail when the fix is temp-reverted. Full module: 15 passing. Spec: specs/github_one_way_sync.md (TB1b-i). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…b-ii) Render a git-synced Wiki Space read-only in the authoring SPA and let managers create one. Closes the frontend half of TB1b; spec: specs/github_one_way_sync.md. Key decisions: - SpaceDetails branches on `isGitSynced`: the CR-hydrate watch early-returns for synced spaces; a second watch sources the sidebar from a non-CR `get_wiki_tree` (adapted to the snake_case tree shape, Wiki Document name doubling as doc_key/document_name) and auto-kicks the first sync for a never-synced space. - ContributionBanner is replaced by an inline synced bar (badge + repo@branch link + "Sync now" -> sync_now doc method). - `readonly` threads through WikiDocumentList -> NestedDraggable (disabled drag, hidden handle/dropdown/add buttons) and WikiDocumentPanel -> WikiEditor (editable:false, no toolbar, save/autosave/save-shortcut short-circuited). - SpaceList create dialog gains a "Git synced" switch + repo/branch inputs, passed in the insert payload. Files: SpaceDetails.vue, WikiDocumentList.vue, NestedDraggable.vue, WikiDocumentPanel.vue, WikiEditor.vue, SpaceList.vue, plus e2e git-sync-readonly.spec.ts (temp-revert verified). Next: TB2 — Edit on GitHub action. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds an "Edit on GitHub" item to the synced page's three-dots menu, opening the source file in GitHub's web editor: https://github.com/{repo}/edit/{branch}/{source_path} Key decisions: - New pure helper frontend/src/lib/github.js (buildGithubEditUrl / isEditableSourcePath); the URL resolves only for markdown source_paths. - WikiDocumentPanel reads repo/branch from the cached Wiki Space resource (the one SpaceDetails already loaded) + the doc's source_path. - Engine: reconcile TB1a's deviation — a group's source_path now points at its README/index landing file when one exists (folder path only when there is no landing), so the URL plugs straight in and group landings are editable. - Three-dots Dropdown trigger gains :title="More actions" (a11y + e2e handle). Files: git_sync.py (group source_path), test_git_sync.py (+1 test, updated inference test), lib/github.js (new), WikiDocumentPanel.vue, e2e git-sync-edit-on-github.spec.ts (new), spec reconcile (TB2 as-built + ticked). Spec: specs/github_one_way_sync.md — TB2. Next: TB3 (.wiki.json override). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Let a repo's .wiki.json drive nav order, nesting, and titles instead of alphabetical folder inference (spec: specs/github_one_way_sync.md TB3). - load_wiki_config(): reads .wiki.json from the repo root in the already -fetched tree; raises on malformed JSON / non-object so the sync records a clear error instead of silently falling back. - build_nodes_from_config(): walks ordered nav (single-key dicts; list value = group, string value = leaf path under config.docs_dir/space.docs_subdir). Emits the SAME node shape as build_nodes, so _sync_to_live is untouched: hierarchy rides synthetic dir/parent_dir keys (nav title-chain), nav order rides a zero-padded seg counter the existing sibling sort reproduces as document order, titles come straight from nav. - sync_space(): uses the config path when .wiki.json has a nav, else the TB1 inference path unchanged. nav is authoritative (unlisted/missing files skipped). nav groups are file-less (synthetic source_path, no Edit-on-GitHub, no landing). 7 unit tests (parse present/absent/malformed, order+titles+nesting, missing-file skip, docs_dir override, full sync-to-live with non-alphabetical nav); sync-level test temp-revert verified. Engine-only — no frontend changes. Next: TB4 — GitHub App connection + repo picker (private repos). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Backend-only auth plumbing for the GitHub App connection (TB4 carved into
TB4a/TB4b — far more than one demoable slice, mirroring TB1's split).
- Wiki Settings: new "GitHub App" tab — github_app_id, github_app_client_id,
github_app_public_link (Data) + Secrets section with client_secret,
private_key, webhook_secret (Password, encrypted at rest).
- wiki/api/github.py:
- _app_jwt() signs a short-lived RS256 JWT (iss=app id, 9-min exp under
GitHub's 10-min cap) with the stored private key.
- installation_access_token() exchanges it for a never-stored, short-lived
installation token (the engine's future token source; server-side only).
- installations()/repositories() list the connecting user's installations
and repos (paginated), taking an explicit user OAuth token.
- HTTP rides the module-level requests import so tests redirect it.
- 5 unit tests: real RSA-keypair JWT round-trip, unconfigured-throws, token
mint (URL + Bearer header), installation + repo parse/pagination. Run green;
console-demoed JWT signing + token-mint endpoint live.
Next (TB4b): OAuth round-trip (www/github/{authorize,redirect}) + whitelisted
listing wrappers + repo picker UI + Wiki Space.github_installation_id + engine
minting/threading the token so private repos sync end-to-end.
Spec: specs/github_one_way_sync.md (TB4a).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eading (TB4b)
Completes TB4b of specs/github_one_way_sync.md: private repos sync
end-to-end via a connect-and-pick flow.
Backend:
- wiki/api/github.py: user-to-server OAuth round-trip (single-use CSRF
state, build_authorize_url, exchange_oauth_code, per-user token cache)
+ whitelisted picker wrappers (is_connected/my_installations/
my_repositories) that read the cached token.
- wiki/www/github/{authorize,redirect}.py: portal endpoints that mint
state, 302 to GitHub, verify+exchange the code, and bounce back.
- Wiki Space.github_installation_id (read-only); git_sync.sync_space
mints a short-lived installation token from it when no token is passed.
Frontend:
- SpaceList.vue create dialog replaces the plain-text repo/branch inputs
with a "Connect GitHub" popup flow + installation/repo Autocompletes
feeding github_installation_id/repo_full_name/branch into the payload.
Tests: 5 new units in test_github.py (OAuth state, authorize URL, code
exchange success/error, token cache + wrapper forwarding) and 1 in
test_git_sync.py (engine token mint); all pass (10 github + 24 git_sync).
Connect prompt demoed live via agent-browser.
Next: TB5 — sync log & status panel.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make one-way GitHub sync observable (spec TB5). - New `Wiki Git Sync Log` doctype: one row per run with status (Running/Success/No Change/Error), commit SHA, start/finish, and created/updated/deleted/moved counts. - `git_sync.py` writes a log in every branch via `_start_sync_log` / `_finalize_sync_log`; counts come from `_diff_counts`, which reuses the merge applier's `_find_changed_keys`/`_classify_changes` (changed parent_key = move, else update). `_sync_to_live` now returns the counts. - `GitSyncPanel.vue` (repo/branch, last-sync, Sync now, run history) shown under a new conditional "Git Sync" tab in `SpaceSettings.vue`. - 4 backend units in `TestGitSyncLog` (temp-revert verified); 28 git_sync tests pass. Demoed on BuildWithHussain/giki. Next: TB6 — real-time webhook sync. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Push-driven sync so a git-synced space updates without a manual click. - wiki/api/github.py: allow_guest `webhook()` receiver → signature-gated `_dispatch_webhook` (constant-time HMAC-SHA256 over the raw body), routes `push` events to branch-matched git-synced spaces and enqueues `git_sync.sync_space(trigger="Webhook")`; non-push events are accepted-and-ignored, bad signatures raise PermissionError. - git_sync.py: `trigger` arg threaded onto the sync log. - wiki_git_sync_log.json: `trigger` field (Manual/Webhook). - GitSyncPanel.vue: Webhook badge + copyable webhook payload URL. Tests: 6 webhook unit tests (signature valid/invalid/missing, push routing, non-push ignored) — all green. Engine reuse keeps redundant deliveries idempotent via the head-SHA short-circuit. Spec: specs/github_one_way_sync.md (TB6). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`gh pr edit` fails on a deprecated Projects GraphQL field. Give the loop an
explicit recipe: create with `gh pr create`, update the body via
`gh api -X PATCH repos/{repo}/pulls/{n} -F body=@file`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
One-click GitHub App creation replacing TB4a's manual five-credential paste. A "Create GitHub App" button on the Wiki Settings desk form opens /github/new_app, which POSTs a read-only-sync manifest to GitHub; GitHub creates the App and redirects to /github/manifest_redirect, which converts the temporary code into the App's id/client id+secret/private key/webhook secret and stores them in Wiki Settings. GitHub rejects manifests whose webhook host isn't publicly reachable, so is_public_host() drops hook_attributes on localhost/private hosts (the admin wires the webhook later from the Git Sync panel). Manual paste still works untouched. 5 unit tests (manifest shape, webhook-omitted-for-unreachable-host, personal-vs-org URL, conversion POST, credential storage). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deep folder trees build a Wiki Document route from the full ancestor chain (mirroring the authoring flow), which overflows the default 140-char Data limit and aborts the sync with CharacterLengthExceededError (the .md source path is itself ~140 chars). Widen Wiki Document.route, Wiki Revision Item.route, and Wiki Document.source_path to length 500. Regression test test_deeply_nested_long_route_syncs (fails at each field before its widening). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
GitHub rejects an App manifest that subscribes to events without a
reachable hook URL ("Hook url cannot be blank"). On non-public hosts,
omit default_events alongside hook_attributes so the webhook-less App
still validates; public hosts auto-wire both.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
GitHub's manifest docs mark hook_attributes.url as Required and reject non-public hosts, so neither a wiki.localhost hook nor an omitted hook validates. On non-public hosts register an inactive RFC-2606 placeholder (example.com) hook with no events; public hosts keep the live webhook + push subscription. The admin repoints the hook once deployed publicly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
GitHub redirects to /github/manifest_redirect via GET, which Frappe does not auto-commit, so the stored App credentials rolled back and the desk fields stayed empty. Explicitly frappe.db.commit() after storing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After creating/connecting the GitHub App, the create-space picker showed "No results found" with no way forward when the App wasn't installed on any account. Add app_install_url() and an "Install GitHub App" prompt (popup + poll) shown when connected but installations are empty; the account/repo/branch picker appears once an installation exists. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
frappe-ui Autocomplete (and Select) portal their dropdown outside the modal Dialog, which sets pointer-events:none on everything outside its content — so option clicks never registered. Swap both pickers to native <select> elements (OS-rendered dropdown, immune to the portal/pointer trap); bind directly to github_installation_id/repo_full_name and derive the branch default from the selected repo. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A whole-repo scan (empty docs_subdir) pulled in .github issue templates as wiki content. Skip any dot-prefixed path segment (.github, .vscode, …) in structure inference, and add an optional "Docs folder" input to the Create Wiki Space dialog so the sync can be scoped to e.g. docs/. Unit test test_build_nodes_ignores_dot_directories. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Routes were slugged from the page title (H1), so two files sharing an H1 (e.g. frappe_docs developer-api) resolved to the same route and the whole sync aborted on validate_unique_route_for_leaves. Slug from the unique path segment instead, and dedupe any residual collision with a numeric suffix (deterministic by source_path) so one dup can never fail the sync. Regression test test_colliding_slugs_get_unique_routes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The connect-account OAuth token lived only in Redis (8h TTL), so every bench restart / expiry forced a reconnect. Persist it per-user in a new Wiki GitHub Connection doctype (access + refresh token + expiry), survive cache loss, and transparently refresh expired access tokens via the stored refresh token — connect becomes a one-time step per user. exchange_oauth_code now returns the full token payload; redirect.py commits the persisted connection (GET request). 2 new unit tests (persist-across-cache-eviction, refresh-on-expiry). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ite links Synced pages referencing repo-relative images (``, `../assets/y.png`) rendered dead links — the bytes never entered the wiki. The engine now imports those images and rewrites the Markdown to the stored `/files/…` URL, one-way. - `_fetch_blob_bytes`: raw/binary blob fetch (no utf-8 decode), monkeypatched in tests alongside the other `_fetch_*` helpers. - `_is_repo_relative_image`: skip http(s)/protocol-relative/data:/mailto:/ in-page anchors and already-stored `/files/` URLs. - `_import_repo_image`: one Frappe File per (space, blob SHA), named `gitimg-<sha>.<ext>`, attached to the Wiki Space. Idempotent via a `file_name LIKE` lookup (survives WebP's .png->.webp rename); runs convert_file_to_webp when auto_convert_images_to_webp is on. - `_rewrite_image_links`: resolve src against the source file's dir (incl. ./ and ../ via posixpath.normpath), rewrite before get_or_create_content_blob so the deduped blob carries final, stable URLs. Threaded into build_nodes / build_nodes_from_config via an optional `space` (None = no-op). Privacy decision resolved: v1 ships public files + documented caveat (a private repo's images are world-readable by URL); the permission-gated private-files path is deferred to TB8b. 6 unit tests (relative ../ resolution, external/data: passthrough, missing image untouched, SHA-idempotent re-import, WebP on/off); the 4 rewrite-positive tests are temp-revert verified. _FakeRepo now emits slash-free hex blob SHAs (real-GitHub-shaped, stateless) so File-name idempotency isn't masked. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Real docs repos (Docusaurus/Hugo/Jekyll/mkdocs) prefix .md with a YAML
front-matter block; stored verbatim it renders as a setext H2 ("title: …\n---")
and mis-titles pages that have no H1. TB9 strips the leading block before
render and uses front-matter `title` (precedence: FM title → H1 → humanized
filename). Malformed FM left intact; .wiki.json nav title stays authoritative;
ordering fields (sidebar_position/weight) deferred.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Docs repos (Docusaurus/Hugo/Jekyll/mkdocs) prefix .md files with a YAML
front-matter block. Stored verbatim it renders as a setext H2 ("title: …\n---")
atop every page, and pages with no H1 get mis-titled from their filename.
- strip_front_matter(raw) -> (body, meta): match a leading ---…--- block
(BOM/CRLF tolerant, non-greedy), yaml.safe_load the body. Return (raw, {})
untouched on no-match / parse error / non-mapping result, so a leading
thematic break or malformed block is left intact rather than eaten.
- _front_matter_title(meta): use `title` only when a non-bool scalar.
- build_nodes' content_of now returns (body, fm_title); title precedence is
fm_title -> first H1 -> humanized filename, for leaf pages and folder
landings. build_nodes_from_config strips the leaf body too but keeps the
.wiki.json nav title authoritative. Strip runs before image rewrite (TB8a)
and before content-blob, so the deduped blob holds clean body.
Only the leading block is treated as front matter; a mid-body --- is untouched.
Ordering fields (sidebar_position/weight) deferred.
11 unit tests in TestGitSyncFrontMatter (strip+meta, CRLF/BOM, no-FM regression,
malformed-intact, leading-thematic-break-intact, mid-body untouched, strip+title,
H1 fallback, humanized fallback, nav-title-wins, FM-title->live doc end-to-end);
the 7 strip-dependent tests are temp-revert verified.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
semgrep ci flagged four new findings introduced by this branch:
- git_sync.py: pass str(exc) (not the exception object) to a translated
.format() — frappe-format-string-injection.
- api/github.webhook: allow_guest is required (GitHub posts webhooks
unauthenticated; the body is HMAC-verified before any work) — # nosemgrep
with rationale.
- www/github/{redirect,manifest_redirect}: the GET-redirect commits are
required (Frappe doesn't auto-commit a GET) and already documented —
# nosemgrep on each.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test_repositories_parsed_and_paginated matched "page=1" in the request URL to decide which page to return — but the URL also carries "per_page=100", and "page=1" is a substring of "per_page=100". So the mock returned the full 100-item first page for *every* page, len(batch) was never < 100, and github.repositories() looped forever. In CI the server-test job hung here until the runner timed out (failing the whole job). Match "&page=1" instead so the page param is disambiguated from per_page. Production repositories() pagination was always correct — test-only fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Confidence Score: 3/5The auth and permission layers are solid, but the sync engine has a correctness bug that will silently break re-syncs for any space with more than 20 pages. The missing wiki/wiki/git_sync.py — the
|
| Filename | Overview |
|---|---|
| wiki/wiki/git_sync.py | Core sync engine; contains a P1 bug where frappe.get_all without limit=0 silently truncates live_docs at 20, breaking stable doc_key identity for spaces with >20 pages on re-sync. Also has N+1 queries in the source_path stamping loop. |
| wiki/api/github.py | GitHub App auth, OAuth, and webhook plumbing. HMAC signature verification is correct (constant-time). OAuth CSRF state handling is sound. Minor style issue: save(ignore_permissions=True) in store_app_credentials lacks inline comment. |
| wiki/permissions.py | New is_git_synced_space / assert_space_writable helpers and write-deny in wiki_document_has_permission. Bypass via in_apply_merge_revision flag is correct and well-contained. |
| wiki/wiki/doctype/wiki_space/wiki_space.py | Adds git_synced immutability check and sync_now doc method. Missing write-level permission guard on sync_now() — any reader can enqueue a sync via the API. |
| wiki/www/github/redirect.py | OAuth callback handler; verifies CSRF state, exchanges code for token, persists it, and redirects. Login check and state verification are correct. |
| wiki/www/github/new_app.py | App manifest creation flow; checks login and write permission on Wiki Settings before proceeding. Correct permission gating. |
| wiki/www/github/manifest_redirect.py | Manifest redirect handler; verifies state and permission before calling store_app_credentials. Correct. |
| frontend/src/pages/SpaceDetails.vue | Adds isGitSynced branching to prevent CR hydration and serve non-CR wiki tree for synced spaces. Read-only enforcement in frontend, correctly paired with server-side guards. |
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant GH as GitHub
participant WH as /api/method/wiki.api.github.webhook
participant Q as Background Queue
participant SE as sync_space()
participant API as GitHub REST API
participant DB as Frappe DB
GH->>WH: POST push event (HMAC-signed)
WH->>WH: _verify_signature (HMAC-SHA256)
WH->>DB: get_all Wiki Space (repo+branch match)
WH->>Q: frappe.enqueue(sync_space, space)
Note over Q,SE: Background worker picks up job
SE->>DB: get_doc Wiki Space
SE->>API: "GET /repos/{repo}/git/ref (head SHA)"
SE->>SE: short-circuit if SHA unchanged
SE->>API: "GET /repos/{repo}/git/trees/{sha}?recursive=1"
SE->>SE: load_wiki_config / build_nodes
SE->>API: GET blobs per page (content fetch)
SE->>DB: create_revision_from_live_tree (live snapshot)
SE->>DB: "frappe.get_all Wiki Document (live_docs — missing limit=0!)"
SE->>DB: insert Wiki Revision + Wiki Revision Items
SE->>DB: _apply_merge_changes_only (under in_apply_merge_revision)
SE->>DB: stamp source_path per node (N+1 loop)
SE->>DB: "set_value Wiki Space last_sync_*"
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant GH as GitHub
participant WH as /api/method/wiki.api.github.webhook
participant Q as Background Queue
participant SE as sync_space()
participant API as GitHub REST API
participant DB as Frappe DB
GH->>WH: POST push event (HMAC-signed)
WH->>WH: _verify_signature (HMAC-SHA256)
WH->>DB: get_all Wiki Space (repo+branch match)
WH->>Q: frappe.enqueue(sync_space, space)
Note over Q,SE: Background worker picks up job
SE->>DB: get_doc Wiki Space
SE->>API: "GET /repos/{repo}/git/ref (head SHA)"
SE->>SE: short-circuit if SHA unchanged
SE->>API: "GET /repos/{repo}/git/trees/{sha}?recursive=1"
SE->>SE: load_wiki_config / build_nodes
SE->>API: GET blobs per page (content fetch)
SE->>DB: create_revision_from_live_tree (live snapshot)
SE->>DB: "frappe.get_all Wiki Document (live_docs — missing limit=0!)"
SE->>DB: insert Wiki Revision + Wiki Revision Items
SE->>DB: _apply_merge_changes_only (under in_apply_merge_revision)
SE->>DB: stamp source_path per node (N+1 loop)
SE->>DB: "set_value Wiki Space last_sync_*"
Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 4
wiki/wiki/git_sync.py:517-521
**`frappe.get_all` missing `limit` — re-syncs silently truncate at 20 docs**
`frappe.get_all` defaults to `limit_page_length=20`, so `live_docs` will only contain the first 20 Wiki Documents in the space. On a re-sync of any space with more than 20 pages, every page beyond the 20th has no entry in `src_to_key`, so `node["doc_key"]` is reassigned a fresh `frappe.generate_hash(length=12)` each time — breaking the stable identity guarantee. The merge applier will then create duplicate pages instead of updating the existing ones.
Add `limit=0` (or a sufficiently large value) to the query.
### Issue 2 of 4
wiki/wiki/git_sync.py:638-643
**N+1 query pattern in `source_path` stamping**
Two DB calls (`get_value` + `set_value`) are made per node inside this loop. For a repo with hundreds of pages this creates significant DB churn on every sync run. The `doc_key → name` mapping is already computable from the `_apply_merge_changes_only` result, or a single bulk `get_all(filters={"doc_key": ["in", ...]})` lookup could build the map before the loop.
### Issue 3 of 4
wiki/api/github.py:?
`save(ignore_permissions=True)` is used here without an explanatory inline comment, which obscures why the permission bypass is safe. The caller verifies `frappe.has_permission("Wiki Settings", "write")`, but that context is invisible at the call site.
```suggestion
settings.save(ignore_permissions=True) # caller verified write permission on Wiki Settings
frappe.clear_cache(doctype="Wiki Settings")
```
### Issue 4 of 4
wiki/wiki/doctype/wiki_space/wiki_space.py:68-82
**`sync_now()` lacks an explicit write-permission check**
Frappe only validates `read` permission when a `@frappe.whitelist()` document method is invoked via the API. Any user with read access to the Wiki Space can therefore call `sync_now()` directly, potentially hammering the GitHub API rate limits for the installation token on every request. Adding `frappe.only_for(["Wiki Manager", "Wiki Approver"])` or a `can_write_space` guard would restrict triggering syncs to authorised users.
Reviews (1): Last reviewed commit: "fix(git-sync): test mock infinite loop h..." | Re-trigger Greptile
| live_docs = frappe.get_all( | ||
| "Wiki Document", | ||
| fields=["name", "doc_key", "source_path"], | ||
| filters=[["lft", ">=", root_lft], ["rgt", "<=", root_rgt]], | ||
| ) |
There was a problem hiding this comment.
frappe.get_all missing limit — re-syncs silently truncate at 20 docs
frappe.get_all defaults to limit_page_length=20, so live_docs will only contain the first 20 Wiki Documents in the space. On a re-sync of any space with more than 20 pages, every page beyond the 20th has no entry in src_to_key, so node["doc_key"] is reassigned a fresh frappe.generate_hash(length=12) each time — breaking the stable identity guarantee. The merge applier will then create duplicate pages instead of updating the existing ones.
Add limit=0 (or a sufficiently large value) to the query.
Prompt To Fix With AI
This is a comment left during a code review.
Path: wiki/wiki/git_sync.py
Line: 517-521
Comment:
**`frappe.get_all` missing `limit` — re-syncs silently truncate at 20 docs**
`frappe.get_all` defaults to `limit_page_length=20`, so `live_docs` will only contain the first 20 Wiki Documents in the space. On a re-sync of any space with more than 20 pages, every page beyond the 20th has no entry in `src_to_key`, so `node["doc_key"]` is reassigned a fresh `frappe.generate_hash(length=12)` each time — breaking the stable identity guarantee. The merge applier will then create duplicate pages instead of updating the existing ones.
Add `limit=0` (or a sufficiently large value) to the query.
How can I resolve this? If you propose a fix, please make it concise.| for node in nodes: | ||
| name = frappe.db.get_value("Wiki Document", {"doc_key": node["doc_key"]}, "name") | ||
| if name: | ||
| frappe.db.set_value( | ||
| "Wiki Document", name, "source_path", node["source_path"], update_modified=False | ||
| ) |
There was a problem hiding this comment.
N+1 query pattern in
source_path stamping
Two DB calls (get_value + set_value) are made per node inside this loop. For a repo with hundreds of pages this creates significant DB churn on every sync run. The doc_key → name mapping is already computable from the _apply_merge_changes_only result, or a single bulk get_all(filters={"doc_key": ["in", ...]}) lookup could build the map before the loop.
Prompt To Fix With AI
This is a comment left during a code review.
Path: wiki/wiki/git_sync.py
Line: 638-643
Comment:
**N+1 query pattern in `source_path` stamping**
Two DB calls (`get_value` + `set_value`) are made per node inside this loop. For a repo with hundreds of pages this creates significant DB churn on every sync run. The `doc_key → name` mapping is already computable from the `_apply_merge_changes_only` result, or a single bulk `get_all(filters={"doc_key": ["in", ...]})` lookup could build the map before the loop.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| @frappe.whitelist() | ||
| def sync_now(self) -> dict: | ||
| """Enqueue a one-way sync from the configured GitHub repo.""" | ||
| if not self.git_synced: | ||
| frappe.throw(_("This Wiki Space is not git-synced.")) | ||
|
|
||
| frappe.enqueue( | ||
| "wiki.wiki.git_sync.sync_space", | ||
| queue="long", | ||
| job_name=f"wiki_git_sync:{self.name}", | ||
| space_name=self.name, | ||
| token=None, | ||
| ) | ||
| frappe.db.set_value("Wiki Space", self.name, "last_sync_status", "Pending", update_modified=False) | ||
| return {"status": "queued"} |
There was a problem hiding this comment.
sync_now() lacks an explicit write-permission check
Frappe only validates read permission when a @frappe.whitelist() document method is invoked via the API. Any user with read access to the Wiki Space can therefore call sync_now() directly, potentially hammering the GitHub API rate limits for the installation token on every request. Adding frappe.only_for(["Wiki Manager", "Wiki Approver"]) or a can_write_space guard would restrict triggering syncs to authorised users.
Prompt To Fix With AI
This is a comment left during a code review.
Path: wiki/wiki/doctype/wiki_space/wiki_space.py
Line: 68-82
Comment:
**`sync_now()` lacks an explicit write-permission check**
Frappe only validates `read` permission when a `@frappe.whitelist()` document method is invoked via the API. Any user with read access to the Wiki Space can therefore call `sync_now()` directly, potentially hammering the GitHub API rate limits for the installation token on every request. Adding `frappe.only_for(["Wiki Manager", "Wiki Approver"])` or a `can_write_space` guard would restrict triggering syncs to authorised users.
How can I resolve this? If you propose a fix, please make it concise.
Why?
Let a Wiki Space mirror a GitHub repo's docs one way (repo → wiki): the repo is the source of truth and the space is read-only in the wiki. See
specs/github_one_way_sync.md.What?
Shipping in tracer-bullet slices:
Wiki Space"Git Sync" tab (git_syncedimmutable after insert,repo_full_name,branch,docs_subdir,last_sync_*), whitelistedsync_now(),Wiki Document.source_pathidentity, andwiki/wiki/git_sync.py(fetch repo tree, infer structure, apply)./edit/{branch}/{source_path}); group landing pages point at theirREADME.md/index.md..wiki.jsonstructure override. A repo can ship a root.wiki.json(docs_dir+ ordered, nestablenav) to drive sidebar order, nesting, and titles instead of alphabetical folder inference; absent → inference unchanged.Wiki Settings"GitHub App" tab andwiki/api/github.pythat signs an App JWT, mints short-lived installation tokens (so private repos work, no per-space secrets), and lists installations/repos.How?
TB1a: the engine synthesizes a target Wiki Revision outside a Change Request and drives the existing merge applier (
_apply_merge_changes_only) underin_apply_merge_revision, reusingdoc_keypersource_path. Head-SHA short-circuit → idempotent re-syncs.TB1b-i: new
is_git_synced_space()/assert_space_writable()helpers inwiki/permissions.py(bypass viain_apply_merge_revision), wired intoget_or_create_draft_change_request,create_change_request,apply_cr_operations, andreorder_wiki_documents; plus a write-deny inwiki_document_has_permission. The public reader (get_wiki_tree) never creates a CR, so browsing synced content is unaffected.TB1b-ii:
SpaceDetails.vuebranches onisGitSynced— the CR-hydrate watch early-returns and the sidebar is sourced from a non-CRget_wiki_treeinstead.ContributionBannerbecomes an inline "Git synced — read only" bar (repo@branch link + "Sync now"). Areadonlyprop threads throughWikiDocumentList → NestedDraggable(no drag/add/row-actions) andWikiDocumentPanel → WikiEditor(editable:false, no toolbar, save short-circuited).SpaceList.vue's create dialog gains a "Git synced" switch + repo/branch inputs. New e2egit-sync-readonly.spec.ts.TB3:
load_wiki_config()reads the root.wiki.jsonfrom the already-fetched tree (raises on malformed JSON);build_nodes_from_config()walksnavand emits the same node shape as the inference path, so the merge applier is untouched — hierarchy rides synthetic dir keys, nav order rides a zero-padded sort key, titles come fromnav. 7 unit tests.TB2: new pure helper
frontend/src/lib/github.js(buildGithubEditUrl) + an "Edit on GitHub" item inWikiDocumentPanelbuilt from the cachedWiki Spacerepo/branch and the doc'ssource_path. The engine now stamps a group'ssource_pathas its README/index landing file so the link opens an editable file. New e2egit-sync-edit-on-github.spec.ts.TB4a:
_app_jwt()signs a short-lived RS256 JWT (iss=app id, 9-min exp) with the stored private key;installation_access_token()exchanges it atPOST /app/installations/{id}/access_tokensfor a never-stored installation token (the engine's future token source);installations()/repositories()list the connecting user's installations/repos (paginated) from a user OAuth token. HTTP rides the module-levelrequestsimport so tests redirect it. 5 unit tests. TB4 was carved into TB4a (this, backend) and TB4b (OAuth round-trip + repo picker UI + engine token threading).Progress
sync_now.wiki.jsonstructure override:load_wiki_config+build_nodes_from_config, 7 unit testsWiki Git Sync Logdoctype (per-run counts) +GitSyncPanel.vuehistory; 4 unit tests