Shared metadata engine for Collectarr: canonical catalog, provider ingest, image delivery, admin tooling, and search infrastructure.
Collectarr Core owns the shared catalog and provider pipeline. Personal library
state such as owned items, grades, notes, wishlists, and local tags stays in
collectarr-app and can optionally sync through collectarr-sync. Core is the
place where canonical metadata gets normalized, enriched, indexed, and exposed
to clients.
- Multi-media catalog covering comics, manga, anime, books, games, board games, movies, TV, and music
- Canonical entities for series, volumes, items, editions, variants, releases, people, organizations, story arcs, characters, and shared series tags
- Typed item/search/admin responses so clients consume one normalized metadata contract instead of provider-specific payloads
- Shared editorial metadata that complements local-first personal data in the app
- 10 provider integrations: GCD, ComicVine, Hardcover, AniList, MangaDex, OpenLibrary, BGG, MusicBrainz, IGDB, and TMDb
- Provider-aware title normalization, alias handling, issue matching, and barcode / UPC lookup
- DB-backed ingest queue with retries, status tracking, and worker processing
- Optional Meilisearch indexing for fast catalog queries and richer search previews
- External image URLs by default, with optional MinIO / S3 mirroring for controlled hosting
- MangaDex cover proxy support, WebP normalization, cache budgeting, and origin tracking
- Content-addressed image handling for uploaded assets and derived media variants
- Image cache health surfaced through admin tooling instead of ad hoc scripts
- Admin dashboard in the Collectarr desktop app for provider health, ingest queues, duplicate review, user management, image cache stats, and audit logs
- Role-based access with viewer / editor / admin permissions
- OpenAPI docs at /docs for API exploration and schema-backed integration work
- Daily-refreshable catalog badges and provider support docs generated from the live registry
The generated schema docs stay in sync with SQLAlchemy metadata and include columns, enums, indexes, defaults, unique constraints, and cross-domain references.
While the schema is still evolving toward its near-final shape, Core ships a
single squashed Alembic baseline (alembic/versions/20260624_1000_clean_schema_baseline.py)
rather than an incremental migration history. The server database starts empty,
so when the models change the baseline is regenerated and the database is
recreated instead of stacking migrations. Running python -m app.scripts.bootstrap_alembic
builds a fresh database straight from that baseline (alembic upgrade head).
Once the schema stabilizes, normal incremental autogenerated revisions will be
stacked on top of the baseline.
Copy-Item .env.example .env
docker compose up --build -d
docker compose exec api python -m app.scripts.bootstrap_alembic
docker compose exec api python -m app.scripts.seed_comics

python -m pip install -e .[dev]
python -m ruff check .
python -m pytest

.\tools\dev.ps1 start # Start Docker stack
.\tools\dev.ps1 start -WithSync # Start Core + collectarr-sync dev stack
.\tools\dev.ps1 migrate # Run Alembic migrations
.\tools\dev.ps1 seed # Seed sample comics data
.\tools\dev.ps1 test # Run test suite
.\tools\dev.ps1 check # Lint + type check
.\tools\dev.ps1 smoke-providers # Smoke test all providers
.\tools\dev.ps1 reset-stack # Clean reset of containers and volumes
python -m scripts.export_provider_support

Core is the canonical source of cross-library metadata. When a provider exposes a new field, wire it through the normalized metadata contract first and only then project it into the client.
- Normalize the field in the provider ingest pipeline.
- Expose it through public schemas used by the app: item responses, search results, and admin / provider previews.
- Add it to Meilisearch documents and display attributes when it should affect search or preview UX.
- Keep field names stable so collectarr-app can cache and render the same canonical shape offline.
When normalizing provider data, preserve the provider-native raw payload exactly
as returned upstream. If a workflow also needs the canonical provider item id,
use ProviderItem.provider_item_id alongside the raw mapping instead of
rewriting raw['id'], because some providers expose numeric or kind-specific
identifiers that are not interchangeable with the canonical route id.
That keeps provider growth additive: new library kinds can share the same catalog/search/admin contract instead of inventing parallel app-only fields.
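As an illustration of the raw-payload rule, here is a minimal sketch of a normalizer that keeps the upstream mapping untouched and carries the canonical id next to it. Only the ProviderItem.provider_item_id name comes from the text above; the dataclass shape, the normalize_item helper, the page_count field, and the sample payload keys are hypothetical and are not Core's actual models.

```python
import copy
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ProviderItem:
    """Hypothetical stand-in for Core's ProviderItem (illustrative fields only)."""

    provider: str
    provider_item_id: str       # canonical id used by routes
    title: str                  # normalized field
    page_count: Optional[int]   # example of a newly wired provider field
    raw: Mapping[str, Any]      # upstream payload, preserved as returned


def normalize_item(
    provider: str, canonical_id: str, payload: Mapping[str, Any]
) -> ProviderItem:
    """Normalize a few fields without rewriting the provider-native payload."""
    # Deep-copy so later mutation of the caller's dict cannot alter stored raw data.
    raw = copy.deepcopy(dict(payload))
    title = str(payload.get("name", "")).strip()
    pages = payload.get("pages")
    page_count = int(pages) if pages is not None else None
    return ProviderItem(
        provider=provider,
        provider_item_id=canonical_id,  # kept alongside raw, never written into raw["id"]
        title=title,
        page_count=page_count,
        raw=raw,
    )


# Assumed sample payload: the upstream id is numeric, the canonical id is not.
upstream = {"id": 12345, "name": "  Example Issue  ", "pages": "32"}
item = normalize_item("example-provider", "issue-12345", upstream)
```

In this sketch item.raw["id"] is still the numeric upstream value while item.provider_item_id holds the canonical route id, so neither identifier has to stand in for the other.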
| Service | URL |
|---|---|
| API | http://localhost:8010 |
| API docs (Swagger) | http://localhost:8010/docs |
| Sync service | http://localhost:8020 |
| Meilisearch | http://localhost:7700 |
| MinIO console | http://localhost:9001 |
Release publishing is manual-only. The Release GitHub Actions workflow uses
workflow_dispatch; pushing to main runs CI only and never auto-publishes.
Current stable release: v1.0.0
Current backend image tags:
- ghcr.io/collectarr/collectarr-core:v1.0.0
- ghcr.io/collectarr/collectarr-core:latest
When a releasable version is detected, the workflow publishes a GitHub Release
and pushes the backend container image to ghcr.io/collectarr/collectarr-core
with both the semantic version tag and latest.
The first published GHCR package defaults to private. After the first real
release, open the package page in the collectarr organization and switch
collectarr-core to public before expecting anonymous docker pull
operations to work:
https://github.com/orgs/collectarr/packages/container/package/collectarr-core
For personal LAN deployment on unRAID with Docker Compose, see docs/unraid.md.
The repo includes snapshot badges for total catalog items and per-kind item
counts. .github/workflows/catalog-badges.yml refreshes them on a daily
schedule or manual dispatch.
To switch from placeholder badges to live counts, configure:
- COLLECTARR_BADGES_BASE_URL for the public Core base URL (e.g., COLLECTARR_BADGES_BASE_URL=http://localhost:8010)
- COLLECTARR_BADGES_TOKEN for bearer-token access to /admin/catalog/summary
Or, instead of a static token:
- COLLECTARR_BADGES_EMAIL
- COLLECTARR_BADGES_PASSWORD
The workflow logs in through /auth/login when a bearer token is not provided.
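The credential choice described above can be sketched as a small resolver. This is an assumption-laden illustration rather than the workflow's actual code: the BadgeAuth type and resolve_badge_auth function are hypothetical, and returning None to mean "keep the placeholder badges" is an assumed convention. Only the environment variable names come from this document.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BadgeAuth:
    """Hypothetical container for the resolved badge credentials."""

    base_url: str
    mode: str  # "token" (static bearer token) or "login" (via /auth/login)
    token: Optional[str] = None
    email: Optional[str] = None
    password: Optional[str] = None


def resolve_badge_auth(env: Mapping[str, str]) -> Optional[BadgeAuth]:
    """Pick bearer-token auth when available, else email/password login."""
    base_url = env.get("COLLECTARR_BADGES_BASE_URL", "").rstrip("/")
    if not base_url:
        return None  # assumed: not configured, so placeholder badges stay

    token = env.get("COLLECTARR_BADGES_TOKEN")
    if token:
        return BadgeAuth(base_url, "token", token=token)

    email = env.get("COLLECTARR_BADGES_EMAIL")
    password = env.get("COLLECTARR_BADGES_PASSWORD")
    if email and password:
        return BadgeAuth(base_url, "login", email=email, password=password)

    return None  # base URL set but no usable credentials


# Example: a static token wins even if login credentials are also present.
auth = resolve_badge_auth(
    {
        "COLLECTARR_BADGES_BASE_URL": "http://localhost:8010/",
        "COLLECTARR_BADGES_TOKEN": "example-token",
        "COLLECTARR_BADGES_EMAIL": "admin@example.com",
        "COLLECTARR_BADGES_PASSWORD": "example-password",
    }
)
```

Passing the environment in as a mapping (for instance os.environ) keeps the resolver free of network calls and easy to check in isolation.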
| Repo | Purpose |
|---|---|
| collectarr-app | Flutter client for local-first collection browsing, editing, and admin-facing UX |
| collectarr-sync | Optional personal sync service for multi-device shelf state |
See docs/provider-support.md for the generated support matrix derived from the provider registry.
See docs/library-parity-contract.md for the cross-repo active-kind/provider guarantees and enforcement points.
See docs/implementation-plan.md for the full roadmap.
Current active tracks:
- typed-per-kind metadata contract hardening and typed drift diagnostics as release gate
- per-media normalization depth (video/book/manga/game provider mapping)
- duplicate/merge review workflow on top of confidence signals + public-deployment hardening
If Collectarr is useful to you, you can support ongoing development on Ko-fi: