Collectarr Core

License: MIT GitHub Release Issues Made with Python FastAPI SQLAlchemy Docker


Shared metadata engine for Collectarr: canonical catalog, provider ingest, image delivery, admin tooling, and search infrastructure.

Collectarr Core owns the shared catalog and provider pipeline. Personal library state such as owned items, grades, notes, wishlists, and local tags stays in collectarr-app and can optionally sync through collectarr-sync. Core is the place where canonical metadata gets normalized, enriched, indexed, and exposed to clients.


✨ Features

📚 Canonical Catalog

  • Multi-media catalog covering comics, manga, anime, books, games, board games, movies, TV, and music
  • Canonical entities for series, volumes, items, editions, variants, releases, people, organizations, story arcs, characters, and shared series tags
  • Typed item/search/admin responses so clients consume one normalized metadata contract instead of provider-specific payloads
  • Shared editorial metadata that complements local-first personal data in the app

🔌 Provider And Search Pipeline

  • 10 provider integrations: GCD, ComicVine, Hardcover, AniList, MangaDex, OpenLibrary, BGG, MusicBrainz, IGDB, and TMDb
  • Provider-aware title normalization, alias handling, issue matching, and barcode / UPC lookup
  • DB-backed ingest queue with retries, status tracking, and worker processing
  • Optional Meilisearch indexing for fast catalog queries and richer search previews
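
As a rough illustration of the retry and status tracking mentioned for the ingest queue, the sketch below models a single job attempt in memory. `IngestJob`, `JobStatus`, `MAX_ATTEMPTS`, and `process` are hypothetical names chosen for this example and are not taken from Core's code, where the queue is DB-backed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # assumed retry budget, not Core's actual setting


class JobStatus(str, Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


@dataclass
class IngestJob:
    provider: str
    provider_item_id: str
    status: JobStatus = JobStatus.PENDING
    attempts: int = 0
    last_error: Optional[str] = None


def process(job: IngestJob, fetch: Callable[[IngestJob], None]) -> IngestJob:
    """Run one attempt and record the outcome on the job."""
    job.attempts += 1
    try:
        fetch(job)
    except Exception as exc:  # record the failure instead of crashing the worker
        job.last_error = str(exc)
        # Stay pending until the retry budget is spent, then mark as failed.
        exhausted = job.attempts >= MAX_ATTEMPTS
        job.status = JobStatus.FAILED if exhausted else JobStatus.PENDING
    else:
        job.last_error = None
        job.status = JobStatus.DONE
    return job
```

A worker loop would call `process` for each pending job; keeping `attempts` and `last_error` on the row is one way to let admin tooling show why a job is stuck.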

🖼️ Image And Storage Infrastructure

  • External image URLs by default, with optional MinIO / S3 mirroring for controlled hosting
  • MangaDex cover proxy support, WebP normalization, cache budgeting, and origin tracking
  • Content-addressed image handling for uploaded assets and derived media variants
  • Image cache health surfaced through admin tooling instead of ad hoc scripts
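
To make "content-addressed" concrete, here is a minimal sketch that derives a storage key from the image bytes themselves. The `content_key` helper, the `images/` prefix, and the two-level fan-out are assumptions for illustration; Core's actual key layout may differ.

```python
import hashlib


def content_key(data: bytes, extension: str = "webp") -> str:
    """Derive a storage key from the bytes themselves (hypothetical layout)."""
    digest = hashlib.sha256(data).hexdigest()
    # Fan out by the first two hex pairs so no single prefix grows unbounded.
    return f"images/{digest[:2]}/{digest[2:4]}/{digest}.{extension}"
```

Because the key depends only on the content, uploading the same bytes twice resolves to the same object, which keeps deduplication and cache validation simple.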

🛠️ Admin And Operations

  • Admin dashboard in the Collectarr desktop app for provider health, ingest queues, duplicate review, user management, image cache stats, and audit logs
  • Role-based access with viewer / editor / admin permissions
  • OpenAPI docs at /docs for API exploration and schema-backed integration work
  • Daily-refreshable catalog badges and provider support docs generated from the live registry
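
The viewer / editor / admin split can be pictured as an ordered set of roles. The sketch below assumes the roles are strictly hierarchical (each role includes the permissions of the one below it), which this README does not confirm; `Role` and `can` are hypothetical names.

```python
from enum import IntEnum


class Role(IntEnum):
    # Assumed ordering: a higher value includes all lower permissions.
    VIEWER = 1
    EDITOR = 2
    ADMIN = 3


def can(user_role: Role, required: Role) -> bool:
    """Return True when the user's role meets or exceeds the required one."""
    return user_role >= required
```

For example, `can(Role.EDITOR, Role.VIEWER)` is true, while `can(Role.EDITOR, Role.ADMIN)` is false.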

🧱 Database Schema

The generated schema docs stay in sync with SQLAlchemy metadata and include columns, enums, indexes, defaults, unique constraints, and cross-domain references.

Migration policy (pre-1.0)

While the schema is still evolving toward its near-final shape, Core ships a single squashed Alembic baseline (alembic/versions/20260624_1000_clean_schema_baseline.py) rather than an incremental migration history. The server database starts empty, so when the models change, the baseline is regenerated and the database is recreated instead of stacking migrations. Running python -m app.scripts.bootstrap_alembic builds a fresh database straight from that baseline (alembic upgrade head). Once the schema stabilizes, normal incremental autogenerated revisions will be stacked on top of the baseline.


🚀 Quick Start

Start the Docker stack

Copy-Item .env.example .env
docker compose up --build -d
docker compose exec api python -m app.scripts.bootstrap_alembic
docker compose exec api python -m app.scripts.seed_comics

Run local development tooling

python -m pip install -e ".[dev]"
python -m ruff check .
python -m pytest

Common helper commands

.\tools\dev.ps1 start            # Start Docker stack
.\tools\dev.ps1 start -WithSync  # Start Core + collectarr-sync dev stack
.\tools\dev.ps1 migrate          # Run Alembic migrations
.\tools\dev.ps1 seed             # Seed sample comics data
.\tools\dev.ps1 test             # Run test suite
.\tools\dev.ps1 check            # Lint + type check
.\tools\dev.ps1 smoke-providers  # Smoke test all providers
.\tools\dev.ps1 reset-stack      # Clean reset of containers and volumes
python -m scripts.export_provider_support

🧩 Extending Metadata For New Libraries

Core is the canonical source of cross-library metadata. When a provider exposes a new field, wire it through the normalized metadata contract first and only then project it into the client.

  1. Normalize the field in the provider ingest pipeline.
  2. Expose it through public schemas used by the app: item responses, search results, and admin / provider previews.
  3. Add it to Meilisearch documents and display attributes when it should affect search or preview UX.
  4. Keep field names stable so collectarr-app can cache and render the same canonical shape offline.
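
The steps above can be sketched for a single new field. Everything here is hypothetical: `page_count`, the provider key `numberOfPages`, and the `NormalizedItem`, `normalize`, `to_response`, and `to_search_document` names are stand-ins for whatever Core's ingest pipeline, public schemas, and Meilisearch mapping actually use.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class NormalizedItem:
    title: str
    page_count: Optional[int] = None  # the hypothetical new field


def normalize(raw: dict[str, Any]) -> NormalizedItem:
    # Step 1: map the provider's field name onto the canonical one.
    pages = raw.get("numberOfPages")
    return NormalizedItem(
        title=str(raw.get("title", "")).strip(),
        page_count=int(pages) if pages is not None else None,
    )


def to_response(item: NormalizedItem) -> dict[str, Any]:
    # Step 2: expose the field through the public item shape.
    return asdict(item)


def to_search_document(item: NormalizedItem) -> dict[str, Any]:
    # Step 3: index the field only when it is present.
    doc: dict[str, Any] = {"title": item.title}
    if item.page_count is not None:
        doc["page_count"] = item.page_count
    return doc
```

Step 4 shows up as the same `page_count` key in both the response and the search document, so a client can cache one shape regardless of which provider supplied the value.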

When normalizing provider data, preserve the provider-native raw payload exactly as returned upstream. If a workflow also needs the canonical provider item id, use ProviderItem.provider_item_id alongside the raw mapping instead of rewriting raw['id'], because some providers expose numeric or kind-specific identifiers that are not interchangeable with the canonical route id.

That keeps provider growth additive: new library kinds can share the same catalog/search/admin contract instead of inventing parallel app-only fields.
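
The raw-payload rule described above can be illustrated with a small stand-in. The README names `ProviderItem.provider_item_id`; the dataclass below is only a simplified model of it, and `build_item`, the `raw` field layout, and the example ids are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ProviderItem:
    # Simplified stand-in for Core's model, not its real definition.
    provider: str
    provider_item_id: str  # canonical id used by routes
    raw: Mapping[str, Any]  # upstream payload, stored as returned


def build_item(provider: str, canonical_id: str, raw: Mapping[str, Any]) -> ProviderItem:
    # Take a shallow copy instead of mutating: raw["id"] stays whatever the
    # provider sent, and the canonical id travels next to it.
    return ProviderItem(provider=provider, provider_item_id=canonical_id, raw=dict(raw))


payload = {"id": 4050, "name": "Example Issue"}  # numeric upstream id
item = build_item("example-provider", "issue-4050", payload)
```

Here `item.raw["id"]` is still the provider's numeric `4050`, while `item.provider_item_id` carries the canonical string, so neither value has to impersonate the other.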


🌐 Local URLs

Service              URL
API                  http://localhost:8010
API docs (Swagger)   http://localhost:8010/docs
Sync service         http://localhost:8020
Meilisearch          http://localhost:7700
MinIO console        http://localhost:9001

🔄 Releases

Release publishing is manual-only. The Release GitHub Actions workflow uses workflow_dispatch; pushing to main runs CI only and never auto-publishes.

Current stable release: v1.0.0

Current backend image tags:

  • ghcr.io/collectarr/collectarr-core:v1.0.0
  • ghcr.io/collectarr/collectarr-core:latest

When a releasable version is detected, the workflow publishes a GitHub Release and pushes the backend container image to ghcr.io/collectarr/collectarr-core with both the semantic version tag and latest.
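
As a small sketch of the tagging rule, the helper below returns the two image references a release would push. The `image_tags` function and the `vMAJOR.MINOR.PATCH` check are assumptions for illustration; the actual logic lives in the Release workflow.

```python
import re

IMAGE = "ghcr.io/collectarr/collectarr-core"
SEMVER_TAG = re.compile(r"^v\d+\.\d+\.\d+$")  # assumed tag format, e.g. v1.0.0


def image_tags(version: str) -> list[str]:
    """Return the version-tagged and latest image references for a release."""
    if not SEMVER_TAG.match(version):
        raise ValueError(f"not a releasable version tag: {version!r}")
    return [f"{IMAGE}:{version}", f"{IMAGE}:latest"]
```

Calling `image_tags("v1.0.0")` yields the same two references listed under the current backend image tags.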

The first published GHCR package defaults to private. After the first real release, open the package page in the collectarr organization and switch collectarr-core to public before expecting anonymous docker pull operations to work:

  • https://github.com/orgs/collectarr/packages/container/package/collectarr-core

For personal LAN deployment on unRAID with Docker Compose, see docs/unraid.md.


📈 Catalog Badges

The repo includes snapshot badges for total catalog items and per-kind item counts. .github/workflows/catalog-badges.yml refreshes them on a daily schedule or manual dispatch.

To switch from placeholder badges to live counts, configure:

  • COLLECTARR_BADGES_BASE_URL for the public Core base URL (e.g., COLLECTARR_BADGES_BASE_URL=http://localhost:8010)
  • COLLECTARR_BADGES_TOKEN for bearer-token access to /admin/catalog/summary

Or, instead of a static token:

  • COLLECTARR_BADGES_EMAIL
  • COLLECTARR_BADGES_PASSWORD

The workflow logs in through /auth/login when a bearer token is not provided.
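
The token-or-login fallback can be sketched as follows. The environment variable names and the `/auth/login` path come from this README; the JSON request body, the `access_token` response field, and the `resolve_token` / `login_via_api` helpers are assumptions, so check the OpenAPI docs at /docs for the real contract.

```python
import json
import urllib.request
from typing import Callable, Mapping, Optional


def resolve_token(
    env: Mapping[str, str],
    login: Optional[Callable[[str, str, str], str]] = None,
) -> str:
    """Prefer a static bearer token; otherwise log in with email + password."""
    token = env.get("COLLECTARR_BADGES_TOKEN")
    if token:
        return token
    email = env.get("COLLECTARR_BADGES_EMAIL")
    password = env.get("COLLECTARR_BADGES_PASSWORD")
    if not (email and password):
        raise RuntimeError("set COLLECTARR_BADGES_TOKEN or email + password")
    do_login = login or login_via_api
    return do_login(env["COLLECTARR_BADGES_BASE_URL"], email, password)


def login_via_api(base_url: str, email: str, password: str) -> str:
    # ASSUMPTION: JSON credentials in, {"access_token": "..."} out.
    body = json.dumps({"email": email, "password": password}).encode()
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["access_token"]
```

The resulting token would then be sent as an `Authorization: Bearer ...` header when requesting /admin/catalog/summary. Passing `login` as a parameter keeps the fallback testable without a running Core instance.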


🔗 Related Repos

Repo              Purpose
collectarr-app    Flutter client for local-first collection browsing, editing, and admin-facing UX
collectarr-sync   Optional personal sync service for multi-device shelf state

📦 Provider Support

See docs/provider-support.md for the generated support matrix derived from the provider registry.

🧭 Library Parity Contract

See docs/library-parity-contract.md for the cross-repo active-kind/provider guarantees and enforcement points.

🗺️ Roadmap

See docs/implementation-plan.md for the full roadmap.

Current active tracks:

  • typed-per-kind metadata contract hardening and typed drift diagnostics as a release gate
  • per-media normalization depth (video/book/manga/game provider mapping)
  • duplicate/merge review workflow on top of confidence signals + public-deployment hardening

Support

If Collectarr is useful to you, you can support ongoing development on Ko-fi:

Support me on Ko-fi
