Comprehensive reference for AI coding agents. Skill Seekers is a Python CLI tool (v3.6.0) that converts documentation sites, GitHub repos, PDFs, videos, notebooks, wikis, and more into AI-ready skills for 21+ LLM platforms and RAG pipelines.
Skill Seekers is a universal preprocessing layer that transforms raw documentation and code into structured knowledge assets. It supports 17+ source types and exports to 21+ AI platforms including Claude, Gemini, OpenAI, LangChain, LlamaIndex, and various vector databases.
- Source Types (17): Documentation websites, GitHub repos, PDFs, Word docs, EPUBs, videos, local codebases, Jupyter notebooks, HTML, OpenAPI specs, AsciiDoc, PowerPoint, Confluence, Notion, RSS feeds, man pages, chat exports
- Export Targets (21): Claude, Gemini, OpenAI, MiniMax, OpenCode, Kimi, DeepSeek, Qwen, OpenRouter, Together AI, Fireworks AI, Markdown, LangChain, LlamaIndex, Haystack, Weaviate, ChromaDB, FAISS, Qdrant, Pinecone
- MCP Server: FastMCP-based Model Context Protocol server for AI assistant integration
# REQUIRED before running tests (src/ layout — tests hard-exit if package not installed)
pip install -e .
# With dev tools (pytest, ruff, mypy, coverage)
pip install -e ".[dev]"
# With specific LLM platform support
pip install -e ".[gemini]" # Google Gemini
pip install -e ".[openai]" # OpenAI ChatGPT
pip install -e ".[all-llms]" # All LLM platforms
# With all optional dependencies (except video-full)
pip install -e ".[all]"
# Full video processing (heavy dependencies)
pip install -e ".[video-full]"
Note: tests/conftest.py checks that skill_seekers is importable and calls sys.exit(1) if it is not. Always install in editable mode first.
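The import guard described in the note can be sketched roughly as follows. This is a minimal illustration and not the actual contents of tests/conftest.py; the helper names package_is_importable and require_package are hypothetical.

```python
import importlib.util
import sys


def package_is_importable(name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def require_package(name: str) -> None:
    """Exit with status 1 and an install hint when the package is missing."""
    if not package_is_importable(name):
        print(f"ERROR: {name} is not installed. Run: pip install -e .", file=sys.stderr)
        sys.exit(1)
```

Calling require_package("skill_seekers") at the top of a conftest module would stop the run early with a readable message instead of a wall of import errors.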
Create a .env file or export these variables:
ANTHROPIC_API_KEY # For Claude AI enhancement
GOOGLE_API_KEY # For Gemini support
OPENAI_API_KEY # For OpenAI support
GITHUB_TOKEN # For GitHub repo scraping (higher rate limits)
# Full suite (never skip; all must pass)
pytest tests/ -v
# Fast iteration (skip slow, integration, E2E, network, MCP)
pytest tests/ -m "not slow and not integration and not e2e and not network and not serial and not mcp_only" -q
# Fast parallel (install pytest-xdist first)
pytest tests/ -n auto --dist=loadfile -m "not slow and not integration and not e2e and not network and not serial and not mcp_only" -q
# 3-phase runner script (recommended for local dev)
bash scripts/run_tests_fast.sh
# Single test
pytest tests/test_scraper_features.py::test_detect_language -v
# Skip slow/integration
pytest tests/ -v -m "not slow and not integration"
# With coverage
pytest tests/ --cov=src/skill_seekers --cov-report=term
# Lint + format check (matches CI)
ruff check src/ tests/
ruff format --check src/ tests/
# Type check (non-blocking — mypy is continue-on-error in CI)
mypy src/skill_seekers --show-error-codes --pretty
Pytest config: asyncio_mode = "auto", so @pytest.mark.asyncio is implicit. Test markers: slow, integration, e2e, venv, bootstrap, benchmark, asyncio, serial, network, mcp_only.
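As a small illustration of what the implicit marker means in practice, the sketch below shows a coroutine test written without any decorator. The coroutine fetch_title is a hypothetical stand-in for async code under test; it is not part of the project.

```python
import asyncio


async def fetch_title(delay: float = 0.0) -> str:
    """Hypothetical coroutine standing in for any async code under test."""
    await asyncio.sleep(delay)
    return "Example Docs"


# With asyncio_mode = "auto", pytest-asyncio collects this coroutine test
# without an explicit @pytest.mark.asyncio decorator.
async def test_fetch_title_returns_text():
    assert await fetch_title() == "Example Docs"
```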
CI note: CI pins ruff==0.15.8 (not the >=0.14.13 dev dep). If formatting behaves differently locally, check the CI version.
CI test phases: Tests are split into 3 parallel jobs:
- test-fast: 3386 unit tests with xdist across the OS/Python matrix
- test-serial: 69 serial/integration/E2E/network tests
- test-mcp: 193 MCP tests (requires [mcp] extras)
- Line length: 100 characters
- Target Python: 3.10+
- Enabled lint rules: E, W, F, I, B, C4, UP, ARG, SIM
- Ignored rules: E501 (line length handled by formatter), F541 (f-string style), ARG002 (unused method args for interface compliance), B007 (intentional unused loop vars), I001 (formatter handles imports), SIM114 (readability preference)
- Sort with isort (via ruff); skill_seekers is first-party
- Standard library → third-party → first-party, separated by blank lines
- Use from __future__ import annotations only if needed for forward refs
- Guard optional imports with try/except ImportError (see the adaptors/__init__.py pattern):
  try:
      from .claude import ClaudeAdaptor
      from .minimax import MiniMaxAdaptor
  except ImportError:
      ClaudeAdaptor = None
      MiniMaxAdaptor = None
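The guarded-import convention can be tried standalone with a sketch like the one below. The module name hypothetical_optional_sdk and the helper require_optional_sdk are placeholders invented for illustration, and the install hint simply reuses the [all] extra mentioned earlier.

```python
# Guarded import: the name is bound to None when the extra is not installed.
try:
    import hypothetical_optional_sdk  # placeholder for an optional dependency
except ImportError:
    hypothetical_optional_sdk = None


def require_optional_sdk():
    """Return the optional module, or raise with an install hint if it is absent."""
    if hypothetical_optional_sdk is None:
        raise ImportError(
            "Optional dependency missing. Install it with: pip install -e '.[all]'"
        )
    return hypothetical_optional_sdk
```

Binding the name to None keeps module import cheap and defers the failure to the point where the feature is actually used.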
- Files: snake_case.py (e.g., source_detector.py, config_validator.py)
- Classes: PascalCase (e.g., SkillAdaptor, ClaudeAdaptor, SourceDetector)
- Functions/methods: snake_case (e.g., get_adaptor(), detect_language())
- Constants: UPPER_CASE (e.g., ADAPTORS, DEFAULT_CHUNK_TOKENS, VALID_SOURCE_TYPES)
- Private: prefix with _ (e.g., _read_existing_content(), _validate_unified())
- Gradual typing — add hints where practical, not enforced everywhere
- Use modern syntax: str | None not Optional[str], list[str] not List[str]
- MyPy config: disallow_untyped_defs = false, check_untyped_defs = true, ignore_missing_imports = true
- Tests are excluded from strict type checking (disallow_untyped_defs = false, check_untyped_defs = false for tests.*)
- Module-level docstring on every file (triple-quoted, describes purpose)
- Google-style docstrings for public functions/classes
- Include Args:, Returns:, Raises: sections where useful
- Use specific exceptions, never bare except:
- Provide helpful error messages with context
- Use raise ValueError(...) for invalid arguments, raise RuntimeError(...) for state errors
- Guard optional dependency imports with try/except and give clear install instructions on failure
- Chain exceptions with raise ... from e when wrapping
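A short sketch of these error-handling rules applied to JSON parsing follows. The function parse_config_text is hypothetical and only illustrates specific exceptions, contextual messages, and chaining.

```python
import json


def parse_config_text(raw: str) -> dict:
    """Parse a JSON config string, wrapping parse errors with context."""
    if not raw.strip():
        raise ValueError("config text is empty; expected a JSON object")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        # Chain the original error so the traceback keeps the root cause.
        raise ValueError(f"invalid JSON in config at line {e.lineno}: {e.msg}") from e
    if not isinstance(data, dict):
        raise ValueError(f"config must be a JSON object, got {type(data).__name__}")
    return data
```

Because of the from e clause, the original JSONDecodeError stays available as __cause__ on the raised ValueError.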
- Use inline # noqa: XXXX comments (e.g., # noqa: F401 for re-exports, # noqa: ARG001 for required but unused params)
src/skill_seekers/ # Main package (src/ layout)
cli/ # CLI commands and entry points (100+ files)
adaptors/ # Platform adaptors (Strategy pattern, inherit SkillAdaptor)
arguments/ # CLI argument definitions (one per source type)
parsers/ # Subcommand parsers (one per source type)
storage/ # Cloud storage (inherit BaseStorageAdaptor)
main.py # Unified CLI entry point (COMMAND_MODULES dict)
source_detector.py # Auto-detects source type from user input
create_command.py # Unified `create` command routing
config_validator.py # VALID_SOURCE_TYPES set + per-type validation
unified_scraper.py # Multi-source orchestrator (scraped_data + dispatch)
unified_skill_builder.py # Pairwise synthesis + generic merge
mcp/ # MCP server (FastMCP + legacy)
tools/ # MCP tool implementations by category (10 files)
server_fastmcp.py # FastMCP server implementation
server_legacy.py # Legacy MCP server
sync/ # Sync monitoring (Pydantic models)
benchmark/ # Benchmarking framework
embedding/ # FastAPI embedding server
workflows/ # 67 YAML workflow presets
_version.py # Reads version from pyproject.toml
tests/ # 160 test files (pytest)
test_adaptors/ # 22 adaptor-specific test files
conftest.py # Test configuration with package check
configs/ # Preset JSON scraping configs
docs/ # Documentation (guides, integrations, architecture)
Adaptor (Strategy) pattern — all platform logic in cli/adaptors/. Inherit SkillAdaptor, implement format_skill_md(), package(), upload(). Register in adaptors/__init__.py ADAPTORS dict.
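The shape of the adaptor pattern can be sketched as below. Only the names SkillAdaptor, format_skill_md(), package(), upload(), ADAPTORS, and get_adaptor() come from this document; the method signatures, the MarkdownAdaptor class, and its behavior are assumptions made for illustration and may differ from the real code.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SkillAdaptor(ABC):
    """Illustrative base class; the real signatures may differ."""

    @abstractmethod
    def format_skill_md(self, title: str, body: str) -> str: ...

    @abstractmethod
    def package(self, skill_dir: Path) -> Path: ...

    @abstractmethod
    def upload(self, package_path: Path) -> str: ...


class MarkdownAdaptor(SkillAdaptor):
    """Hypothetical adaptor that only formats text and names a package."""

    def format_skill_md(self, title: str, body: str) -> str:
        return f"# {title}\n\n{body}\n"

    def package(self, skill_dir: Path) -> Path:
        # A real adaptor would write an archive; this sketch only computes the name.
        return skill_dir.with_suffix(".zip")

    def upload(self, package_path: Path) -> str:
        raise RuntimeError("markdown export has no upload target")


# Registry mapping platform names to adaptor classes.
ADAPTORS: dict[str, type[SkillAdaptor]] = {"markdown": MarkdownAdaptor}


def get_adaptor(platform: str) -> SkillAdaptor:
    """Look up and instantiate the adaptor registered for a platform name."""
    try:
        return ADAPTORS[platform]()
    except KeyError as e:
        raise ValueError(
            f"unknown platform {platform!r}; choose from {sorted(ADAPTORS)}"
        ) from e
```

Keeping the registry as a plain dict means adding a platform is one class plus one dictionary entry.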
Scraper pattern — each source type has: cli/<type>_scraper.py (with <Type>ToSkillConverter class + main()), arguments/<type>.py, parsers/<type>_parser.py. Register in parsers/__init__.py PARSERS list, main.py COMMAND_MODULES dict, config_validator.py VALID_SOURCE_TYPES set.
Unified pipeline — unified_scraper.py dispatches to per-type _scrape_<type>() methods. unified_skill_builder.py uses pairwise synthesis for docs+github+pdf combos and _generic_merge() for all other combinations.
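The per-type dispatch can be pictured with a getattr lookup, as in the sketch below. The class name, the source dictionary keys, and the returned structures are assumptions; only the _scrape_<type>() naming and the scraped_data attribute are taken from this document.

```python
class UnifiedScraperSketch:
    """Illustrative orchestrator that routes each source to _scrape_<type>()."""

    def __init__(self):
        self.scraped_data: dict[str, list[dict]] = {}

    def scrape(self, source: dict) -> None:
        """Find the handler for the source type and store its result."""
        source_type = source["type"]
        handler = getattr(self, f"_scrape_{source_type}", None)
        if handler is None:
            raise ValueError(f"unsupported source type: {source_type!r}")
        self.scraped_data.setdefault(source_type, []).append(handler(source))

    def _scrape_pdf(self, source: dict) -> dict:
        # Placeholder result; a real handler would extract pages.
        return {"path": source["path"], "pages": []}

    def _scrape_github(self, source: dict) -> dict:
        # Placeholder result; a real handler would fetch repository files.
        return {"repo": source["repo"], "files": []}
```

With this shape, supporting a new source type in the orchestrator amounts to adding one more _scrape_<type>() method.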
MCP tools — grouped in mcp/tools/ by category. scrape_generic_tool handles all new source types.
CLI subcommands — git-style in cli/main.py. Each delegates to a module's main() function.
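Git-style delegation through a COMMAND_MODULES mapping could work roughly like the sketch below. The demo_doctor module is a stand-in registered at runtime so the example runs on its own, and the assumption that each module's main() accepts an argument list and returns an exit code is mine, not confirmed by the project.

```python
import importlib
import sys
import types


def _demo_main(argv: list[str]) -> int:
    """Stand-in for a command module's main() function."""
    print("doctor ok", *argv)
    return 0


# Stand-in module so the sketch is self-contained; in the real CLI each value
# would be the dotted path of a project module that defines main().
_demo = types.ModuleType("demo_doctor")
_demo.main = _demo_main
sys.modules["demo_doctor"] = _demo

COMMAND_MODULES = {"doctor": "demo_doctor"}


def dispatch(argv: list[str]) -> int:
    """Import the module for the requested subcommand and call its main()."""
    if not argv or argv[0] not in COMMAND_MODULES:
        print(f"usage: <command> [args]; commands: {sorted(COMMAND_MODULES)}")
        return 2
    module = importlib.import_module(COMMAND_MODULES[argv[0]])
    return module.main(argv[1:])
```

Importing lazily inside dispatch keeps startup fast, since only the selected command's module is loaded.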
Supported source types (17): documentation (web), github, pdf, local, word, video, epub, jupyter, html, openapi, asciidoc, pptx, confluence, notion, rss, manpage, chat. Each detected automatically by source_detector.py.
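To give a feel for automatic detection, here is a deliberately simplified sketch covering a handful of the listed types. The heuristics, the EXTENSION_TYPES table, and the function detect_source_type are assumptions for illustration and do not reflect how source_detector.py actually decides.

```python
from pathlib import Path
from urllib.parse import urlparse

# Assumed extension-to-type mapping for a few of the supported source types.
EXTENSION_TYPES = {
    ".pdf": "pdf",
    ".docx": "word",
    ".epub": "epub",
    ".ipynb": "jupyter",
    ".pptx": "pptx",
    ".adoc": "asciidoc",
    ".html": "html",
}


def detect_source_type(source: str) -> str:
    """Guess a source type from a URL or filesystem path."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        if parsed.netloc.lower() in ("github.com", "www.github.com"):
            return "github"
        return "documentation"
    suffix = Path(source).suffix.lower()
    if suffix in EXTENSION_TYPES:
        return EXTENSION_TYPES[suffix]
    # Anything else is treated as a local codebase in this sketch.
    return "local"
```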
Supported platforms (21): claude, gemini, openai, minimax, opencode, kimi, deepseek, qwen, openrouter, together, fireworks, markdown, langchain, llama-index, haystack, weaviate, chroma, faiss, qdrant, pinecone.
# Core commands
skill-seekers create <source> # Create skill from any source (auto-detects type)
skill-seekers scan <dir> # AI-detect a project's tech stack and emit per-framework configs
skill-seekers enhance <directory> # AI-powered enhancement
skill-seekers package <directory> # Package skill for target platform
skill-seekers upload <file> # Upload skill to target platform
skill-seekers install <source> # One-command workflow (scrape + enhance + package + upload)
# Utilities
skill-seekers estimate <source> # Estimate page count before scraping
skill-seekers doctor # Health check for dependencies
skill-seekers config # Configure API keys and settings
skill-seekers workflows # List and apply workflow presets
skill-seekers resume <job_id> # Resume interrupted scraping
# Advanced
skill-seekers stream <source> # Streaming ingestion
skill-seekers update <directory> # Incremental update
skill-seekers multilang <directory> # Multi-language support
- Unit tests: tests/test_*.py (test individual modules)
- Adaptor tests: tests/test_adaptors/test_*_adaptor.py (test platform adaptors)
- E2E tests: tests/test_*_e2e.py (end-to-end integration tests)
# Fast test run (skip slow/integration tests)
pytest tests/ -v -m "not slow and not integration"
# Full test suite
pytest tests/ -v
# With coverage report
pytest tests/ --cov=src/skill_seekers --cov-report=term-missing
# Specific test categories
pytest tests/ -v -m "slow" # Only slow tests
pytest tests/ -v -m "integration" # Only integration tests
pytest tests/ -v -m "e2e" # Only E2E tests
Test fixtures are located in tests/fixtures/ and include sample configs, HTML files, and mock data.
- main: production, protected
- development: default PR target, active dev
- Feature branches are created from development
ruff check src/ tests/
ruff format --check src/ tests/
pytest tests/ -v -x # stop on first failure
Never commit API keys. Use env vars: ANTHROPIC_API_KEY, GOOGLE_API_KEY, OPENAI_API_KEY, GITHUB_TOKEN.
GitHub Actions (7 workflows in .github/workflows/):
- tests.yml: ruff + mypy lint job, then pytest matrix (Ubuntu + macOS, Python 3.10-3.12) with Codecov upload
- release.yml: tag-triggered: tests → version verification → PyPI publish via uv build
- test-vector-dbs.yml: tests vector DB adaptors (weaviate, chroma, faiss, qdrant)
- docker-publish.yml: multi-platform Docker builds (amd64, arm64) for CLI + MCP images
- quality-metrics.yml: quality analysis with configurable threshold
- scheduled-updates.yml: weekly skill updates for popular frameworks
- vector-db-export.yml: weekly vector DB exports
Multi-stage Dockerfile with Python 3.12 slim base:
# Build image
docker build -t skill-seekers .
# Run CLI
docker run -v $(pwd)/output:/output skill-seekers create https://docs.example.com
# Run MCP server
docker run -p 8765:8765 skill-seekers skill-seekers-mcp
The MCP server provides Model Context Protocol integration:
# Start FastMCP server
skill-seekers-mcp
# Or use the Python module
python -m skill_seekers.mcp.server_fastmcp
- API Keys: Never commit API keys to version control. Use environment variables or .env files (already in .gitignore)
- Docker: Runs as non-root user (skillseeker, UID 1000)
- Dependencies: Regular security updates via pip audit or safety check
- Sandboxing: Video processing uses optional dependencies that can be heavy; install [video-full] only when needed
- Website: https://skillseekersweb.com/
- Documentation: https://skillseekersweb.com/
- PyPI: https://pypi.org/project/skill-seekers/
- Repository: https://github.com/yusufkaraaslan/Skill_Seekers
- Config Browser: https://skillseekersweb.com/
- Project Board: https://github.com/users/yusufkaraaslan/projects/2