Editor Assistant is an AI-powered Python CLI and library for turning research papers, articles, web pages, and converted documents into briefs, outlines, and translations with LLMs. It is built for personal research workflows, but the core client and config modules can also be imported by other Python projects.
Version: 0.5.1
- Async processing: Uses asyncio and httpx for concurrent conversion and LLM calls.
- Unified CLI: brief, outline, translate, process, batch, convert, clean, history, stats, show, resume, and export.
- Typed source inputs: brief and process accept typed sources such as paper=... and news=...; multiple supplied inputs are processed independently.
- SQLite run history: Runs, inputs, outputs, and token usage are saved to a local database.
- Optional file outputs: Add --save-files to write generated markdown and token reports to disk.
- Multi-provider models: DeepSeek, Gemini, Qwen, GLM/Zhipu, Doubao, OpenAI via OpenRouter, and Anthropic via OpenRouter.
- Python 3.10+
- uv for project installation and command execution
- API keys for the model provider(s) you use
git clone https://github.com/lanmogu98/editor-assistant.git
cd editor-assistant
uv sync

For a runtime-only environment:
uv sync --no-dev

When running from the source checkout, prefer uv run:
uv run editor-assistant --help

If you activate the project virtual environment or install the package elsewhere, the primary console scripts are also available directly as editor-assistant and any2md.
Set only the API keys for providers you plan to use:
# DeepSeek via Volcengine
export DEEPSEEK_API_KEY_VOLC=your_volcengine_api_key
# DeepSeek official API
export DEEPSEEK_API_KEY=your_deepseek_api_key
# Gemini paid API key
export GEMINI_API_KEY=your_gemini_api_key
# Gemini AI Studio free-tier key
export GEMINI_FT_API_KEY=your_gemini_free_tier_api_key
# Qwen via Alibaba Bailian / DashScope-compatible endpoint
export QWEN_API_KEY=your_qwen_api_key
# GLM via Zhipu AI
export ZHIPU_API_KEY=your_zhipu_api_key
# GLM via OpenRouter
export ZHIPU_API_KEY_OPENROUTER=your_openrouter_api_key
# Doubao via Volcengine
export DOUBAO_API_KEY=your_doubao_api_key
# OpenAI models via OpenRouter
export OPENAI_API_KEY_OPENROUTER=your_openrouter_api_key
# Anthropic models via OpenRouter
export ANTHROPIC_API_KEY_OPENROUTER=your_openrouter_api_key

Run history is stored in ~/.editor_assistant/runs.db by default. Set EDITOR_ASSISTANT_DB_DIR to use a different directory.
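Before a first run, it can help to check which provider keys are visible to the process and where the run database is expected to be. The following is a minimal sketch, not the project's own configuration code: the helper names configured_keys and expected_db_path are hypothetical, and it assumes that the override directory contains a file that is still named runs.db.

```python
import os
from pathlib import Path

# Environment variable names copied from the export list above.
PROVIDER_KEYS = [
    "DEEPSEEK_API_KEY_VOLC",
    "DEEPSEEK_API_KEY",
    "GEMINI_API_KEY",
    "GEMINI_FT_API_KEY",
    "QWEN_API_KEY",
    "ZHIPU_API_KEY",
    "ZHIPU_API_KEY_OPENROUTER",
    "DOUBAO_API_KEY",
    "OPENAI_API_KEY_OPENROUTER",
    "ANTHROPIC_API_KEY_OPENROUTER",
]


def configured_keys(env=None):
    """Return the provider key names that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if env.get(name, "").strip()]


def expected_db_path(env=None):
    """Guess the run database path (assumes the file name stays runs.db)."""
    env = os.environ if env is None else env
    override = env.get("EDITOR_ASSISTANT_DB_DIR", "").strip()
    if override:
        base = Path(override).expanduser()
    else:
        base = Path.home() / ".editor_assistant"
    return base / "runs.db"


if __name__ == "__main__":
    print("Configured keys:", configured_keys() or "none")
    print("Expected database:", expected_db_path())
```

Both helpers accept an optional mapping so they can be exercised without touching the real environment.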
All examples below use uv run because this is the recommended source-checkout workflow.
brief supports one or more typed sources. Valid source types are paper and news. When you pass multiple sources, each source is processed independently; the current CLI does not merge paper and news into a single prompt.
uv run editor-assistant brief paper=https://example.com/research-article
uv run editor-assistant brief paper=paper-a.pdf paper=paper-b.pdf --model deepseek-r1 --debug
uv run editor-assistant brief paper=paper.pdf --save-files

outline accepts a single input path or URL.
uv run editor-assistant outline https://arxiv.org/paper.pdf
uv run editor-assistant outline paper.pdf --model deepseek-r1
uv run editor-assistant outline paper.pdf --save-files

translate accepts a single input path or URL and creates Chinese output. The translation task also stores a bilingual output variant.
uv run editor-assistant translate https://arxiv.org/paper.pdf
uv run editor-assistant translate document.pdf --model gemini-2.5-flash-free
uv run editor-assistant translate research.md --model deepseek-r1 --debug
uv run editor-assistant translate research.md --save-files

process uses the same typed source format as brief. It runs the selected tasks serially for each supplied source, so choose tasks that make sense for every source you pass.
uv run editor-assistant process paper=paper.pdf --tasks brief,outline
uv run editor-assistant process paper=paper-a.pdf paper=paper-b.pdf --tasks brief --no-stream
uv run editor-assistant process paper=paper.pdf --tasks brief,outline --no-stream --save-files

batch processes files in a directory concurrently with one task.
uv run editor-assistant batch ./docs/ --ext .md --task translate
uv run editor-assistant batch ./papers/ --ext .pdf --task brief --model deepseek-v3.2
uv run editor-assistant batch ./papers/ --ext .html --task outline --save-files

uv run editor-assistant convert document.pdf
uv run editor-assistant convert *.docx
uv run editor-assistant clean "https://example.com/page.html" -o clean.md
uv run editor-assistant clean page.html --stdout
# Utility console scripts are also available through uv:
uv run any2md *.docx -o converted/

uv run editor-assistant history
uv run editor-assistant history -n 50
uv run editor-assistant history --search "arxiv"
uv run editor-assistant stats
uv run editor-assistant stats -d 30
uv run editor-assistant show 1
uv run editor-assistant show 1 --output

uv run editor-assistant resume --dry-run
uv run editor-assistant resume --save-files
uv run editor-assistant export history.json
uv run editor-assistant export history.csv --limit 100

These options are available on generation commands such as brief, outline, translate, process, and batch:
- --model: Choose an LLM model. Default: glm-4.7-or.
- --thinking: Reasoning level for supported Gemini models: low, medium, or high.
- --no-stream: Disable streaming output.
- --save-files: Persist generated markdown files and token reports to disk. The SQLite run database is still updated either way.
- --debug: Enable detailed debug logging.

Global options:
- --version: Show version information.
- --help: Show CLI help. Use subcommand help, such as uv run editor-assistant brief --help, to see current model choices.
Model names are loaded from src/editor_assistant/config/llm_config.yml, which is the source of truth. Use uv run editor-assistant brief --help to see the current --model choices.
Current default model: glm-4.7-or.
- DeepSeek via Volcengine (DEEPSEEK_API_KEY_VOLC): deepseek-v3.2, deepseek-r1
- DeepSeek official API (DEEPSEEK_API_KEY): deepseek-v4-flash, deepseek-v4-pro
- Gemini paid key (GEMINI_API_KEY): gemini-3-flash, gemini-3.1-flash-lite, gemini-3.1-pro
- Gemini free-tier key (GEMINI_FT_API_KEY): gemini-2.5-flash-free, gemini-2.5-flash-lite-free, gemini-3-flash-free, gemini-3.1-flash-lite-free
- Qwen (QWEN_API_KEY): qwen-turbo, qwen-plus, qwen3.5-plus, qwen3-max-preview, qwen3-max
- GLM via Zhipu API (ZHIPU_API_KEY): glm-4.5, glm-4.6, glm-4.7, glm-5, glm-5.1
- GLM via OpenRouter (ZHIPU_API_KEY_OPENROUTER): glm-4.5-or, glm-4.6-or, glm-4.7-or, glm-5-or, glm-5.1-or, glm-5-turbo-or
- Doubao (DOUBAO_API_KEY): doubao-seed-1.6
- OpenAI via OpenRouter (OPENAI_API_KEY_OPENROUTER): gpt-4o-or, gpt-4.1-or, gpt-5-or, gpt-5.2-or, gpt-5.4-or, gpt-5.5-or
- Anthropic via OpenRouter (ANTHROPIC_API_KEY_OPENROUTER): claude-sonnet-4-or, claude-opus-4.6-or, claude-sonnet-4.6-or, claude-opus-4.7-or, claude-haiku-4.5-or
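The model lists above imply a lookup from model name to the API key it needs. A rough sketch of that lookup follows, assuming the groupings shown in this README; the names MODEL_KEY_ENV, required_env_var, and key_is_set are hypothetical, the pairing of the Qwen, Doubao, OpenAI, and Anthropic models with their variables is inferred from the export comments earlier in this document, and llm_config.yml remains the source of truth.

```python
import os

# Groupings copied from this README; they may lag behind llm_config.yml.
MODEL_KEY_ENV = {
    "DEEPSEEK_API_KEY_VOLC": ["deepseek-v3.2", "deepseek-r1"],
    "DEEPSEEK_API_KEY": ["deepseek-v4-flash", "deepseek-v4-pro"],
    "GEMINI_API_KEY": ["gemini-3-flash", "gemini-3.1-flash-lite", "gemini-3.1-pro"],
    "GEMINI_FT_API_KEY": [
        "gemini-2.5-flash-free", "gemini-2.5-flash-lite-free",
        "gemini-3-flash-free", "gemini-3.1-flash-lite-free",
    ],
    "QWEN_API_KEY": [
        "qwen-turbo", "qwen-plus", "qwen3.5-plus", "qwen3-max-preview", "qwen3-max",
    ],
    "ZHIPU_API_KEY": ["glm-4.5", "glm-4.6", "glm-4.7", "glm-5", "glm-5.1"],
    "ZHIPU_API_KEY_OPENROUTER": [
        "glm-4.5-or", "glm-4.6-or", "glm-4.7-or",
        "glm-5-or", "glm-5.1-or", "glm-5-turbo-or",
    ],
    "DOUBAO_API_KEY": ["doubao-seed-1.6"],
    "OPENAI_API_KEY_OPENROUTER": [
        "gpt-4o-or", "gpt-4.1-or", "gpt-5-or",
        "gpt-5.2-or", "gpt-5.4-or", "gpt-5.5-or",
    ],
    "ANTHROPIC_API_KEY_OPENROUTER": [
        "claude-sonnet-4-or", "claude-opus-4.6-or", "claude-sonnet-4.6-or",
        "claude-opus-4.7-or", "claude-haiku-4.5-or",
    ],
}


def required_env_var(model):
    """Return the environment variable a model needs, or None if unknown."""
    for env_name, models in MODEL_KEY_ENV.items():
        if model in models:
            return env_name
    return None


def key_is_set(model, env=None):
    """Report whether the key for a known model is present and non-empty."""
    env = os.environ if env is None else env
    env_name = required_env_var(model)
    return bool(env_name and env.get(env_name, "").strip())


if __name__ == "__main__":
    model = "glm-4.7-or"  # the documented default model
    print(model, "needs", required_env_var(model), "set:", key_is_set(model))
```

A check like this can fail fast with a clear message before any conversion or LLM call starts.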
import asyncio

from editor_assistant.data_models import Input, InputType, ProcessType
from editor_assistant.main import EditorAssistant


async def main():
    assistant = EditorAssistant("glm-4.7-or", debug_mode=True)

    # Generate an outline for the paper.
    await assistant.process_multiple(
        [Input(type=InputType.PAPER, path="paper.pdf")],
        ProcessType.OUTLINE,
    )

    # Generate a brief for the same paper.
    await assistant.process_multiple(
        [Input(type=InputType.PAPER, path="paper.pdf")],
        ProcessType.BRIEF,
    )


if __name__ == "__main__":
    asyncio.run(main())

The converter supports common document and web formats through MarkItDown plus local HTML extraction helpers:
- Documents: PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, EPUB
- Web content: HTML files and URLs
- Text/data: TXT, MD, CSV, JSON, XML, ZIP
- Media formats supported by MarkItDown, such as images and audio, may also work depending on installed extras
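When preparing inputs, for example a directory for batch, it can be useful to pre-check which format group a source falls into. The sketch below only mirrors the list above; the names FORMAT_GROUPS and format_group are hypothetical, and the converter's real detection logic may differ.

```python
from pathlib import Path

# Extension groups transcribed from the supported-format list above.
FORMAT_GROUPS = {
    "document": {".pdf", ".docx", ".doc", ".pptx", ".ppt", ".xlsx", ".xls", ".epub"},
    "web": {".html"},
    "text/data": {".txt", ".md", ".csv", ".json", ".xml", ".zip"},
}


def format_group(source):
    """Classify a path or URL into a format group, or None if not listed."""
    # Any http(s) URL is treated as web content in this sketch.
    if source.startswith(("http://", "https://")):
        return "web"
    suffix = Path(source).suffix.lower()
    for group, suffixes in FORMAT_GROUPS.items():
        if suffix in suffixes:
            return group
    return None


if __name__ == "__main__":
    for item in ["paper.pdf", "https://example.com/page.html", "photo.png"]:
        print(item, "->", format_group(item))
```

A result of None does not mean conversion will fail; media formats may still work when the relevant MarkItDown extras are installed.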
- Run history, inputs, outputs, and token usage are stored in SQLite.
- Default database: ~/.editor_assistant/runs.db
- Override database directory: EDITOR_ASSISTANT_DB_DIR=/path/to/dir
- With --save-files, generated files are written next to the source/converted markdown under llm_summaries/<model>/.
llm_summaries/
└── <model>/
    ├── response_<title><task_suffix>_<model>_<timestamp>.md
    ├── response_bilingual_<title>_translate_<model>_<timestamp>.md  # translate only
    └── token_usage_<title><task_suffix>_<model>_<timestamp>.txt
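Given that layout, files written with --save-files can be collected with a simple glob. This is a small sketch under the assumption that only the documented prefixes and extensions matter; find_saved_outputs is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def find_saved_outputs(source_dir, model):
    """List saved responses and token reports for one model.

    source_dir is the directory that holds the source or converted markdown.
    """
    out_dir = Path(source_dir) / "llm_summaries" / model
    if not out_dir.is_dir():
        return {"responses": [], "token_reports": []}
    return {
        # Matches response_... files, including response_bilingual_... ones.
        "responses": sorted(out_dir.glob("response_*.md")),
        "token_reports": sorted(out_dir.glob("token_usage_*.txt")),
    }


if __name__ == "__main__":
    found = find_saved_outputs(".", "glm-4.7-or")
    for kind, paths in found.items():
        print(kind, [p.name for p in paths])
```

The helper deliberately avoids parsing the title, task suffix, or timestamp, because their exact formats are not specified here.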
uv sync
uv run pytest tests/unit/
uv run flake8 src/
uv run mypy src/
uv run black src/ tests/

See DEVELOPER_GUIDE.md for architecture details and docs/design_docs/, docs/decisions/, and docs/reports/ for supporting design notes and reports.
Important current CLI conventions:
- Source checkout commands should generally use the uv run ... form.
- brief and process require typed source arguments: paper=... or news=...; multiple supplied inputs are processed independently.
- outline and translate take a single plain input path or URL.
- Generated responses are saved to SQLite by default; file output is opt-in with --save-files.
- The default model is glm-4.7-or.
Older v0.1 syntax such as --article paper:paper.pdf is no longer supported.
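The typed source convention can be illustrated with a small parser. This is a sketch of the idea only, not the CLI's actual argument handling; TypedSource and parse_typed_source are hypothetical names, and the library's own Input class is not used here.

```python
from dataclasses import dataclass

# Source types named in this README.
VALID_SOURCE_TYPES = ("paper", "news")


@dataclass(frozen=True)
class TypedSource:
    """Hypothetical container for one parsed TYPE=PATH_OR_URL argument."""
    kind: str
    location: str


def parse_typed_source(arg):
    """Split on the first '=' so URLs with query strings stay intact."""
    kind, sep, location = arg.partition("=")
    if not sep or not location:
        raise ValueError(f"expected TYPE=PATH_OR_URL, got {arg!r}")
    if kind not in VALID_SOURCE_TYPES:
        raise ValueError(
            f"unknown source type {kind!r}; use one of {VALID_SOURCE_TYPES}"
        )
    return TypedSource(kind, location)


if __name__ == "__main__":
    for raw in ["paper=paper-a.pdf", "news=https://example.com/a?id=1"]:
        print(parse_typed_source(raw))
```

With this shape, the old colon form paper:paper.pdf is rejected because it contains no "=" separator.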
This project is licensed under the MIT License.
- Microsoft MarkItDown for document conversion
- Readabilipy and Trafilatura for web content extraction
- DeepSeek, Google Gemini, Qwen, GLM/Zhipu, Doubao, OpenAI, Anthropic, and OpenRouter for LLM capabilities