Guidance for AI agents working in this repository. Read this before editing code.
Oletools GUI is a FastAPI single-page web app for static malware/SOC triage of uploaded Office, RTF, and PDF documents. It wraps oletools-style analyzers and a custom PDF static analyzer, then renders structured findings in the browser.
Security posture: uploaded documents are untrusted. Do not add behavior that opens documents in desktop viewers, executes embedded content, detonates samples, or extracts payloads to persistent paths unless explicitly requested and isolated.
app/
  main.py                     FastAPI app, routes, static mount
  config.py                   constants: paths, upload limit, timeouts, version, default tools
  models.py                   Pydantic response models and Severity enum
  analyzers/
    base.py                   Analyzer interface and AnalyzerContext
    registry.py               central analyzer registration
    cli_analyzer.py           subprocess helper for oletools CLI modules
    oleid_analyzer.py         OleID wrapper
    olevba_analyzer.py        OleVBA wrapper
    mraptor_analyzer.py       MacroRaptor wrapper
    objects_analyzer.py       oleobj/rtfobj wrapper
    pdf_static_analyzer.py    custom static PDF triage logic
  services/
    analysis.py               tool selection, per-file/bulk response assembly
    filetype.py               lightweight type detection / analyzer gating
    risk.py                   risk aggregation and summary text
    storage.py                upload sanitization and size-limited writes
static/
  index.html                  UI shell/templates
  app.js                      client-side upload, API calls, result rendering
  styles.css                  UI styling
uploads/.gitkeep              upload directory placeholder; uploaded files are runtime data
Dockerfile.kasm               Kasm workspace container build
kasm-workspace.json           Kasm workspace metadata
requirements.txt              pinned Python runtime dependencies
cd /home/ubuntu/ole-gui
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 8081
Open http://localhost:8081.
The Kasm container uses port 8085 via OLE_GUI_PORT; local README examples use 8081.
There is no dedicated test suite in this repo right now. Before finishing Python changes, run at least:
cd /home/ubuntu/ole-gui
. .venv/bin/activate 2>/dev/null || true
python -m compileall app
python - <<'PY'
from app.main import app
from app.analyzers.registry import ANALYZERS
print(app.title, sorted(ANALYZERS))
PY
For frontend-only changes, at minimum load the app in a browser or review static/app.js for syntax issues. If a server is needed, start it with:
uvicorn app.main:app --host 127.0.0.1 --port 8081
All analyzers implement Analyzer.analyze(context) -> AnalyzerResult from app/analyzers/base.py.
To add a new analyzer:
- Create app/analyzers/<name>_analyzer.py.
- Set stable key, human label, and description.
- Return AnalyzerResult with:
  - status: ok, skipped, or error.
  - concise summary.
  - structured findings: list[Finding].
  - optional data for tables/detail rendering.
  - optional raw_output for CLI output.
- Register an instance in app/analyzers/registry.py.
- Update DEFAULT_TOOLS in app/config.py if it should run by default.
- Update auto_tools_for_filename() in app/services/analysis.py and matching client-side arrays in static/app.js if auto-selection changes.
- Update README and this file if the workflow changes.
Keep analyzer keys stable because the UI and API use them.
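The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: Finding, AnalyzerResult, and AnalyzerContext here are hypothetical stand-ins for the types in app/analyzers/base.py and app/models.py, whose real fields may differ, and ExampleSizeAnalyzer and the dict-style ANALYZERS registry are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-ins for the real types in app/analyzers/base.py and
# app/models.py; the actual field names and base classes may differ.
@dataclass
class Finding:
    title: str
    severity: str  # one of: info, low, medium, high, error
    value: Any = None


@dataclass
class AnalyzerResult:
    status: str  # ok, skipped, or error
    summary: str
    findings: list[Finding] = field(default_factory=list)
    data: dict[str, Any] = field(default_factory=dict)
    raw_output: str | None = None


@dataclass
class AnalyzerContext:
    filename: str
    content: bytes


class ExampleSizeAnalyzer:
    """Hypothetical analyzer that only reports the upload size."""

    key = "example_size"  # keep stable: the UI and API refer to this key
    label = "Example size check"
    description = "Reports the size of the uploaded file."

    def analyze(self, context: AnalyzerContext) -> AnalyzerResult:
        if not context.content:
            return AnalyzerResult(status="skipped", summary="Empty upload.")
        size = len(context.content)
        return AnalyzerResult(
            status="ok",
            summary=f"{context.filename}: {size} bytes.",
            findings=[Finding(title="File size", severity="info", value=size)],
            data={"size_bytes": size},  # plain JSON-friendly values only
        )


# Registration sketch; the real registry.py may use a different structure.
ANALYZERS = {ExampleSizeAnalyzer.key: ExampleSizeAnalyzer()}
```

The analyzer returns skipped instead of raising when it has nothing to inspect, so one uninteresting file does not turn into an error in a bulk response.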
Important endpoints:
- GET / serves static/index.html.
- GET /api/version returns {version}.
- GET /api/tools returns registered analyzer metadata.
- POST /api/analyze analyzes one file.
- POST /api/analyze/bulk analyzes multiple files and is what the UI currently uses.
Response shapes are defined in app/models.py. If you change response data, update both backend models and static/app.js rendering.
- Uploaded files are saved by app/services/storage.py under uploads/ with UUID-prefixed sanitized names.
- Max upload size is MAX_UPLOAD_BYTES in app/config.py.
- Treat every upload as hostile binary data.
- Avoid logging secrets/passwords. zip_password and office_password are user-provided analysis options.
- Do not extract embedded files to durable storage by default. Prefer in-memory inspection with hashes, magic, first bytes, and previews.
- Do not submit samples to external scanners/services without explicit user approval.
- Keep static analysis deterministic and offline.
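As an illustration of UUID-prefixed sanitized names and size-limited writes, here is a minimal sketch. It is not the code in app/services/storage.py: sanitize_filename, save_limited, and the placeholder MAX_UPLOAD_BYTES value are assumptions made for the example.

```python
import re
import uuid
from pathlib import Path
from typing import BinaryIO

MAX_UPLOAD_BYTES = 1024  # placeholder; the real limit lives in app/config.py


def sanitize_filename(name: str) -> str:
    """Drop directory components and unusual characters, then add a UUID prefix."""
    base = name.replace("\\", "/").split("/")[-1]
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".") or "upload"
    return f"{uuid.uuid4().hex}_{safe}"


def save_limited(stream: BinaryIO, dest: Path, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Copy stream to dest in chunks; remove the partial file and raise if too large."""
    written = 0
    try:
        with dest.open("wb") as out:
            while chunk := stream.read(64 * 1024):
                written += len(chunk)
                if written > limit:
                    raise ValueError("upload exceeds size limit")
                out.write(chunk)
    except ValueError:
        # Do not leave a truncated hostile file behind.
        dest.unlink(missing_ok=True)
        raise
    return written
```

Counting bytes while streaming avoids trusting a client-supplied length, and the limit is enforced before the oversized chunk is written.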
app/analyzers/pdf_static_analyzer.py is a large custom analyzer. Be careful with broad refactors.
Current responsibilities include:
- PDF magic/version detection.
- hash, entropy, metadata, URI extraction.
- keyword counts with hex-obfuscated PDF name handling.
- /OpenAction classification: distinguish benign destination/view actions from action dictionaries.
- /AA additional action decoding on pages, annotations, and form fields.
- PyMuPDF embedded-file inspection in memory.
- encrypted/password-gated document handling.
- incremental update marker detection.
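To make the hex-obfuscated name handling concrete: PDF name tokens may spell any character as #xx, so /J#61vaScript is the same name as /JavaScript. The sketch below counts watched names after decoding those escapes. It is a simplified illustration rather than the analyzer's code; KEYWORDS, normalize_pdf_name, and count_keywords are hypothetical, and the scan only sees raw bytes, not names inside compressed streams.

```python
import re
from collections import Counter

# Illustrative subset; the real analyzer's keyword list may differ.
KEYWORDS = ("/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch", "/ObjStm")

# A name token runs from "/" up to the next PDF whitespace or delimiter byte.
_NAME_RE = re.compile(rb"/[^\x00\t\n\x0c\r ()<>\[\]{}/%]+")
_HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_pdf_name(raw: bytes) -> str:
    """Decode #xx escapes in a PDF name token."""
    decoded = _HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_keywords(pdf_bytes: bytes) -> Counter:
    """Count watched names after normalization so obfuscated spellings still match."""
    counts: Counter = Counter()
    for match in _NAME_RE.finditer(pdf_bytes):
        name = normalize_pdf_name(match.group(0))
        if name in KEYWORDS:
            counts[name] += 1
    return counts
```

Matching whole name tokens rather than substrings keeps /JS from being counted inside longer names.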
When changing PDF findings:
- Prefer precise evidence over generic keyword alerts.
- For /OpenAction and /AA, show action type, event/owner context, page/object reference when available, and risk classification.
- For /ObjStm, do not stop at a generic “/ObjStm present” notice. Show useful details such as object stream object number, byte offset, /N, /First, /Length, filters, stream size, and embedded object previews when safely available.
- Do not classify ordinary navigation/view actions as high risk just because an action key exists.
- Do not rely solely on raw keyword counts for verdicts when parser-level details are available.
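One way to picture the destination versus action-dictionary distinction for /OpenAction is the heuristic below. It is a deliberately simplified sketch that works on raw bytes, does not handle nested dictionaries or obfuscated names, and does not resolve indirect references. The severity mapping and the classify_open_action name are assumptions for illustration, not the analyzer's actual rules.

```python
import re

# Token that follows /OpenAction: array, dictionary, indirect ref, name, or string.
_OPEN_ACTION_RE = re.compile(rb"/OpenAction\s*(\[|<<|\d+\s+\d+\s+R|/|\()")
_ACTION_TYPE_RE = re.compile(rb"/S\s*/([A-Za-z]+)")

# Assumed mapping for this sketch only.
ACTION_SEVERITY = {
    "GoTo": "info",
    "Named": "info",
    "URI": "medium",
    "GoToR": "medium",
    "SubmitForm": "medium",
    "JavaScript": "high",
    "Launch": "high",
}


def classify_open_action(pdf_bytes: bytes) -> dict:
    match = _OPEN_ACTION_RE.search(pdf_bytes)
    if match is None:
        return {"kind": "absent", "severity": "info"}
    opener = match.group(1)
    if opener in (b"[", b"/", b"("):
        # Explicit or named destination: a view change, not an action.
        return {"kind": "destination", "severity": "info"}
    if opener != b"<<":
        # Indirect reference; a real parser would resolve the target object.
        return {"kind": "indirect_reference", "severity": "low", "ref": opener.decode("ascii")}
    # Action dictionary: read /S up to the first ">>" (no nesting support).
    end = pdf_bytes.find(b">>", match.end())
    body = pdf_bytes[match.end():end if end != -1 else match.end() + 512]
    type_match = _ACTION_TYPE_RE.search(body)
    action = type_match.group(1).decode("ascii") if type_match else "unknown"
    return {
        "kind": "action_dictionary",
        "action": action,
        "severity": ACTION_SEVERITY.get(action, "low"),
    }
```

The point of the sketch is the shape of the output: the finding carries the action type and a classification instead of a bare "key present" alert.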
- CLI-backed analyzers should use run_module() from app/analyzers/cli_analyzer.py so timeouts and Python executable selection stay consistent.
- ANALYZER_TIMEOUT_SECONDS is currently 45 seconds.
- PDF files should generally skip Office/OLE analyzers.
- RTF handling differs: ObjectsAnalyzer uses rtfobj for .rtf, otherwise oleobj.
- Macro analyzers use should_run_macro_analyzers() to avoid noisy macro findings on non-document inputs.
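For orientation, a timeout-bounded module runner could look like the sketch below. It is not the real run_module() from app/analyzers/cli_analyzer.py, whose signature and return type are not shown here; run_module_sketch and ModuleRun are hypothetical names. The sketch leaves exit-code interpretation to the caller, since a non-zero code does not necessarily mean the tool failed.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional, Sequence

ANALYZER_TIMEOUT_SECONDS = 45  # mirrors the value quoted above


@dataclass
class ModuleRun:
    timed_out: bool
    returncode: Optional[int]
    stdout: str
    stderr: str


def run_module_sketch(
    module: str,
    args: Sequence[str],
    timeout: float = ANALYZER_TIMEOUT_SECONDS,
) -> ModuleRun:
    """Run `python -m module args` with the current interpreter and a hard timeout."""
    cmd = [sys.executable, "-m", module, *args]
    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            errors="replace",  # tool output may contain undecodable bytes
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return ModuleRun(True, None, "", f"{module} timed out after {timeout} seconds")
    return ModuleRun(False, proc.returncode, proc.stdout, proc.stderr)
```

Using sys.executable keeps the child in the same virtual environment as the server, and passing an argument list avoids shell interpretation of uploaded file names.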
static/app.js mirrors some backend constants:
- pdfTools
- officeTools
- allTools
- PDF/Office extension sets
- displayed fallback app version
If backend analyzer keys or auto-selection rules change, update static/app.js too.
Rendering is intentionally simple vanilla JS. Escape untrusted strings with escapeHtml() before putting them in innerHTML.
Severity values are: info, low, medium, high, error.
Aggregation lives in app/services/risk.py. error currently ranks like medium. Keep summaries cautious: “no obvious malicious indicators” does not mean safe.
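A rough sketch of this ranking, with error weighted like medium, is shown below. The names SEVERITY_RANK, aggregate_severity, and cautious_summary, the tie-breaking rule, and the summary wording are assumptions for illustration. The actual logic in app/services/risk.py may differ.

```python
from typing import Iterable

# error shares a rank with medium, as described above.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "error": 2}


def aggregate_severity(severities: Iterable[str]) -> str:
    """Return the highest-ranked severity; on a tie the first one seen wins."""
    worst = "info"
    for severity in severities:
        if SEVERITY_RANK[severity] > SEVERITY_RANK[worst]:
            worst = severity
    return worst


def cautious_summary(overall: str) -> str:
    """Summary text that never claims a file is safe."""
    if overall in ("info", "low"):
        return "No obvious malicious indicators found; this does not mean the file is safe."
    if overall == "error":
        return "Analysis was incomplete; treat the file as unverified."
    return f"Indicators rated {overall} were found; review the findings."
```

Ranking error alongside medium keeps a failed analyzer from hiding behind an otherwise clean result.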
- Python: type hints, small helpers, explicit structured dictionaries for analyzer data.
- Keep analyzer outputs JSON-serializable.
- Prefer adding focused helper functions over deeply nested logic.
- Preserve backward-compatible response fields when possible.
- Avoid sweeping formatting-only changes in pdf_static_analyzer.py; it is long and easier to review with targeted diffs.
- Keep UI dependency-free unless there is a strong reason to add a build step.
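As an example of keeping analyzer outputs JSON-serializable, the helper below converts a few common offenders (bytes, sets, tuples) before the data reaches a response model. to_jsonable is a hypothetical helper written for this illustration, not an existing function in the repository.

```python
import json
from typing import Any


def to_jsonable(value: Any) -> Any:
    """Recursively convert bytes, sets, and tuples into JSON-friendly values."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).hex()
    if isinstance(value, (set, frozenset)):
        # Sort for deterministic output; key=repr tolerates mixed element types.
        return sorted((to_jsonable(v) for v in value), key=repr)
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    if isinstance(value, dict):
        return {str(k): to_jsonable(v) for k, v in value.items()}
    return value


raw = {"first_bytes": b"%PDF", "filters": {"FlateDecode"}, "offsets": (1, 2)}
clean = to_jsonable(raw)
json.dumps(clean)  # would raise TypeError if anything non-serializable remained
```

Calling json.dumps on analyzer data during a smoke check is a cheap way to catch this class of bug before it surfaces as a server error.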
- Forgetting to register a new analyzer in registry.py.
- Updating backend auto-selection but not frontend auto-selection.
- Returning non-serializable objects from analyzer data or Finding.value.
- Treating uploaded filename extension as authoritative; use filetype.py magic checks where possible.
- Letting subprocess analyzers run without timeout.
- Storing extracted embedded payloads on disk during normal analysis.
- Overstating safety. Static analysis can miss malicious behavior.
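Two of the pitfalls above, trusting extensions and writing payloads to disk, can be avoided with magic-byte checks and in-memory summaries. The sketch below is illustrative only: sniff_type and summarize_payload are hypothetical names, the signature list is a small subset, and the real checks in app/services/filetype.py may be more tolerant (for example of leading junk before a PDF header).

```python
import hashlib

OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # OLE2 compound file header


def sniff_type(head: bytes) -> str:
    """Guess a coarse file type from leading bytes instead of the extension."""
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(OLE_MAGIC):
        return "ole"
    if head.startswith(b"PK\x03\x04"):
        return "zip"  # OOXML documents are ZIP containers, but so are plain archives
    if head.startswith(b"{\\rtf"):
        return "rtf"
    return "unknown"


def summarize_payload(payload: bytes, preview_len: int = 16) -> dict:
    """Describe an embedded payload without writing it to disk."""
    return {
        "size": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "type": sniff_type(payload[:16]),
        "first_bytes_hex": payload[:preview_len].hex(),
    }
```

The summary contains only strings and integers, so it can go straight into analyzer data without further conversion.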
Dockerfile.kasm builds on kasmweb/firefox:1.18.0, installs requirements into /opt/ole-gui/.venv, copies app, static, and docker/kasm/startup.sh, and exposes 8085.
If runtime port/startup behavior changes, update:
- Dockerfile.kasm
- docker/kasm/startup.sh
- kasm-workspace.json
- README