Building a filesystem-backed retrieval workflow similar to OpenViking — looking to exchange architecture experience #2151

No8ah · 2026-05-20T17:15:33Z

Hi OpenViking team,

I’m ZinanOu, currently building a local knowledge workflow with an architecture that feels quite similar to OpenViking.

Some overlapping ideas include:

directory-based indexing
hierarchical retrieval
reusable context blocks
metadata-driven routing
context reuse / cache-oriented organization

My use case leans more toward document and math knowledge workflows, but many of the architectural ideas are close to OpenViking’s retrieval and context design.

I’ve been documenting part of my work here:
https://no8ah.cn

I’d really love to exchange some architecture design experience, especially around:

directory semantics vs semantic retrieval tradeoffs
metadata normalization across multiple index dimensions
context reuse / cache strategy
retrieval layering and context loading design

@zhoujh01 if this overlaps with retrieval architecture, I’d really appreciate any thoughts or experience you’d be willing to share.

Thanks — looking forward to learning from the team.

No8ah · 2026-05-22T09:38:46Z

I’ve been thinking about a hybrid context workflow that may balance both retrieval efficiency and reasoning quality.

One possible architecture:

DeepSeek V4 Pro / Flash
→ used for large-scale context traversal, cache building, and framework-level retrieval

GPT / Claude
→ used for deeper reasoning, abstraction, proof generation, and high-quality content synthesis

The intuition is that these models may have different strengths:

DeepSeek-style models can be cost-efficient for long-context scanning, repeated retrieval, shared-prefix reuse, and cache-oriented workflows.
GPT / Claude may be stronger for higher-level reasoning, non-standard derivation, abstraction, and content quality.

So instead of asking one model to do everything, the pipeline becomes:

Context Build / Index / Cache
→ DeepSeek

Framework Search / Broad Traversal
→ DeepSeek

Deep Reasoning / Synthesis / Final Output
→ GPT / Claude
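
A minimal sketch of the stage-to-model split above, written as a plain routing table. The stage names mirror the pipeline; the strings "deepseek" and "gpt-or-claude" are hypothetical labels for model families, not real API model identifiers, and the `Task`, `route`, and `plan` names are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    CONTEXT_BUILD = "context_build"        # Context Build / Index / Cache
    FRAMEWORK_SEARCH = "framework_search"  # Framework Search / Broad Traversal
    DEEP_REASONING = "deep_reasoning"      # Deep Reasoning / Synthesis / Final Output


# Hypothetical family labels; swap in whatever client or model ID you actually use.
ROUTES = {
    Stage.CONTEXT_BUILD: "deepseek",
    Stage.FRAMEWORK_SEARCH: "deepseek",
    Stage.DEEP_REASONING: "gpt-or-claude",
}


@dataclass
class Task:
    stage: Stage
    prompt: str


def route(task: Task) -> str:
    """Return the model family label assigned to the task's stage."""
    return ROUTES[task.stage]


def plan(tasks):
    """Group prompts by the model family that should handle them."""
    batches = {}
    for task in tasks:
        batches.setdefault(route(task), []).append(task.prompt)
    return batches
```

Keeping the mapping in one table means a stage can be moved to a different model family without touching the code that builds the prompts.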

This may help balance:

retrieval breadth
cache efficiency
token cost
reasoning depth
output quality

In a way, this feels less like “model comparison” and more like:

Hybrid Model Routing for Long Context Systems.

Curious whether others building context-heavy workflows (RAG / memory / agent / knowledge systems) are exploring similar split-routing strategies.

2 replies

No8ah May 22, 2026
Author

One interesting result from my long-context workflow:

The retrieval cache hit rate can reach 98%+ at the framework/search layer.

GPT is then used mainly for reasoning on top of the cached context, where the cache hit rate still stays at 90%+.

So instead of rebuilding context every time, the system becomes:

Framework Retrieval → Cache
Deep Reasoning → Cache-aware Thinking

This significantly improves both cost efficiency and long-context stability.
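
For anyone who wants to reproduce this kind of number, here is a rough sketch of how a hit rate and the resulting input cost could be computed. It assumes the hit rate is defined as cached input tokens divided by total input tokens, and that the provider bills cached and uncached input tokens at different per-million rates. The function names and the prices in the usage note are hypothetical, not figures from my workflow or from any provider's price list.

```python
def cache_hit_rate(cached_tokens: int, total_input_tokens: int) -> float:
    """Fraction of input tokens that were served from the cache."""
    if total_input_tokens <= 0:
        raise ValueError("total_input_tokens must be positive")
    return cached_tokens / total_input_tokens


def input_cost(total_input_tokens: int, hit_rate: float,
               price_miss: float, price_hit: float) -> float:
    """Input cost, with both prices given per one million tokens."""
    cached = total_input_tokens * hit_rate
    uncached = total_input_tokens - cached
    return (cached * price_hit + uncached * price_miss) / 1_000_000
```

With placeholder prices of 1.0 per million uncached tokens and 0.1 per million cached tokens, `input_cost(1_000_000, 0.5, 1.0, 0.1)` evaluates to 0.55. Since cached tokens are the cheaper ones under this assumption, the cost keeps dropping as the hit rate rises.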

Feels increasingly like:
Search in cache, think in cache.

No8ah May 22, 2026
Author

DeepSeek V4 Pro for retrieval/cache, Claude for reasoning.

~90M tokens at ~3–4 RMB.

Feels increasingly like:

retrieve in cache, reason on top.

No8ah · 2026-05-22T09:45:45Z

Another observation from my long-context workflow:

When context gets compressed, the system does not necessarily lose the retrieval framework.

Repeated structural signals such as:

directory hierarchy
chapter / section paths
subfile dependency graph
Notes ↔ Homework mapping
reusable block structure

tend to survive compression because they appear repeatedly and function as high-weight context signals.

So the retrieval framework itself gradually becomes part of the compressed long-term context.

This means context compression is not only about reducing tokens.
It can also preserve the stable routing structure that future retrieval depends on.

In other words:

Index structure can become compression-resistant if it is repeated, stable, and central enough.
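
One way to make "repeated, stable, and central" measurable is a simple frequency heuristic: count how many context chunks each structural marker (a directory path, a section path, a block ID) appears in, and treat markers above a share threshold as the ones likely to survive. This is only a stand-in for illustration. It does not model how any LLM actually compresses context, and `stable_signals`, `min_share`, and the example paths are hypothetical.

```python
from collections import Counter


def stable_signals(chunks, min_share=0.5):
    """Return markers that appear in at least min_share of the chunks.

    chunks: list of lists, each holding the structural markers seen in one chunk.
    """
    counts = Counter()
    for markers in chunks:
        counts.update(set(markers))  # count each marker at most once per chunk
    threshold = min_share * len(chunks)
    return sorted(m for m, n in counts.items() if n >= threshold)


chunks = [
    ["notes/ch1", "hw/ch1"],
    ["notes/ch1", "notes/ch2"],
    ["notes/ch1", "hw/ch1", "scratch"],
]
print(stable_signals(chunks))  # ['hw/ch1', 'notes/ch1']
```

Markers that show up in only one chunk, like `scratch` here, fall below the threshold, which matches the intuition that one-off structure is what compression drops first.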

No8ah · 2026-05-23T03:53:02Z

Looking forward to learning from the team.

I also put together a small architecture sketch here:

https://no8ah.cn/ai-usage

Mostly around cache, retrieval, compression, and long-context reasoning.

Happy to exchange ideas if anyone is exploring similar workflows.

No8ah · 2026-05-23T05:13:27Z

This architecture also seems to pair well with Windsurf Inline / Copilot Inline.

Since they mainly operate on short local context windows, stable file/block structures make generated outputs much more pattern-consistent.

In practice, local code similarity within a context segment becomes very high.
