PRD — KPubData

1. Document information

Product name: KPubData
Product type: Python data access framework / SDK
Target runtime: Python 3.10+
Primary users: Python developers, backend developers, data engineers, agent/MCP developers
Product stage: architecture and bootstrap

2. Problem statement

Korean public-data APIs are difficult to use consistently because they vary by provider and service family in:

authentication conventions
parameter naming and validation
response envelopes and error signaling
representation format (JSON, XML, file, sheet, mixed)
pagination semantics
schema stability and metadata quality

As a result, developers repeatedly rebuild the same glue code:

auth/key injection
request construction
XML/JSON parsing
result normalization
provider-specific error handling
schema probing
debugging around raw payloads

3. Product vision

KPubData will provide a Pythonic, dataset-oriented, provider-aware framework for accessing Korean public-data services through a small canonical core, while preserving the ability to drop down to provider-specific/raw behavior when needed.

4. Product definition

KPubData is not a single fake standard API. It is a dialect-inspired integration layer:

a stable core defines canonical contracts
adapters absorb provider differences
users interact through a consistent dataset-oriented Python API

5. Target users

5.1 Application developers

Need to integrate Korean public data into product features without learning every provider from scratch.

5.2 Data engineers / analysts

Need repeatable access, metadata visibility, and raw/debug paths for pipelines.

5.3 Agent / MCP developers

Need a stable core library so that the future MCP layer can remain thin.

5.4 Contributors

Need a predictable adapter contract and a clear definition of done when adding a provider.

6. Jobs to be done

Discover datasets/services without manually browsing each portal.
Query datasets through a consistent Python entry point.
Know which operations each dataset supports.
Access raw payloads when normalization is insufficient.
Add a new provider adapter without changing the whole framework.

7. Product principles

Dataset-oriented, provider-aware
Small stable core, provider complexity pushed outward
Capability-first honesty
Canonical + raw together
Modern Python packaging and typing discipline
Agentic-development friendly repository structure

8. Goals

8.1 Product goals

provide a canonical query/result/error/capability model
support multiple heterogeneous provider families under one public interface
preserve provider identity and raw access
keep the public API small and learnable
make adapter authoring systematic

8.2 Success criteria for the first meaningful release

three materially different provider/service families integrated
each provider passes contract tests for its declared capabilities
first successful query possible within 10 minutes for a developer with keys
public API docs cover common discovery and query workflows
raw access remains available in every provider adapter

9. Non-goals for v0.1

complete coverage of all Korean public-data APIs
universal query language across all providers
async support
built-in scheduling/caching/warehouse loading
full MCP server in the initial release
full field-level semantic standardization across every provider

10. Functional requirements

FR-1. Client initialization

The framework must support explicit config and environment-based config.

Acceptance:

Client(...)
Client.from_env()
provider key lookup order documented

FR-2. Provider registry

The framework must register and retrieve provider adapters by identifier.

Acceptance:

duplicate registration blocked
lazy loading allowed
adapter metadata available

FR-3. Dataset discovery

Users must be able to list and search dataset descriptors.

Acceptance:

client.datasets.list()
client.datasets.search(text)
each descriptor exposes id, provider, name, representation, capabilities

FR-4. Dataset binding

Users must be able to bind a dataset reference into an object with operations.

Acceptance:

client.dataset("provider.dataset")
unknown dataset raises a clear exception

FR-5. Record listing/query

A list/query operation must return a canonical result envelope.

Acceptance:

dataset.list(...)
returns RecordBatch
includes items
includes raw
includes metadata about pagination and provider

FR-6. Schema/metadata access

Datasets may expose schema or field metadata.

Acceptance:

dataset.schema()
returns SchemaDescriptor | None

FR-7. Raw access

Every provider adapter must expose a raw-call path.

Acceptance:

dataset.call_raw(...) or provider raw call equivalent
raw payload preserved without normalization loss

FR-9. Error normalization

Provider-specific failures must map to canonical error types.

Acceptance:

transport, auth, parse, unsupported capability, and provider-response failures mapped explicitly
original status/message retained in exception metadata

FR-10. Representation handling

The framework must support multiple content and representation types through adapters.

Acceptance:

XML and JSON in core scope
representation metadata attached to dataset descriptors
file/sheet/download capability represented even if not normalized in v0.1

11. Non-functional requirements

NFR-1. Python quality bar

Python 3.10+
type hints throughout public API
pyproject.toml packaging
src/ layout

NFR-2. Quality gates

lint
format
typecheck
unit tests
contract tests

NFR-3. Public API stability

public contracts documented
breaking changes require changelog and versioning discipline

NFR-4. Observability and debugging

raw response access
structured exception metadata
optional logging hooks

12. UX requirements

The API must read like Python, not like a copied portal endpoint name.

Preferred style:

client.dataset("molit.apartment_trades").list(lawd_code="11680", deal_ym="202503")

Avoid as the primary public surface:

client.getRTMSDataSvcAptTradeDev(...)

13. Release plan

v0.1

core contracts
sync transport
3 adapters
XML/JSON
raw access
tests/docs/release skeleton

v0.2

richer discovery metadata
plugin loading
pandas adapter
broader provider coverage

v0.3

thin MCP adapter
provider scaffolding tools
improved docs/examples

14. Risks

Over-abstracting into a fake universal API
Letting one provider's quirks contaminate the core model
Under-investing in raw/debug paths
Treating dataset metadata as optional rather than foundational
Allowing the public API surface to grow too fast

15. Mitigations

keep the canonical core intentionally small
add capabilities instead of pretending support exists
enforce adapter contract tests
keep raw access mandatory
require architecture documentation for public-surface changes

문서	설명
ARCHITECTURE.md	시스템 아키텍처 설계
ROADMAP.md	프로젝트 로드맵 및 단계별 계획
API_SPEC.md	파이썬 API 명세

저장소	문서	설명
kpubdata-builder	PRD.md	Builder 제품 요구사항
kpubdata-studio	PRD.md	Studio 제품 요구사항

FilesExpand file tree

PRD.md

Latest commit

History

PRD.md

File metadata and controls

PRD — KPubData

1. Document information

2. Problem statement

3. Product vision

4. Product definition

5. Target users

5.1 Application developers

5.2 Data engineers / analysts

5.3 Agent / MCP developers

5.4 Contributors

6. Jobs to be done

7. Product principles

8. Goals

8.1 Product goals

8.2 Success criteria for the first meaningful release

9. Non-goals for v0.1

10. Functional requirements

FR-1. Client initialization

FR-2. Provider registry

FR-3. Dataset discovery

FR-4. Dataset binding

FR-5. Record listing/query

FR-6. Schema/metadata access

FR-7. Raw access

FR-9. Error normalization

FR-10. Representation handling

11. Non-functional requirements

NFR-1. Python quality bar

NFR-2. Quality gates

NFR-3. Public API stability

NFR-4. Observability and debugging

12. UX requirements

13. Release plan

v0.1

v0.2

v0.3

14. Risks

15. Mitigations

관련 문서

이 저장소 내 문서

KPubData Product Family