- Product name: KPubData
- Product type: Python data access framework / SDK
- Target runtime: Python 3.10+
- Primary users: Python developers, backend developers, data engineers, agent/MCP developers
- Product stage: architecture and bootstrap
Korean public-data APIs are difficult to use consistently because they vary by provider and service family in:
- authentication conventions
- parameter naming and validation
- response envelopes and error signaling
- representation format (JSON, XML, file, sheet, mixed)
- pagination semantics
- schema stability and metadata quality
As a result, developers repeatedly rebuild the same glue code:
- auth/key injection
- request construction
- XML/JSON parsing
- result normalization
- provider-specific error handling
- schema probing
- debugging around raw payloads
KPubData will provide a Pythonic, dataset-oriented, provider-aware framework for accessing Korean public-data services through a small canonical core, while preserving the ability to drop down to provider-specific/raw behavior when needed.
KPubData is not a single fake standard API. It is a dialect-inspired integration layer:
- a stable core defines canonical contracts
- adapters absorb provider differences
- users interact through a consistent dataset-oriented Python API
Need to integrate Korean public data into product features without learning every provider from scratch.
Need repeatable access, metadata visibility, and raw/debug paths for pipelines.
Need a stable core library so that the future MCP layer can remain thin.
Need a predictable adapter contract and a clear definition of done when adding a provider.
- Discover datasets/services without manually browsing each portal.
- Query datasets through a consistent Python entry point.
- Know which operations each dataset supports.
- Access raw payloads when normalization is insufficient.
- Add a new provider adapter without changing the whole framework.
- Dataset-oriented, provider-aware
- Small stable core, provider complexity pushed outward
- Capability-first honesty
- Canonical + raw together
- Modern Python packaging and typing discipline
- Agentic-development friendly repository structure
- provide a canonical query/result/error/capability model
- support multiple heterogeneous provider families under one public interface
- preserve provider identity and raw access
- keep the public API small and learnable
- make adapter authoring systematic
- three materially different provider/service families integrated
- each provider passes contract tests for its declared capabilities
- first successful query possible within 10 minutes for a developer with keys
- public API docs cover common discovery and query workflows
- raw access remains available in every provider adapter
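
A contract test for declared capabilities could be sketched as below. The capability names (`list`, `schema`, `raw`), the method mapping, and the `contract_violations` helper are all assumptions made for illustration; the PRD only states that each provider must pass contract tests for what it declares and that raw access stays available.

```python
from __future__ import annotations

# Hypothetical mapping from a declared capability to the method that backs it.
CAPABILITY_METHODS = {"list": "list", "schema": "schema", "raw": "call_raw"}


def contract_violations(adapter: object) -> list[str]:
    """Return human-readable contract problems for one adapter (empty if none)."""
    declared = set(getattr(adapter, "capabilities", ()))
    problems: list[str] = []

    # Raw access is mandatory for every provider adapter.
    if "raw" not in declared:
        problems.append("raw access is mandatory but not declared")

    # Every declared capability must be backed by a callable method.
    for cap in sorted(declared):
        method = CAPABILITY_METHODS.get(cap)
        if method is None:
            problems.append(f"unknown capability: {cap}")
        elif not callable(getattr(adapter, method, None)):
            problems.append(f"declared capability {cap!r} has no {method}() method")
    return problems
```

An adapter that declares only what it implements, plus raw access, yields an empty list; anything else is reported rather than silently tolerated.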
- complete coverage of all Korean public-data APIs
- universal query language across all providers
- async support
- built-in scheduling/caching/warehouse loading
- full MCP server in the initial release
- full field-level semantic standardization across every provider
The framework must support explicit config and environment-based config.
Acceptance:
- `Client(...)`
- `Client.from_env()`
- provider key lookup order documented
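
A minimal sketch of the two configuration paths follows. The environment variable scheme `KPUBDATA_<PROVIDER>_API_KEY`, the `provider_keys` argument, and the `key_for` helper are assumptions for illustration; the PRD fixes only `Client(...)` and `Client.from_env()`.

```python
from __future__ import annotations

import os
from typing import Mapping

# Hypothetical naming scheme: KPUBDATA_<PROVIDER>_API_KEY
ENV_PREFIX = "KPUBDATA_"
ENV_SUFFIX = "_API_KEY"


class Client:
    def __init__(self, provider_keys: Mapping[str, str] | None = None) -> None:
        # Explicit configuration: keys are passed in directly.
        self._provider_keys = dict(provider_keys or {})

    @classmethod
    def from_env(cls, environ: Mapping[str, str] | None = None) -> "Client":
        # Environment-based configuration: collect every matching variable.
        env = os.environ if environ is None else environ
        keys: dict[str, str] = {}
        for name, value in env.items():
            if name.startswith(ENV_PREFIX) and name.endswith(ENV_SUFFIX):
                provider = name[len(ENV_PREFIX):-len(ENV_SUFFIX)].lower()
                if provider:  # skip names with an empty provider part
                    keys[provider] = value
        return cls(provider_keys=keys)

    def key_for(self, provider: str) -> str | None:
        """Return the configured key for a provider, or None if absent."""
        return self._provider_keys.get(provider)
```

Accepting an optional `environ` mapping keeps `from_env()` testable without mutating the process environment.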
The framework must register and retrieve provider adapters by identifier.
Acceptance:
- duplicate registration blocked
- lazy loading allowed
- adapter metadata available
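
One way to satisfy these three acceptance points is a registry that stores factories, not instances. This is a sketch under assumptions: `AdapterInfo`, `AdapterRegistry`, and `DuplicateAdapterError` are hypothetical names, not part of the specified API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AdapterInfo:
    """Metadata readable without instantiating the adapter (illustrative fields)."""
    provider_id: str
    display_name: str


class DuplicateAdapterError(ValueError):
    """Raised when a provider identifier is registered twice."""


class AdapterRegistry:
    def __init__(self) -> None:
        self._factories: dict[str, Callable[[], object]] = {}
        self._info: dict[str, AdapterInfo] = {}
        self._instances: dict[str, object] = {}

    def register(self, info: AdapterInfo, factory: Callable[[], object]) -> None:
        # Duplicate registration is blocked rather than silently overwritten.
        if info.provider_id in self._factories:
            raise DuplicateAdapterError(f"adapter already registered: {info.provider_id}")
        self._factories[info.provider_id] = factory
        self._info[info.provider_id] = info

    def info(self, provider_id: str) -> AdapterInfo:
        return self._info[provider_id]

    def get(self, provider_id: str) -> object:
        # Lazy loading: the factory runs only on first retrieval, then is cached.
        if provider_id not in self._instances:
            self._instances[provider_id] = self._factories[provider_id]()
        return self._instances[provider_id]
```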
Users must be able to list and search dataset descriptors.
Acceptance:
- `client.datasets.list()`
- `client.datasets.search(text)`
- each descriptor exposes `id`, `provider`, `name`, `representation`, `capabilities`
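
The descriptor and the two discovery calls could look roughly like this. The field types, the `DatasetCatalog` class name, and the case-insensitive substring match are assumptions; only the five descriptor fields and the `list()` / `search(text)` calls come from the requirement.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetDescriptor:
    id: str
    provider: str
    name: str
    representation: str           # e.g. "json" or "xml" (assumed string form)
    capabilities: frozenset[str]  # e.g. {"list", "raw"} (assumed names)


class DatasetCatalog:
    """Stand-in for what `client.datasets` might expose."""

    def __init__(self, descriptors: list[DatasetDescriptor]) -> None:
        self._descriptors = list(descriptors)

    def list(self) -> list[DatasetDescriptor]:
        # Return a copy so callers cannot mutate the catalog.
        return list(self._descriptors)

    def search(self, text: str) -> list[DatasetDescriptor]:
        # Assumed semantics: case-insensitive substring match on id and name.
        needle = text.casefold()
        return [
            d for d in self._descriptors
            if needle in d.id.casefold() or needle in d.name.casefold()
        ]
```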
Users must be able to bind a dataset reference into an object with operations.
Acceptance:
- `client.dataset("provider.dataset")`
- unknown dataset raises a clear exception
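
A sketch of the binding step, assuming references always take the `provider.dataset` form. `bind_dataset`, `Dataset`, and `DatasetNotFoundError` are hypothetical names; the point is that the exception carries the offending reference and says what was available.

```python
from __future__ import annotations


class DatasetNotFoundError(LookupError):
    """Raised when a dataset reference does not match any known dataset."""

    def __init__(self, ref: str, known: list[str]) -> None:
        self.ref = ref
        self.known = sorted(known)
        listing = ", ".join(self.known) or "none"
        super().__init__(f"unknown dataset {ref!r}; known datasets: {listing}")


class Dataset:
    """Bound dataset object; operations such as list() would hang off this."""

    def __init__(self, provider: str, name: str) -> None:
        self.provider = provider
        self.name = name


def bind_dataset(ref: str, known: set[str]) -> Dataset:
    # Split only on the first dot so dataset names may contain dots.
    provider, sep, name = ref.partition(".")
    if not sep or not provider or not name:
        raise ValueError(f"dataset reference must look like 'provider.dataset', got {ref!r}")
    if ref not in known:
        raise DatasetNotFoundError(ref, list(known))
    return Dataset(provider, name)
```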
A list/query operation must return a canonical result envelope.
Acceptance:
- `dataset.list(...)`
- returns `RecordBatch`
- includes `items`
- includes `raw`
- includes metadata about pagination and provider
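
The envelope described above might be modeled as a frozen dataclass. Only `RecordBatch`, `items`, and `raw` are named in the requirement; `PageInfo`, its fields, and the `has_more` property are assumptions about how pagination metadata could be carried.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass(frozen=True)
class PageInfo:
    """Pagination metadata; every field is optional because providers differ."""
    page: int | None = None
    page_size: int | None = None
    total_count: int | None = None


@dataclass(frozen=True)
class RecordBatch:
    items: list[dict[str, Any]]  # normalized records
    raw: Any                     # untouched provider payload
    provider: str
    dataset: str
    page_info: PageInfo = field(default_factory=PageInfo)

    def __len__(self) -> int:
        return len(self.items)

    def __iter__(self) -> Iterator[dict[str, Any]]:
        return iter(self.items)

    @property
    def has_more(self) -> bool | None:
        # None means "unknown": the provider did not report enough to decide.
        p = self.page_info
        if p.page is None or p.page_size is None or p.total_count is None:
            return None
        return p.page * p.page_size < p.total_count
```

Returning `None` from `has_more` when the provider gives no totals avoids pretending to know pagination state that was never reported.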
Datasets may expose schema or field metadata.
Acceptance:
- `dataset.schema()`
- returns `SchemaDescriptor | None`
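
Because the return type is optional, callers have to handle the absence of a schema explicitly. A minimal sketch, assuming a `SchemaDescriptor` that is simply a tuple of field descriptors (`FieldDescriptor` and `field_names` are hypothetical):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDescriptor:
    name: str
    type: str              # provider-reported type label, kept as a string
    description: str = ""


@dataclass(frozen=True)
class SchemaDescriptor:
    fields: tuple[FieldDescriptor, ...]


def field_names(schema: SchemaDescriptor | None) -> list[str]:
    """Return field names, or an empty list when the dataset has no schema."""
    if schema is None:
        return []
    return [f.name for f in schema.fields]
```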
Every provider adapter must expose a raw-call path.
Acceptance:
- `dataset.call_raw(...)` or provider raw call equivalent
- raw payload preserved without normalization loss
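
The raw path can be sketched as the lower layer that the normalized path builds on, so the raw bytes are never altered. `RawResponse`, the `Transport` callable, and the `{"items": [...]}` payload shape are assumptions for illustration only.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class RawResponse:
    status: int
    content_type: str
    body: bytes  # exactly what the provider sent, never rewritten


# Hypothetical transport signature: (operation, params) -> RawResponse
Transport = Callable[[str, Mapping[str, str]], RawResponse]


class Dataset:
    def __init__(self, operation: str, transport: Transport) -> None:
        self._operation = operation
        self._transport = transport

    def call_raw(self, **params: Any) -> RawResponse:
        # Parameters are stringified for the wire; the response is returned as-is.
        return self._transport(self._operation, {k: str(v) for k, v in params.items()})

    def list(self, **params: Any) -> list[dict[str, Any]]:
        # Normalization reads from the raw response without mutating it.
        raw = self.call_raw(**params)
        return list(json.loads(raw.body)["items"])
```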
Provider-specific failures must map to canonical error types.
Acceptance:
- transport, auth, parse, unsupported capability, and provider-response failures mapped explicitly
- original status/message retained in exception metadata
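
A sketch of a canonical error hierarchy that retains the original status and message. The class names, the metadata attribute names, and the mapping rules in `map_failure` (for example treating HTTP 401/403 as auth failures) are assumptions; real providers often signal errors inside a 200 response, which an adapter would need to handle separately.

```python
from __future__ import annotations


class KPubDataError(Exception):
    """Base error; keeps provider details as structured metadata."""

    def __init__(self, message: str, *, provider: str | None = None,
                 status: int | None = None, provider_code: str | None = None,
                 provider_message: str | None = None) -> None:
        super().__init__(message)
        self.provider = provider
        self.status = status
        self.provider_code = provider_code
        self.provider_message = provider_message


class TransportError(KPubDataError): ...
class AuthError(KPubDataError): ...
class ParseError(KPubDataError): ...
class UnsupportedCapabilityError(KPubDataError): ...
class ProviderResponseError(KPubDataError): ...


def map_failure(provider: str, *, status: int | None = None,
                provider_code: str | None = None, message: str = "") -> KPubDataError:
    """Map one failed call to a canonical error (illustrative rules only)."""
    meta = dict(provider=provider, status=status,
                provider_code=provider_code, provider_message=message)
    if status is None:
        # No HTTP status at all: the request never produced a response.
        return TransportError("no response from provider", **meta)
    if status in (401, 403):
        return AuthError("provider rejected the credentials", **meta)
    return ProviderResponseError("provider reported a failure", **meta)
```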
The framework must support multiple content and representation types through adapters.
Acceptance:
- XML and JSON in core scope
- representation metadata attached to dataset descriptors
- file/sheet/download capability represented even if not normalized in v0.1
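
The split between "represented" and "normalized" could be expressed as an enum plus a parser that handles only XML and JSON. The enum values, the `<item>` element name, and the `{"items": [...]}` JSON shape are assumptions; actual payload layouts vary by provider and belong in adapters.

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET
from enum import Enum
from typing import Any


class Representation(str, Enum):
    JSON = "json"
    XML = "xml"
    FILE = "file"
    SHEET = "sheet"
    MIXED = "mixed"


def parse_items(representation: Representation, body: bytes) -> list[dict[str, Any]]:
    """Normalize a payload into records; only XML and JSON are handled here."""
    if representation is Representation.JSON:
        return list(json.loads(body)["items"])
    if representation is Representation.XML:
        root = ET.fromstring(body)
        # Each <item> becomes a flat dict of child tag -> text.
        return [
            {child.tag: (child.text or "") for child in item}
            for item in root.iter("item")
        ]
    # File, sheet, and mixed representations are declared but not normalized.
    raise NotImplementedError(f"normalization not available for {representation.value}")
```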
- Python 3.10+
- type hints throughout public API
- `pyproject.toml` packaging
- `src/` layout
- lint
- format
- typecheck
- unit tests
- contract tests
- public contracts documented
- breaking changes require changelog and versioning discipline
- raw response access
- structured exception metadata
- optional logging hooks
The API must read like Python, not like a copied portal endpoint name.
Preferred style:
`client.dataset("molit.apartment_trades").list(lawd_code="11680", deal_ym="202503")`
Avoid as the primary public surface:
`client.getRTMSDataSvcAptTradeDev(...)`
- core contracts
- sync transport
- 3 adapters
- XML/JSON
- raw access
- tests/docs/release skeleton
- richer discovery metadata
- plugin loading
- pandas adapter
- broader provider coverage
- thin MCP adapter
- provider scaffolding tools
- improved docs/examples
- Over-abstracting into a fake universal API
- Letting one provider's quirks contaminate the core model
- Under-investing in raw/debug paths
- Treating dataset metadata as optional rather than foundational
- Allowing the public API surface to grow too fast
- keep the canonical core intentionally small
- add capabilities instead of pretending support exists
- enforce adapter contract tests
- keep raw access mandatory
- require architecture documentation for public-surface changes
| Document | Description |
|---|---|
| ARCHITECTURE.md | System architecture design |
| ROADMAP.md | Project roadmap and phased plan |
| API_SPEC.md | Python API specification |

| Repository | Document | Description |
|---|---|---|
| kpubdata-builder | PRD.md | Builder product requirements |
| kpubdata-studio | PRD.md | Studio product requirements |