Is your feature request related to a problem?
Daft currently supports several object-storage backends (S3, GCS, Azure, OSS, COS, OBS, HuggingFace, etc.) through native clients or via OpenDAL. However, there is no first-class support for GooseFS — Tencent Cloud's distributed caching/acceleration filesystem that is widely used in front of COS for big-data and AI workloads.
Users running Daft on GooseFS-backed datasets currently have to:
- Mount GooseFS as a local filesystem and use the local:// scheme, which loses cloud-native semantics (credentials, multi-node access, prefix listing, etc.); or
- Pre-copy data to S3/COS, which defeats the purpose of using GooseFS as a cache layer.
OpenDAL 0.57.0 has shipped a services-goosefs backend, so Daft can now expose GooseFS natively through the same OpenDALSource machinery used by OSS / COS / OBS, with minimal new code.
Describe the solution you'd like
Add a new goosefs:// URL scheme to Daft I/O, mirroring the existing COS / OBS integrations:
- Config
  - New daft.io.GooseFSConfig (Python) / common_io_config::GooseFSConfig (Rust) with fields:
    - endpoint (GooseFS master endpoint, e.g. http://master:9200)
    - root (optional root path inside the namespace)
    - anonymous (bool, default false)
  - Wired into IOConfig next to cos, obs, oss, etc., with full __init__ / replace / __repr__ / pickling / Python bindings.
- I/O source
  - Register goosefs in OpenDALSource::available_schemes().
  - Build the operator via OpenDAL's Goosefs service (opendal = "0.57", feature services-goosefs).
  - Route goosefs://... URIs in daft-io's scheme dispatcher to the new source.
- SQL / catalog
  - Recognize goosefs:// in URL parsing so read_parquet / read_csv / read_json / write_* work out of the box.
- Tests & docs
  - Unit tests for config round-tripping (Rust + Python).
  - Add goosefs to the supported-schemes table in the I/O docs.
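To make the proposed config shape concrete, here is a minimal Python sketch of the three fields and the replace / __repr__ / round-trip behaviour listed above. GooseFSConfigSketch is a hypothetical stand-in written with a plain dataclass, not Daft's actual class or bindings. The None defaults for endpoint and root are assumptions, and pickling is not exercised here.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass(frozen=True)
class GooseFSConfigSketch:
    """Hypothetical stand-in for the proposed daft.io.GooseFSConfig."""

    # GooseFS master endpoint, e.g. http://master:9200 (None default is an assumption)
    endpoint: Optional[str] = None
    # Optional root path inside the namespace (None default is an assumption)
    root: Optional[str] = None
    # Proposed default from the issue: false
    anonymous: bool = False

    def replace(self, **changes) -> "GooseFSConfigSketch":
        # Return a modified copy; the original instance stays unchanged.
        return dataclasses.replace(self, **changes)


cfg = GooseFSConfigSketch(endpoint="http://master:9200")
scoped = cfg.replace(root="/datasets")

# Round-trip through a plain dict, the kind of check a config unit test might make.
restored = GooseFSConfigSketch(**dataclasses.asdict(scoped))
print(restored)
```

A config round-trip test would then assert that the restored value compares equal to the original and that replace leaves the source config untouched.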
Describe alternatives you've considered
- Mount GooseFS via FUSE and read through local://. This works, but requires every Daft worker/node to mount the filesystem, and it bypasses GooseFS's native client optimizations (locality hints, async prefetch, credential handling).
- Use the COS backend pointing at the underlying bucket. This loses the GooseFS cache benefits, which are the main reason users adopt GooseFS in the first place.
- Implement a fully custom GooseFS client in Daft. This means far more code to maintain, and is unnecessary now that OpenDAL 0.57 ships an official services-goosefs backend that's already battle-tested.
Implementing it through OpenDAL (the option chosen) is the most consistent path: it reuses the exact same plumbing as oss / cos / obs, keeps the surface area small, and lets us inherit upstream improvements for free.
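As a rough illustration of what sharing the plumbing could mean at the routing level, the sketch below adds goosefs to a set of schemes handled by one common code path. It is a toy model in Python, not daft-io's Rust dispatcher: the route function, the OPENDAL_SCHEMES set, and the example URL are all hypothetical names chosen for illustration.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Schemes the issue describes as sharing the OpenDAL-backed source,
# plus the proposed "goosefs" entry. (Illustrative set, not Daft's real table.)
OPENDAL_SCHEMES = {"oss", "cos", "obs", "goosefs"}


def route(url: str) -> Tuple[str, str]:
    """Return (scheme, path) if the URL belongs to the shared OpenDAL path."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in OPENDAL_SCHEMES:
        raise ValueError(f"scheme {scheme!r} is not handled by this sketch")
    # Keep the authority and path together as the location inside the backend.
    return scheme, parts.netloc + parts.path


print(route("goosefs://ns/data/part-0.parquet"))
```

Under this model, supporting a new OpenDAL-backed service is mostly a matter of adding one entry to the scheme set and one operator builder, which is the small surface area the proposal is aiming for.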
Component(s)
No response
Additional Context
No response
Would you like to implement a fix?
Yes