[SYCL][Doc] Add sycl_ext_oneapi_register_host_memory extension spec#22324
againull wants to merge 4 commits into
Add an experimental extension specification for registering existing host/system memory with the SYCL runtime so that it behaves like a USM host allocation: usable from device code, queryable via get_pointer_type, and faster for explicit copies. The memory is released via unregister_host_memory rather than sycl::free.
Co-Authored-By: Greg Lueck <gregory.m.lueck@intel.com>
Assisted-By: Claude
[NOTE]
====
An implementation-provided validation or debug layer may optionally track
Note: some diagnostics are intentionally optional. My current thinking is to implement overlap and invalid-unregistration checks in the UR validation layer rather than in the normal runtime path. That avoids adding registration tracking and associated overhead for all applications just to provide unified error reporting. It also gives us a single place to handle backend differences, since native backends do not necessarily behave the same way in these cases.
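To make the idea concrete, here is a minimal sketch of the bookkeeping such a validation layer might keep for one context. `RegistrationTracker`, `onRegister`, and `onUnregister` are hypothetical names, not part of the extension or of UR. The sketch also assumes that unregistration must pass the base pointer of a registered range and that empty ranges are rejected, neither of which is confirmed by the discussion above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical per-context bookkeeping for a validation layer.
// Nothing here is an actual UR or SYCL API.
class RegistrationTracker {
public:
  // Returns false if [ptr, ptr + numBytes) is empty (assumption) or
  // overlaps a range that is already tracked.
  bool onRegister(const void *ptr, std::size_t numBytes) {
    if (numBytes == 0)
      return false;
    const auto begin = reinterpret_cast<std::uintptr_t>(ptr);
    const auto end = begin + numBytes;

    // First tracked range whose base is at or after `begin`.
    auto next = ranges_.lower_bound(begin);
    if (next != ranges_.end() && next->first < end)
      return false; // new range runs into the next tracked range

    if (next != ranges_.begin()) {
      auto prev = std::prev(next);
      if (prev->first + prev->second > begin)
        return false; // previous tracked range runs into the new one
    }
    ranges_.emplace(begin, numBytes);
    return true;
  }

  // Returns false unless `ptr` is the base of a tracked range (assumption:
  // interior pointers are treated as invalid unregistrations).
  bool onUnregister(const void *ptr) {
    return ranges_.erase(reinterpret_cast<std::uintptr_t>(ptr)) == 1;
  }

private:
  std::map<std::uintptr_t, std::size_t> ranges_; // base address -> size
};
```

Keeping the map keyed by base address makes both checks logarithmic in the number of live registrations, and the cost is only paid when the layer is enabled.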
@intel/llvm-gatekeepers please consider merging
gmlueck left a comment
Here are a few more comments on some specific wording.
range must additionally be writable by the application for the lifetime of
the registration.
* The range `[ptr, ptr + numBytes)` does not overlap any range that is
currently registered through this extension for the same context.
I think we should add a precondition, something like:
- The memory range belongs to one of the following kinds of storage: heap memory, automatic (stack) storage, or static (global) storage.
To be honest, I think this may be too restrictive. The existing first precondition, that the range is valid host memory mapped into the host address space, is probably sufficient.
For example, if I am not mistaken, memory obtained via mmap() is neither heap, stack, nor static storage, but it should still be a valid candidate for registration. The upcoming update to the Level Zero spec, for instance, states:
Any host pointer may be passed, including heap, stack, and statically-allocated (global) storage.
Please correct me if I am wrong.
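For reference, the mmap() case could look like the sketch below on a POSIX system. The helper names `mapAnonymous` and `unmap` are hypothetical, and the sketch stops short of calling the extension, since whether such pages can be registered is exactly the open question here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Maps `numBytes` of anonymous, private, read-write memory.
// The result is not heap, stack, or static storage in the C++ sense.
// Returns nullptr on failure.
inline void *mapAnonymous(std::size_t numBytes) {
  void *p = mmap(nullptr, numBytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// Releases a mapping created by mapAnonymous. Returns true on success.
inline bool unmap(void *p, std::size_t numBytes) {
  return munmap(p, numBytes) == 0;
}
```

A range obtained this way is readable, writable, and mapped into the host address space, so it would satisfy the existing first precondition while failing a precondition phrased in terms of storage duration.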
Hah! I was also thinking about the mmap case. Are you sure Level Zero can register such pages? I think the proposed updates to the Level Zero spec do not clearly state that this is allowed. I asked to have this clarified in the internal tracker. Let's see what the response is.
Sounds good, thank you!
gmlueck left a comment
Much better! A couple small comments below.
cannot be registered by the implementation.

[_Note:_ This extension does not provide a query for the host page size. It can
be obtained using operating system APIs such as `+sysconf(_SC_PAGESIZE)+` on
Hi, one minor question: `_SC_PAGESIZE` only reports the standard host page size (typically 4K). What about host huge pages?
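For context, the query mentioned in the note can be sketched as follows on a POSIX system. The helpers `basePageSize`, `alignDown`, and `alignUp` are hypothetical names. The rounding helpers assume the page size is wanted in order to align a range to page boundaries, which the quoted excerpt does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unistd.h>

// Returns the base page size reported by the OS. This is the standard
// page size only; it does not reflect any huge page configuration.
inline std::size_t basePageSize() {
  const long v = sysconf(_SC_PAGESIZE);
  assert(v > 0);
  return static_cast<std::size_t>(v);
}

// Rounds `addr` down to a multiple of `page` (page must be a power of two).
inline std::uintptr_t alignDown(std::uintptr_t addr, std::size_t page) {
  return addr & ~(static_cast<std::uintptr_t>(page) - 1);
}

// Rounds `addr` up to a multiple of `page` (page must be a power of two).
inline std::uintptr_t alignUp(std::uintptr_t addr, std::size_t page) {
  return alignDown(addr + page - 1, page);
}
```

Since `sysconf(_SC_PAGESIZE)` reports only the base page size, a range backed by huge pages would need its page size determined some other way, for example from the `Hugepagesize` entry in `/proc/meminfo` on Linux.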
@intel/llvm-gatekeepers please consider merging