
[Feat] use host-pinned memory with dual CPU/device addresses for transport buffers #1024

Merged
mag1c-h merged 1 commit into ModelEngine-Group:feature_26h1 from yumingyue624:adapt_connection on Jun 17, 2026


@yumingyue624

Contributor

Purpose

Switch ASU send/flag buffers from plain host memory to host-pinned
memory so that CPU code packs SQEs through the local mapping while
HCOMM/RDMA uses the device-visible mapping of the same allocation.

Modifications

  1. BufferManager: allocate host-pinned memory via aclrtMallocHost +
    aclrtHostRegisterV2, then obtain the device pointer via
    aclrtHostGetDevicePointer. ScatterGatherEntry gains a device_addr field.
    RegisterMemory uses the device address for host-pinned regions.
  2. AsuTransportImpl: send/flag buffers use HOST_PINNED instead of HOST.
  3. asu_submit_flow: pass device_addr to SendIoBatch.
  4. sqe_request: use flagBuffer.device_addr for response_buffer_addr.
  5. Tests: added host-pinned dual-address and device_addr assertions.
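
To illustrate the dual-address idea in the list above, here is a minimal sketch, not the PR's actual code. The struct layout, `MemoryKind`, `TransportAddress`, `MakeFakePinnedEntry`, and `kFakeDeviceBase` are all hypothetical names introduced for illustration. The real ACL calls (aclrtMallocHost, aclrtHostRegisterV2, aclrtHostGetDevicePointer) are deliberately not invoked; a constant stands in for the device pointer they would yield.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical memory kinds; the PR names HOST and HOST_PINNED.
enum class MemoryKind { HOST, HOST_PINNED };

// Simplified stand-in for ScatterGatherEntry (field names other than
// device_addr are assumptions).
struct ScatterGatherEntry {
    void* addr = nullptr;           // CPU-visible mapping, used to pack data
    std::uint64_t device_addr = 0;  // device-visible mapping of the same bytes
    std::size_t length = 0;
    MemoryKind kind = MemoryKind::HOST;
};

// Picks the address the transport layer would register or hand to RDMA:
// the device mapping for host-pinned entries, the host pointer otherwise.
inline std::uint64_t TransportAddress(const ScatterGatherEntry& e) {
    if (e.kind == MemoryKind::HOST_PINNED) {
        assert(e.device_addr != 0 && "host-pinned entry needs a device address");
        return e.device_addr;
    }
    return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(e.addr));
}

// Simulated device address; in the real code this would come from the runtime.
constexpr std::uint64_t kFakeDeviceBase = 0x100000000000ULL;

// Builds a host-pinned entry over caller-owned storage, with a fake device
// address, so the selection logic can be exercised without any hardware.
inline ScatterGatherEntry MakeFakePinnedEntry(std::vector<std::uint8_t>& storage) {
    ScatterGatherEntry e;
    e.addr = storage.data();
    e.device_addr = kFakeDeviceBase;
    e.length = storage.size();
    e.kind = MemoryKind::HOST_PINNED;
    return e;
}
```

In this sketch, CPU code would write through `addr` while anything handed to the transport goes through `TransportAddress`, which keeps the choice between the two mappings in one place.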

Test

  • buffer_manager_test: HostPinnedRegistersDeviceAddress.
  • asu_submit_flow_test: BuildSubBatchSendBuffersUsesHostPinnedDeviceAddresses.
  • sqe_request_test: packed response address matches device_addr.
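
The shape of the sqe_request check can be sketched roughly as follows. This is an assumption-laden illustration: the SQE size, the field offset, and the names `FlagBuffer`, `Sqe`, `PackResponseAddr`, and `UnpackResponseAddr` are hypothetical and do not come from the PR. It only shows that the packed response address is taken from `device_addr` rather than from the host pointer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical flag buffer with both mappings of the same allocation.
struct FlagBuffer {
    void* addr;                 // CPU-visible mapping
    std::uint64_t device_addr;  // device-visible mapping
};

constexpr std::size_t kSqeSize = 64;            // assumed SQE size
constexpr std::size_t kResponseAddrOffset = 8;  // assumed field offset

using Sqe = std::array<std::uint8_t, kSqeSize>;

// Writes the device-visible flag address into the SQE (native byte order).
inline void PackResponseAddr(Sqe& sqe, const FlagBuffer& flag) {
    assert(flag.device_addr != 0 && "flag buffer needs a device address");
    std::memcpy(sqe.data() + kResponseAddrOffset, &flag.device_addr,
                sizeof(flag.device_addr));
}

// Reads the response address back out of the SQE.
inline std::uint64_t UnpackResponseAddr(const Sqe& sqe) {
    std::uint64_t value = 0;
    std::memcpy(&value, sqe.data() + kResponseAddrOffset, sizeof(value));
    return value;
}
```

A test along these lines would pack an SQE from a flag buffer whose two addresses differ and assert that the unpacked value equals `device_addr`.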

@yumingyue624 yumingyue624 changed the base branch from develop to feature_26h1 June 12, 2026 03:57
@yumingyue624 yumingyue624 requested a review from nrj868 as a code owner June 12, 2026 03:57
Comment thread ucm/transport/kv/asu/trans/src/asu_submit_flow.cpp Outdated
Comment thread ucm/transport/kv/asu/trans/src/buffer_manager.cpp Outdated
Comment thread ucm/transport/kv/asu/trans/src/buffer_manager.cpp Outdated
Comment thread ucm/transport/kv/asu/trans/src/buffer_manager.cpp Outdated
Comment thread ucm/transport/kv/asu/trans/src/buffer_manager.cpp Outdated
Comment thread ucm/transport/kv/asu/trans/src/asu_submit_flow.cpp Outdated
Comment thread ucm/transport/kv/asu/trans/src/buffer_manager.cpp Outdated
Comment thread ucm/transport/kv/asu/trans/src/buffer_manager.h
Comment thread ucm/transport/kv/asu/trans/src/sqe_request.cpp
@yumingyue624 yumingyue624 force-pushed the adapt_connection branch 4 times, most recently from 88298b3 to c5cfef8 Compare June 15, 2026 08:12
Comment thread ucm/shared/trans/ascend/ascend_buffer.cc Outdated
Comment thread ucm/transport/kv/asu/trans/include/asu_transport/asu_transport.h
@yumingyue624 yumingyue624 force-pushed the adapt_connection branch 3 times, most recently from fba4e10 to 548365b Compare June 16, 2026 07:38
Comment thread ucm/transport/kv/asu/trans/src/asu_transport_impl.cpp Outdated
Comment thread ucm/transport/kv/asu/trans/src/asu_transport_impl.cpp Outdated
Comment thread ucm/transport/kv/asu/trans/src/asu_transport_impl.cpp Outdated
Comment thread ucm/transport/kv/asu/trans/src/asu_transport_impl.cpp Outdated
@yumingyue624 yumingyue624 force-pushed the adapt_connection branch 3 times, most recently from 59a10dd to 02f205c Compare June 16, 2026 09:18
Infinite666 previously approved these changes Jun 16, 2026
Infinite666 previously approved these changes Jun 16, 2026
Comment thread ucm/shared/trans/ascend/ascend_buffer.cc Outdated
Comment thread ucm/shared/trans/buffer.h
Comment thread ucm/shared/trans/ascend/ascend_buffer.cc Outdated
…r transport buffers

## Purpose
Switch ASU send/flag buffers from plain host memory to host-pinned
memory so that CPU code packs SQEs through the local mapping while
HCOMM/RDMA uses the device-visible mapping of the same allocation.

## Modifications
1. BufferManager: allocate host-pinned memory via aclrtMallocHost +
aclrtHostRegisterV2, obtain device pointer via
aclrtHostGetDevicePointer. ScatterGatherEntry gains device_addr.
RegisterMemory uses device address for host-pinned regions.
2. AsuTransportImpl: send/flag buffers use HOST_PINNED instead of HOST.
3. asu_submit_flow: pass device_addr to SendIoBatch.
4. sqe_request: use flagBuffer.device_addr for response_buffer_addr.
5. Move IsTransportBufferReady from asu_submit_flow to buffer_manager.
6. Tests: added host-pinned dual-address and device_addr assertions.

## Test
- buffer_manager_test: HostPinnedRegistersDeviceAddress.
- asu_submit_flow_test: BuildSubBatchSendBuffersUsesHostPinnedDeviceAddresses.
- sqe_request_test: packed response address matches device_addr.
@mag1c-h mag1c-h merged commit 69a48ff into ModelEngine-Group:feature_26h1 Jun 17, 2026
8 checks passed
4 participants