gRPC-Relay | 中文版

gRPC-Relay is a cross-domain communication relay system designed to establish a secure, controllable, and high-performance gRPC channel between internal devices and external controllers.

It is intended for the following scenarios:

Internal devices managed by a public-network or office-network Controller through a Relay
Bidirectional streaming data transfer, including control commands and file/data uploads
MQTT-based device online/offline notifications, status reporting, and telemetry
gRPC-based online device discovery and streaming relay capabilities

Background and Goals

The core goal of gRPC-Relay is to provide cross-domain gRPC relay capability so that devices inside private networks, without public IP addresses, can still be accessed and managed securely by external controllers.

Design Principles

Controllable relay: Relay sees metadata only and never decrypts business payloads
End-to-end encryption: Business data is encrypted/decrypted only by Device and Controller
Availability-first baseline: Deliver a single-node MVP first, then expand to multi-node
Observable by default: Built-in health checks, metrics, logs, audit, and tracing
Transport roadmap: Use HTTP/2 today and keep QUIC as the v2 low-latency transport target

Core Roles

Role	Description	Responsibilities
Device	Physical device such as an IoT device or workstation	Runs a Device Agent and executes business logic
Device Agent	Agent process running on the device	Maintains long-lived connection with Relay, registers, heartbeats, reconnects, reports status
Controller	Human-operated control system	Discovers devices, initiates sessions, sends control commands, receives responses
Relay	Relay server	Manages long-lived connections, forwards traffic, publishes notifications, provides query APIs
MQTT Broker	Message broker	Transmits telemetry data and device online/offline notifications

System Architecture

Protocol Layers

Link	Protocol	Purpose
Device ↔ Relay	gRPC over HTTP/2 today; QUIC in v2	Long-lived device connection
Controller ↔ Relay	gRPC over HTTP/2 + TLS 1.3	Controller access and querying
Relay ↔ MQTT Broker	MQTT over TLS 1.3	Device notifications and telemetry
Fallback	TLS/TCP	Used as the TCP transport baseline

Architectural Characteristics

Relay handles metadata, authentication, authorization, rate limiting, and stream forwarding only
Business payloads between Device and Controller are end-to-end encrypted
MQTT Broker is deployed independently and decoupled from Relay
The first release uses a single Relay node, with multi-node and load balancing in later versions

Core Workflows

1. Device Registration and Online Status

Device starts the Device Agent
Device Agent connects to Relay
Relay verifies device identity
Relay assigns a connection_id
Relay publishes a device online event to MQTT
Device Agent may optionally publish its own status as backup validation

2. Heartbeat and Liveness

Device Agent sends a heartbeat every 30 seconds
Relay updates the device last_seen
If no heartbeat is received for 120 seconds, the device is marked as suspected offline
If no heartbeat is received for 300 seconds, Relay closes the connection and publishes an offline event

3. Device Discovery by Controller

Three complementary discovery methods are supported:

Relay publishes online/offline events through MQTT
Device Agent reports status through MQTT
Controller queries the online device list through gRPC

4. Controller Session Initiation

Controller obtains target device information
Controller connects to Relay and specifies target_device_id
Relay verifies Controller identity and permissions
Relay creates a stream mapping between Controller and Device
Relay starts forwarding bidirectional stream data

5. Device Reconnect and Session Recovery

Device reconnects automatically after disconnection
Reconnect requests include previous_connection_id
Relay attempts to restore the session within the recovery window
If recovery fails, a new session is created and a new connection_id is assigned

6. Idempotency

Requests carry a globally unique sequence_number
Relay caches recently processed sequence numbers
Duplicate requests return cached responses to avoid repeated execution

API Design

gRPC Services

Core services include:

DeviceConnect(stream DeviceMessage) returns (stream RelayMessage)
ListOnlineDevices(ListOnlineDevicesRequest) returns (ListOnlineDevicesResponse)
ConnectToDevice(stream ControllerMessage) returns (stream DeviceResponse)
RevokeToken(RevokeTokenRequest) returns (RevokeTokenResponse)

Key Messages

DeviceMessage: device registration, heartbeat, data reporting
RelayMessage: registration response, heartbeat response, data request
ControllerMessage: request from controller to device
DeviceResponse: response from device
ListOnlineDevicesRequest/Response: online device query
RevokeTokenRequest/Response: admin token revocation

MQTT Topics

Topic	Purpose
`relay/device/online`	Device online notification
`relay/device/offline`	Device offline notification
`device/{device_id}/status`	Device self-reported status
`telemetry/{device_id}`	Device telemetry data
`telemetry/relay/{relay_id}`	Relay telemetry data

Error Codes

OK
DEVICE_OFFLINE
UNAUTHORIZED
DEVICE_NOT_FOUND
RATE_LIMITED
INTERNAL_ERROR

Security and Authorization Model

Authentication

Device: mTLS device certificates are recommended, with pre-provisioned tokens as an alternative
Controller: HS256 JWT token authentication with controller_id, role, allowed projects, expiry, issuer, and audience claims

Authorization

The system uses RBAC + device ownership:

admin: access all devices
operator: access authorized devices and perform control/data transfer
viewer: read-only access

Security Requirements

All connections must use TLS 1.3
Business payloads must be end-to-end encrypted
Relay must not log encrypted payload contents
Rate limiting must apply at device, Controller, and global levels
Metadata such as device_id, controller_id, and method_name must be validated
Admin Controllers can revoke Controller or Device tokens through the gRPC RevokeToken API; the current MVP/P1 implementation keeps revocations in Relay memory

Non-Functional Requirements

Performance Targets

Metric	Target
Single-instance long-lived connections	10,000
Concurrent active streams	1,000
Relay additional hop latency	P50 < 5ms, P99 < 20ms
Maximum single-stream bandwidth	10 MB/s
Memory budget	< 2 GB for 10K connections
CPU usage	< 80% at 10K connections and 1K active streams

Availability Targets

Service availability: 99.9%
Device reconnect time: < 10 seconds
Session recovery success rate: > 95%
MTTR: < 5 minutes

Observability

The system provides:

/health health check (with component-level status)
Full Prometheus /metrics endpoint (connection, stream, latency, error, resource metrics)
Structured JSON logging (via tracing-subscriber)
Audit logging (auth events, connections, rate limits, errors)
OpenTelemetry distributed tracing (OTLP exporter, configurable sampling)
MQTT relay telemetry publishing
Built-in alerting engine (CPU, memory, MQTT, connection thresholds)

CI/CD

Three GitHub Actions workflows automate quality checks, releases, and publishing.

Workflow	Trigger	What it does
CI	push (master/main), PR (master/main), tag, manual	`cargo fmt --check` → `cargo clippy` → `cargo check` → unit tests + integration tests → coverage (80% threshold) → Docker build
Create Release	manual (`workflow_dispatch`)	Validates version vs `Cargo.toml`, runs full test suite, builds release binary, verifies `relay --version`, creates git tag, generates categorized release notes, creates GitHub release, triggers Release
Release	`release: published`	Publishes `relay-proto` to crates.io, waits for index propagation, publishes `relay-agent-sdk` and `relay-controller-sdk`, builds and pushes Docker image to GHCR

Release Flow

  prepare-release.sh          PR merge              create-release.yml       release.yml (auto)
  (local: bumps version,  →   (CI validates    →    (tag + GitHub       →    (crates.io + GHCR
   opens a PR)                on the branch)        release)                 Docker image)

See doc/RELEASE.md for the full release process, including SemVer guidance, rollback procedures, and troubleshooting.

Deployment and Operations

See deploy/README.md for the full deployment documentation covering Docker Compose, bare-metal, Kubernetes, Prometheus, and Grafana. See deploy/BUILD.md for manual build instructions (binary and Docker image).

Pre-built Docker images are published to ghcr.io/cokkiy/grpc-relay on every release — no local Rust toolchain required to run the relay.

Deployment Options

Method	Directory	What's included
Docker	`Dockerfile`, `docker-compose.yml`, `deploy/docker/`	Pre-built GHCR image, Compose with MQTT + Prometheus + Grafana + Jaeger
Bare Metal	`deploy/bare-metal/`	systemd service, install/uninstall/upgrade scripts, env template
Kubernetes	`deploy/kubernetes/`	Deployment, Service, ConfigMap, Secret, HPA, NetworkPolicy, PDB, ServiceAccount, Namespace, Kustomization

Monitoring Stack

Component	Path	Purpose
Grafana	`deploy/grafana/`	Pre-built `relay-overview` dashboard + Prometheus datasource
Prometheus	`deploy/prometheus/`	Scrape config targeting relay metrics endpoint
MQTT Broker	`docker-compose.yml`	Eclipse Mosquitto service for local Docker deployments

Recommended Ports

Port	Protocol	Purpose
`50051`	TCP	gRPC (HTTP/2)
`50052`	UDP	gRPC over QUIC (v2.0)
`8080`	TCP	`/health` and `/metrics`
`8883`	TCP	MQTT over TLS

Configuration

The relay server is configured via a single YAML file (example). Key sections:

Section	Contents
`relay`	id, address, QUIC address, max connections, heartbeat interval
`relay.stream`	idle timeout, max active streams, per-controller limits
`relay.rate_limiting`	per-device/controller/global request + connection + bandwidth limits, CPU/memory thresholds
`relay.idempotency`	cache capacity + TTL
`relay.auth`	enable flag, token maps (device + controller), method whitelist, JWT config
`relay.mqtt`	enable flag, broker address, credentials, telemetry interval, reconnect config
`relay.tls`	enable flag, cert/key/CA paths
`observability`	logging level/format, health bind, audit config, OpenTelemetry tracing, alerting rules

Testing Strategy

Unit Tests

Coverage includes:

Authentication and authorization
Sequence number deduplication
Session management
Rate limiting
Error handling

Integration Tests

Coverage includes:

Device connection and registration
Controller session initiation
Bidirectional data transfer
Device reconnect and session recovery
MQTT notifications and queries
Authentication failure handling
Authorization rejection handling
Rate limit triggering

Performance Tests

10K concurrent connections
1K concurrent active streams
Latency target validation
Long-running stability validation

Security Tests

Unauthenticated access
Forged tokens
Cross-device privilege escalation
DDoS simulation
Large payload attacks
Replay attacks

MVP Scope and Roadmap

v1.0 MVP

The first release focuses on:

Device ↔ Relay HTTP/2 connection, with QUIC deferred to v2
Controller ↔ Relay HTTP/2 connection
Bidirectional stream relay
Registration, heartbeat, reconnect, offline handling
MQTT online/offline notifications
Controller online device query
RBAC authorization
Idempotency
End-to-end encryption
Basic rate limiting and input validation
Metrics, logs, and audit
Relay telemetry
Health checks
Docker / Kubernetes deployment

Future Versions

v1.1: Session persistence and stronger recovery
v1.2: Multi-Relay nodes, high availability, load balancing
v2.0: Controller QUIC, connection migration, 0-RTT, ABAC

References

Document Notes

This README was created based on the following project documents:

doc/requirements.md
doc/architecture.md
doc/action_plan.md
doc/RELEASE.md
doc/v1.0_release_summary.md

It is intended as a user-facing entry document that emphasizes project overview, architecture, and implementation path.

Name		Name	Last commit message	Last commit date
Latest commit History 208 Commits
.github/workflows		.github/workflows
config		config
crates		crates
deploy		deploy
doc		doc
scripts		scripts
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
README-cn_ZH.md		README-cn_ZH.md
README.md		README.md
docker-compose.yml		docker-compose.yml
rust-toolchain.toml		rust-toolchain.toml

Folders and files

Latest commit

History

Repository files navigation

gRPC-Relay | 中文版

Table of Contents

Background and Goals

Design Principles

Core Roles

System Architecture

Protocol Layers

Architectural Characteristics

Core Workflows

1. Device Registration and Online Status

2. Heartbeat and Liveness

3. Device Discovery by Controller

4. Controller Session Initiation

5. Device Reconnect and Session Recovery

6. Idempotency

API Design

gRPC Services

Key Messages

MQTT Topics

Error Codes

Security and Authorization Model

Authentication

Authorization

Security Requirements

Non-Functional Requirements

Performance Targets

Availability Targets

Observability

CI/CD

Release Flow

Deployment and Operations

Deployment Options

Monitoring Stack

Recommended Ports

Configuration

Testing Strategy

Unit Tests

Integration Tests

Performance Tests

Security Tests

MVP Scope and Roadmap

v1.0 MVP

Future Versions

References

Document Notes

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages