clawcall vs. Alternatives

A factual comparison of clawcall against the other voice-AI options commonly used with OpenClaw and similar self-hosted agent gateways.


Quick summary

| | clawcall | @openclaw/voice-call | clawphone | deepclaw | Vapi / Retell / Bland |
|---|---|---|---|---|---|
| Agent tools mid-call | Every turn, natively | Via openclaw_agent_consult (#71272) | No (CLI spawn) | No (side LLM) | Provider-defined |
| Conversational latency | Higher (STT → agent turn → TTS) | Lower (realtime LLM + async consult hop) | Medium | Medium | Low–medium (managed infra) |
| STT | Deepgram or null (keyless) | OpenAI Realtime / ElevenLabs | Twilio built-in | Deepgram | Provider-managed |
| TTS | ElevenLabs or Twilio <Say> | ElevenLabs | Twilio built-in | ElevenLabs | Provider-managed |
| Keyless dev mode | Yes | No | Yes | No | No |
| Self-hostable | Yes | Plugin only (needs OpenClaw gateway) | Yes | Yes | No (cloud SaaS) |
| Inbound allowlist | Yes | Yes | No | No | Plan-dependent |
| Barge-in | Yes | Partial | No | Yes | Yes |
| SMS | Yes (POST /twilio/sms, keyless, same allowlist) | No | Yes (basic) | No | Some plans |
| License | MIT | MIT (OpenClaw) | MIT | MIT | Commercial/SaaS |
| Price | Free + provider costs | Free + provider costs | Free + provider costs | Free + provider costs | Starts ~$0.05/min (managed) |

Detailed breakdown

clawcall

clawcall is a self-hosted TypeScript service that bridges Twilio inbound voice calls directly to an OpenClaw gateway via chat.send. Every utterance runs a complete agent turn — the same path as a chat message — so the agent's full tool-calling loop executes on every turn without any special bridging tool.
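The per-utterance flow described above can be sketched roughly as follows. This is an illustration only: the `SpeechToText`, `Gateway`, and `TextToSpeech` interfaces and the `handleUtterance` function are hypothetical names, and `chatSend` merely stands in for the gateway's chat.send, whose real signature may differ.

```typescript
// Hypothetical interfaces: none of these names come from clawcall's source.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}

interface Gateway {
  // Stands in for the gateway's chat.send; the real signature may differ.
  chatSend(sessionId: string, text: string): Promise<string>;
}

interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

interface TurnResult {
  transcript: string;
  reply: string;
  audio: Uint8Array;
}

// One caller utterance is one full agent turn. The three stages run in
// series, so the caller waits for the sum of all three.
async function handleUtterance(
  deps: { stt: SpeechToText; gateway: Gateway; tts: TextToSpeech },
  sessionId: string,
  audio: Uint8Array,
): Promise<TurnResult> {
  const transcript = await deps.stt.transcribe(audio);
  const reply = await deps.gateway.chatSend(sessionId, transcript);
  const speech = await deps.tts.synthesize(reply);
  return { transcript, reply, audio: speech };
}
```

Because the agent turn sits in the middle of the chain, any tool calls the agent makes happen inside `chatSend` and need no voice-specific wiring.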

Strengths:

  • Full agent tool access on every utterance or SMS, with no special configuration
  • Inbound SMS via POST /twilio/sms — same allowlist, same gateway session, keyless (no STT/TTS keys needed for text-only SMS flows)
  • Pluggable STT/TTS providers; keyless dev mode works out of the box with zero API keys (null STT, Twilio <Say> TTS)
  • Simple mental model: a phone call or text message is just another chat.send
  • MIT license; no external services required beyond the OpenClaw gateway and Twilio

Limitations:

  • Voice latency is higher than realtime approaches because each utterance requires a full STT → gateway round-trip → TTS synthesis cycle
  • Requires an active Twilio phone number and, in production, Deepgram and ElevenLabs API keys
  • No built-in outbound calling or outbound SMS
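As a rough illustration of how one allowlist could guard both voice and SMS, here is a minimal sketch of an inbound SMS handler. `From` and `Body` are standard Twilio webhook parameters; everything else (`isAllowed`, `sessionFor`, `handleSms`, and keying the session by caller number) is an assumption made for the example, not clawcall's actual code.

```typescript
// Hypothetical names throughout; only the From and Body fields are standard
// Twilio webhook parameters.
interface InboundSms {
  From: string; // sender number, e.g. "+15550100"
  Body: string;
}

type SendToAgent = (sessionId: string, text: string) => Promise<string>;

type SmsOutcome =
  | { accepted: false }
  | { accepted: true; reply: string };

// The same check would guard inbound voice calls.
function isAllowed(allowlist: ReadonlySet<string>, from: string): boolean {
  return allowlist.has(from.trim());
}

// Assumption: sessions are keyed by caller number, so voice and SMS from the
// same person land in the same gateway session.
function sessionFor(from: string): string {
  return `phone:${from.trim()}`;
}

async function handleSms(
  allowlist: ReadonlySet<string>,
  sms: InboundSms,
  send: SendToAgent,
): Promise<SmsOutcome> {
  if (!isAllowed(allowlist, sms.From)) {
    return { accepted: false };
  }
  // No STT or TTS is involved: the text goes straight to the agent.
  const reply = await send(sessionFor(sms.From), sms.Body);
  return { accepted: true, reply };
}
```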

@openclaw/voice-call (native realtime plugin)

The native plugin (PR #71272, merged as e2f13959d4) integrates directly into OpenClaw. It uses a realtime LLM session (OpenAI Realtime API or Gemini Live) to handle the audio end-to-end. When the realtime LLM needs to reach the agent's tools, it calls the special openclaw_agent_consult tool, which bridges to the gateway.
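To make the consult hop concrete, the sketch below shows one way a tool call from a realtime session might be routed to the gateway. Only the tool name `openclaw_agent_consult` comes from the text above; the `ToolCall` shape, the `question` argument, and the `Consult` callback are hypothetical and may not match the plugin's real interface.

```typescript
// Hypothetical shape of a tool call emitted by a realtime LLM session.
interface ToolCall {
  name: string;
  arguments: string; // JSON-encoded arguments
}

// Stands in for whatever forwards a question to the OpenClaw gateway.
type Consult = (question: string) => Promise<string>;

const CONSULT_TOOL = "openclaw_agent_consult";

// Assumption: the tool takes a single string argument named "question".
function parseQuestion(raw: string): string | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const q = (parsed as Record<string, unknown>)["question"];
      return typeof q === "string" ? q : null;
    }
  } catch (_err) {
    // Malformed JSON falls through to null.
  }
  return null;
}

// Returns the tool result for the realtime session, or null when the call
// is for some other tool.
async function dispatchToolCall(
  call: ToolCall,
  consult: Consult,
): Promise<string | null> {
  if (call.name !== CONSULT_TOOL) return null;
  const question = parseQuestion(call.arguments);
  if (question === null) return "error: missing question";
  return consult(question);
}
```

The gateway is reached only when the realtime model chooses to emit this tool call, which is the extra hop the comparison refers to.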

Strengths:

  • Sub-second conversational latency — the realtime LLM responds without waiting for a full agent turn on every utterance
  • Tight integration with OpenClaw — no separate service to deploy
  • Agent tools available via the openclaw_agent_consult hop

Limitations:

  • Tools are not invoked on every turn — only when the realtime LLM decides to call openclaw_agent_consult, introducing an extra hop for tool access
  • Requires an OpenAI Realtime or Gemini Live API key; no keyless dev mode
  • Less flexibility in STT/TTS provider choice

Choose the native plugin when you need the lowest possible conversational latency and are comfortable with the realtime LLM + consult-hop model.


clawphone

clawphone is a lighter-weight voice bridge that spawns CLI processes to handle calls. It uses Twilio's built-in <Say> for TTS and Twilio's transcription for STT, which means no extra API keys but also no streaming and no agent tool access mid-call. Each call runs a one-shot agent invocation rather than a stateful turn.
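For readers unfamiliar with Twilio's built-ins, here is a minimal sketch of the kind of TwiML such a bridge could return: `<Say>` speaks the reply and `<Gather input="speech">` asks Twilio to transcribe the caller's next utterance and post it to an action URL. The function names and the example URL are hypothetical; this is not taken from clawphone's source.

```typescript
// Escape text for use in TwiML element content and attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds a TwiML document that speaks `reply`, then listens for speech.
// Twilio posts the transcription to `actionUrl` once the caller stops talking.
function sayAndGather(reply: string, actionUrl: string): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    `  <Say>${escapeXml(reply)}</Say>`,
    `  <Gather input="speech" action="${escapeXml(actionUrl)}" method="POST"/>`,
    "</Response>",
  ].join("\n");
}
```

Since the whole reply has to be ready before the TwiML is returned, nothing streams, which fits the latency limitation listed for this approach.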

Strengths:

  • Zero external API keys — Twilio built-ins only
  • Simple setup for single-purpose bots that don't need tool access mid-call

Limitations:

  • No agent tool access during the call
  • No barge-in / interruption handling
  • Higher latency from non-streaming STT/TTS

deepclaw

deepclaw is a separate open-source project that routes Twilio calls through a standalone LLM running alongside your OpenClaw installation. It uses Deepgram for STT and ElevenLabs for TTS but maintains its own LLM conversation loop rather than routing through the OpenClaw gateway protocol.

Strengths:

  • Barge-in support
  • Deepgram streaming STT

Limitations:

  • No native agent tool access — the sidecar LLM operates independently of the OpenClaw gateway's tool-calling loop
  • Requires both Deepgram and ElevenLabs keys (no keyless mode)
  • A separate codebase to keep in sync with your gateway
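Barge-in, listed among the strengths above, generally means cutting off the assistant's audio as soon as the caller starts talking. The sketch below shows one possible shape for that logic over a Twilio Media Streams connection, where a `clear` message asks Twilio to drop buffered outbound audio. The `Playback` class and its methods are hypothetical and are not deepclaw's implementation.

```typescript
// Sends a serialized message over the media-stream WebSocket.
type Send = (message: string) => void;

class Playback {
  private speaking = false;
  private readonly streamSid: string;
  private readonly send: Send;

  constructor(streamSid: string, send: Send) {
    this.streamSid = streamSid;
    this.send = send;
  }

  // Call when TTS audio starts flowing to the caller.
  start(): void {
    this.speaking = true;
  }

  // Call when the TTS audio has played to the end.
  finish(): void {
    this.speaking = false;
  }

  // Call when streaming STT reports caller speech.
  // Returns true if playback was interrupted.
  onCallerSpeech(): boolean {
    if (!this.speaking) return false;
    this.speaking = false;
    // Ask Twilio to discard any audio it has buffered for this stream.
    this.send(JSON.stringify({ event: "clear", streamSid: this.streamSid }));
    return true;
  }
}
```

This depends on streaming STT, since the interruption has to be detected while the reply is still playing.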

Managed platforms (Vapi, Retell, Bland)

Vapi, Retell, and Bland are fully managed cloud services for building voice AI applications. They handle telephony, STT, LLM, and TTS as a complete hosted stack.

Strengths:

  • Lowest operational overhead — no infra to run
  • Sub-second latency on optimised infrastructure
  • Outbound calling, SMS (plan-dependent), analytics dashboards
  • Extensive provider integrations

Limitations:

  • Not self-hostable — your call audio and transcripts go through their cloud
  • Per-minute pricing (typically $0.05–$0.15/min depending on plan and volume)
  • Agent customisation limited to what their platform exposes
  • Not designed for OpenClaw's gateway protocol
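To put the per-minute pricing in perspective, a small calculation like the one below turns a monthly call volume into a cost range. It uses only the $0.05–$0.15/min figures quoted above; the function name is hypothetical, and actual bills depend on plan and volume.

```typescript
// Rates in cents per minute, taken from the range quoted above.
// Integer cents avoid floating-point rounding surprises.
const LOW_CENTS_PER_MIN = 5;
const HIGH_CENTS_PER_MIN = 15;

interface CostRange {
  lowUsd: number;
  highUsd: number;
}

// Estimated monthly platform cost for a given number of call minutes.
function managedCostRange(minutesPerMonth: number): CostRange {
  if (minutesPerMonth < 0) {
    throw new Error("minutesPerMonth must be non-negative");
  }
  return {
    lowUsd: (minutesPerMonth * LOW_CENTS_PER_MIN) / 100,
    highUsd: (minutesPerMonth * HIGH_CENTS_PER_MIN) / 100,
  };
}

// Example: 2,000 minutes per month.
const estimate = managedCostRange(2000);
console.log(`$${estimate.lowUsd} to $${estimate.highUsd} per month`);
// prints "$100 to $300 per month"
```

The self-hosted options are not free at the same volume either: their Twilio, STT, and TTS charges still apply and are not modelled here.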

Choose a managed platform when operational simplicity and outbound calling matter more than self-hosting or full gateway tool control.


When to choose clawcall

  • You already run an OpenClaw gateway and want the simplest possible way to give it a phone number
  • You want every agent tool available on every caller utterance without a consult hop
  • You want to prototype without any API keys (STT_PROVIDER=null, TTS_PROVIDER=twilio-say)
  • You need MIT-licensed, self-hosted infrastructure whose only required cloud dependency is Twilio (Deepgram and ElevenLabs are needed only outside keyless mode)
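The keyless settings mentioned above could be read at startup roughly as follows. The variable names STT_PROVIDER and TTS_PROVIDER and the values `null` and `twilio-say` come from this document; the `deepgram` and `elevenlabs` values, the choice to default to keyless mode when the variables are unset, and the function itself are assumptions for illustration.

```typescript
// "null" and "twilio-say" appear in this document; "deepgram" and
// "elevenlabs" are assumed spellings for the production providers.
type SttProvider = "null" | "deepgram";
type TtsProvider = "twilio-say" | "elevenlabs";

interface VoiceConfig {
  stt: SttProvider;
  tts: TtsProvider;
  keyless: boolean; // true when neither provider needs an API key
}

function isSttProvider(value: string): value is SttProvider {
  return value === "null" || value === "deepgram";
}

function isTtsProvider(value: string): value is TtsProvider {
  return value === "twilio-say" || value === "elevenlabs";
}

// Assumption: unset variables fall back to the keyless providers.
function loadVoiceConfig(env: Record<string, string | undefined>): VoiceConfig {
  const stt = env["STT_PROVIDER"] ?? "null";
  const tts = env["TTS_PROVIDER"] ?? "twilio-say";
  if (!isSttProvider(stt)) {
    throw new Error(`Unknown STT_PROVIDER: ${stt}`);
  }
  if (!isTtsProvider(tts)) {
    throw new Error(`Unknown TTS_PROVIDER: ${tts}`);
  }
  return { stt, tts, keyless: stt === "null" && tts === "twilio-say" };
}
```

In a Node process this would typically be called as `loadVoiceConfig(process.env)`.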

Built and maintained by Code and Trust. Companion guide: Give your OpenClaw agent a phone number