3D Avatar Customization

Clawsome ships with an Avatar tab that renders a per-agent 3D talking head with real-time lipsync. Fresh clones see a friendly bundled smiley out of the box; named per-agent avatars with precise per-viseme lipsync take a few minutes to set up.

To wire up your own avatar you need two things:

  1. A GLB file. Avaturn's free generator is the path of least resistance (see Generating an avatar with Avaturn below).
  2. A small bit of configuration pointing an agent at the GLB — handled by the Settings UI (no JSON editing required).

TTS providers — lipsync quality ladder

Clawsome supports three TTS providers with a meaningful quality gap for the avatar experience. They form a fall-through ladder so any user gets audio out of the box:

  • ElevenLabs (recommended) — the alignment-rich happy path. Clawsome gets audio and character-level timing in one call, then runs a grapheme→phoneme pass to map characters to the 15 Oculus visemes, each driving a hand-tuned set of ARKit morph targets. Precise per-phoneme mouth shapes locked to audio. Requires an API key (their free tier is generous). Set under messages.tts.elevenlabs.apiKey in openclaw.json, or ELEVENLABS_API_KEY in your shell env.
  • OpenAI TTS (preferred fallback) — gpt-4o-mini-tts (cheap, ~$0.015/min) with named voices (nova, alloy, echo, shimmer, etc.). Audio quality is excellent — on par with mid-tier ElevenLabs — but the API returns no alignment data, so lipsync degrades to amplitude-based jaw flap. Requires OPENAI_API_KEY.
  • Microsoft Edge neural TTS — truly-zero-config fallback. Hits Microsoft's public Edge voice endpoint. No API key, free, neural-quality audio. Same amplitude-based jaw flap as OpenAI. Hosted public service with no SLA; treat as best-effort.
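To make "amplitude-based jaw flap" concrete, here is a minimal sketch of how a jaw weight could be derived from raw audio when no alignment data is available. It is not Clawsome's actual implementation: the function name, the gain, and the smoothing factor are all assumptions chosen for illustration.

```typescript
// Hypothetical helper, not taken from the Clawsome source.
// Maps one frame of PCM samples (range -1..1) to a jawOpen weight in 0..1.
// The gain and smoothing defaults are illustrative, not tuned values.
function jawOpenFromSamples(
  samples: Float32Array,
  previousWeight: number,
  gain: number = 4,
  smoothing: number = 0.5,
): number {
  // Root-mean-square amplitude of the frame; an empty frame counts as silence.
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  const rms = samples.length > 0 ? Math.sqrt(sumSquares / samples.length) : 0;

  // Louder audio opens the jaw further, clamped to the morph target's range.
  const target = Math.min(1, rms * gain);

  // Exponential smoothing keeps the jaw from jittering frame to frame.
  return previousWeight * smoothing + target * (1 - smoothing);
}
```

Calling this once per animation frame and writing the result to the jawOpen morph target gives a mouth that opens with volume but carries no information about which phoneme is being spoken, which is why the ElevenLabs path looks noticeably better.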

Selection order when nothing is forced:

  1. Voice ID looks like an Edge voice (xx-XX-NameNeural) → Edge.
  2. Voice ID looks like an OpenAI voice (alloy/nova/…) → OpenAI.
  3. ElevenLabs configured → ElevenLabs.
  4. OPENAI_API_KEY resolves → OpenAI.
  5. Otherwise → Edge.
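The selection order above can be sketched as a single function. This is an illustrative reading of the five rules, not the project's code: the type names, the regular expressions, and the shape of the input object are assumptions, and the OpenAI pattern covers only the four voice names this document mentions.

```typescript
type SpeechProvider = "elevenlabs" | "openai" | "edge";

// Hypothetical input shape; the real configuration object may differ.
interface ProviderInputs {
  voiceId?: string;
  elevenLabsConfigured: boolean;
  openAiKeyResolves: boolean;
}

// Assumed pattern for Edge voice IDs of the form xx-XX-NameNeural.
const EDGE_VOICE_ID = /^[a-z]{2}-[A-Z]{2}-[A-Za-z]+Neural$/;
// Only the voices named in this document; the real list is longer.
const OPENAI_VOICE_ID = /^(alloy|nova|echo|shimmer)$/;

function selectSpeechProvider(inputs: ProviderInputs): SpeechProvider {
  const voiceId = inputs.voiceId ?? "";
  if (EDGE_VOICE_ID.test(voiceId)) return "edge"; // rule 1
  if (OPENAI_VOICE_ID.test(voiceId)) return "openai"; // rule 2
  if (inputs.elevenLabsConfigured) return "elevenlabs"; // rule 3
  if (inputs.openAiKeyResolves) return "openai"; // rule 4
  return "edge"; // rule 5
}
```

Note that the voice ID checks come first, so an agent with an Edge-style voice ID uses Edge even when ElevenLabs is configured.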

If you care about the "talking head" feel, configure ElevenLabs. If you just want the avatar to nod along to TTS while you build something else, either fallback is fine.


Defaults

Clawsome ships with a friendly schematic smiley as the bundled default avatar (static/avatar/clawsome.glb). Any agent without an avatar assigned automatically renders this smiley in the Avatar tab — fresh clones see something working without touching config.

Voice defaults follow the same three-tier fall-through:

  • ElevenLabs configured — unconfigured agents get Rachel (21m00Tcm4TlvDq8ikWAM); per-agent voice ID overrides.
  • OpenAI key only — falls through to OpenAI TTS, default voice alloy.
  • No keys — falls through to Edge neural, default voice en-US-AriaNeural.
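As a rough sketch, the voice defaults can be expressed as a lookup table plus a per-agent override. The three voice IDs come from the list above; the constant and function names are hypothetical, and applying the per-agent override on every tier (not only ElevenLabs) is an assumption.

```typescript
// Default voice per provider, as listed in this section.
const DEFAULT_VOICE_BY_PROVIDER = {
  elevenlabs: "21m00Tcm4TlvDq8ikWAM", // Rachel
  openai: "alloy",
  edge: "en-US-AriaNeural",
} as const;

type VoiceProvider = keyof typeof DEFAULT_VOICE_BY_PROVIDER;

// Hypothetical helper: a non-empty per-agent voice ID wins, otherwise
// the provider's default voice is used.
function resolveAgentVoice(provider: VoiceProvider, agentVoiceId?: string): string {
  if (agentVoiceId !== undefined && agentVoiceId.trim() !== "") {
    return agentVoiceId;
  }
  return DEFAULT_VOICE_BY_PROVIDER[provider];
}
```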

Previewing defaults

Add ?previewDefaults=1 to the URL to bypass per-agent config and see what a fresh-install user would hear — default avatar, default fallback voice — regardless of what you have configured. Useful for testing the out-of-box experience.
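For readers curious how such a flag might be honoured, the following sketch reads the query parameter and discards per-agent settings when it is set. The interface and function names are hypothetical; only the previewDefaults parameter name and its value 1 come from this document.

```typescript
// Hypothetical per-agent settings shape, for illustration only.
interface AgentAvatarSettings {
  avatarFile?: string;
  voiceId?: string;
}

// Expects an absolute URL, for example window.location.href in a browser.
function isPreviewDefaultsUrl(href: string): boolean {
  return new URL(href).searchParams.get("previewDefaults") === "1";
}

// With the flag set, return empty settings so every default applies.
function effectiveAvatarSettings(
  href: string,
  configured: AgentAvatarSettings,
): AgentAvatarSettings {
  return isPreviewDefaultsUrl(href) ? {} : configured;
}
```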


Setting up an avatar

The recommended path uses the Settings UI — gear icon → Avatars and Agents sub-tabs. No JSON editing required.

  1. Generate or obtain a GLB. See Generating an avatar with Avaturn below.
  2. Upload it. In the Avatars sub-tab, click + Upload GLB and pick the file (max 50 MB). It lands in static/avatar/ — gitignored, so your personal avatars stay out of the repo.
  3. Tune it. Each GLB is a row; click to expand for sliders that adjust framing, orbit, aim, and gaze. Changes preview live in the Avatar tab.
  4. Assign it. Switch to Agents, pick the GLB from the avatar dropdown for whichever agents you want, optionally set a voice ID.

Mapping is 1:1 between GLB files and avatars. Multiple agents can reference the same avatar; tuning is shared. Bundled GLBs (🔒) cannot be deleted; user uploads (📦) can be removed via the trash button.
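If you script uploads or want to sanity-check a file before using the UI, a check along these lines catches the most common problems. It is a sketch, not Clawsome's validation code: the function name is hypothetical, and reading "50 MB" as 50 × 1024 × 1024 bytes is an assumption. The 12-byte header layout (magic, version, total length, all little-endian) is from the glTF 2.0 binary format.

```typescript
const ASSUMED_MAX_GLB_BYTES = 50 * 1024 * 1024; // "50 MB" read as 50 MiB (assumption)
const GLB_MAGIC_NUMBER = 0x46546c67; // ASCII "glTF" as a little-endian uint32

// Hypothetical helper: returns null when the file looks like a usable GLB,
// otherwise a short description of the problem.
function checkGlbUpload(fileBytes: ArrayBuffer): string | null {
  if (fileBytes.byteLength > ASSUMED_MAX_GLB_BYTES) {
    return "file is larger than 50 MB";
  }
  if (fileBytes.byteLength < 12) {
    return "file is too small to contain a GLB header";
  }
  const header = new DataView(fileBytes);
  if (header.getUint32(0, true) !== GLB_MAGIC_NUMBER) {
    return "not a GLB file (missing glTF magic bytes)";
  }
  const containerVersion = header.getUint32(4, true);
  if (containerVersion !== 2) {
    return "unsupported GLB container version " + containerVersion;
  }
  // The header declares the total file length; a larger value means truncation.
  if (header.getUint32(8, true) > fileBytes.byteLength) {
    return "file appears to be truncated";
  }
  return null;
}
```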


Generating an avatar with Avaturn (recommended)

The supported happy path is Avaturn (free). Avaturn produces GLBs with everything Clawsome's avatar pipeline expects: separate eye/eyelash/head/teeth/tongue meshes, ARKit-standard morph target names, the full 15-viseme Oculus set, standard bone names, and a baked idle animation.
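When a GLB misbehaves, one quick diagnostic is to compare its morph target names against the viseme set. The sketch below does that for a plain list of names. The viseme_ prefixed names are an assumption based on a naming convention common among avatar generators; check src/lib/avatar/ for the names Clawsome actually looks up and pass those in instead if they differ.

```typescript
// Assumed morph target names for the 15 Oculus visemes. The "viseme_"
// prefix and exact spellings are an assumption, not taken from Clawsome.
const ASSUMED_VISEME_MORPH_NAMES: string[] = [
  "viseme_sil", "viseme_PP", "viseme_FF", "viseme_TH", "viseme_DD",
  "viseme_kk", "viseme_CH", "viseme_SS", "viseme_nn", "viseme_RR",
  "viseme_aa", "viseme_E", "viseme_I", "viseme_O", "viseme_U",
];

// Hypothetical helper: returns the expected names absent from the rig.
function findMissingMorphTargets(
  rigMorphNames: string[],
  expectedNames: string[] = ASSUMED_VISEME_MORPH_NAMES,
): string[] {
  return expectedNames.filter((name) => rigMorphNames.indexOf(name) === -1);
}
```

An empty result means every expected viseme is present; anything else lists the names to add or alias.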

A "T2" Avaturn export takes three reference photos (front + left profile + right profile) and spits out a 10–15 MB GLB in about five minutes.

Step 1: Get three reference photos

You need a front view and two side profiles of the face you want the avatar to wear.

Photos of a real person — three well-lit shots: dead-on frontal, hard-left profile, hard-right profile. Plain background, neutral expression, hair pulled back from the face if possible.

AI-generated face — ask an image model (GPT Image, Gemini Imagen, Midjourney, whatever you've got) to produce three consistent shots of the same fictional person:

Generate three portrait photographs of the same person against a plain white background, neutral expression, even studio lighting, shoulders-up framing:

  1. Front view, looking directly at the camera.
  2. Hard left profile (left ear toward camera).
  3. Hard right profile (right ear toward camera).

Keep face shape, hair, age, and ethnicity identical across all three images.

This is the workflow Kay's avatar uses. Three images, ~30 seconds of generation, done.

Step 2: Upload to Avaturn

Go to avaturn.me, sign in (free), start a new avatar. Pick the photo-based creation flow and upload the three images. Avaturn does the photogrammetry + rigging itself.

Step 3: Export as GLB

When the avatar is ready, export as GLB. Pick the format that includes ARKit blendshapes and a baked idle animation — that's the default for the "T2" preset.

Step 4: Upload into Clawsome

Open the Avatars sub-tab in settings, click + Upload GLB, pick the file. It appears as an assignable avatar in the Agents sub-tab immediately.


Other rigs

Non-Avaturn GLBs (Reallusion CC ExPlus, MetaHuman, custom Blender exports) can be made to work but require translation: Clawsome looks up morph targets by ARKit names (jawOpen, mouthSmileLeft, …) and bones by Avaturn-flavoured names (Head, Neck, …). Other rigs use different conventions.

The morphNameMap and boneNameMap fields can alias one set to the other at load time. Both are JSON-editable in clawsome.json and not surfaced in the Settings UI; they're a power-user escape hatch. See the source in src/lib/avatar/ for the lookup tables Clawsome expects.
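Conceptually, a name map of this kind is a rename applied while the lookup table is built. The sketch below shows the idea for morph targets. The direction of the mapping (rig name as key, expected name as value), the function name, and the example rig names are all assumptions; confirm the real format against the source before editing clawsome.json.

```typescript
// Hypothetical helper: builds an "expected name -> morph index" table from
// the names found in a GLB, renaming any that appear in the map.
// Assumed map direction: key = name in the rig, value = name Clawsome expects.
function buildAliasedMorphLookup(
  rigMorphNames: string[],
  morphNameMap: Record<string, string> = {},
): Record<string, number> {
  const lookup: Record<string, number> = {};
  rigMorphNames.forEach((rigName, index) => {
    const hasAlias = Object.prototype.hasOwnProperty.call(morphNameMap, rigName);
    const expectedName = hasAlias ? morphNameMap[rigName] : rigName;
    lookup[expectedName] = index;
  });
  return lookup;
}

// Made-up rig names, for illustration only.
const exampleLookup = buildAliasedMorphLookup(
  ["Custom_JawOpen", "Custom_SmileL", "eyeBlinkLeft"],
  { Custom_JawOpen: "jawOpen", Custom_SmileL: "mouthSmileLeft" },
);
```

Names that already match the expected convention pass through unchanged, so a map only needs entries for the targets that differ.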