siromermer/robotics-vla-IK

Robotics VLA and Vision Teleoperation

A robotics project that combines vision-language-action (VLA) policy inference, robot simulation, hand-object perception, and inverse kinematics (IK).

Components

  • vla_policy_inference/: SmolVLA inference in the LeRobot ALOHA insertion environment.
  • vision_teleoperation/: Hand-video-to-VX300s teleoperation pipeline with MediaPipe, YOLO, Depth Anything V2, MuJoCo, and damped least-squares IK.

Results

SmolVLA ALOHA insertion preview (GIF, slowed)

Vision teleoperation preview (GIF, slowed)

Quick Start

Create separate environments for the two components because the VLA policy stack and the teleoperation stack have different dependency profiles.

SmolVLA Inference

conda create -n robotics-vla python=3.10 -y
conda activate robotics-vla
pip install lerobot huggingface_hub "imageio[ffmpeg]"

python vla_policy_inference/run_smolvla_aloha_insertion.py \
  --seed 1 \
  --max-steps 500 \
  --save-video vla_policy_inference/results/smolvla_aloha_insertion_seed1_500.mp4
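The flags above suggest a seeded, step-capped episode whose frames are written to a video. The sketch below is a hypothetical illustration of that kind of closed-loop rollout, not the script's actual code. It assumes a Gymnasium-style `reset(seed=...)` / `step(action)` interface, stops on success or after `max_steps`, and collects one rendered frame per step for a video writer. `ToyInsertionEnv`, `toy_policy`, and `rollout` are made-up stand-ins, not LeRobot or SmolVLA APIs.

```python
import random


class ToyInsertionEnv:
    """Hypothetical 1-D stand-in for a Gymnasium-style environment."""

    def __init__(self, goal=1.0):
        self.goal = goal
        self.state = 0.0
        self.rng = random.Random()

    def reset(self, seed=None):
        # Seeding the env's RNG makes the initial state reproducible.
        self.rng.seed(seed)
        self.state = self.rng.uniform(-0.5, 0.5)
        return {"state": self.state}, {}

    def step(self, action):
        self.state += action
        terminated = abs(self.goal - self.state) < 1e-3  # toy "insertion" success
        reward = 1.0 if terminated else 0.0
        return {"state": self.state}, reward, terminated, False, {}

    def render(self):
        # A real simulator would return an RGB array here.
        return self.state


def toy_policy(obs, goal=1.0, gain=0.5):
    """Hypothetical placeholder for a learned policy's action selection."""
    return gain * (goal - obs["state"])


def rollout(env, policy, seed, max_steps):
    """Run one episode; return (success, steps_taken, frames)."""
    obs, _ = env.reset(seed=seed)
    frames = [env.render()]
    for step in range(max_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        frames.append(env.render())
        if terminated or truncated:
            return True, step + 1, frames
    return False, max_steps, frames


success, steps, frames = rollout(ToyInsertionEnv(), toy_policy, seed=1, max_steps=500)
print(success, steps, len(frames))
```

Because the environment is seeded through `reset`, re-running with the same seed reproduces the same episode. This is the usual reason for exposing a `--seed` flag alongside a step cap.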

Vision Teleoperation

conda create -n robotics-teleop python=3.10 -y
conda activate robotics-teleop
pip install -r vision_teleoperation/requirements.txt

python -m vision_teleoperation.teleop_main \
  --video vision_teleoperation/test_video.mp4 \
  --output vision_teleoperation/results/teleop_hand_yolo_depth_robot.mp4
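To turn a detected hand position into a robot target, the pipeline needs some mapping from image coordinates and relative depth to the robot workspace. The following is one simple, hypothetical mapping, not the project's implementation. It normalises the hand pixel (u, v) by the frame size and normalises the depth value by the frame's min/max (Depth Anything V2's relative output has no metric scale). It then maps each axis linearly into an assumed end-effector box and smooths the result with an exponential moving average before it would be handed to IK. `Workspace`, `pixel_to_target`, `smooth`, the box bounds, the axis conventions, and `alpha` are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    """Hypothetical axis-aligned target box for the end effector, in metres."""
    x_range: tuple = (0.20, 0.45)   # forward reach, driven by depth (assumed)
    y_range: tuple = (-0.25, 0.25)  # left/right, driven by image column u
    z_range: tuple = (0.05, 0.35)   # height, driven by image row v


def lerp(lo_hi, t):
    """Linear interpolation with t clamped to [0, 1] so targets stay in the box."""
    lo, hi = lo_hi
    t = min(max(t, 0.0), 1.0)
    return lo + t * (hi - lo)


def pixel_to_target(u, v, rel_depth, width, height, depth_min, depth_max, ws=Workspace()):
    """Map a hand pixel and its relative depth value to a hypothetical (x, y, z) target.

    rel_depth is normalised by the frame's min/max. Whether larger values mean
    nearer or farther depends on the depth model's output, so flip t_depth if needed.
    """
    t_depth = (rel_depth - depth_min) / max(depth_max - depth_min, 1e-9)
    x = lerp(ws.x_range, t_depth)
    y = lerp(ws.y_range, 1.0 - u / width)   # image right -> robot -y (assumed convention)
    z = lerp(ws.z_range, 1.0 - v / height)  # image rows grow downward
    return x, y, z


def smooth(prev, new, alpha=0.3):
    """Exponential moving average to damp per-frame landmark jitter."""
    if prev is None:
        return new
    return tuple(alpha * n + (1 - alpha) * p for p, n in zip(prev, new))


# Example: a hand at the centre of a 640x480 frame with mid-range depth.
target = smooth(None, pixel_to_target(320, 240, 0.5, 640, 480, 0.0, 1.0))
print(target)
```

A linear box mapping keeps the robot inside a safe region regardless of detection noise. A calibrated camera model with metric depth would be needed for true 3D correspondence.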

Implementation Notes

  • VLA policy checkpoint: c27e/smolvla_aloha_sim_insertion_human (Hugging Face Hub).
  • Simulation backends: MuJoCo ALOHA and Trossen VX300s MJCF.
  • Teleoperation perception: MediaPipe hand landmarks, YOLOv8n cup detection, Depth Anything V2 relative depth.
  • IK method: from-scratch geometric Jacobian with damped least-squares updates.
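The standard damped least-squares (DLS) update computes a joint step dq = J^T (J J^T + λ² I)^-1 e from the task-space error e, the Jacobian J, and a damping factor λ that keeps the step bounded near singularities. The sketch below applies that update to a hypothetical planar two-link arm using its closed-form position Jacobian. The link lengths, damping, tolerance, and start pose are all made-up values. It does not load the VX300s MJCF model or reproduce the project's own solver, and it uses plain Python 2x2 algebra so it has no dependencies.

```python
import math

# Hypothetical planar 2-link arm (link lengths in metres).
LINK_1, LINK_2 = 0.30, 0.25


def forward_kinematics(q):
    """End-effector (x, y) for joint angles q = (q1, q2)."""
    q1, q2 = q
    x = LINK_1 * math.cos(q1) + LINK_2 * math.cos(q1 + q2)
    y = LINK_1 * math.sin(q1) + LINK_2 * math.sin(q1 + q2)
    return x, y


def jacobian(q):
    """2x2 position Jacobian d(x, y)/d(q1, q2)."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-LINK_1 * s1 - LINK_2 * s12, -LINK_2 * s12],
            [LINK_1 * c1 + LINK_2 * c12, LINK_2 * c12]]


def dls_step(q, target, damping=0.05):
    """One DLS update: q + J^T (J J^T + damping^2 I)^-1 e."""
    x, y = forward_kinematics(q)
    e = (target[0] - x, target[1] - y)
    J = jacobian(q)
    # A = J J^T + damping^2 I is symmetric: [[a, b], [b, d]].
    a = J[0][0] ** 2 + J[0][1] ** 2 + damping ** 2
    b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
    d = J[1][0] ** 2 + J[1][1] ** 2 + damping ** 2
    det = a * d - b * b  # strictly positive thanks to the damping term
    # w = A^-1 e via the closed-form 2x2 inverse.
    w0 = (d * e[0] - b * e[1]) / det
    w1 = (-b * e[0] + a * e[1]) / det
    # dq = J^T w
    dq1 = J[0][0] * w0 + J[1][0] * w1
    dq2 = J[0][1] * w0 + J[1][1] * w1
    return (q[0] + dq1, q[1] + dq2)


def solve_ik(target, q0=(0.3, 0.5), iters=200, tol=1e-5):
    """Iterate DLS steps until the position error drops below tol."""
    q = q0
    for _ in range(iters):
        q = dls_step(q, target)
        x, y = forward_kinematics(q)
        if math.hypot(target[0] - x, target[1] - y) < tol:
            break
    return q


q = solve_ik((0.35, 0.2))
print(forward_kinematics(q))  # should be close to (0.35, 0.2)
```

Larger damping gives smoother, more conservative steps near singular poses (for example, a fully stretched arm) at the cost of slower convergence. Smaller damping behaves more like the plain pseudo-inverse.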

Documentation

About

Robotics simulation project for SmolVLA policy inference in ALOHA insertion and vision-based VX300s teleoperation using MuJoCo, MediaPipe, YOLO, Depth Anything V2, and damped least-squares IK.
