Research

Speculative Decoding for Vision-Language-Action Models

Accelerating robot control with efficient draft models

Role

Draft model architecture design, training pipeline implementation, evaluation

Investigated whether Mamba state-space models can serve as efficient draft models for speculative decoding, speeding up inference for Vision-Language-Action (VLA) systems in robotic manipulation. Implemented and evaluated Mamba-based draft models against Llama baselines on the LIBERO-Goal benchmark, exploring architectural approaches for real-time robot action generation. Team project for CS229 Machine Learning.
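For context, the core mechanism is the standard draft-and-verify loop of speculative decoding: a small draft model proposes several tokens cheaply, and the larger target model checks them in a single forward pass, keeping the longest agreeing prefix. The sketch below is a minimal greedy version of that loop, not this project's exact implementation; `target` and `draft` are assumed to be Hugging Face-style causal language models exposing `.logits`, and the draft length `k` is an illustrative parameter.

```python
import torch

@torch.no_grad()
def speculative_decode(target, draft, prompt_ids, num_new_tokens, k=4):
    """Greedy speculative decoding: the draft proposes k tokens, the target
    verifies them in one forward pass and keeps the longest agreeing prefix."""
    ids = prompt_ids
    stop_len = prompt_ids.shape[1] + num_new_tokens
    while ids.shape[1] < stop_len:
        t = ids.shape[1]

        # 1) Draft model proposes k tokens autoregressively (cheap per step).
        draft_ids = ids
        for _ in range(k):
            next_tok = draft(draft_ids).logits[:, -1, :].argmax(dim=-1, keepdim=True)
            draft_ids = torch.cat([draft_ids, next_tok], dim=-1)
        proposed = draft_ids[:, t:]                            # (1, k)

        # 2) Target scores prompt + proposal in a single forward pass.
        tgt_logits = target(draft_ids).logits                  # (1, t + k, vocab)
        tgt_pred = tgt_logits[:, t - 1:-1, :].argmax(dim=-1)   # target's pick at each proposed slot

        # 3) Accept the longest prefix where draft and target agree.
        agree = (proposed == tgt_pred)[0].long()
        n_accept = int(agree.cumprod(dim=0).sum().item())

        # 4) Append accepted tokens plus one correction token from the target,
        #    so each iteration makes progress even when n_accept == 0.
        correction = tgt_logits[:, t - 1 + n_accept, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, proposed[:, :n_accept], correction], dim=-1)
    return ids[:, :stop_len]
```

In the VLA setting, the tokens being drafted and verified would be OpenVLA's discretized action tokens rather than text, but the verification logic is the same; under greedy decoding the output matches what the target model alone would produce, so the speedup comes purely from how many drafted tokens are accepted per target pass.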

Tech Stack

Python, PyTorch, Mamba SSM, OpenVLA, LIBERO
