
Multimodal Vision-Language-Action Model for Enhanced Robotic Manipulation

Project ID: 2531ad1518

(You will need this ID for your application)

Research Theme: Artificial Intelligence and Robotics

UCL Lead department: Computer Science

Department Website

Lead Supervisor: (Chris) Xiaoxuan Lu

Project Summary:

Deep Learning, which uses multi-layered neural networks to approximate human cognitive functions, typically relies on extensive labeled datasets for training, limiting its effectiveness in dynamic, real-world scenarios. In robotics, performing complex tasks such as object manipulation requires integrating multiple sensory inputs. Humans instinctively combine several senses, such as sight, hearing, and the sense of temperature, to inform their actions and decisions. This natural capacity for multisensory processing may be the key to enhancing robotic systems. This research investigates a critical question: can the integration of multisensory data improve robotic manipulation, enabling machines to learn more effectively from real-world experience rather than depending solely on large annotated datasets?

To answer this question, you will focus on developing a multimodal Vision-Language-Action (VLA) model that integrates diverse sensory inputs, including visual, auditory, and temperature data, to enhance robotic manipulation. You will conduct hands-on research to create models inspired by human learning processes that translate perception into action. You will also implement parameter-efficient fine-tuning techniques so that these models can adapt quickly to specific manipulation tasks with minimal human intervention, improving their performance in real-world applications. A minimal sketch of these two ideas, multimodal fusion and parameter-efficient adaptation, is shown below.
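To make the project scope concrete, the sketch below illustrates (in PyTorch, as an assumption; the actual model, dimensions, and module names here are illustrative and not part of the project specification) how features from vision, audio, and temperature sensors might be fused before an action head, and how a small low-rank adapter can be trained while the pretrained weights stay frozen. A real VLA model would be far larger and language-conditioned; this only conveys the shape of the approach.

```python
# Hypothetical sketch: late fusion of three sensory modalities plus a
# low-rank adapter (LoRA-style) as an example of parameter-efficient fine-tuning.
import torch
import torch.nn as nn

VISION_DIM, AUDIO_DIM, TEMP_DIM = 512, 128, 8   # illustrative feature sizes
FUSED_DIM, NUM_ACTIONS = 256, 7                 # e.g. a 7-DoF action vector


class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (W x + B A x)."""

    def __init__(self, in_dim, out_dim, rank=4):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)   # pretrained weights stay frozen
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_dim, rank))  # zero init: no change at start

    def forward(self, x):
        return self.base(x) + x @ self.lora_a.T @ self.lora_b.T


class MultimodalPolicy(nn.Module):
    """Encode each modality separately, concatenate, and predict an action."""

    def __init__(self):
        super().__init__()
        self.vision_enc = nn.Linear(VISION_DIM, FUSED_DIM)
        self.audio_enc = nn.Linear(AUDIO_DIM, FUSED_DIM)
        self.temp_enc = nn.Linear(TEMP_DIM, FUSED_DIM)
        # Only the LoRA parameters of the action head are trained during adaptation.
        self.head = LoRALinear(3 * FUSED_DIM, NUM_ACTIONS)

    def forward(self, vision, audio, temperature):
        fused = torch.cat([
            torch.relu(self.vision_enc(vision)),
            torch.relu(self.audio_enc(audio)),
            torch.relu(self.temp_enc(temperature)),
        ], dim=-1)
        return self.head(fused)


if __name__ == "__main__":
    policy = MultimodalPolicy()
    action = policy(torch.randn(1, VISION_DIM),
                    torch.randn(1, AUDIO_DIM),
                    torch.randn(1, TEMP_DIM))
    print(action.shape)  # torch.Size([1, 7])
```

In this toy setup, fine-tuning for a new manipulation task would update only the low-rank matrices in the action head, which is the kind of lightweight adaptation the project aims to study at scale.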

You will collaborate with a passionate team of researchers and experts in robotics, computer vision, and machine learning. This project is part of a larger initiative that includes various stakeholder partners dedicated to driving innovation in robotic systems.

We are seeking motivated and innovative candidates with a background in robotics, computer vision, or machine learning. Ideal applicants will have strong analytical skills, a passion for implementing robotic systems, and a desire to contribute to the advancement of machine intelligence capable of complex actions in dynamic environments.