Researchers at Stanford Introduce RoboFuME: Revolutionizing Robotic Learning with Minimal Human Input

November 3, 2023
in AI Technology


In many machine learning domains, a widely successful paradigm for learning task-specific models is to first pre-train a general-purpose model on an existing diverse dataset and then adapt it with a small amount of task-specific data. This paradigm is attractive for real-world robot learning because collecting data on a robot is expensive, and fine-tuning an existing model on a small task-specific dataset can substantially improve data efficiency when learning a new task. Pre-training a policy with offline reinforcement learning and then fine-tuning it with online reinforcement learning is a natural way to implement this paradigm in robotics. In practice, however, this recipe runs into several challenges.

First, off-the-shelf robot datasets are often collected with different objects, fixture placements, camera perspectives, and lighting conditions than those of the local robot platform. These non-trivial distribution shifts between the pre-training data and the online fine-tuning data make it difficult to fine-tune a robot policy effectively; most previous studies demonstrate the benefits of the pre-train-and-fine-tune paradigm only when the same hardware instance is used for both stages. Second, training or fine-tuning a policy in the real world typically demands significant human supervision: someone must manually reset the environment between trials and design reward functions.

The researchers behind this study aim to tackle these two issues and provide a practical framework that enables robot fine-tuning with minimal human time and effort. Recent years have seen significant advances in effective, autonomous reinforcement learning algorithms, yet few systems can learn from diverse demonstration datasets without human-engineered reward functions and manual environment resets. Reset-free reinforcement learning (RL) is one approach proposed in prior work to reduce the need for manual resets: during training, an agent alternates between executing a task policy and a reset policy, updating both from online experience.
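
The reset-free alternation described above can be sketched as a simple training loop. The policy and environment interfaces below are hypothetical stand-ins for illustration, not the paper's actual implementation:

```python
class ResetFreeTrainer:
    """Alternate between a task policy and a reset policy so a robot can
    keep practicing without a human manually resetting the scene."""

    def __init__(self, task_policy, reset_policy, env, horizon=200):
        self.task_policy = task_policy
        self.reset_policy = reset_policy
        self.env = env
        self.horizon = horizon

    def rollout(self, policy, reward_fn):
        # Collect one fixed-length episode of (obs, action, reward, next_obs).
        obs = self.env.observe()
        trajectory = []
        for _ in range(self.horizon):
            action = policy.act(obs)
            next_obs = self.env.step(action)
            trajectory.append((obs, action, reward_fn(next_obs), next_obs))
            obs = next_obs
        return trajectory

    def train_round(self, task_reward, reset_reward):
        # Forward phase: attempt the task, then update the task policy.
        self.task_policy.update(self.rollout(self.task_policy, task_reward))
        # Backward phase: undo the task, then update the reset policy.
        self.reset_policy.update(self.rollout(self.reset_policy, reset_reward))
```

Because the reset policy's successes restore the task policy's initial state distribution (and vice versa), the two rollouts chain together indefinitely without human intervention.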

These efforts, however, do not exploit diverse off-the-shelf robot datasets. Recent advances in offline reinforcement learning have enabled policies to leverage varied offline data and improve further through online fine-tuning, but these techniques have not yet been assembled into a system that minimizes human supervision during the fine-tuning phase. Other papers propose learned reward-prediction models as a replacement for human-specified reward functions, yet the authors found that many such models are fragile in a real RL fine-tuning setting. In short, while earlier research has supplied the individual components needed for an effective, human-free robot learning system, which components to use and how to assemble them remained an open question.

Researchers from Stanford University created RoboFuME, a system that uses diverse offline datasets and online fine-tuning to enable autonomous, efficient real-world robot learning. The system operates in two stages. In the pre-training phase, it assumes access to a diverse prior dataset, a small collection of sample failure observations for the target task, a few task demonstrations, and reset demonstrations for the target task. From this data it derives a language-conditioned, multitask offline reinforcement learning policy. To handle the distribution shift between offline and online interactions, the algorithm must both efficiently digest heterogeneous offline data and fine-tune robustly in environments that differ from those seen in the offline dataset.

They find that calibrated offline reinforcement learning techniques satisfy both requirements: by correcting the scale of the learned Q-values, which conservative offline methods otherwise underestimate, calibration ensures that the pre-trained policy can efficiently absorb diverse offline data and continues to improve during online adaptation. To keep human input during online fine-tuning to a minimum, they also eliminate the need for reward engineering by learning a reward predictor.
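
As a rough illustration of the calibration idea (in the spirit of calibrated Q-learning methods such as Cal-QL; the exact objective below is a simplified assumption, not the paper's actual loss), a conservative critic penalty can be lower-bounded by the Monte Carlo returns observed in the offline data, so that learned Q-values stay on a sensible scale for later online fine-tuning:

```python
import numpy as np

def calibrated_conservative_loss(q_pred, q_target, q_policy, mc_return, alpha=1.0):
    """Conservative Q-learning loss with a calibration lower bound.

    q_pred    -- critic values for dataset (state, action) pairs
    q_target  -- Bellman backup targets for those pairs
    q_policy  -- critic values for actions sampled from the learned policy
    mc_return -- Monte Carlo returns observed in the offline data
    """
    bellman_error = np.mean((q_pred - q_target) ** 2)
    # Calibration: only penalize policy Q-values above the observed return,
    # so conservatism never pushes values below what the data achieved.
    conservative_penalty = np.mean(np.maximum(q_policy, mc_return) - q_pred)
    return bellman_error + alpha * conservative_penalty
```

Without the `np.maximum` clamp, the conservative penalty keeps driving policy Q-values down indefinitely; with it, the estimates remain calibrated to a reference scale and online updates start from sensible values.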

Their approach is to use a large vision-language model (VLM) as a robust pre-trained representation, then fine-tune it with a small quantity of in-domain data to specialize it for reward classification. Because pre-trained VLMs have already been trained on large-scale visual and linguistic data from the internet, the resulting model is more resilient to changes in lighting and camera placement than the models used in earlier work. During the fine-tuning stage, the robot autonomously adjusts its policy in the real world by alternating between attempting the task and restoring the environment to its initial state distribution, while the agent updates the policy using the fine-tuned VLM as a surrogate reward.
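
A minimal sketch of such a surrogate reward, assuming a frozen pretrained encoder and a small logistic head trained on a handful of labeled success/failure frames. The encoder and its interface here are placeholders for illustration, not RoboFuME's actual VLM:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SurrogateReward:
    """Binary success classifier on top of a frozen pretrained image encoder
    (e.g. a vision-language model backbone). Its predicted success
    probability serves as the reward during online fine-tuning, replacing a
    hand-engineered reward function."""

    def __init__(self, encoder, feat_dim, lr=0.1):
        self.encoder = encoder        # frozen pretrained featurizer
        self.w = np.zeros(feat_dim)   # small trainable head
        self.b = 0.0
        self.lr = lr

    def fit(self, images, success_labels, epochs=100):
        feats = np.stack([self.encoder(im) for im in images])
        y = np.asarray(success_labels, dtype=float)
        for _ in range(epochs):
            p = sigmoid(feats @ self.w + self.b)
            grad = p - y              # gradient of binary cross-entropy
            self.w -= self.lr * feats.T @ grad / len(y)
            self.b -= self.lr * grad.mean()

    def reward(self, image):
        return float(sigmoid(self.encoder(image) @ self.w + self.b))
```

Only the small head is trained on in-domain frames; the frozen internet-pretrained backbone is what lends the classifier its robustness to lighting and viewpoint changes.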

To evaluate the framework, they pre-train on the Bridge dataset and test on a variety of downstream real-world tasks, such as folding and covering cloths, picking and placing sponges, covering pot lids, and setting pots in sinks. They find that with as little as three hours of autonomous real-world fine-tuning, their approach offers notable advantages over offline-only techniques. In simulation, additional quantitative experiments show that the approach outperforms imitation learning and offline reinforcement learning baselines that either do not fine-tune online or do not use diverse prior data.

Their primary contributions are, first, a fully autonomous system that pre-trains from a prior robot dataset and fine-tunes on an unseen downstream task with a minimal number of resets and with learned reward labels; and second, a technique for fine-tuning pre-trained vision-language models and using them to produce a surrogate reward for downstream reinforcement learning.

Check out the Paper and Project. All credit for this research goes to the researchers on this project.


Source link
