Meta’s Self-Taught Evaluator enables LLMs to create their own training data (2024)

Ben Dickson @BenDee983


Human evaluation has been the gold standard for assessing the quality and accuracy of large language models (LLMs), especially for open-ended tasks such as creative writing and coding. However, human evaluation is slow, expensive, and often requires specialized expertise.

Researchers at Meta FAIR have introduced a novel approach called the Self-Taught Evaluator, which leverages synthetic data to train LLM evaluators without the need for human annotations. The method comes with a few caveats, but it could significantly improve the efficiency and scalability of LLM evaluation for enterprises that want to build custom models.

The challenges of LLM evaluation

LLMs are often used as evaluators themselves, playing a crucial role in aligning other models with human preferences or improving their own performance during training. This is especially important for tasks where multiple valid answers are possible, as is often the case with creative or complex instructions.

However, training accurate LLM evaluators typically relies on extensive human-annotated data, which is costly and time-consuming to acquire. This bottleneck hinders the rapid development and deployment of new LLM-based applications.

The Self-Taught Evaluator addresses this challenge by using a training approach that eliminates the need for human-labeled data. It is built on top of the LLM-as-a-Judge concept, where the model is provided with an input, two possible answers, and an evaluation prompt. The LLM-as-a-Judge model aims to determine which response is better by generating a reasoning chain that reaches the correct result.
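
To make the setup concrete, here is a minimal sketch of an LLM-as-a-Judge call. It assumes a generic text-completion function, generate, as a hypothetical stand-in for any model API; the prompt wording and the "Winner: A/B" verdict format are illustrative, not Meta's exact templates.

```python
# Minimal LLM-as-a-Judge sketch. `generate` is a hypothetical callable
# (prompt -> completion string); it is not an API from the paper.

JUDGE_TEMPLATE = """You are an impartial judge. Given the user instruction
and two candidate responses, reason step by step about which response is
better, then end with a verdict line: "Winner: A" or "Winner: B".

Instruction:
{instruction}

Response A:
{response_a}

Response B:
{response_b}
"""

def judge(generate, instruction: str, response_a: str, response_b: str) -> tuple[str, str]:
    """Return the judge's full reasoning chain and its parsed verdict."""
    reasoning = generate(JUDGE_TEMPLATE.format(
        instruction=instruction, response_a=response_a, response_b=response_b))
    # Parse the verdict from the final line of the reasoning chain.
    verdict = "A" if "Winner: A" in reasoning else "B" if "Winner: B" in reasoning else "unknown"
    return reasoning, verdict
```
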

Self-Taught Evaluator starts with a seed LLM and a large collection of unlabeled human-written instructions, such as those commonly found in production systems.

First, the model selects a set of instructions from the uncurated pool. For each instruction, the Self-Taught Evaluator generates a pair of model responses: one designated as “chosen” and the other as “rejected.” The chosen response is designed to be of higher quality than the rejected response.
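
As a rough illustration, such a pair might be produced with two calls to the same model: answer the original instruction to get the "chosen" response, and answer a deliberately perturbed version of it to get a plausible but off-target "rejected" response. The perturbation prompt and the generate helper below are assumptions for illustration; the article does not reproduce Meta's exact generation prompts.

```python
# Illustrative sketch of building a chosen/rejected pair without labels.
# `generate` is the same hypothetical completion callable as in the
# previous snippet; the perturbation prompt is an assumption, not a quote
# from the paper.

MODIFY_TEMPLATE = """Rewrite the following instruction so that it is similar
but asks for something subtly different:

{instruction}
"""

def make_preference_pair(generate, instruction: str) -> dict:
    chosen = generate(instruction)                                    # answer the real instruction
    modified = generate(MODIFY_TEMPLATE.format(instruction=instruction))
    rejected = generate(modified)                                     # answer the perturbed one
    return {"instruction": instruction, "chosen": chosen, "rejected": rejected}
```
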

The model is then trained iteratively. In each iteration, it samples multiple LLM-as-a-Judge reasoning traces and judgments for each example. If the model produces a reasoning chain that reaches the correct verdict, the example is added to the training set. The final dataset is composed of examples that pair the input instruction with the chosen and rejected responses and a judgment chain. The model is then fine-tuned on this new training set, resulting in an updated model for the next iteration.
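
The loop can be summarized in a few lines. The sketch below assumes two hypothetical callables, sample_judgments (draws several reasoning traces and verdicts from the current evaluator for one example) and fine_tune (returns the updated model); neither is a real library call, and details such as the sample count are placeholders.

```python
from typing import Callable, Iterable

def self_taught_iteration(
    sample_judgments: Callable[[dict, int], Iterable[tuple[str, str]]],  # hypothetical helper
    fine_tune: Callable[[list[dict]], object],                           # hypothetical helper
    pairs: Iterable[dict],
    n_samples: int = 8,
):
    """One iteration: filter self-generated judgments, then fine-tune on them."""
    training_set = []
    for pair in pairs:
        # Draw several LLM-as-a-Judge reasoning traces for this example.
        for reasoning, verdict in sample_judgments(pair, n_samples):
            # Keep only traces whose verdict agrees with the known "chosen" answer.
            if verdict == "chosen":
                training_set.append({
                    "instruction": pair["instruction"],
                    "responses": (pair["chosen"], pair["rejected"]),
                    "judgment": reasoning,
                })
                break  # one correct trace per example is enough for this sketch
    # Fine-tuning on the filtered judgments yields the evaluator for the next iteration.
    return fine_tune(training_set)
```
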


Putting the Self-Taught Evaluator to the test

The researchers initialized their Self-Taught Evaluator with the Llama 3-70B-Instruct model. They used the WildChat dataset, which contains a large pool of human-written instructions, and selected more than 20,000 examples in the reasoning category. They also tested other datasets and tasks, including coding and math word problems. They let the self-teaching pipeline generate the answers and the entire training set without any human intervention.

Their experiments showed that the Self-Taught Evaluator significantly improved the accuracy of the base model on the popular RewardBench benchmark, increasing it from 75.4% to 88.7% after five iterations without any human annotation. This performance comes close to, and in some cases surpasses, that of models trained on human-labeled data, including some private frontier models.

They observed similar improvements on the MT-Bench benchmark as well, which evaluates the performance of LLMs on multi-turn conversations.

Implications for enterprises

This research contributes to a growing trend of techniques that use LLMs in automated loops for self-improvement. These techniques can significantly reduce the manual effort required to create high-performing LLMs, paving the way for more efficient and scalable development and deployment of AI-powered applications.

The Self-Taught Evaluator can benefit enterprises that possess large amounts of unlabeled corporate data and want to fine-tune models on that data without extensive manual annotation and evaluation. It also hints at how Meta might use its rich store of unlabeled user-generated data to train and improve its current and future models.

While promising, the Self-Taught Evaluator does have limitations. It relies on an initial seed model that is instruction-tuned and aligned with human preferences. In their experiments, the researchers used the Mixtral 8x22B mixture-of-experts model as the seed for creating their initial training dataset.

Enterprises will need to carefully consider the seed and base models that are relevant to their specific data and tasks. It is also important to note that standardized benchmarks often don’t represent the full capabilities and limitations of LLMs. At the same time, fully automated loops that rely solely on LLMs to evaluate their own outputs can fall into meaningless shortcuts that optimize the model for a benchmark but fail on real-world tasks. Enterprises will have to run their own manual tests at different stages of the training and evaluation process to make sure the model is in fact getting closer to the kind of performance they have in mind.
