What is Retrieval Augmented Fine-Tuning (RAFT)?

Large language models (LLMs) have proven valuable tools for a range of business applications. However, they are not always perfectly suited to every business use case right out of the box.

Until now, adapting and optimizing LLMs to improve performance and reliability in specific use cases has typically been handled by one of two approaches: retrieval-augmented generation (RAG) or fine-tuning.

But both of these approaches have well-documented downsides. That’s why a new, hybrid approach for optimizing LLMs for specific business applications, retrieval augmented fine-tuning (RAFT), has recently caught the eye of machine learning (ML) experts. 

In this post, we’ll examine the differences between RAFT, RAG, and fine-tuning; the benefits of RAFT for LLM training; and the use cases for which it’s best suited.

LLM Optimization: RAG and Fine-Tuning

The goal of both RAG and fine-tuning is to make LLMs as useful as possible for real-world applications. Although both techniques have significant benefits for ML teams, they also have weaknesses that create gaps in their effectiveness.

RAG

RAG incorporates external knowledge from sources such as documents or databases into an existing LLM during text generation using a two-step pipeline: retrieval, then generation. Grounding responses in factual evidence before the final text is generated helps reduce the risk of AI hallucinations.

It’s because of this two-step pipeline that RAG is often compared to an open-book exam.
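
To make the two-step pipeline concrete, here’s a minimal sketch in Python. The embed() and generate() helpers and the three-document corpus are hypothetical placeholders; in a real system they would be an embedding model, an LLM call, and a vector database, respectively.

    import numpy as np

    documents = [
        "Policy doc: refunds are issued within 14 days of return receipt.",
        "FAQ: support is available 24/7 via live chat.",
        "Handbook: employees accrue 1.5 vacation days per month.",
    ]

    def embed(text: str) -> np.ndarray:
        # Placeholder: a real system would call an embedding model here.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.standard_normal(64)

    def generate(prompt: str) -> str:
        # Placeholder: a real system would call an LLM here.
        return f"[LLM answer grounded in a prompt of {len(prompt)} chars]"

    def rag_answer(question: str, k: int = 2) -> str:
        # Step 1 (retrieval): rank documents by cosine similarity to the query.
        q = embed(question)
        scored = []
        for doc in documents:
            d = embed(doc)
            cosine = float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
            scored.append((cosine, doc))
        context = "\n".join(doc for _, doc in sorted(scored, reverse=True)[:k])
        # Step 2 (generation): ground the answer in the retrieved evidence.
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
        return generate(prompt)

    print(rag_answer("How quickly are refunds processed?"))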

One of RAG’s biggest benefits is its access to real-time, up-to-date information at retrieval time. RAG also allows organizations to inject their private LLMs with domain- or company-specific data, helping to scale domain-specific learning, reduce hallucinations, and improve interpretability.

However, RAG also has troublesome drawbacks: added complexity and latency from its two-step workflow, added costs, and even greater potential for bias depending on the data sources used. RAG also never trains the model on the domain-specific document set itself.

Additionally, because documents are retrieved based on their semantic proximity to the query, RAG models can’t distinguish documents that are truly relevant from those that are merely similar.

Fine-Tuning

Fine-tuning, also known as supervised fine-tuning (SFT) or domain-specific fine-tuning (DSF), is another way of optimizing LLMs for specific domains. Instead of relying on a real-time lookup of external information to improve performance, fine-tuning optimizes a model for specific tasks by training it on domain-specific documents in advance.

This could include fine-tuning a customer service chatbot on previous customer interactions or training a medical LLM on specific medical documents.
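
As a rough illustration, a minimal fine-tuning run using the Hugging Face transformers library might look like the sketch below. The distilgpt2 base model and the two-example dataset are stand-ins to keep the example self-contained; real SFT would use a larger base model and thousands of curated domain examples.

    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "distilgpt2"  # small stand-in for a larger base model like Llama 2
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style models ship without a pad token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Domain-specific training texts, e.g. past customer-support transcripts.
    texts = [
        "Customer: My order arrived damaged. Agent: Sorry about that - a replacement ships today.",
        "Customer: How do I reset my password? Agent: Use the 'Forgot password' link on the login page.",
    ]
    train_dataset = [tokenizer(t, truncation=True, max_length=128) for t in texts]

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="sft-out", num_train_epochs=1,
                               per_device_train_batch_size=1),
        train_dataset=train_dataset,
        # Causal-LM collator (mlm=False) pads each batch and sets labels = input_ids.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()  # afterward, the model answers in the domain's voice and style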

The advantages of fine-tuning LLMs include reduced hallucinations and greater domain authority without the need to query external documents during generation, which helps the model with proper terminology, nuance, and style.

It’s also more efficient and has lower latency than RAG, since fine-tuned models don’t require a real-time query of external documents.

The disadvantages of fine-tuning, however, include its dependence on large amounts of domain-specific training data. Fine-tuned models also can’t query the latest information about a subject, can be less interpretable than RAG, and require periodic retraining to stay relevant.

Understanding Retrieval Augmented Fine-Tuning 

Retrieval augmented fine-tuning, or RAFT, is a hybrid approach to optimizing LLMs for specific use cases and domains that takes inspiration from both RAG and fine-tuning.

But RAFT goes one better than either approach alone by combining the key benefits of both – real-time retrieval and training on a domain-specific corpus – into a single, highly effective methodology.

The RAFT technique was proposed by UC Berkeley researchers Tianjun Zhang and Shishir G. Patil, who demonstrated the approach using Meta Llama 2 and Azure AI Studio. It’s called a hybrid approach because it leverages the most beneficial elements of both RAG and fine-tuning.

As Cedric Vidal explains in the Microsoft Tech Community blog, Zhang and Patil hypothesized that a student who studies the textbook before an open-book exam is likely to perform better than one who only consults it during the exam. “If a model ‘studied’ the documents beforehand, could that improve its RAG performance?” Vidal asks.

As it turns out, the answer is almost certainly “yes.” Sitting at the intersection of RAG and SFT, RAFT simultaneously primes an LLM on domain-specific training data and improves answer quality through real-time retrieval – the best of both approaches.
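
Concretely, the paper constructs each training example from a question, a mix of “oracle” documents (which contain the answer) and “distractor” documents (which don’t), and a chain-of-thought answer that cites the oracle; a fraction of examples deliberately omit the oracle so the model learns to answer from memorized domain knowledge rather than lean on whatever context it’s given. Here’s a rough sketch of that data-construction recipe, with illustrative field names and corpus:

    import random

    ORACLE_FRACTION = 0.8  # the paper's P: share of examples that keep the oracle document

    def make_raft_example(question: str, oracle_doc: str, cot_answer: str,
                          corpus: list[str], num_distractors: int = 2) -> dict:
        distractors = random.sample([d for d in corpus if d != oracle_doc],
                                    num_distractors)
        if random.random() < ORACLE_FRACTION:
            context = distractors + [oracle_doc]
        else:
            context = distractors  # no oracle: the answer must come from memorized knowledge
        random.shuffle(context)
        prompt = "\n\n".join(context) + f"\n\nQuestion: {question}"
        # The target is a chain-of-thought answer that quotes the oracle document,
        # teaching the model to cite evidence instead of guessing.
        return {"prompt": prompt, "completion": cot_answer}

    corpus = ["Doc A: The warranty covers parts for 2 years.",
              "Doc B: Shipping is free on orders over $50.",
              "Doc C: Returns require the original receipt."]

    example = make_raft_example(
        question="How long does the warranty cover parts?",
        oracle_doc=corpus[0],
        cot_answer="The context states 'The warranty covers parts for 2 years', "
                   "so the answer is 2 years.",
        corpus=corpus,
    )
    print(example["prompt"])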

Benefits of Retrieval Augmented Fine-Tuning

RAFT builds on the success of RAG and traditional fine-tuning to make general-purpose LLMs such as Llama 2 better suited to more niche applications.

The Berkeley researchers equate RAFT to a “domain-specific open-book exam,” in which the model has already been fine-tuned on the domain and still has access to additional domain-specific resources to improve answer quality.

Additional documents used in RAFT models can include internal code repositories and enterprise documents. This approach allows for several benefits over both RAG and traditional fine-tuning, including:

  • Improved model accuracy and reduced hallucinations compared to both RAG and SFT
  • Increased efficiency and speed compared to both RAG and SFT
  • Reduced need for large amounts of fine-tuning data
  • Increased scalability, since new knowledge can be added by expanding the retriever’s data rather than retraining the entire model (see the sketch after this list)
  • Easier adaptation of models to new domains

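At inference time, a RAFT model is used just like a RAG model: retrieve, then generate with the fine-tuned weights. Because new knowledge enters through the retriever’s document index, scaling to fresh material is an index update rather than a retraining run. A minimal sketch, with hypothetical retrieve() and generate_finetuned() placeholders:

    doc_index: list[str] = ["Doc A: The warranty covers parts for 2 years."]

    def retrieve(question: str, k: int = 2) -> list[str]:
        # Placeholder: a real retriever would rank by embedding similarity.
        return doc_index[:k]

    def generate_finetuned(prompt: str) -> str:
        # Placeholder: a real system would call the RAFT fine-tuned model.
        return f"[RAFT model answer for a prompt of {len(prompt)} chars]"

    def answer(question: str) -> str:
        context = "\n".join(retrieve(question))
        return generate_finetuned(f"Context:\n{context}\n\nQuestion: {question}")

    # Scaling to new knowledge is an index update, not a retraining run:
    doc_index.append("Doc D: The 2025 policy extends warranty coverage to 3 years.")
    print(answer("How long is warranty coverage in 2025?"))
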
RAFT models have already demonstrated strong performance across several use cases and departments, including product or service recommendations, sales strategy development, FAQ automation, content ideation and brainstorming, market trend analysis, product feature development, and security awareness training.

Conclusion

RAG and supervised fine-tuning are traditionally the most common and effective ways of optimizing LLMs for specific business cases and applications. But these approaches each have serious drawbacks: RAG doesn’t allow for model training on specialized documents, and SFT doesn’t have access to additional resources when generating responses. 

Retrieval augmented fine-tuning, on the other hand, utilizes the strengths of both traditional approaches. It allows models to take advantage of the “open book” nature of RAG, along with the intense studying performed by fine-tuned models in advance – setting the stage for improved performance and accuracy.

CapeStart’s machine learning engineers, data scientists, and subject matter experts can help optimize LLMs for your domain or specific business case using RAG, SFT, or RAFT techniques. Contact us today to schedule a quick discovery call so you can start scaling your business’s innovation with AI today.
