7 Fine-Tuning Strategies for LLMs: Techniques, Tips and Best Practices
Essential techniques for enhancing the fine-tuning of LLMs. A deep dive into best practices and tips for driving optimization and performance.
If you are familiar with LLMs, you already know the concept of fine-tuning. Whether you've actually done it or just heard about it, you are already a step ahead of the game.
For those who are newbies in the LLM world and looking for a guide to developing an LLM application or RAG, you can refer to my previous post on Medium and navigate through the subsequent posts.
Now, for those of you who have already gotten hands-on and fine-tuned an LLM on specific data, I believe you've gone through plenty of guides on how to do it. This article is not another how-to post, though I do want to write a comprehensive one on fine-tuning efficiently, with performance comparisons and guidance on when to fine-tune versus when to use RAG or plain prompt engineering. However, that will be for future posts.
Back to our story: in my opinion, most fine-tuning work follows a fairly straightforward sequence of steps (a minimal code sketch follows the list below).
1. You have data
2. You select a framework such as LlamaIndex or LangChain (or any other framework)
3. Choose the base LLM, which could be GPT-3.5, Llama 2, or the two brand-new LLMs that are shocking the world: Mistral and Zephyr
4. Fine-tune the model on your data
5. Compare RAG performance with the newly fine-tuned LLM versus the base LLM
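To make the five steps concrete, here is a minimal sketch of the workflow, assuming an OpenAI-style fine-tuning API and a chat-formatted `train.jsonl` file you have already prepared; the file name and model choices are placeholders.

```python
# A minimal sketch of steps 1-5 above, assuming an OpenAI-style fine-tuning
# API and a chat-formatted train.jsonl you have already prepared (placeholder name).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Steps 1-2: you have data, prepared as one {"messages": [...]} record per line.
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

# Steps 3-4: pick a base model and launch the fine-tuning job.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print("Fine-tuning job started:", job.id)

# Step 5: once the job finishes, compare the fine-tuned model against the
# base model (with and without RAG) on the same evaluation prompts.
```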
What I’ve observed is that many tend to take guides at face value, diligently applying them to their datasets without much inquiry.
However, there’s a crucial element in step 4 that often goes overlooked: the art of fine-tuning data efficiently. To rephrase, what exactly is the optimal strategy for tailoring the fine-tuning process to your unique dataset?
Taking a naive approach to fine-tuning can be perilous. Without a well-thought-out strategy and careful consideration of the data, there's a tangible risk of degrading the model's performance or inducing drastic changes in its behaviour. Such alterations can compromise the reliability of the model, potentially leading to outputs that diverge from intended results or lack consistency with the original training, or, even worse, you waste your money for nothing.
In this short article, we’re set to explore the diverse methods at hand for refining LLMs, providing you with an extensive array of tools to guide your choices. Instead of pushing a generic approach or adhering to a one-size-fits-all method, our goal is to showcase the spectrum of options, each designed to cater to the unique intricacies of varied datasets.
Overview
If you need a refresher on how to fine-tune LLMs, then I highly recommend perusing LlamaIndex's official documentation. I frequently use LlamaIndex for many of my RAG applications, and their guide on finetuning is exceptionally thorough, catering to both beginners and advanced users.
Why do we need Finetuning?
Adjusting a model through finetuning involves refining its parameters over a specific dataset to enhance its overall performance. In other words, it’s all about getting better results, cutting down on weird outputs, remembering data better, and saving time and money.
At the heart of our tools is something called in-context learning mixed with retrieval augmentation. This typically means leveraging the models during the inference stage rather than actively training them.
Furthermore, while finetuning can indeed bake external data into a model to "enrich" it, it is complementary to retrieval augmentation: the two can be combined to achieve different kinds of improvements.
Embedding Finetuning Benefits
Finetuning the embedding model can allow for more meaningful embedding representations over a training distribution of data, which leads to better retrieval performance.
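As a rough illustration, here is a minimal sketch of fine-tuning an open-source embedding model on (query, relevant passage) pairs with the sentence-transformers library; the base model name and the training pairs are placeholders.

```python
# A minimal embedding fine-tuning sketch with sentence-transformers.
# The base model and the (query, passage) pairs below are placeholders.
from sentence_transformers import InputExample, SentenceTransformer, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("BAAI/bge-small-en-v1.5")  # illustrative base embedder

# Positive pairs drawn from your own corpus; in practice you would mine
# thousands of these (e.g. a question plus the chunk that answers it).
train_examples = [
    InputExample(texts=["What is the refund window?", "Refunds are accepted within 30 days of purchase."]),
    InputExample(texts=["Do you ship internationally?", "We ship to over 40 countries worldwide."]),
]

loader = DataLoader(train_examples, batch_size=2, shuffle=True)
loss = losses.MultipleNegativesRankingLoss(model)  # other in-batch passages act as negatives

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
model.save("finetuned-embedder")  # use this model for retrieval afterwards
```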
LLM Finetuning Benefits
Allow it to learn a style over a given dataset
Allow it to learn a DSL that might be less represented in the training data (e.g. SQL)
Allow it to correct hallucinations/errors that might be hard to fix through prompt engineering
Allow it to distill a better model (e.g. GPT-4) into a simpler/cheaper model (e.g. GPT-3.5, Llama 2)
That is why we cannot settle for "just fine-tuning" the data; we need to experiment and select the most effective strategy for fine-tuning LLMs. At the time of writing, here are a few common techniques that I know:
Full fine-tuning approaches
Parameter-efficient fine-tuning methods (PEFT)
Prompt engineering strategies
Multi-task learning
Adapter-based fine-tuning
Meta-Adapters: Parameter Efficient Few-shot Fine-tuning
Sandboxed tuning environments
Getting a handle on these techniques paves the way for adapting LLMs to a wide range of applications effectively. First up, let's dive into full fine-tuning.
Full Fine-Tuning
This is the most straightforward approach: you tweak every single model parameter to optimize performance for a particular domain. A minimal sketch appears at the end of this section.
Approaches
Train every layer from start to finish using fresh data
Gradually unfreeze the foundational layers as training progresses
Apply regularization to keep weights from deviating too far from the pretrained values
Advantages
Optimal results in the desired area
Recognizes intricate patterns unique to the domain
When to Use
When there’s a lot of domain-specific data on hand
When top-notch accuracy is essential
When there’s wiggle room in the computational budget
Full fine-tuning is suitable when customization is the top priority.
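Here is a minimal full fine-tuning sketch, assuming the Hugging Face Transformers and Datasets libraries; the base model ("gpt2") and the single training example are placeholders standing in for your chosen LLM and your domain dataset.

```python
# A minimal full fine-tuning sketch with Hugging Face Transformers: every
# weight stays trainable. "gpt2" and the single training text are placeholders;
# in practice you would use your chosen base LLM and thousands of domain examples.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # stand-in for a real base LLM such as Llama 2
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny placeholder dataset; a real run would use your full domain corpus.
train_ds = Dataset.from_dict({"text": ["Q: What is our refund window?\nA: 30 days."]})
train_ds = train_ds.map(lambda b: tokenizer(b["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,  # all parameters are updated: this is what "full" means
    args=TrainingArguments(
        output_dir="full-ft",
        num_train_epochs=1,
        per_device_train_batch_size=1,
        learning_rate=2e-5,
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Because every weight is updated, memory and compute costs scale with the full model size, which is why the "when to use" points above stress having wiggle room in the computational budget.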
Parameter-efficient fine-tuning (PEFT)
Unlike full fine-tuning, PEFT modifies only a limited number of model parameters when fine-tuning. This ensures the broad knowledge from the pre-trained model remains intact.
Approaches
Adjust only the adapter layers added to the core model
Refine the upper layers, keeping the foundational layers static
Pinpoint vital parameters using preliminary tasks prior to adjustments
Advantages
Maintains the extensive skills of the primary model
Needs less data to tweak effectively
Offers computational savings
When to Use
As a preliminary step before comprehensive fine-tuning
When domain-specific data is sparse
In settings with limited resources
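One widely used PEFT method is LoRA (low-rank adaptation). Below is a minimal sketch with Hugging Face's `peft` library; the base model and the target modules are placeholders chosen to match GPT-2's layer names.

```python
# A minimal LoRA sketch with the `peft` library: only the small adapter
# matrices are trained, while the base weights stay frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for your base LLM

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                 # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection layer
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
# `model` drops straight into the same Trainer setup used for full fine-tuning.
```

Only the small LoRA matrices receive gradients, so the broad knowledge of the frozen base model stays intact, matching the advantages listed above.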
Prompt Engineering for Fine-Tuning
Prompt engineering is the art of crafting specific prompts to steer the model during the fine-tuning process.
Approaches:
Include clear examples within prompts
Introduce new ideas via detailed walkthroughs
Favour descriptive language instead of plain labels
Refine prompts based on the model’s responses over iterations
Advantages:
Better at imparting new ideas
Offers a deeper understanding of the domain
Boosts clarity and uniformity in outputs
When to Use:
When there’s restricted flexibility in modifying training data or labels
To supplement datasets with illustrative examples
To elevate logical thinking in intricate domains
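To illustrate, here is a minimal sketch that converts raw (question, answer) pairs into chat-formatted fine-tuning records whose system prompt carries explicit instructions and a small worked example; the policy-assistant scenario, field names, and file name are made up for illustration.

```python
# A minimal sketch of prompt engineering for fine-tuning data: every training
# record gets a system prompt with explicit instructions and a worked example.
# The scenario, field names, and output file name are placeholders.
import json

SYSTEM_PROMPT = (
    "You are a support assistant for an e-commerce store. "
    "Answer in two sentences or fewer and cite the policy name you used. "
    "Example: Q: Can I return shoes after 30 days? "
    "A: No. Under the 'Standard Returns' policy, returns close at 30 days."
)

raw_pairs = [
    {
        "question": "Do you ship to Canada?",
        "answer": "Yes. Under the 'International Shipping' policy, delivery takes 7-10 days.",
    },
]

with open("prompt_engineered_train.jsonl", "w") as f:
    for pair in raw_pairs:
        record = {
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": pair["question"]},
                {"role": "assistant", "content": pair["answer"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```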
Multi-Task Learning
Adjusting a model on various connected tasks at the same time can enhance its broad applicability.
Approaches:
Train collectively using combined datasets
Switch between small data batches from different datasets
Merge gradients from distinct optimizers
Advantages:
Enhances efficiency across multiple tasks
The model develops a more versatile understanding
When to Use:
When all tasks rely on similar features or data
When aiming for robust adaptability across different applications
To benefit from the synergies of multiple datasets
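Here is a minimal sketch of the "switch between small data batches from different datasets" idea in plain PyTorch: two toy tasks are interleaved so each optimization step sees a batch from each. The datasets and the tiny shared model are placeholders.

```python
# A minimal multi-task sketch in plain PyTorch: batches from two task-specific
# loaders are interleaved so every optimization step sees both tasks.
# The toy datasets and the tiny shared model are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset

task_a = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
task_b = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
loader_a = DataLoader(task_a, batch_size=8, shuffle=True)
loader_b = DataLoader(task_b, batch_size=8, shuffle=True)

model = torch.nn.Linear(16, 2)             # shared model for both tasks
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

# Round-robin over the tasks: one small batch from each per step.
for (xa, ya), (xb, yb) in zip(loader_a, loader_b):
    for x, y in ((xa, ya), (xb, yb)):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
```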
Adapter-Based Fine-Tuning
Adapter tuning involves integrating extra adjustable modules into the core model structure to make it more specialized.
Approaches:
Incorporate compact adapters after specific layers
Focus on training the adapters while keeping the core model unchanged
Combine different reusable adapters for diverse tasks
Advantages:
Focuses modifications to prevent major shifts in the main model
Offers agile and adaptable expansion into fresh domains
When to Use:
When regularly branching out to different domains
To uphold robust foundational performance
In scenarios with tight computational limits
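To make the idea concrete, here is a minimal bottleneck-adapter sketch in plain PyTorch; the hidden size and the (commented) freezing of a base model are illustrative assumptions.

```python
# A minimal bottleneck-adapter sketch in plain PyTorch: a small trainable
# module is added after a frozen layer's output, so only the adapter weights
# change during fine-tuning. Dimensions are illustrative.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        # The residual connection keeps the frozen layer's output intact when
        # the adapter contributes little, limiting drift from the base model.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Freeze the base model and train only the adapters (base_model is assumed):
# for p in base_model.parameters():
#     p.requires_grad = False
adapter = BottleneckAdapter(hidden_size=768)
print(sum(p.numel() for p in adapter.parameters()), "trainable adapter parameters")
```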
Meta-Adapters: Parameter Efficient Few-shot Fine-tuning
Meta-learning focuses on prepping models to swiftly adjust using minimal examples.
Approaches:
Expose the model to a wide variety of tasks during its early training phase
Develop an optimization method tailored for rapid fine-tuning
Set the initial model weights in a way that favours speedy adjustments
Advantages:
Enables learning even with a tiny set of domain-specific data
Quick pivot to unfamiliar tasks
When to Use:
For crucial projects where data is a rarity
In user-centric apps that demand swift localization adjustments.
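As a rough illustration of the "set initial weights to favour speedy adjustments" idea, here is a highly simplified, Reptile-style meta-learning sketch in PyTorch; the synthetic regression tasks and the tiny linear model are placeholders.

```python
# A highly simplified, Reptile-style meta-learning sketch: the model is briefly
# adapted to each sampled task, then the shared initialization is nudged toward
# the adapted weights so future fine-tuning needs only a few examples.
import copy
import torch
import torch.nn as nn

def sample_task():
    # Placeholder: each "task" is a tiny random regression problem.
    w = torch.randn(4, 1)
    x = torch.randn(16, 4)
    return x, x @ w

model = nn.Linear(4, 1)
meta_lr, inner_lr = 0.1, 0.01

for step in range(100):
    x, y = sample_task()
    fast = copy.deepcopy(model)
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(5):                      # a few inner adaptation steps
        opt.zero_grad()
        nn.functional.mse_loss(fast(x), y).backward()
        opt.step()
    # Outer update: move the meta-initialization toward the adapted weights.
    with torch.no_grad():
        for p, fp in zip(model.parameters(), fast.parameters()):
            p += meta_lr * (fp - p)
```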
Sandboxed Fine-Tuning Environments
This is more of a risk-control approach. One of the main issues with fine-tuning is that the model learns whatever is in the training dataset. If a company fine-tunes on a bunch of dialogues and conversations between customers and the support team, there may be cases where a customer is toxic and speaks to the support agent inappropriately. Having a sandbox and testing separately is a way to keep those toxic conversations out of the training data (a minimal filtering sketch appears at the end of this section).
Approaches:
Keep the training hub distinct from the live production setting
Use virtualization to separate and protect against unfamiliar data and code
Adopt containerization to ensure a uniform and repeatable environment
Advantages:
Minimizes dangers linked to introducing fresh code and data
Shields the ongoing production processes
Guarantees stable and reliable tuning settings
When to Use:
When dealing with ultra-sensitive production information
In sectors with tight regulations, like healthcare
On platforms designed for developing models across multiple users or tenants
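As a small illustration of the data-hygiene side of sandboxing, here is a minimal sketch of the filtering step you might run inside the sandbox before anything touches production: toxic dialogues are dropped from the training set. The keyword list is a crude stand-in for a real toxicity classifier, and the sample dialogues are made up.

```python
# A minimal sketch of sandboxed data filtering: drop toxic dialogues before
# they ever reach a fine-tuning job. The keyword set is a placeholder for a
# real toxicity classifier, and the dialogues are illustrative.
import json

TOXIC_MARKERS = {"idiot", "stupid", "shut up"}  # placeholder heuristic

def is_clean(record: dict) -> bool:
    text = " ".join(m["content"].lower() for m in record["messages"])
    return not any(marker in text for marker in TOXIC_MARKERS)

# Placeholder raw dialogues; in practice these come from your support logs.
raw_dialogues = [
    {"messages": [{"role": "user", "content": "Where is my order?"},
                  {"role": "assistant", "content": "It ships tomorrow."}]},
    {"messages": [{"role": "user", "content": "You idiot, this is broken!"},
                  {"role": "assistant", "content": "I'm sorry to hear that."}]},
]

with open("sandboxed_train.jsonl", "w") as out:
    for record in filter(is_clean, raw_dialogues):
        out.write(json.dumps(record) + "\n")  # only clean dialogues reach training
```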
Evaluating Fine-Tuned LLMs
There are multiple ways to evaluate fine-tuned LLMs, but the main approach is to compare the results from the fine-tuned LLM against the base LLM and the base LLM + RAG (a minimal side-by-side sketch appears after the metrics below).
According to LlamaIndex, these are the metrics that are generally used to evaluate performance.
Quantitative and Qualitative Response Evaluation
Retrieval Evaluation
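For the response-evaluation side, here is a minimal sketch of side-by-side answer collection, assuming an OpenAI-style API: the same questions go to the base model and to the fine-tuned model so the outputs can be scored afterwards, manually or with an LLM-as-judge. The model IDs and the evaluation question are placeholders.

```python
# A minimal side-by-side evaluation sketch, assuming an OpenAI-style API.
# The fine-tuned model ID and the evaluation question are placeholders.
from openai import OpenAI

client = OpenAI()
eval_questions = ["What is your refund policy for damaged items?"]
models = {
    "base": "gpt-3.5-turbo",
    "fine_tuned": "ft:gpt-3.5-turbo:my-org:support:abc123",  # hypothetical ID
}

for question in eval_questions:
    for label, model_id in models.items():
        reply = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": question}],
        )
        print(label, "->", reply.choices[0].message.content)
```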
If you want to know more about how to evaluate your RAG application, you can refer to this post. Apart from fine-tuning techniques, consider these techniques to improve your RAG performance.
Summary
There is no perfect solution or one-size-fits-all approach when it comes to fine-tuning. You need to understand your data and may go through different experiments to see what is the best fit.
Fine-tuning is powerful. You can fine-tune GPT-3.5 on your data to get responses that, on your own domain, can surpass GPT-4. In addition, regular fine-tuning on your data combined with RAG is one of the best combos for getting quality answers from your LLM application.
Throughout this article, we’ve explored various strategies to fine-tune LLMs securely, all aimed at boosting performance in specific domains. While I only listed 7 techniques, I believe there are a lot more. If you feel there’s a critical approach I’ve overlooked, or if there’s something specific you’d love to see covered, please drop your thoughts in the comment section below. Your insights enrich our collective understanding!
In conclusion, by integrating these techniques, you can seamlessly tailor LLMs to meet the distinct requirements of your organization. Proper fine-tuning truly unleashes the immense capabilities of foundational models.
And don't ever forget to assess your fine-tuned LLMs. It is crucial. So always remember: "never skip the evaluation."
❤️ If you found this post helpful, I'd greatly appreciate your support by giving it a clap. It means a lot to me and demonstrates the value of my work. Additionally, you can subscribe to my Substack, as I will cover more in-depth LLM development on that channel.
Want to Connect?
If you need to reach out, don't hesitate to drop me a message via my Twitter or LinkedIn, and subscribe to my Substack, as I will cover more learning practices, especially the path of developing LLMs in depth, on that channel.
References
LlamaIndex: Finetuning
LlamaIndex: Evaluation