LORA'S ROLE IN ACCESSIBLE MODEL ADAPTATION
Knowledge has to be improved, challenged, and increased constantly, or it vanishes. -- Peter Drucker
Can we teach today’s AI models new information? Fine-tuning is a common technique that adapts pre-trained models to specific tasks through further training on a smaller, more specialized dataset. For example, you can create models specialized in creative writing, coding, or even mathematical proofs. Fine-tuning lets researchers and developers harness the power of large, complex models without starting from scratch, significantly reducing the time and computational resources required to develop task-specific solutions. However, traditional fine-tuning methods often require updating all or most of the model’s parameters: the adjustable values inside a neural network that it learns during training, like the dials and knobs the network tweaks to get better at its task. Adjusting all of these weights is computationally expensive, power-hungry, and can cause catastrophic forgetting, where the model loses much of what it previously knew outside the new training dataset.
Enter LoRA (Low-Rank Adaptation), an efficient method for adding information while keeping the original model’s parameters frozen. This approach reduces the required computation and helps preserve the general knowledge acquired during pre-training (a very good thing). Thus, it becomes easier to add new concepts or styles to diffusion models that create images or large language models that predict text. Can we add my dog’s face to an image model? Can an LLM output new Seinfeld scripts based on any topic? Let’s find out!
NOTE: We are experimenting with an AI-generated podcast, created with Google’s NotebookLM, that summarizes this post. Listen here and let us know what you think:
What is LoRA?
How do you teach a neural network new ideas or styles? LoRA was developed at Microsoft to efficiently adapt pre-trained neural networks to specific tasks or domains. It introduces small, trainable low-rank matrices into the layers of the network (via lots of awesome maths). These “difference” matrices can be trained to modify the behavior of the original model without altering its base weights – like an image filter that boosts saturation or turns a photo black and white. The key insight of LoRA is that you can get results fairly close to a full fine-tune by training just a small fraction of the parameters and leaving the rest alone.
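The low-rank trick is easy to see in a few lines of code. Below is a minimal sketch (the layer sizes are toy assumptions, not from any real model) of how a LoRA update sits beside a frozen weight matrix: the update is the product of two thin matrices, so the trainable parameter count is a tiny fraction of the layer’s.

```python
import numpy as np

# Toy sizes (assumptions for illustration): one square layer of a model.
d, r, alpha = 1024, 8, 16      # model dim, LoRA rank (r << d), scaling factor

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen pre-trained weight
A = 0.01 * rng.standard_normal((r, d))   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero init

# Effective weight at inference: base plus the scaled low-rank update.
# With B initialized to zero, the adapted model starts out identical to the base.
W_eff = W + (alpha / r) * (B @ A)

full_params = W.size                     # what full fine-tuning would update
lora_params = A.size + B.size            # what LoRA actually trains
print(full_params, lora_params)          # → 1048576 16384
```

For this one layer, LoRA trains about 1.5% of the parameters a full fine-tune would touch, and the gap widens as layers get larger.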
LoRA therefore has several advantages over full fine-tuning; the biggest for me is that it needs much less powerful hardware to train in new concepts or styles. Because LoRA training is efficient enough for consumer-grade hardware, you don’t need to rent a GPU cluster – you can do it in your own AI Lab. This accessibility is why LoRA has taken off in generative image creation as a way for anyone to adapt a model to their specific needs, like giving my dog a chance to surf. As we mentioned earlier, LoRA also minimizes the risk of catastrophic forgetting, a common issue in full fine-tuning where the model is trained so hard on one particular topic that it forgets everything else it used to know (a bad thing). By keeping the base model’s weights frozen, LoRA preserves its performance across all kinds of tasks. And because LoRA adapters are small and modular, they can be easily swapped or combined, allowing for flexible model customization.
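Since the adapters live outside the base weights, swapping or combining them is just arithmetic. A small sketch, with randomly generated “style” adapters standing in for trained ones (the names and sizes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, alpha = 64, 4, 8                  # toy sizes for illustration

W = rng.standard_normal((d, d))         # frozen base weight, never modified

def random_adapter():
    # Stand-in for a trained adapter; in practice A and B come from training.
    A = rng.standard_normal((r, d))
    B = rng.standard_normal((d, r))
    return (alpha / r) * (B @ A)

style_a = random_adapter()              # pretend: a "watercolor" adapter
style_b = random_adapter()              # pretend: a "my dog" adapter

w_with_a = W + style_a                  # swap in one adapter...
w_with_b = W + style_b                  # ...or the other, base untouched
w_combined = W + 0.7 * style_a + 0.7 * style_b  # or blend both at reduced strength
```

Because the base weight is never overwritten, switching from one adapter to another is instant, and blending weights lets you dial each concept’s strength up or down.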
LoRA in Diffusion Models
LoRA has become widely used with diffusion models to add new concepts for image generation – like my dog. Diffusion models such as DALL-E and Midjourney generate images by gradually denoising random noise, guided by your text prompt. These models have been trained on images paired with text descriptions, with the goal of producing a model that “understands” the images and can render whatever you ask for. To do this well, the model needs some conception of the things in its dataset: cat, dog, painting, photo, purple, orange, and so on. But can the model produce an image of something not in its dataset? Not likely. One of my dogs isn’t in the training data for models like Stable Diffusion or FLUX, so they can’t make accurate reproductions of her. They can get close, but it’s not her. With LoRA, it will be her.
Here’s how I generated the images with my puppy. 1) I started with a bunch of pictures of her from various angles so the model could get a better 3D sense of her face (good thing I like to take pictures of her). 2) Using an open-source model as the base, I trained on her images. One consideration: if I fully fine-tuned the model on a dataset that only contains pictures of her, it might forget how to make other important elements like astronauts and light sabers, so I couldn’t combine my dog with astronauts or light sabers in latent space. With LoRA, however, I can train her as a new concept that gets merged into the main model, adding this one thing while leaving the rest alone.
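To make the “train the new concept, leave the rest alone” idea concrete, here is a toy training loop (synthetic data and toy dimensions, not a real diffusion model or my actual photos) that fits only the two adapter factors with plain gradient descent while the base weights stay frozen:

```python
import numpy as np

rng = np.random.default_rng(3)
d, r, n = 8, 2, 64                         # toy dimensions for illustration
scale = 4 / r                              # alpha / rank scaling

W = rng.standard_normal((d, d))            # frozen base weight, never updated
A = rng.standard_normal((d, r))            # trainable (full-scale init so the toy learns fast)
B = np.zeros((r, d))                       # trainable, zero init

# Synthetic "new concept": inputs paired with the outputs we want after adapting.
X = rng.standard_normal((n, d))
Y = X @ (W + 0.1 * rng.standard_normal((d, d)))

lr, losses = 0.01, []
for _ in range(500):
    E = X @ (W + scale * (A @ B)) - Y      # prediction error with the adapter
    losses.append(float(np.mean(E ** 2)))
    gM = scale * (2.0 / E.size) * (X.T @ E)  # gradient w.r.t. the low-rank update
    gA, gB = gM @ B.T, A.T @ gM
    A -= lr * gA                           # only the adapter factors move;
    B -= lr * gB                           # W itself stays frozen throughout

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The loss falls as the adapter learns the new mapping, yet `W` is bit-for-bit untouched at the end – which is exactly why the base model’s astronauts and light sabers survive.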
Even if you don’t have an adorable puppy that you want to apply to images, you can do other cool stuff too.
- You can add yourself to a model to create any image with your likeness.
- Your company’s logo can be added to any scene or image.
- You can train in new style concepts. For example, if you’re an artist, you can train a LoRA on your “style”, which can provide inspiration for what your style might look like applied to any image.
- Because LoRAs are modular, you can combine them, adding several concepts and styles at once. It’s also easy to share them with others.
LoRA in Large Language Models (LLMs)
You can add styles or concepts to large language models (LLMs) following a similar path. Traditional fine-tuning of LLMs often requires updating billions of parameters, which is computationally intensive and can lead to overfitting on small datasets. LoRA comes to the rescue again with its efficiency – only a small number of parameters are trained. You can therefore quickly and easily adapt a model to a particular task or dataset, such as coding open-ended survey responses, or training on important documents for more accurate recall. We can adapt an LLM to understand and generate text in specialized fields like law, medicine, or other technical domains without compromising its broad language capabilities. That flexible adaptability is key in a quickly changing field like AI.
Not surprisingly, training a LoRA for an LLM is a similar process to diffusion models – it all starts with the dataset. In our case, we have the scripts for every episode of one of my favorite TV shows, Seinfeld, which we can use both to impart detailed knowledge about the Seinfeld universe and to capture the styles and personalities of the characters. Using a foundational, pre-trained model like Meta’s Llama 3.1 8B as a base (which likely has some information about Seinfeld, but doesn’t have all of the scripts perfectly memorized), we can teach it to predict the Seinfeld scripts with high accuracy by having it study them over many repeated passes, slowly getting better and better at predicting the witty dialogue. Then we save the changes from this training to a small LoRA file that we can apply to the base model to get our Seinfeld predictions. You get the best of both worlds – a model that can faithfully replicate dialogue given a real starting point, and one that retains all its general knowledge so it can make up new, original dialogue on any topic it already knew about. Being able to train on consumer-grade hardware means these model updates can be made by many researchers and enthusiasts for any number of purposes.
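The “small LoRA file” step can be sketched like this, using numpy arrays as stand-ins for real model weights (the adapter values, file name, and sizes are all hypothetical): only the two factor matrices go on disk, which is why an adapter file is tiny next to a full checkpoint.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(4)
d, r, alpha = 256, 8, 16

W = rng.standard_normal((d, d)).astype(np.float32)   # frozen base weight
A = rng.standard_normal((r, d)).astype(np.float32)   # pretend these came out
B = rng.standard_normal((d, r)).astype(np.float32)   # of a training run

# Save only the adapter factors and its metadata, never the base weights.
path = os.path.join(tempfile.mkdtemp(), "seinfeld_lora.npz")  # hypothetical name
np.savez(path, A=A, B=B, alpha=alpha, rank=r)

# Later (or on another machine with the same base model): load and apply.
lora = np.load(path)
W_adapted = W + (float(lora["alpha"]) / int(lora["rank"])) * (lora["B"] @ lora["A"])

print(W.nbytes, A.nbytes + B.nbytes)  # → 262144 16384
```

For this single toy layer the adapter is 16x smaller than the weight it modifies; across a multi-billion-parameter model the same ratio is what turns a multi-gigabyte checkpoint into a LoRA file of a few dozen megabytes.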
Limitations and Challenges
While LoRA has the advantages of requiring less compute and changing fewer parameters, it has its limitations. Compared to a full fine-tune, a LoRA can be limited in how much of the training data it captures and how much new information it can add. If your training dataset requires a significant modification of the base model’s capabilities, you may see reduced performance: an image model trained only to output photorealistic images will struggle to produce illustrations or watercolor paintings. Also, a LoRA is tied to the base model it was trained against – whether that is FLUX.1 Dev or Meta Llama 3.1 70B, it can only be used reliably with that same model. You would need to train against Stable Diffusion XL or Google’s Gemma 2 9B separately to see the benefits there.
Future Directions
With LoRA’s rising popularity among researchers and hobbyists, there will likely be further improvements and optimizations to the process. More sophisticated rank selection and adaptation techniques could capture more complex changes while maintaining efficiency. Similarly, LoRA can be combined with other parameter-efficient fine-tuning (PEFT) methods, such as prompt or prefix tuning, in hybrid approaches that draw on the strengths of multiple strategies.
There will also be new uses for LoRA training, such as a vision LoRA that quickly adapts large vision transformers for specific recognition tasks, like instantly recognizing your friends and family in your doorbell cam’s video feed. In speech recognition and synthesis, LoRA might enable rapid adaptation of models to new accents, speaking styles, or even celebrity voices. The robotics field could benefit from LoRAs for quickly adapting control policies to new environments without losing general capabilities. In reinforcement learning, LoRA could make transfer learning more efficient between related tasks, something that could bolster reasoning abilities. As AI continues to permeate industries from healthcare to the creative arts, LoRA’s efficiency and adaptability are likely to play a role in tailoring powerful general-purpose models to specific, specialized needs.
Conclusion
As I’ve been experimenting with LoRA training, it’s become clear how efficient the process is for adding new concepts and styles to pre-trained diffusion models and LLMs. We can train these models with minimal computational overhead while preserving their general capabilities, allowing us to customize them for nearly any specific task. LoRA’s advantages of reduced memory and faster training, all while using less power, have opened new possibilities in AI development and deployment. But perhaps its biggest contribution is in democratizing AI development. Using less computing power means researchers can work iteratively with state-of-the-art AI models, offering flexibility, accessibility, and innovation. Smaller teams and individual researchers like me can explore applications and adaptations of large models that were previously out of reach. As AI continues to grow in usage, LoRA’s efficiency and flexibility could be a key component in the ongoing research and applications of AI. Being small and efficient can be quite effective.