“The inner workings of modern AIs are a mystery. This is because AIs are language models that are grown, not designed.” -- Neuronpedia on Gemma Scope
As I was checking out a demo of Google DeepMind’s attempt to understand AI better through interpretability, I was struck by how simple and yet profound the intro statement is: “AIs are language models that are grown, not designed.” I had never thought of it that way, but it absolutely makes sense. It captures the shift in how we understand and approach the development of today’s advanced AI systems compared to previous generations of programming. It contrasts the early days, when programs were constructed (i.e., “designed”) line by line, with the central role training data plays in shaping today’s large language models (LLMs). If we frame AI models as being “grown” vs. “designed”, we can acknowledge the complex, iterative, and unpredictable nature of their development and usage. Training data, learning algorithms, and additional training such as reinforcement learning from human feedback (RLHF) shape AI capabilities, much like how genetics, experiences, and environment influence the growth of living organisms, including us. This growth environment creates the potential for AI systems to develop new capabilities that their creators never explicitly programmed or anticipated, but also to make mistakes and hallucinate things that aren’t true. Ultimately, it all comes down to starting with flexibility in mind.
Traditional Programming vs. AI Models
Traditional rule-based programming has been the basis of software development since its beginning. Rule-based programming relies on explicit, predefined rules to guide the system (i.e., do this, then do that). Programmers meticulously craft sets of if-then statements, decision trees, and functions that dictate how the software should respond to different inputs. These systems are deterministic, meaning that given the same input and rules, they will always produce the same output. Rule-based programming works well when the logic is well defined and the problem space is relatively constrained.
Rule-based systems are everywhere, from medical diagnosis, where a series of logical steps based on symptoms and test results can lead to a diagnosis, to tax prep software that uses rules to navigate complex tax codes and determine your deductions (e.g., Are you married? Filing jointly or separately? Different answers lead you down different logical routes to your final tax form). These systems are highly effective when the rules are clear, static, and can be comprehensively defined by humans. However, rule-based approaches are limited when confronted with complex, dynamic, or ambiguous domains. As the number of variables and potential scenarios increases, manually defining every possible rule becomes impractical or even impossible, and real-world problems can rarely be reduced neatly to a set of rules. Once in place, rule-based systems also struggle with adaptability: they can’t adjust to new situations or learn from experience unless someone goes in and updates or adds rules. In domains like natural language processing, computer vision, or strategic decision-making, the complexity is astounding, and the variability of the problem space makes it impossible to create a truly comprehensive set of rules that handles every situation effectively. Flexibility and adaptability are missing, and that is exactly where neural networks can shine.
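The “different logical routes” in the tax-prep example can be sketched as a few hand-written branches. This is a minimal illustration only: the function name and route labels are hypothetical placeholders, not real tax rules or any actual tax software’s logic.

```python
def filing_route(married: bool, filing_jointly: bool = False) -> str:
    """Pick a (hypothetical) path through a tax form using explicit if-then rules."""
    # Every branch below was decided by a programmer ahead of time.
    if not married:
        return "single-filer path"
    if filing_jointly:
        return "joint-return path"
    return "separate-return path"


# Deterministic: the same answers always lead down the same route.
print(filing_route(married=True, filing_jointly=True))
print(filing_route(married=True))
```

Nothing here is learned. If a case came along that the rules don’t cover (say, a filing status the function doesn’t know about), someone would have to edit the function and add another branch by hand.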
The Growing Nature of AI Models
The notion of “growing” AI models is rooted in (pun intended) machine learning and neural networks, which are designed to learn patterns and make decisions based on data rather than by following explicitly programmed instructions. Loosely inspired by the human brain, neural networks consist of interconnected nodes (akin to our brain’s neurons) that process and transmit information. During training, these networks are exposed to relevant data that causes them to adjust their internal connections, learning over time to recognize patterns and make more accurate predictions. This ability to learn and adapt is what we mean by AI models being “grown”. We don’t explicitly tell the neural network how to tell cats from dogs; instead, we give it lots of training pictures of dogs labeled “dog” and cats labeled “cat”. With enough complexity and training, the network can learn to tell them apart effectively.
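Here is a minimal sketch of learning from labeled examples instead of writing rules. It assumes two made-up numeric features per animal (a hypothetical “body size” and “ear pointiness”, both scaled to 0..1) rather than real images, and it uses a single-neuron perceptron, which is far simpler than the networks behind image classifiers or LLMs. The weights start out random and get nudged every time the model gets a training example wrong; no cat-vs-dog rule is ever written down.

```python
import random


def train_perceptron(examples, epochs=1000, lr=0.1, seed=0):
    """Learn a cat/dog boundary from labeled examples instead of hand-written rules."""
    rng = random.Random(seed)
    # Start from random parameters (the untrained, "newborn" state).
    w = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    b = rng.uniform(-1, 1)
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), label in examples:
            target = 1 if label == "dog" else -1
            guess = 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
            if guess != target:
                # Nudge the connection weights toward the correct answer.
                w[0] += lr * target * x1
                w[1] += lr * target * x2
                b += lr * target
                mistakes += 1
        if mistakes == 0:  # every training example is now classified correctly
            break
    return w, b


def predict(w, b, features):
    x1, x2 = features
    return "dog" if w[0] * x1 + w[1] * x2 + b > 0 else "cat"


# Hypothetical features: (body size, ear pointiness), both scaled to 0..1.
training_data = [
    ((0.2, 0.9), "cat"), ((0.3, 0.8), "cat"), ((0.25, 0.95), "cat"),
    ((0.8, 0.3), "dog"), ((0.9, 0.2), "dog"), ((0.6, 0.4), "dog"),
]

w, b = train_perceptron(training_data)
print(predict(w, b, (0.85, 0.25)))  # an example the model never saw during training
```

A single perceptron can only draw a straight line between classes. Real image classifiers stack many layers and use gradient-based training rather than this perceptron update rule, but the core idea of adjusting weights from labeled data carries over.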
This training process resembles our own growth (though on a different scale), both in its iterative nature and its dependence on data. After hearing people talk around them, children start to pick up words, then phrases and sentences, and eventually learn to read blog posts. AIs need a similar stream of data to develop their capabilities. The model starts in a neutral state with randomized parameters, kind of like a newborn ready to learn whatever language is spoken around them. It has the building blocks necessary to learn, and as it’s fed data, it begins to form connections and recognize patterns, gradually improving its performance on the given task. Just as learning different languages requires different vocabularies and syntactic rules, different datasets can give models different capabilities. Give a model lots of math proofs and it might learn some formal math and logic rules; give it the writings of Carl Sagan and it could learn to write beautifully. You can “grow” all sorts of different models with different training datasets.
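To illustrate “same starting structure, different data, different model”, here is a toy bigram counter. It is a drastically simplified stand-in for a language model that only records which word follows which, and the two one-line corpora are invented examples, not real training sets.

```python
from collections import Counter, defaultdict


def grow_bigram_model(text):
    """Count which word follows which; the code is fixed, the data decides what is learned."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def most_likely_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Two tiny, made-up "training sets".
math_text = "the proof follows from the lemma and the proof is complete"
astronomy_text = "the cosmos is vast and the cosmos is old"

math_model = grow_bigram_model(math_text)
astronomy_model = grow_bigram_model(astronomy_text)

print(most_likely_next(math_model, "the"))       # proof
print(most_likely_next(astronomy_model, "the"))  # cosmos
```

The two models share identical code, yet they “grew” different habits purely from what they were fed. Real LLMs learn far richer statistics with neural networks rather than raw counts, but the same dependence on training data applies.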
Parallels with Human Development
The growth of AI models is akin to human development, particularly in the interaction between innate structures and environmental influences: the good old “nature vs. nurture” debate. In human development, our genetic makeup provides an initial framework for our bodies and brains, analogous to the architecture and initial parameters of an AI model before training. However, our experiences and environment also play a necessary role in shaping our knowledge, skills, and even personality. While the basic structure of an AI model is determined by its architecture (e.g., how many parameters, how many layers, etc.), its ultimate capabilities and behaviors are influenced by the data it is trained on and the feedback it receives. This dynamic interplay between inherent structure and learned information underlies the growth of both humans and AI models.
If you look at the growth of neural networks, you can see a process similar to the stages of development that our old friend Jean Piaget described in children. Human infants progress from basic sensory-motor interactions (like sucking on a thumb or shaking a rattle) to more abstract thinking (like language and logic), and AI models often start with simple pattern recognition before advancing to more complex reasoning tasks. The early stages of training a vision model might focus on basic feature extraction, similar to how infants start by learning basic shapes and sounds. As training progresses, the model can handle more sophisticated tasks, mirroring the development of logical reasoning and abstract thought in humans. There are many differences between how humans and AI learn (how quickly, how much data is needed, the actual mechanisms of learning, etc.), but the point still stands: unlike rule-based programs, both develop from initial conditions.
Implications of “Growing” AI
Humans and AI both grow, and both are unpredictable in how they grow. Ten years ago, I couldn’t have predicted who my son would be now at 17. Unlike traditional programming, where every function is defined, AI models can flexibly learn from their training data. Or not. They might fail to learn anything useful from the training data, or they could develop emergent capabilities that surprise even their creators. This unpredictability is both exciting and concerning. It opens up the possibility for AI to solve problems in novel ways, make unexpected discoveries, and potentially figure out how to reason. But it also raises questions about control and reliability, which are essential in critical applications like medical diagnosis or court cases. As AI models become more complex and are applied to more domains, understanding and managing emergent behaviors while ensuring the models learn the right things becomes crucial. Finding the sweet spot is a delicate balancing act.
For example, a large language model (LLM) like ChatGPT that is trained on tons of human text may learn how to write poetry or code in Python, but it might also learn the stereotypes and biases that underlie our text. Because these complex models are black boxes that offer little transparency into their decision-making, we will continue to struggle to understand what biases might exist under the hood. The ease of generating misinformation, whether on purpose or not, is also a big concern, since people are easily persuaded by today’s AI. That is why AI safety is an important topic for AI researchers: we need to make sure AI systems behave, just as we have laws to keep our kids safe. Both can be very unpredictable. You just hope both make it out of their teenage years able to make sound decisions in the wild.
As mentioned earlier, growing also means there is enormous potential for the future as our understanding of how to “nurture” AI improves. The strong influence of training data on model performance suggests that Microsoft’s Phi project is on the right path in using high-quality, curated datasets to “teach” models more effectively. More diverse datasets may help models transfer learning across subject areas, similar to how humans apply knowledge from one area to another. AI could also assist in its own development, whether through playing games or other reinforcement learning techniques, much like how humans learn through play. These changes could rapidly accelerate AI capabilities, just as children can learn so much in so little time when given an enriching environment. But as these systems become more sophisticated, we need to make sure they continue to grow in the right direction (i.e., aligned with our values) so they can be as beneficial as possible.
Conclusion
Shifting from viewing AI as “designed” to seeing it as “grown” helps us understand its potential capabilities and often-unpredictable nature. We can see how capabilities may emerge from training on the right data, which could lead to novel problem-solving. But it also emphasizes that we don’t necessarily know what fruit will develop from the seeds: that depends as much on the growth environment as on the type of seed. Growth also means adaptability and the potential for greater sophistication, which could lead to future breakthroughs. However, we need to make sure we nurture these models in the right direction by providing for and teaching them well. The journey doesn’t end when the AI model is designed; it’s only just starting.