Greg Robison

LLMs & Math: Problem Solved by Coding

“There is no domain of mathematics on which you can fully trust ChatGPT’s output (this is also true beyond mathematics, due to the stochastic nature of language models). This is a general limitation of language models, that there are no rigorous approaches to guarantee correctness of output, of which ChatGPT also suffers.” -- Simon Frieder

Large Language Models (LLMs) are a groundbreaking development in artificial intelligence that has changed the way we interact with and use AI. These models, like OpenAI's GPT series or Anthropic’s Claude, are trained on huge amounts of text data, allowing them to interpret and generate human-like language fluently. LLMs work by learning patterns and relationships within the training data, enabling them to predict the most likely next word or sequence of words in a given context. This capability allows LLMs to perform a wide range of language-related tasks, from answering questions and writing essays to writing competent code.

Despite their impressive linguistic abilities, LLMs have trouble with math. This is a significant hurdle because many real-world applications, such as scientific research, market analysis, financial analysis, and machine learning, rely heavily on mathematical computations. The inability of LLMs to tackle complex mathematical problems directly can greatly hinder their usefulness.

As a user, you need to understand the mathematical limitations of LLMs and the potential solutions (good news, there are some!). Many of us who work with data can benefit greatly from the power of LLMs in our work. However, we may be hesitant to use them because of legitimate concerns about the models' mathematical capabilities. By exploring how LLMs can use Python programming to overcome their mathematical deficiencies, we will help you get comfortable with using LLMs to do complex math.

gpt screenshot
No, Claude Opus, you don’t have access to a calculator, and you’re wrong: the right answer is 8747.7704…

While LLMs may not be good at math, they are competent coders and can tackle mathematical problems indirectly by writing code that does the calculation for them. This capability opens up new possibilities for professionals to incorporate LLMs into their workflows and analyze data with the power of real math and statistical analyses via programming.

Large Language Models & Math

Imagine trying to learn math, even basic addition and subtraction, only from hearing people talk about it in everyday life. Examples are going to be few and highly variable, so it’s going to be hard to learn the rules. Maybe if you can read some books you can pick them up, but without formal math education it’s still going to be tough. That’s how LLMs learn math: by noticing that when “1 + 1 =” occurs, it is highly likely that “2” occurs next. The answer comes from statistical likelihood, with no real reasoning behind it.
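To make the "statistical likelihood" idea concrete, here is a deliberately crude sketch. It is not how an LLM works internally (real models are neural networks that generalize far beyond a lookup table), and the tiny corpus is made up for illustration. It only shows what answering by "most frequent continuation" looks like, and why an unseen sum has no answer to retrieve.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training text": sums as they might appear in everyday writing.
corpus = [
    "1 + 1 = 2",
    "1 + 1 = 2",
    "1 + 1 = 2",
    "1 + 1 = 3",  # a joke or a typo somewhere in the text
    "2 + 2 = 4",
]

# Count which token follows each context (everything before the last token).
next_token_counts = defaultdict(Counter)
for line in corpus:
    tokens = line.split()
    context, answer = " ".join(tokens[:-1]), tokens[-1]
    next_token_counts[context][answer] += 1

def predict(context):
    """Return the most frequent continuation seen for this context, or None."""
    counts = next_token_counts.get(context)
    return counts.most_common(1)[0][0] if counts else None

print(predict("1 + 1 ="))    # the most common continuation in the toy corpus
print(predict("17 + 25 ="))  # never seen, so there is nothing to look up
```

The first call returns "2" only because "2" followed that context most often, not because anything was added.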

At their core, LLMs are designed and trained primarily for natural language processing tasks, which is why they often struggle with mathematical computations. The transformer architecture behind models like GPT is optimized for capturing and generating sequential patterns in text data. LLMs excel at tasks like language translation, text completion, and sentiment analysis because these tasks rely heavily on understanding the context and relationships between words and phrases. However, mathematical tasks often require a different set of skills, such as symbolic manipulation, numerical precision, and logical reasoning, which are not inherently captured in the language-centric architecture of LLMs.

The training data used for LLMs mainly consists of text from various sources, such as books, articles, and websites, which may not contain enough high-quality mathematical content. While these texts may include some mathematical concepts and equations, they often lack the depth, rigor, and diversity needed to train LLMs to perform complex mathematical tasks accurately. The lack of structured mathematical data in the training corpus limits the ability of LLMs to develop a robust understanding of mathematical principles and procedures.

Mathematical tasks often require a high degree of precision and exactness, which can be challenging for LLMs that are designed to generate plausible and coherent text based on patterns and probabilities. LLMs may struggle with tasks that require exact calculations, symbolic manipulations, or adherence to strict mathematical rules and axioms because there is randomness built into the system. The probabilistic nature of LLMs can lead to approximations or errors in mathematical outputs, which can be problematic in fields where accuracy is necessary (like ours).
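For contrast, a short sketch of what exactness looks like once the calculation is handed to code. The numbers are arbitrary examples chosen for illustration; the modules used (`fractions`, `decimal`) are part of Python's standard library.

```python
from decimal import Decimal
from fractions import Fraction

# Python integers have arbitrary precision, so large products are exact.
product = 123456789 * 987654321
print(product)

# Fractions keep rational arithmetic exact instead of rounding.
total = Fraction(1, 3) + Fraction(1, 6)
print(total)  # 1/2

# Decimal avoids the binary floating-point surprises of plain floats.
print(Decimal("0.1") + Decimal("0.2"))  # 0.3
print(0.1 + 0.2 == 0.3)                 # False
```

Run twice, this script gives the same digits both times, which is exactly the property a probabilistic text generator cannot promise.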

Because the architecture and training data of LLMs are primarily geared towards natural language processing, their ability to perform mathematical tasks with the same level of proficiency is limited. However, by leveraging their strengths in language understanding and generation, LLMs can still contribute to mathematical problem-solving by generating Python code that harnesses the power of specialized libraries and tools to handle the necessary computations. Real math abilities are no longer needed; the math can be coded.

Python Programming as a Solution

Python stands out in the programming world for its simplicity and efficiency, particularly for handling mathematical operations. Known for its readability and straightforward syntax, Python is the go-to language for developers, data scientists, and researchers who seek to solve complex problems with minimal code. Its extensive libraries, such as the built-in math module and the third-party NumPy and SciPy packages, provide powerful tools for mathematical computations, making Python an ideal language for tasks that require various forms of mathematical processing. This accessibility and computational power make Python a natural tool for LLMs when they face mathematical challenges that are beyond their direct processing capabilities.

gpt screenshot
Here's GPT-4 using Python to correctly solve mathematical questions
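As a rough idea of the kind of script an LLM might write for everyday calculations, here is a minimal sketch using only Python's standard library (`math` and `statistics`). The specific questions and the sample data are made up for illustration; they are not the ones from the screenshot.

```python
import math
import statistics

# Questions an LLM would likely get wrong "in its head" but can simply compute.
print(math.sqrt(2))
print(math.factorial(20))  # 2432902008176640000
print(math.comb(52, 5))    # number of 5-card poker hands: 2598960

# Basic descriptive statistics on a small, made-up sample.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.mean(scores))    # 5
print(statistics.pstdev(scores))  # population standard deviation: 2.0
```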

When it comes to performing complex mathematical computations, Python's ecosystem of libraries does the heavy lifting. NumPy offers support for large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions. It provides a solid foundation for efficient numerical operations, making it an essential tool for tasks like vector and matrix operations, Fourier transforms, and random number generation. SciPy builds upon NumPy, providing even more advanced functions for optimization, linear algebra, integration, and signal and image processing. With SciPy, we can perform tasks like interpolation, optimization, and clustering, and work with statistical distributions. Equipped with these packages, LLMs can write Python scripts that efficiently handle complex mathematical computations. Pairing these libraries with LLMs lets them work around their mathematical limitations.
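A small sketch of the matrix side of this, assuming NumPy is installed. The system of equations is a made-up example; `np.linalg.solve` and the `@` matrix-product operator are standard NumPy features.

```python
import numpy as np

# Solve the linear system
#   2x +  y = 7
#    x + 3y = 11
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([7.0, 11.0])

solution = np.linalg.solve(A, b)
print(solution)  # x and y, up to floating-point rounding

# Verify the answer by plugging it back in: A @ solution should reproduce b.
print(np.allclose(A @ solution, b))
```

The final check is worth keeping in generated scripts: it confirms the result against the original problem rather than trusting the solver call blindly.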

chat gpt screenshot
Claude Opus gave me 72 five times in a row, how random is that?

Consider the task of generating a random number, a process that seems simple on the surface but is surprisingly hard for an LLM: because it favors the most likely continuation, it tends to return the same few numbers over and over (a lack of diversity in its responses). By leveraging Python, an LLM can write a script to perform this task easily and reliably. For example, the LLM can generate Python code that imports the 'random' library and uses its 'randint' function to produce a random number. This little demo shows not only the LLM's programming capability but also how it can cleverly circumvent its own limitations in mathematical abilities. Through this interaction between Python programming and the LLM's language processing, even tasks that require an element of randomness or precision become achievable, revealing a practical way for LLMs to extend their functionality through programming.

chat gpt screenshot
GPT-4 writes and executes Python code to generate a much more random number
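The script behind a demo like this can be as short as the following sketch. The range of 1 to 100 is an assumption made for illustration; `random.randint` includes both endpoints.

```python
import random

# Draw one integer between 1 and 100, inclusive on both ends.
number = random.randint(1, 100)
print(number)
```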

chat gpt screenshot
GPT-4 can generate thousands of random numbers to show real randomness

chat gpt screenshot
The distribution of 1000 random numbers generated via Python
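A distribution like this one can be tallied with a few lines of standard-library Python. This is a sketch under assumptions: the 1 to 100 range, the bins of ten, and the fixed seed (used only so the run is repeatable) are choices made for illustration, and the text histogram stands in for the chart.

```python
import random
from collections import Counter

random.seed(42)  # assumption: fixed seed so the same draws come out each run

# Draw 1000 integers between 1 and 100 (inclusive).
draws = [random.randint(1, 100) for _ in range(1000)]

# Group the draws into ten bins: 1-10, 11-20, ..., 91-100.
bins = Counter((n - 1) // 10 for n in draws)

# Print a rough text histogram, one '#' per five draws in the bin.
for index in range(10):
    low, high = index * 10 + 1, index * 10 + 10
    count = bins[index]
    print(f"{low:3d}-{high:3d} {'#' * (count // 5)} {count}")
```

With truly random draws, each bin should hold roughly 100 values, with some natural variation from bin to bin.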

Tackling More Complex Mathematical Tasks with Python

Logistic regression is a more sophisticated mathematical task, far beyond simple arithmetic or random number generation. It's a type of regression analysis used in statistical modeling to predict a categorical outcome, most often a yes/no one, from one or more predictor variables. The goal is to find the best-fitting model to describe the relationship between the dependent variable and the independent variables. For example, you might use logistic regression to find out which survey scores best predict whether a product succeeds in the market or which student factors most impact academic success. This method is particularly important in fields like medical research, economics, and machine learning, where it helps in making decisions or predictions based on observed data.

This type of advanced statistical analysis is exactly what LLMs are not good at, unless they can code it. If they can code the analysis in Python using packages like statsmodels or scikit-learn, it’s not particularly complicated. The script starts by importing the libraries needed for the logistic regression, then works through the regression itself. The sequence of steps, generated by the LLM, demonstrates its ability to construct and execute a data analysis pipeline, turning a complex statistical task into a manageable and executable Python script.

chat gpt screenshot
By writing Python code, GPT-4 can perform logistic regression
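To show what is happening under the hood, here is a dependency-free sketch that fits a one-predictor logistic regression by plain gradient descent. This is a swapped-in technique for illustration: an LLM would more likely call a statistics package such as statsmodels or scikit-learn, and the screenshot's actual code is not reproduced here. The data (hours studied vs. pass/fail), the learning rate, and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.3, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

# Made-up data: more hours studied means a higher chance of passing.
random.seed(0)
hours = [random.uniform(0, 10) for _ in range(200)]
passed = [1 if random.random() < sigmoid(1.2 * (h - 5)) else 0 for h in hours]

# Center the predictor so gradient descent converges quickly.
mean_hours = sum(hours) / len(hours)
centered = [h - mean_hours for h in hours]
w, b = fit_logistic(centered, passed)

def predict(h):
    """Estimated probability of passing after h hours of study."""
    return sigmoid(w * (h - mean_hours) + b)

print(f"slope = {w:.2f}, intercept = {b:.2f}")
print(f"P(pass | 2 hours) = {predict(2):.2f}")
print(f"P(pass | 8 hours) = {predict(8):.2f}")
```

A package-based version would also report standard errors and p-values, which this sketch leaves out.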

Potential Applications and Implications

The integration of Python programming with LLMs unlocks applications across many industries, significantly enhancing their usefulness for analysis and generating insights. In data analysis and predictive modeling, LLMs can automate the process of writing scripts to analyze large datasets, recode variables, identify patterns, and make predictions, streamlining the workflow for data scientists and analysts. In financial forecasting, LLMs can generate Python code to process historical financial data, apply statistical models, and predict future market trends, providing valuable insights for investors and financial advisors. Additionally, in machine learning model development, LLMs can write and refine Python code for training models, tuning parameters, and testing algorithms, accelerating the development process and enabling more sophisticated model creation. The AI is now writing code to automate itself…

Allowing LLMs to use Python programming democratizes access to advanced computational tools, enabling users with limited programming expertise to draw insights from sophisticated mathematical and statistical techniques. I firmly believe that putting smart tools into people’s hands will make them smarter. However, with great power comes great responsibility. First, we have to be able to trust the quality and accuracy of the Python code generated by LLMs (not all LLMs are equally good coders); any errors could lead to incorrect results and interpretations. As with everything AI generates, humans need to check and make sure everything is right. Second, there are limits to the complexity of the code that can be generated and to the packages that can be used, which restricts more sophisticated analyses. However, these limitations are likely short-term hurdles, and in the near future sophisticated analyses like Structural Equation Modeling (SEM) may be easily possible with AI tools.
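One lightweight way to do that checking is to compare generated code against a trusted reference on a few small cases. In this sketch, `sample_std` stands in for a hypothetical LLM-written helper and the test data is made up; the reference is `statistics.stdev` from Python's standard library.

```python
import math
import statistics

def sample_std(values):
    """Hypothetical LLM-written helper: sample standard deviation (n - 1)."""
    mean = sum(values) / len(values)
    squared_diffs = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_diffs / (len(values) - 1))

# Check 1: agree with a trusted implementation on a small data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]
matches_reference = math.isclose(sample_std(data), statistics.stdev(data))

# Check 2: an edge case with a known answer (no spread means zero deviation).
handles_constant = sample_std([1, 1, 1]) == 0.0

print(matches_reference, handles_constant)
```

Checks like these do not prove the code is right, but they catch common slips, such as dividing by n instead of n - 1, before the results are used.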


The basic architecture of today’s LLMs may limit their true mathematical understanding, but it is a boon for coding capabilities. With the ability to code solutions to provide answers that can be reproduced and verified, LLMs now have a huge toolbox from which to draw. From the basics of creating random numbers to executing advanced statistical models like logistic regression, LLMs broaden their applicability to fields like data analysis, financial forecasting, and machine learning. This ability enables the democratization of data analysis tools across organizations, allowing deeper analysis of data for more people. As AI coding improves and future models better understand our natural language requests for statistics, even more complicated analyses will be at our fingertips.

