LLMs AND THE RISE OF INTELLIGENT SEARCH
Whoops! The Web is not the Web we wanted in every respect. -- Tim Berners-Lee, inventor of the World Wide Web, 2019
Introduction
I am fortunate not only to be surfing the current AI wave but also to have been around for the dawn of the internet, where I saw how it changed our lives. We went from trying to replicate the phone book, to Google saving us with their amazing search product, to today’s world of SEO-dominated spam getting in the way of finding the right info. Search engines crawl the web and index its content, and complex algorithms rank and display the results, which has pushed websites to prioritize prime placement in search over actual usefulness. These days I can’t rely on the right link being at the top of the results; I need to try out a few links to find the information I need. Searching with Google takes longer and is much more frustrating now.
I think we’re on the brink of another revolution in search – AI-powered search. Why not have an AI agent read the potential links for me and give me a summary? When Large Language Models (LLMs) like OpenAI’s ChatGPT, Google’s Gemini, or Perplexity are integrated with web search capabilities, these models can process natural language queries, comprehend context, and synthesize information from multiple sources. This integration allows them not only to find relevant web pages but also to understand the context, extract key information, and present it in a coherent and tailored manner. This process isn’t foolproof, but it’s already changed the way I search for information. This “intelligent search” understands my natural language and answers my question with references, getting right to the point quickly and easily.
The Current State of Web Search
The current state of web search is really just Google. In 2024, Google held over 90% of the global search engine market, followed very distantly by Bing, Yahoo, and a few others (how is Ask Jeeves still around?). More than a decade of this dominance has given Google control over how information is discovered and accessed on the internet - its algorithms and policies have a tremendous influence on what content is seen, shared, and considered valuable online.
This market dominance has also given rise to a whole industry focused on search engine optimization (SEO). While SEO originally helped websites improve their visibility for relevant searches, it has unfortunately evolved into a complex set of practices that often prioritize search engine rankings over user relevance. Spam, basically. Websites are now designed with search algorithms in mind, often leading to keyword-stuffed articles, clickbait headlines, and content that’s more focused on rankings than on providing real value to readers. Thus, the web has become cluttered with low-quality, repetitive content designed to game the system rather than inform or engage us.
And because of SEO practices, the quality of search results has been steadily declining. Users are increasingly frustrated with having to scroll through multiple pages of search results to find genuinely helpful information. The top results are often dominated by heavily optimized pages that may not provide the most relevant or up-to-date information. And now AI-generated content optimized for search engines threatens to flood the internet with even more low-quality, generic articles (as if that were possible). Users are growing dissatisfied and are recognizing that web search may be broken, having reached its limits in providing valuable, accurate, and relevant information.
LLMs with Web Search: A New Paradigm
Can any of our new AI tools help search instead of hurting it? Some LLMs like ChatGPT have web search capabilities integrated into chat and represent a new way to interact with and retrieve information from the internet. Unlike traditional search engines that primarily match keywords and assess page authority, LLMs with web search functionality approach the task more holistically. When a user searches with an LLM, these systems first interpret the question using their natural language processing capabilities. Then they can formulate appropriate search terms, send those to a web search API (like Bing), and retrieve the relevant web pages.
The important difference is what happens next: instead of simply displaying a list of links and titles, the LLM processes the content of the pages, digests the information in context, and synthesizes a coherent response that directly addresses the user’s search query.
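To make that flow concrete, here is a minimal Python sketch of the loop described above, assuming the OpenAI chat completions client and a placeholder search_web() function standing in for whichever search API (Bing or otherwise) a given product actually uses. The model name and prompts are illustrative; this is not any vendor's real pipeline.

# A minimal sketch of the LLM-plus-search loop: interpret the question,
# search, fetch snippets, then synthesize an answer with references.
# search_web() is a placeholder; model name and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def search_web(query: str, top_k: int = 5) -> list[dict]:
    """Placeholder for a real search API (Bing, Brave, etc.).
    Should return a list of {"url": ..., "snippet": ...} results."""
    raise NotImplementedError("plug in an actual search API here")


def answer_with_search(user_question: str) -> str:
    # 1. Turn the natural-language question into a short search query.
    query = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Write a concise web search query for: {user_question}"}],
    ).choices[0].message.content

    # 2. Retrieve candidate pages from the search API.
    results = search_web(query)

    # 3. Synthesize a direct, cited answer from the retrieved snippets.
    sources = "\n\n".join(f"[{i + 1}] {r['url']}\n{r['snippet']}"
                          for i, r in enumerate(results))
    prompt = (f"Answer the question using only the numbered sources below, "
              f"and cite them by number.\n\nQuestion: {user_question}\n\n{sources}")
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content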
This new way of searching offers several important advantages over traditional search engines. The natural language understanding capabilities of LLMs allow for nuanced and context-aware searches.
Users can phrase their queries in conversational language, ask follow-up questions, or engage in a dialogue to refine their search. The systems can usually interpret ambiguous terms, understand synonyms, and even infer the intent behind a query, leading to more accurate and relevant results. The contextual comprehension abilities of LLMs enable them to understand information within its broader context. They can connect related concepts, recognize subtle implications, and provide more comprehensive answers that consider various perspectives on the topic. Or they can get right to the point and give you just the answer you need.
And LLMs are particularly good at summarization, which addresses a key pain point of traditional search. Instead of requiring us to click through multiple links and piece together the information ourselves, LLMs can distill the key points from numerous sources into a concise, coherent summary. This summary saves time and helps us quickly grasp the essential information on a topic. Searching has become dynamic: you can dig in and ask questions of a website, and you can keep adapting and refining the search as you go. The combination of natural language understanding, contextual comprehension, and summarization capabilities positions LLMs with web search as a powerful tool that could fundamentally change how we access and interact with information online.
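As a rough illustration of that back-and-forth, the sketch below keeps the retrieved sources and earlier turns in the message history so follow-up questions refine the same search session. It reuses the hypothetical helpers from the previous sketch and is not any product's actual implementation.

# Illustrative follow-up loop: the retrieved sources and prior turns stay in
# the message history, so the user can refine or dig deeper without re-searching.
from openai import OpenAI

client = OpenAI()


def search_session(first_question: str, sources_text: str) -> None:
    """sources_text is the numbered list of snippets already retrieved
    for the first question (e.g. by the answer_with_search sketch above)."""
    history = [
        {"role": "system",
         "content": "Answer from the numbered sources and cite them. "
                    "Say so if the sources do not cover the question.\n\n" + sources_text},
        {"role": "user", "content": first_question},
    ]
    while True:
        reply = client.chat.completions.create(
            model="gpt-4o-mini", messages=history,
        ).choices[0].message.content
        print(reply)

        follow_up = input("Follow-up question (blank to stop): ").strip()
        if not follow_up:
            break
        history.append({"role": "assistant", "content": reply})
        history.append({"role": "user", "content": follow_up})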
Finally, adding search capabilities helps mitigate a key downfall of today’s LLMs – they are hard to update with new information. After all, the P in GPT stands for Pretrained. If we want to add the latest current events or the newest scientific papers, we need to retrain our models, which is both very expensive and time-consuming. But by allowing the models to go online to find the most up-to-date information, whether it’s the number of three-pointers Stephen Curry has made or the latest Supreme Court decisions, we don’t need to rely on the model already knowing it. We can provide real ground truth via retrieval-augmented generation (RAG), along with links to the information being referenced.
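A toy example of that grounding, with placeholder values rather than real retrieved data: the freshly fetched text and its URL go straight into the prompt, so the answer leans on the retrieved source instead of the model’s pretraining cutoff.

# Toy grounding example: the retrieved text and its URL are placed in the
# prompt so the answer cites a current source instead of stale pretraining.
# The snippet and URL below are placeholders, not real retrieved data.
from openai import OpenAI

client = OpenAI()

retrieved_snippet = "<text fetched at query time from a stats or news page>"
source_url = "<URL of the page the snippet came from>"

prompt = (
    "Using only the source below, answer the question and include the link.\n\n"
    f"Source ({source_url}):\n{retrieved_snippet}\n\n"
    "Question: How many career three-pointers has Stephen Curry made?"
)

answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content
print(answer)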
Challenges and Concerns
Web search through LLMs is not perfect; we still need to work on accuracy, hallucinations (confabulations), and data privacy. Despite their technical sophistication, or even with the right data in front of them, LLMs can sometimes generate incorrect information or make up facts that don’t exist in their training data or the web content they retrieve. In web search we expect reliable, factual information, and LLMs tend to be heroically confident even when they are wrong. When an LLM confidently presents inaccurate information as fact, it can be more misleading than a traditional search engine’s list of results. Improvements in model training, fact-checking mechanisms, and external knowledge bases may help to verify information before it gets to users. But in the meantime, at least you get references and links so you can check out the sources yourself and make a judgment about whether you agree with the LLM’s assessment.
There is also a huge potential for biased or manipulated results. We’ve already discussed how LLMs themselves can mislead through biases in their outputs. And when we add in potential biases from the web search (are clickbait articles dominating the results? That’s what LLMs will have to wade through as well), we can amplify misinformation even more. And just as SEO has become a dominant industry, there will be attempts by bad actors to manipulate LLMs’ search results. We need to reduce bias, ensure fairness, and protect against manipulation to get the best results. Luckily, we get links to sources so we can verify the source, assess its accuracy, and get more depth.
Data privacy is another issue in the LLM search space – users might be uncomfortable with the idea of AI systems analyzing their search queries in greater depth, and perhaps in a more personalized way, than traditional search engines. These systems can collect and retain personal information from seemingly private interactions that can be used for purposes beyond the search experience. Soon, we will talk more about data privacy in the age of AI and what you can do to protect yourself.
The Future of Web Search
The search landscape is going to change, so Google is getting in on it themselves, with mixed results. We will see a gradual evolution where traditional search results are supplemented with AI-generated summaries and insights, eventually leading to more conversational and interactive search interfaces. This integration can fundamentally change how we interact with search engines, moving from keyword-based searches to natural language conversations. Adoption will be slow as the kinks get worked out, trust is built, and the revenue model gets figured out, especially if people are accessing search through other portals like ChatGPT or Perplexity’s website.
The future will hopefully mean a shift from focusing on SEO to focusing on producing high-quality, in-depth content that truly serves users’ needs and can be summarized and conversed with via an LLM. Because of this synthesis ability, content creators will need to consider how their work can effectively be parsed and summarized by AI systems. Hopefully this change means high-quality content that is easily digested by LLMs for consumption by end users.
Our expectations for search experiences will likely evolve rapidly as LLM integration becomes more widespread. Users like me may come to expect direct answers to their queries rather than lists of links to explore (have I mentioned you get links for reference?). Scrolling through links may come to feel the way looking through a physical phone book feels now. The ability to ask follow-up questions and engage in a dialogue with the search system could become a standard feature, leading to more in-depth and nuanced search sessions. We will expect more personalized search experiences, with LLMs tailoring results based on our past queries and preferences. However, we’ll also need to develop a sense of when to trust AI-generated responses and when to seek additional verification, requiring a new kind of digital literacy centered around AI. And we’ll need to trust the companies doing the searches and interpreting the results to truly feel comfortable with what we get back. Don’t forget you can always check the references yourself to assess the capabilities of various options.
Conclusion
My web search behavior has radically changed in the past few months, now that LLMs can reliably go online – I rarely search through traditional search engines anymore. AI-powered search addresses many key issues in traditional search, from an overemphasis on SEO, to finding truly relevant information among a sea of content, to just getting a simple answer to your question. By utilizing natural language understanding, contextual comprehension, and advanced summarization capabilities, LLMs offer a more intuitive, conversational, and informative search experience. They can cut through the noise of keyword-optimized content, prioritize quality information, and provide us with direct, synthesized answers to our queries. However, it’s a big change and we need to be careful how we make it. There are issues of accuracy, bias, privacy, and the potential for making things up, so we’ll need to strike the right balance between harnessing the power of AI and mitigating these issues. The future of search will reshape the internet, hopefully into one that prioritizes accurate information, deep and thoughtful content, and respect for user privacy.