How a groundbreaking technique could reshape the way we find information online.
If there’s one force shaping our modern world, it’s the promise that somewhere, among the near-infinite layers of digital information, the answer exists. With a few keystrokes, we trust a search engine to pluck the precise insight we need. Yet beneath that veneer of convenience lies an old problem—one that limits what we can find and, ultimately, what we can know.
Search engines today rely on a timeworn two-step process. First, they retrieve a pile of documents that seem, at first glance, to match the query. Then, they rerank that initial pile—rearranging it to surface the most relevant results. But this two-step dance has a blind spot. If a critical piece of information doesn’t make it into the initial retrieval, it’s gone. No matter how sophisticated the reranking algorithm, it cannot promote what it never sees.
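To make that blind spot concrete, here is a minimal sketch of the classic pipeline in Python. The scoring functions are placeholders supplied by the caller (a toy term-overlap scorer in the usage example), not the components of any real engine; the point is only that the reranker can never surface a document the first stage never returned.

```python
from typing import Callable, List

def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    retrieve_score: Callable[[str, str], float],  # cheap first-stage scorer
    rerank_score: Callable[[str, str], float],    # expensive, higher-quality scorer
    k_retrieve: int = 100,
    k_final: int = 10,
) -> List[str]:
    """Classic two-step pipeline: retrieve a fixed pool, then rerank it."""
    # Step 1: first-stage retrieval picks a fixed-size candidate pool.
    candidates = sorted(corpus, key=lambda d: retrieve_score(query, d),
                        reverse=True)[:k_retrieve]
    # Step 2: the reranker reorders ONLY that pool. A relevant document
    # that missed the cut above can never be recovered here: bounded recall.
    return sorted(candidates, key=lambda d: rerank_score(query, d),
                  reverse=True)[:k_final]

# Toy usage: a term-overlap scorer stands in for both stages.
overlap = lambda q, d: len(set(q.split()) & set(d.split()))
docs = ["sliding window reranking", "corpus graph retrieval",
        "adaptive retrieval with relevance feedback"]
print(retrieve_then_rerank("adaptive retrieval", docs, overlap, overlap,
                           k_retrieve=2, k_final=2))
```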
This is the “bounded recall” problem. It’s not just a technical hiccup; it’s a fundamental flaw in how we’ve approached search for decades. And it’s become more pressing than ever as we rely on search engines to sift through increasingly vast troves of data. If something slips through the cracks, it’s not just an oversight—it’s an opportunity missed, an insight never uncovered.
A World of Lost Possibilities
Consider the world of scientific research. A biologist looking for groundbreaking studies might rely on a search engine to return the most relevant articles. But if the initial retrieval misses a key paper—one buried deep in the archives or using slightly different terminology—that insight might never influence their work. In journalism, the stakes are equally high. A reporter chasing a story often depends on search engines to surface historical records, court documents, or obscure interviews. When a relevant source doesn’t make the initial cut, the reporter’s understanding of the issue remains incomplete. In both cases, the bounded recall problem isn’t just a technical challenge; it’s a barrier to progress.
And yet, our tools have hardly changed. While algorithms have become more advanced and retrieval has become faster, the basic structure of the search process—retrieve first, rerank second—remains the same. It’s a system that works well enough when the initial retrieval is thorough, but it stumbles when that first step is incomplete. The best reranker in the world can’t rescue a document that never made it into the starting lineup.
Rethinking the Foundations
Mandeep Rathee, Sean MacAvaney, and Avishek Anand recognized this limitation. They saw that real progress wouldn’t come from endlessly refining the reranking process. Instead, it would come from reimagining retrieval itself. Their solution, called SlideGar, introduces a new paradigm: an adaptive retrieval process that evolves as the search progresses. SlideGar doesn’t just rank documents better—it rethinks how those documents are chosen in the first place.
At the heart of SlideGar is a deceptively simple insight: the search engine can learn as it goes. Instead of treating retrieval as a single, static action, SlideGar makes it dynamic. It continuously evaluates what it’s found, what it’s missed, and what else might be out there. The algorithm uses feedback from the reranking process to improve the next round of retrieval. It’s an iterative cycle that doesn’t just surface the best results from a given pile—it ensures that the pile itself keeps improving.
The Sliding Window
One of SlideGar’s key innovations is the sliding window approach. Think of the ranked pool of candidate documents as a long scroll. Instead of reranking the entire scroll at once, SlideGar examines it piece by piece, focusing on a small window of documents at a time. Each window is reranked using a listwise approach, a technique that judges the relevance of documents relative to one another rather than in isolation. After reranking the current window, the algorithm notes which documents stand out and uses that information to guide what goes into the next window.
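Here is a rough sketch of that sliding pass. The listwise_rank(query, docs) call is a hypothetical stand-in for the LLM reranker and is assumed to return the same documents reordered by relevance; the window and step sizes are illustrative.

```python
from typing import Callable, List

def sliding_window_rerank(
    query: str,
    ranking: List[str],
    listwise_rank: Callable[[str, List[str]], List[str]],  # hypothetical LLM call
    window: int = 4,
    step: int = 2,
) -> List[str]:
    """Rerank a long candidate list one small window at a time.
    listwise_rank(query, docs) is assumed to return the same docs,
    reordered by relevance judged relative to one another."""
    docs = list(ranking)
    start = 0
    while start < len(docs):
        end = min(start + window, len(docs))
        # Rerank only the documents inside the current window...
        docs[start:end] = listwise_rank(query, docs[start:end])
        # ...then slide the window forward and repeat.
        start += step
    return docs
```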
The Corpus Graph
But how does SlideGar decide what to include next? That’s where the corpus graph comes in. Imagine a map that connects documents by their similarities. If Document A is found to be highly relevant, the corpus graph helps identify which other documents are closely related—ones that might have been overlooked in the initial retrieval. These neighbors can then be added to the next batch, ensuring that the search doesn’t miss important, connected information. This step doesn’t require creating new queries or re-running the initial search. It simply uses the structure of the data itself to bring hidden gems to light.
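In its simplest form, a corpus graph can be pictured as a precomputed lookup table from each document to its most similar documents. The sketch below assumes exactly that shape; the function name, the budget parameter, and the toy IDs are illustrative rather than the paper's implementation.

```python
from typing import Dict, List, Set

# In this sketch, the corpus graph is just a precomputed map from each
# document ID to the IDs of its most similar documents.
CorpusGraph = Dict[str, List[str]]

def expand_with_neighbors(
    graph: CorpusGraph,
    promoted: List[str],
    already_seen: Set[str],
    budget: int,
) -> List[str]:
    """Collect unseen neighbors of documents the reranker just promoted,
    so they can be considered in a later window."""
    frontier: List[str] = []
    for doc_id in promoted:
        for neighbor in graph.get(doc_id, []):
            if neighbor not in already_seen and neighbor not in frontier:
                frontier.append(neighbor)
                if len(frontier) >= budget:
                    return frontier
    return frontier

# Example: d1 and d2 did well, so their neighbors join the next round.
graph = {"d1": ["d7", "d9"], "d2": ["d3", "d7"]}
print(expand_with_neighbors(graph, ["d1", "d2"], {"d1", "d2"}, budget=3))
# ['d7', 'd9', 'd3']
```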
Adapting in Real Time
This approach allows SlideGar to adapt in real time. It alternates between the original set of documents and the new neighbors it discovers. With each iteration, the algorithm refines its understanding of what’s relevant. Over time, this process results in a final ranked list that is far more comprehensive and accurate than what a traditional two-step system can produce. It’s as if the search engine is having a continuous conversation with the data—asking questions, receiving answers, and then asking better questions based on what it learns.
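Putting the pieces together, a deliberately simplified loop in the spirit of SlideGar might alternate between the two sources like this. As before, listwise_rank is a hypothetical stand-in for the LLM reranker, the corpus graph is a plain dictionary, and the bookkeeping (how windows overlap and how the final order is assembled) is stripped down compared with the real algorithm.

```python
from typing import Callable, Dict, List

def adaptive_rerank(
    query: str,
    initial_ranking: List[str],
    graph: Dict[str, List[str]],                            # corpus graph
    listwise_rank: Callable[[str, List[str]], List[str]],   # hypothetical LLM call
    window: int = 4,
    budget: int = 20,   # total documents the reranker is allowed to see
) -> List[str]:
    """Simplified adaptive loop: alternate between the next unexamined
    candidates from the initial ranking and the graph neighbors of
    documents the reranker just promoted."""
    final: List[str] = []
    seen: set = set()
    pool = list(initial_ranking)     # frontier 1: original candidates
    neighbors: List[str] = []        # frontier 2: discovered neighbors
    use_neighbors = False
    spent = 0

    while spent < budget and (pool or neighbors):
        source = neighbors if (use_neighbors and neighbors) else pool
        batch = [d for d in source[:window] if d not in seen]
        del source[:window]
        use_neighbors = not use_neighbors   # alternate sources next round
        if not batch:
            continue

        ranked = listwise_rank(query, batch)   # one listwise LLM call
        spent += len(batch)
        seen.update(batch)
        final.extend(ranked)

        # Feedback: the window's best documents nominate their neighbors
        # for a later window, widening the search beyond the initial pool.
        for doc in ranked[: max(1, window // 2)]:
            for n in graph.get(doc, []):
                if n not in seen and n not in neighbors:
                    neighbors.append(n)

    final.extend(d for d in pool if d not in seen)  # leftovers keep their order
    return final
```

The budget caps how many documents the reranker ever sees, which mirrors the article's point that the gains come without spending more on expensive LLM calls.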
Results That Matter
The impact of this approach is striking. In rigorous tests on widely used datasets, SlideGar improved recall by up to 28 percent and nDCG scores (a measure of ranking quality) by more than 13 percent—all without increasing the number of expensive LLM inferences. It’s a step change in search efficiency and accuracy. SlideGar shows that we don’t have to accept the limitations of bounded recall. Instead, we can build systems that continuously expand their horizons, finding more relevant results than ever before.
The Bigger Picture
What makes this breakthrough especially exciting is its broader significance. At its core, SlideGar represents a shift in how we think about search. It’s not just about tweaking algorithms; it’s about changing the entire paradigm. Instead of seeing retrieval as a fixed step, we can view it as a dynamic, adaptive process—one that grows smarter with every interaction.
This shift matters because search engines are more than just tools; they’re our primary means of navigating knowledge. They shape how we understand the world, how we make decisions, and how we uncover new ideas. When search engines improve, so do the countless fields that rely on them—from science and journalism to medicine and education.
The paper: https://arxiv.org/pdf/2501.09186