To guide an LLM to ask follow-up questions when retrieved information is insufficient, you need a system that evaluates the quality of retrieved data and triggers clarification requests when it detects gaps. Here’s how this can be structured:
Detecting Insufficient Information: The system must first determine whether the retrieved context answers the user’s query. This can be done by embedding explicit checks in the LLM’s instructions. For example, the system prompt could include a step where the LLM evaluates the relevance of retrieved documents to the query. If key details (e.g., dates, names, metrics) are missing, the model is instructed to flag the gap. Tools like confidence scores for retrieved documents or semantic similarity thresholds can automate this detection. For instance, if the similarity between the query and retrieved text falls below a predefined threshold, the system marks the information as insufficient.
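Here is a minimal sketch of such a threshold check, assuming the vector store returns (document, similarity) pairs; the `is_sufficient` helper and the 0.75 cutoff are illustrative assumptions, not any particular library’s API:

```python
from typing import List, Tuple

SIMILARITY_THRESHOLD = 0.75  # hypothetical cutoff; tune on held-out queries

def is_sufficient(scored_docs: List[Tuple[str, float]]) -> bool:
    """True if at least one retrieved document clears the threshold."""
    return any(score >= SIMILARITY_THRESHOLD for _, score in scored_docs)

# Scores as a vector store might return them for an earnings query
docs = [("Tesla 2023 annual report excerpt", 0.62),
        ("EV market overview", 0.41)]
if not is_sufficient(docs):
    print("Context flagged as insufficient; trigger a follow-up question.")
```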
Generating Contextual Follow-Ups: Once a gap is detected, the LLM should formulate a specific follow-up question. This requires the model to analyze why the information is lacking. For example, if a user asks, “What were Tesla’s Q3 2024 earnings?” and the retrieved data only covers up to 2023, the LLM might ask, “I don’t have data for Q3 2024 yet. Would you like Q3 2023 results, or should I check alternative sources?” To enable this, the LLM’s prompt might include templates for common scenarios (ambiguity, outdated data, incomplete scope) or use a chain-of-thought approach to reason about missing information. Frameworks like ReAct or FLARE can help structure these multi-step reasoning processes.
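One way to enable this is a gap-analysis prompt template that walks the model through the chain-of-thought steps described above. The sketch below assumes a generic `call_llm` client, which is a placeholder rather than a real API, and the template wording is only illustrative:

```python
# A chain-of-thought template: detect missing details, then either
# declare sufficiency or produce exactly one follow-up question.
GAP_PROMPT = """You are answering the query: "{query}"

Retrieved context:
{context}

Step 1: List any details the query needs (dates, names, metrics)
that the context does not contain.
Step 2: If nothing is missing, reply exactly: SUFFICIENT.
Step 3: Otherwise, write one follow-up question that would resolve
the gap, offering alternatives where sensible."""

def build_gap_prompt(query: str, docs: list) -> str:
    return GAP_PROMPT.format(query=query, context="\n---\n".join(docs))

# call_llm is a hypothetical stand-in for your model client:
# reply = call_llm(build_gap_prompt(user_query, retrieved_docs))
# if reply.strip() != "SUFFICIENT":
#     follow_up = reply  # surface this question to the user
```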
Iterative Retrieval Loops: In an agent-based system, the process can loop until the query is resolved. For example, after the first retrieval fails, the agent stores the follow-up question in the conversation history and triggers a new search with the updated context. Tools like LangChain’s `ConversationalRetrievalChain` or LlamaIndex’s `QueryEngine` support such cycles by retaining dialogue context across retrievals. To avoid infinite loops, implement safeguards like maximum iteration limits or fallback responses (e.g., “I still can’t find details; try rephrasing or ask about a related topic”), as in the sketch below.
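Here is a minimal sketch of such a loop with an iteration cap and fallback; `retrieve`, `llm_answer_or_followup`, and `get_user_reply` are hypothetical callables supplied by your own stack (vector search, model call, and dialogue layer):

```python
MAX_ITERATIONS = 3  # safeguard against endless clarification cycles
FALLBACK = ("I still can't find details; try rephrasing "
            "or ask about a related topic.")

def answer_with_followups(query, retrieve, llm_answer_or_followup,
                          get_user_reply):
    """Loop: retrieve -> gap check -> follow-up, until resolved or capped."""
    history = [query]
    for _ in range(MAX_ITERATIONS):
        docs = retrieve(" ".join(history))      # search with full dialogue
        result = llm_answer_or_followup(history, docs)
        if result["type"] == "answer":          # gap resolved
            return result["text"]
        history.append(result["text"])          # store follow-up question
        history.append(get_user_reply(result["text"]))  # user clarifies
    return FALLBACK
```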
Example Workflow:
- User asks, “How does Project A’s latency compare to Project B?”
- Initial retrieval returns docs about Project A’s features but no latency metrics.
- The LLM detects the missing comparison and asks, “I have Project A’s features but no latency data. Should I search for benchmarks or focus on Project B’s specs?”
- The user clarifies, and the agent performs a refined retrieval.
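To make the workflow concrete, here is how that exchange might drive the `answer_with_followups` sketch above, with hypothetical stubs standing in for the retriever, the model, and the user:

```python
# Hypothetical stubs; a real system would call a vector store and an LLM.
def retrieve(q):
    return ["Project A feature overview (no latency metrics)"]

def llm_answer_or_followup(history, docs):
    if len(history) == 1:  # first pass: the latency comparison is missing
        return {"type": "followup",
                "text": ("I have Project A's features but no latency data. "
                         "Should I search for benchmarks or focus on "
                         "Project B's specs?")}
    return {"type": "answer",
            "text": "Comparison drawn from the benchmark search."}

def get_user_reply(question):
    return "Search for benchmarks."  # the user's clarification

print(answer_with_followups(
    "How does Project A's latency compare to Project B?",
    retrieve, llm_answer_or_followup, get_user_reply))
```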
This approach combines prompt engineering, retrieval evaluation, and loop management to create a responsive system that actively seeks clarity when needed.