TAG (Table-Augmented Generation) is a method for expanding the context and capabilities of large language models (LLMs), allowing them to provide more useful responses.
What is it?
Despite its similarity in name to RAG (Retrieval-Augmented Generation), TAG is based on a different concept: the interaction between an LLM and a relational database, which lets the language model obtain structured data in real time. This allows the model to ground its responses in current, well-founded data.
In practice, this means executing SQL queries generated by the model against the database. The model then decides, using the data obtained so far, whether additional queries are needed before finally generating a response for the user.
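A minimal sketch of this execution step, assuming SQLite as the database (`run_model_query` is an illustrative helper, not part of any standard API; the pipe-separated output format is just one of many options):

```python
import sqlite3

# Run a model-generated query and format the rows so they can be
# fed back to the model as text.
def run_model_query(conn: sqlite3.Connection, sql: str) -> str:
    cursor = conn.execute(sql)
    columns = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()
    # Simple pipe-separated layout; real systems often use JSON or Markdown.
    lines = [" | ".join(columns)]
    lines += [" | ".join(str(value) for value in row) for row in rows]
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada'), (2, 'Alan')")
print(run_model_query(conn, "SELECT name FROM users ORDER BY id"))
```

The formatted string is what gets appended to the conversation, so the model "sees" the query results as plain text in its context.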
Functionality
There are two basic requirements for implementing a TAG system:
- Data Model Context: For interactions between the database and the LLM to make sense, the model must understand the structure of the data it will query. How this initial context is provided depends on the format of the prompt, which we'll review later.
- Query Tool: Although one could implement such a system without an LLM capable of tool calls, implementation becomes significantly simpler if the model can send queries directly through a dedicated tool.
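Both requirements can be sketched concretely. The example below assumes SQLite and uses the generic JSON-schema convention that most tool-calling APIs accept for tool declarations; the exact field names vary between providers, and `run_sql` is a name chosen here for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")

# 1. Data model context: extract the actual DDL so the model knows
#    which tables and columns exist before generating any SQL.
def schema_context(conn: sqlite3.Connection) -> str:
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(row[0] for row in rows)

# 2. Query tool: a declaration in the JSON-schema style used by
#    most tool-calling APIs (field names differ per provider).
query_tool = {
    "name": "run_sql",
    "description": "Execute a read-only SQL query against the database.",
    "parameters": {
        "type": "object",
        "properties": {"sql": {"type": "string"}},
        "required": ["sql"],
    },
}

print(schema_context(conn))
```

The DDL string goes into the system prompt, and the tool declaration is passed alongside the conversation so the model can emit structured query calls instead of free-form text.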
The interaction process itself is straightforward:
- Query Generation: The LLM generates an SQL query based on the given data model.
- Data Retrieval: The database processes the query and returns the data. Intermediate logic in the query tool may format the results for the model.
- Query-Table Loop: The LLM can perform multiple queries, using data obtained in each cycle for the next query.
- Response Generation: Once the necessary data has been obtained, the model generates a response for the user. Depending on the model's precision, the generated response may still diverge from the actual data.
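The four steps above can be sketched end to end. In this example a scripted stand-in replays canned tool calls in place of a real model; `fake_llm`, the message format, and the table layout are all illustrative assumptions:

```python
import sqlite3

# Scripted stand-in for the LLM: replays two canned tool calls,
# then answers. A real system would call a tool-capable model API here.
def fake_llm(history: list) -> dict:
    calls = [
        {"tool": "run_sql", "sql": "SELECT id FROM customers WHERE name = 'Ada'"},
        {"tool": "run_sql", "sql": "SELECT COUNT(*) FROM orders WHERE customer_id = 1"},
    ]
    turn = sum(1 for msg in history if msg["role"] == "tool")
    if turn < len(calls):
        return calls[turn]
    return {"answer": "Ada has placed 2 orders."}

def tag_loop(conn, question, max_steps=5):
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):        # cap iterations to avoid endless querying
        step = fake_llm(history)
        if "answer" in step:          # model is done querying
            return step["answer"]
        rows = conn.execute(step["sql"]).fetchall()
        history.append({"role": "tool", "content": str(rows)})
    return "Could not answer within the step budget."

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 1)])
print(tag_loop(conn, "How many orders has Ada placed?"))
```

Note how the second query depends on the result of the first: that chaining is the query-table loop in action.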
Challenges
As with any generative AI system, there are potential risks associated with both response generation and model interaction.
- Jailbreak: Without proper validation of the SQL queries generated by the model, an attacker could trick the LLM into producing harmful SQL commands, such as `DROP DATABASE ...`, or the infamous `DELETE FROM ...` without any `WHERE` clause.
- Infinite Loops: An LLM that lacks the necessary "intelligence" to process a user's request can become trapped in a continuous loop of malformed SQL queries, producing errors indefinitely or until the model's sampling happens to yield a response without another tool call.
- Lack of Context: A complex data model requires substantial context. Compact models may therefore struggle to generate appropriate queries unless they are supplemented with examples and other advanced prompting techniques.
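A first line of defense against the jailbreak risk is to validate each query before it ever reaches the database. The sketch below rejects anything that is not a single `SELECT` statement; it is an illustrative pattern-matching guard only, and a production system would rely on a real SQL parser or a read-only database role instead:

```python
import re

# Keywords that should never appear in a read-only query.
FORBIDDEN = re.compile(r"\b(drop|delete|update|insert|alter|truncate)\b", re.IGNORECASE)

def validate_query(sql: str) -> bool:
    statements = [s for s in sql.split(";") if s.strip()]
    if len(statements) != 1:                 # block stacked statements
        return False
    if not statements[0].lstrip().lower().startswith("select"):
        return False
    return not FORBIDDEN.search(sql)

print(validate_query("SELECT name FROM users WHERE id = 1"))  # allowed
print(validate_query("DROP DATABASE production"))             # rejected
print(validate_query("DELETE FROM users"))                    # rejected
print(validate_query("SELECT 1; DROP TABLE users"))           # rejected: stacked
```

The infinite-loop risk has a similarly simple mitigation: cap the number of tool-call iterations per request and return a fallback answer when the budget is exhausted.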
Conclusion
Although it might seem less notable than RAG or web browsing, TAG is a crucial element in integrating generative AI with traditional software, as most applications and services run on relational databases such as PostgreSQL, SQL Server, SQLite, or Oracle Database. From ERPs to platforms like Instagram, TAG enables quick data retrieval and transformation without the user needing to know SQL or write complex queries.