As companies face the challenge of integrating generative AI into their operations, many are discovering that the real power—and complexity—lies in their own data. In this TechVoices interview, Daniel Avancini, Chief Data Officer at Indicium, offers a clear-eyed view of how companies can leverage proprietary data effectively while avoiding common pitfalls.
From the rise of retrieval-augmented generation (RAG) to the risks of shadow AI and the ongoing need for robust data governance, Avancini lays out a practical roadmap for deploying AI responsibly and at scale.
Key Points: RAG, Shadow AI, and Data Governance
- Most companies are not fine-tuning large language models but instead using Retrieval-Augmented Generation (RAG) to allow natural language querying over structured or semi-structured proprietary data.
- Shadow AI, employees using unapproved AI tools, is a real enterprise risk.
- With open models and public datasets widely available, data—not models—is the key differentiator.
- AI amplifies the need for data governance.
Key Quotes: “…the most common way people are using GenAI in companies right now.”
Avancini discussed how AI uses data, the problems with shadow AI, and the pressing need for AI governance.
How AI models access data
“You are not training a model, as people like to say, but you are using the models to search and to have a natural language interaction with your data. When you have a chatbot, for instance, and you ask, ‘How much did I sell last month?’ the LLM converts your question into a query, gets the data back, and converts the result into a natural language response. So the model isn’t really accessing the data—it’s just formatting the queries. This is probably the most common way people are using GenAI in companies right now.”
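The flow Avancini describes can be sketched in a few lines: the model only translates between natural language and queries, while the database itself answers them. In this minimal sketch, the `llm_to_sql` and `llm_to_answer` functions are hypothetical stand-ins for real model calls, and the sales table is invented for illustration.

```python
import sqlite3

def llm_to_sql(question: str) -> str:
    # Stand-in for an LLM call that would receive the table schema in its
    # prompt and return SQL; hard-coded here for illustration only.
    if "sell last month" in question.lower():
        return "SELECT SUM(amount) FROM sales WHERE month = '2024-05'"
    raise ValueError("question not understood")

def llm_to_answer(question: str, result) -> str:
    # Stand-in for the second LLM call that phrases the raw result
    # as a natural language response.
    return f"You sold {result} last month."

def ask(conn, question: str) -> str:
    sql = llm_to_sql(question)          # model formats the query...
    row = conn.execute(sql).fetchone()  # ...the database does the work
    return llm_to_answer(question, row[0])

# Demo against an in-memory database with made-up figures
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("2024-05", 1200.0), ("2024-05", 800.0), ("2024-04", 500.0)])
print(ask(conn, "How much did I sell last month?"))
```

Note that at no point does model output flow into the data store unchecked in the reverse direction: the LLM sees the question and the query result, never the underlying tables, which is the separation Avancini is pointing at.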
Shadow AI
“Many companies that are afraid of using AI are actually creating a problem for themselves because their employees are using AI without a license, and then they’re incurring this risk. That’s shadow AI. It’s kind of the reverse of what IT departments think: ‘We’ll block AI to protect data,’ but that just drives people to use unapproved tools, increasing the likelihood of data leakage or compliance issues.”
AI requires data governance
“AI does not fix data quality problems—especially when those problems come all the way from the source. It enables many more people to work with data, yes, but it requires an even larger investment in data governance and data quality than companies may have made in the past five to ten years. It’s not a silver bullet. If anything, AI makes data platform engineering and stewardship more critical, not less.”