In my conversation with Venkat Rajaji, SVP of Product Management at Cloudera, he argued that open source AI models will ultimately prevail because of their low switching costs, economic sustainability, and functional parity with proprietary alternatives.
Cloudera’s approach centers on enabling “private AI,” where open source models run securely on an organization’s internal data, preserving privacy and compliance. These models already power enterprise use cases such as customer support chat, voice-to-text analysis, natural language querying, and SQL co-pilots.
Rajaji detailed how AI agents can orchestrate complex data workflows, such as ETL and analytics pipelines, automatically adapting to upstream changes and reducing manual intervention.
Key Points: Open Source AI Models
Select Quotes: Open Source AI and the Future of Data Workflows
My conversation with Rajaji provided a snapshot of a future where open models, private AI deployments, and intelligent agents come together to deliver more secure, efficient, and adaptive data systems.
Why Open Source Models Will Likely Win
“The cost to build a model is incredibly high for the customer. The cost to switch is incredibly low from model to model. So it’s not like you have to rewrite an application if you want to use another model… Most of the models today generally do basically the same things with very little differentiation.
“And so in the world of models, [AI models have] kind of moved into a commodity… I think at the end of the day, the open source models will likely win out if there’s not much differentiation that can be seen from model provider to model provider.”
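To make the switching-cost point concrete, here is a minimal Python sketch of an application that depends only on a small model interface. Everything in it is an assumption for illustration: `TextModel`, `ModelA`, `ModelB`, and `summarize_ticket` are hypothetical names, and the two model classes are stubs rather than calls to any real provider or to Cloudera's platform.

```python
from typing import Protocol


class TextModel(Protocol):
    """Minimal interface the application depends on (hypothetical)."""

    def complete(self, prompt: str) -> str: ...


class ModelA:
    """Stand-in for one model; a real class would call that model's API."""

    def complete(self, prompt: str) -> str:
        return f"[model-a] {prompt}"


class ModelB:
    """Stand-in for a second, interchangeable model."""

    def complete(self, prompt: str) -> str:
        return f"[model-b] {prompt}"


# Registry of available models, keyed by a name taken from configuration.
MODELS: dict[str, TextModel] = {"model-a": ModelA(), "model-b": ModelB()}


def summarize_ticket(ticket: str, model_name: str) -> str:
    """Application logic is identical no matter which model is selected."""
    model = MODELS[model_name]
    return model.complete(f"Summarize this support ticket: {ticket}")


if __name__ == "__main__":
    # Switching models changes one argument, not the application code.
    print(summarize_ticket("Login page times out", "model-a"))
    print(summarize_ticket("Login page times out", "model-b"))
```

In this sketch, moving from one model to another is a configuration change. That is the sense in which the application does not need to be rewritten.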
The Power of Private AI with Open Source
“What we’ve done at Cloudera is we’ve made it such that you can bring any model that you want and run it on your private data. It’s not sending any of that data for training anywhere else—it’s running directly inside of the Cloudera system. So we call it private AI.
“Most of what we’re doing is leveraging open source models… and running them off customer data in their own deployments. That’s a key reason why we believe open source models will likely win out.”
How AI Agents Will Reshape Data Pipelines
“Think about the complex data and analytics landscape. Data moves from a transaction system into a data lake, goes through ETL, into a compute engine, and finally into a BI tool like Tableau or Power BI.
“Now imagine if you had an agent that could orchestrate that entire process—track lineage, detect changes, and automatically update downstream scripts. What causes problems today? A new feature upstream, which is really just a new column with some computation. If no one downstream knows about it, it breaks everything. But if an agent could see that, notify people, and auto-update the workflows—that’s the future of computing in the data space.”
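The scenario Rajaji describes, a new upstream column that silently breaks downstream work, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cloudera's implementation. The names `SchemaChange`, `diff_schema`, `update_select_list`, and `notify` are hypothetical, schemas are reduced to plain lists of column names, and the "auto-update" is simply a rewrite of a downstream column list.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaChange:
    """Columns added to or removed from an upstream table (hypothetical type)."""

    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)

    @property
    def has_changes(self) -> bool:
        return bool(self.added or self.removed)


def diff_schema(previous: list[str], current: list[str]) -> SchemaChange:
    """Compare two column lists, keeping the order in which columns appear."""
    prev_set, curr_set = set(previous), set(current)
    return SchemaChange(
        added=[c for c in current if c not in prev_set],
        removed=[c for c in previous if c not in curr_set],
    )


def update_select_list(selected: list[str], change: SchemaChange) -> list[str]:
    """Assumed downstream update: drop removed columns, then append new ones."""
    kept = [c for c in selected if c not in change.removed]
    return kept + [c for c in change.added if c not in kept]


def notify(change: SchemaChange) -> str:
    """Build the message an agent might send to downstream owners."""
    return f"Upstream schema changed: added={change.added}, removed={change.removed}"


if __name__ == "__main__":
    old_columns = ["order_id", "amount"]
    new_columns = ["order_id", "amount", "discounted_amount"]  # new upstream feature

    change = diff_schema(old_columns, new_columns)
    if change.has_changes:
        print(notify(change))
        print(update_select_list(old_columns, change))
```

A real agent would also need lineage metadata to know which downstream scripts read the changed table. The sketch covers only the detect, notify, and update steps named in the quote.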