As clever as they are at generating text and images, AI assistants have a blind spot for video. Ask one for the best face cream, and you'll probably get an answer drawn from product reviews, blog posts, and retailer websites. But what if the best explanation of that product lives in a 10-minute demonstration video instead?
Large language models (LLMs) such as ChatGPT, Gemini and Claude remain surprisingly limited in what they can extract from video. They can process transcripts or metadata, but most of the audio and visual context - the products on screen, the demonstrations, the branding, the sequence of events - is lost on them.

French startup Aive believes this blind spot is about to become a much bigger problem as AI assistants become the first stop for finding information online. The company, which recently announced a collaboration with Nvidia around its open-source Nemotron models, says its technology can transform videos into structured knowledge that AI models can understand and cite.
Rather than generating new videos, Aive analyzes existing ones using its proprietary Multimodal Generative Technology (MGT), extracting information from the audio and visual content and converting it into machine-readable knowledge.
"If I'm advertising a cosmetics product through a ten-minute video, today's LLMs mostly see the transcript, if it's available to them," CEO and co-founder Olivier Reynaud said. "They don't really understand what's happening visually. Our technology transforms that video into knowledge that AI can use."

From video editing to AI-readable knowledge
As The French Tech Journal reported in our previous profile of Aive, the company built its business around automating enterprise video post-production, so a single video can be adapted for TikTok, Instagram, LinkedIn and other platforms in different formats and languages.
That remains Aive's core business, but Reynaud believes AI search is opening a new opportunity.
"The way people search is changing," he said. "More and more searches are happening through LLMs rather than traditional search engines like Google."
In practice, Reynaud says, this means an AI assistant could eventually answer a question such as "What's the best face cream?" using information extracted directly from a product demonstration video, rather than relying on text already published elsewhere.

Why Nvidia matters
For Aive, the Nvidia collaboration is about accelerating that ambition, not adding another AI model.
The company has integrated Nvidia's Nemotron models into its own technology, which already combines more than 25 AI models to analyze scenes, objects, products, emotions, speakers and other visual signals before structuring them into data that LLMs can process.
According to Reynaud, this lets brands, broadcasters, and media companies make years' worth of existing video content accessible to AI assistants without having to recreate or manually rewrite it.

A new visibility challenge
Brands, broadcasters and media companies have spent years building vast video libraries, yet most of that content remains invisible to today's AI assistants.
Aive says interest accelerated after it unveiled its video GEO platform at VivaTech, drawing inquiries from brands, television companies and media organizations wanting to make their video content discoverable by AI systems. The company raised €15 million last November to fund the development of the technology and its international expansion.
It is still too early to measure the commercial impact. But Aive is betting that making video understandable to LLMs could become as important as optimizing websites once was for traditional search.
If the bet pays off, the next generation of AI assistants won't just retrieve videos - they'll understand them.