STAMFORD, Conn., July 4, 2025 — Enterprise software is poised for a seismic transformation over the next five years, with 80% of applications projected to be multimodal by 2030 — a dramatic leap from less than 10% in 2024, according to a new prediction from Gartner, Inc.

Driven by the rapid advancement of multimodal generative AI (GenAI), this shift promises to enrich enterprise applications with capabilities once considered unattainable. “The shift to multimodal enterprise software is a fundamental transformation in business operations and innovation,” said Roberta Cozza, Senior Director Analyst at Gartner.

Multimodal GenAI models combine diverse data types — text, audio, images, video, and numerical inputs — within a single generative framework. This enables systems to interpret and respond to complex, real-world contexts far more naturally and effectively than unimodal solutions. For example, a multimodal model could process a video of a production line, analyze sensor data, interpret spoken operator feedback, and then generate a cohesive, actionable insight in real time.

According to Gartner, this integrated approach will prove especially impactful in domains such as healthcare, finance, and manufacturing, where domain-specific language models combined with multimodal reasoning can dramatically improve accuracy, drive contextual decision intelligence, and automate both routine and complex processes.

“Multimodal generative AI will revolutionize enterprise applications by adding previously unattainable features and functionalities,” Cozza explained. “It will enable AI to take proactive actions across tasks, improving the depth and relevance of its decisions.”

Gartner’s latest Emerging Tech Impact Radar places multimodal GenAI at the heart of strategic enterprise technology investment. Product leaders, the report suggests, must move quickly to evaluate and integrate multimodal AI capabilities to deliver heightened business value and competitive differentiation.

Many current generative models handle only two or three modalities, as in text-to-video or speech-to-image systems, but Gartner expects that number to grow substantially in the next few years. As more data types are integrated, enterprise applications will be able to support richer user experiences and a wider range of operational uses.

“Enterprises should focus on integrating multimodal capabilities into their software to enhance user experiences and operational efficiency,” Cozza added. “By leveraging the diverse data inputs and outputs that multimodal GenAI offers, businesses can unlock new levels of productivity and innovation.”

Industry watchers say this multimodal momentum could trigger a new wave of AI-powered disruption, similar to — but potentially even more impactful than — the arrival of large language models over the past two years. As enterprises strive to keep pace with rising user expectations and competitive pressures, Gartner’s forecast signals a clear imperative: those who invest in multimodal GenAI today will be best positioned to shape — and dominate — the enterprise landscape of tomorrow.