Microsoft Introduces New Phi SLMs Trained on NVIDIA GPUs
Microsoft has announced the latest additions to its Phi family of small language models (SLMs): Phi-4-multimodal and Phi-4-mini. Both models were trained on NVIDIA GPUs and point toward efficient, versatile language processing that does not demand frontier-scale hardware.
Advancements in Small Language Models
Small language models have emerged as a practical answer to the resource-intensive nature of large language models (LLMs). While LLMs have dominated the AI landscape, their heavy computational requirements limit their practicality in many real-world applications. SLMs, by contrast, are designed to operate efficiently in constrained environments, making them well suited for deployment on devices with limited resources.
Microsoft’s Phi-4-multimodal model stands out for its ability to process a variety of data types, including text, audio, and images. This multimodal approach opens up a world of possibilities for applications such as automated speech recognition, translation, and visual reasoning. The model’s training, which involved 512 NVIDIA A100-80GB GPUs over 21 days, underscores the sheer computational power required to achieve its impressive capabilities.
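To put the reported training run in perspective, the figures above translate into a simple GPU-hours estimate (a rough back-of-the-envelope calculation, using only the numbers stated in the announcement):

```python
# Reported Phi-4-multimodal training run: 512 NVIDIA A100-80GB GPUs for 21 days.
gpus = 512
days = 21
gpu_hours = gpus * days * 24  # total accelerator-hours consumed

print(f"{gpu_hours:,} GPU-hours")  # 258,048 GPU-hours
```

That is roughly a quarter-million A100-hours — large in absolute terms, yet modest compared to the runs behind frontier-scale LLMs.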
Exploring Phi-4-multimodal and Phi-4-mini
The Phi-4-multimodal model, with 5.6 billion parameters, has already proven its mettle in automated speech recognition, claiming the top spot on the Hugging Face OpenASR leaderboard with a word error rate of 6.14%. This result highlights the model’s potential to advance speech recognition technologies and improve user experiences.
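Word error rate, the metric behind that leaderboard ranking, is the word-level edit distance between a reference transcript and the model’s hypothesis, divided by the number of reference words. A minimal sketch of the computation (this is the standard metric definition, not code from the leaderboard itself):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words:
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.333...
```

A 6.14% WER means roughly one word-level error per sixteen reference words, averaged over the benchmark.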
In addition to the Phi-4-multimodal model, Microsoft has also introduced Phi-4-mini, a text-only model tailored for chat applications. With 3.8 billion parameters and a context window of 128K tokens, Phi-4-mini handles long-form content efficiently. Its training ran on 1,024 NVIDIA A100-80GB GPUs over 14 days and emphasized high-quality educational data and code.
Deployment and Accessibility of Phi SLMs
Both the Phi-4-multimodal and Phi-4-mini models are readily accessible on Microsoft’s Azure AI Foundry, providing a user-friendly platform for designing, customizing, and managing AI applications. Furthermore, users can explore these models through the NVIDIA API Catalog, which offers a sandbox environment for testing and integrating these cutting-edge technologies into a wide range of applications.
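For readers who want to try the models via the NVIDIA API Catalog, the sketch below builds a request against its OpenAI-compatible chat-completions interface. The endpoint URL, the model identifier, and the `NVIDIA_API_KEY` environment variable are assumptions to verify at build.nvidia.com before use:

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint for the NVIDIA API Catalog; verify at
# build.nvidia.com before relying on it.
URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL = "microsoft/phi-4-mini-instruct"  # hypothetical id; check the catalog

def build_request(prompt: str) -> dict:
    """Build the JSON payload for a single-turn chat completion."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def chat(prompt: str) -> str:
    """Send the request; requires an NVIDIA_API_KEY environment variable."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the interface follows the OpenAI chat-completions shape, existing client code can typically be pointed at the catalog endpoint by swapping the base URL, model id, and API key.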
NVIDIA and Microsoft: Partners in Innovation
Beyond training these models, NVIDIA and Microsoft have forged a strategic partnership aimed at optimizing software and models like Phi to enhance AI transparency and support open-source initiatives. This collaboration is poised to drive advances in AI technology across industries such as healthcare and the life sciences, changing the way we interact with AI-powered systems.
For more in-depth insights and updates, be sure to visit the NVIDIA blog and stay tuned for the latest developments in AI innovation.