Introducing MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2: Microsoft Foundry’s Next Leap in AI
Microsoft launches MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 in Foundry, bringing advanced speech and image AI to developers with enterprise-grade reliability and efficiency. Learn what’s new and how to get started.
Microsoft continues to push the boundaries of AI innovation with the public preview of three new models in Microsoft Foundry: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2. These models are designed to empower developers with advanced multimedia AI capabilities, now available for integration into real-world applications via Azure Speech and Foundry.
What’s New?
MAI-Transcribe-1
A first-generation speech recognition model, MAI-Transcribe-1 delivers enterprise-grade accuracy across 25 languages at roughly half the GPU cost of leading alternatives. It’s engineered for reliability across accents, languages, and challenging audio conditions, making it ideal for:
- Real-time transcription for IVR systems, virtual assistants, and call centers
- Live captioning for events and meetings
- Automated media subtitling and archiving
- Education and e-learning platforms
- Customer and market insights through structured data extraction
MAI-Voice-1
This high-fidelity speech generation model can produce 60 seconds of expressive audio in under one second on a single GPU. It powers features like Copilot’s Audio Expressions and podcast tools, and is available for developers to create custom voices (with responsible AI safeguards). Use cases include:
- Conversational AI and agent assist
- Accessibility and live captioning
- Media production and content creation
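To put the throughput claim in concrete terms, here is a minimal sketch that converts the figures quoted above (60 seconds of audio generated in under one second) into a real-time factor and a speedup. The function names are illustrative and not part of any Microsoft API. Because the post only states "under one second", the sketch uses 1.0 second as an upper bound.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; lower means faster."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup_over_real_time(processing_seconds: float, audio_seconds: float) -> float:
    """How many seconds of audio are produced per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Figures from the post: 60 s of audio in under 1 s on a single GPU.
# 1.0 s is an assumed upper bound, so the results are worst-case values.
AUDIO_SECONDS = 60.0
PROCESSING_UPPER_BOUND_SECONDS = 1.0

rtf = real_time_factor(PROCESSING_UPPER_BOUND_SECONDS, AUDIO_SECONDS)
speedup = speedup_over_real_time(PROCESSING_UPPER_BOUND_SECONDS, AUDIO_SECONDS)

print(f"Real-time factor: at most {rtf:.4f}")
print(f"Speedup: at least {speedup:.0f}x real time")
```

Under that assumption, the model generates audio at least 60 times faster than playback speed. Actual figures will depend on hardware, voice, and text length, none of which the post specifies.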
MAI-Image-2
MAI-Image-2 is Microsoft’s most advanced text-to-image model, debuting in the top three on the Arena.ai leaderboard. It excels at generating photorealistic images, rendering text in graphics, and handling complex layouts. It already powers Bing Image Creator, Copilot, and PowerPoint, and is now available for developers to:
- Ideate and visualize creative concepts
- Generate custom visuals for enterprise communications
- Prototype UX and product concepts
Why It Matters
These models are more than technological milestones: they already power Microsoft’s own products and are now accessible to developers everywhere. With lower costs, high efficiency, and enterprise-grade reliability, they enable scalable, production-ready AI solutions.
Getting Started
- Try the models: Experiment with MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 in the MAI Playground
- Build in Foundry: Deploy these models via Azure Speech and Foundry APIs
- Learn more: Official announcement
Microsoft’s ongoing collaboration with NVIDIA and advances in Azure AI infrastructure ensure that these models are supported by world-class hardware and cloud engineering, ready for the most demanding enterprise workloads.