Microsoft takes on AI rivals with three new foundational models
The release signals Microsoft’s continued push to build out its own stack of multimodal AI models — and compete with rival AI labs — even though it remains tied to OpenAI.
MAI-Transcribe-1 transcribes speech in 25 languages into text and is 2.5 times faster than Microsoft’s Azure Fast offering, according to a company press release.
MAI-Voice-1 is an audio-generating model.
It can generate 60 seconds of audio in one second and lets users create a custom voice.
MAI-Image-2 is an image-generating model.
MAI-Image-2 was first released on March 19 on MAI Playground, a new tool for testing large language models.
“At Microsoft AI, we’re building Humanist AI. We have a distinct view when creating our AI models: putting humans at the center, optimizing for how people actually communicate, training for practical use,” Suleyman wrote in the blog post. “You’ll see more models from us soon in Foundry and directly in Microsoft products and experiences.”
MAI-Transcribe-1 starts at $0.
MAI-Voice-1 starts at $22 per 1 million characters, and MAI-Image-2 starts at $5 per 1 million tokens of text input and $33 per 1 million tokens of image output.