Startup Gimlet Labs is solving the AI inference bottleneck in a surprisingly elegant way
Its latest funding round was led by Menlo Ventures.
Gimlet’s software can split an AI app’s work across traditional CPUs and AI-tuned GPUs, as well as high-memory systems. “We basically run across whatever different hardware that’s available,” Asgar told TechCrunch.
That’s what Tully, a partner at Menlo Ventures, believes Gimlet Labs offers.
So Asgar and his cofounders, Michelle Nguyen, Omid Azizi, and Natalie Serrino, set about building orchestration software that slices up agentic workloads so they can be spread simultaneously across all kinds of hardware.
Gimlet Labs claims it reliably speeds up AI inference by 3x to 10x at the same cost and power.
Gimlet says it can even slice the underlying model so that it runs across different architectures, using the best chip for each portion of the model. The company has already partnered with chip makers NVIDIA, AMD, Intel, ARM, Cerebras and d-Matrix. Gimlet’s product, delivered either as software or through an API to its own Gimlet Cloud, isn’t for the rank-and-file AI app developer.
It’s for the largest AI model labs and data centers.
The cofounders had previously worked together at Pixie, a startup that created an open source observability tool for Kubernetes.
(Pixie’s tech is now part of the Cloud Native Computing Foundation, the open source org that oversees Kubernetes.)
After Asgar ran into Tully by chance about a year ago, and after angel investments from Stanford professors, VCs started calling. Shortly after launch, a term sheet landed on Asgar’s desk.
Including its previous seed round, the startup has now raised a total of $92 million, with a slew of angel investors including Sequoia’s Bill Coughran, Stanford professor Nick McKeown, former VMware CEO Raghu Raghuram, and Intel CEO Lip-Bu Tan.
The company currently employs 30 people.
Other investors include Factory, which led the seed round, Eclipse Ventures, Prosperity7, and Triatomic.