Gemma 4 runs on hardware ranging from the NVIDIA Jetson Orin Nano for edge devices to Blackwell GPUs for datacenters, as well as AMD ROCm-compatible GPUs and Google TPUs.
Gemma 4's on-device optimization sets it apart among large multimodal models. The smaller variants (E2B, E4B) can run on an NVIDIA Jetson Orin Nano, enabling deployment on edge devices for local inference without cloud dependencies. For more demanding workloads, the 26B and 31B variants scale across professional and datacenter GPUs.
Gemma 4 supports multiple accelerator platforms beyond NVIDIA: AMD ROCm-compatible GPUs and Google TPUs. This hardware flexibility means you can choose infrastructure based on your cost, availability, and geographic requirements rather than being locked into a single vendor ecosystem.
When building vector search applications with Zilliz Cloud and Gemma 4, your infrastructure topology becomes flexible. You can run Gemma 4 locally or in your own cloud environment to generate embeddings, then securely transmit them to Zilliz Cloud for indexing and search. Zilliz Cloud's managed infrastructure handles the compute-intensive retrieval operations, while your Gemma 4 deployment is optimized for embedding generation.
This separation of concerns simplifies your architecture: your embedding generation infrastructure scales independently from your vector search infrastructure, each optimized for its specific workload.
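The split described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `embed_local` is a hypothetical stand-in for a local Gemma 4 embedding pass (it returns deterministic toy vectors so the flow runs without a GPU), and the Zilliz Cloud endpoint, token, and collection name are placeholder assumptions. The remote path uses the standard pymilvus `MilvusClient` API.

```python
import hashlib

DIM = 8  # toy dimensionality; a real embedding model would produce far larger vectors


def embed_local(texts):
    """Hypothetical stand-in for local Gemma 4 embedding generation.

    Returns one deterministic DIM-dimensional vector per input text so the
    downstream indexing flow can be exercised without model weights or a GPU.
    """
    vectors = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map the first DIM digest bytes into [0, 1] as pseudo-features.
        vectors.append([b / 255.0 for b in digest[:DIM]])
    return vectors


def index_and_search(uri, token, query):
    """Ship locally generated embeddings to Zilliz Cloud for indexing and search.

    `uri` and `token` are your Zilliz Cloud endpoint and API key (assumed here);
    the collection name "docs" is likewise a placeholder.
    """
    from pymilvus import MilvusClient  # deferred import: only the remote path needs it

    client = MilvusClient(uri=uri, token=token)
    client.create_collection(collection_name="docs", dimension=DIM)

    # Embedding generation happens on your infrastructure...
    docs = ["edge inference on Jetson", "datacenter serving on Blackwell"]
    client.insert(
        collection_name="docs",
        data=[{"id": i, "vector": v} for i, v in enumerate(embed_local(docs))],
    )

    # ...while Zilliz Cloud handles the retrieval workload.
    return client.search(
        collection_name="docs",
        data=embed_local([query]),
        limit=1,
    )
```

Because the embedding function and the search client are decoupled, each side can be swapped or scaled independently, which is the separation of concerns described above.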
Related Resources