Google has announced Gemini, its most advanced and versatile AI model yet, capable of understanding and generating different types of information such as text, code, audio, image and video.
Gemini is the result of a large-scale collaboration among teams across Google, including Google DeepMind, its AI research unit. Gemini is designed to be natively multimodal, meaning it can seamlessly operate across and combine different modalities, unlike previous multimodal models, which typically trained separate components for each modality and then stitched them together.
According to Google, Gemini is also its most flexible model, able to run efficiently on everything from data centers to mobile devices. Gemini has been optimized for three different sizes: Ultra, Pro and Nano, each with different capabilities and use cases.
Gemini Ultra is the largest and most capable model, with state-of-the-art performance on many leading benchmarks, including natural image, audio and video understanding, mathematical reasoning, and coding. Gemini Ultra is the first model to outperform human experts on MMLU, a test of world knowledge and problem-solving abilities across 57 subjects, such as math, physics, history, law, medicine and ethics.
Gemini Pro is the best model for scaling across a wide range of tasks, and will be available to developers and enterprise customers via the Gemini API in Google AI Studio or Google Cloud Vertex AI. Gemini Pro will also power Bard, Google’s AI assistant, which will use Gemini for more advanced reasoning, planning, understanding and more.
Gemini Nano is the most efficient model for on-device tasks, and will be available to Android developers via AICore, a new system capability in Android 14, starting on Pixel 8 Pro devices. Gemini Nano will also power new features on Pixel, such as Summarize in the Recorder app and Smart Reply in Gboard.
Google said Gemini will roll out across a range of products and platforms, such as Search, Ads, Chrome and Duet AI, in the coming months. Gemini Ultra, however, will undergo extensive trust and safety checks, including external testing by experts and partners, before it becomes broadly available early next year.
Google DeepMind’s CEO and co-founder, Demis Hassabis, said Gemini is a significant milestone in the development of AI, and the start of a new era for Google as it continues to rapidly innovate and responsibly advance the capabilities of its models.
He said Gemini has the potential to create opportunities and deliver new breakthroughs for people and society, from enhancing creativity and extending knowledge to advancing science and transforming the way billions of people live and work around the world.
Google vs OpenAI
Gemini and OpenAI’s models are both advanced and versatile AI models that can understand and generate different types of information, such as text, code, audio, image and video. However, there are some differences between them, such as:
- Gemini is a single model that can seamlessly operate across and combine different modalities, whereas OpenAI’s offering is split across several models. For example, Gemini can generate text from an image, or generate an image from text, without needing any additional modules or adaptations. OpenAI’s models, such as GPT-4, DALL·E, and TTS, are each focused mainly on one modality, and rely on other models or APIs to handle other types of information.
- Gemini is also more flexible and scalable, as it can run efficiently on everything from data centers to mobile devices. Gemini has been optimized for three different sizes: Ultra, Pro and Nano, each with different capabilities and use cases. OpenAI’s models, such as GPT-4 and GPT-4 Turbo, are mainly designed for cloud-based applications, and require high-performance data-center hardware to train and run.
- Gemini also reports stronger benchmark results. According to Google, Gemini Ultra achieves state-of-the-art performance on tasks such as natural image, audio and video understanding, mathematical reasoning, and coding, and is the first model to outperform human experts on MMLU, a test of world knowledge and problem-solving abilities across 57 subjects, such as math, physics, history, law, medicine and ethics. OpenAI’s models, such as GPT-4 and GPT-4 Turbo, are also very capable, but by Google’s reported results they have not matched Gemini Ultra on these benchmarks.
- Gemini is also more tightly integrated into its maker’s products. It can work with other tools and APIs, already powers Bard, and is set to roll out across Google products and platforms such as Search, Ads, Chrome and Duet AI in the coming months. OpenAI’s models, such as GPT-4 and GPT-4 Turbo, are available to developers and enterprise customers via the OpenAI API, but they are not built into as wide a range of consumer products as Gemini is expected to be.