A New Chapter in the Gen AI Race:

Introducing GPT-4o

17/05/24

The introduction of GPT-4o by OpenAI marks a significant improvement on the previous GPT-4 model, with enhancements in speed, multimodal capabilities, cost-effectiveness and accessibility. Let's delve into these new features and explore the impact of this technological leap forward.

Sebastian Ahrens

AI Center of Excellence Leader, PwC Switzerland

Multimodal Capabilities

GPT-4o stands out as a natively multimodal model. It accepts any combination of text, audio, image and video as input, and can respond in text, audio and images. This omni-capability allows for versatile interactions within a single model. By contrast, GPT-4 focused primarily on text with some level of image handling, and lacked native support for audio input.

Performance and Efficiency

One of the most notable improvements in GPT-4o is its speed. The model offers significantly faster response times compared to GPT-4, even outperforming GPT-4 Turbo. It's designed for real-time interactions, capable of responding to audio inputs in as little as 232 milliseconds, rivalling human response times. GPT-4 was optimised for text processing, but did not emphasise speed and real-time interaction as much as GPT-4o.

Cost-Effectiveness

The introduction of GPT-4o (the "o" stands for "omni") has significantly reshaped the cost landscape for language models. Priced at $5.00 per million input tokens and $15.00 per million output tokens, GPT-4o costs half as much as GPT-4 Turbo, which is priced at $10.00 and $30.00 respectively for the same amounts. This reduction makes GPT-4o a highly competitive choice compared to Google's Gemini models and the models offered through Amazon Bedrock. Google's Gemini 1.5 models remain competitive, especially the Flash version, which is the most economical of the family for input and output tokens. Among the models available on Bedrock, Anthropic's Claude 3 Haiku in particular offers the lowest prices, making it attractive for cost-sensitive applications. The overall effect is a more competitive market where users can choose from a variety of high-performance models at different price points, which enables broader adoption and more diverse use cases.
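To make the price difference concrete, the per-token prices quoted above can be turned into a cost estimate with simple arithmetic. The following is a minimal Python sketch: the prices are the ones stated in this article, while the model keys, the function name and the monthly token volumes are hypothetical values chosen purely for illustration.

```python
# Prices in USD per one million tokens, as quoted in this article (May 2024).
# The dictionary keys are illustrative labels, not official API identifiers.
PRICES_PER_MILLION = {
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token volumes."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly workload: 200M input tokens, 50M output tokens.
    monthly_input, monthly_output = 200_000_000, 50_000_000
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, monthly_input, monthly_output)
        print(f"{model}: ${cost:,.2f} per month")
```

Because both the input and the output price are halved, the estimated bill for GPT-4o is half that of GPT-4 Turbo for any mix of input and output tokens.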

Accessibility and User Experience

GPT-4o brings GPT-4-level intelligence to all users, including those on the free tier of ChatGPT. This broad accessibility means that more users can benefit from its advanced features without needing a paid subscription. By contrast, access to GPT-4 in ChatGPT required a paid subscription such as ChatGPT Plus.

Language and Vision Capabilities

GPT-4o boasts improved language support, offering better performance in non-English languages and enhanced vision capabilities for analysing images and video content. This makes it a more globally accessible and versatile tool. GPT-4 provided advances in language processing over its predecessors, but did not have the same level of support for non-English languages or the advanced vision capabilities of GPT-4o.

Conversational Abilities

GPT-4o supports more natural and conversational interactions, allowing users to interrupt the model, share emotions and engage in a more human-like dialogue. It also introduces voice and video interactions for a more immersive experience. While GPT-4 improved conversational abilities over GPT-3.5, it lacked the real-time, multimodal interaction capabilities and the ability to process emotional cues as effectively as GPT-4o.

Drive for Efficiency and Sustainability

In the race towards artificial general intelligence (AGI), efficiency and sustainability are paramount. OpenAI, like other foundation model providers, is focused on making computation more efficient in order to lower per-token prices. Several techniques contribute towards this goal: the mixture-of-experts approach, in which only a fraction of the model's parameters are active for each token during inference; intelligent quantisation, which stores weights at lower numerical precision; and other novel methods. These innovations not only make AI more cost-effective but also align with the imperative to reduce CO2 emissions and promote green technologies.
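The mixture-of-experts idea mentioned above can be illustrated with a toy example. The following Python sketch assumes a simple top-k gating scheme, one common way of routing inputs to experts; it is not a description of how OpenAI or any other provider implements its models. All names, weights and the trivial "experts" are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Toy expert: multiplies every input value by a fixed factor."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, k=2):
    """Route x to the k highest-scoring experts and mix their outputs."""
    # One gate score per expert: dot product of its gate row with the input.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    # Only the top-k experts are evaluated; the rest stay inactive.
    active = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in active])
    output = [0.0] * len(x)
    for weight, idx in zip(weights, active):
        expert_out = experts[idx](x)
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output, active


if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    gate_weights = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
    output, active = moe_forward([1.0, 0.0], gate_weights, experts, k=2)
    print("active experts:", active, "of", len(experts))
    print("output:", output)
```

In this sketch only two of the four experts run for a given input, so the compute per input scales with k rather than with the total number of experts. That is the source of the efficiency gain.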

A Glimpse into the Future

It is truly mind-boggling to remind ourselves that the concept of a universal translator, which was envisioned in the Star Trek universe as a handheld device with a keypad and display invented shortly before 2151, has become a reality with the latest generation of models. We can now ask a question in English on our smartphones and receive real-time translations into languages like Japanese or Swahili. Such advances were unimaginable just five years ago.

In summary, GPT-4o represents a leap forward in AI technology, offering faster, more cost-effective and versatile capabilities that enhance user experience and accessibility. Its multimodal nature, combined with improvements in language and vision processing, sets a new standard for what AI models can achieve, bringing us closer to the future we once only dreamed of.
