OpenAI and Google announced their latest AI offerings this week. For Google, this was part of the Google I/O developers conference, while OpenAI had their "Spring Update" one day before Google's event. Now that both announcements are done, we can compare where these two giants stand in the AI race.
I've divided my analysis into different product categories: Free chatbots, desktop apps, platform integration, real-time assistants, etc. Note that some of the features demonstrated in this week's events are available now, some will be rolled out gradually over the next few months, and some are research projects with no target date.
Free AI Chatbot
In my view, the most significant announcement by OpenAI was that they are making GPT-4o, their latest Large Language Model, available for free to all ChatGPT users. Previously, the free version of ChatGPT was based on the less powerful and completely outdated GPT-3.5 LLM. But now, ChatGPT users can access GPT-4o for free, and also enjoy the advanced tools that were previously available only to paid users: file upload, code Interpreter (which can run Python code to process files and analyze spreadsheets), and custom GPTs.
Google announced this week that it has integrated the Gemini 1.5 Pro model into Gemini Advanced, the paid version of the Gemini AI chatbot. This version has a one million token context window, so users can upload 1500-page documents and Gemini will process them in one take. But the free version of the Gemini chatbot is still based on the Gemini 1.0 Pro LLM, and doesn't support file uploads. So in this category, OpenAI is the clear winner.
Desktop App
OpenAI launched a new desktop app, initially on macOS for paying users. In the desktop app you can use the voice conversation feature, which is currently available on the mobile app (but not in the web version). This feature has a delay of 2-3 seconds, but in the future OpenAI plans to integrate the low-delay conversational mode (see below) into the app, as well as video capture. So imagine that while working on your computer, you can talk to ChatGPT about software, documents and websites which are open on your screen. This is a real game changer...
Gemini has a mobile app for Android only, and Google didn't announce any plans for a desktop app, So on this one - another win for OpenAI! But, Gemini is integrated into the Chrome browser - just write @gemini in the address bar followed by a prompt, and Gemini will launch with your prompt already loaded.
Platform Integration
This week Google announced that Gemini is being integrated into Search, Android, YouTube, and Google Photos. In Search, AI Overviews provide summaries in response to your search queries, and you can customize their length and tone. The Android integration lets Gemini view your phone’s screen and help you using that context. Gemini also listens to your calls and alerts you if it detects spam calls or scams. When watching YouTube educational videos, you can ask questions, get explanations and answer quizzes on the topic presented in the video. Ask Photos adds a natural language conversational search experience for Google Photos, so you can say, for example: “Show me the best photo from each national park I’ve visited”, or “What themes have we had for Lena’s birthday parties?”.
Integrating an AI chatbot directly into existing web services and mobile platforms can only be done if you own them, and OpenAI obviously doesn't... So a clear win for Google on this one. The only chance for OpenAI to compete in this area is by signing a deal with Apple to integrate ChatGPT directly into iOS (and perhaps macOS as well). There are a few rumors pointing in this direction, but we'll have to wait for Apple's developer conference which begins on June 10th to find out.
Real-time Assistant
One of the major highlights of OpenAI's demo was the "real-time assistant": A low-delay voice conversation with ChatGPT, where ChatGPT's voice features enhanced emotion and sounds very natural. OpenAI also demonstrated the vision capabilities of the new model, by asking for help in solving an equation that was written on paper and captured on live video.
A day later, Google demoed Project Astra with very similar features: combining live video from the phone's camera with a real-time voice conversation. Google's demo looked more impressive, and Google promised that these capabilities will be introduced later this year to Google's products as "Gemini Live". OpenAI's capabilities seem more immediate, but we'll have to wait and see...
Custom GPTs
Through my work with customers on Generative AI Transformation projects, I have found that OpenAI's Custom GPTs are invaluable productivity tools in many departments. You can upload a few documents, define a "system prompt", and create a no-code, customized AI chatbot that is tailored to your exact needs. For example: A Policy Bot, a Persona Generator or a Customer Support Assistant. This week OpenAI announced that Custom GPTs will be available to free users as well - they will be able to discover GPTs on the GPT store and use them, but they won't be able to create them using the GPT Builder.
Google announced Gems, which are custom versions of Gemini, but they are not available yet - they said that they are "coming soon" to Gemini Advanced subscribers. So here again OpenAI has a lead, especially considering the fact that their GPT store already had 3 million GPTs as of January of this year.
Text to Video
Gemini announced Veo, a text-to-video model which is their answer to OpenAI's Sora. Both proudcts have not been released yet, and are available only to a select number of creative professionals. Google showed a limited demo of Veo on YouTube, and it seems to be lower quality than Sora. OpenAI on the other hand released dozens of video in full quality for download when Sora was launched back in February, so my overall impression is that Sora is more mature.
Video Analysis
Gemini 1.5 Pro, which is available in Google AI Studio supports analyzing videos of up to one hour in length. Gemini 1.5 Pro now is now available to paying users of Gemini Advanced, and supports file upload, but doesn't support video. Google also demoed search with video, which lets users upload a video and then search for information about its content.
OpenAI's real-time assistant uses real-time video in addition to voice, and there is one demo on their website of analyzing a presentation video (speakers with slides), but they didn't show a demo of analyzing an action video. So here the win goes to Google - their video analysis feature is really amazing, as I've shown in this LinkedIn post.
Autonomous Agents
Autonomous AI Agents are the next step in the evolution of AI. They get a high level task, devise a plan to perform it, and execute on that plan. Google's CEO Sundar Pichai showed some early "experiments" which Google is conducting with AI agents, such as returning a pair of shows: Gemini finds the receipt in your email, fills the return form on the vendor's website, and even schedules the pickup in your calendar. Google also showed a demo of agent-like functionality in Google Workspace - finding receipts in your email and putting them in a new Google Sheets document. Another interesting demo was an AI teammate that collaborates with you and your team on Google Workspace.
OpenAI, on the other hand, didn't mention agents at all in their presentation. So this one goes to Google, at least for the vision.
Summary
It's been a very hectic week in AI, with a head-to-head battle between OpenAI and Google on new AI products and features. I call it a tie: OpenAI is leading with the most feature-rich free product, the only desktop app, a mature custom chatbot offering, and higher quality text-to-video. Google is leading with its platform integration, real-time assistant capabilities, video analysis and its vision for future autonomous agents. And if you want to know which one of them to use today - just ask them!
Comments