Emerging Signs of the "AI Platform Shift"
OpenAI, Google, and Microsoft events shine light on the current state of Generative AI
There has been a barrage of AI updates from some of the biggest players in the market over the last two weeks, with OpenAI, Google, and Microsoft each hosting events. Each event tells a story of that company's AI journey and where the industry is right now, especially in light of the "AI platform shift" narrative. This turned into a longer piece than I expected, so feel free to skip to the end for the TL;DR.
OpenAI “Spring Updates”
OpenAI announced significant updates to ChatGPT and the GPT-4 family of models, namely introducing GPT-4o (the "o" is for "omni," apparently). The event was a different format than we have seen in the past from OpenAI (especially without Sam Altman presenting), and I encourage you to watch the replay.
GPT-4o is not a completely new model (i.e., GPT-5), but instead a significant update to the GPT-4 model. While there was certainly speculation (and perhaps an expectation) that a brand new model was going to be released (GPT-4 launched in 2023), this update is quite significant. We tend to get caught up in the technical nature of the LLM itself (especially model size), but this was truly a product update rather than just a model update. Speed was certainly the key theme on display here.
For example, Voice Mode had an average latency of 2.8 seconds with GPT-3.5 and 5.4 seconds with GPT-4. With GPT-4o, the model can complete a conversational interaction in as little as 232 milliseconds. OpenAI covers more details around this speed increase in their blog post here. Outside of the model performance, the previous ChatGPT Voice Mode user experience was quite clunky, as it required you to fumble through the app. That iteration was good enough to show the vision, but this update brings it much closer to a compelling product. There are certainly still issues with the model responses and the model "personality" (including a mini-scandal with Scarlett Johansson over similarities to her voice), plus a glitchy demo (even with wired connectivity). I'm willing to overlook these shortcomings for now as a fairly significant step change toward a true assistant that you can actually talk with naturally. We aren't quite at Her-level interactivity or anything resembling AGI (yet).
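To put those latency numbers in perspective, a quick back-of-the-envelope comparison (using only the figures quoted above) shows roughly an order-of-magnitude-or-better reduction in response time:

```python
# Voice Mode latencies as quoted above (in seconds)
latencies = {"GPT-3.5": 2.8, "GPT-4": 5.4, "GPT-4o": 0.232}

# Relative speedup of GPT-4o versus the older Voice Mode pipelines
for model, seconds in latencies.items():
    if model != "GPT-4o":
        speedup = seconds / latencies["GPT-4o"]
        print(f"GPT-4o responds ~{speedup:.0f}x faster than {model} Voice Mode")
```

That works out to roughly 12x faster than GPT-3.5 Voice Mode and 23x faster than GPT-4 Voice Mode, which is the difference between an awkward pause and a natural conversational turn.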
The other notable update is that this is available to all free users. The vast majority of ChatGPT users are still on the free tier and have been using GPT-3.5 Turbo up until now. This is an enormous leap forward for those users, and it speaks to OpenAI's confidence in its ability to scale GPT-4o and its infrastructure. While inference costs are slowly going down, I'm particularly interested in seeing how OpenAI will be able to continue to financially support the free tier. There also does not seem to be a great value proposition for the paid tier now, which is probably a signal that a significant update is coming soon for paid users (perhaps GPT-5). OpenAI is making good progress in the enterprise space, but seemingly still needs to figure out the consumer business model. I can't imagine this can be the loss leader for the company forever.
Google I/O
Google’s annual developer conference came just a month after the company’s enterprise keynote, but unfortunately fell a bit flat coming on the heels of the OpenAI update.
Starting with the positives, all of the updates to infrastructure and existing products were quite compelling and are shipping in the coming weeks or months. Google’s approach to integrating AI with data inside existing products or environments was particularly impressive. A few announcements that I found interesting:
Gemini: Significant updates to Gemini 1.5 Pro and a new variant, Gemini 1.5 Flash, are available immediately. The 1M token context window is already incredibly powerful, and the company announced that a 2M token context window will be available in preview for both model variants. Gemini 1.5 Flash promises to be both faster and 1/10 the cost, which should be a boon for quite a few enterprise use cases.
Search: Updates that include Gemini-created “AI overviews” and grouping of results will be rolling out in the next few weeks in the US. I’ve been impressed using the Search Generative Experience (SGE) preview for a while now. Extending SGE with the announced multi-step reasoning, Search Engine Results Page (SERP), etc. is a nice improvement that doesn’t feel like nifty features haphazardly bolted onto search. I applaud the confidence to roll this out broadly. It also exemplifies the infrastructure advantage Google has to do this at scale. Search continues to be Google’s core business, and they don’t seem to be afraid to continue to evolve the product in a meaningful way.
Gmail: The demo with Gemini was something I could see myself using immediately. Impressively, everything in the demo happens within Gmail and the mobile app to summarize and compile documents with all of the data that resides within Gmail. Compiling a variety of documents, emails, and other attachments is a real-life use case I have on a daily basis in both my personal and professional life. Doing this kind of thing on a tiny smartphone screen would be challenging at best. I believe some of this is still done in the cloud, which also speaks to the importance of vertical integration there.
Android: This demo was impressive, with Gemini working at the system level in the device to do a variety of tasks like searching a PDF without having to jump between apps. Technically impressive and incredibly powerful from a usability standpoint. I would expect Apple to announce some similar capabilities in iOS at WWDC, but I don't imagine they will be as impressive (yet).
Generative Media: This segment showed some great promise around music production that actually facilitates the creative process instead of just regurgitating training data like many other tools out there.
On to the negatives: much of the rest of the event was a series of hand-wavy concept videos. Project Astra is a very similar concept to what OpenAI actually demoed, but the Google project has a rollout planned for some point later this year. Similarly, the AI Agents part of the presentation was barely more than a vision (unlike what Microsoft demoed this week). Google is famous for showing off concept videos at its events, but when you see a live demo like OpenAI's (flaws and all) the same week, the message doesn't seem as compelling. Personally, I would not have included any of the products that aren't shipping in 2024 in the presentation.
Google’s strength, as evidenced by the product updates shipping soon, lies in its engineering prowess and the infrastructure capability to create performant, integrated systems at scale. The enterprise keynote last month truly showed that off, but this presentation veered too far into concept, especially when compared to OpenAI's and Microsoft's presentations. To that point, actual new product announcements were curiously absent from I/O. LINK
Microsoft Windows AI & Build
In my opinion, Microsoft had the most impressive announcements this month. The company covered a myriad of areas from hardware to services, most of which are either available now or shipping soon.
Copilot+ PCs: This is an enormous step forward in supporting AI workloads in the Windows ecosystem, especially with on-device inference and native integrations. These new machines are Arm-based, utilizing the new Qualcomm Snapdragon X chips instead of x86 Intel chips (though Intel and AMD chips will apparently be in other devices moving forward). Microsoft announced their own devices (a new Surface and a tablet) and those from many OEMs you would expect (including Dell, ASUS, Samsung, Acer, Lenovo, and HP). The shift from x86 to Arm is a huge signal, and perhaps a sign of the slow death of x86 computing. Microsoft has been attempting to support Arm for around a decade at this point, but now seems to be the time to shift, akin to Apple's transition a few years ago (especially if Qualcomm can scale up more and developers jump to support Arm-native apps). Speaking of Apple, the comparison to Apple's MacBook Air was very direct, including a head-to-head live demo as part of the keynote with the Surface machine clearly the winner (Microsoft claims it is 58% faster than the Air M3). On that note, interestingly, the current generation Air does not qualify as a Copilot+ device because its neural processing unit (NPU) is not performant enough. I haven't seen any of the devices hands-on yet, but they look great and make me excited about Windows hardware again. The tablet with the removable keyboard looks particularly well designed, and in many ways is what I wish Apple had done with the Magic Keyboard for the iPad Pro. LINK
Recall & Windows Semantic Index: A new feature within Windows that provides an AI-powered photographic "memory," letting users access basically everything they may have seen on the device. Importantly, this is all processed on-device (hopefully reducing the privacy risk), and it is an example of why this new generation of hardware with deeper software integration is needed. I'm excited to test this feature to see if it's actually useful. LINK
Copilot: Major updates to the generative AI services within the 365 suite and Teams that include features like note-taking, chat moderation, agents, and contextual Q&A. The agent experience seems to be quite useful, and the examples provided (from data entry to more complicated tasks like project management) are all real improvements to the working life of many professionals. Microsoft said this won't completely displace jobs, but will instead simply take away these menial tasks. While I tend to agree most roles won't be 100% impacted here, there is so much swivel-chair work still out there in every organization that could simply go away if this works better than most other automation today. I'll be excited to evaluate it when it hits Copilot Studio later this year. LINK
Azure AI & Models: Microsoft announced an update to the Phi-3 AI model family called Phi-3-vision. This new model is multi-modal with a focus on reading text and interrogating images, but importantly it is small enough that it should work on mobile devices. Lots of use cases need this type of functionality, and it seems to further validate the need for models of all sizes. The company also announced updates to the Azure OpenAI Service, including availability of GPT-4o, fine-tuning for GPT-4, the Assistants API, etc. All of these are nice updates to what is already a compelling offering through Azure. LINK
Advanced Paste: New feature within the PowerToys suite inside Windows 11 that allows you to convert items within your clipboard and invoke OpenAI through a prompt window. To use the AI function, you need to have credits in your OpenAI account. The feature itself is somewhat interesting, but for me the bigger takeaway is the signal of how pricing models may work for AI-enabled features. LINK
Microsoft has been positioning itself as the AI platform player, and with the huge steps forward in its hardware portfolio and tighter integration with its software and services, that vision seems much closer to reality. Many of these announcements are Microsoft first-party offerings, but the partnership components (OEMs for the Copilot+ hardware, OpenAI, Khan Academy, etc.) and positioning should enable a significant scale and reach advantage that Google and Apple haven't quite achieved yet with their respective approaches.
Apple rumors
The UX barrier for third-party AI assistants, even with the GPT-4o update, is that you still have to unlock your phone and launch an app to interact with them. Apps also don't necessarily have any deeper integration with the other data or services on the device. Apple has been not-so-subtly focused on readying its hardware for all things AI, especially on-device inference, but still needs a partner to provide the LLM and "killer apps," given the state of its in-house models.
There have been many rumors that Apple was negotiating a deal with Google to provide Gemini within the iOS ecosystem (especially in light of the search partnership), but now it seems OpenAI is the clear frontrunner. This could be a move to keep OpenAI closer and avoid the medium-to-long-term competition around hardware that has been rumored. It could also be a clever move, much like the Google search partnership, to shield Apple from any potential issues with the model itself (like hallucinations, privacy, etc.). It could be all of the above. I'm not sure OpenAI, especially given its myriad of controversies, is the best fit for a risk-averse company like Apple. If not Google, Microsoft personally feels like the more stable, mature bet. It is unclear whether Microsoft even wants to look at that kind of partnership, even though they have a significant blind spot in the mobile space.
Mark Gurman wrote a piece this week about what we may expect from the WWDC announcements, including some nifty capabilities across the core apps within iOS and macOS including software to route specific tasks to process either on-device or in the cloud. This all sounds like catch-up so far.
Key questions come to mind already: What is the business model and structure for this kind of partnership? Is this some sort of revenue share? Does Apple open up the ability to select the “default” LLM/assistant outside of Siri?
I am expecting to see some big announcements at WWDC this summer to help shed light here, but I would not bet on anything truly transcendent. I can't remember a WWDC with higher expectations, especially given the competitive narrative these days. LINK
Key takeaways on the “AI Platform Shift” [TL;DR]
In historical platform shifts, there has been fundamental technological innovation followed by a distribution shift that together enables a reorientation of markets and value chains. In the mobile example, websites initially turned into apps, but real disruption didn't happen until companies did things only possible with everything else that came with mobile devices (e.g., sensors, GPS, cameras, connectivity) applied to specific markets (e.g., Uber and the taxi industry). That distribution and value-aggregation shift usually takes time to manifest.
There are phases to the process, and we are still in the early innings of the Generative AI story despite over a decade of machine learning. Generative AI is certainly here to stay, and is becoming a part of every software application and modern tech product. We seem to be moving further into the phase where we’re better understanding the technology and what it may be useful for. Incumbents are incorporating features into current products and continuing to exert their existing strengths in infrastructure, scale, and distribution. Those incumbents are attempting to build the foundational integrated platform components that likely allow them to capture a significant part of the value chain for now. Upstarts haven’t quite cracked how to unbundle pieces of the existing value chain yet, but that will likely be part of the next phase here.
Here are some of my key takeaways from the announcements this week:
Model performance: Multi-modal and general purpose model performance is important to many of the current use cases. Smaller, more efficient models seem to be finding an important place within these systems as well.
Speed: Speed is becoming the critical differentiator, especially for the general-purpose, conversational AI "killer apps." This speed does not come simply from the underlying model itself, though the model is certainly an important part of the system.
Infrastructure & scale: Incumbents are continuing to leverage existing infrastructure and scale strength to great effect. Existing reach and distribution is helping solidify positions by introducing AI in a meaningful way into existing scaled products.
Hardware integration: A new generation of end user hardware is enabling more integrated, on-device AI experiences. This tighter coupling at the device level in concert with cloud services should enable more of a true “platform” and ownership of a significant portion of the value chain for now.
Partnerships: Upstarts are pushed to partner, even if the product or technology is truly differentiated, as the "platform" becomes more vertically integrated by the incumbents. Applications like chatbots don't have to live within the "platform" per se (and can likely be standalone apps), but being embedded into the core platform could provide competitive advantages.
In light of these themes, each company is in their own place at the moment:
OpenAI: One of a few traditional "startup" examples (hard to call a company with an $86B valuation a startup) in the space today, with a compelling set of unique offerings. It seems partnerships may be the short-to-medium-term pathway to continued growth.
Google: Seemingly forging their own path across the entire AI ecosystem with a limited partner ecosystem. The quintessential vertical integrator leveraging their infrastructure and scale strengths.
Microsoft: The most compelling provider in the market today that leverages infrastructure, scale, first-party offerings, and meaningful partnerships. The developer experience seems to be the most “complete” to simply build your app within their platform. However, Microsoft is still missing a mobile angle.
Apple: The foundational elements of a platform are there, but it seems evident partnerships will be needed. The gauntlet has been thrown down to catch up, with significant pressure on this year's WWDC to shed light on the vision to do so.
The technology shift around AI is happening at a breakneck pace, but there is still a long road ahead before we can truly tell the magnitude of the platform shift. Right now, Microsoft seems to be in the best position among the incumbents to navigate this shift and capture the majority of the value. There are still some key questions to be answered:
How much better do models get?
Does a competitive moat reside with the models or model providers themselves?
Can you have a new platform if you don’t have hardware?
Can new business models truly emerge?
Are there network effects?
Can you aggregate distribution in a meaningful way?