This week, I take a look at the surprisingly strong state of Google, Meta’s new chief AI researcher, and more. If you haven’t already, be sure to check out this week’s Decoder episode about deepfakes and where they are headed.
Also, do you use an AI coding tool like Cursor or GitHub Copilot? I’d love to know what works and what doesn’t…
After spending time with Google executives during the company’s I/O conference in May, it was clear that they were feeling confident. Now, I’m beginning to see why.
ChatGPT is not making Google Search obsolete. If anything, AI is making Google stronger than before.
During Google’s earnings call this week, CEO Sundar Pichai announced that AI Overviews in search results “are now driving over 10 percent more queries globally for the types of queries that show them, and this growth continues to increase over time.” Put simply, when Google works like ChatGPT, people use it more. Pichai noted that this is particularly true for younger people, a demographic with which the 10-blue-links version of Google had long been losing relevance.
ChatGPT doesn’t appear to be curbing the growth of the Gemini app, either. Pichai said that daily prompts to Gemini increased by over 50 percent from the previous quarter. Gemini now has more than 450 million monthly users, up from 350 million in March. Google processed nearly a quadrillion AI tokens across all its products last month, which is more than double the number it processed in May.
Another telling sign of confidence has been Google’s reaction to the AI talent wars. “I look at both our retention metrics as well as the new talent coming in, and both are healthy,” Pichai said on the earnings call. “I do know individual cases can make headlines. But when we look at numbers deeply, I think we are doing very well through this moment.”
While Mark Zuckerberg has managed to poach talented researchers from DeepMind, my sources say that Pichai and Demis Hassabis have been resistant to bidding wars and amenable to letting most people go. Contrast this with the mood at OpenAI, where research chief Mark Chen compared Meta’s poaching to the feeling of a home invasion.
There’s an industry-wide belief that DeepMind’s bench is deep enough to withstand defections and that the company can quickly make reverse acquihire moves, such as its recent Windsurf deal, as more AI startups seek refuge from the money-intensive game that only Big Tech seems capable of truly playing.
“Meta right now is not at the frontier,” Hassabis said in an interview with Lex Fridman this week. “Maybe they’ll manage to get back on there, and it’s probably rational what they’re doing from their perspective because they’re behind and they need to do something.” Implicit in that statement is the idea that Google is operating from a position of strength in the AI race, a notion with which all the major players I’ve spoken with privately agree.
Google is by no means unassailable. GPT-5 is coming soon and could blow past Gemini. ChatGPT is the Kleenex of chatbots, and that doesn’t appear to be changing anytime soon. Meanwhile, Google is sending fewer clicks to websites, which threatens the give-and-take model that has fueled its business to this point. There’s a real chance that Google’s business may be broken up by the US government. At the very least, it will probably have to stop paying Apple for default status on the iPhone.
Even still, AI so far isn’t the threat to Google’s business that many thought it would be. Instead, it’s increasingly looking like Google is stronger than ever.
– Microsoft CEO Satya Nadella in a public memo to employees acknowledging recent layoffs and the company’s climbing stock price.
– Anthropic CEO Dario Amodei in a leaked message to employees about seeking funding from the Middle East.
– Incoming OpenAI exec Fidji Simo subtly laying the groundwork for ads in ChatGPT.
– Nvidia CEO Jensen Huang on the All-In podcast.
– Windsurf’s second hire, Prem Qu Nair, on X, describing the way Google hired away the startup’s core engineering team.
Some interesting career moves this week:
More to click on:
If you haven’t already, don’t forget to subscribe to The Verge, which includes unlimited access to Command Line and all of our reporting.
Let me know if you have thoughts on this issue or a good story about the AI talent wars. You can respond here or ping me securely on Signal.
Thanks for subscribing.
If you’re like me, then lately you’ve scrolled past something on social media and thought, “Wait, was that real?” Deepfakes are everywhere, and they’re getting a lot more convincing.
That brings me to my Decoder guest today: Gaurav Misra, the CEO of Captions. You may not have heard of Captions yet, but you’ve probably seen a video that was generated using its AI models. The company’s Mirage Studio platform lets anyone generate AI versions of real people, and the results are alarmingly realistic.
Captions just put out a blog post titled, “We Build Synthetic Humans. Here’s What’s Keeping Us Up at Night.” It’s a good overview of the state of deepfakes and where they’re headed.
Gaurav is the CEO of a company building deepfake technology, so I wanted to know what specifically keeps him up at night, which you’ll hear us get into. I’m generally more optimistic about the long-term impacts of AI than a lot of people, but as you’ll hear in this conversation, I’m a lot more nervous about this topic.
Ultimately, I came away from this episode unsettled by the fact that the deepfakes of today are the least believable they’ll ever be, we are not ready, and the companies building this tech are racing ahead anyway.
If you’d like to read more on what we talked about in this episode, check out the links below:
Questions or comments about this episode? Hit us up at decoder@theverge.com. We really do read every email!
Happy Friday. I’m back from vacation and still getting caught up on everything I missed. AI researchers moving jobs is getting covered like NBA trades now, apparently.
Before I get into this week’s issue, I want to make sure you check out my interview with Perplexity CEO Aravind Srinivas on Decoder this week. It’s a good deep dive on the main topic of today’s newsletter. Keep reading for a scoop on Substack and more from this week in AI news.
So far, when most people think of the modern AI boom, they think of a chatbot like ChatGPT. Now, it’s becoming increasingly clear that the web browser is where the next phase of AI is taking shape.
The reason is simple: the chatbots of today don’t have access to your online life like your browser does. That level of context — read and write access to your email, your bank account, etc. — is required if AI is going to become a tool that actually goes off and does things for you.
Two recent product releases point to this trend. The first is OpenAI’s ChatGPT Agent, which uses a basic browser to surf the web on your behalf. The second is Comet, a desktop browser from Perplexity that takes it a step further by allowing large language models to access logged-in sites and complete tasks on your behalf. (OpenAI is rumored to be planning its own full-fledged browser.)
Neither ChatGPT Agent nor Comet works reliably at the moment, and access to both is currently gated to expensive subscription tiers because of the high compute costs of the reasoning models they rely on. Perhaps most frustratingly, both products claim to do things they can’t, not just in marketing materials, but in the actual product experience.
ChatGPT Agent is a read-only browser experience — it can’t access a logged-in site the way Comet can — and that severely limits its usefulness. It’s also very slow. My colleague Hayden Field asked it to find a particular kind of lamp on Etsy, and ChatGPT Agent took 50 minutes to come back with a response. It also failed to add items to her Etsy cart, despite claiming it had done so.
While Comet is nowhere near as slow, I’ve had numerous experiences with it claiming it has completed tasks it hasn’t, or stating it can do something, only to immediately tell me it can’t after I make a request. Its sidecar interface, which places the AI assistant to the right of a webpage, is excellent for read-only tasks, such as summarizing a webpage or researching something specific I’m looking at. But as I told Perplexity CEO Aravind Srinivas on Decoder this week, the overall experience feels quite brittle.
It’s easy to be a cynic and think the current state of products like Comet is the best AI can do at completing tasks on the web. Or, you can look at the last few years of progress in the industry and make the bet that the same trend line will continue.
During our chat this week, Srinivas told me he’s “betting on progress in reasoning models to get us there.” OpenAI built a custom reasoning model specifically for ChatGPT Agent that was trained on more complex, multi-step tasks. (The model has no public name and isn’t available via an API.)
Even with the many limitations and bugs that exist today, using Comet for just a few days has convinced me that the mainstream chatbot interface will merge with the browser. It already feels like taking a step back to merely prompt a chatbot versus interacting with a ChatGPT-like experience that can see whatever website I’m looking at. Standalone chatbots certainly aren’t going away, especially on smartphones, but the browser is what will unlock AI that actually feels like an agent.
Some noteworthy career moves:
More to click on:
If you haven’t already, don’t forget to subscribe to The Verge, which includes unlimited access to Command Line and all of our reporting.
As always, I welcome your feedback, especially if you have thoughts on this issue or a story idea to share. You can respond here or ping me securely on Signal.
Thanks for subscribing.
Hello, and welcome to Decoder! I’m Alex Heath, deputy editor at The Verge and author of the Command Line newsletter. I’m hosting our Thursday episodes while Nilay is out on parental leave.
Today, we’re talking about how AI is changing the way we use the web. If you’re like me, you’re probably already using apps like ChatGPT to search for things, but lately I’ve become very interested in the future of the web browser itself.
That brings me to my guest today: Perplexity CEO Aravind Srinivas, who is betting that the browser is where more useful AI will get built. His company just released Comet, an AI web browser for Mac and Windows that’s still in an invite-only beta. I’ve been using it, and it’s very interesting.
Aravind isn’t alone here: OpenAI is working on its own web browser, and there are other AI-native web browsers out there, like Dia. Google, meanwhile, may be forced to spin off Chrome if the US Department of Justice prevails in its big antitrust case. If that happens, it could provide an opening for startups like Perplexity to win market share and fundamentally change how people interact with the web.
In this conversation, Aravind and I also discussed Perplexity’s future, the AI talent wars, and why he thinks people will eventually pay thousands of dollars for a single AI prompt.
I hope you enjoy this conversation as much as I did.
This interview has been lightly edited for length and clarity.
Alright, Aravind, before we get into Comet and how it works, I actually want to go back to our last conversation in April for my newsletter Command Line. We were talking about why you were doing this, and you told me at the time that the reason we’re doing the browser is, “It might be the best way to build agents.”
That idea has stuck with me since then, and I think it’s been validated by others and some other recent launches. But before we get into things, can you just expand on that idea: Why do you think the browser is actually the route to an AI agent?
Sure. What is an AI agent? Let’s start from there. A rough description of what people want out of an AI agent is something that can actually go and do stuff for you. It’s very vague, obviously, just like how an AI chatbot is vague by definition. People just want it to respond to anything. The same thing is true for agents. It should be able to carry out any workflow end to end, from instruction to actual completion of the task. Then you boil that down to what does it actually need to do it? It needs context. It needs to pull in context from your third-party apps. It needs to go and take actions on those third-party apps on your behalf.
So you need logged in versions of your third-party apps. You need to access your data from those third-party apps, but do it in a way where it doesn’t actually constantly ask you to auth again and again. It doesn’t actually need your permission to do a lot of the things. At the same time, you can take over it and complete the things when it’s not able to do it because no AI agent is foolproof, especially when we are at a time when reasoning models are still far from perfection.
So you want this one interface that the agent and the human can both operate in the same manner: their logins are actually seamless, client-side data is easy to use, and controlling it is pretty natural, and nothing’s going to truly be damaging if something doesn’t work. You can still take over from the agent and complete it when you feel like it’s not able to do it. What is that environment in which this can be done in the most straightforward way without creating virtual servers with all your logins and having users worry about privacy and stuff like that? It’s the browser.
Everything can live on the client side, everything can stay secure. It only accesses information that it needs to complete the task in the literal same way you access those websites yourself, so that way you get to understand what the agent is doing. It’s not like a black box. You get full transparency and visibility, and you can just stop the agent when you feel like it’s going off the rails and just complete the task yourself, and you can also have the agent ask for your permission to do anything. So that level of control, transparency, and trust in an environment that we are used to for multiple decades, which is the browser — such a familiar front end for introducing the new concept of AI going and doing things for you — makes perfect sense for us to reimagine the browser.
How did you go about building Comet? When I first opened it, it felt familiar. It felt like Chrome, and my understanding is that it’s built on Chromium, the open-source substrate of Chrome that Google maintains, and that allows you to have a lot of easy data importing.
I was struck when I first opened it that it only took one click to basically bring all my context from Chrome over to Comet, even my extensions. So, why decide to go that route of building Comet on Chromium versus doing something fully from scratch?
First of all, Chromium is a great contribution to the world. Most of the things they did on reimagining tabs as processes and the way they’ve gone about security, encryption, and just the performance, the core back-end performance of Chromium as an engine, rendering engines that they have, is all really good. There’s no need to reinvent that. And at the same time, it’s an open-source project, so it’s easy to hire developers for Perplexity. They can work on the Comet browser, especially if it’s something that has open standards, and we want to continue contributing to Chromium also.
So we don’t want to just consume Chromium and build a product out of it, but we actually want to give back to the ecosystem. So that’s natural. And the second thing is, it’s the dominant browser right now. Chrome, and almost everything else if you actually include Edge — which is also a Chromium fork — DuckDuckGo, Brave, they’re all Chromium forks; only Safari is based on WebKit. So it’s actually the dominant browser and there’s no need to reinvent the wheel here.
In terms of UI, we felt like it would be better to retain the most familiar UI people are already used to, which honestly is the Chrome UI. And Safari is a slightly different UI and some people like it, some people do not, and it’s still a much smaller share of the market. And imports need to work, otherwise you’re going to be like, ‘Oh, this is not working, oh, that thing doesn’t have all my personal contacts, I’m missing out on it. I don’t want to go through the friction of logging into all the apps again.’
I think that that was very important for us for the onboarding step, which is not only onboarding you as a human but also onboarding the AI. Because the moment you’re already logged into all the third-party apps that you are logged in on Chrome in the exact same security standards, the agent gets access to that on your client and can immediately show you the magic of the product.
And the agent is seeing it, but you, Perplexity, are not. You’re not using all of the Chrome data I instantly bring over to train on me or anything like that?
No. The agent only sees it when you ask a relevant prompt. For example, ‘Based on what I’ve ordered on Amazon in the last month, recommend me some new supplements’ or, ‘Go and order the magnesium supplement that I’ve already ordered frequently on Amazon.’ The agent only sees that for that one singular prompt and doesn’t actually store your entire Amazon history on our servers, and you can always ensure that your prompts get deleted from our servers.
So, even the prompts we can choose not to look at, even for fine-tuning purposes. Let’s say we want to make our agents good at an aggregate or like, users have done Amazon shopping queries, let’s go and make it better on that. We don’t even need to look at that if you choose to not retain your prompt. So that’s the level of privacy and security we want to offer.
At the same time, the frontier intelligence is all on the server side. This is one of the main reasons why Apple is struggling to ship all Apple Intelligence being on iOS or macOS or whatever, because I think there’s generally an expectation that everything needs to live on the client side. That’s not necessary to be private. You can still be pretty secure and private with frontier intelligence on the server. So that’s the architecture we brought in on Comet.
We are talking now a couple of weeks or so after Comet came out and it’s still invite-only — or I think it’s also restricted to your premium tier, your $200 a month tier — but you’ve been tweeting a lot of examples of how people have been using it. They’ve been using it to make Facebook ads, do FedEx customer support chat, run their smart home accessories, make Facebook marketplace listings, schedule calendar meetings, there’s been a lot of stuff that you’ve shown.
Unsubscribing from spam emails, which is a favorite use case of a lot of people.
So maybe that’s the one. But I was going to say, what has been the main use case you’ve seen so far that people are finding with Comet?
Actually, while these are the more glamorous use cases, I would say the boring dominant one is always invoking the sidecar and having it do stuff for you on the webpage you’re on. Not necessarily just simple summarization, but more complex questions. Let’s say I’m watching Alex Heath’s podcast with Zuckerberg or something and I want to know specifically what he said about a topic, and I want to take that and send it as a message to my teammates on Slack.
I think that’s the thing, you can just invoke the assistant on the site and do it instantly. It’s connected to your Gmail, your calendar. It’s also able to pull the transcript from the YouTube video. It has fine-grain access, and it’s immediately able to retrieve the relevant snippet. I can even ask it to play it from that exact timestamp instead of going through the entire transcript, like whatever I want. That is the level of advantage you have.
It almost feels like you should never watch a YouTube video standalone anymore unless you have a lot of time on your hands, and it’s fantastic. And people use it for LinkedIn. Honestly, searching over LinkedIn is very hard. It doesn’t have a working search engine, basically. So the agent figures out all these shortcuts, like how we figure out using these filters — people search, a connection search — and it’s able to give recruiting power that was never possible before. I would say it’s better than using LinkedIn Premium.
I’m glad you brought up the sidecar because for people who haven’t tried it or seen it, that is the main way Comet diverges from Chrome: you’ve got this AI assistant orchestration layer that sits on the side of a webpage, which you can use to interact with the page and also just go off and do things.
That interface suggests that you see the web as being less about actually browsing (you just said no one really has time to watch a whole YouTube video) and more about an action interface. Is the browsing part of the browser becoming less meaningful in the world of AI, is what I’m wondering?
I think people are still going to watch YouTube videos for fun or exploration. But when I’m actually landing on a video — you do a lot of intellectual stuff, so it’s not always fun to watch the entire thing — I like watching specific things in the video. And also, by the way, when I’m in the middle of work, I can’t be watching The Verge podcast. I want to instantly know what Zuckerberg might have said in your video about their cluster or something, and then on the weekend, I can go back and watch the entire thing. I might have a lot more time on my hands, so it’s not actually going to stop the regular browsing.
I actually think people are going to scroll through social platforms or watch Netflix or YouTube even more, I would say, because they have more time on their hands. The AI is going to do a lot of their work. It’s just that they would choose to spend it on entertainment more than intellectual work, so intellectual browsing. Or if people derive entertainment from intellectual stuff like intellectual entertainment, I think that’s fine, too.
Like reading books, all these things are fine, like reading blog posts that you otherwise wouldn’t get time to read when you’re in the middle of work. I think these are the kind of ways in which we want the browser to evolve where people launch a bunch of Comet assistant jobs, like tasks that would take a few minutes to complete in the background and they’re chilling and scrolling through X or whatever social media they like.
Your tagline for Comet is that it lets people “browse at the speed of thought.” But I find that there’s actually a very steep learning curve to understanding what it can do.
By the way, Alex, I want to make one point. There was some article either from The Verge or somewhere else that Google was trying to use Gemini to predict maximal engagement time on a YouTube video and show the ad around that timestamp. Perplexity on the Comet browser was using AI to exactly save your time, to get you the exact timestamp you want on a fine-grain basis and not waste your time. So often people ask, why would Google not do this and that? The incentives are completely different here.
And I want to get into that and I have a lot of business model questions about Comet because it is also very compute intensive for you and expensive to run, which you’ve talked about. But to my point about the learning curve and making it approachable, how do you do that? Because when I first opened it, it’s kind of like I don’t know what I can do with this thing. I mean, I go to your X account and I see all the things you’re sharing. But I do think there’s going to be a learning curve that the people building these products don’t necessarily appreciate.
No, no, I appreciate that, and it’s been true for me, myself, as a user: even though it’s fun to build all these agent use cases, it takes a while to stop doing things the usual way and start using the AI more, which includes even basic things like what reply you type onto an email thread. Even though Google has these automatic suggested replies, I don’t usually like them, and they don’t often pull context from outside Gmail to help me do that. Or checking on unread Slack messages. I usually just open Slack as a tab and try to scroll through those 50, 100 channels I’m on, clicking each of those channels, reading all the messages that are unread. It takes time to actually train myself to use Comet. So what we plan to do is publish a lot of the early use cases as educational material and have it be widely accessible.
I think it’s going to go through the same trajectory that chatbots had. I think in the beginning when ChatGPT was launched, I’m sure not a lot of people knew how to use it. What are all the ways in which you could take advantage of it? In fact, I still don’t think people really… It’s not really a widespread thing. There are some people who really know how to use these AI tools very well and most people have used it at least once or twice a week, and they don’t actually use it in their day-to-day workflows.
The browser is going to go through a similar trajectory, but on the other hand, the one use case that’s been very natural, very intuitive that you don’t even have to teach people how to use this is the sidecar. It’s just picked up so much that I feel like it’ll be so intuitive. It’ll almost be like, without the sidecar, why am I using the browser anymore? That’s how it’s going to feel.
It does quickly make the traditional chatbot, the Perplexity or ChatGPT interface, feel a little arcane when you have the sidecar with the webpage.
Exactly, a lot of people are using ChatGPT for… You’re on an email and you want to know how to respond, so you copy / paste a bunch of context. You go there, you ask it to do something, and then you copy / paste it back. You edit it finally in your Gmail box or you do it in your Google Sheets or Google Docs. Comet is just going to feel much more intuitive. You have it right there on the side and you can do your edits, or you’re using it to draft a tweet, or Elon Musk posts something and you want to post a funny response to that. You can literally ask Comet, ‘Hey, draft me a funny reply tweet to that,’ and it’ll automatically have it ready for you. You literally have to click the post button.
All that stuff is going to definitely reduce the amount of times you really open another tab and keep asking the AI. And firing up jobs right from your current website to go pull up relevant context for you and having it just come back and push notify you when it’s ready, that’s feeling like another level of delegation.
Where is Comet struggling based on the early data you’ve seen?
It’s definitely not perfect yet for long-horizon tasks, something that might take 15 minutes or something. I’ll give you some examples. Like I want a list of engineers who have studied at Stanford and also worked at Anthropic. They don’t have to be currently working at Anthropic, but they must have worked at Anthropic at least once. I want you to give me an exhaustive list of people like that ported over to Google Sheets with their LinkedIn URLs, and I want you to go to ZoomInfo and try to get me their email so that I can reach out to them. I also want you to bulk draft personalized cold emails to each of them to reach out to for a coffee chat.
I don’t think Comet can do this today. It can do parts of it, so you still have to be the orchestrator stitching them together. I’m pretty sure six months to a year from now, it can do the entire thing.
You think it happens that quickly?
I’m betting on progress in reasoning models to get us there. Just like how in 2022, we bet on models like GPT-4 and Claude 3.5 Sonnet to arrive to make the hallucination problem in Perplexity basically nonexistent when you have a good index and a good model. I’m betting on the fact that in the right environment of a browser with access to all these tabs and tools, a sufficiently good reasoning model — like slightly better, maybe GPT-5, maybe like Claude 4.5, I don’t know — could get us over the edge where all these things are suddenly possible and then a recruiter’s work worth one week is just one prompt: sourcing and reach outs. And then you’ve got to do state tracking.
It’s not just about doing this one task, but you want it to keep following up, keep a track of their responses. If some people respond, go and update the Google Sheets, mark the status as responded or in progress and follow up with those candidates, sync with my Google calendar, and then resolve conflicts and schedule a chat, and then push me a brief ahead of the meeting. Some of these things should be proactive. It doesn’t even have to be a prompt.
That’s the extent to which we have an ambition to make the browser into something that feels more like an OS where these are processes that are running all the time. And it’s not going to be easy to do all this today, but in general, we have been successful at identifying the sweet spots, the things that are currently on the edge of working, and nailing those use cases, getting the early adopters to love the product, and then riding the wave of progress in reasoning models. That’s been the strategy.
I’m not sure if it’s just the reasoning models or it’s just the product’s early or I haven’t figured out how to use it correctly. My experience—
It’s not like I’m saying everything will work out of the box with a new model. You really have to know how to harness the capabilities and have the right evals and version control the prompts and do any post-training of auxiliary models, which is basically our expertise. We are very good at these things.
I’ll caveat that I haven’t spent weeks with it yet, but based on my early experience, I would describe it as a little brittle or unpredictable in terms of the success rate. I asked it to take me to the booking page for a very specific flight that I wanted and it did it. It took me to the page and it filled in some stuff, whereas the normal Perplexity or ChatGPT interface would just take me to the webpage. It actually took me a little bit further. It didn’t book it, but it took me further, which was good.
But then I asked it to “create a list of everyone who follows me on X that works at Meta,” and it gave me one person, and I know for a fact there are many more than that. Or, for example, I said, “Find my last interview with the CEO of Perplexity,” and it said it couldn’t, but then it showed a source link to the interview, so the answer and the source contradicted each other. I see some brittleness in the product and I know it’s early, but I’m just wondering: is all of that just bugs, or is it something inherent in the models or the way you’ve architected it?
I can take a look at it if you can share the link with me, but I would say the majority of the use cases that we ourselves advertised are things that are expected to work. Now, will it always work 100 percent of the time in a deterministic way? No. Are we going to get there in a matter of months? I think so, and you have to time yourself so that you’re not exactly waiting for the moment when everything works reliably. You want to be a little early, you want to be a little edgy, and I think there are some people who just love feeling like they’re part of the ride, too.
The majority of users are going to wait until everything works reliably, so that’s why we think the sidecar is already a value add for those kinds of people, where they don’t have to use the agents that much. They can use the sidecar, they can use Gmail, they can use calendar connectors, they can use all those LinkedIn search features, YouTube, or just basic stuff like searching over your own history. These are things that already work well and this is already a massive value add over Chrome. And once several minutes’ worth of long-horizon tasks start working reliably, that’s going to make it feel like more than just a browser. That’s when you make it feel like an OS. You want everything in that one container, and you’ll feel like the rest of the computer doesn’t even matter.
We started this conversation talking about how you think the browser gives you this context to be able to create an actually useful agent, and there’s this other technical path that the industry is looking at and getting excited about, which is MCP, model context protocol. And at a high level, it’s just this orchestration layer that lets an LLM talk to Airtable, Google Docs, whatever, and do things on your behalf in the same way that Comet is doing that in the sidecar.
You’re going at this problem through the browser and through the logged-in state of the browser that you talked about and that shortcut, while a lot of people — Anthropic and others, OpenAI — are looking at MCP as maybe the way that agents actually get built at scale. I’m curious what you think of those two paths, and are you just very bearish on MCP or do you think MCP is for other kinds of companies?
I’m not extremely bearish on MCP. I just want it to mature more, and I don’t want to wait. I want to ship agents right now. I feel like AI as a community, as an industry has just been talking about agents for the last two years and no one’s actually shipped anything that worked. And I got tired of that and we felt like the browser is a great way to do that today.
MCP is definitely going to play a contributing role in the field over the next five years. There are still a lot of security issues they need to figure out there. Having your authentication tokens communicated from your client to an MCP server or from a remote MCP server to another client, all these things are pretty risky today, way more risky than just having your persistent logins on your client on the browser. The same issues exist with OpenAI’s Operator, which tries to create server-side versions of all your apps.
I think there are going to be some good MCP connectors that we’ll definitely integrate with, like Linear or Notion. I guess GitHub has an MCP connector. So whenever it makes sense to use those over an agent that just opens these tabs and scrolls through them and clicks on things, we’re going to use that. But it’s always going to be bottlenecked by how well these servers are maintained and how you orchestrate these agents to use the protocol in the right way. It doesn’t solve the search problem on those servers, by the way. You still have to go and figure out what data to retrieve.
You define it as the orchestration layer. It’s not the orchestration layer, it’s just a protocol for communicating between servers and the client, or between one server and another. But it’s still not solving the problem of reasoning and knowing what information to extract and knowing what actions to take and all that chaining together of different steps, trying things when things don’t work. Whereas the browser is basically something that’s been designed for humans to actually operate in, and extracting a DOM and knowing what actions to take seems to be something that these models, the reasoning models, seem to be pretty good at.
So we are going to do a hybrid approach and see what works best. In the end, it has to be fast, it has to be reliable, and it has to be cheap. So if MCP lets us do that better than the browsing agent, then we’ll do that. There’s no dogmatic mission here.
At The Verge, we care a lot about the way our website looks and feels, the art of it, the visual experience, and with all this agent talk and it collapsing into browsers, I’m curious what you think happens to the web and to websites that devote a lot to making their sites actually interesting to browse. Does the web just become a series of databases that agents are crawling through MCP or whatever and this entire economy of the web goes away?
No. I actually think if you have a brand, people are going to be interested in knowing what that brand thinks, and it might go to you, the individual, or it might go to Verge, or it might go to both. It doesn’t matter. So even within Verge, I might not be interested in articles written by some other people. I might be interested in specific people who have data content or something. So I think the brand will play an even bigger role in a world where both AIs and humans are surfing the web, and so I don’t think it’s going to go away. Maybe the traffic for you might not even come organically. It might come through social media. Let’s say you publish a new article, some people might come click on it through Instagram or X or LinkedIn. It doesn’t matter.
And whether it would be possible for a new platform to build traffic from scratch by just doing the good old SEO tricks, I’m actually bearish on that. It’s going to be difficult to create your own presence by just playing the old playbook. You’ve got to build your brand through a different manner in this time period, and the existing ones who are lucky enough to already have a big brand presence, they have to maintain the brand also with a different playbook, not just doing SEO or traditional search engine growth tactics.
On Comet as a business, it’s very compute-intensive and it’s still invite-only. I imagine you wish you could just throw the gates open and let anyone use it, but it would melt your servers or blow up your AWS bills, right? So how do you scale this thing? Not only in the product sense, so that it becomes a thing normal people can easily use and get past that learning curve we talked about, but also the business of it. You’re not profitable, you’re venture-backed, you have to make money one day, you have to be profitable. How do you scale something like this that is actually even more compute-intensive than a chatbot?
I think if the reliability of these agents gets good enough, you could imagine people paying usage-based pricing. You might not be part of the max subscription tier of $200 a month or anything, but there’s one task you really desperately want to get done and you don’t want to spend three hours doing it, and as long as the agent actually completes the task and you’re satisfied with the success rate, you’ll be okay with trusting the agent and paying an advance fee of $20 for the recruiting task I described, like give me all the Stanford alumni who worked at Anthropic.
I think that is a very interesting way of thinking about it, which is otherwise going to cost you a lot more time or you have to hire a sourcing consultant, or you have to hire a full-time sourcer whose only job is that. If you value your time, you’re going to pay for it.
Maybe let me give you another example. You want to put an ad on Meta, Instagram, and you want to look at ads done by similar brands, pull that, study that, or look at the AdWords pricing of a hundred different keywords and figure out how to price your thing competitively. These are tasks that could definitely save you hours and hours and maybe even give you an arbitrage over what you could do yourself, because AI is able to do a lot more. And at scale, if it helps you to make a few million bucks, does it not make sense to spend $2,000 for that prompt? It does, right? So I think we’re going to be able to monetize in many more interesting ways than chatbots for the browser.
It’s still early, but the signs of life are already there in terms of what kind of use cases people have. And if you map-reduce your cognitive labor in bulk to an AI that goes and does it reliably, it almost becomes like your personal AWS cluster with natural language-described tasks. And I think we have to execute on it, but if we do execute on it and if the reasoning models continue to work well, you could imagine something that feels more like Claude Code for life. And Claude Code is a product that people are paying $1,000 a month for, too, because even though it’s expensive, it helps you maybe get a promotion faster because you’re getting more work done and your salary goes up, and it feels like the ROI is there.
Are you betting so much on the browser for the next chapter of Perplexity because the traditional chatbot race has just been completely won by ChatGPT? Is Perplexity as it exists today going away and the future of it is just going to be Comet?
I wouldn’t say that I’m betting on it because the chatbot race is over. Let me decouple the two things. The chatbot race does seem like it’s over in the sense that it’s very unlikely that people think of another product for day-to-day chat. From the beginning, we never competed in that market. We were always competing on search. We were trying to reimagine search in the conversational style. Yes, every chatbot has search integrations. Some people like that, some people still like a more search-like interface that we have, so we never wanted to go after that market and we are not competing there either. Google is trying to catch up and Grok’s trying to catch up, Meta’s trying to catch up, but I feel like all that is wasted labor in my opinion at this point.
But the way I would phrase it is the browser is bigger than chat. It’s a stickier product, and it’s the only way to build agents. It’s the only way to build end-to-end workflows. It’s the only way to build true personalization, memory, and context. And so it’s a bigger prize in my opinion than trying to nail the chat game, especially in a market that’s so fragmented. And it’s a much harder problem to crack, too, in terms of intelligence, how you package it, how you context engineer it, how you deal with all the shortcomings at the current moment, as well as the end-user-facing UX — which could be the front end, the back end, the security, the privacy, and all the other bugs that you’ve got to deal with when working with a much more multifaceted product like the browser.
Do you think that’s why OpenAI is going to be releasing a browser? Because they agree with that?
I don’t know if they are. I’ve read the same leaks that you have, and it was very interesting that they came two hours after we launched. You also made another point about Perplexity being ignored and Comet being the next thing. I don’t see it that way because you cannot build a browser without a search engine. A lot of people praised the Comet browser because it doesn’t feel like another browser. You know why? One of the main reasons is, of course, we have the sidecar and we have the agent and all that, but the default search is Perplexity. And we made it in a way where even if you have an intent to navigate, it’ll understand that.
It’ll give you four or five links if it feels like it’s a navigational query, it’ll give you images pretty quickly. It’ll give you a very short answer also, so you can combine informational queries, navigational queries, and agent queries in one single search box. That is only doable if you’re actually working on the search problem, which we’ve been working on for the last two and a half years. So I would say I don’t see it as two separate things. Basically, you cannot build a product like Chrome without building Google. Similarly, you cannot build a product like Comet without building Perplexity.
So is there a Comet standalone mobile app and a standalone Perplexity app?
Yeah, there will be standalone apps for both. Some people are going to use the standalone Comet app just like how they use Chrome or Safari, and it’s okay. They probably won’t do that because it’s going to have an AI that you can talk to on every webpage, including in voice mode actually. But you still want to just navigate and get to a website quickly. I just want to go and browse Verge without actually having any question in my mind, that’s fine. And I could go to Perplexity and have all the other things the app has like Discover feeds and Spaces and just quick, fast answers without the web interface. That’s fine, too.
We are going to support a packaged version of the browser Comet within the Perplexity app, just like how the Google app still supports navigation like Chrome. So, by the way, both the Google app and the Chrome app are WebKit apps on iOS. Similarly, both the Google app and the Chrome app are Chromium apps on Android. We’ll have to follow the same trajectory.
Speaking of competition, I’m curious what you think of Dia, what The Browser Company has done. They released it around the same time as you, they’re moving in this direction as well. Obviously they’re a smaller startup, but they got a lot of buzz with Arc, their original browser, and now seem to be betting on the same idea that you have with Comet. I’m curious if you’ve gotten to try it or how you think it will stack up against Comet.
I haven’t tried it myself. I’ve seen what other people have said. I think they have some interesting ideas on the visuals on the front end. And if I were them, I would’ve just tried it in the same browser they had instead of going and trying to build distribution on a new one. But yeah, it’s interesting. We are definitely going to study every product out there. Our focus, though, more goes on Chrome. It is the big brother. And the way I think about it is even if I take 1 percent of the Chrome users, set their default as Comet, that’s a massive, massive win for us and a massive loss for them, too, by the way, because any ad revenue lost is massive at that scale.
Is word of mouth the main way you’re going to grow Comet or are you looking for distribution partnerships beyond that?
In the beginning, we’re going to do more word of mouth growth. It’s very powerful. It’s worked out well for us in the past with Perplexity itself, and we’re going to try to follow the same trajectory here. And luckily we have an installed base of Perplexity already of 30 to 40 million people. So even if we get a good chunk of those people to try out Comet and convert some of those people who tried it into setting it as default, it’ll already be a massive victory without relying on any distribution partnerships.
And then we’re obviously going to try seeing how to convert that progress into a partnership like Google has with a bunch of people. I just want to caveat that by saying it’s going to be extremely hard. We’ve spoken about this in the past where Google makes sure every Android phone has Google Chrome as a default browser and you cannot change that.
You lose a lot of money if you change that. And Microsoft makes sure every Windows laptop is coming with Edge as the default browser. Again, you cannot change that. You will lose a lot of money if you change that. Now the next step is okay, let them be the default browser, at least can you have your app as part of the Android or Windows build? You still cannot change that easily. Especially on Windows, it’s basically pretty impossible to convince large OEMs to change that. So they have all these agreements that are several years locked in, and you work with companies that plan for the device that they’re shipping two years in advance.
That’s their moat in some sense. It’s not even the product, it’s not even exactly in the distribution world, it’s more in the legalities of how they crafted these agreements, which is why I’m happy that the DOJ is at least looking into Google. And we’ve made a list of recommendations on that, and I hope something happens there.
Yeah, it may end up forcing a spinoff of Chrome, which would be really interesting and would reset things. There are a lot of people who think Apple should buy you. And Eddy Cue, one of their top execs, actually had some pretty nice things to say about you on the stand during the Google trial and said that you guys had talked about working together. Obviously you can’t talk about something that hasn’t been announced yet, especially with Apple, but yeah, what do you make of that and Apple?
I mean, I’m firstly honored by Eddy mentioning us in the trial as a product that he likes, and he’s heard from his circles that people like it. I would love to work with Apple on integrations with Safari or Siri or Apple Intelligence. It’s the one product that almost everybody loves using, or it’s a status symbol. Everybody wants to graduate to using an Apple device.
So I’m pretty sure that we share a lot of design aesthetics in terms of how we do things and how they do things. At the same time, my goal is to make Perplexity as big as possible. It’s definitely possible that this browser is so platform-agnostic that it can benefit Android and iOS ecosystems, Windows and Mac ecosystems, and we can be pretty big on our own just like Google was. Of course, Google owns Android, but you could imagine they would’ve been pretty successful if they just had the best search engine and the best browser and they didn’t actually own the platform either.
I and others also reported that Mark Zuckerberg approached you about potentially joining Meta and working on his reboot of their AI efforts. What was Zuck’s pitch? I’m curious. Tell me.
Zuck is awesome. He’s doing a lot of awesome things, and I think Meta has such a sticky product. It’s fantastic, and we look at that as an example of how it’s possible to build a large business without having any platform yourself.
Were you shocked by the numbers that Zuck is paying for top AI research? These nine-figure compensation offers. I think a lot of them are actually tied to Meta stock needing to increase for those numbers to be paid. So it’s actually pretty contingent on the business and not just guaranteed payouts, but still huge numbers.
Yeah, huge. And definitely, I was surprised by the magnitude of the numbers. Seems like it’s needed at this point for them, but at the same time, Elon and xAI have shown you don’t need to spend that much to train models competitive with OpenAI and Anthropic. So I don’t know if money alone solves every problem here.
You do need to have a team that works well together, has a proper mission alignment and milestones, and in some sense, failure is not an option for them. The amount of investment is so big and I feel like the way Zuck probably thinks is, ‘I’m going to get all the people, I’m going to get all the compute and I’m going to get all the milestones set up for you guys, but now it’s all on you to execute and if you fail, it’s going to look pretty bad on me so you better not fail.’ That’s probably the deal.
What are the second order effects to the AI talent market, do you think, after Zuck’s hiring spree?
I mean, it’s definitely going to feel like a transfer market now, right? Like an NBA or something. There’s going to be a few individual stars who are having so much leverage. And one thing I’ve noticed is Anthropic researchers are not the ones getting poached.
Mostly. He has poached some, but not as many.
Yeah. So it does feel like that’s something labs need to work on, which is truly aligning people on one mission, so that money alone is not the motivator for them. And when your company’s doing well, the stock is going up and you feel dopamine from working there every day. You’re encountering new kinds of challenges, you feel a lot of growth, you’re learning new things, and you’re getting richer, too, along the way. Why would you want to go?
Do you think strongly about getting Perplexity to profitability to be able to control your own destiny, so to speak?
Definitely, it’s inevitable. We want to do it before the IPO, and we think we can IPO in 2028 or 2029. I would like to IPO, by the way, just to be clear. I don’t want to stay private forever like some companies have chosen to do. Even though staying private gives you advantages in M&A and decision-making power, I do think the publicity and the marketing you get from an IPO, and the fact that people can finally invest in a search alternative to Google, make it a pretty massive opportunity for us.
But I don’t think it makes sense to IPO before hitting $1 billion in revenue and some profitability along the way. So that’s definitely something we want to get to in the next three or four years. But I don’t want to stunt our own growth by not being aggressive and trying new things today.
Makes sense. So, you launched Perplexity, and it’s crazy that it’s already been just over three years now, and it was right around when ChatGPT first launched. It’s wild to think about everything we’ve talked about and that all this has happened in barely three years. So maybe this is an impossible question, but I want to leave you with this question. If you look out three years from now, you just talked about the IPO, which is interesting, but what does Perplexity look like three years from now?
I hope it becomes the one tool you think of when you want to actually get anything done. And it has a lot of deep connection to you because it synchronizes with all your context and proactively thinks on your behalf and truly makes your life a lot easier.
Alright, we’ll leave it there. Aravind, thanks.
Thank you.
Questions or comments about this episode? Hit us up at decoder@theverge.com. We really do read every email!
Welcome to Decoder! I’m Alex Heath, deputy editor at The Verge and author of the Command Line newsletter. This is the first in a series of Thursday Decoder episodes that I’ll be hosting while Nilay is out on parental leave.
I’ve been covering AI a lot at The Verge, and I’m excited to start sharing some of the conversations I regularly have with leaders in the space here. The plan is for each episode to focus on a specific theme, from the rise of deepfakes to how AI is reimagining the browser.
This week, I’m focusing on how AI companies talk about what they’re building. My guest is Ellis Hamburger. He’s the founder of Meaning, a marketing firm that works with a lot of buzzy AI startups. Ellis actually used to work at The Verge in its early days, joining in 2012 shortly after the site launched, when he covered the early mobile app boom.
Now, he’s in the trenches with a lot of AI startups, helping them figure out how to present their products to the world. That gives him a pretty unique perspective.
First, some disclosures: Ellis has a lot of clients that we cover at The Verge, including Nothing, Raycast, Readwise, Daylight, Friend, Mainframe, Tolan, and more. He also previously worked at The Browser Company and Snap. We recorded this episode together in Los Angeles, and as you’ll probably be able to tell, Ellis and I have been friends for a long time.
I’ve always found Ellis to be an original thinker, and I hope you find our conversation as interesting as I did.
If you’d like to read more on what we talked about in this episode, check out the links below:
Questions or comments about this episode? Hit us up at decoder@theverge.com. We really do read every email!
So far, Runway has been known for bringing generative AI to Hollywood. Now, the $3 billion startup is setting its sights on the gaming industry.
This week, I was granted access to a new interactive gaming experience that Runway plans to make available to everyone as soon as next week, according to CEO Cristóbal Valenzuela. The consumer-facing product is currently quite barebones, with a chat interface that supports only text and image generation, but Valenzuela says that generated video games are coming later this year. He says that Runway is also in talks with gaming companies about both using its technology and accessing their datasets for training.
Based on his recent conversations, Valenzuela believes the gaming industry is in a similar position to Hollywood when it was first introduced to generative AI. There was considerable resistance, but over time, AI has been gradually adopted in more areas of the production process. Valenzuela says Amazon’s recent show, House of David, was made in part with Runway’s technology, and that his company is working with “pretty much every major studio” and “most of the Fortune 100 companies.”
“If we can help a studio make a movie 40 percent faster, then we’re probably gonna be able to help developers of games make games faster,” he says. “They’re waking up, and they’re moving faster than I would say the studios were moving two years ago.”
Naturally, I couldn’t let Valenzuela get off our Zoom call without asking him about his recent acquisition talks with Zuckerberg: “I think we have more interesting intellectual challenges being independent, and remaining independent for now.”
This story was first published in Command Line.
During a company-wide all-hands meeting on Thursday, some of Meta’s top executives were asked about the “$100 million signing bonuses” that OpenAI CEO Sam Altman claimed they had been offering to poach his employees.
“Sam is just being dishonest here,” Andrew Bosworth, Meta’s CTO, said at the meeting when asked about Altman’s remarks. “He’s suggesting that we’re doing this for every single person… Look, you guys, the market’s hot. It’s not that hot.”
The “$100 million bonus” headline has rightfully become a meme on social media since Altman said the number on his brother’s podcast. “What Sam neglects to mention is that he’s countering all these offers, creating a small market for a very, very small number of people who are for senior, senior leadership roles” in the new superintelligence AI team Meta is building, Bosworth told Meta employees today. “That is not the general thing that’s happening in the AI space. And of course, he’s not mentioning what the actual terms of the offer are. It’s not [a] sign-on bonus. It’s all these different things.”
Bosworth then referenced recent stories about a handful of OpenAI researchers who are joining Meta and said there are “quite a few more in the pipeline that I can’t announce or share right now.”
“Sam is known to exaggerate, and in this case, I know exactly why he’s doing it, which is because we are succeeding at getting talent from OpenAI,” he said. “He’s not very happy about that.”
At the Thursday meeting, there were many employees present from the company’s engineering “bootcamp,” a multi-week onboarding program that assigns new hires to various teams. “For all the new bootcampers here, you didn’t screw up,” Bosworth said to laughs and claps from the audience. “You made a great decision. Comp is right where it should be.”
Bosworth wasn’t the only Meta exec to mention OpenAI during the internal meeting. CPO Chris Cox also acknowledged that, while Meta AI has one billion monthly users, engagement “is not nearly as deep as the way that people are using ChatGPT.” The standalone Meta AI app has only 450,000 daily users, he told employees, and “a lot of those folks” are using it to manage their Ray-Ban Meta glasses.
“We are not going to go right after ChatGPT and try and do a better job with helping you write your emails at work,” Cox said. “We need to differentiate here by not focusing obsessively on productivity, which is what you see Anthropic and OpenAI and Google doing. We’re going to go focus on entertainment, on connection with friends, on how people live their lives, on all of the things that we uniquely do well, which is a big part of the strategy going forward.”
Meta declined to comment on the internal meeting.
When I spoke with Jason Rugolo on Thursday, I wanted to understand why he is suing the most influential company in tech.
Rugolo’s AI device startup, Iyo, recently won a temporary restraining order that bars OpenAI from using the “io” brand for Sam Altman’s new hardware division with Jony Ive. In response, Altman took to his X account to suggest that Rugolo filed his trademark lawsuit because OpenAI refused to invest in or buy Iyo, which is gearing up to release its first AI-powered, in-ear headphones later this year.
Rugolo acknowledges (and documents submitted to the court confirm) that he pitched Altman on investing multiple times. He also discussed an acquisition with io team members this year. Still, he says his lawsuit isn’t part of some revenge crusade, but rather intended to eliminate any confusion between his forthcoming Iyo One headphones and Altman’s io.
Trademark lawsuits are a dime a dozen, but this one has broken through for good reason. There’s intense interest in what Altman and Ive are building (the first device apparently won’t be an “in-ear” product or a “wearable”), and the case is a Rorschach test for how you feel about Altman, who is undoubtedly polarizing.
“I had a massive change in opinion on the guy,” Rugolo tells me of Altman. “While I was meeting with them, I was under the spell of Sam Altman being a great entrepreneur and a really interesting person. That broke pretty instantly after their public announcement [of io].”
“Am I getting screwed here?” Rugolo recalls thinking. “When I talked to him on the phone and he made a Sopranos threat to sue me, I was just like, ‘Alright, this guy is a bad dude.’” Now, he says that Altman is trying to “manipulate the arguments in the public sphere” and “make me look like a money grubber or a sore loser, and I just don’t think it’s gonna work.”
“This is a baseless trademark dispute and not a case about stolen ideas or technology,” OpenAI spokesperson Kayla Wood says in a statement shared with me. “Iyo demoed a product in May 2025 that didn’t function properly or meet our standards in hopes that we’d acquire Iyo. We passed. Jason Rugolo was also well aware of the io name and never raised concerns before our announcement.”
Thanks to the millions of dollars he recently raised from his manufacturer, Pegatron, and a billionaire whom he refuses to name, Rugolo says Iyo has enough runway to last it through the end of 2026. When I ask if the device he teased in his viral TED talk last year is indeed shipping later this year, he says he’s about to fly to China to “basically be living at the factory.”
While he’s ready to go through the legal discovery process and take his case to trial, he hopes that OpenAI will “put their guns away” and “compete like grown-ups on product.”
“I will meet them in the market,” he tells me. “We will both try to launch stuff that’s really cool and see if we can serve our customers. They’ll just compete fairly and stop using the name. They have some of the best designers in the world, apparently. Think of a new name. You just can’t use the one that I told you about already, and that I’ve been using since 2019.”
So far, Runway is known for bringing generative AI to Hollywood. Now, the $3 billion startup is setting its sights on the gaming industry.
This week, I was granted access to a new interactive gaming experience that Runway plans to make available to everyone as soon as next week, according to CEO Cristóbal Valenzuela. The consumer-facing product is currently quite barebones, with a chat interface that supports only text and image generation, but Valenzuela says that generated video games are coming later this year. He says that Runway is also in talks with gaming companies about both using its technology and accessing their datasets for training.
Based on his recent conversations, Valenzuela believes the gaming industry is in a similar position to Hollywood when it was first introduced to generative AI. There was considerable resistance, but over time, AI has been gradually adopted in more areas of the production process. Valenzuela says Amazon’s recent show, House of David, was made in part with Runway’s technology, and that his company is working with “pretty much every major studio” and “most of the Fortune 100 companies.”
“If we can help a studio make a movie 40 percent faster, then we’re probably gonna be able to help developers of games make games faster,” he says. “They’re waking up, and they’re moving faster than I would say the studios were moving two years ago.”
Naturally, I couldn’t let Valenzuela get off our Zoom call without asking him about his recent acquisition talks with Zuckerberg: “I think we have more interesting intellectual challenges being independent, and remaining independent for now.”
No one knows what AGI actually means. That much is clear from this excellent deep dive from The Information into Microsoft’s deal with OpenAI. There has been a lot of good reporting on the negotiations between the two companies, but this piece is the most comprehensive and detailed I’ve seen yet. It states that Microsoft will no longer receive exclusive access to OpenAI’s IP once OpenAI achieves “sufficient AGI,” which is contractually defined as the point at which OpenAI’s board determines that the AI “has the capability to generate” the maximum profits its investors are entitled to. Amazingly, OpenAI doesn’t have to actually generate those profits.
Two under-the-radar deals: Although they haven’t garnered many headlines, OpenAI announced an interesting partnership and a small acquisition this week. The first is a deal with Applied Intuition to “advance next-generation, AI-powered experiences in vehicles.” The second is the acquisition of the small team at Crossing Minds, an AI startup that helped e-commerce companies offer more personalized product recommendations. “Personally, joining OpenAI’s research team to focus on agents and information retrieval is a unique honor,” Crossing Minds founder Alexandre Robicquet writes. “These are precisely the problems I’ve always been passionate about: how systems learn, reason, and retrieve knowledge at scale, in real-time.”
Some interesting career moves in tech:
More to click on:
If you haven’t already, don’t forget to subscribe to The Verge, which includes unlimited access to Command Line and all of our reporting.
As always, I welcome your feedback. You can respond here or ping me securely on Signal.
Thanks for subscribing.
Thanks to a related trademark lawsuit, we know what OpenAI and Jony Ive’s first AI device won’t be.
In court filings submitted this month, leaders from io — the consumer hardware team OpenAI recently acquired from Jony Ive’s design studio for $6.5 billion — testified that the first device they plan to release won’t be an “in-ear device” or a “wearable.” They also say the AI device won’t ship until “at least” 2026.
“The prototype Sam Altman referenced in the video is at least a year away from being offered for sale,” Tang Tan, io’s chief hardware officer and a former Apple design leader, said in a June 16th declaration. “Its design is not yet finalized, but it is not an in-ear device, nor a wearable device.”
Over the weekend, OpenAI was forced to remove public references to the io brand (which stands for “input/output”) due to a temporary restraining order that was granted on behalf of an audio device startup called Iyo. To support its case that OpenAI willfully infringed on its trademark, Iyo provides emails showing that leaders from io and OpenAI, including CEO Sam Altman, knew about its existence and even asked to demo the product — a yet-to-be-released, in-ear headphone billed as “the world’s first audio computer.”
“For many months after its founding, io surveyed the existing commercial offerings and engaged in prototyping exercises, as it considered a broad range of form factors, including objects that were desktop-based and mobile, wireless and wired, wearable and portable,” reads OpenAI’s June 12th opposition to Iyo’s lawsuit. “As part of these early efforts, io purchased a wide range of earbuds, hearing aids, and at least 30 different headphone sets from a variety of different companies.” (TechCrunch’s Maxwell Zeff first reported on the court documents.)
While Tan’s declaration states that io’s first piece of hardware won’t be an “in-ear device,” it’s clear from the evidence submitted in the case that io and OpenAI have considered the category. In one email from late March, an io employee named Marwan Rammah told Tang that they should consider buying 3D scans of human ears “as a helpful starting point on ergonomics and HF.” And in another email earlier that month, Altman responded to Iyo’s offer to personally invest in the company by writing: “thanks but im working on something competitive so will respectfully pass!”
I’d love to chat. You can reach me securely and anonymously on Signal.
At this point, it’s becoming easier to say which AI startups Mark Zuckerberg hasn’t looked at acquiring.
In addition to Ilya Sutskever’s Safe Superintelligence (SSI), sources tell me the Meta CEO recently discussed buying ex-OpenAI CTO Mira Murati’s Thinking Machines Lab and Perplexity, the AI-native Google rival. None of these talks progressed to the formal offer stage for various reasons, including disagreements over deal prices and strategy, but together they illustrate how aggressively Zuckerberg has been canvassing the industry to reboot his AI efforts.
Now, details about the team Zuckerberg is assembling are starting to come into view: SSI co-founder and CEO Daniel Gross and ex-GitHub CEO Nat Friedman are poised to co-lead the Meta AI assistant. Both men will report to Alexandr Wang, the former Scale CEO Zuckerberg just paid over $14 billion to quickly hire. Wang said goodbye to his Scale team last Friday and was in the Meta office on Monday. This week, he has been meeting with top Meta leaders (more on that below) and continuing to recruit for the new AI team Zuckerberg has tasked him with building. I expect the team to be unveiled as soon as next week.
Rather than join Meta, Sutskever, Murati, and Perplexity CEO Aravind Srinivas have all gone on to raise more money at higher valuations. Sutskever, a titan of the AI research community who co-founded OpenAI, recently raised a couple of billion dollars for SSI. Both Meta and Google are investors in his company, I’m told. Murati also just raised a couple of billion dollars. Neither she nor Sutskever is close to releasing a product. Srinivas, meanwhile, is in the process of raising around $500 million for Perplexity.
Spokespeople for all the companies involved either declined to comment or didn’t respond in time for publication. The Information and CNBC first reported Zuckerberg’s talks with Safe Superintelligence, while Bloomberg first reported the Perplexity talks.
While Zuckerberg’s recruiting drive is motivated by the urgency he feels to fix Meta’s AI strategy, the situation also highlights the fierce competition for top AI talent these days. In my conversations this week, those on the inside of the industry aren’t surprised by Zuckerberg making nine-figure — or even, yes, 10-figure — compensation offers for the best AI talent. There are certain senior people at OpenAI, for example, who are already compensated in that ballpark, thanks to the company’s meteoric increase in valuation over the last few years.
Speaking of OpenAI, it’s clear that CEO Sam Altman is at least a bit rattled by Zuckerberg’s hiring spree. His decision to appear on his brother’s podcast this week and say that “none of our best people” are leaving for Meta was probably meant to convey a position of strength, but in reality, it looks like he is throwing his former colleagues under the bus. I was confused by Altman’s suggestion that Meta paying a lot upfront for talent won’t “set up a great culture.” After all, didn’t OpenAI just pay $6.5 billion to hire Jony Ive and his small hardware team?
When I joined a Zoom call with Alex Himel, Meta’s VP of wearables, this week, he had just gotten off a call with Zuckerberg’s new AI chief, Alexandr Wang.
“There’s an increasing number of Alexes that I talk to on a regular basis,” Himel joked as we started our conversation about Meta’s new glasses release with Oakley. “I was just in my first meeting with him. There were like three people in a room with the camera real far away, and I was like, ‘Who is talking right now?’ And then I was like, ‘Oh, hey, it’s Alex.’”
The following Q&A has been edited for length and clarity:
How did your meeting with Alex just now go?
The meeting was about how to make AI as awesome as it can be for glasses. Obviously, there are some unique use cases in the glasses that aren’t stuff you do on a phone. The thing we’re trying to figure out is how to balance it all, because AI can be everything to everyone or it could be amazing for more specific use cases.
We’re trying to figure out how to strike the right balance because there’s a ton of stuff in the underlying Llama models and that whole pipeline that we don’t care about on glasses. Then there’s stuff we really, really care about, like egocentric view and trying to feed video into the models to help with some of the really aspirational use cases that we wouldn’t build otherwise.
You are referring to this new lineup with Oakley as “AI glasses.” Is that the new branding for this category? They are AI glasses, not smart glasses?
We refer to the category as AI glasses. You saw Orion. You used it for longer than anyone else in the demo, which I commend you for. We used to think that’s what you needed to hit scale for this new category. You needed the big field of view and display to overlay virtual content. Our opinion of that has definitely changed. We think we can hit scale faster, and AI is the reason we think that’s possible.
Right now, the top two use cases for the glasses are audio — phone calls, music, podcasts — and taking photos and videos. We look at participation rates of our active users, and those have been one and two since launch. Audio is one. A very close second is photos and videos.
AI has been number three from the start. As we’ve been launching more markets — we’re now in 18 — and we’ve been adding more features, AI is creeping up. Our biggest investment by a mile on the software side is AI functionality, because we think that glasses are the best form factor for AI. They are something you’re already wearing all the time. They can see what you see. They can hear what you hear. They’re super accessible.
Is your goal to have AI supersede audio and photo to be the most used feature for glasses, or is that not how you think about it?
From a math standpoint, at best, you could tie. We do want AI to be something that’s increasingly used by more people more frequently. We think there’s definitely room for the audio to get better. There’s definitely room for image quality to get better. The AI stuff has much more headroom.
How much of the AI is onboard the glasses versus the cloud? I imagine you have lots of physical constraints with this kind of device.
We’ve now got one-billion-parameter models that can run on the frame. So, increasingly, there’s stuff there. Then we have stuff running on the phone.
If you were watching WWDC, Apple made a couple of announcements that we haven’t had a chance to test yet, but we’re excited about. One is the Wi-Fi Aware APIs. We should be able to transfer photos and videos without having people tap that annoying dialogue box every time. That’d be great. The second one was processor background access, which should allow us to do image processing when you transfer the media over. Syncing would work just like it does on Android.
Do you think the market for these new Oakley glasses will be as big as the Ray-Bans? Or is it more niche because they are more outdoors and athlete-focused?
We work with EssilorLuxottica, which is a great partner. Ray-Ban is their largest brand. Within that, the most popular style is the Wayfarer. When we launched the original Ray-Ban Meta glasses, we went with the most popular style for the most popular brand.
Their second biggest brand is Oakley. A lot of people wear them. The Holbrook is really popular. The HSTN, which is what we’re launching, is a really popular analog frame. We increasingly see people using the Ray-Ban Meta glasses for active use cases. This is our first step into the performance category. There’s more to come.
What’s your reaction to Google’s announcements at I/O for their XR glasses platform and eyewear partnerships?
We’ve been working with EssilorLuxottica for like five years now. That’s a long time for a partnership. It takes a while to get really in sync. I feel very good about the state of our partnership. We’re able to work quickly. The Oakley Meta glasses are the fastest program we’ve had by quite a bit. It took less than nine months.
I thought the demos they [Google] did were pretty good. I thought some of those were pretty compelling. They didn’t announce a product, so I can’t react specifically to what they’re doing. It’s flattering that people see the traction we’re getting and want to jump in as well.
On the AR glasses front, what have you been learning from Orion now that you’ve been showing it to the outside world?
We’ve been going full speed on that. We’ve actually hit some pretty good internal milestones for the next version of it, which is the one we plan to sell. The biggest learning from using them is that we feel increasingly good about the input and interaction model with eye tracking and the neural band. I wore mine during March Madness in the office. I was literally watching the games. Picture yourself sitting at a table with a virtual TV just above people’s heads. It was amazing.
More to click on:
If you haven’t already, don’t forget to subscribe to The Verge, which includes unlimited access to Command Line and all of our reporting.
As always, I welcome your feedback, especially if you’ve also turned down Zuck. You can respond here or ping me securely on Signal.
Thanks for subscribing.
Meta is announcing its next pair of smart glasses with Oakley. The limited-edition Oakley Meta HSTN (pronounced “how-stuhn”) model costs $499 and is available for preorder starting July 11th. Other Oakley models with Meta’s tech will be available starting at $399 later this summer.
Like the existing Ray-Ban Meta glasses, the Oakley model features a front-facing camera, along with open-ear speakers and microphones built into the frame. Once paired with a phone, the glasses can be used to listen to music or podcasts, take phone calls, or chat with Meta AI. Using the onboard camera and microphones, Meta AI can also answer questions about what the wearer is seeing and even translate languages.
Given the Oakley design, Meta is positioning these new glasses as geared toward athletes. They have an IPX4 water resistance rating and double the battery life of the Ray-Ban Metas, providing 8 hours of use, along with a charging case that can power them for up to 48 hours. The built-in camera now shoots 3K video, up from 1080p on the Ray-Ban Metas.
The new lineup comes in five Oakley frame and lens combos, all of which can be fitted with prescription lenses for an extra cost. The frame colors are warm grey, black, brown smoke, and clear, with several lens options available, including Transitions. The limited-edition $499 model, available for preorder starting July 11th, features gold accents and gold Oakley PRIZM lenses. The glasses will be on sale in the US, Canada, the UK, Ireland, France, Italy, Spain, Austria, Belgium, Australia, Germany, Sweden, Norway, Finland, and Denmark.
Meta recently signed a multi-year deal with EssilorLuxottica, the parent company behind Ray-Ban, Oakley, and other eyewear brands. The Meta Ray-Bans have sold over two million pairs to date, and EssilorLuxottica recently disclosed that it plans to sell 10 million smart glasses with Meta annually by 2026. “This is our first step into the performance category,” Alex Himel, Meta’s head of wearables, tells me. “There’s more to come.”