Anthropic’s Claude 3.5 Sonnet wows AI users: “this is wild”

A new large language model (LLM) has apparently taken the performance crown from OpenAI’s GPT-4o about a month after that model’s release. Claude 3.5 Sonnet, the new LLM and chatbot from rival artificial intelligence firm Anthropic, launched today and outperforms every other model in the world on key third-party benchmarks, according to the company. At the same time, it is faster and cheaper than previous Claude 3 models.

But it’s one thing to drop a new model and claim dominance, and another for users to actually experience and take advantage of the performance gains (Google Gemini family, I’m looking at you: reportedly better than OpenAI’s previous flagship GPT-4 on some metrics, but who is really using you?).

The latest release of Claude 3.5 Sonnet from Anthropic does not seem to have this problem. Many AI influencers and power users took to the web within hours of its release to share their largely positive impressions of the new Anthropic model and show off what the new, “smartest” LLM in the world is capable of.

Advancing coding skills and building products

As AI influencer and expert Allie K. Miller wrote on X (formerly Twitter), Claude 3.5 Sonnet created an entire playable game for her from just a screenshot, in less than half a minute:

This is wild.
In just 25 seconds, Claude 3.5 Sonnet coded me a fully functional Mancala web application
I only provided ONE screenshot of the game instructions.
It did the rest:
– Coded the entire game
– Provided a preview so I can test it
– Provided the game rules pic.twitter.com/WLweZUGt5C
— Allie K. Miller (@alliekmiller) June 20, 2024

Similarly, the informative and up-to-date X account TestingCatalog News (@testingcatalog) showed how the newly launched “Artifacts” playground, which debuted alongside Claude 3.5 Sonnet and displays interactive outputs in a window beside the chatbot interface, can run the code for a real, working website contact form that Claude 3.5 Sonnet built.

Claude 3.5 just generated React JSX code for a simple contact form and managed to run it in the Artifacts playground pic.twitter.com/KREZaArObw
— TestingCatalog News (@testingcatalog) June 20, 2024

The account also showed Claude 3.5 Sonnet recreating shots from the seminal 1995 film Hackers.

Pietro Schirano, founder of enterprise AI image generation startup EverArt, wrote on X that combining Claude 3.5 Sonnet with another tool, Maestro, showed “sparks of AGI.”

Claude 3.5 Sonnet + Maestro = Sparks of AGI?
I asked it to make a Mario clone using only geometric shapes, and the wildest part is that it also gave the characters animations and the shapes look like new concepts.
It took 3 minutes. Check out the game! pic.twitter.com/YVQYp7m5Ed
— Pietro Schirano (@skirano) June 20, 2024

Anthropic staff go to bat for Claude 3.5 Sonnet

Though he is clearly biased, Anthropic’s head of developer relations Alex Albert posted a thread on X highlighting how Claude 3.5 Sonnet is “getting really good at coding and fixing pull requests autonomously,” and even went so far as to say: “It is becoming clear that in a year a large percentage of the code will be written by LLMs.”

Claude is getting really good at coding and fixing pull requests autonomously. It is becoming clear that in a year a large percentage of the code will be written by LLMs.
Let me show you what I mean:
— Alex Albert (@alexalbert__) June 20, 2024

Similarly, Anthropic technical staff member Maggie Vo posted on X that Claude 3.5 Sonnet can now do “half my work… and I couldn’t be happier.”

Putting pressure on OpenAI

Others noted that, now that Claude 3.5 Sonnet has eclipsed OpenAI’s GPT-4o and is available at a similar price, OpenAI is under new pressure to keep users seeing its models as the right choice.

University of Pennsylvania Wharton School professor and AI booster Ethan Mollick compared the Artifacts feature to “a simpler version of Code Interpreter” from OpenAI’s GPT-4.

I have been using the new Claude 3.5 model as a tester and now that it is out I can say that it is very impressive and the “artifacts” it generates are like a simpler version of Code Interpreter.
This is a real-time video of me creating a playable game and editing it with Claude pic.twitter.com/bWqw8F8CdH
— Ethan Mollick (@emollick) June 20, 2024

X user @kimmonismus went further, saying that OpenAI is “sleeping through AGI,” or artificial general intelligence, the company’s stated goal of AI that outperforms humans at most economically valuable work. They criticized the company for announcing GPT-4o features that have not yet been delivered, including its new voice mode.

Hello, @OpenAI. You’re sleeping through AGI. While you keep making promises (“Patience Jimmy, it’ll be worth the wait”) and announcing without delivering (“GPT-4o-Voice in weeks”), the competition manages to deliver without making big announcements beforehand! Take a leaf from… https://t.co/o6ROsZwDRG
— Chubby♨️ (@kimmonismus) June 20, 2024

Still not human level

Despite the high praise on X, others noted that Claude 3.5 Sonnet still struggles with some seemingly basic cognitive tasks that humans perform with relative ease, such as playing tic-tac-toe.

Frontier models like GPT-4o (and now Claude 3.5 Sonnet) may be at a “smart high schooler” level in some ways, but they still struggle with basic tasks like tic-tac-toe. There was hope that native multimodal training would help, but it didn’t. pic.twitter.com/1iDq0DCL4Q
— Noam Brown (@polynoamial) June 20, 2024

Similarly, tech journalist Timothy B. Lee, known by his @binarybits handle on X, noted that the model “still makes the occasional goofy mistake” and posted a screenshot of it answering a simple math word problem: “Which is worth more: 100 pennies or three quarters?” It initially answered “three quarters,” even though 100 pennies make $1.00 while three quarters make only $0.75.

Despite these so far minor issues, Claude 3.5 Sonnet appears to be a giant leap forward for Anthropic and for LLMs in general, and it shows that the performance gains of individual AI model makers are certainly not slowing down at current levels of available computing resources (i.e., GPUs).
