Anthropic’s Claude 3.5 Sonnet wows AI users: “this is wild”



A new large language model (LLM) has apparently taken the performance crown from OpenAI’s GPT-4o about a month after its release: Claude 3.5 Sonnet, the new LLM and chatbot from rival artificial intelligence firm Anthropic, launched today, outperforms all others in the world on key third-party benchmarks, according to the company. At the same time, it is faster and cheaper than previous Claude 3 models.

But it’s one thing to drop a new model and claim dominance, and another for users to actually experience and take advantage of the performance increase (Google Gemini family, I’m looking at you: allegedly better than the previous OpenAI flagship GPT-4 on some metrics, but who is really using you?).

The latest release of Claude 3.5 Sonnet from Anthropic does not seem to have this problem. Many AI influencers and power users took to the web within hours of its release to share their largely positive impressions of the new Anthropic model and show off what the new, “smartest” LLM in the world is capable of.

Advancing coding skills and building products

As enterprise AI influencer and expert Allie K. Miller wrote on X, Claude 3.5 Sonnet was able to create an entire playable game for her based on just a screenshot, in less than half a minute.




Similarly, the informative and up-to-date X account @TestingCatalog News showed how the newly launched “Artifacts” playground, which debuted alongside Claude 3.5 Sonnet and shows interactive outputs next to the chatbot interface, can run the code for a real, working website form that Claude 3.5 Sonnet built.

It was even able to recreate shots from the seminal 1995 film Hackers.

Pietro Schirano, founder of enterprise AI image generation startup EverArt, wrote on X that combining Claude 3.5 Sonnet with another tool, Maestro, showed “sparks of AGI?”

Anthropic staff go to bat for Claude 3.5 Sonnet

While he’s clearly biased, Anthropic’s head of developer relations Alex Albert posted a thread on X highlighting how Claude 3.5 Sonnet is “starting to get really good at coding and fixing pull requests autonomously” and even went so far as to say: “It’s clear that in a year a large percentage of the code will be written by LLMs.”

Similarly, Anthropic technical staff member Maggie Vo posted on X that Claude 3.5 Sonnet can now do “half my work… and I couldn’t be happier.”

Putting pressure on OpenAI

Others have noted that now that Claude 3.5 Sonnet has eclipsed OpenAI’s GPT-4o and is available at a similar price, OpenAI is under new pressure to keep users seeing its models as the right choice.

Ethan Mollick, a professor at the University of Pennsylvania’s Wharton School of Business and an AI booster, compared the Artifacts feature to “a simpler version of Code Interpreter” from OpenAI’s GPT-4.

X user @kimmonismus went further, saying that OpenAI will “benefit AGI,” or artificial general intelligence, the company’s goal of an AI model that outperforms humans at most economically valuable work. They criticized the company for announcing additional features with GPT-4o that had not yet been delivered, including new voice modalities.

Still not human level

Despite the high praise across X, others noted that Claude 3.5 Sonnet still struggled with some seemingly basic cognitive tasks that humans can perform with relative ease, such as playing tic-tac-toe.

Similarly, tech journalist Timothy B. Lee, known for his @binarybits handle on X, noted that it “still makes the occasional goofy mistake” and posted a screenshot of a simple math word problem he had asked it: which is worth more, 100 pennies or three quarters? It initially responded “three quarters,” even though 100 pennies come to $1.00 and three quarters to only $0.75.

Despite these so-far minor issues, however, Claude 3.5 Sonnet appears to be a giant leap forward for Anthropic and for LLMs in general, and shows that the performance gains achieved by individual AI model makers are certainly not slowing down at current levels of available computing resources (i.e., GPUs).
