ChatGPT passed the famous “Turing Test”

  • The researchers claim that ChatGPT-4 is the first AI to pass a two-player version of the Turing Test
  • The AI fooled its human conversation partners in 54 percent of cases



Since its first proposal in 1950, passing the “Turing Test” has been considered one of the highest goals in AI.

But now researchers claim that ChatGPT has become the first AI to pass this famous test of human intelligence.

The test, conceived by computing pioneer Alan Turing, holds that an AI should be considered truly intelligent if people cannot tell whether they are talking to a human or a machine.

In a preprint paper, UC San Diego cognitive scientists claim that ChatGPT-4 can fool human test subjects more than half the time.

However, the researchers say this result could say more about the Turing Test itself than about the intelligence of modern AI.

ChatGPT-4 passed the famous “Turing Test”, which was developed to determine whether computers have human-like intelligence
Overview of the Turing Test: A human interrogator (C) asks questions of an AI (A) and another human (B) and evaluates the answers. The interrogator does not know which is which. If the AI tricks the interrogator into thinking its answers were generated by a human, it passes the test

What is the Turing Test?

The Turing Test was introduced in 1950 by World War II codebreaker Alan Turing.

He predicted that computers would one day be programmed to acquire abilities rivaling human intelligence.

He proposed a test to determine whether a computer could think.

A person, called an interrogator, engages in a text conversation with another person and a computer, and must determine which is which.

If they are unable to do so, the computer has passed the test.

In 1950, British World War II codebreaker Alan Turing created what he thought would be the definitive test of computer intelligence.

He envisioned a human participant sitting at a screen and talking to either a human or a computer through a text-only interface.

If a computer could not be distinguished from a human in a wide range of possible subjects, Turing reasoned, we would have to admit that it is as intelligent as a human.

In replicating this famous test, the researchers asked 500 participants to talk to four different agents: three AI and one human.

The AIs were ChatGPT-4, ChatGPT-3.5 – the two models behind the popular chatbot from OpenAI – and a chat program from the 1960s called ELIZA.

Participants were placed in a chat room and told that they either needed to convince the other person that they were human or figure out who they were talking to.

Participants were then left to chat for five minutes before being asked whether they thought they were talking to a computer or a human, why they thought so and how sure they were.

Participants were placed in a chat room with a human or a computer and asked to guess who they were talking to
The experiment was a replica of one designed by Alan Turing (pictured) in the 1950s.
The Turing Test: Can You Tell the Difference? One of these conversations is with a human and all three others are with AI. Read them carefully and guess – the answers are in the box below

While the outdated ELIZA fooled participants only 22 percent of the time, ChatGPT-3.5 went undetected in 50 percent of conversations.

A more advanced iteration, ChatGPT-4, did even better, with participants judging it to be human 54 percent of the time.


This means that ChatGPT-4 is capable of impersonating a human more often than chance would suggest.

And if that seems low, it’s worth noting that participants correctly identified people as such in only 67 percent of conversations.
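The "more often than chance" comparison above can be sketched with a one-sided binomial test against the 50 percent chance baseline. A minimal illustration follows; note that the per-condition trial count `n` here is a hypothetical assumption for the sake of the example (the study pooled roughly 500 participants across four witness types), not a figure reported in the paper.

```python
from math import comb

def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value
    for observing at least k 'judged human' verdicts by chance."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical trial count for illustration only: assume ~125 of the
# ~500 games involved ChatGPT-4 as the hidden witness.
n = 125
k = round(0.54 * n)  # 54 percent of conversations judged "human"
p_value = binom_sf(k, n)
print(f"{k}/{n} judged human, one-sided p = {p_value:.3f}")
```

With numbers of this magnitude the p-value stays well above conventional significance thresholds, which is consistent with the article's caveat that 54 percent is only marginally above the 50 percent chance baseline.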

The researchers write that these results “provide the first robust empirical demonstration that any artificial system passes the 2-player interactive Turing test.”

It’s worth noting that this is a pre-print paper, which means it’s currently awaiting peer review, so the results should be taken with a degree of caution.

However, if the results are supported, it would be the first strong evidence that AI has ever passed the Turing test, as Alan Turing envisioned.

Nell Watson, an AI researcher at the Institute of Electrical and Electronics Engineers (IEEE), told Live Science, “Machines can confabulate and mix together plausible ex-post-facto justifications for things, just like humans do.

“All these elements mean that human foibles and quirks are expressed in AI systems, making them more human than previous approaches, which had little more than a list of pre-prepared answers.”

People were correctly identified as human in 67 percent of conversations (blue bar), while ChatGPT-4 was able to fool its conversation partners 54 percent of the time.

The Turing Test – Answers

Chat A: ChatGPT-4

Chat B: Human

Chat C: ChatGPT-3.5

Chat D: ELIZA

Importantly, the low performance of the ELIZA program also helps support the significance of these results.

While it may seem strange to include a program from the 1960s in a test of state-of-the-art technology, this model was included to test something called the “ELIZA effect.”

The ELIZA effect is the idea that people can attribute human characteristics to even very simple systems.

But the fact that people were fooled by ChatGPT and not ELIZA suggests that this result is “non-trivial”.

The researchers also point out that a shift in public perception of AI may have changed the results we should expect from the Turing Test.

They write: “The low rate of human passage may at first glance be surprising.

“If the test measures human likeness, shouldn’t humans score 100 percent?”

According to a new study, it is the first time an AI has passed a test invented by Alan Turing in 1950. The life of this early computer pioneer and the invention of the Turing Test was famously dramatized in The Imitation Game, starring Benedict Cumberbatch (pictured)


In 1950, this assumption would have made perfect sense, because in a world without advanced artificial intelligence, we would have assumed that anything that sounded human was human.

But as the public becomes more familiar with AI and how convincing it can be, we are more likely to misidentify humans as AI.

This could mean that the small gap between the pass rates of humans and ChatGPT-4 is even more compelling as evidence of computer intelligence.

In February of this year, Stanford researchers found that ChatGPT could pass a version of the Turing test, in which artificial intelligence answered a widely used personality test.

Although these researchers found that ChatGPT-4 results were indistinguishable from humans, this latest paper is one of the first times an AI has passed a robust 2-player conversational Turing test.

However, scholars also acknowledge that there are longstanding and valid criticisms of the Turing Test.

The researchers point out that “stylistic and socio-emotional factors play a greater role in passing the Turing Test than traditional notions of intelligence.”

Scientists say this doesn’t necessarily show that AI has become intelligent, but that it has gotten better at imitating humans

Interrogators were much more likely to cite style, personality, and tone as reasons for identifying their conversational partner as a bot than anything related to intelligence.

Likewise, one of the most successful strategies for identifying bots was to ask about human experience, which worked 75 percent of the time.

This suggests that the Turing test does not actually prove that a system is intelligent, but rather measures its ability to imitate or deceive humans.

At best, the researchers suggest this provides “probabilistic” support for the claim that ChatGPT is intelligent.

Participants were more likely to identify the AI based on an assessment of its personality and details given about it, rather than anything based on intelligence.


But that doesn’t mean the Turing test is worthless, as the researchers note that the ability to impersonate humans will have huge economic and social consequences.

The researchers say that sufficiently convincing AIs could “serve economically valuable client-facing roles that have historically been the preserve of human workers, mislead the general public or their own human operators, and undermine societal trust in authentic human interactions.”

In the end, the Turing test could be only part of what we need to assess when we want to develop an AI system.

Ms Watson says: “Raw intellect only goes so far. What really matters is being intelligent enough to understand a situation and the skills of others, and having the empathy to bring these elements together.

“Skills are only a small part of AI’s value – their ability to understand the values, preferences and boundaries of others is also critical.”
