We’re interacting with artificial intelligence (AI) online not just more than ever but more than we realize, so researchers asked people to talk to four agents, one human and three different kinds of AI model, to see if they could tell the difference.
The “Turing Test,” first proposed as an “imitation game” by computer scientist Alan Turing in the 1950s, assesses whether a machine can demonstrate intelligence indistinguishable from a human’s. To pass the Turing Test, a machine must be able to talk to someone and fool them into thinking it is human.
The researchers decided to replicate the test by asking 500 people to speak with four respondents: a human, the 1960s artificial intelligence program ELIZA, and GPT-3.5 and GPT-4, the AI models that power ChatGPT. The conversations lasted five minutes, after which participants had to say whether they believed they were talking to a human or an AI. In the study, published May 9 on the arXiv preprint server, the researchers found that participants mistook GPT-4 for a human 54% of the time.
ELIZA, a system pre-programmed with responses but with no large language model (LLM) or neural network architecture, was judged human only 22% of the time. GPT-3.5 scored 50%, while the human participant scored 67%.
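For illustration, the reported "judged human" rates can be tabulated in a short Python sketch (the variable names and the 50% chance threshold are illustrative, taken only from the figures in the article, not from the study's own code):

```python
# "Judged human" rates reported in the article (illustrative sketch).
pass_rates = {
    "Human": 0.67,
    "GPT-4": 0.54,
    "GPT-3.5": 0.50,
    "ELIZA": 0.22,
}

# In this informal reading, a respondent clears the bar if judges labeled
# it human at least as often as chance (50%).
passing = [name for name, rate in pass_rates.items() if rate >= 0.50]
print(passing)
```

Only ELIZA falls clearly below the chance line in this tally, which is the gap the study highlights between canned-response systems and modern LLMs.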
“Machines can confabulate, mixing together plausible ex-post-facto justifications for things, like humans do,” Nell Watson, an AI researcher at the Institute of Electrical and Electronics Engineers (IEEE), told Live Science.
“They can be subject to cognitive biases, be fooled and manipulated, and become increasingly deceptive. All of these elements mean that human foibles and quirks are being expressed in AI systems, making them more human-like than previous approaches that had little more than a list of pre-prepared answers.”
The study, which builds on decades of attempts to get AI agents to pass the Turing Test, reflected common concerns that AI systems deemed human would have “far-reaching social and economic consequences.”
The researchers also argued that there are valid criticisms of the Turing Test being too simplistic in its approach, saying that “stylistic and socio-emotional factors play a greater role in passing the Turing Test than traditional notions of intelligence”. This suggests that we’ve been looking in the wrong place for machine intelligence.
“Raw intellect only goes so far. What really matters is being intelligent enough to understand a situation and the skills of others, and to have the empathy to bring these elements together. Skills are only a small part of AI’s value; its ability to understand the values, preferences and boundaries of others is also essential. It is these qualities that will enable AI to serve as a faithful and reliable steward of our lives.”
Watson added that the study poses a challenge for future human-machine interaction: we will become increasingly paranoid about the true nature of our interactions, especially in sensitive matters. She added that the study highlights how much AI has changed during the GPT era.
“ELIZA was limited to canned responses, which greatly restricted its capabilities. It might fool someone for five minutes, but soon the limitations would be clear,” she said. “Language models are endlessly flexible, able to synthesize responses on a broad range of topics, speak in particular languages or sociolects, and portray themselves with character- and value-driven personas. It’s an enormous step forward from something hand-programmed by a human being, no matter how cleverly and carefully.”
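The contrast Watson draws can be illustrated with a toy ELIZA-style responder. This is a minimal sketch, not Weizenbaum’s original program: it simply scans the input for keywords and returns a pre-prepared template, with no understanding behind it — exactly the kind of system whose limitations surface within minutes.

```python
import re

# Toy ELIZA-style rules (illustrative, not from the original program):
# each pairs a keyword pattern with a canned response template.
RULES = [
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."  # generic filler when no rule matches

def respond(text: str) -> str:
    """Return the first matching canned template, or a generic fallback."""
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return FALLBACK

print(respond("I feel tired today"))  # → "Why do you feel tired today?"
print(respond("The weather is nice"))  # → "Please go on."
```

Because every reply comes from a fixed list of patterns, repeated or off-script inputs quickly expose the machinery, whereas an LLM synthesizes a new response each time.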