OpenAI launches GPT-4o mini to replace GPT-3.5 in ChatGPT

Benj Edwards

On Thursday, OpenAI announced the launch of GPT-4o mini, a new, smaller version of its latest GPT-4o AI language model that will replace GPT-3.5 Turbo in ChatGPT, CNBC and Bloomberg report. It will be available today for free users and users with a ChatGPT Plus or Team subscription, and will come to ChatGPT Enterprise next week.

The GPT-4o mini will reportedly be multi-modal like its big brother (which launched in May), with image inputs currently enabled in the API. OpenAI says that in the future, the GPT-4o mini will be able to interpret images, text and sound, and will also be able to generate images.

GPT-4o mini supports 128,000 input context tokens and knowledge interrupts in October 2023. It is also very cheap as an API product, costing 60% less than GPT-3.5 Turbo at 15 cents per million input tokens and 60 cents per million output tokens. Tokens are fragments of data that AI language models use to process information.

OpenAI says the GPT-4o mini will be the company’s first AI model to use a new technique called “instruction hierarchy,” which will make the AI model prioritize some instructions over others, which can make it harder for humans to perform a quick injection. attacks or jailbreaks or extracting system commands that subvert built-in fine tuning or commands given by a system prompt.

Once the model is in the hands of the public (the GPT-4o mini is not currently available in our instance of ChatGPT), we’re sure to see how people test this new method of protection.

Performance

As expected, OpenAI says that the GPT-4o mini performs well in a number of benchmarks such as MMLU (undergraduate level of knowledge) and HumanEval (coding), but the problem is that these benchmarks don’t really mean much, and few of they measure something useful when it comes to the real use of the model in practice. This is because the sense of quality from a model’s output sometimes has more to do with style and structure than raw factual or mathematical ability. This kind of subjective “vibemarking” is one of the most frustrating things in the AI space right now.

Magnify / A graph from OpenAI shows that the GPT-4o mini outperforms the GPT-4 Turbo in eight selected benchmarks.

We’ll tell you this: OpenAI says the new model beat last year’s GPT-4 Turbo in the LMSYS Chatbot Arena rankings, which measure user ratings after randomly comparing one model to another. But even this metric is not as useful as once hoped in the AI community, as people have noticed that while the mini’s big brother (GPT-4o) regularly outperforms the GPT-4 Turbo in the Chatbot Arena, it tends to produce significantly less useful output in general (they tend to be tedious, for example, or do tasks you didn’t ask them to do).

The value of smaller language models

OpenAI is not the first company to release a smaller version of an existing language model. This is common practice in the AI industry from vendors such as Meta, Google and Anthropic. These smaller language models are designed to perform simpler, lower-cost tasks such as creating lists, summarizing, or suggesting words instead of performing in-depth analysis.

Smaller models are usually aimed at API users who pay a fixed price for token input and output to use the models in their own applications, but in this case offering GPT-4o mini for free as part of ChatGPT would seemingly save money for OpenAI. also.

Olivier Godement, head of API product at OpenAI, told Bloomberg: “In our mission to enable imperfections, to create the most powerful and useful applications, of course we want to continue to create frontier models and push the boundaries here. I want to have the best little models.”

Smaller large language models (LLMs) typically have fewer parameters than larger models. Parameters are numerical stores of value in a neural network that stores learned information. Fewer parameters mean that LLM has a smaller neural network, which typically limits the depth of an AI model’s ability to make sense of context. Models with larger parameters are usually “deeper thinkers” due to the greater number of connections between concepts stored in those numerical parameters.

However, to complicate matters, there is not always a direct correlation between parameter size and ability. The quality of the training data, the efficiency of the model architecture, and the training process itself also affect model performance, as we’ve seen recently with more capable small models like the Microsoft Phi-3.

Fewer parameters mean less computation required to run the model, which means either less powerful (and cheaper) GPUs are required or less computation on existing hardware, leading to lower energy bills and lower end-user costs.

Performance

The value of smaller language models

Leave a Comment Cancel Reply