After ChatGPT crashed, I installed an offline chatbot that never stops working

Calvin Wankhede / Android Authority

If you’re a frequent ChatGPT user, you may have noticed that the AI chatbot sometimes crashes or stops working at the most inopportune times. These outages usually don’t last long, but after the last one left me stranded, I started looking for a more reliable alternative. Fortunately, it turns out there’s a simple solution in the form of local language models like LLaMA 3. The best part? They can run on relatively pedestrian hardware like a MacBook Air! Here’s everything I learned from using LLaMA 3 and how it compares to ChatGPT.

Why you should care about local AI chatbots

Most of us have only used ChatGPT and well-known alternatives like Microsoft’s Copilot and Google’s Gemini. All of these chatbots, however, run on powerful servers in remote data centers, and relying on someone else’s computer means the service can crash or stop working for hours at a time.

It’s also unclear how cloud AI chatbots handle your data and privacy. We know that ChatGPT stores conversations to train future models, and the same likely goes for every other Big Tech company. It’s no surprise that companies around the world, from Samsung to Wells Fargo, have restricted their employees from using ChatGPT internally.

Online AI chatbots are neither reliable nor private.

This is where locally run AI chatbots come in. Take LLaMA 3, for example, an open-source language model developed by Meta’s AI division (yes, the same company that owns Facebook and WhatsApp). The key difference here is LLaMA’s open-source status, meaning anyone can download and run the model themselves. And since no data ever leaves your computer, you don’t have to worry about secrets being leaked.

The only requirement for running LLaMA 3 is a relatively modern computer. Unfortunately, that disqualifies smartphones and tablets. However, I found that you can run a smaller version of LLaMA 3 on shockingly low-end hardware, including many laptops released in the last few years.

LLaMA 3 vs ChatGPT: How does offline AI fare?

I’ll cover how to install LLaMA 3 on your computer in the next section, but first you might want to know how it stacks up against ChatGPT. The answer is not simple because ChatGPT and LLaMA 3 come in different variants.

Until last month, the free version of ChatGPT was limited to the older GPT-3.5 model, and you had to pay $20 per month to use GPT-4. However, with the release of GPT-4o, OpenAI now allows users to access its latest model for free, with some restrictions on how many messages you can send per hour.

LLaMA 3 also comes in two model sizes: 8 billion and 70 billion parameters. The 8B version is the only realistic choice for those with limited computing resources, which basically means everyone but the most hardcore PC gamers. You see, the larger 70B model requires at least 24GB of video memory (VRAM), which is currently only available on exotic $1,600 GPUs like the Nvidia RTX 4090. Even then, you’ll have to settle for a compressed (quantized) version, since the full-fledged 70B model requires 48GB of VRAM.
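For a back-of-the-envelope sense of where VRAM figures like these come from, model weights alone take roughly parameters × bytes per weight. Here’s a minimal sketch of that arithmetic (weight-only; real usage runs higher once you add the KV cache and runtime overhead, which is why even quantized 70B builds still want a beefy GPU):

```python
def vram_estimate_gb(params_billion, bits_per_weight):
    """Rough weight-only memory estimate: parameters x bytes per weight.

    Actual runtime usage is higher (KV cache, activations, framework
    overhead), so treat this as a lower bound, not a spec.
    """
    bytes_total = params_billion * 1e9 * (bits_per_weight / 8)
    return bytes_total / 1e9  # decimal gigabytes

for params in (8, 70):
    for bits in (16, 8, 4):
        print(f"{params}B model at {bits}-bit: ~{vram_estimate_gb(params, bits):.0f} GB of weights")
```

Running this shows why 4-bit quantization is what makes the 8B model fit comfortably on consumer hardware, while even aggressively compressed 70B builds remain out of reach for most GPUs.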

Given all this, LLaMA 3 8B is naturally our model of choice. The good news is that it holds up very well against GPT-3.5, the base ChatGPT model. Here are some comparisons between the two:

  • Challenge 1: Write a cover letter for YouTube DevOps Engineer position. I have been working at Oracle Cloud since I graduated as a software engineer in 2019.
[Image: LLaMA 3 vs ChatGPT 3.5 cover letter responses]

Result: Practically a tie, though I slightly prefer LLaMA’s approach.

  • Challenge 2: How much is 8888×3+10?
[Image: LLaMA 3 vs ChatGPT 3.5 math responses]

Result: Both chatbots delivered the correct answer (26,674).

  • Challenge 3: Write a short Python program that simulates a simple dice game. The program should allow the user to enter the number of dice, the number of sides on each die, and how many times they want to roll. The program should then list the results of each roll.
[Image: LLaMA 3 vs ChatGPT 3.5 Python responses]

Result: Both chatbots generated working code.
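For reference, a dice game along the lines of that prompt could look like the sketch below. This is my own illustration, not either chatbot’s actual output, and I’ve swapped the interactive input() prompts the challenge asks for in favor of function parameters, to keep the example compact and easy to test:

```python
import random

def roll_dice(num_dice, num_sides):
    """Roll `num_dice` dice with `num_sides` sides each; return one result per die."""
    return [random.randint(1, num_sides) for _ in range(num_dice)]

def play(num_dice, num_sides, num_rolls):
    """Run the game, printing each roll, and return all results for inspection."""
    all_rolls = []
    for i in range(1, num_rolls + 1):
        results = roll_dice(num_dice, num_sides)
        print(f"Roll {i}: {results} (total: {sum(results)})")
        all_rolls.append(results)
    return all_rolls

# Example: three rolls of two six-sided dice
play(num_dice=2, num_sides=6, num_rolls=3)
```

A fully prompt-faithful version would read num_dice, num_sides, and num_rolls from input() calls instead; both chatbots' actual answers took that interactive approach.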

One caveat worth noting is that neither GPT-3.5 nor LLaMA 3 can access the internet for up-to-date information. Asking both models about the Pixel 8’s SoC, for example, yielded confident-sounding but completely inaccurate answers. So if you plan to ask factual questions, take a local model’s answers with a grain of salt. For creative and even programming tasks, though, LLaMA 3 performs quite admirably.

How to download and run LLaMA 3 locally

[Image: LLaMA 3 running in GPT4All]

Calvin Wankhede / Android Authority

As I mentioned above, LLaMA 3 comes in two sizes. LLaMA 3 8B requires nothing more than a mid-range computer. In fact, running it on my desktop produced faster responses than ChatGPT or any online chatbot available today. And while my computer has a mid-range gaming GPU, LLaMA 3 will run happily even on a laptop with modest hardware. Case in point: I still got reasonably fast responses running it on an M1 MacBook Air with 16GB of RAM. That’s four-year-old hardware, older than ChatGPT itself!

With that background out of the way, you’ll need some software to actually interface with LLaMA 3. That’s because, while you can download the model for free, Meta doesn’t offer it as a program or app you can simply double-click to start. Thanks to the open-source community, however, we have several different LLM frontends available today.

After trying a few of them, I would recommend GPT4All because the process of downloading and running LLaMA 3 is as painless as it gets. Here’s a quick guide:

  1. Download GPT4All for your Windows or macOS computer and install it.
  2. Open GPT4All and click Download models.
  3. Look for the “LLaMA 3 Instruct” model and click Download. This is the 8B tuned for conversations. The download may take some time depending on your internet connection.
  4. After the download is complete, close the download popup and select LLaMA 3 Instruct from the model drop-down menu.
  5. That’s it – you’re ready to start chatting. You should see the screen in the image above. Simply enter a prompt, press enter, and wait for the model to generate its response.

My admittedly powerful desktop can generate 50 tokens per second, which easily beats ChatGPT’s response speed. Apple Silicon computers offer the best price-to-performance ratio thanks to their unified memory, and will also generate tokens faster than a human can read them.

If you’re using Windows without a dedicated GPU, LLaMA 3 text generation will be noticeably slower. Running it on my desktop CPU yielded just 5 tokens per second and required at least 16GB of system memory. On the other hand, cloud chatbots also slow down during periods of high demand. Besides, at least I can rest easy knowing that my chats will never be read by anyone else.
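To put those throughput numbers in perspective, here’s a quick sanity check of the “faster than a human can read” claim. The conversion assumes roughly 0.75 English words per token and a typical reading speed of around 250 words per minute; both are rough rules of thumb, not measurements:

```python
def words_per_minute(tokens_per_second, words_per_token=0.75):
    """Convert token throughput to approximate English words per minute.

    words_per_token=0.75 is a common rough estimate for English text,
    not a property of any specific tokenizer.
    """
    return tokens_per_second * words_per_token * 60

print(words_per_minute(50))  # desktop GPU figure from above -> 2250.0
print(words_per_minute(5))   # CPU-only figure from above -> 225.0
```

At 50 tokens per second the model produces around 2,250 words per minute, roughly ten times faster than most people read, while the CPU-only 5 tokens per second lands right around typical reading speed.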
