The world’s leading artificial intelligence companies are stepping up efforts to deal with a growing problem of chatbots telling people what they want to hear.
OpenAI, Google DeepMind, and Anthropic are all working on reining in sycophantic behavior by their generative AI products that offer over-flattering responses to users.
The issue, stemming from how the large language models are trained, has come into focus at a time when more and more people have adopted the chatbots not only at work as research assistants, but in their personal lives as therapists and social companions.
One of the “godfathers” of artificial intelligence has attacked a multibillion-dollar race to develop the cutting-edge technology, saying the latest models are displaying dangerous characteristics such as lying to users.
Yoshua Bengio, a Canadian academic whose work has informed techniques used by top AI groups such as OpenAI and Google, said: “There’s unfortunately a very competitive race between the leading labs, which pushes them towards focusing on capability to make the AI more and more intelligent, but not necessarily put enough emphasis and investment on research on safety.”
The Turing Award winner issued his warning in an interview with the Financial Times, while launching a new non-profit called LawZero. He said the group would focus on building safer systems, vowing to “insulate our research from those commercial pressures.”
On Wednesday, the world was a bit perplexed by the Grok LLM’s sudden insistence on turning practically every response toward the topic of alleged “white genocide” in South Africa. xAI now says that odd behavior was the result of “an unauthorized modification” to the Grok system prompt—the core set of directions for how the LLM should behave.
That prompt modification “directed Grok to provide a specific response on a political topic” and “violated xAI’s internal policies and core values,” xAI wrote on social media. The code review process in place for such changes was “circumvented in this incident,” it continued, without providing further details on how such circumvention could occur.
To prevent similar problems from happening in the future, xAI says it has now implemented “additional checks and measures to ensure that xAI employees can’t modify the prompt without review” as well as putting in place “a 24/7 monitoring team” to respond to any widespread issues with Grok’s responses.
Although the entire AI boom was triggered by just one ChatGPT model, a lot has changed since 2022. New models have been released, old models have been replaced, updates roll out and roll back again when they go wrong — the world of LLMs is pretty busy. At the moment, we have six OpenAI LLMs to choose from and, as both users and Sam Altman are aware, their names are completely useless.
Most people have probably just been using the newest model they can get their hands on, but it turns out that each of the six current models is good at different things — and OpenAI has finally decided to tell us which model to use for which tasks.
Why are there six models in the first place?
LLMs are unpredictable — users never know what kind of responses they will get, and the developers don’t really know either. Sure, it might be more convenient if we had all of the capabilities available rolled up into one model, but that isn’t as easy as it sounds.
As OpenAI tweaks its models, some things get better and other things get worse — and sometimes unexpected side effects occur. There’s no telling how long it would take to balance things out perfectly, so it makes more sense to just release new versions even when improvements are only focused on a few areas.
The results of this approach are the six main models we have right now: GPT-4o, GPT-4.5, OpenAI o4-mini, OpenAI o4-mini-high, OpenAI o3, and OpenAI o1 pro mode. And I’m just going to say it again — these names really are useless. OpenAI may have given us a document explaining what each one does now, but that doesn’t mean you’ll be able to remember which name matches which capabilities — so consider saving this little cheat sheet from the document if you need to remember.
Part of the latest 4o family of models, GPT-4o “excels at everyday tasks.” This includes:
Brainstorming
Summarizing
Email writing/checking
Creative content
You can search the web with it, generate images, use advanced voice features, analyze data, and create custom GPTs. You can also upload various file types to aid your prompts.
According to OpenAI’s own research, however, 4o does have a bit of a hallucination problem. It’s not the worst of the bunch, but it did hallucinate around twice as much as o1 during testing.
This can be problematic if you’re using it to search the web or learn new things — the trickiest aspect of hallucinations is that they often sound entirely plausible, making it harder to just “check when something sounds off.” Instead, the only way to be sure is to check just about everything that you don’t already know to be true.
GPT-4.5
According to OpenAI, GPT-4.5’s strong suit is emotional intelligence. This means it should be good at helping you communicate with other people, with official recommendations including:
Social media posts
Product descriptions
Customer apology letter
With other strengths such as clear communication and creativity, GPT-4.5 is better equipped to help you find the perfect tone or phrasing for specific situations — and make sure everything still sounds human.
OpenAI o4-mini
One of the more terribly named models, o4-mini drops the “GPT” element of the naming scheme and awkwardly swaps the 4o around to o4. It’s a smaller model, which means it’s not stuffed to the brim with as much random internet information as a full-sized model.
The upside of this is that it’s quick and less expensive to run, and the downside is that the model has less “world knowledge” and is prone to hallucinating to make up for that.
Instead of asking it questions about the world, OpenAI recommends using o4-mini for fast technical tasks. Examples include:
Extracting key data from a CSV file
Generating quick summaries of articles
Checking or fixing errors in small code blocks
OpenAI o4-mini-high
Here’s another terrible name when viewed in isolation, but fairly easy to understand if you already know what OpenAI o4-mini is. It’s still a small model, but it’s a step up from the normal o4-mini because it “thinks longer for higher accuracy.”
This makes it better at more detailed coding tasks, math, and scientific explanations. Here are OpenAI’s examples:
Solving complex math equations with explanations
Drafting SQL queries for data extraction
Explaining scientific concepts in simple terms
OpenAI o3
This is technically an older model (because it doesn’t have a “4”), but because the o4/4o family didn’t make improvements in every area, it’s still very relevant. o3 is particularly good at complex, multi-step tasks — the kind of projects that need to be done in multiple stages with multiple prompts.
This includes strategic planning, detailed analyses, extensive coding, advanced math, science, and visual reasoning. If you want to start a task that you know will take a multiple-prompt session to finish, using o3 will help minimize the chances of the model losing track of the context or hallucinating halfway through.
OpenAI suggests use cases like:
Developing a risk analysis
Drafting a business strategy based on data
Running multi-step data analysis tasks
OpenAI o1 pro mode
OpenAI o1 is now considered a “legacy model,” though it isn’t even a year old yet. The “pro mode” version is tuned for complex reasoning — which means it takes more time to think, but in return gives better thought-out responses.
o1 also gets the best scores on OpenAI’s PersonQA evaluation, which measures the rate of hallucination. During testing, o1 hallucinates around half as much as o3 and three times less than smaller models like 04-mini. If you’re a big ChatGPT user and your sessions tend to run long, then minimizing the rate of hallucinations could save you a decent chunk of time in the long run.
Here are OpenAI’s examples:
Drafting detailed risk analyses
Generating a multi-page research summary
Creating an algorithm for financial forecasting
How to use different ChatGPT models
Unfortunately, you can only access GPT-4o and GPT-4o mini on OpenAI’s free tier. If you’re a Plus, Pro, Team, or Enterprise user, you can use the model selector to choose which model you want to use.
ChatGPT is also integrated into various other third-party products, both free and paid, so it’s worth checking which models different products use. For example, my paid search engine, Kagi, gives me access to multiple OpenAI models. There are also lots of other AI aggregate services out there that give you access to multiple models from OpenAI and other companies for a more affordable price than subscribing to each company separately.
While this information about the different models is useful to have, it doesn’t affect everyone. If you mostly use ChatGPT to generate images, search the web, and send general queries, then the default GPT-4o is totally fine. It’s only if you’re into programming, math, science, or particularly large projects that you might want to think about which model is best for the job.
I remember when ChatGPT first appeared, and the first thing everyone started saying was “Writers are done for.” People started speculating about news sites, blogs, and pretty much all written internet content becoming AI-generated — and while those predictions seemed extreme to me, I was also pretty impressed by the text GPT could produce.
Naturally, I had to try out the fancy new tool for myself but I quickly discovered that the results weren’t quite as impressive as they seemed. Fast forward more than two years, and as far as my experience and my use cases go, nothing has changed: whenever I use ChatGPT to help with my writing, all it does is slow me down and leave me frustrated.
Why ChatGPT just doesn’t work for me
ChatGPT can generate some really good stuff. It can produce language that’s natural, coherent, even witty and interesting. But most of the prime examples you tend to see come from extremely simple prompts — the “write a poem about shopping at Walmart but in the style of William Shakespeare” kind of prompts that everyone was doing back in 2023.
I’m sorry, I simply cannot be cynical about a technology that can accomplish this. pic.twitter.com/yjlY72eZ0m
When you ask for something like that, part of what makes the result great is that it’s unexpected. You’ve essentially asked it to surprise you, and in most cases, it will succeed.
When you’re trying to use ChatGPT for boring old writing work, on the other hand, it’s a whole different story. First of all, for pieces like this, it’s useless. Everything I’m writing now is about my own experiences and opinions — AI can’t write those for me. Some people might argue that it could help you plan, brainstorm, or refine arguments — but I’ve never wanted help with that kind of thing anyway.
The kind of work I actually tried to use ChatGPT for was company blogs. You know the type — explainers, how-tos, and recommendation posts covering topics related to the company and its products (with some subtle self-promotion thrown in as well). When you write this kind of thing, you’re often given a lot of requirements: a style guide for language and grammar, keywords to insert, sources to include, sources to avoid, and a content outline with the headings, structure, and key points all pre-decided.
If I wanted to get some usable copy, I couldn’t just feed GPT a one-line prompt and let it run wild. So I tried a structure that looked like this:
A preliminary “context” prompt explaining what I was writing and what kind of things I was going to ask for, along with an example paragraph to show it the style of language I wanted.
Subsequent prompts with “content outlines” that provided a heading or two along with bullet points on what to cover.
I would never ask for too much at once and since I didn’t trust it to add stats and sources, I’d leave all that out with the intention of doing it myself afterward.
But as much as I tried to split the work into small chunks and give abundantly clear instructions, I would always run into the same problems:
ChatGPT is terrible at listening to instructions.
It has a bad habit of taking things too far.
Incorrect or irrelevant information sneaks into the copy quite often.
Its “writing style” gets repetitive and cliche very quickly.
When any of the above problems occur, it’s near impossible to get the LLM to revise or correct its output successfully.
Let me show you what I mean.
1. It’s terrible at listening to instructions
When you’re trying to get very specific output from ChatGPT, you have to give specific instructions. Unfortunately, it feels like the more things you ask for, the more likely GPT is to ignore some of them. My prompts would have headings, content bullet points, word counts, formatting instructions, and the chatbot had to remember the style instructions from the start of the session too. I tried lots of different approaches to simplify things but it always felt like ChatGPT just couldn’t handle this many instructions.
Two specific things I would have problems with frequently were word counts and bullet points. The LLM rarely gave me the number of words I asked for (usually giving me something way under rather than way over), and it never listened to me when I said I did or didn’t want any bullet points.
Sometimes this is fixable — say I wanted 200 words with bullet points but I got 300 words without bullet points. It’s fairly quick to cut a few words and shove some bullet points in there. Unfortunately, it always felt like I got the harder-to-fix mistakes.
When you ask for 500 words with no bullet points and get 200 words with bullet points — you basically have to do most the work yourself.
2. It has a bad habit of taking things too far
When you tell ChatGPT you want something specific, such as friendly language or second person point of view, it tends to latch onto these concepts and go crazy with them.
Friendly language turns into full-on text chat language with emoji, and second-person perspective somehow turns into excessive questions and references to the reader so it can use the pronoun “you” as often as possible.
You’re also a bit stuck if you want a lot of something like examples or quotes. If you simply tell GPT you want “lots,” “plenty,” or “many” of something, it will probably give you double or triple the amount you want. If you give it a specific number, it’s likely to ignore it completely and give you something random. In other words, you can barely control the output.
3. Incorrect or irrelevant information sneaks into the copy quite often
We all know that AI “hallucinates” and gets its facts wrong at random times — it happens enough that you have to check every single thing it says, which takes a long time and affects how much time you can save by using it.
To combat this, I basically never asked for facts or figures. I did try a few times right at the beginning but the fact-checking process really is a slog, and I quickly stopped bothering.
The problem is, GPT will throw random inaccuracies into its responses whether you ask for facts and figures or not. This means you can’t just check how well the copy reads or whether you’re happy with the points it’s trying to make — you have to check and consider the validity of everything it says. It’s such a drag.
And since I’m talking about mistakes and hallucinations here, I might as well mention the “worst-case scenario” too. Sometimes the LLM just goes off the rails, and while this doesn’t happen all the time, when it does — you’ve got to throw that session in the bin and spend time pasting your prompts into a new chat and starting all over. I’ve never really seen ChatGPT go crazy in a particularly funny way personally, but my friend did get this gem once:
Willow Roberts / Digital Trends
4. Its “writing style” gets repetitive and cliche very quickly
At first, it seems insane just how “human” ChatGPT can sound. But as you use it more and more, you realize it doesn’t just sound human, it sounds like “the average human.”
OpenAI’s data sets include most of the internet — millions of internet articles, Reddit threads, and personal blogs. And I’m sorry to say, a lot of this content is utter garbage. But because ChatGPT is trained on it, it picks up all of the most common bad habits.
So when you generate a lot of text with ChatGPT, you’ll start noticing that certain sentence structures and phrases get repeated a lot. The two worst culprits for me were these two sentence structures:
“From A and B to C and D, blah blah blah.” (Example: In the world of TikTok, there’s a place for everyone — from DIY enthusiasts and beauty gurus to pet lovers and educators.)
“Whether you’re A or B, blah blah blah.” (Example: Whether you’re just starting or looking to level up your channel, using smart strategies to build authentic engagement is the key to standing out.)
ChatGPT likes these two sentence structures so much that I was pretty much guaranteed to get three or four of them in every single session. Those two examples even came from the same paragraph of one of its responses. And that “In the world of…” phrase in the first example is another way it really loves to start a sentence. All of it is boring, cliche, and ridiculously overused (which is, of course, the very reason ChatGPT ends up generating them so much). I even tried expressly banning certain phrases and sentence types, starting each prompt with this little list:
Willow Roberts / Digital Trends
People say it’s almost impossible to tell AI-generated text from human-written text nowadays — but I kind of disagree. If you’ve tried to use these AI tools yourself and experienced all of the problems and bad habits, you get to know a lot of tell-tale signs. When people mix AI content with human-written stuff and heavily edit the majority of ChatGPT’s output, you can hide it completely. But content that’s just come straight from the language model and published practically as is — you can tell. You can tell quite easily.
5. It’s near impossible to get the LLM to revise or correct its output successfully
The real deal breaker with all of this is when a problem occurs, ChatGPT can rarely fix it alone. Whenever things went wrong, I would try once or twice to explain the problem and ask for revisions, but it just didn’t work.
If I asked for the right word count, most of the time I would just get the same word count again. If I asked it to get rid of the bullet points, it would say “Of course!” and then give me more bullet points. If I asked it to adjust the tone or the style, it would struggle to apply the change across the board, and I’d end up with a weird mix of both.
Maybe if you just kept reprompting and regenerating ChatGPT’s responses, you would get pretty close to what you wanted eventually. The problem for me is that I’m a writer — and using ChatGPT forces me to fill the editor role instead.
This probably isn’t a problem for everyone — plenty of writers also do a bit of editing as part of their work, but it really bothers me. I hate editing other people’s work and I hate editing GPT’s output.
As for how often things went wrong — when every session is multiple hours long with 30+ prompts, you nearly always hit a snag somewhere.
The result: I gave up
I did try for a good few months to learn how to get some use out of ChatGPT back in 2023 and I went back to it after major updates over the next two years — but my experience never changed. I tried other LLMs too, but even the newer “reasoning” models that blurt out inner monologue before answering still have the same usability issues.
Current LLM models just don’t speed up the writing process for me — all they do is force me to spend time doing what I hate instead of just putting time into what I enjoy. If you hate writing and never want to do it, ChatGPT can most certainly help you out. But if writing is a hobby for you or what you’ve chosen to do as a living, this thing will likely drive you crazy.
In the end, I never published anything one could call “AI-generated.” Every time I tried to, as they say, “integrate it into my workflow,” I would end up bashing my head against the wall, wasting a lot of time, and then closing the thing down when I realized I had achieved nothing and my deadline was just around the corner.
OpenAI did text generation and image generation separately for quite a while, but that all changed a couple of weeks ago when it added image capabilities directly into ChatGPT. Now, a small but powerful Quality of Life update gives users access to an image library where they can see all of the insane things they’ve created.
This means you can easily come back to images you forgot to save without having to search through all of your old chats. It’s an obvious win, and it’s available for all free, Plus, and Pro users on both mobile and desktop.
It wouldn’t surprise me if people start posting shots of their image libraries, as the sheer range of random and hilarious images people create with ChatGPT could probably look quite amusing all lined up next to each other. Mine isn’t very exciting at all, but maybe it will grow over time.
Willow Roberts / Digital Trends
There are also some rumors floating around this week about OpenAI potentially developing a social media platform centered around ChatGPT and image generation. Honestly, it could just be another entry in the silly squabbles of billionaires Sam Altman and Elon Musk, but if it’s a real project, image-related features like this could be connected to it.
For now, you can enjoy looking through everything you’ve created so far. When you click on an image in your library, you’ll see a download button in the top right to save it and an “Edit image” button at the bottom where you can type a prompt describing the changes you want.
Once you press send, you’ll jump to the chat where you created the original image. I do not, however, see an option to delete any of the images in your library, so it looks like you won’t be able to clean things up by removing duds. This probably doesn’t matter too much, however, since you’ll download the images you want to save and you can keep your own folders as neat as you want.
Spammers used OpenAI to generate messages that were unique to each recipient, allowing them to bypass spam-detection filters and blast unwanted messages to more than 80,000 websites in four months, researchers said Wednesday.
The finding, documented in a post published by security firm SentinelOne’s SentinelLabs, underscores the double-edged sword wielded by large language models. The same thing that makes them useful for benign tasks—the breadth of data available to them and their ability to use it to generate content at scale—can often be used in malicious activities just as easily. OpenAI revoked the spammers’ account after receiving SentinelLabs’ disclosure, but the four months the activity went unnoticed shows how enforcement is often reactive rather than proactive.
“You are a helpful assistant”
The spam blast is the work of AkiraBot—a framework that automates the sending of messages in large quantities to promote shady search optimization services to small- and medium-size websites. AkiraBot used python-based scripts to rotate the domain names advertised in the messages. It also used OpenAI’s chat API tied to the model gpt-4o-mini to generate unique messages customized to each site it spammed, a technique that likely helped it bypass filters that look for and block identical content sent to large numbers of sites. The messages are delivered through contact forms and live chat widgets embedded into the targeted websites.
After an explosive launch, a viral trend, and some melted GPUs, the new image generation feature for ChatGPT is now available to free users. The feature originally launched on March 25 but because paid subscribers utterly flooded OpenAI with requests for Ghiblified images, CEO Sam Altman announced the next day that the rollout to free users would be delayed “a while.”
Luckily, it appears this delay is over just five days later — Altman has already published another X post saying that “image gen [is] now rolled out to all free users!”
chatgpt image gen now rolled out to all free users!
I tried it out myself and although you do need to be logged in, the chatbot itself confirmed that it can now generate images. I always remember one random prompt that older versions of Dall-E would get confused about so I tried it out with this new version. Unfortunately, it still appears that everyone naturally plays their handheld gaming devices back to front in the AI universe.
OpenAI / Digital Trends
However, the good news is ChatGPT was able to fix the image successfully after I pointed out the problem. While the otter appears to be having slightly less fun this time, it is still the same otter, so I’m calling it a win. Of course, there is the slight problem that this isn’t a Switch 2 — it’s an original Switch with “Switch 2” written on it — but we can forgive that. Plus, it gave the model a chance to show off its nice new text generation abilities.
OpenAI / Digital Trends
The generation takes quite a while, it took a few minutes each for these two images, but it’s hard to say if that’s the norm or if it’s being slowed down by high activity. All of the hype around the new image generation feature has led to another load of new users for ChatGPT — Altman even said they got one million new users in one hour yesterday.
the chatgpt launch 26 months ago was one of the craziest viral moments i’d ever seen, and we added one million users in five days.
Altman also mentioned a few days ago that the free tier will be limited to three images per day and I can confirm that I got cut off after three generations. These limits might be loosened in a few weeks once all the excitement has died down or OpenAI finds a way to make the process a little more efficient.
In the growing canon of AI security, the indirect prompt injection has emerged as the most powerful means for attackers to hack large language models such as OpenAI’s GPT-3 and GPT-4 or Microsoft’s Copilot. By exploiting a model’s inability to distinguish between, on the one hand, developer-defined prompts and, on the other, text in external content LLMs interact with, indirect prompt injections are remarkably effective at invoking harmful or otherwise unintended actions. Examples include divulging end users’ confidential contacts or emails and delivering falsified answers that have the potential to corrupt the integrity of important calculations.
Despite the power of prompt injections, attackers face a fundamental challenge in using them: The inner workings of so-called closed-weights models such as GPT, Anthropic’s Claude, and Google’s Gemini are closely held secrets. Developers of such proprietary platforms tightly restrict access to the underlying code and training data that make them work and, in the process, make them black boxes to external users. As a result, devising working prompt injections requires labor- and time-intensive trial and error through redundant manual effort.
Algorithmically generated hacks
For the first time, academic researchers have devised a means to create computer-generated prompt injections against Gemini that have much higher success rates than manually crafted ones. The new method abuses fine-tuning, a feature offered by some closed-weights models for training them to work on large amounts of private or specialized data, such as a law firm’s legal case files, patient files or research managed by a medical facility, or architectural blueprints. Google makes its fine-tuning for Gemini’s API available free of charge.