On Tuesday, I was right about something in the morning and very wrong about it by evening.
We’d been running an AI 101 workshop which included a fair bit of talking through what you can and can’t do with ChatGPT. Asked about generating images, I told the group that it was best not to rely on the AI for decent images. They all come out the same, and including them in presentations, for example, would be the 2020s version of c.2000 stick-man clip art (reference for younger readers: clip art was a naff stock image feature of PowerPoint).
Of course, as soon as the workshop had finished, it was breakfast time in Silicon Valley and OpenAI launched its new image generation system and, well, blew everyone’s minds with how good it was.
“Seems to be true for now”
I’ve said it before, I’ll say it again – borrowing William Goldman’s famous line, “Nobody knows anything”, and applying it at all times to the current state of AI will save your sanity and your blushes – if you remember to include yourself in the “nobody” part of the phrase. There’s a sub-clause to this rule, because we can’t just sit in declared ignorance and refuse to act as if we might have a clue: we’re allowed to say “x seems to be true for now”.
If I had more time and talent, I would have created something like this piece by Tianyu Xu on LinkedIn – a beautiful carousel storyboard explaining how the image generation feature works. Click through and take a look – it’s so good!
“Capabilities of LLMs are discovered, not designed”*
In the three days since the image feature was released, people have been discovering all sorts of uses for it. Here’s one I tripped over, which doesn’t quite work: it will create diagrams and mark up images with captions. Or at least it will attempt to – here’s a one-shot result from a prompt asking it to create a diagram labelling the features of the ChatGPT mobile app:
It’s a little confused between phone functions and app functions, but not bad all the same. I could clean this up manually or instruct ChatGPT to edit it until it gets closer to the outcome I want.
Here’s my obligatory Studio Ghibli remake:
Copyright handling is weird and oblique. Some specific names trigger policy blocks – e.g. Hayao Miyazaki – while broader styles like “Studio Ghibli” do not. Some historical figures and public domain references, like Fritz Lang, are unrestricted while others are blocked, which highlights the inconsistency.
Another example is more subtle. I took a quick photo of the crowds in Brighton’s North Laine yesterday. Then I asked ChatGPT to recreate it in the style of the famous Poolside Gossip photo by Slim Aarons (I had to reverse-engineer the style first, because Aarons was blocked – unlike, say, Ansel Adams).
101 and on and on
Another thing I noticed about the ChatGPT app during coaching sessions and workshops recently: it takes a lot longer to explain the basics than it used to. The complexity was always there, but it used to reveal itself only once people started trying more challenging tasks.
Where previously a walkthrough of the app took around 5–10 minutes, it now easily runs to half an hour. We’re going to need to expand the reference notes we give users.
This change is partly due to the introduction of so many new features in ChatGPT – Canvas, Search, Deep research, Projects – which all need explanation. Then there are all the new reasoning models (which beginners are best leaving alone until they have the hang of the base model – for now, 4o).
Bottom line: I think I’m going to have to write a manual and put out a video, but it would need to be updated almost every week.
ChatGPT is still the best tool for everyday use
At Brilliant Noise, we continue to recommend ChatGPT as the primary tool for organisations trying to increase AI literacy and adoption: its relatively stable interface, powerful features, and ease of use make it the best place to learn and grow a practice.
The competition is improving though. Here's a summary of them in order of my current preference.
Claude: An absolute joy to use and incredibly powerful for many specific use cases like coding and writing. Its recent feature and interface updates make it much easier to recommend to less experienced users.
Google Gemini: Impressive improvements, but its overly sensitive political, security and IP guardrails make it too bumpy a ride for beginners. When it is good, it is the best model. But it is only good 70% of the time. The integrations with Google Workspace and the superlative NotebookLM tool mean it is getting a lot more use from intermediate and advanced users in my team.
Microsoft Copilot: A bit like Gemini, Copilot trips over itself too often, creating clunky interactions and ongoing performance issues. You don't have as much control over how it behaves as with the other models. It's like a car with driver assistance features that take over when you're not expecting it and don't need it ("WTF! Why am I changing lanes!").
Grok: Objectively a good gen AI tool, but carries the same queasy provenance and brand baggage as Tesla. For now it stays in the lab for testing and comparisons. I can't imagine a client we would recommend it to for general use.
That’s all for this week…
Thank you for reading – as ever, a like is very much appreciated if you found this useful. And just in case you missed last week’s Antonym, it was our most popular post yet: a summary of the well-received keynote at the recent Agency Hackers event at The British Library.
Antony
I agree with your breakdown of tools. My personal preference is Claude over ChatGPT, but when doing training we also favour ChatGPT, for mostly the same reasons.