I have seen a lot of posts (as have we all, I’m sure!) on LinkedIn, Medium and Substack which fall into the category of “look at this amazing ChatGPT prompt I put together for X”, where X is anything from marketing to company valuation. I appreciate the enthusiasm for AI; it’s an exciting and still nascent field, and it’s great when people show their work. At the same time, I can’t quite escape the feeling that these posts and articles are written for clout and views rather than for their content.
There’s a more insidious problem with believing the hype, though, which I want to dig into here. But first, a short introduction to my (admittedly imperfect) understanding of how LLMs work, to ground the argument in fact rather than belief in magic.
How does GenAI work?
The process for training generative models hinges on data. The underlying model architecture, the famous Transformer, is remarkably expressive, meaning that it can capture a huge variety of possible correlations between predictor and target variables (e.g. the seed text we put in a prompt and the text that the model predicts should follow). This is an excellent thing when there are correlations we can’t describe or explain but want to be able to model, and a bad thing if we want to understand how the model works. There’s another problem too: by taking this approach we’re violating the spirit of Occam’s razor (see a lovely chapter of one of my favourite books here for details).
The downside of these very expressive architectures is that, because they lack any enforced internal structure (this is what makes them so expressive!), they have to learn that structure from a huge amount of data instead. The famous bitter lesson of Rich Sutton tells us that we should prefer data over “smart” choices of architecture, and large language models and image generation networks take this to extremes. The current iterations of ChatGPT are trained on around 18Tn tokens of text - using conservative estimates, this is close to one third of all the textual data that human beings have produced since we started writing. You might think that this makes the data representative, but unfortunately the cost of creating and printing text has been falling exponentially since the invention of the Gutenberg press, and it’s now so easy that this injects a huge recency bias into the data. Take any given piece of text from the ChatGPT training set, and there’s a 50:50 chance it was written in the last four years!
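To make those numbers a little more concrete, here’s a rough back-of-envelope sketch. The 18Tn training-token figure comes from the paragraph above; the total token count and the doubling-time assumption are illustrative guesses of my own, not measured data.

```python
# Back-of-envelope illustration of the claims above. The 18Tn figure is from
# the text; the ~55Tn total and the four-year doubling time are assumptions
# made purely for illustration.
training_tokens = 18e12        # tokens reportedly used to train current models
total_human_tokens = 55e12     # ASSUMED total tokens humans have ever written
print(f"fraction of all text: {training_tokens / total_human_tokens:.2f}")  # ~0.33

# If cumulative text production doubles every `d` years, then half of
# everything ever written was produced in the last `d` years. A 50:50 split
# over the last four years therefore implies a doubling time of about four years.
doubling_time_years = 4
fraction_recent = 1 - 0.5 ** (4 / doubling_time_years)
print(f"fraction written in the last four years: {fraction_recent:.2f}")  # 0.50
```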
Because of the size of these datasets, it’s almost impossible to pinpoint the particular pieces of training data that contributed to a particular prediction. This means that when ChatGPT does something amazing, we almost reflexively assume that it has done something “intelligent” rather than reached for some unexpected element of its training set. Tests, however, suggest that these models likely can’t reason - a recent study showed that all of the generative models tested did very poorly on new international maths olympiad problems which were not already on the internet.
This all leads me to view generative AI as a giant regurgitation machine - able to reproduce and do some limited recombination of material within its giant memory, but not capable of producing something fundamentally new. There’s another wrinkle, though - material that is better represented on the internet will win out over material that is rarer, all prompts being equal. If you want to find the less well represented material, you have to tune your prompts more finely to reach that particular part of the training set - hence the focus on prompt engineering over the past two years. This means that if you don’t put significant effort into tuning things, you’re likely to get the “internet average human” view on whatever you ask about. Which may be just what you need.
[Image: The average adult male for different countries around the world.]
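As a concrete (and entirely hypothetical) illustration of what tuning a prompt towards a rarer slice of the training data might look like, compare a generic prompt with a much more specific one. The model name, prompts and API usage below are placeholder assumptions of mine, not something from the original argument.

```python
# Hypothetical sketch: the same question asked generically vs. with enough
# specificity to steer away from the "internet average" answer.
# Assumes the official openai Python package and an OPENAI_API_KEY in the
# environment; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

generic = "Give me a go-to-market plan for a B2C SaaS startup."
specific = (
    "Give me a go-to-market plan for a seed-stage B2C SaaS company selling "
    "subscription meal planning to shift workers in the UK, with a team of "
    "three and six months of runway. State the assumptions you are making."
)

for prompt in (generic, specific):
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content[:300], "\n---")
```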
Where’s the risk?
Essentially, the problem we face is a loss of nuance. People on the internet aren’t exactly well-known for their subtle, careful arguments - in some of the darker underbellies of the online world, in fact, trolling and outright lies are commonplace currency. So we should probably be a little wary of treating the output of LLMs as gospel truth, and we should also be careful not to assume that they have much in the way of depth.
When we prompt them to give us a template for “go-to-market for a seed round B2C SaaS company”, for example, we probably shouldn’t expect too much nuance in what comes back. Even if we do manage to get something sensible, it will lack the specifics of our current company, and it is also going to be backwards looking (and possibly subject to a host of other biases, notably survivorship bias). This is what leads me to say “by all means use generative models, but take time at the end to put yourself back in” - what I really mean is that the model will give you a generic template for your thoughts, but you still need to fill it in to have something valuable at the end.
We might also wonder whether using such tools forces us to think within a particular framework, making us less likely to be creative too… I’ll just drop this link here about that!
Internet average just became cheap.
Or to put my more insidious concern more bluntly: if anyone can use it, is it an edge? It used to be that the internet average opinion required a chunk of time spent searching, reading, synthesising and aggregating ideas to come up with a point of view that reflected the majority. Now we can just ask a language model, and it takes seconds. The question we might want to ask, though, is: just because it became a whole lot faster, does that make it more or less valuable? Just because we can doesn’t mean we should, after all.
There’s actually an even deeper concern we might have - there are plenty of domains where the focus should be on tail risk, where being wrong small and often can be wildly outstripped by being right big very occasionally. The classic example is venture capital - the difference between mediocre funds and the very best houses out there isn’t in the vast majority of deals, which, if we’re totally honest, tend to be losers anyway. It’s that the best funds get into the biggest opportunities early, allowing them in on the ground floor of companies that go on to be worth many, many multiples of the average.
Obviously, there are common characteristics that are worth looking for - the right combination of founding team, problem, and a large and potentially untapped market is a good place to start. The extra signal that allows investors to spot that Airbnb is going to be a great bet, rather than just an airbed and a croissant in someone else’s house - that’s tail risk, and it’s very unlikely to be well represented in a training set that runs up to 2025.
Imagine my concern, then, when I see people posting about prompts they used to do due diligence on potential investments, using ChatGPT to cookie-cutter them a form to fill in, and describing how it will help you pick winners. An equivalent exercise would have been possible years ago with a lot more effort, but now all it takes is however long you need to type in a 200-word prompt (or ctrl-c ctrl-v from LinkedIn) and wait for the AI to do its magic. If it’s that easy, then anyone can do it. And as we all know from economics courses, it’s scarcity that gives things value.
I’m sure you don’t need me to tell you this, dear readers, but if it looks like a get-rich-quick scheme, and it seems like you’re going to make money for nothing, then unless you’re a billionaire (scarcity of opportunity) it’s unlikely to make you rich. That’s all, that’s the message.
Where from here?
So, how should you use Generative AI? That’s a big question to answer. Instead, let me tell you how I use it!
I use generative AI when I want to produce boilerplate - if I already know exactly what it is I am going to do (or have done!), and I want to produce documentation to describe or explain it, that’s not a bad use. You still have to read the words or the code, and check that the model has done what you want (because often it hasn’t!), but it saves you from writing all of it. I think of this more as a box-ticking exercise, to use the language of David Graeber’s Bullsh*t Jobs - I have already come to a conclusion, now I need to justify it.
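To show what I mean by boilerplate, here’s a minimal sketch (using the same assumed client and placeholder model as above): drafting a docstring for a function I’ve already written, then reading and correcting the draft by hand before it goes anywhere.

```python
# Hypothetical sketch: ask the model to draft documentation for code that
# already exists, then hand the draft back to a human to check against the code.
from openai import OpenAI

client = OpenAI()

existing_code = '''
def churn_rate(customers_start: int, customers_end: int, new_customers: int) -> float:
    lost = customers_start + new_customers - customers_end
    return lost / customers_start
'''

draft = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{
        "role": "user",
        "content": (
            "Write a concise docstring for this function, stating units "
            "and edge cases:\n" + existing_code
        ),
    }],
)

# The important step: read the draft and check it against the code yourself
# before it goes anywhere near the real documentation.
print(draft.choices[0].message.content)
```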
Where I refuse to use Gen AI is when I want to produce something - in the creative process, it’s more likely to hem me in than set me free, and the point of e.g. writing these notes is to think freely about something in my own way. I don’t want the influence of the internet average on that - I prefer the replies and agreement or disagreement of real humans, rather than the validation of an avatar of “internet human”.
If something really matters to you - do the work!
Where next?
In “The Time Machine”, H. G. Wells spells out a dystopian view of the far future, where humanity has split into two different species - the aloof, surface-dwelling Eloi and the frightening, troglodyte Morlocks. It’s a disturbing view of where inequality of opportunity could lead, written 130 years ago, and based on the idea that the aristocratic class evolved into the Eloi while the ruffians of the working class became the Morlocks. Whilst I find the overall premise a bit overblown, there is a potential split happening right now in the world of knowledge work which brings it to mind.
The CTO of a previous startup I worked at liked to make the provocative statement: “in the future, we will divide into two different classes - the creatives and the data labellers”. To use a phrase that is in vogue, but detestable in its usual usage, this was meant to be taken “seriously but not literally”. What he got right is that AI has the potential to separate groups based on how they use the technology - those who use it for everything will ultimately lose their creative drive and fall into the “data labeller” class. So be careful - use your brains, by all means let the machines work for you, but don’t let them think for you!