The grug brained approach to development is very appealing. To me, the whole manifesto can be boiled down to: “do not over-engineer systems” and “do not prematurely optimize”. So simple, so clean, so pure.
I agree with the merits of this approach, but I don’t often see it in practice, even in my own work. Most managers I’ve worked with have emphasized the grug brained mantra. Engineers, while developing, will self-regulate to try to keep things simple. So why, then, aren’t more projects certified as grug brained? How do over-engineered systems still exist with so many well-intentioned, mantra-aligned developers?
Sometimes the cause of over-engineering is a lack of experience or a desire to try out a new technology. But I think this is a small minority of cases. My personal driver of over-engineering is almost always a lack of certainty. A simple system requires absolute certainty about how it will be used, whereas a complex system does not limit itself in this way.
A simple, grug brained system is built on many opinionated, irreversible design decisions. By irreversible I mean that changing or expanding them requires an amount of work akin to starting from scratch. No one wants to throw away or waste work, so you design systems that cover potential future needs, or at least don’t close future doors, and this leads to an over-engineered system.
Imagining every conceivable eventuality and deciding whether to support each one is both challenging and time-consuming. In practice, teams don’t have time to do this sort of analysis, so they over-engineer instead. They design a system capable of handling those future unknowns, to avoid rebuilding everything later. I think over-engineering is nearly always a time-saving exercise in the face of extreme uncertainty.
Let’s talk through a couple of examples. A local-only, personal script is usually the easiest thing to grug certify as simple. Let’s think through the implicit certainty in a script that makes some figures for a post:
1. No one but me will ever run this script.
2. This script will only ever work with one input CSV that I create.
3. This script will only ever need to make one plot.
4. The script will work from my local environment on my personal laptop.
These limitations are amazing: you can avoid so much complexity by assuming all of the above will always be true. Lifting any one of these requirements can mean hundreds of extra lines of code.
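To make that concrete, here’s a minimal sketch of what the script can look like while every assumption holds (the file names and column names are hypothetical, purely for illustration):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumptions 1, 2, and 4: only I run this, with the one CSV I made, on my
# laptop. No argument parsing, no schema validation, no missing-file handling.
df = pd.read_csv("results.csv")

# Assumption 3: exactly one plot, so no styling options or output formats.
df.plot(x="step", y="loss")
plt.savefig("figure.png")
```

Lift assumption 1 or 4 and you suddenly need dependency pinning, a README, and error messages for someone else’s machine; lift assumption 2 and you need argument parsing and input validation.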
In a larger project, you need to make similar decisions to keep your simplicity intact. Let’s imagine we have an AI agent chat app. Maybe you say:
1. We will have a max of 1 million users.
2. Only 1000 users will ever be simultaneously using the application.
3. Users don’t need to save conversations for more than 7 days.
4. Users lose their current conversation on reconnection.
5. Users will only ever wait 5 seconds for an agent tool call.
6. Tool calls will always be IO-bound.
7. Users can only upload PDFs, no other file types.
Pretty much all of the above lead to irreversible design decisions that, if committed to, allow for a pretty simple application. If you don’t commit fully, and want to leave yourself open to eliminating one of these requirements, you’re gonna end up writing a lot more code and making different infrastructure choices.
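As a sketch of what committing buys you (everything here is hypothetical: the SQLite file, the schema, the function names), a few of these requirements collapse directly into very small code:

```python
import asyncio
import sqlite3
import time

RETENTION_SECONDS = 7 * 24 * 3600  # requirement 3: no history past 7 days
TOOL_TIMEOUT_SECONDS = 5           # requirement 5: users wait at most 5 seconds

# Requirements 1 and 2: a million users but only ~1000 concurrent, so a
# single SQLite file can be enough. No sharding, no managed database cluster.
db = sqlite3.connect("conversations.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS messages (conv_id TEXT, body TEXT, created_at REAL)"
)

def purge_old_messages() -> None:
    # Requirement 3 as a blunt periodic delete: no archival tier, no per-user
    # retention settings, no export path.
    db.execute(
        "DELETE FROM messages WHERE created_at < ?",
        (time.time() - RETENTION_SECONDS,),
    )
    db.commit()

async def run_tool(tool_call):
    # Requirements 5 and 6: tool calls are IO-bound and capped at 5 seconds,
    # so one asyncio event loop is enough. No worker pool, no job queue, no
    # retry or progress-streaming machinery.
    return await asyncio.wait_for(tool_call(), timeout=TOOL_TIMEOUT_SECONDS)
```

Walk back requirement 3 or 4 and the storage story changes completely; walk back 5 or 6 and you’re suddenly shopping for queues and worker fleets.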
So how can we make teams or projects more grug brained? It’s actually pretty easy: just create certainty. To be clear, you can’t create certainty about the real world, but you absolutely can when writing the requirements for a system. Just be very opinionated, and tell your team that no user ever needs a response in less than 1 second, or that users will never have access to conversation histories, or that you’ll never have more than 1,000 writes per second to your backend. You’ll likely be wrong about some of these things, and then need to redesign your system. But I think you’ll net out having saved time, since your redesigns will happen when you have more certainty about your system’s usage.