How Stack Ranking Led Me to Burnout

A series of events led me to burnout, which I believe I am experiencing for the second time in my 20+ year career. Arguably this is the worst one, but I don’t feel ready to talk about it yet. So more on that later. Having said that, perhaps the last straw was the experience of so-called stack ranking, or the “curve”. I at least want to share how I feel and let it out.

I am a builder. I love engineering, building software, building teams as well as leading them to delivery. I believe in constant growth, like being better than yesterday, improving day by day, and leading people to make the work environment a better place. I am not really sure where that incentive and motivation come from. However, it’s definitely not money. Money cannot buy passion. It cannot buy love. You don’t have to love your job, but as far as I’m concerned, the most successful people are the ones who love their craft. They strive to be better at their craft just because of the passion, and probably for the satisfaction that comes from results, success, the end product, whatever you name it.

You have probably heard about the “raise the bar” culture that artificially tries to cultivate the growth mindset I explained above. Since this kind of growth mindset is not organic, companies come up with their own made-up ways of implementing it. Moreover, the implementation mainly comes from non-builders, who, respectfully, know nothing about the passion and satisfaction that natural builders experience. Stack ranking is probably one of the worst implementations of the “raise the bar” culture, but it is also a very good example of non-builders shooting themselves in the foot.

Stack ranking works by treating the workforce as a normal distribution and fitting performance evaluations to a predefined “curve”. If you have 5 rating levels each performance period, 1 being the worst and 5 the best, then roughly 10 to 15% of employees have to be placed at 1 and a similar share at 5, while the majority are rated in the middle, at 3, which means in line with the expectations of the job.
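To make the mechanics concrete, here is a minimal sketch of how a forced curve turns into head counts. The function name `forced_quotas` and the exact split are my own assumptions for illustration: I only know that roughly 10 to 15% land at each end and the majority at 3, so the shares for 2 and 4 are made up.

```python
# Assumed split (hypothetical): 10% at each end, 40% in the middle.
# The shares for ratings 2 and 4 are invented for this illustration.
ASSUMED_SHARES_PCT = {1: 10, 2: 20, 3: 40, 4: 20, 5: 10}


def forced_quotas(team_size, shares_pct=ASSUMED_SHARES_PCT):
    """Return how many people must receive each rating under the curve."""
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must add up to 100 percent")
    # Integer arithmetic: hand out whole seats first...
    quotas = {r: team_size * pct // 100 for r, pct in shares_pct.items()}
    remainders = {r: team_size * pct % 100 for r, pct in shares_pct.items()}
    # ...then give the leftover seats to the largest remainders.
    leftover = team_size - sum(quotas.values())
    by_remainder = sorted(remainders, key=lambda r: (-remainders[r], r))
    for rating in by_remainder[:leftover]:
        quotas[rating] += 1
    return quotas


if __name__ == "__main__":
    print(forced_quotas(20))
```

Notice that nothing in this calculation looks at how anyone actually performed. The number of people who will receive a 1 is decided before the period even starts.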

The normal distribution has many applications and works reasonably well with large-enough random populations. The problem with applying it to software engineering teams is twofold: it rests on a few invalid assumptions, and it creates side effects comparable to Goodhart’s law.

The major flaw I see is sampling bias. As I mentioned, the normal distribution works well with random populations. In most software engineering teams, there is a high bar to join the team in the first place. Basically everyone tries really hard to hire the best of the best. This takes random sampling off the table.

After a certain scale, software engineering jobs have very well-defined expectations. The idea that, given any population, at least some people will fall short of those expectations is simply wrong if you hired the best. That being said, you might make mistakes during the hiring process, so it is possible to have some people below expectations. But this is not inevitable. If you did your job well while hiring, then everyone should perform at least at level 3, if not higher.

This is where the non-builder definition of “raise the bar” comes in. Their claim is that we need to raise the bar each period, which means the expectations should go higher. In practice, managers are expected to compare the builders to each other, which moves from an absolute ranking to a relative ranking. Then you fit this relative evaluation to the aforementioned curve.
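The difference between the two kinds of ranking is easy to show with a small sketch. Everything here is hypothetical: the names, the scores, the bar of 80, and the rule that at least one person must fill the bottom bucket are assumptions I picked for illustration, not how any particular company does it.

```python
def meets_bar(scores, bar):
    """Absolute evaluation: each person is compared to the job's bar."""
    return {name: score >= bar for name, score in scores.items()}


def bottom_of_curve(scores, bottom_share_pct=10):
    """Relative evaluation: the lowest scorers fill the bottom bucket.

    Assumption: the bucket is never allowed to stay empty, so at least
    one person is always placed there.
    """
    n_bottom = max(1, len(scores) * bottom_share_pct // 100)
    ranked = sorted(scores, key=scores.get)  # lowest score first
    return ranked[:n_bottom]


if __name__ == "__main__":
    # A hypothetical team where everybody clears the bar comfortably.
    team = {"Ada": 95, "Linus": 91, "Grace": 97, "Ken": 88, "Barbara": 93,
            "Dennis": 90, "Margaret": 96, "Edsger": 89, "Donald": 94,
            "Frances": 92}
    print(all(meets_bar(team, bar=80).values()))
    print(bottom_of_curve(team))
```

In this made-up team everyone meets the absolute bar, yet the relative evaluation still has to name someone for the bottom bucket. Here that is Ken, whose only failure is being slightly behind nine strong colleagues.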

Relative performance evaluation triggers a human instinct that is similar to Goodhart’s law. At least I interpret it similarly. Charles Goodhart formulated the idea in 1975, and it is usually quoted in Marilyn Strathern’s later phrasing: “When a measure becomes a target, it ceases to be a good measure.” This is because humans are great at finding workarounds to hit a measure when it becomes the target, even though those workarounds have diminishing returns. This is basic human nature. I believe humans sometimes do this without even realizing it.

When one’s performance is evaluated relative to others, the simplest strategy is not to perform better, but to make others perform worse, which places you higher by comparison. This is sad, but it is the unfortunate reality as far as I have experienced it. When you hear people talking behind others’ backs about how they did not succeed or did not perform as expected, this should ring a bell. Impulsively, they want their peers to not perform well, so that they themselves avoid landing at the bottom of the curve.

This part of human nature kills collaboration in teams, because there is no shared success. There are only a few seats for success, and people have to race for them. That is heartbreaking for me, as a builder. Even writing about it makes me question the leadership of non-builders.

As a builder, I love collaborating with my teammates. I enjoy committing to a common goal and getting there together. Stack ranking kills the joy of building. Even worse, it made my life miserable as a manager. Honestly, if I hadn’t been a builder before moving to management, I guess I might not have been affected this much.

Steve Jobs once said:

“You know who the best managers are? They’re the great individual contributors, who never ever want to be a manager, but decide they have to be a manager because no one else is going to be able to do as good a job as them.”

I wonder if they ever implemented stack ranking at Apple back then. I find it hard to believe any great individual contributor would recommend it though.

Anyway, I am now clueless, feeling numb, trying to recover. I hope I will get out of this once again.