
Tech Turnaround

Deployments are scary. Outages are weekly. Your team is firefighting instead of building.


My personal story

The biggest challenge of my CTO career - which was full of challenges - was taking over a tech department with no process, no clear roles, no engineering culture, no DevOps, no CI/CD, random releases, a website that crashed while the company lost millions, a relaunch project that seemed to never end - with constant changes from the founders - a board which wanted Blackberry support, and around 150 open bugs. I added some process, closed two thirds of the open bugs, and froze the relaunch requirements. The team excelled at delivering, and together with some excellent engineers we fixed the crashing website (many causes, mostly explosive user growth - coding at night, praying in the morning). Stressful times, but a tremendous success story in the end - with many more grey hairs.

The Fire-Fighting Trap

You’re not building anymore. You’re surviving.

Every deploy feels like Russian roulette and the on-call rotation is burning people out. That “temporary fix” from two years ago is now load-bearing infrastructure. Your best engineers are spending 60% of their time on incidents instead of features. Plus, the CEO asks why everything takes so long - and why it takes longer every quarter. You don’t have a good answer (though see my iceberg example for one).

This is tech turnaround territory. See my personal story above - I’ve been there several times.

How Systems Break Under Growth

There’s a pattern I see constantly in growing startups. The system that worked perfectly fine with 1,000 users starts crumbling at 100,000 users - a typical scaling problem. Not because anyone did anything wrong - the architecture was designed for a different scale, and unknown unknowns like ORM behaviour under load couldn’t be predicted.
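To make the ORM point concrete, here is a minimal, hypothetical sketch (Python with SQLAlchemy; the models are invented for illustration, not code from the story above) of the kind of pattern that feels fine at 1,000 users and melts at 100,000:

```python
# Hypothetical illustration only - invented models, not any real system.
# The classic trap: lazy loading turns one innocent loop into an N+1 query storm.
from sqlalchemy import ForeignKey
from sqlalchemy.orm import (
    DeclarativeBase, Mapped, Session, mapped_column, relationship, selectinload,
)

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    orders: Mapped[list["Order"]] = relationship(back_populates="user")

class Order(Base):
    __tablename__ = "orders"
    id: Mapped[int] = mapped_column(primary_key=True)
    user_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
    total: Mapped[int] = mapped_column(default=0)
    user: Mapped["User"] = relationship(back_populates="orders")

def totals_naive(session: Session) -> dict[int, int]:
    # 1 query for the users, then 1 lazy-load query per user:
    # unnoticeable at 1,000 users, 100,001 queries at 100,000.
    return {u.id: sum(o.total for o in u.orders) for u in session.query(User).all()}

def totals_eager(session: Session) -> dict[int, int]:
    # Eager loading keeps it at 2 queries no matter how many users exist.
    users = session.query(User).options(selectinload(User.orders)).all()
    return {u.id: sum(o.total for o in u.orders) for u in users}
```

The naive version issues one extra query per user; the eager version stays at two queries regardless of scale - exactly the kind of difference nobody notices until growth makes it painful.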

The warning signs:

Deployments become events. Everyone fears them. People avoid deploying on Fridays. Then Thursdays. Then on any day unless it’s absolutely necessary.

Incidents cluster. One outage leads to another. The fix for yesterday’s problem causes tomorrow’s. Your monitoring dashboard looks like a Christmas tree - and not in a good way. You don’t even look at DataDog or Grafana anymore.

Velocity death spiral. Every sprint, less gets done. The backlog grows. Features that used to take days now take weeks. Your engineers are frustrated, and the good ones are updating their LinkedIn. Every feature is an emergency and top priority because the CEO thinks that’s the only way to get anything moving.

Technical debt compounds. You know you should fix the foundation, but there’s always something more urgent. So you patch and pray. The debt compounds with interest.

The Blame Game

You sometimes inherit these problems. The previous CTO left a mess. The founders made expedient decisions in the early days (rightly so - they needed to survive - and don’t forget, that code made the company grow so they could hire you!) The system grew organically without a plan - because there was no business strategy or tech strategy in place.

No one cares - now it’s your problem.

And when things break, eyes turn to you. Never mind that you’ve been here six months and this architecture was built three years ago. Never mind that you’ve been warning about this exact risk since you arrived (you should have feature-frozen the company and fixed that mess in your first three months; now you’re perceived as part of the problem). You’re the CTO. The technology is broken. Therefore you’re failing.

This is the loneliest part of tech turnaround. You know the problems. You know what needs to be done. But you can’t do everything at once, and everything is on fire.

What Actually Works

I’ve led tech turnarounds at multiple companies. The playbook isn’t complicated, but it requires discipline.

  1. Stop the bleeding first. If you’re in a hole, stop digging. Before you can build anything new, you need stability. This might mean a feature freeze. It might mean adding monitoring and observability before adding features. It will very likely mean saying no to the CEO for a quarter. Stabilization is a prerequisite for any progress.
  2. Triage ruthlessly. Not all technical debt is equal. Some is deadly, some is ugly. And while I do understand no one wants to wade through ugly code, learn to tell the difference. Fix what’s critical. Document what’s acceptable. Ignore what doesn’t matter. I’ve seen teams waste months “cleaning up” code that nobody actually touches.
  3. Make deployments boring again. If your deploys are scary, fix that first. CI/CD today is the minimum acceptable standard.
  4. Build observability before features. You can’t fix what you can’t see. Before you write another line of feature code, make sure you can actually understand what your system is doing. Good monitoring isn’t an expense - it’s insurance (see the sketch after this list).
  5. Create space for real work. The biggest trap in turnaround mode is that firefighting crowds out everything else. You need to protect time for actual improvement. This might mean explicit “no-meeting” days. It might mean rotating who’s on incident duty so others can focus. Find a way to make progress even while the fires burn.
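As a sketch of what item 4 can look like in practice - assuming a Python service and the prometheus_client library, both illustrative choices rather than a prescription - a few lines of instrumentation already tell you which endpoints are slow and which are failing:

```python
# A minimal sketch, not a prescription: request latency and error metrics
# exposed for Prometheus/Grafana. Assumes prometheus_client is installed.
import time
from typing import Callable, TypeVar

from prometheus_client import Counter, Histogram, start_http_server

T = TypeVar("T")

REQUEST_LATENCY = Histogram(
    "request_latency_seconds", "Request latency in seconds", ["endpoint"]
)
REQUEST_ERRORS = Counter(
    "request_errors_total", "Requests that raised an error", ["endpoint"]
)

def instrumented(endpoint: str, handler: Callable[[], T]) -> T:
    """Run any request handler so every call is timed and every failure is counted."""
    start = time.monotonic()
    try:
        return handler()
    except Exception:
        REQUEST_ERRORS.labels(endpoint=endpoint).inc()
        raise
    finally:
        REQUEST_LATENCY.labels(endpoint=endpoint).observe(time.monotonic() - start)

if __name__ == "__main__":
    start_http_server(9100)  # metrics served at http://localhost:9100/metrics
    instrumented("/checkout", lambda: time.sleep(0.05))
```

Ten minutes of work like this, multiplied across your critical endpoints, is what turns “the site feels slow” into “checkout latency doubled after Tuesday’s deploy.”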

The Timeline Question

Everyone wants to know: how long does a tech turnaround take?

My answer: longer than you want, shorter than you fear.

Whatever you do: don’t overpromise. I’ve done that, and it’s very difficult to make good on your promises after the rewrite - performance is most often not noticeably better.

In my experience, a true platform stabilization typically takes 6-12 months to show real results. The first 30 days are about stopping the bleeding and understanding the full scope. Months 2-4 are about building the foundation - CI/CD, monitoring, the boring stuff that makes everything else possible. Months 5-12 are when you start seeing compounding returns. Deploys get easier. Incidents decrease. Bugs go down. Website speed increases. Build times decrease. Tests grow in number and actually pass. Engineers start smiling again.

The mistake is expecting instant results and overpromising. The platform didn’t break overnight. It won’t heal overnight either.

When to Ask for Help

Tech turnaround is one of the hardest jobs in engineering leadership. You’re making high-stakes decisions with incomplete information while the building is on fire. You’re managing up (keeping the CEO calm), managing down (keeping the team motivated and on board), and managing out (keeping customers from leaving) - all at once.

Most CTOs I talk to in turnaround situations have the technical skills to fix the problems. What they lack is perspective. When you’re in the middle of a crisis, it’s hard to see clearly. You need someone who’s been through it before, who can help you prioritize, who can tell you which fires to fight and which to let burn.

That’s not weakness. That’s wisdom.

You Don't Have to Stabilize Alone

Tech turnaround is brutal. You’re blamed for problems you inherited. You’re expected to fix everything immediately. You’re making career-defining decisions every week.

The CTOs who succeed in turnaround situations aren’t necessarily smarter or more experienced than the ones who fail. They’re the ones who get help. Who find someone who’s seen these patterns before. Who can validate their thinking and help them prioritize when everything feels urgent.

If your platform is unstable and you’re not sure where to start - that’s exactly when outside perspective matters most.

Ready to Stabilize?

I’ve led tech turnarounds myself and coached CTOs through platform stabilization at companies from Series Seed to Series B and beyond. If you’re in firefighting mode and need someone who’s been there - let’s talk.

Learn About CTO Coaching