Thoughts on Platforms, Core Teams, DORA Report and all that jazz
All the best in New Year! Here’s my summary of 2024: I survived! And that’s a success that I want to repeat in 2025. Right before christmas, I wrote a longer thread in social media with my thoughts on the hot topic: platform teams. I see many of my clients trying to apply it, some succeeding, some more, but all struggling. I’m not an expert in the internal platforms, but I was a part of the core teams, shaping continuous integration and delivery processes, and I see many similarities. As my thoughts spin up some good discussion (see here or there), I decided to also share it today with you. And that’s a free post, so if you enjoyed, feel free (pun intended) to share it with your friends! The challengeThe longer we work on some code, the more polished it gets? Actually, it’s not always like that. Past me believed in shared reusable code. Now, I’m looking at it from a different angle. Platforms, shared code, core libraries - they’re all faces of the same issue. We try to build one tool to rule them all, which never works. At least for a longer time. Whether you call it a “platform team“ or a “core team” the nature of the struggle remains the same: both groups aim to provide foundational solutions across the organization. They often wrestle with the same problem—trying to anticipate every scenario in a single, central codebase or service. Over time, these monolithic solutions become harder to maintain, and the team maintaining them drifts away from the real-world issues developers face day-to-day. That’s how both platform teams and core teams can fall into the trap of delivering an all-in-one system that erodes under the weight of endless patches and shifting priorities. Fallacy of the one to rule them allI believe that even in internal projects, we should focus more on small, focused tools rather than a monolithic platform. Tools are built for something; they don’t need to fulfil every possible scenario but can instead home in on a specific area. And I don’t mean only technical tools; I’m also talking about business features for user-facing areas—think back offices. Why does that matter? For most of our business, building one massive internal solution won’t ever be a business per se. It can only be a priority when it truly provides direct business value. That’s how we get to the counterintuitive part of “the longer honed the better.” Those big core/platform components tend to be the most stable after one or two years, when we already have a solid grasp of what we’re doing. Then they slowly degrade as the need for rapid delivery of internal capabilities becomes a priority, and teams focus back on their domain. Don’t believe me? Check this diagram from DORA Accelerate State of DevOps report 2024: Do you see the peak in 1-2 years? That’s when erosion starts. When we add more features, we already get an understanding of what we do. Then, they slowly degrade as the need for rapid delivery of internal capabilities slows down and thus becomes a priority. Teams start to focus on their domain. Adding a quick patch to a stable core won’t seem like a big deal, but it piles up. Meanwhile, the tech landscape changes. Better tooling emerges, and our homebrew solutions become obsolete. Add to that the fact that it’s hard to justify constantly working on a tech-focused platform that, according to management, “should have been already done.” Then the platform team often becomes detached from the daily struggles of other teams, and we start seeing Conway’s Law in action. Not great. What does that mean detached?
If you think that’s only my anecdotal evidence, check the interesting findings from DORA report. It can bring the instability of the development process:
And on skewed perspective of the organisation about the platform team impact:
Generic does not mean SimpleThat’s why I always say that “Generic does not mean Simple”. Reusability and maintainability are not values per se. My suggestion is to focus on delivering precise business or tech tools. You may not need to maintain them forever, and it can be simpler to justify and explain to management why you must keep them fresh. Of course, it’s less sexy than playing with the newest stack, but it pays off in the longer term. That’s what I was also doing with my OSS work. I have Event Sourcing samples that I delivered in the form: “take them in whole or parts, adjust to your needs”. Then observing how people interacted with them, seeing the usage patterns, motivated me to pack them into the reusable toolbox like Emmett. You can do it in the same way in your regular projects. Of course, we need to be careful. Copy/pasted templates can help, but they’re also cumbersome if we need to update them later and replicate changes across the organization. It’s more about “catalogue how we do it” than “enforce how we must do it” aiming to reduce the boring, repeatable setup. Creating a generic internal platform from the very beginning of a project can introduce artificial limitations. We’re making those hard decisions in the worst possible moment: when we’re dumbest. We don’t know the tech stack, we don’t know the domain, we’re shaping our product vision. We try to predict where and how the business and tech will evolve. Frankly, we’re more guessing than predicting. It’s often a sign of being too self-assured or short-sighted: it’s easy to whip up the solution that feels quick and friendly for a single case, but once we reach the long tail of features, we realize we didn’t anticipate everything. For example, rather than providing an all-encompassing core “everything module” that does automatic mapping, calls pluggable services, generates docs automatically, and so on, it’s more sustainable to offer smaller modules specifically focused on handling particular concerns. A huge internal platform must handle all possible scenarios, and maintainers have to guess the usage, whereas smaller, dedicated solutions let teams solve specific use cases based on the patterns they want to promote. People can also diverge more easily and adopt new tech when they’re not locked into an all-encompassing internal platform. It’s simpler to remove a small, now-obsolete piece of code than to tear out something massive. Plus, when multiple teams use those small modules, they can contribute improvements and provide feedback. Let’s quote DORA report again:
Plaform takes the painI love the term Platform Takes The Pain, it’s a great motto stated by Pia Nilsson responsible for Spotify platform. She explained it in the Corecursive Podcast episode. She told how they traansitioned from isolated teams into centralised, but focused platform. It’s a great story on transitioning the platform team mentality from playing with CI/CD tools and experimenting to becoming an enabling team for others. So yes, centralisation is not always bad. Some elements - like cross-service communication, documentation, or security - have to be enforced to keep consistency, but for most things, gentle guidance is enough. In Spotify they realised that platform should not be the elitist team enforcing on others how to behave, but helping them proactively, contributing hand in hand to introduce the new approach. That means making the actual Pull Requests to business modules modernising they’re way. That’s also a similar advice that Hazel Weakly did in our webinar on Applying Observability. She told on how to add observability and instrumentation:
Such approach, helps to set the priorities right, so enabling not enforcing. I saw that platform teams believe they’re the foundational and elitist team. At the same time, they’re there to help others deliver business value. We need to avoid the story shared by Ben Dumke-von der Ehe on his work in the Stack Overflow core team.
Pitfals and solutionsPart of the problem is also a mix of lack of trust and the mistaken assumption that centralizing everything will save development time overall. Early on, it might—but soon the Pareto principle hits, and we spend 80% of our time on the last tricky 20%. I’ve fallen into this trap myself, creating classes with 14 generic parameters or pipelines that claimed “zero config.” This quickly becomes a vicious circle: by dictating everything from the center, we don’t teach genuine ownership or a learning path. I’ve witnessed plenty of trouble arising from piling everything into one core. It might not be obvious at first, but after a year or two, a single, centralized repository of random code becomes an unmaintainable anchor. That’s not to say it can’t work—it’s just more likely to work with smaller teams that share an understanding of the domain. It’s also about how much knowledge each team has and how we foster continuous learning. A helpful tool might say: “If you stick to the recommended approach, you get OpenAPI for free. If you don’t, that’s fine, but then you own it and must maintain it.” Or another example: When building CI/CD pipelines, avoid magical, behind-the-scenes processes. Instead, provide plugins or templates with safe defaults. Infrastructure-as-Code tools like AWS CDK make it possible to build well-defined components rather than black boxes. In this way we’re showing people benefits. We’re giving the a carrot, not only a stick. And option to diverdge if that makes sense. It also gives people ownership and making them accountable. As Charity Mayors say: you build it, you run it. She did a nice talk Perils, Pitfalls and Pratfalls of Platform Engineering. She outlined the following pitfalls in her talk:
And that’s nicely wraps up on what we should avoid and what we also discussed so far. Platform as a backofficeI love the idea that Andreas Pinhammer explained in his talk DDD in large product portfolios. A key takeaway was how their “platform team” ended up being a business back-office focused on enabling others instead of just a “Kubernetes-and-other-technical-stuff” team. They focused on the business features that are not user facing, but can help other business teams to do stuff faster. The good example for that could be payment-module-as-a-platform. Multiple modules can require payments, there’s no point in repeating that features all around. We can gather the requirements, form a dedicated team that will be responsible for delivering business capabilities to other teams. There are good tools that can help in that Context Mapping and Team Topologies. You can start with those two talks:
In short that’s also the impact of the Conway Law, that we just cannot avoid. We discussed that in details in: Final adviceIf you’re already in the scenario where your platform/core is going rogue then my suggestion is to focus on delivering exact business instead of tech tools. You can start with the following steps:
By that we adapt by delivering smaller, targeted solutions, better tooling, and real synergy between operations, coding, and design. It’s all about enabling teams to be effective, accountable, and ready to adjust as our business grows and our tech evolves. Of course, it’s less sexy than playing with the new tech stack, but it will pay off in the longer term. What’s your experience, challenges and piece of advice? Please share it with me, I’m more than curious! Cheers! Oskar p.s. Ukraine is still under brutal Russian invasion. A lot of Ukrainian people are hurt, without shelter and need help. You can help in various ways, for instance, directly helping refugees, spreading awareness, and putting pressure on your local government or companies. You can also support Ukraine by donating, e.g. to the Ukraine humanitarian organisation, Ambulances for Ukraine or Red Cross. You're currently a free subscriber to Architecture Weekly. For the full experience, upgrade your subscription. |
Older messages
Locks, Queues and business workflows processing
Monday, December 30, 2024
Last week, we discussed Distributed Locking. Today, we'll continue with it but doing it differently: with a full backflip. We'll see how and why to implement locks with queuing. Then we'll
Distributed Locking: A Practical Guide
Monday, December 23, 2024
If you're wondering how and when distributed locking can be useful, here's the practical guide. I explained why distributed locking is needed in real-world scenarios. Explored how popular tools
On getting the meaningful discussions, and why that's important
Thursday, December 19, 2024
To put our design into practice, we need to be able to persuade our colleagues, stakeholders, and other peers. Without the ability to explain and persuade, even the best design will not be applied. And
The Write-Ahead Log: The underrated Reliability Foundation for Databases and Distributed systems
Tuesday, December 10, 2024
The write-ahead log (WAL) is everywhere. Yet, many people miss it and are not aware of it. The simple idea powers reliability in databases, messaging systems, and distributed systems. Let's discuss
Applying Observability: From Strategy to Practice with Hazel Weakly
Monday, December 2, 2024
Have you considered applying observability but struggled to match the strategy with the tooling? Or maybe you were lost on how to do it? I have something for you!I had a great discussion with Hazel
You Might Also Like
🎮 5 Cheap Apple AirPlay Receiver Alternatives — Your Game Controllers Need Firmware Updates Too
Tuesday, January 7, 2025
Also: The Best Free Offline Music Player Apps For Android How-To Geek Logo January 7, 2025 Did You Know It's a common practice in Japan to package toys with a single cheap piece of candy in order
Daily Coding Problem: Problem #1661 [Medium]
Tuesday, January 7, 2025
Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Triplebyte. Implement a data structure which carries out the following operations
DRF, Temp Files, Dataclasses, and More
Tuesday, January 7, 2025
Building HTTP APIs With Django REST Framework #663 – JANUARY 7, 2025 VIEW IN BROWSER The PyCoder's Weekly Logo Building HTTP APIs With Django REST Framework This course will get you ready to build
Charted | The Pyramid of S&P 500 Returns (1874-2024) 💰
Tuesday, January 7, 2025
In 2024, the S&P 500 surged 23%, setting a series of record highs. We show these returns in a historical context spanning 150 years. View Online | Subscribe | Download Our App Presented by: Global
LW 164 - How to create new arrivals collection in Shopify using Shopify Flow
Tuesday, January 7, 2025
How to create new arrivals collection in Shopify using Shopify Flow Shopify Development news and
Tic-Tac-D’Oh 💻
Tuesday, January 7, 2025
The latest from the dull side of the internet. Here's a version for your browser. Hunting for the end of the long tail • January 07, 2025 Tic-Tac-D'Oh Dell decides to rebrand its machines along
Spyglass Dispatch: CaptAIn AmerIca...
Tuesday, January 7, 2025
Hulu, Fubo, Venu • NVIDIA's Cosmos • NVIDIA's DIGITS • Meta's Board Addition • Meta's Fact-Checking Subtraction • Dude, You're Getting a Dell Pro Max Premium The Spyglass Dispatch
DeveloPassion's Newsletter #183 - Knowledge Management for All
Tuesday, January 7, 2025
A newsletter discussing Knowledge Management, Knowledge Work, Zen Productivity, Personal Organization, and more! Sébastien Dubois DeveloPassion's Newsletter DeveloPassion's Newsletter #183 -
CES 2025 ICYMI: 8 top reveals so far
Tuesday, January 7, 2025
Bluesky's most-needed feature; A mulching robot mower; Linux man pages -- ZDNET ZDNET Tech Today - US January 7, 2025 ces55gettyimages-2191705850 CES 2025: ZDNET's 8 most impressive products we
Post from Syncfusion Blogs on 01/07/2025
Tuesday, January 7, 2025
New blogs from Syncfusion Introducing the New Blazor Chat UI Component By Silambarasan Ilango Enhance real-time communication with the Blazor Chat UI. Discover its features and use cases for creating