Real-World Engineering Challenges Roundup
👋 This is Gergely. A quick favor: if you could cast an upvote or share a comment for this newsletter on Product Hunt, I’d be grateful. This would help more people be aware of this publication. Thank you! This is a bonus issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at big tech and high-growth startups through the lens of engineering managers and senior engineers. If you’re not a subscriber yet, here’s what you missed the past weeks:
Subscribe to get this newsletter every week 👇 Real-World Engineering Challenges is a bonus column I’m starting as additional content on top of the weekly articles. Here, I share a handful of the most interesting real-world engineering challenges I came across this month, based on engineering blog posts from tech companies. This is the first issue - and an experiment of this format. I pick pieces that cover interesting engineering approaches, and you can learn something new by reading them, and diving deeper into the concepts they mention. Read something that fits this? Share it with me! Shopify made their app 20% faster by… caching on the backendShopify built their app on top of React Native and noticed how the home page was starting to become slow on loading. This was because the Rails backend made multiple database queries, adding latency. The solution to this problem was - surprise! - adding a cache in front of those queries. The article goes into design considerations on choosing between Memchached and Redis, and touches on concepts like write-through caching, cache invalidation and the tradeoffs of delete-then-write strategies. What I really liked about this writeup is how they talk about one of the most challenging parts of a project impacting millions of customers: rolling out caching changes. They detail how they ran an experiment and how they fixed issues that the experiment surfaced before rolling out fully. Doordash introduced multi-tenancy for their user dataDoordash ran into an interesting use case where they wanted to create duplicate users so different user instances could be used for each of their Storefront merchants. They decided to use a multi-tenancy approach. In practice, this meant adding a tenant_id column to all user-related tables. They then made code changes, so all related systems started to use this tenant_id, and all systems routed this information, including passing through headers. Once this was all done (complex work in itself!), they added enforcing rules to the database. Multi-tenancy is a sensible approach as companies grow and scale and an approach worth becoming familiar with. Uber, for example, uses a similar multi-tenancy approach in their microservices architecture. Airbnb improved developer productivity for their 70+ iOS engineersThe Airbnb iOS app has 1.5M lines of code, 75 engineers working on it, and is localized in 62 languages. This puts them as one of the larger mobile app teams. With this growth, came frustrations like Xcode now taking minutes to load the project, long build times and slower overall iteration for engineers. These are all common to pain points I’ve experienced at Uber as the codebase and team grew and are a few of the many challenges of scaling up mobile engineering - the exact topic I wrote my book Building Mobile Apps at Scale about. Airbnb has an iOS infrastructure team in place who take ownership of improving mobile developer efficiency and experience. This is similar to how other, similarly sized mobile organizations approach engineering productivity. The mobile developer platform was one of the first platform teams Uber funded after they introduced platform teams. Here’s what Airbnb’s iOS infra team did to reduce frustrations and improve mobile productivity:
I especially like the concept of Dev Apps: almost every team struggling with slow build times created these “skeleton apps” manually for themselves, but it’s the first time I heard the infra team building tooling to generate these. Engineering being productive when building large mobile apps keeps being a hard problem, and there are few easy solutions. GitHub reduced their MySQL database load by partitioning itGitHub started out as a Ruby on Rails app with a MySQL database and MySQL remains at its core. In 2019, they decided to partition the database to reduce both load, and database-related incidents. The article is a detailed walk through of the staggering amount of work they had to do to enable partitioning
A term I came across for the first time - and could not find a definition to link to - is database cutover. This means rapidly transitioning from one database to the other, as part of the migration. The shorter the downtime, the better, and GitHub pulled off a zero-downtime cutover approach. The article is dense and well-written. I found it a good reminder on how if you find your framework or service not supporting a use case - like partitioning - you can build this by using a mix of existing, industry-standard tools, and building your own tooling. And how migrating databases with no downtime, though challenging, should be the standard to aim for: if GitHub could do it with such high load, so can anyone else. Nubank killed their end-to-end test suiteBrazil-based neobank Nubank had previously put E2E tests in place, but soon realized the many downsides. Slow test suites, flakey tests, and expensive maintenance and four more reasons. So how did they convince engineering leadership to make a major change? They pulled the ultimate card of demonstrating how the build will take close to infinite time to build, using queuing theory: Nubank is now experimenting with consumer driven contract testing, created a framework around this and discuss more details in the article. My take is that end-to-end tests are hard to do right, mostly due to the tooling just not being there. Nubank being open about how it just didn't work for them is a refreshingly honest admission. However, I do not fully buy the takeaway of how end-to-end tests are just not practical enough. I’m surprised they did not look deeper into why those tests were flakey. I also think that black box tests to exercise key parts of the system - which are typically a variation of E2E ones - have lots of added value. A good use case if having a black box testing suite to quickly validate if your key systems are up or down. For example, Uber utilizes these kinds of black box tests as part of their alerting system. 🤔 How would you rate this bonus newsletter?You’re on the free list for The Pragmatic Engineer. For the full experience, become a paying subscriber. Many readers expense this newsletter within their company’s training/learning/development budget. This post is public, so feel free to share and forward it. |
Older messages
How Big Tech Runs Tech Projects and the Curious Absence of Scrum
Tuesday, September 21, 2021
A survey of how tech projects run across the industry highlights Scrum being absent from Big Tech. Why is this, and are there takeaways others should take note of?
Advice for Tech Workers to Navigate the Most Heated Job Market of All Time
Wednesday, September 15, 2021
The job market is on fire across the globe. Here's advice on how to make the most out of it.
You Might Also Like
Quick question
Sunday, April 28, 2024
I want to learn how I can better serve you
Kotlin Weekly #404 (NOT FOUND)
Sunday, April 28, 2024
ISSUE #404 28st of April 2024 Announcements Kotlin Multiplatform State of the Art Survey 2024 Help to shape and understand the Kotlin Multiplatform Ecosystem! It takes 4 minutes to fill this survey.
📲 Why Is It Called Bluetooth? — Check Out This AI Text to Song Generator
Sunday, April 28, 2024
Also: What to Know About Emulating Games on iPhone, and More! How-To Geek Logo April 28, 2024 📩 Get expert reviews, the hottest deals, how-to's, breaking news, and more delivered directly to your
Daily Coding Problem: Problem #1425 [Easy]
Sunday, April 28, 2024
Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Microsoft. Suppose an arithmetic expression is given as a binary tree. Each leaf is an
PD#571 Software Design Principles I Learned the Hard Way
Sunday, April 28, 2024
If there's two sources of truth, one is probably wrong. And yes, please repeat yourself.
When Procrastination is Productive & Ghost integrating with ActivityPub
Sunday, April 28, 2024
Automattic, Texts, and Beeper join forces to build world's best inbox, Reflect launches its iOS app, how to start small rituals, and a lot more in this week's issue of Creativerly. Creativerly
C#503 Building pipelines with System.Threading.Channels
Sunday, April 28, 2024
Concurrent programming challenges can be effectively addressed using channels
RD#453 Get your codebase ready for React 19
Sunday, April 28, 2024
Is your app ready for what's coming up in React 19's release
☁️ Azure Weekly #464 - 28th April 2024
Sunday, April 28, 2024
Azure Weekly Newsletter Issue #464 powered by endjin Welcome to issue 464 of the Azure Weekly Newsletter. In AI we have a good mix of high-level and deep-dive technical articles. Next-Gen Customer
Tesla profits tumble, Fisker flatlines, and California cities battle for control of AVs
Sunday, April 28, 2024
Plus, an up-close look at the all-electric Mercedes G-Wagen and more View this email online in your browser By Kirsten Korosec Sunday, April 28, 2024 Welcome back to TechCrunch Mobility — your central