Real-World Engineering Challenges Roundup
👋 This is Gergely. A quick favor: if you could cast an upvote or share a comment for this newsletter on Product Hunt, I’d be grateful. This would help more people be aware of this publication. Thank you! This is a bonus issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at big tech and high-growth startups through the lens of engineering managers and senior engineers. If you’re not a subscriber yet, here’s what you missed the past weeks:
Subscribe to get this newsletter every week 👇 Real-World Engineering Challenges is a bonus column I’m starting as additional content on top of the weekly articles. Here, I share a handful of the most interesting real-world engineering challenges I came across this month, based on engineering blog posts from tech companies. This is the first issue - and an experiment of this format. I pick pieces that cover interesting engineering approaches, and you can learn something new by reading them, and diving deeper into the concepts they mention. Read something that fits this? Share it with me! Shopify made their app 20% faster by… caching on the backendShopify built their app on top of React Native and noticed how the home page was starting to become slow on loading. This was because the Rails backend made multiple database queries, adding latency. The solution to this problem was - surprise! - adding a cache in front of those queries. The article goes into design considerations on choosing between Memchached and Redis, and touches on concepts like write-through caching, cache invalidation and the tradeoffs of delete-then-write strategies. What I really liked about this writeup is how they talk about one of the most challenging parts of a project impacting millions of customers: rolling out caching changes. They detail how they ran an experiment and how they fixed issues that the experiment surfaced before rolling out fully. Doordash introduced multi-tenancy for their user dataDoordash ran into an interesting use case where they wanted to create duplicate users so different user instances could be used for each of their Storefront merchants. They decided to use a multi-tenancy approach. In practice, this meant adding a tenant_id column to all user-related tables. They then made code changes, so all related systems started to use this tenant_id, and all systems routed this information, including passing through headers. Once this was all done (complex work in itself!), they added enforcing rules to the database. Multi-tenancy is a sensible approach as companies grow and scale and an approach worth becoming familiar with. Uber, for example, uses a similar multi-tenancy approach in their microservices architecture. Airbnb improved developer productivity for their 70+ iOS engineersThe Airbnb iOS app has 1.5M lines of code, 75 engineers working on it, and is localized in 62 languages. This puts them as one of the larger mobile app teams. With this growth, came frustrations like Xcode now taking minutes to load the project, long build times and slower overall iteration for engineers. These are all common to pain points I’ve experienced at Uber as the codebase and team grew and are a few of the many challenges of scaling up mobile engineering - the exact topic I wrote my book Building Mobile Apps at Scale about. Airbnb has an iOS infrastructure team in place who take ownership of improving mobile developer efficiency and experience. This is similar to how other, similarly sized mobile organizations approach engineering productivity. The mobile developer platform was one of the first platform teams Uber funded after they introduced platform teams. Here’s what Airbnb’s iOS infra team did to reduce frustrations and improve mobile productivity:
I especially like the concept of Dev Apps: almost every team struggling with slow build times created these “skeleton apps” manually for themselves, but it’s the first time I heard the infra team building tooling to generate these. Engineering being productive when building large mobile apps keeps being a hard problem, and there are few easy solutions. GitHub reduced their MySQL database load by partitioning itGitHub started out as a Ruby on Rails app with a MySQL database and MySQL remains at its core. In 2019, they decided to partition the database to reduce both load, and database-related incidents. The article is a detailed walk through of the staggering amount of work they had to do to enable partitioning
A term I came across for the first time - and could not find a definition to link to - is database cutover. This means rapidly transitioning from one database to the other, as part of the migration. The shorter the downtime, the better, and GitHub pulled off a zero-downtime cutover approach. The article is dense and well-written. I found it a good reminder on how if you find your framework or service not supporting a use case - like partitioning - you can build this by using a mix of existing, industry-standard tools, and building your own tooling. And how migrating databases with no downtime, though challenging, should be the standard to aim for: if GitHub could do it with such high load, so can anyone else. Nubank killed their end-to-end test suiteBrazil-based neobank Nubank had previously put E2E tests in place, but soon realized the many downsides. Slow test suites, flakey tests, and expensive maintenance and four more reasons. So how did they convince engineering leadership to make a major change? They pulled the ultimate card of demonstrating how the build will take close to infinite time to build, using queuing theory: Nubank is now experimenting with consumer driven contract testing, created a framework around this and discuss more details in the article. My take is that end-to-end tests are hard to do right, mostly due to the tooling just not being there. Nubank being open about how it just didn't work for them is a refreshingly honest admission. However, I do not fully buy the takeaway of how end-to-end tests are just not practical enough. I’m surprised they did not look deeper into why those tests were flakey. I also think that black box tests to exercise key parts of the system - which are typically a variation of E2E ones - have lots of added value. A good use case if having a black box testing suite to quickly validate if your key systems are up or down. For example, Uber utilizes these kinds of black box tests as part of their alerting system. 🤔 How would you rate this bonus newsletter?You’re on the free list for The Pragmatic Engineer. For the full experience, become a paying subscriber. Many readers expense this newsletter within their company’s training/learning/development budget. This post is public, so feel free to share and forward it. |
Older messages
How Big Tech Runs Tech Projects and the Curious Absence of Scrum
Tuesday, September 21, 2021
A survey of how tech projects run across the industry highlights Scrum being absent from Big Tech. Why is this, and are there takeaways others should take note of?
Advice for Tech Workers to Navigate the Most Heated Job Market of All Time
Wednesday, September 15, 2021
The job market is on fire across the globe. Here's advice on how to make the most out of it.
You Might Also Like
This Week in Rust #579
Saturday, December 28, 2024
Email isn't displaying correctly? Read this e-mail on the Web This Week in Rust issue 579 — 25 DEC 2024 Hello and welcome to another issue of This Week in Rust! Rust is a programming language
The Calm Voice Of Chaos 🏆
Friday, December 27, 2024
The protest singer whose songs shaped 2024. Here's a version for your browser. Hunting for the end of the long tail • December 27, 2024 The Calm Voice Of Chaos This year's Tedium awards start
JSK Daily for Dec 27, 2024
Friday, December 27, 2024
JSK Daily for Dec 27, 2024 View this email in your browser A community curated daily e-mail of JavaScript news Performance Optimization in React Pivot Table with Data Compression The Syncfusion React
Daily Coding Problem: Problem #1650 [Hard]
Friday, December 27, 2024
Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Microsoft. Recall that the minimum spanning tree is the subset of edges of a tree that
🧠 3 Ways Quantum Computing Will Change Our World — How to Transfer Data to Your New iPhone
Friday, December 27, 2024
Also: Great Spotify Features That Apple Music Has Too, and More! How-To Geek Logo December 27, 2024 Did You Know 2004 was the last year that hidden (or "pop-up") headlamps appeared on a mass-
Charted | How U.S. Household Incomes Have Changed (1967-2023) 💰
Friday, December 27, 2024
When looking at inflation adjusted data, US households have definitely gotten a whole lot richer since 1967. View Online | Subscribe | Download Our App FEATURED STORY How US Household Incomes Have
Can Pirates Save Democracy?
Friday, December 27, 2024
Top Tech Content sent at Noon! Boost Your Article on HackerNoon for $159.99! Read this email in your browser How are you, @newsletterest1? 🪐 What's happening in tech today, December 27, 2024? The
The 2025 Predictions You Can't Afford to Miss 🔮
Friday, December 27, 2024
Get a head start on what's to come in the New Year. Join VC+ to gain access to our 2025 Global Forecast Series and other exclusive insights! View email in browser HOW LEADERS STAY AHEAD IN 2025 The
DeveloPassion's Newsletter #182 - 2024 Retrospective
Friday, December 27, 2024
A newsletter discussing Knowledge Management, Knowledge Work, Zen Productivity, Personal Organization, and more! Sébastien Dubois DeveloPassion's Newsletter DeveloPassion's Newsletter #182 -
End 2024 on a High Note: The Top Writing Tips and Templates You Need
Friday, December 27, 2024
What's good, @newsletterest1! As we welcome 2025, let's take a moment to celebrate the incredible stories that fueled our hacker minds in 2024! We've compiled a roundup of the most-used