Lenny's Newsletter - How to measure cohort retention
How to measure cohort retentionA deep dive into the formulas, visualizations, and SQL you need to accurately measure and report on cohort retention—a guest post by Olga Berezovsky👋 Hey, Lenny here! Welcome to this month’s ✨ free edition ✨ of my newsletter. Each week I humbly tackle reader questions about product, growth, working with humans, and anything else that’s stressing you out about work. If you’re not a subscriber, here’s what you missed this month:
Subscribe to get access to these posts, and every post. ✏️ Take the Activation Rate Benchmark SurveyWhat’s a good activation rate? How do you improve your activation rate? What do companies use for their activation milestones? Let’s pool our data and find out! 👉 Share your activation rate in this short survey 👈 If you contribute your data, you’ll be emailed the full (anonymized) results of the survey as soon as it’s ready. Seriously, take the survey and contribute your data! It takes under 2 minutes. Help me help you! Now, on to this week’s post…
I was also shocked to find that there wasn’t a great post out there on how to accurately and concretely measure cohort retention. Considering how important it is to all things product and growth—and how costly it is to get it wrong—this is a big gap. So let’s fix it. To help us out, I’ve tapped Olga Berezovsky, author of the wonderful Data Analysis Journal newsletter, to get deep into the weeds of retention. Below, she shares the formulas, SQL queries, tools, and templates you need to measure, visualize, and report on retention. I’ve never seen this level of detail and guidance before, and I’m excited to share it here. Thank you, Olga! And enjoy! You can follow Olga on LinkedIn and Twitter, and subscribe to her newsletter here. Many analysts (and product managers, founders, and most people) are intimidated by retention. The graphs can be difficult to read, the definitions vary for each product and business type, and the SQL is complex. At the same time, retention is both the most important and the least understood metric at most companies. In my experience, it takes up to six months to nail accurate retention reporting. On every team I was a part of over my 10+ year tenure as an analyst, I would continually work on our retention reporting. As one example, at Change.org, I was part of the Petition Starters squad, and by monitoring and slicing the Day 7 and Day 30 retention of users who started petitions, we grew “petition starters per day” (our KPI) by 450%. Each new petition generated new signatures, which helped us grow to over 450 million active users. Similarly, at MyFitnessPal (where I work now), Day 1 and Day 7 retention are closely monitored for every product initiative. In this post, I’ll share the methods, tools, and strategies I’ve developed over the years to help you calculate cohort retention and understand the nuances of reporting on retention. For a primer on retention, make sure to read two of Lenny’s previous posts on the topic, How to increase your retention and What is good retention, to get an introduction and learn why it’s important to monitor and grow retention. To recap, retention measures the ability of your product to keep users active over time. Retention for consumer businesses would have a different meaning than for enterprise, B2B, or B2C companies: Step 1. Define “active”As you can see above, a fundamental variable in calculating retention for most products is “active users.” But it’s not obvious what “active” means. Below are the most common events used to define “active” across companies. Each of these events has its own downsides and limitations:
At different companies, I had to report DAU and retention based on all these event variations, sometimes even mixing a few of them together (session start + app open + main transaction). Most companies use logins or app opens as main events for the “active users” definition. But given that I always aim for the cleanest and most precise data reporting, I’d recommend using the main user action as the activity event, for example: Pros to this approach:
Regardless of your business type, my recommendation is to use the main user action to monitor not only retention but other user activity metrics like DAU, WAU, MAU, and DAU/MAU ratio. Step 2. Differentiate users from customersOnce your team agrees on what “active” means for your product, the next step is to segment your active users into free vs. paid. Your retention will have a different logic for each of these groups.
Depending on your business model, your retention output will be built on a different metric input, for example: These metrics are your baseline for calculating retention. Now that you know who your active user or active customer is (e.g. a paid user who logged an exercise or a free user who read an article), you can start building retention to learn how often they come back. Often a mistake I see SaaS companies make is reporting one “blended” retention, with a mix of free and paid users. This can be misleading, because users who pay for the product are likely to use it way more than free users. So the true activity of free users (and your “pool of opportunity” to convert) will be hidden. Step 3. Pick your retention typeThe next step in calculating retention is setting your timeline. As a team, you have to agree on what retention reporting type you should adopt:
The method you choose will significantly affect your results. 1. X-day retentionX-day (also known as N-day or bounded) tells you the percentage of users who come back on a specific day. For example, for all users who join, what percentage return to your app on exactly day 14: This is the most conservative retention approach and will return you a lower retention percentage. 2. Unbounded retentionUnbounded retention tells you how many of your users got back on a specific day or later. For example, for all new users who joined on a specific day, what percentage of users are using the product after 14 days (and not necessarily on day 14). If your goal is to match retention with your user churn, this is the way to go. The unbounded retention value for month 6 gives you the percentage of users who returned after month 6. So which approach should you use? Both are correct, yet both will return completely different retention data: My general favorite method is unbounded retention because (1) I like my KPIs to be connected, proportional, and correlated to each other, and (2) unbounded retention is the inverse of churn, so it gives me another way to validate and test my data, increasing the confidence in my reporting (the higher the churn, the lower the retention, and vice versa). But you should pick the method that works best for your analytical objective and your users’ natural behavior:
Overall, a rule of thumb in analytics is that if your product is SaaS, you are likely to be tied to a specific time bound (paid subscription length, trial length, etc.). Because of that, it’s recommended to follow X-day retention. If you are not in SaaS but in a B2C or consumer transactional or social business, then you can be more flexible and adopt unbounded retention. Step 4. Reporting retention from BI applications vs. SQL1. Getting retention via product analytics toolsToday’s popular product analytics tools support retention reporting. Each application has its own specifics or limitations. Reporting retention from these tools is doable (and becomes common), assuming you have the right events you need, which often is not the case. To make things easy, Amplitude, Mixpanel, Kissmetrics, Google Analytics, Adobe Analytics, and others are integrated via SDK with the client side (browser, app, device) and pass you client data, which is usually user and app activity. While some of these events can be used for retention reporting, others won’t be sufficient. Read more to learn about when to use client-side or server-side events for analytics and why most analytics efforts fail. Often, to make changes to a subscription or a purchase, users are routed from the app to the app store (Apple Store, Google Play, Stripe, PayPal, etc.) to activate a trial, complete a payment, cancel or renew a subscription, etc. That’s why this payment data is likely not available in the app (client) analytics. So you won’t be able to access it in Amplitude or Mixpanel right away. Therefore, for SaaS products (those that report retention based on DAPU or DAC), you might not have the ability to replicate retention in product analytics tools, because they won’t have payment data. This is solvable, and your company might already address it. The common solution here is to load payment data into product analytics tools from a database via Segment or a data pipeline (assuming you already receive this data from a payment provider like Stripe/Apple/PayPal). But if you work for a small startup and are in the early process of setting up reporting and analytics, this is likely not the case. And creating cohorted or even simple retention most likely won’t be doable at this stage. Assuming you have needed events loaded in the product analytics tool, you can read the guidance on how to approach retention graphs in Amplitude, Mixpanel, or Google Analytics. ❗Heads-up: Amplitude, Mixpanel, or similar analytics by default report N-day retention. Read more about N-day or unbounded retention differences in Amplitude: 3 Ways to Measure User Retention. If your business is consumer social, you will have to change retention settings and customize it. ❗❗If you report retention from multiple sources—for example, from Amplitude and also from Tableau, which is sourcing data from a database—it will be very different, because of all the specifics described above: (1) rolling dates/N-day type, (2) different “activity” definition and tracking, and (3) possibly data availability and refresh gaps. 2. Getting retention via SQLThis is one of the common questions for analysts during SQL interviews and often is the first project to work on. It’s a multi-step process requiring a deep understanding of the business objective, a flexible approach, and data intuition. Looking back at all the places I’ve worked, I don’t remember a case where you could come in and quickly run some SQL to cohort your users into the right retention groups. Usually, some work needs to be done to the underlying tables to bring the data to the appropriate format and structure to work with retention. Same with DAU, churn, LTV, anything. First, you work on creating Sessions or Users or an Activity table that serves as a foundation for calculating your KPIs later. This table should have the right timestamp of user activity, event ID, user ID, properties, and measures you need. After the new table is tested, you can then work on creating downstream views for retention, DAU, or anything else. When replicating retention with SQL, it’s easy to go wild and overcomplicate your approach. Follow these steps to stay on track and get the correct retention logic:
Depending on your underlying data and table structure, these steps can be done via a simple SELECT statement (like SELECT * FROM ready_to_use_retention_table) or another quick way, such as
or via multiple subqueries, self-joins, window functions, or a combination of all of the above. That’s why this is such a good interview question—because, given the input values and the business type/case, your output SQL will be completely different. Regardless of SQL complexity, your output raw data may look similar to this for simple retention: And this is the underlying table structure for cohorted retention: Here is a sample SQL to replicate the cohorted table above:
Find more SQL solutions for retention types here. Step 5. Visualizing retentionOnce you have your SQL working, the next step is to visualize your data. There are many ways to visualize a retention report. I won’t be going into depth here, because the most common data visualization tools (Tableau, Power BI) require meaningful expertise. Same with Mode or Sisence. These tools let you leverage their cohorted graphs, and each format can be grouped by (1) time period (e.g. daily, monthly, annually) and then by (2) user segments (e.g. trialers, resubscribers, power users) to make your analytics really powerful. In Excel, you can leverage pivot tables and replicate the same charts there as well. In the pivot table editor, you can drag the initial activity dates into rows and the retained period into columns. For cohorted charts, it’s recommended to apply a color scale (click on conditional formatting, choose a color scale, then pick a color option). Without color-coding, cohorts are difficult to read. If you are looking for a quick template for a cohorted retention graph, I recommend using Christoph Janz’s cohort analysis template. If you have new customers and their activity dates, you can plug them into his file and get retention calculations. Regarding retention visualization format, quite often I see a line chart or a stacked line chart. Product analytics tools like Amplitude and Mixpanel by default create line charts similar to these: For a KPIs dashboard, it’s a good practice to have a simple summary table with retention stats: The best practice for reporting retention is cohorts: This chart shows how many users registered each week and how many of them were retained in each subsequent week. It is filtered to only “New Users.” Additionally, you can segment retention reporting for active users, churned users, inactive users, and reactivated users. That can help you locate your power users and understand what makes them stick. It’s expected that retention for these cohorts will be significantly different. So you should always cohort your users into different behavioral segments if you can. Like I said, your ability to create multiple user cohorts is linearly correlated with how well your analytics is set up. In companies with more advanced data stacks, the product team has the ability to locate, test, iterate, and release new features to only specific user segments. Having these groups cleanly and precisely cohorted also speaks to the maturity of the analytical domain. It takes time to get there, though. 📗 Takeaways🤝 Retention measures the ability of your product to keep (1) users (2) active (3) over time.
📈 Reporting retention from product analytics tools has its own specifics. It’s faster and easier to create a report and slice and dice, but given the analytics setup at your company, it may be limited or lead you further away from the questions you want to answer. 🗄️ Getting retention data via SQL is not straightforward either, and your SQL code will be different depending on your underlying data structure and format. Create the foundation metrics tables first, then work on your SQL from there. 📊 Visualizing retention via cohort graphs is the best practice that gives you the most insights into user behavior patterns. Other things to keep in mindGiven that retention includes so many elements of activity, it’s not the right metric to use for weekly reporting or as a baseline for an A/B test. Too often I have seen how product analytics teams evaluate experimentation against retention and wonder why the increase in user activity doesn’t lead to a change in retention. Because the activity is only one of the components of retention. Retention (like revenue) is the output metric. You should monitor it but not strictly utilize it as a goal for testing or campaigning. Additionally, every new executive who comes in pushes for their own definition and reporting format. Therefore, product analytics should be flexible with the dashboards and data sources to adopt whatever new retention meaning is requested (for example, changing app_open event to login, or 30-day retention calculation to rolling 28 days). I’ll stop here, but there is much more to retention to cover—the specifics of revenue retention reporting, retention differences between SaaS and B2C or B2B, handling user outliers, working with retention curve patterns, retention growth-rate benchmarks, and more. Thank you for reading! If you want to learn more about analyzing, measuring, or reporting user behavior, product metrics, or A/B tests, subscribe to my Data Analysis Journal, a weekly newsletter about product analytics and data science. 📚 Further study
Thanks, Olga! Have a fulfilling and productive week 🙏 📣 Hiring, or looking for a new gig? Join Lenny’s Talent Collective.Join Lenny’s Talent Collective to get bi-monthly drops of world-class hand-curated product and growth people who are open to new opportunities. If you’re looking for a new gig, join the collective to get personalized opportunities from hand-selected companies. You can join publicly or anonymously, and leave anytime. ❤️🔥 Featured job opportunities
If you’re finding this newsletter valuable, consider sharing it with friends, or subscribing if you haven’t already. Sincerely, Lenny 👋 You’re a free subscriber to Lenny's Newsletter. For the full experience, become a paid subscriber. |
Older messages
The nature of product | Marty Cagan, Silicon Valley Product Group
Monday, August 22, 2022
Listen now (60 min) | What are common diseases of product teams, and how do you avoid them? Why should you focus less on problem discovery and more on solution discovery? How do you maintain your
The art of building legendary brands | Arielle Jackson (Google, Square, Marketer in Residence at First Round Capit…
Thursday, August 18, 2022
Listen now (83 min) | What makes a great brand? After working at Google and Square, Arielle Jackson has spent the past eight years consulting startups on how to create powerful messaging that works. In
Kickstarting and scaling a consumer business—Step 6: SCALE: Build your growth engine
Tuesday, August 16, 2022
How products grow, how your product will grow, and the GTM motions of today's biggest consumer apps
The rituals of great teams | Shishir Mehrotra, Coda, YouTube, Microsoft
Sunday, August 14, 2022
Listen now (92 min) | Shishir Mehrotra is the co-founder and CEO of Coda, and formerly head of product and engineering at YouTube. In this episode, he shares his insights on growth strategy, how he
How to kickstart and scale a consumer business—Step 5: RETAIN: Iterate until enough people stick around
Tuesday, August 9, 2022
How to know if you've got product-market fit, how to iterate toward product-market fit, and how long it usually takes
You Might Also Like
🔴 24 hours left - you don’t have to be Rihanna to do this
Sunday, November 17, 2024
Turn your beauty brand dream into reality with a step-by-step blueprint Hey Friend , In just 24 hours, the doors close on your chance to access Alicia Scott's How to Build a Million Dollar Beauty
🗞 What's New: Fewer personalized Meta ads in Europe: A new ad crisis?
Sunday, November 17, 2024
Also: Has AI hit a plateau? ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
What’s 🔥 in Enterprise IT/VC #420
Saturday, November 16, 2024
Why we wrote our largest initial Inception check of $12.5M in Tessl to build AI Native Software Development - what this means for Inception/seed fund sizing? ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
🚨 48 hours left - your login info for Alicia’s course
Saturday, November 16, 2024
How to Build a Million Dollar Beauty Brand by Alicia Scott Hey Friend , Only 48 hours left. This is your last chance to get access to How to Build a Million Dollar Beauty Brand with Alicia Scott. After
🚀 SpaceX Seeks $250B Valuation
Friday, November 15, 2024
Plus $ASTS and Blue Origin team up, $RKLB soars, $SPIR and ABL's realignment, Q3 earnings and more! The latest space investing news and updates. View this email in your browser The Space Scoop Week
Small but Mighty AI
Friday, November 15, 2024
Tomasz Tunguz Venture Capitalist If you were forwarded this newsletter, and you'd like to receive it in the future, subscribe here. Small but Mighty AI 77% of enterprise AI usage are using
🗞 What's New: Meet the indie hackers killing it on TikTok
Friday, November 15, 2024
Also: ChatGPT can read code from your computer ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
what’s next for tech in 2025
Friday, November 15, 2024
emerging tech trends to have on your radar heading into the new year Hi there, AI breakthroughs are fueling a new era of tech innovation. Join our expert panel as they go live to unpack the trends that
How to go viral on TikTok (10 actionable tips)
Friday, November 15, 2024
Plus, clever automations to streamline your content creation. ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
10words: Top picks from this week
Friday, November 15, 2024
Today's projects: Kaptr.me • Success.ai • MakeCharts.co • Quickads.ai • StoryBee • Subjects • JobSearch.Coach • Directorist • X-Ray.contact • Music-list • Taskheat • SalesDoubler 10words Discover