Data Science Weekly - Issue 471

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #471

December 01 2022

Editor's Picks

  • Machine Learning Operations (MLOps): Overview, Definition, and Architecture
    We conducted mixed-method research, including a literature review, a tool review, and expert interviews. As a result of these investigations, we provide an aggregated overview of the necessary principles, components, and roles, as well as the associated architecture and workflows. Furthermore, we furnish a definition of MLOps and highlight open challenges in the field. Finally, this work provides guidance for ML researchers and practitioners who want to automate and operate their ML products with a designated set of technologies....
  • Challenges of Building Realtime ML Pipelines
    Realtime machine learning is on the rise, and as companies start introducing realtime into their ML pipelines, they are finding themselves having to weigh the trade-offs between performance, cost, and infrastructure complexity, and determine which to prioritize...In this post, we will look at some of the most typical trade-offs that occur at each stage of the transition from batch to realtime and why these advantages and disadvantages are important to keep in mind...
  • Good Machine Learning Practice for Medical Device Development: Guiding Principles
    The U.S. Food and Drug Administration (FDA), Health Canada, and the United Kingdom’s Medicines and Healthcare products Regulatory Agency (MHRA) have jointly identified 10 guiding principles that can inform the development of Good Machine Learning Practice (GMLP). These guiding principles will help promote safe, effective, and high-quality medical devices that use artificial intelligence and machine learning (AI/ML)...


A Message from this week's Sponsor:


12/20 Tech Talk: How to Modernize OLAP and BI with AtScale + Google BigQuery

Join this Tech Talk to learn how organizations query data quickly & with reduced complexity. You’ll learn about how leading analytics teams are rethinking legacy approaches to delivering low-latency business intelligence reporting across multiple verticals.


Data Science Articles & Videos

  • Essential Books for Data Scientists
    They used to say ‘an apple a day keeps the doctor away’, but we say ‘a chapter a day keeps the doctor away’. Okay, maybe that’s a slight exaggeration, but over the past couple of decades there has been mounting scientific evidence to support the health benefits of reading. Here, we’ve compiled a list of essential reading material for data scientists...
  • Goodbye, Data Science
    I had been a data scientist for the past few years, but in 2022, I got a new job as a data engineer, and it’s been pretty good to me so far...The main reason I soured on data science is that the work felt like it didn’t matter, in multiple senses of the words “didn’t matter”...
  • Idealism and pragmatism in visualization
    Some reactions to the Washington Post chart on the right, designed by my former student Luís Melgar, reminded me of a passage from The Art of Insight that I shared the other day. This morning, a few readers of this chart asked in social media: “Why isn't time on the X-axis?” implying that there's something wrong with that, as it breaks some convention or rule...
  • Deep (Learning) Focus Newsletter
    Deep (Learning) Focus is a newsletter that I release every two weeks. Each issue picks a single topic in deep learning research, provides (hopefully) all background information relevant to understanding the topic, overviews 3-4 impactful papers in this space, and provides various links/pointers to further expanding your knowledge of the topic (e.g., implementations, tutorials, more papers, etc.)...
  • A New Object Detection Benchmark
    In this paper we introduce the Roboflow 100 object detection benchmark consisting of 100 projects that span a wide array of imagery domains and task targets. We derived our benchmark selection from over 90000 public datasets, 60 million public images that are actively being worked on in the open on Roboflow....
  • The connectome of an insect brain
    Brains contain networks of interconnected neurons, so knowing the network architecture is essential for understanding brain function. We therefore mapped the synaptic-resolution connectome of an insect brain (Drosophila larva) with rich behavior, including learning, value-computation, and action-selection, comprising 3,013 neurons and 544,000 synapses. We characterized neuron-types, hubs, feedforward and feedback pathways, and cross-hemisphere and brain-nerve cord interactions...
  • ML Observability — Hype or Here to Stay?
    An explosion of tooling has led to a lack of consensus in the space, making it both a challenging and interesting one to explore. I hope this article will provide a useful framework for other investors who venture down the ML Observability rabbit hole...Below, we’ll delve into: What is ML Observability and why do we need it? How Observability tooling has the power to unlock a market. Why vendor solutions are in pole position to capture the value. The supporting tailwinds and hurdles we have yet to overcome. The challenges of an overload in tooling. Building a moat in a noisy market...
  • Gitlab SQL Style Guide
    This guide establishes our standards for SQL, which are enforced by the SQLFluff linter and by code review. The target code changes that this style guide applies to are those made using dbt...
  • Text-to-Image: Diffusion, Text Conditioning, Guidance, Latent Space
    Text-to-image has advanced at a breathless pace in 2021-2022, starting with DALL·E, then DALL·E 2, Imagen, and now Stable Diffusion. I dug into a couple of papers to learn more about the space and organized my understanding into a few key concepts. Diffusion: gradually add noise to data and then learn to generate data from noise. Text conditioning: generating images given (i.e., conditioned on) a text prompt. Classifier guidance: using classifier gradients to increase text-image alignment. Latent space: applying diffusion on image embeddings instead of image pixels...
  • A/B Testing with Multiple Metrics
    A lot of literature references and guidance about A/B testing anchor on tests based on one single comparison or one single metric. Some of my friends working in tech also shared that they typically focus on one primary metric when performing experiment designs (e.g., study design and sample size calculation etc.)...I'm curious if this is a common practice in the tech industry (especially the non-biotech industries)...What I learned from industry practitioners is that multiple metrics (usually 3 to 5) are monitored even though the sample size might be based on one primary metric...
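
One common way to handle several monitored metrics in a single A/B test is to adjust the significance threshold for multiple comparisons. The sketch below uses Holm's step-down procedure as an illustration; it is not taken from the linked discussion, and the metric names and p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Apply Holm's step-down correction to a dict of metric -> p-value.

    Returns a dict of metric -> True if that metric's null hypothesis
    is rejected while controlling the family-wise error rate at alpha.
    """
    # Test the smallest p-value first, against the strictest threshold.
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    total = len(ordered)
    rejected = {name: False for name in p_values}
    for rank, (name, p) in enumerate(ordered):
        # The threshold loosens as hypotheses are rejected: alpha / (m - rank).
        if p <= alpha / (total - rank):
            rejected[name] = True
        else:
            # Once one test fails, all larger p-values fail as well.
            break
    return rejected


# Hypothetical p-values for four metrics monitored in one experiment.
example_p_values = {
    "conversion": 0.004,
    "revenue_per_user": 0.03,
    "session_length": 0.04,
    "bounce_rate": 0.20,
}
holm_result = holm_bonferroni(example_p_values)
```

With these made-up numbers, three metrics fall below an unadjusted 0.05 threshold, but only `conversion` survives the correction, which is one reason sample sizes are often planned around a single primary metric.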




Do more with data, together.

Bring SQL, Python, no-code, and R together in one UI. From exploratory analyses to beautiful data apps to ML modeling and data science, Hex streamlines the entire analytics workflow so your team can focus on generating insights, driving decisions, and moving things forward. No more jumping between tools, struggling with versions, or sharing via screenshot. Try Hex free with a 14 day trial and join companies like Notion, Fivetran and AngelList who are doing more with data.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!




Learn the Essentials of Data Science in the 21st century!

American University’s Institute for Data Science and Big Data is open to early and mid-career professionals looking to enhance their understanding of data science and apply it to their careers. Through seven days of lectures, guest speakers drawn from government, business and academia, and hands-on assignments on American University’s campus in Washington, D.C., you will learn tools, gain skills, and receive a certificate of completion to enhance your credentials.

To join us from Jan. 4-12, 2023, apply by December 23, 2022.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



  • Senior Data Analyst - Epic Games - New York

    Epic Games spans across 19 countries with 55 studios and 4,500+ employees globally. For over 25 years, we’ve been making award-winning games and engine technology that empowers others to make visually stunning games and 3D content that bring environments to life like never before.

    Use your expert experience in data & analytics to build powerful stories and visuals that inform the games we make, the technology we develop, and business decisions that drive Epic... Epic Games is looking for a Senior Data Analyst to help us create the models that fuel our creator economy. The successful candidate will have excellent SQL knowledge, and enjoy combining analytic skills with business acumen to provide the data and insights that will drive our continued success...





Training & Resources

  • Not only is Stable Diffusion 2.0 not bad, but really better: my prompt engineering experiments
    Stability AI released the Stable Diffusion 2.0 model last weekend, the biggest update since Stable Diffusion 1.4 in August. However, the new release caused controversy in the AI art community. Users complained that the distorted anatomical structures and weird fuzzy textures in the generated outcomes look more like a downgrade than an update. A mess, especially when compared to the eye-pleasing, easily satisfying outcomes from Midjourney v4...
  • Causal Confounds in Sequential Decision Making
    In causal inference, we call a random variable that we don’t observe that influences a relationship we’d like to model a confounder. Using techniques from causal inference, we derive provably correct and scalable algorithms for sequential decision making in these sorts of confounded settings...We’re going to be focused mostly on imitation learning...
  • Sketch-Guided Text-to-Image Diffusion Models
    Our key idea is to train a Latent Guidance Predictor (LGP) - a small, per-pixel, Multi-Layer Perceptron (MLP) that maps latent features of noisy images to spatial maps, where the deep features are extracted from the core Denoising Diffusion Probabilistic Model (DDPM) network...We take a particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images that follow the guidance of a sketch of arbitrary style or domain...
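
For readers new to the "noisy images" mentioned above, here is a minimal sketch of the forward (noising) step in a DDPM-style model, where a clean sample x_0 is mixed with Gaussian noise as x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps. It does not implement the paper's Latent Guidance Predictor; the linear schedule values and function names are assumptions chosen for illustration.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written as alpha-bar_t."""
    products, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        products.append(running)
    return products


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0 at timestep t (0-indexed)."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in x0]


alpha_bars = cumulative_alphas(linear_betas())
rng = random.Random(0)              # fixed seed so the sketch is repeatable
pixels = [0.5, -0.25, 1.0, 0.0]     # hypothetical flattened image values
slightly_noisy = add_noise(pixels, 10, alpha_bars, rng)
mostly_noise = add_noise(pixels, 999, alpha_bars, rng)
```

At early timesteps the sample stays close to the original values, while at the final timestep almost none of the signal remains, which is the regime the denoising network learns to reverse.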


P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022, All rights reserved.
