- Machine Learning Operations (MLOps): Overview, Definition, and Architecture
We conducted mixed-method research, including a literature review, a tool review, and expert interviews. As a result of these investigations, we provide an aggregated overview of the necessary principles, components, and roles, as well as the associated architecture and workflows. Furthermore, we furnish a definition of MLOps and highlight open challenges in the field. Finally, this work provides guidance for ML researchers and practitioners who want to automate and operate their ML products with a designated set of technologies....
- Challenges of Building Realtime ML Pipelines
Realtime machine learning is on the rise, and as companies start introducing realtime into their ML pipelines, they are finding themselves having to weigh the trade-offs between performance, cost, and infrastructure complexity, and determine which to prioritize...In this post, we will look at some of the most typical trade-offs that occur at each stage of the transition from batch to realtime and why these advantages and disadvantages are important to keep in mind...
- Good Machine Learning Practice for Medical Device Development: Guiding Principles
The U.S. Food and Drug Administration (FDA), Health Canada, and the United Kingdom’s Medicines and Healthcare products Regulatory Agency (MHRA) have jointly identified 10 guiding principles that can inform the development of Good Machine Learning Practice (GMLP). These guiding principles will help promote safe, effective, and high-quality medical devices that use artificial intelligence and machine learning (AI/ML)...
A Message from this week's Sponsor:
12/20 Tech Talk: How to Modernize OLAP and BI with AtScale + Google BigQuery
Join this Tech Talk to learn how organizations query data quickly & with reduced complexity. You’ll learn about how leading analytics teams are rethinking legacy approaches to delivering low-latency business intelligence reporting across multiple verticals.
Data Science Articles & Videos
- Essential Books for Data Scientists
They used to say ‘an apple a day keeps the doctor away’, but we say ‘a chapter a day keeps the doctor away’. Okay, maybe that’s a slight exaggeration, but over the past couple of decades there has been mounting scientific evidence to support the health benefits of reading. Here, we’ve compiled a list of essential reading material for data scientists...
- Goodbye, Data Science
I had been a data scientist for the past few years, but in 2022, I got a new job as a data engineer, and it’s been pretty good to me so far...The main reason I soured on data science is that the work felt like it didn’t matter, in multiple senses of the words “didn’t matter”...
- Idealism and pragmatism in visualization
Some reactions to the Washington Post chart on the right, designed by my former student Luís Melgar, reminded me of a passage from The Art of Insight that I shared the other day. This morning, a few readers of this chart asked in social media: “Why isn't time on the X-axis?” implying that there's something wrong with that, as it breaks some convention or rule...
- Deep (Learning) Focus Newsletter
Deep (Learning) Focus is a newsletter that I release every two weeks. Each issue picks a single topic in deep learning research, provides (hopefully) all background information relevant to understanding the topic, overviews 3-4 impactful papers in this space, and provides various links/pointers to further expanding your knowledge of the topic (e.g., implementations, tutorials, more papers, etc.)...
- A New Object Detection Benchmark
In this paper we introduce the Roboflow 100 object detection benchmark consisting of 100 projects that span a wide array of imagery domains and task targets. We derived our benchmark selection from over 90,000 public datasets, 60 million public images that are actively being worked on in the open on Roboflow....
- The connectome of an insect brain
Brains contain networks of interconnected neurons, so knowing the network architecture is essential for understanding brain function. We therefore mapped the synaptic-resolution connectome of an insect brain (Drosophila larva) with rich behavior, including learning, value-computation, and action-selection, comprising 3,013 neurons and 544,000 synapses. We characterized neuron-types, hubs, feedforward and feedback pathways, and cross-hemisphere and brain-nerve cord interactions...
- ML Observability — Hype or Here to Stay?
An explosion of tooling has led to a lack of consensus in the space, making it both a challenging and interesting one to explore. I hope this article will provide a useful framework for other investors who venture down the ML Observability rabbit hole...Below, we’ll delve into: What is ML Observability and why do we need it? How Observability tooling has the power to unlock a market. Why vendor solutions are in pole position to capture the value. The supporting tailwinds and hurdles we have yet to overcome. The challenges of an overload in tooling. Building a moat in a noisy market...
- Gitlab SQL Style Guide
This guide establishes our standards for SQL, which are enforced by the SQLFluff linter and by code review. The target code changes that this style guide applies to are those made using dbt...
- Text-to-Image: Diffusion, Text Conditioning, Guidance, Latent Space
Text-to-image has advanced at a breathless pace in 2021 - 2022, starting with DALL·E, then DALL·E 2, Imagen, and now Stable Diffusion. I dug into a couple of papers to learn more about the space and organized my understanding into a few key concepts: Diffusion: Gradually add noise to data and then learn to generate data from noise; Text conditioning: Generating images given (i.e., conditioned on) a text prompt; Classifier guidance: Using classifier gradients to increase text-image alignment; Latent space: Applying diffusion on image embeddings instead of image pixels...
- A/B Testing with Multiple Metrics
A lot of literature references and guidance about A/B testing anchor on tests based on one single comparison or one single metric. Some of my friends working in tech also shared that they typically focus on one primary metric when performing experiment designs (e.g., study design and sample size calculation)...I'm curious if this is a common practice in the tech industry (especially the non-biotech industries)...What I learned from industry practitioners is that multiple metrics (usually 3 to 5) are monitored even though the sample size might be based on one primary metric...
Do more with data, together.
Bring SQL, Python, no-code, and R together in one UI. From exploratory analyses to beautiful data apps to ML modeling and data science, Hex streamlines the entire analytics workflow so your team can focus on generating insights, driving decisions, and moving things forward. No more jumping between tools, struggling with versions, or sharing via screenshot. Try Hex free with a 14 day trial and join companies like Notion, Fivetran and AngelList who are doing more with data.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Learn the Essentials of Data Science in the 21st century!
American University’s Institute for Data Science and Big Data is open to early and mid-career professionals looking to enhance their understanding of data science and apply it to their careers. Through seven days of lectures, guest speakers drawn from government, business and academia, and hands-on assignments on American University’s campus in Washington, D.C., you will learn tools, gain skills, and receive a certificate of completion to enhance your credentials.
To join us from Jan. 4-12, 2023, apply now by December 23, 2022, by clicking here.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
- Senior Data Analyst - Epic Games - New York
Epic Games spans across 19 countries with 55 studios and 4,500+ employees globally. For over 25 years, we’ve been making award-winning games and engine technology that empowers others to make visually stunning games and 3D content that bring environments to life like never before.
Use your expert experience in data & analytics to build powerful stories and visuals that inform the games we make, the technology we develop, and business decisions that drive Epic... Epic Games is looking for a Senior Data Analyst to help us create the models that fuel our creator economy. The successful candidate will have excellent SQL knowledge, and enjoy combining analytic skills with business acumen to provide the data and insights that will drive our continued success...
Want to post a job here? Email us for details --> email@example.com
Training & Resources
- Not only is Stable Diffusion 2.0 not bad, but really better ——my prompt engineering experiments
Stability.ai released the Stable Diffusion 2.0 model last weekend, which is the biggest update since Stable Diffusion 1.4 in August. However, the new release caused controversy in the AI art community. Users complained that the distorted anatomical structures and weird fuzzy textures in generative outcomes look more like a downgrade than an update. A mess, especially when compared to the eye-pleasing, easily satisfying outcomes from Midjourney v4...
- Causal Confounds in Sequential Decision Making
In causal inference, we call a random variable that we don’t observe that influences a relationship we’d like to model a confounder. Using techniques from causal inference, we derive provably correct and scalable algorithms for sequential decision making in these sorts of confounded settings...We’re going to be focused mostly on imitation learning...
- Sketch-Guided Text-to-Image Diffusion Models
Our key idea is to train a Latent Guidance Predictor (LGP) - a small, per-pixel, Multi-Layer Perceptron (MLP) that maps latent features of noisy images to spatial maps, where the deep features are extracted from the core Denoising Diffusion Probabilistic Model (DDPM) network...We take a particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images that follow the guidance of a sketch of arbitrary style or domain...
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's newsletter here.
Cutting Room Floor
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian