Hello! Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
And now…let's dive into some interesting links from this week.
Amazon S3 Tables: Storage optimized for analytics workloads Amazon S3 Tables give you storage that is optimized for tabular data such as daily purchase transactions, streaming sensor data, and ad impressions in Apache Iceberg format, for easy queries using popular query engines like Amazon Athena, Amazon EMR, and Apache Spark. When compared to self-managed table storage, you can expect up to 3x faster query performance and up to 10x more transactions per second, along with the operational efficiency that is part and parcel of using a fully managed service…
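If you want to poke at a table like this from Python, here is a minimal, hedged sketch using boto3's Athena client. It assumes the table bucket is already wired into your Athena/Glue catalog, and every database, table, and bucket name below is hypothetical:

```python
import boto3

# Hypothetical names throughout; assumes the S3 Tables bucket is already
# integrated with your Athena/Glue catalog.
athena = boto3.client("athena", region_name="us-east-1")

resp = athena.start_query_execution(
    QueryString="SELECT sku, SUM(amount) AS revenue FROM daily_purchases GROUP BY sku",
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/queries/"},
)
state = athena.get_query_execution(
    QueryExecutionId=resp["QueryExecutionId"]
)["QueryExecution"]["Status"]["State"]  # QUEUED / RUNNING / SUCCEEDED / FAILED
print(state)
```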
Interviewing Finbarr Timbers on the "We are So Back" Era of Reinforcement Learning Finbarr Timbers is an AI researcher who writes Artificial Fintelligence — one of the technical AI blogs I’ve been recommending for a long time — and has experience at a variety of top AI labs including DeepMind and Midjourney. The goal of this interview was to do a few things: Revisit what reinforcement learning (RL) actually is, its origins, and its motivations. Contextualize the major breakthroughs of deep RL in the last decade, from DQN for Atari to AlphaZero to ChatGPT. How could we have seen the resurgence coming? (see the timeline below for the major events we cover) Discuss modern uses for RL, o1, RLHF, and the future of finetuning all ML models. Address some of the critiques like “RL doesn’t work yet.”
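To ground the "what RL actually is" part of the conversation, here is the classic tabular Q-learning update that predates every deep-RL milestone mentioned above. This is a toy sketch of our own (made-up states and rewards), not code from the interview:

```python
import numpy as np

# Toy tabular Q-learning: 5 hypothetical states, 2 actions.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99  # learning rate, discount factor

def q_update(s, a, reward, s_next):
    # Move Q(s, a) toward the bootstrapped target: reward + gamma * max over a' of Q(s', a').
    td_target = reward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(s=0, a=1, reward=1.0, s_next=2)
print(Q[0, 1])  # 0.1 after one step
```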
LLMOps Database A curated knowledge base of real-world LLMOps implementations, with detailed summaries and technical notes…
With Quadratic, combine the spreadsheets your organization asks for with the code that matches your team’s code-driven workflows. Powered by code, you can build anything in Quadratic spreadsheets with Python, JavaScript, or SQL, all approachable with the power of AI.
Use the data tool that actually aligns with how your team works with data, from ad-hoc to end-to-end analytics, all in a familiar spreadsheet. Level up your team’s analytics with Quadratic today. * Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
Model Validation Techniques, Explained: A Visual Guide with Code Examples Here, I’ve organized these validation techniques — all 12 of them — in a tree structure, showing how they evolved from basic concepts into more specialized ones. And of course, we will use clear visuals and a consistent dataset to show what each method does differently and why method selection matters… (a minimal scikit-learn sketch of two of the basic methods follows below)

I read every major AI lab’s safety plan so you don’t have to A handful of tech companies are competing to build advanced, general-purpose AI systems that radically outsmart all of humanity. Each acknowledges that this will be a highly – perhaps existentially – dangerous undertaking. How do they plan to mitigate these risks?…I tried to write this assuming no prior knowledge. It is aimed at a reader who has heard that AI companies are doing something dangerous, and would like to know how they plan to address that. In the first section, I give a high-level summary of what each framework actually says. In the second, I offer some of my own opinions…
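As promised above, here is a minimal scikit-learn sketch contrasting two of the basic techniques from the model-validation guide (toy dataset and our own code, not the article's):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# k-fold: train on 4/5 of the data, validate on the held-out fifth, rotate.
kfold_scores = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# leave-one-out: the extreme case where every fold is a single observation.
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())

print(kfold_scores.mean(), loo_scores.mean())
```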
Diffusion Meets Flow Matching: Two Sides of the Same Coin Flow matching and diffusion models are two popular frameworks in generative modeling. Despite seeming similar, there is some confusion in the community about their exact connection. In this post, we aim to clear up this confusion and show that diffusion models and Gaussian flow matching are the same, although different model specifications can lead to different network outputs and sampling schedules. This is great news: it means you can use the two frameworks interchangeably…
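To make the "same coin" claim concrete, here is a toy numpy sketch of the two regression targets under the usual linear-path and variance-preserving conventions. This is our own illustration, not the post's notation or code:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(loc=2.0, scale=0.5, size=(1024, 2))  # toy "data" samples
x0 = rng.normal(size=(1024, 2))                      # Gaussian noise samples
t = rng.uniform(size=(1024, 1))

# Flow matching, linear path: x_t = (1 - t) * x0 + t * x1,
# and the network regresses the velocity x1 - x0.
xt_fm = (1 - t) * x0 + t * x1
target_velocity = x1 - x0

# Diffusion (variance-preserving, noise prediction): x_t = a_t * x1 + s_t * x0
# with a_t^2 + s_t^2 = 1, and the network regresses the noise x0.
a_t, s_t = np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)
xt_diff = a_t * x1 + s_t * x0
target_noise = x0

# The post's point: given a noise schedule (a_t, s_t), velocity, noise, and data
# predictions are linear reparameterizations of one another.
```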
Why hasn't forecasting evolved as far as LLMs have? [Reddit] Forecasting is still very clumsy and very painful. Even the models built by major companies -- Meta's Prophet and Google's Causal Impact come to mind -- don't really succeed as one-step, plug-and-play forecasting tools. They miss a lot of seasonality, overreact to outliers, and need a lot of tweaking to get right…It's an area of data science where the models that I build on my own tend to work better than the models I can find. LLMs, on the other hand, have reached incredible versatility and usability…Why is that? After all the time we as data scientists have put into forecasting, why haven't we created something that outperforms what an individual data scientist can create? Or -- if I'm wrong, and that does exist -- what tool does that?…
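For context, the plug-and-play baseline the thread is reacting to looks roughly like this with Prophet (hypothetical data; real series usually need the tweaking the poster describes):

```python
import pandas as pd
from prophet import Prophet  # pip install prophet

# Hypothetical daily series in the two-column format Prophet expects: ds and y.
df = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=365, freq="D"),
    "y": range(365),
})

m = Prophet(yearly_seasonality=True, weekly_seasonality=True)
m.fit(df)

future = m.make_future_dataframe(periods=30)  # 30 days ahead
forecast = m.predict(future)                  # yhat, yhat_lower, yhat_upper per day
```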
Skimpy - A lightweight tool for creating summary statistics from dataframes skimpy is a lightweight tool that provides summary statistics about variables in pandas or Polars data frames within the console or your interactive Python window. Think of it as a super-charged version of pandas’ df.describe()…
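Basic usage is a one-liner; a quick sketch with a made-up frame:

```python
import pandas as pd
from skimpy import skim  # pip install skimpy

df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38],
    "city": ["NYC", "SF", "NYC", "LA", "SF"],
})
skim(df)  # typed, per-column summary; a richer take on df.describe()
```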
Probabilistic weather forecasting with machine learning We introduce GenCast, a probabilistic weather model with greater skill and speed than the top operational medium-range weather forecast in the world, ENS, the ensemble forecast of the European Centre for Medium-Range Weather Forecasts. GenCast is an ML weather prediction method, trained on decades of reanalysis data…
Disposable environments for ad-hoc analyses In this blog post, I explore the innovative 'juv' package, which simplifies Python environment management for Jupyter notebooks by embedding dependencies directly within the notebook file. This approach eliminates the need for separate environment files, making notebooks easily shareable and reducing setup complexity. I also discuss integrating 'juv' with 'pyds-cli' to streamline ad-hoc data analyses within organizations, enhancing reproducibility and reducing environment conflicts. Curious about how this could change your data science workflow?…
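As we understand it, juv builds on the same inline script metadata standard (PEP 723) that uv uses for standalone scripts, writing the equivalent block into the notebook itself. For a plain .py file the embedded dependencies look like this:

```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "pandas",
#     "matplotlib",
# ]
# ///
import pandas as pd  # resolved on the fly, no separate environment file needed
```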
Universal Semantic Layer: Capabilities, Integrations, and Enterprise Benefits This article examines how semantic layers fit into modern data architectures and their critical benefits, from API-driven access to enhanced governance, and why they've become essential in today's data stack…
Meta Learning: Addendum or a revised recipe for life In 2021 I published Meta Learning: How To Learn Deep Learning And Thrive In The Digital World. The book is based on 8 years of my life where nearly every day I thought about how to learn machine learning and how to do machine learning efficiently and at a high level…The more I pursued ML professionally, the more I started to think that the recipe that led me up to this point no longer applied. On one hand, this hasn't impacted my work one bit. I have been an employee for nearly 20 years and that's plenty of time to figure things out. However, I didn't appreciate how big of a role being part of the fast.ai community and continuous learning played in my life. How important following the recipe was for me on a personal level. I share my experience in the hopes that should you find yourself in a similar situation, you might have an easier time balancing your personal growth trajectory and your work…
Successive Halving There is an experimental technique for cross validation in scikit-learn that revolves around "Successive Halving". In this livestream we discuss how the technique works but also why the approach is still considered experimental at the time of making this recording…
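The experimental status is visible right in the import path: you have to opt in explicitly before scikit-learn will expose the halving search estimators. A minimal sketch (toy data, our own code rather than the livestream's):

```python
# Successive halving is still experimental in scikit-learn, so it must be
# enabled explicitly before the estimator can be imported.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
param_grid = {"n_estimators": [10, 50, 100], "max_depth": [2, 4, None]}

# Each round keeps the best candidates and hands them more resources
# (by default, more training samples), halving the field until one remains.
search = HalvingGridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, factor=2, random_state=0
).fit(X, y)
print(search.best_params_)
```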
Why You Should Care About AI Agents So what exactly are AI agents, and how soon (if ever) can we expect them to become ubiquitous? Should their emergence excite us, worry us, or both? We’ve investigated the state-of-the-art in AI agents and where it might be going next…
Jobs where Bayesian statistics is used a lot? [Reddit] How much Bayesian inference are data scientists generally doing in their day-to-day work? Are there roles in specific areas of data science where that knowledge is needed? Marketing comes to mind but I’m not sure where else. By knowledge of Bayesian inference I mean building hierarchical Bayesian models or more complex models in languages like Stan…
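For readers wondering what "hierarchical Bayesian model" means in practice, here is a tiny PyMC sketch of partial pooling across groups. The data are hypothetical marketing-style outcomes; the thread's examples use Stan, but the structure is the same:

```python
import numpy as np
import pymc as pm  # pip install pymc

# Hypothetical outcomes grouped by region, e.g. a marketing lift analysis.
rng = np.random.default_rng(0)
region = rng.integers(0, 5, size=200)
y = rng.normal(loc=region * 0.1, scale=1.0, size=200)

with pm.Model() as hierarchical:
    mu = pm.Normal("mu", 0.0, 1.0)              # population-level mean
    sigma_r = pm.HalfNormal("sigma_r", 1.0)     # how much regions vary around it
    region_effect = pm.Normal("region_effect", mu, sigma_r, shape=5)
    noise = pm.HalfNormal("noise", 1.0)
    pm.Normal("obs", region_effect[region], noise, observed=y)
    idata = pm.sample(1000, tune=1000)          # partial pooling across regions
```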
How to be a wise optimist about science and technology? I believe the central problem of the 21st century is how civilization co-evolves with science and technology. As our understanding of the world deepens, it enables technologies that confer ever more power to both improve and damage the world…This essay emerged from a personal crisis. From 2011 through 2015 much of my work focused on artificial intelligence. But in the second half of the 2010s I began doubting the wisdom of such work, despite the enormous creative opportunity…one needs a big-picture view of how humanity should meet the challenges posed by science and technology, and especially by artificial superintelligence (ASI). This essay attempts to develop such a big-picture view…
* Based on unique clicks. ** Find last week's issue #575 here.
Learning something for your job? Hit reply to get our help.
Looking to get a job? Check out our “Get A Data Science Job” Course. It is a comprehensive course that teaches you everything related to getting a data science job, based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself/organization to ~64,300 subscribers by sponsoring this newsletter. 35-45% weekly open rate.
Thank you for joining us this week! :) Stay Data Science-y! All our best, Hannah & Sebastian
Invite your friends and earn rewards. If you enjoy Data Science Weekly Newsletter, share it with your friends and earn rewards when they subscribe. Invite Friends