Data Science Weekly - Issue 575

Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Issue #575
November 28, 2024

Hello!

Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.

And now…let's dive into some interesting links from this week.

Editor's Picks

Announcing the NeurIPS 2024 Test of Time Paper Awards
We are honored to announce the Test of Time Paper Awards for NeurIPS 2024. This award is intended to recognize papers published 10 years ago at NeurIPS 2014 that have significantly shaped the research field since then, standing the test of time…This year, we are making an exception to award two Test of Time papers given the undeniable influence of these two papers on the entire field. The awarded papers are:
- Generative Adversarial Nets
  Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
- Sequence to Sequence Learning with Neural Networks
  Ilya Sutskever, Oriol Vinyals, Quoc V. Le…

Seven Sins of Quantitative Investing
In this paper, we discuss the seven common mistakes investors tend to make when they perform backtesting and build quant models. Some of these may be familiar to our readers, but nonetheless, you may be surprised to see the impact of these biases. The other sins are so commonplace in both academia and practitioner’s research that we usually take them for granted…
An Intuitive Explanation of Sparse Autoencoders for LLM Interpretability
Sparse Autoencoders (SAEs) have recently become popular for interpretability of machine learning models…Machine learning models and LLMs are becoming more powerful and useful, but they are still black boxes, and we don’t understand how they do the things that they are capable of. It seems like it would be useful if we could understand how they work…Using SAEs, we can begin to break down a model’s computation into understandable components. There are several existing explanations of SAEs, and I wanted to create a brief writeup from a different angle with an intuitive explanation of how they work…

Sponsor Message

Quadratic - analyze anything, host anywhere

With Quadratic, combine the spreadsheets your organization asks for with the code that matches your team’s code-driven workflows.

Powered by code, you can build anything in Quadratic spreadsheets with Python, JavaScript, or SQL, all approachable with the power of AI.

Use the data tool that actually aligns with how your team works with data, from ad-hoc to end-to-end analytics, all in a familiar spreadsheet.

Level up your team’s analytics with Quadratic today.

* Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org

Data Science Articles & Videos

Evaluating Bayesian Mixed Models in R/Python
In this article, my goal is to guide you through some useful model checking and evaluation VISUAL METHODS for Bayesian models (not your typical RMSE) in both R and Python. I will build upon an example and set of models covered in my previous post, so I recommend you take a quick look before moving forward…
A Knight’s Tour
The “knight’s tour” is a classic problem in graph theory, first posed over 1,000 years ago and pondered by legendary mathematicians including Leonhard Euler before finally being solved in 1823. We will use the knight’s tour problem to illustrate a common graph algorithm called depth first search…
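For readers who want to see the idea in code, here is a minimal sketch of a depth first search for a knight's tour. It is not taken from the linked article: the function name `knights_tour`, the board representation, and the move ordering (trying squares with the fewest onward moves first, often called Warnsdorff's heuristic) are all assumptions of this sketch, added only to keep the search fast.

```python
from typing import List, Optional, Tuple

Square = Tuple[int, int]

# The eight L-shaped moves a knight can make.
KNIGHT_MOVES = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def knights_tour(n: int, start: Square = (0, 0)) -> Optional[List[Square]]:
    """Return a list of squares visiting every cell of an n x n board once,
    or None if no tour exists from `start`. Hypothetical helper for illustration."""
    visited = [[False] * n for _ in range(n)]
    path: List[Square] = []

    def neighbours(sq: Square) -> List[Square]:
        # Unvisited squares reachable from `sq` in one knight move.
        r, c = sq
        return [(r + dr, c + dc) for dr, dc in KNIGHT_MOVES
                if 0 <= r + dr < n and 0 <= c + dc < n
                and not visited[r + dr][c + dc]]

    def dfs(sq: Square) -> bool:
        # Go deeper: mark the square and extend the current path.
        visited[sq[0]][sq[1]] = True
        path.append(sq)
        if len(path) == n * n:
            return True
        # Assumption of this sketch: try the most constrained squares first.
        for nxt in sorted(neighbours(sq), key=lambda s: len(neighbours(s))):
            if dfs(nxt):
                return True
        # Dead end: undo the move and backtrack.
        visited[sq[0]][sq[1]] = False
        path.pop()
        return False

    return path if dfs(start) else None
```

Calling `knights_tour(5)` searches a 5x5 board from the corner square; the depth first part is the `dfs` function, which commits to one move at a time and only backtracks when it hits a dead end.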
Ask, and it shall be given: Turing completeness of prompting
In this work, we show that prompting is in fact Turing-complete: there exists a finite-size Transformer such that for any computable function, there exists a corresponding prompt following which the Transformer computes the function. Furthermore, we show that even though we use only a single finite-size Transformer, it can still achieve nearly the same complexity bounds as that of the class of all unbounded-size Transformers…
The Problem with Reasoners
o1 reasoners are the most exciting models since the original GPT-4. They prove what I predicted earlier this year in my AI Search: The Bitter-er Lesson paper: we can get models to think for longer instead of building bigger models. It's too bad they suck on problems you should care about…
Model Context Protocol
The Model Context Protocol (MCP) is an open protocol that enables seamless integration between LLM applications and external data sources and tools. Whether you’re building an AI-powered IDE, enhancing a chat interface, or creating custom AI workflows, MCP provides a standardized way to connect LLMs with the context they need…
🎨 Diagram-as-Code: Creating Dynamic and Interactive Documentation for Visual Content
Diagrams is a 🐍Python library that implements the Diagram as Code approach, enabling you to create architectural infrastructure diagrams and other types of diagrams through code. With Diagrams, you can easily define cloud infrastructure components (such as AWS, Azure, and GCP), network elements, software services, and more, all with just a few lines of code…
Unlocking the power of time-series data with multimodal models
We compare the performance of multimodal models on the understanding of time-series data when presented visually as plots compared to numerical values. We find significant performance improvements when presented with plots on tasks like fall detection…
The optimisers curse
When looking for the best hyperparameters you can spend a lot of compute. So much so, that you can also spend *too much*. It is a subtle thing, but if you're not careful you can become a victim to something that's known as "the optimisers curse". This video explains the warning in full detail…
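One common reading of the optimiser's curse is that the best score found in a large search tends to overstate the true performance, because the search also selects for lucky noise. The small simulation below is a sketch of that reading only; the video may frame the problem differently. The function name `selection_bias` and all numbers (50 configurations, a true score of 0.80, a noise level of 0.02) are made up for illustration.

```python
import random
import statistics


def selection_bias(n_configs: int = 50, true_score: float = 0.80,
                   noise_sd: float = 0.02, n_trials: int = 2000,
                   seed: int = 0) -> float:
    """Average amount by which the best observed score exceeds the true score.

    Assumption of this sketch: every configuration is equally good
    (same true score), and each validation score is the true score
    plus independent Gaussian noise.
    """
    rng = random.Random(seed)
    best_scores = []
    for _ in range(n_trials):
        # One simulated hyperparameter search: noisy scores for each config.
        observed = [true_score + rng.gauss(0.0, noise_sd)
                    for _ in range(n_configs)]
        # The search reports the winner, which is the luckiest draw.
        best_scores.append(max(observed))
    return statistics.mean(best_scores) - true_score


if __name__ == "__main__":
    print(f"overestimate with 50 configs: {selection_bias():.3f}")
    print(f"overestimate with 1 config:   {selection_bias(n_configs=1):.3f}")
```

Under these assumptions no configuration is actually better than any other, yet the winning score still sits above the true score, and the gap grows as more configurations are tried.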
Hazard and Survival
Here’s a question from the Reddit statistics forum: “If I have a tumor that I’ve been told has a malignancy rate of 2% per year, does that compound? So after 5 years there’s a 10% chance it will turn malignant?”…This turns out to be an interesting question, because the answer depends on what that 2% means. If we know that it’s the same for everyone, and it doesn’t vary over time, computing the compounded probability after 5 years is a relatively simple…
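As a rough sketch of the simple case described above, assume the 2% is a constant per-year probability that is the same for everyone and independent from year to year. The chance of at least one event in 5 years is then one minus the chance of no event in each of the 5 years. The function name `cumulative_probability` is hypothetical, and this is not necessarily the calculation the linked post goes on to use.

```python
def cumulative_probability(annual_rate: float, years: int) -> float:
    """Probability of at least one event within `years`.

    Assumes a constant annual probability that is identical for everyone
    and independent across years (the simple case only).
    """
    # Probability of surviving every year without the event.
    survival = (1.0 - annual_rate) ** years
    return 1.0 - survival


if __name__ == "__main__":
    naive = 0.02 * 5                              # simply adding the yearly rates
    compounded = cumulative_probability(0.02, 5)  # 1 - 0.98**5
    print(f"naive sum:  {naive:.4f}")       # 0.1000
    print(f"compounded: {compounded:.4f}")  # 0.0961
```

Under these assumptions the answer is a little under 10% (about 9.6%), because a tumor that has already turned malignant cannot do so again in a later year.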
You’ve Been Waiting for Native Mobile Apps with R? The Wait Is Over.
For the past couple of months, I’ve been sharing how webR will transform the way we build apps with R inside. If you’re unfamiliar, webR is a WebAssembly compilation of R. In simpler terms, it enables R to run within JavaScript environments. If you are familiar, you know it’s a bit more nuanced—but let’s keep it straightforward for now. I’m convinced that webR will redefine how we create apps that harness the power of R. And today, we’re taking the next big step: native mobile apps…
Dismantling ELT: The Case for Graphs, Not Silos
I generally advocate for breaking down silos between software development and data analytics teams by improving collaboration, aligning team incentives, and adopting engineering practices like data contracts and data products. In a nutshell, "Shift Left" applied to data analytics. I also wrote about how the rise of incremental processing (across all data platforms) only makes it more important to do this. With all that in mind, let’s now look at ELT (and its cousins ETL and Reverse ETL) where the letters stand for “Extract”, “Load”, “Transform”…
Unveiling DeepSeek: A Story of Even More Radical Chinese Technological Idealism [PDF]
Among the seven prominent large model startups in China, DeepSeek (深度求索) is the quietest yet often remembered for its unexpected moves. A year ago, its surprising presence stemmed from being backed by High-Flyer Quant (High-Flyer), a private quantitative hedge fund powerhouse. It was the only non-tech giant to stockpile tens of thousands of A100 chips…In May, a period saturated with AI developments, DeepSeek shot to fame by releasing an open-source model called DeepSeek V2, which offered unprecedented cost-effectiveness: inference costs dropped to just 1 RMB (0.14 USD) per million tokens—approximately one-seventh the cost of Llama3 70B and one-seventieth that of GPT-4 Turbo…
ChatGPT for Data Analysis: A Beginner’s Guide
ChatGPT is a comprehensive data analysis tool that can handle various data file formats, including Excel spreadsheets, CSV files, PDFs, and even JSON files…In this guide, I am going to show you how to use ChatGPT to perform several data analysis tasks in minutes without coding experience or expensive statistical software. After uploading your data, you can use simple conversational prompts to clean, transform, and visualize your data…

Last Week's Newsletter's 3 Most Clicked Links

* Based on unique clicks.
** Find last week's issue #574 here.

Cutting Room Floor

Whenever you're ready, 3 ways we can help:

Learning something for your job? Hit reply to get our help.
Looking to get a job? Check out our “Get A Data Science Job” Course
It is a comprehensive course that teaches you everything related to getting a data science job based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself/organization to ~64,300 subscribers by sponsoring this newsletter. 35-45% weekly open rate.

Thank you for joining us this week! :)

Stay Data Science-y!

All our best,
Hannah & Sebastian
