Hello! Once a week, we write this email to share the links we thought were worth reading in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
And now…let's dive into some interesting links from this week.
What Are Shapley Interactions, and Why Should You Care? Shapley values are the go-to method for explainable AI because they are easy to interpret and theoretically well-founded. However, they struggle to capture the interplay between features…two years ago, we attended a talk that introduced the concept of Shapley interactions…
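To make the distinction concrete, here is a minimal brute-force sketch (not code from the linked post) of exact Shapley values and of one pairwise interaction measure, the Shapley interaction index of Grabisch and Roubens. The three-player `toy_game`, in which features A and B only pay off together, is a hypothetical example. The post may use a different interaction index, and enumerating every subset is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def shapley_value(players, v, i):
    """Exact Shapley value of player i: weighted average of marginal contributions."""
    n = len(players)
    others = [p for p in players if p != i]
    total = 0.0
    for S in subsets(others):
        weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
        total += weight * (v(S | {i}) - v(S))
    return total

def shapley_interaction(players, v, i, j):
    """Shapley interaction index for the pair (i, j): weighted average of the
    joint effect v(S+ij) - v(S+i) - v(S+j) + v(S) over coalitions without i and j."""
    n = len(players)
    others = [p for p in players if p not in (i, j)]
    total = 0.0
    for S in subsets(others):
        weight = factorial(len(S)) * factorial(n - len(S) - 2) / factorial(n - 1)
        delta = v(S | {i, j}) - v(S | {i}) - v(S | {j}) + v(S)
        total += weight * delta
    return total

def toy_game(S):
    """Hypothetical game: A and B only pay off together; C contributes on its own."""
    return (1.0 if {"A", "B"} <= S else 0.0) + (0.5 if "C" in S else 0.0)

players = ["A", "B", "C"]
print([round(shapley_value(players, toy_game, p), 3) for p in players])  # → [0.5, 0.5, 0.5]
print(round(shapley_interaction(players, toy_game, "A", "B"), 3))        # → 1.0
print(round(shapley_interaction(players, toy_game, "A", "C"), 3))        # → 0.0
```

Every feature gets the same Shapley value of 0.5, so the values alone hide the fact that A and B only matter jointly; the interaction index surfaces it as 1.0 for the pair (A, B) and 0.0 for (A, C).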
Streamlining AI Paper Discovery: Building an Automated Research Newsletter With the increasing volume of AI research being published, I found myself wanting a more automated way to discover papers aligned with my interests in practical AI implementation and regulation, among other LLM-related things. While there are excellent existing tools and curated newsletters, I wanted something tailored specifically to my research priorities – and I wanted to experiment with using Claude for content analysis…
Bloom Filters by Example A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set. The price paid for this efficiency is that a Bloom filter is a probabilistic data structure: it tells us that the element either definitely is not in the set or may be in the set. The base data structure of a Bloom filter is a bit vector. Here's a small one we'll use to demonstrate…
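The add/check cycle can be sketched in a few lines of Python. This is a minimal illustration, not the article's code: it assumes the k hash functions can be simulated by salting SHA-256 with an index, and the sizes `num_bits=1000` and `num_hashes=3` are arbitrary choices.

```python
import hashlib

class BloomFilter:
    """A fixed-size bit vector plus k hash functions."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits  # every bit starts unset

    def _positions(self, item: str):
        # Assumption: simulate k independent hash functions by salting SHA-256.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set the bit at each of the item's k positions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # Any unset bit means "definitely not present"; all set means "maybe present".
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(num_bits=1000, num_hashes=3)
for word in ["apple", "banana"]:
    bf.add(word)
print(bf.might_contain("apple"))   # → True (added items are never missed)
print(bf.might_contain("cherry"))  # usually False; True here would be a false positive
```

With m bits, k hash functions, and n inserted items, the false-positive probability is approximately (1 - e^(-kn/m))^k, which is why the bit vector is sized according to the number of elements you expect to store.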
With Quadratic, combine the spreadsheets your organization asks for with the code that matches your team’s code-driven workflows. Because Quadratic is powered by code, you can build anything in its spreadsheets with Python, JavaScript, or SQL, all made approachable with the power of AI.
Use the data tool that actually aligns with how your team works with data, from ad-hoc to end-to-end analytics, all in a familiar spreadsheet. Level up your team’s analytics with Quadratic today.
* Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
The Gambler Who Cracked the Horse-Racing Code Bill Benter did the impossible: He wrote an algorithm that couldn’t lose at the track. Close to a billion dollars later, he tells his story for the first time…
How much GitHub Actions should I know as a data engineer? [Reddit Discussion] Basically title. I really don't want to deep dive into it and get lost in the process and become a devops engineer. Do you have any recommended materials?…
Computer Vision Basics In the previous blog post, we set the stage for this series by exploring some foundational concepts. We discussed the difference between how humans perceive images and how computers process them, delved into the idea of noise in images, touched upon transformations and their applications, and got a glimpse into the basics of image processing…In this segment I’ll be covering: Edge Analysis (understanding image gradients, Sobel operators, and Canny edge detection); Feature Understanding (Harris corner detection, SIFT keypoint basics, and feature description); and Pattern Detection (template matching, Haar cascades, and the sliding-window concept)…
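As a rough illustration of the edge-analysis step, the sketch below slides the standard 3x3 Sobel kernels over a grayscale NumPy array and combines the horizontal and vertical gradients into a magnitude. It uses explicit loops for readability and keeps only the "valid" region; in practice you would use an optimized routine such as OpenCV's `cv2.Sobel` or `scipy.ndimage.sobel`. The tiny test image is a made-up example.

```python
import numpy as np

# Standard 3x3 Sobel kernels: SOBEL_X responds to horizontal intensity changes
# (vertical edges), SOBEL_Y to vertical changes (horizontal edges).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d(image, kernel):
    """Slide `kernel` over `image` (cross-correlation), keeping only the valid region."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def sobel_magnitude(image):
    """Gradient magnitude sqrt(gx^2 + gy^2) from the two Sobel responses."""
    gx = filter2d(image, SOBEL_X)
    gy = filter2d(image, SOBEL_Y)
    return np.hypot(gx, gy)

image = np.zeros((5, 6))
image[:, 3:] = 1.0                  # dark left half, bright right half: one vertical edge
print(sobel_magnitude(image)[0])    # → [0. 4. 4. 0.]
```

Using cross-correlation instead of true convolution only flips the sign of each gradient, so the magnitude, which is what edge detectors like Canny build on, is unaffected.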
Building effective agents Over the past year, we've worked with dozens of teams building large language model (LLM) agents across industries. Consistently, the most successful implementations weren't using complex frameworks or specialized libraries…Instead, they were building with simple, composable patterns. In this post, we share what we’ve learned from working with our customers and building agents ourselves, and give practical advice for developers on building effective agents…
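One example of a simple, composable pattern is prompt chaining: a task is split into fixed steps, each model call consumes the previous call's output, and an optional programmatic check can stop the chain between steps. The sketch below is framework-free and hypothetical: `run_chain`, the `gate` callback, and the stand-in `fake_llm` are illustrative names, and a real `llm` callable would wrap whichever model API you use.

```python
from typing import Callable, List, Optional

LLM = Callable[[str], str]  # takes a prompt, returns the model's text

def run_chain(llm: LLM, steps: List[str], user_input: str,
              gate: Optional[Callable[[int, str], bool]] = None) -> str:
    """Run prompt templates in sequence, feeding each output into the next step."""
    text = user_input
    for idx, template in enumerate(steps):
        text = llm(template.format(input=text))
        # Stop early if a programmatic check rejects an intermediate result.
        if gate is not None and not gate(idx, text):
            raise ValueError(f"step {idx} failed its check: {text!r}")
    return text

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: wraps the prompt in angle brackets."""
    return f"<{prompt}>"

steps = ["Outline: {input}", "Draft from outline: {input}"]
print(run_chain(fake_llm, steps, "Bloom filters"))
# → <Draft from outline: <Outline: Bloom filters>>
```

Keeping the chain as a plain loop over templates means each step can be tested with a fake model like the one above before any real API is involved.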
PostgreSQL Meets ScyllaDB’s Lightning Speed and Monstrous Scalability We are dealing with 2.7 billion database rows of pricing and availability data covering up to 1 year ahead…the problem is that this huge volume of data and massive traffic need to be processed and updated in real time, because even a few minutes of delay can lead to bad user experiences such as price mismatches, overbooking, and inconsistency…we came across an article from Discord where they migrated from Cassandra to ScyllaDB due to performance issues…But after repeating the test and verifying it, it turns out it’s indeed mind-blowing. Especially for write operations: in the load test, we achieved more than 200x higher throughput than we got on PostgreSQL...
The end of the “Age of Data”? Enter the age of superhuman data and AI In the ever-shifting landscape of Artificial Intelligence, pronouncements of the ‘end of an era’ are surprisingly common. The latest such declaration comes from Ilya Sutskever, who recently suggested that the ‘age of data’, a period defined by the relentless pursuit of ever-larger datasets, is drawing to a close. But is he right? This post will argue that the age of data is far from over. Instead, it’s transforming into something even more powerful: an age of superhuman data…
Cognitive load is what matters There are so many buzzwords and best practices out there, but most of them have failed. We need something more fundamental, something that can't be wrong. Sometimes we feel confusion going through the code. Confusion costs time and money. Confusion is caused by high cognitive load. It's not some fancy abstract concept, but rather a fundamental human constraint. It's not imagined; it's there and we can feel it…
Boids, an artificial life program that simulates the flocking behavior of birds Boids is an artificial life program that produces startlingly realistic simulations of the flocking behavior of birds. Each "boid" (which is an abbreviation of "bird-oid object") follows a very simple set of rules. These rules will be discussed at length, but they can be summarized as follows: Separation: boids move away from other boids that are too close; Alignment: boids attempt to match the velocities of their neighbors; Cohesion: boids move toward the center of mass of their neighbors…
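The three rules can be combined into a single update step. This NumPy sketch is a simplified, hypothetical version rather than the linked program's code: the neighborhood radius, the "too close" distance, the rule weights, and the choice to apply each rule as a direct velocity nudge are all assumptions, and real implementations usually also cap speed and handle screen edges.

```python
import numpy as np

def boids_step(pos, vel, radius=5.0, sep_dist=1.0,
               w_sep=0.05, w_align=0.05, w_coh=0.01, dt=1.0):
    """Advance a flock one step; pos and vel are (n, 2) arrays. Parameters are illustrative."""
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    new_vel = vel.copy()
    for i in range(len(pos)):
        offsets = pos - pos[i]                       # vectors from boid i to every boid
        dists = np.linalg.norm(offsets, axis=1)
        neighbors = (dists > 0) & (dists < radius)   # excludes boid i itself
        if not neighbors.any():
            continue
        # Cohesion: steer toward the neighbors' center of mass.
        new_vel[i] += w_coh * (pos[neighbors].mean(axis=0) - pos[i])
        # Alignment: nudge velocity toward the neighbors' average (old) velocity.
        new_vel[i] += w_align * (vel[neighbors].mean(axis=0) - vel[i])
        # Separation: push away from neighbors that are too close.
        too_close = neighbors & (dists < sep_dist)
        if too_close.any():
            new_vel[i] -= w_sep * offsets[too_close].sum(axis=0)
    return pos + new_vel * dt, new_vel

pos = np.array([[0.0, 0.0], [0.5, 0.0]])   # two boids, closer than sep_dist
vel = np.zeros((2, 2))
pos, vel = boids_step(pos, vel)
print(vel)  # x-velocities of about -0.02 and +0.02: separation outweighs cohesion
```

All boids are updated from the previous step's positions and velocities, so the result does not depend on the order in which boids are visited.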
OpenAI's o3: The grand finale of AI in 2024 A step change as influential as the release of GPT-4. Reasoning language models are the current and next big thing…
Beyond Decoding: Meta-Generation Algorithms for Large Language Models We will present a tutorial on past and present classes of generation algorithms for generating text from autoregressive LLMs, ranging from greedy decoding to sophisticated meta-generation algorithms used to power compound AI systems…
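To see why anything beyond greedy decoding is worth the effort, consider a hypothetical bigram "model" (the `MODEL` table below is invented for illustration). Greedy decoding commits to the locally most likely token and returns "the cat" (probability 0.6 × 0.55 = 0.33), even though "a dog" (0.4 × 0.9 = 0.36) is more likely as a whole sequence. The sketch contrasts it with best-of-N sampling plus reranking, one of the simplest meta-generation strategies; the tutorial covers far more sophisticated ones, and using the model's own log-probability as the scorer is an assumption.

```python
import math
import random

# Hypothetical next-token distributions; "<s>" starts and "</s>" ends a sequence.
MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.55, "dog": 0.45},
    "a": {"dog": 0.9, "cat": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def greedy_decode(model, max_len=10):
    """Always take the single most likely next token."""
    token, out = "<s>", []
    for _ in range(max_len):
        dist = model[token]
        token = max(dist, key=dist.get)
        if token == "</s>":
            break
        out.append(token)
    return out

def sample_sequence(model, rng, max_len=10):
    """Ancestral sampling: draw each next token from the model's distribution."""
    token, out = "<s>", []
    for _ in range(max_len):
        dist = model[token]
        token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if token == "</s>":
            break
        out.append(token)
    return out

def log_prob(model, seq):
    """Log-probability of a complete sequence under the bigram model."""
    tokens = ["<s>"] + seq + ["</s>"]
    return sum(math.log(model[a][b]) for a, b in zip(tokens, tokens[1:]))

def best_of_n(model, n, scorer, seed=0):
    """Meta-generation: sample n candidates, return the one the scorer ranks highest."""
    rng = random.Random(seed)
    candidates = [sample_sequence(model, rng) for _ in range(n)]
    return max(candidates, key=scorer)

print(greedy_decode(MODEL))  # → ['the', 'cat']
print(best_of_n(MODEL, n=200, scorer=lambda s: log_prob(MODEL, s)))
```

With 200 samples, "a dog" is missed only with probability 0.64^200, so reranking almost surely recovers the more probable sequence that greedy decoding skipped; swapping in a different `scorer` (for example a verifier or reward model) is what turns the same loop into other meta-generation setups.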
What ML Concepts Do People Misunderstand the Most? [Reddit Discussion] I’ve noticed that certain ML concepts, like the bias-variance tradeoff or regularization, often get misunderstood. What’s one ML topic you think is frequently misinterpreted, and how do you explain it to others?…
Let’s think step by step: Chain of Thought prompting in LLMs LLMs are impressive feats of pattern recognition, able to sift through mountains of data and find connections invisible to the human eye. Yet, even with this prodigious ability, they sometimes stumble when faced with complex reasoning. They might offer a correct answer, but it's more akin to a lucky guess than genuine understanding…Chain-of-Thought prompting (CoT) offers a compelling solution to this intriguing paradox. By providing the LLM with a "chain" of reasoning—a series of logical stepping stones composed of intermediate steps, justifications, and supporting evidence—we guide it through the problem-solving process. It's like giving the LLM a cognitive map, enabling it to navigate the intricate pathways of logic and arrive at an answer not through chance, but through genuine comprehension…
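In practice, few-shot chain-of-thought prompting largely comes down to how the prompt is assembled: each worked example shows its intermediate reasoning before the final answer, and the new question is left open for the model to continue, optionally with the "Let's think step by step" trigger from the post's title. This is a minimal sketch under assumptions: the `Q:`/`A:` layout, the `Exemplar` class, and the arithmetic examples are made up for illustration, and no model is called.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Exemplar:
    question: str
    reasoning: str   # the intermediate steps, written out
    answer: str

def build_cot_prompt(exemplars: List[Exemplar], question: str,
                     trigger: str = "Let's think step by step.") -> str:
    """Assemble a few-shot chain-of-thought prompt that ends with an open answer slot."""
    blocks = [f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
              for ex in exemplars]
    blocks.append(f"Q: {question}\nA: {trigger}")
    return "\n\n".join(blocks)

example = Exemplar(
    question="A shelf holds 3 boxes with 4 books each. 2 books are removed. How many books remain?",
    reasoning="3 boxes of 4 books is 3 * 4 = 12 books. Removing 2 leaves 12 - 2 = 10 books.",
    answer="10",
)
print(build_cot_prompt(
    [example],
    "A crate holds 5 bags with 6 apples each. 7 apples are eaten. How many apples remain?",
))
```

The model is expected to continue after the trigger with its own step-by-step reasoning and a closing "The answer is ..." line, mirroring the format of the exemplar.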
A primer on machine learning in cryo-electron microscopy (cryo-EM) Cryo-electron microscopy (cryo-EM) has been gaining increasing popularity over the past few years. Used as a way to perform macromolecular structure determination for decades, cryo-EM really hit its stride around 2010, when it crossed the resolution thresholds needed to determine protein structures. The technique was so powerful, so able to answer biological questions for which no alternative tool existed, that its creators were awarded the 2017 Nobel Prize in Chemistry…
* Based on unique clicks.
** Find last week's issue #578 here.
Learning something for your job? Reply and we’ll make a mini-course for you for free.
Looking to get a job? Check out our “Get A Data Science Job” Course. It is a comprehensive course that teaches you everything related to getting a data science job based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself or your organization to ~65,230 subscribers by sponsoring this newsletter, which has a 35-45% weekly open rate.
Thank you for joining us this week! :) Stay Data Science-y! All our best, Hannah & Sebastian