Data Science Weekly - Issue 586
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
Data Science Articles & Videos
Comparing Apache, CNCF, and Commonhaus
I've used open source projects for over 30 years and contributed for about 20 of those. My first interaction with an open source foundation was with Apache when I began working with Apache Hadoop in 2008. Since then, I have contributed to many Apache projects, created Apache Samza, and was mentor and project champion for Apache Airflow…I'm frequently asked for guidance on which open source foundation to donate a project to. I've decided to share my thoughts in this post…
BAML is like building blocks for AI engineers
In this post, I’ll explain more about how BAML, a domain-specific language for helping LLMs generate better structured outputs, provides AI engineers the necessary building blocks to create more composable, testable and robust LLM and agentic workflows. If you’ve never heard of BAML, check out my previous post that introduces its fundamentals…
How do companies with hundreds of databases document them effectively? [Reddit Discussion]
For those who’ve worked in companies with tens or hundreds of databases, what documentation methods have you seen that actually work and provide value to engineers, developers, admins, and other stakeholders?…I’m curious about approaches that go beyond just listing databases, rather something that helps with understanding schemas, ownership, usage, and dependencies…Have you seen tools, templates, or processes that actually work? I’m currently working on a template containing relevant details about the database that would be attached to the documentation of the parent application/project, but my feeling is that without proper maintenance it could become outdated real fast. What’s your experience on this matter?…
Open sourcing kubenetmon: how we monitor data transfer in ClickHouse Cloud
When it comes to data transfer, cloud providers typically charge you for:
NAT Gateways;
Load Balancers;
Cross-Availability Zone traffic;
Egress, where the cost basis depends on which region you egress from and where you egress into;
Ingress, where the cost basis also depends on which region you ingress into and where the remote is.
We set out to untangle this complexity, and this blog post is going to tell you how…
Defining Unique Identifiers
Generally speaking, a Unique Identifier (UID) is an inscription that represents (no more than) one entity within a given system. UIDs are essential to the functioning of modern information systems, so it is important to understand and define what a UID is and how it should be used…In this post, I will define unique identifiers by deducing their essential properties…
Polars Cloud: the distributed Cloud Architecture to run Polars anywhere
Our goal is to enable scalable data processing with all the flexibility and expressiveness of Polars’ API. We are working on two things: Polars Cloud and a completely novel Streaming Engine design. We will explain more about the streaming engine in later posts; today we want to share what we are building with Polars Cloud…It will be very seamless to spin up hardware and run Polars queries remotely, either in batch mode for production ETL jobs, or interactively doing data exploration. In the rest of the post, we want to explore this through a few code examples…
What companies/industries are “slow-paced”/low stress? [Reddit Discussion]
I’ve only ever worked in data science for consulting companies, which are inherently fast-paced and quite stressful. The money is good but I don’t see myself in this field forever. “Fast-pace” in my experience can be a code word for “burn you out”…Out of curiosity, do any of you have lower stress jobs in data science? My guess would be large retailers/corporations that are no longer in growth stage and just want to fine tune/maintain their production models, while also dedicating some money to R&D with more reasonable timelines…
An Unexpected Reinforcement Learning Renaissance
The era we are living through in language modeling research is one pervasive with complete faith that reasoning and new reinforcement learning (RL) training methods will work. This is well founded. A day cannot go by without a new reasoning model, RL training result, or dataset distilled from DeepSeek R1…The goal of this talk is to try and make sense of the story that is unfolding today…
How to disaggregate a log replication protocol
This post continues my series looking at log replication protocols, within the context of state-machine replication (SMR) or just when the log itself is the product (such as Kafka). So far I’ve been looking at Virtual Consensus, but now I’m going to widen the view to look at how log replication protocols can be disaggregated in general (there are many ways)…
Cutting through Complexity: How Data Science Can Help Policymakers Understand the World
This chapter looks at examples of where innovations from data science are cutting through the complexities faced by policymakers in measurement, allocating resources, monitoring the natural world, making predictions, and more. These examples show the promise and potential of data science to aid policymakers, and point to where actions may be taken that would support further progress in this space…
Exploring the bioRxiv API with R, httr2, rvest, tidytext, and Datawrapper
Collect metadata and publication details for >200k preprints over a 10 year period, investigate trends, and scrape full text for sentiment analysis…
Understanding Model Calibration: A Gentle Introduction & Visual Exploration
In this blog post we’ll take a look at the most commonly used definition for calibration and then dive into a frequently used evaluation measure for Model Calibration. We’ll then cover some of the drawbacks of this measure and how these surfaced the need for additional notions of calibration, which require their own new evaluation measures…
Learn how to make QGIS Plugins with AI coding tools (video)
I recently published a post on my experience using Cursor to create a new QGIS plugin. It seems to have inspired a few people, and so I decided to record a couple videos to try to show everyone exactly the process to do it. I’ve felt that being able to build things like QGIS Plugins has been life-changing, and so I just wanted to help demystify the process…
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's issue #585 here.
Cutting Room Floor
Round and Round We Go! What makes Rotary Positional Encodings useful?
Vector Fields - An Introduction to Vector Fields and Velocity Fields
Whenever you're ready, 2 ways we can help:
Looking to get a job? Check out our “Get A Data Science Job” Course
It is a comprehensive course that teaches you everything related to getting a data science job based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself/organization to ~66,500 subscribers by sponsoring this newsletter. 35-45% weekly open rate.
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian