Data Science Weekly - Issue 586
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
Data Science Articles & Videos
Comparing Apache, CNCF, and Commonhaus
I've used open source projects for over 30 years and contributed for about 20 of those. My first interaction with an open source foundation was with Apache when I began working with Apache Hadoop in 2008. Since then, I have contributed to many Apache projects, created Apache Samza, and was mentor and project champion for Apache Airflow…I'm frequently asked for guidance on which open source foundation to donate a project to. I've decided to share my thoughts in this post…
BAML is like building blocks for AI engineers
In this post, I’ll explain more about how BAML, a domain-specific language for helping LLMs generate better structured outputs, provides AI engineers the necessary building blocks to create more composable, testable and robust LLM and agentic workflows. If you’ve never heard of BAML, check out my previous post that introduces its fundamentals…
How do companies with hundreds of databases document them effectively? [Reddit Discussion]
For those who’ve worked in companies with tens or hundreds of databases, what documentation methods have you seen that actually work and provide value to engineers, developers, admins, and other stakeholders?…I’m curious about approaches that go beyond just listing databases, rather something that helps with understanding schemas, ownership, usage, and dependencies…Have you seen tools, templates, or processes that actually work? I’m currently working on a template containing relevant details about the database that would be attached to the documentation of the parent application/project, but my feeling is that without proper maintenance it could become outdated real fast. What’s your experience on this matter?…
Open sourcing kubenetmon: how we monitor data transfer in ClickHouse Cloud
When it comes to data transfer, cloud providers typically charge you for:
NAT Gateways;
Load Balancers;
Cross-Availability Zone traffic;
Egress, where the cost basis depends on which region you egress from and where you egress into;
Ingress, where the cost basis also depends on which region you ingress into and where the remote is.
We set out to untangle this complexity, and this blog post is going to tell you how…
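To make the billing categories above concrete, here is a minimal Python sketch of how a monitor might bucket flows before pricing them. It is not taken from kubenetmon; `Endpoint`, `classify_flow`, `bytes_by_category`, and the category strings are hypothetical. It assumes each endpoint can be labeled with a region and availability zone (or as a public-internet host), and it does not model NAT Gateway or Load Balancer charges, which depend on the path a flow takes rather than just its endpoints.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical label for one side of a flow: a cloud host or an internet host."""
    region: Optional[str] = None  # None means "somewhere on the public internet"
    zone: Optional[str] = None    # availability zone of a cloud host

    @property
    def is_internet(self) -> bool:
        return self.region is None


def classify_flow(src: Endpoint, dst: Endpoint) -> str:
    """Map a src -> dst flow to a hypothetical billing category."""
    if src.is_internet and dst.is_internet:
        raise ValueError("at least one endpoint must be inside the cloud")
    if src.is_internet:
        # Ingress: the cost basis depends on the region being entered.
        return f"internet-ingress:{dst.region}"
    if dst.is_internet:
        # Egress to the internet: the cost basis depends on the source region.
        return f"internet-egress:{src.region}"
    if src.region != dst.region:
        # Inter-region egress: priced per (source, destination) region pair.
        return f"inter-region:{src.region}->{dst.region}"
    if src.zone != dst.zone:
        return "cross-az"
    return "same-az"


def bytes_by_category(flows: Iterable[Tuple[Endpoint, Endpoint, int]]) -> dict:
    """Sum transferred bytes per category, counting each flow once."""
    totals: Counter = Counter()
    for src, dst, n_bytes in flows:
        totals[classify_flow(src, dst)] += n_bytes
    return dict(totals)


if __name__ == "__main__":
    a = Endpoint("us-east-1", "us-east-1a")
    b = Endpoint("us-east-1", "us-east-1b")
    c = Endpoint("eu-west-1", "eu-west-1a")
    internet = Endpoint()
    flows = [(a, b, 500), (a, c, 200), (a, internet, 100), (b, a, 50)]
    print(bytes_by_category(flows))
    # {'cross-az': 550, 'inter-region:us-east-1->eu-west-1': 200, 'internet-egress:us-east-1': 100}
```

Keying categories by region (and region pair) mirrors the point that egress and ingress prices depend on where traffic leaves from and goes to; a real monitor would also need to attribute each flow to the workload that generated it.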
Defining Unique Identifiers
Generally speaking, a Unique Identifier (UID) is an inscription that represents (no more than) one entity within a given system. UIDs are essential to the functioning of modern information systems, so it is important to understand and define what a UID is and how it should be used…In this post, I will define unique identifiers by deducing their essential properties…
Polars Cloud: the distributed Cloud Architecture to run Polars anywhere
Our goal is to enable scalable data processing with all the flexibility and expressiveness of Polars’ API. We are working on two things: Polars Cloud and a completely novel Streaming Engine design. We will explain more about the streaming engine in later posts. Today we want to share what we are building with Polars Cloud…It will be very seamless to spin up hardware and run Polars queries remotely, either in batch mode for production ETL jobs, or interactively doing data exploration. In the rest of the post, we want to explore this through a few code examples…
What companies/industries are “slow-paced”/low stress? [Reddit Discussion]
I’ve only ever worked in data science for consulting companies, which are inherently fast-paced and quite stressful. The money is good but I don’t see myself in this field forever. “Fast-pace” in my experience can be a code word for “burn you out”…Out of curiosity, do any of you have lower stress jobs in data science? My guess would be large retailers/corporations that are no longer in growth stage and just want to fine tune/maintain their production models, while also dedicating some money to R&D with more reasonable timelines…
An Unexpected Reinforcement Learning Renaissance
The era we are living through in language modeling research is one pervaded by complete faith that reasoning and new reinforcement learning (RL) training methods will work. This is well founded. A day cannot go by without a new reasoning model, RL training result, or dataset distilled from DeepSeek R1…The goal of this talk is to try and make sense of the story that is unfolding today…
How to disaggregate a log replication protocol
This post continues my series looking at log replication protocols, within the context of state-machine replication (SMR) or just when the log itself is the product (such as Kafka). So far I’ve been looking at Virtual Consensus, but now I’m going to widen the view to look at how log replication protocols can be disaggregated in general (there are many ways)…
Cutting through Complexity: How Data Science Can Help Policymakers Understand the World
This chapter looks at examples of where innovations from data science are cutting through the complexities faced by policymakers in measurement, allocating resources, monitoring the natural world, making predictions, and more. These examples show the promise and potential of data science to aid policymakers, and point to where actions may be taken that would support further progress in this space…
Exploring the bioRxiv API with R, httr2, rvest, tidytext, and Datawrapper
Collect metadata and publication details for >200k preprints over a 10-year period, investigate trends, and scrape full text for sentiment analysis…
Understanding Model Calibration: A Gentle Introduction & Visual Exploration
In this blog post we’ll take a look at the most commonly used definition for calibration and then dive into a frequently used evaluation measure for Model Calibration. We’ll then cover some of the drawbacks of this measure and how these surfaced the need for additional notions of calibration, which require their own new evaluation measures…
Learn how to make QGIS Plugins with AI coding tools (video)
I recently published a post on my experience using Cursor to create a new QGIS plugin. It seems to have inspired a few people, and so I decided to record a couple videos to try to show everyone exactly the process to do it. I’ve felt that being able to build things like QGIS Plugins has been life-changing, and so I just wanted to help demystify the process…
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's issue #585 here.
Cutting Room Floor
Round and Round We Go! What makes Rotary Positional Encodings useful?
Vector Fields - An Introduction to Vector Fields and Velocity Fields
Whenever you're ready, 2 ways we can help:
Looking to get a job? Check out our “Get A Data Science Job” Course
It is a comprehensive course that teaches you everything related to getting a data science job based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself/organization to ~66,500 subscribers by sponsoring this newsletter. 35-45% weekly open rate.
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian