Data Science Weekly - Data Science Weekly - Issue 587
Data Science Weekly - Issue 587Curated news, articles and jobs related to Data Science, AI, & Machine LearningIssue #587 |
|
Last Week’s Poll:
Data Science Articles & Videos
Towards composable data platforms
I’ve written extensively about the open table formats (OTFs - Apache Iceberg, Delta Lake, and Apache Hudi). In my original Tableflow post, I wrote that shared tables were one of the major trends, enabled by the OTFs…But we can also view OTFs as enabling a kind of virtualization. In this post, I will start by explaining my take on OTFs and virtualization…DuckDB Newsletter: Learn DuckDB by example
Get your dose of DuckDB tips, SQL tricks, and links to learn more about the swiss knife of analytical databases…What is your daily/weekly routine if you have a WFH position? [Reddit]
I'm asking this here since data science/analytics is a very remote industry. I'm honestly trying to figure out a good cadence of when to make breakfast and get coffee, when to meal prep, when to get a 15 minute walk in, when to work out, do my hobbies etc., without driving myself insane…What is your routine for keeping up with life while you're working remotely?…Incremental Archival from Postgres to Parquet for Analytics
Ideally, data would get automatically archived in cheap storage, in a format optimized for large analytical queries.We developed two open source Postgres extensions that help you do that:
pg_parquet can export (and import) query results to the Parquet file format in object storage using regular COPY commands
pg_incremental can run a command for a never-ending series of time intervals or files, built on top of pg_cron
With some simple commands, you can set up a reliable, fully automated pipeline to export time ranges to the columnar Parquet format in S3…
Step-by-Step Diffusion: An Elementary Tutorial
We present an accessible first course on diffusion models and flow matching for machine learning, aimed at a technical audience with no diffusion experience. We try to simplify the mathematical details as much as possible (sometimes heuristically), while retaining enough precision to derive correct algorithms…Semantic Layers: The Missing Link Between AI and Data with David Jayatillake from Cube
In this episode, we chat with David Jayatillake, VP of AI at Cube, about semantic layers and their crucial role in making AI work reliably with data. We explore how semantic layers act as a bridge between raw data and business meaning, and why they're more practical than pure knowledge graphs. David shares insights from his experience at Delphi Labs, where they achieved 100% accuracy in natural language data queries by combining semantic layers with AI, compared to just 16% accuracy with direct text-to-SQL approaches…Bottom up Architecture
Although architects get a bad rap, I do think architecture is an important aspect of software engineering and the better question is to figure out how to get to an architecture that enables different teams to operate more independently…In this post, I’m discussing what software architecture even is and offer some thoughts on a bottom up approach…Data Science is losing its soul [Reddit]
DS teams are starting to lose the essence that made them truly groundbreaking. their mixed scientific and business core. What we’re seeing now is a shift from deep statistical analysis and business oriented modeling to quick and dirty engineering solutions. Sure, this approach might give us a few immediate wins but it leads to low ROI projects and pulls the field further away from its true potential. One size-fits-all programming just doesn’t work. it’s not the whole game…We often start coding to avoid "by hand" work. Here's why labeling your own data is worth the time…
How to use a histogram as a legend in {ggplot2}
Land isn’t unemployed—people are. Here’s how to use R, {ggplot2}, {sf}, and {patchwork} to create a histogram legend in a choropleth map to better see the distribution of values…When calibration beats metrics
Having a classifier with great metrics is good, but it is not enough for it to be useful in production. One reason why it might still fail is because it could be that you are dealing with a badly calibrated model. The predictions might be fine, but the probability estimates can be way off…In this video we talk about how to think about calibration and what it means…If spreadsheets are eternal, are BI tools transitory?
Consider the following scenario in the Microsoft stack: ingest data with ADF, transform with Fabric (& maybe dbt?), build a semantic model in Power BI, and delicately craft an artisanal dashboard (with your mouse). Then your stakeholder takes a look at your dashboard, navigates to the top left corner and clicks “Analyze In Excel”. How did we get here?…Building a SNAP LLM eval: part 1
This is the first write-up in a series about our process of building an "eval" — evaluation — to assess how well AI models perform on prompts related to the USA’s SNAP (food stamp) program…By sharing how we are approaching this for SNAP in some detail — including publishing a SNAP eval — we hope it will make it easier for others to do the same in similar problem spaces that matter for lower income Americans: healthcare (e.g. Medicaid), disability benefits, housing, legal help, etc…While evaluation can be a fairly technical topic, we hope these posts reduce barriers to more domain-specific evaluations being created by experts in these high-impact — but complex — areas…
.
Last Week's Newsletter's 3 Most Clicked Links
.
* Based on unique clicks.
** Find last week's issue #586 here.
Cutting Room Floor
.
Whenever you're ready, 2 ways we can help:
Looking to get a job? Check out our “Get A Data Science Job” Course
It is a comprehensive course that teaches you everything related to getting a data science job based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.Promote yourself/organization to ~66,600 subscribers by sponsoring this newsletter. 35-45% weekly open rate.
Thank you for joining us this week! :)
Stay Data Science-y!
All our best,
Hannah & Sebastian
Invite your friends and earn rewards
Older messages
Data Science Weekly - Issue 586
Friday, February 14, 2025
Curated news, articles and jobs related to Data Science, AI, & Machine Learning ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
Data Science Weekly - Issue 581
Thursday, January 9, 2025
Curated news, articles and jobs related to Data Science, AI, & Machine Learning ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
Data Science Weekly - Issue 580
Friday, January 3, 2025
Curated news, articles and jobs related to Data Science, AI, & Machine Learning ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
Data Science Weekly - Issue 579
Thursday, December 26, 2024
Curated news, articles and jobs related to Data Science, AI, & Machine Learning ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
Data Science Weekly - Issue 578
Thursday, December 19, 2024
Curated news, articles and jobs related to Data Science, AI, & Machine Learning ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
You Might Also Like
BetterDev #277 - When You Deleted /lib on Linux While Still Connected via SSH
Tuesday, March 25, 2025
Better Dev #277 Mar 25, 2025 Hi all, Last week, NextJS has a new security vulnerability, CVE-2025-29927 that allow by pass middleware auth checking by setting a header to trick it into thinking this is
JSK Daily for Mar 25, 2025
Tuesday, March 25, 2025
JSK Daily for Mar 25, 2025 View this email in your browser A community curated daily e-mail of JavaScript news Easily Render Flat JSON Data in JavaScript File Manager The Syncfusion JavaScript File
Want to create an AI Agent?
Tuesday, March 25, 2025
Tell me what to build next ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
LangGraph, Marimo, Django Template Components, and More
Tuesday, March 25, 2025
LangGraph: Build Stateful AI Agents in Python #674 – MARCH 25, 2025 VIEW IN BROWSER The PyCoder's Weekly Logo LangGraph: Build Stateful AI Agents in Python LangGraph is a versatile Python library
Charted | Where People Trust the Media (and Where They Don't) 🧠
Tuesday, March 25, 2025
Examine the global landscape of public trust in media institutions. Confidence remains low in all but a few key countries. View Online | Subscribe | Download Our App Presented by: BHP >> Read
Daily Coding Problem: Problem #1728 [Medium]
Tuesday, March 25, 2025
Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Square. Assume you have access to a function toss_biased() which returns 0 or 1 with a
LW 175 - Shopify uses AI to Prepare Stores for Script Editor Deprecation
Tuesday, March 25, 2025
Shopify uses AI to Prepare Stores for Script Editor Deprecation Shopify Development news and
Reminder: Microservices rules #7: Design loosely design-time coupled services - part 1
Tuesday, March 25, 2025
You are receiving this email because you subscribed to microservices.io. Considering migrating a monolith to microservices? Struggling with the microservice architecture? I can help: architecture
Delete your 23andMe data ASAP 🧬
Tuesday, March 25, 2025
95+ Amazon tech deals; 10 devs on vibe coding pros and cons -- ZDNET ZDNET Tech Today - US March 25, 2025 dnacodegettyimages-155360625 How to delete your 23andMe data and why you should do it now With
Post from Syncfusion Blogs on 03/25/2025
Tuesday, March 25, 2025
New blogs from Syncfusion ® Create AI-Powered Smart .NET MAUI Data Forms for Effortless Data Collection By Jeyasri Murugan This blog explains how to create an AI-powered smart data form using our .NET