Data Science Weekly - Issue 519

Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly

Nov 3

Issue #519
November 02 2023

Hello!

Once a week we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.

If this newsletter is helpful to your job please become a paid subscriber here: https://datascienceweekly.substack.com/subscribe :)

And now…let's dive into some interesting links from this week

Editor's Picks

Making the right characters for [insert anything]
We’re building a library of tools to help us bring any story to life, from text to art. As the art director, it’s my mission to make sure we strike an adequate balance between meeting players’ expectations and delivering surprises—while building on certain narrative gaming traditions to prepare our audience for a novel but familiar play experience…

I, A Data Scientist, Accidentally Saved Half A Million Dollars
I saved my company half a million dollars in about five minutes. This is more money than I've made for my employers over the course of my entire career because this industry is a sham. I clicked about five buttons. Let's talk about why it happened and why it's a disgrace that it was even possible…

Open LLM company playbook
Companies releasing the weights of cutting-edge language models in 2023 is one of the most popular things they can do, but there are few serious analyses into what this is doing for their long-term business prospects…This post is about building and how releasing LLM weights in the open will make it easier for a company to iterate, gain customers, and deliver better products. It's about how creating open LLMs facilitates a healthier, more collaborative, ML economy with broad stakeholders building a wonderful future. I outline 3 prerequisites, 3 actions, and 3 benefits for open LLM companies, but you’ll quickly see they all interweave throughout…

A Message from this week's Sponsor:

AI Forward 2023: LLMs in the Enterprise - From Theory to Practice

Learn how to overcome LLMOps challenges in pre- and post-production, build enterprise LLM infrastructure, and deliver measurable business value, including hands-on workshops and expert panel discussions with data science leaders. Register now for AI Forward 2023 — a FREE one-day virtual summit!

* Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org

Data Science Articles & Videos

ToxicChat: A Benchmark for Content Moderation in Real-world User-AI Interactions
In this blog post, we introduce ToxicChat, a benchmark consisting of 10K high-quality data for content moderation in real-world user-AI interactions. Evaluation results show that fine-tuning on this benchmark notably improves a baseline model’s ability to detect toxic queries in user-AI interactions…

The (Political) News is Too Negative
News coverage of American politics is not known for an abundance of positivity. Lots of people have noted that our political coverage tends to be overwhelmingly negative. And it’s getting worse. Dylan Matthews at Vox recently reported on a 2022 study that documented a growing number of news organizations featuring negative emotions, namely “anger, fear, disgust and sadness,” in their headlines. Matthews says journalists are a “morose bunch.” But is it all journalists? Or are political reporters uniquely prone to cynicism?…

Didn't realize how insane the market is [Reddit]
I work at FAANG as a DS manager. Opened up a Data Science position. Less than 24 hours later there were 1000+ applicants. I advertised the position on LinkedIn. It's absolutely crazy. People have managed to get a hold of my personal and professional email addresses (I don't have these as public but they're a logical combination of first/last name). I have hired in the past, and I have never seen anything like this…

ICLR 2024 Blog Post Track Submission Request
The deadline for submitting blog posts to the #ICLR2024 blog post track is in one month! Don't miss this opportunity to have your blog post peer reviewed and (optionally) presented at the conference!…

LeetCode for Data Engineers? [Reddit]
I've been thinking about it for quite a while now. What is the alternative for Data Engineers when it comes to upskilling and showcasing their skills? Like, Developers usually have coding questions like Leetcode, Codeforces etc. What do the DEs have to practice or work on? I've seen a few companies ask LC questions as well in interviews for DE, Analyst etc and these companies are legit Fortune 500 ones…

Image Layer Animations with Clip-Path
Today, I’d like to introduce some straightforward page transitions that involve animating a clip-path when switching to new content. The possibilities here are quite diverse, depending on the type of animation feel we want to achieve, including how the content exits and enters. For creating the shapes, we can employ a tool like Clippy, allowing us to create distinct clip-paths for both the initial and final states…

Data Markets I: Introduction
Hello! I am going to write my master’s thesis this spring, on data markets. For that reason, I am starting to investigate the concept. This blog post is mostly for myself as a way to digest what I have learned but here it is for anyone interested in a semi-structured format…

AI safety regulation threatens our digital freedoms
What I want to focus on here is what it would mean to regulate AI development in the name of AI safety. In other words, what kind of regulations would be needed to mitigate existential or civilizational threats from AI, if such threats existed? And what effects would such regulations have on us and our society?…

When Gradient Descent Is a Kernel Method
In this post, we will focus exclusively on a toy problem. Our discussion (which admittedly takes a bit of a scenic route) is divided into three sections.
- First, we will discover how the covariance kernel of the process F is related to the dynamics of gradient descent. (This is a toy case from Jacot's paper.)
- Next, we recall the theory of reproducing kernel Hilbert spaces and show how the kernel-based behavior of gradient descent is related to regularization.
- Finally, we recall some special properties of Gaussian processes and explain why regularization is related to the Bayesian interpretation of our trained model. (For more details, see Chapter 6 of Gaussian Processes for Machine Learning.)…

What is the point of ML? [Reddit]
To what end are all these terms you guys use: models, LLM? What is the end game? The uses of ML are a black box to me. Yeah I can read it off Google but it's not clicking mostly because even Google does not really state where and how ML is used. There is this lady I follow on LinkedIn who is an ML engineer at a gaming company. How does ML even fold into gaming? Ok so with AI I guess the models are training the AI to eventually recognize some patterns and eventually analyze a situation by itself I guess. But I'm not sure…

Demystifying Advanced RAG Pipelines
Retrieval-Augmented Generation (RAG) pipelines powered by large language models (LLMs) are gaining popularity for building end-to-end question-answering systems. Frameworks such as LlamaIndex and Haystack have made significant progress in making RAG pipelines easy to use. While these frameworks provide excellent abstractions for building advanced RAG pipelines, they do so at the cost of transparency. From a user perspective, it's not readily apparent what's going on under the hood, particularly when errors or inconsistencies arise. In this EvaDB application, we'll shed light on the inner workings of advanced RAG pipelines by examining the mechanics, limitations, and costs that often remain opaque…

clip-image-search: Fine-tuning OpenAI CLIP Model for Image Search on medical images
Very useful repo with detailed instructions on how someone fine-tuned CLIP on medical imagery…

Jobs

Data Engineer II

At Chewy, our mission is to be the most trusted and convenient destination for pet parents and partners, everywhere. Behind the scenes, our talented teams are made up of innovators, delighters, big-thinkers and, of course, passionate pet people—creating a place where you'll be empowered to build, grow and unleash your fullest potential.

We are looking for a Data Engineer II at our facility in Plantation, Florida, to collaborate with teams across Chewy to drive innovative solutions for data usage.

Location is Plantation, Florida. Apply here

Want to post a job here? Email us for details --> team@datascienceweekly.org

Training & Resources

Machine Learning and Dynamical Systems Seminar
The Machine Learning and Dynamical Systems Seminar is an online platform for research seminars, symposia, and reading groups on the interface of Machine Learning and Dynamical Systems. The series started as an activity of the Special Interest Group (SIG) on "Machine Learning and Dynamical Systems" (MLDSIG), hosted by the Alan Turing Institute, which I am co-leading with Prof. Robert Mackay. Click here for more details and here for a short video. Click here for the YouTube channel, which contains the previous seminars, and here to get updates about the Research Interest Group (RIG). The schedule of the seminars is below…

Duke University’s Python and Pandas for Data Engineering
In this first course of the Python, Bash and SQL Essentials for Data Engineering Specialization, you will learn how to set up a version-controlled Python working environment that can utilize third-party libraries. You will learn to use Python and the powerful Pandas library for data analysis and manipulation. Additionally, you will also be introduced to Vim and Visual Studio Code, two popular tools for writing software. This course is valuable for beginning and intermediate students in order to begin transforming and manipulating data as a data engineer…

DataFrame Operations Using pandas in Python (5 Examples)
In this post, you’ll learn how to change Pandas DataFrames in the Python programming language. The post will consist of five examples for the adjustment of a pandas DataFrame. To be more precise, the article will consist of the following topics: 1) Exemplifying Data & Add-On Libraries 2) Example 1: Replace Values in Pandas DataFrame 3) Example 2: Append Row to pandas DataFrame 4) Example 3: Drop Rows from pandas DataFrame 5) Example 4: Add Column to pandas DataFrame 6) Example 5: Delete Column from pandas DataFrame 7) Video & Further Resources…
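For readers who want a quick feel for the five operations listed above before clicking through, here is a minimal sketch. It is not taken from the linked post: the toy DataFrame, its column names, and the specific values are hypothetical, and the row append uses pd.concat because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration (not from the linked post).
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 2, 3]})

# 1) Replace values: swap every score of 2 for 20.
df["score"] = df["score"].replace(2, 20)

# 2) Append a row: concatenate a one-row DataFrame and rebuild the index.
new_row = pd.DataFrame({"name": ["d"], "score": [4]})
df = pd.concat([df, new_row], ignore_index=True)

# 3) Drop rows: keep only rows whose score is not 1.
df = df[df["score"] != 1].reset_index(drop=True)

# 4) Add a column: a boolean flag derived from an existing column.
df["passed"] = df["score"] > 3

# 5) Delete a column.
df = df.drop(columns=["name"])

print(df)
```

Each step reassigns df rather than mutating in place, which keeps the sequence easy to follow and avoids the inplace=True flags that are easy to misuse.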

Last Week's Newsletter's 3 Most Clicked Links

* Based on unique clicks.
** Find last week's issue #518 here.

Cutting Room Floor

Whenever you're ready, 3 ways we can help you:

Looking to hire? Hit reply to this email and let us know.
Looking to get a job? Check out our “Get A Data Science Job” Course
A comprehensive course that teaches you everything related to getting a data science job based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to put together a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself to ~60,000 subscribers by sponsoring this newsletter.

Thank you for joining us this week :)

All our best,
Hannah & Sebastian

P.S. Was today’s newsletter helpful to your job?

Consider becoming a paid subscriber here: https://datascienceweekly.substack.com/subscribe :)
