Data Science Weekly - Issue 460

Curated news, articles and jobs related to Data Science.
Keep up with all the latest developments

Issue #460

September 15 2022

Editor's Picks

The underestimated importance of soft skills in data science
Soft Skills for Data Scientists, and Why They Need Them...When it comes to data scientists...you will find the following mentioned as crucial attributes: excellent communication, critical thinking, storytelling, the ability to work in a team, adaptability, knowledge of your brand, and an enduring sense of curiosity...

How to get images that don't suck: a Beginner/Intermediate Guide to Getting Cool Images from Stable Diffusion [Reddit Discussion]
So you've taken the dive and installed Stable Diffusion. But this isn't quite like DALL-E 2. There are sliders everywhere, different diffusers, seeds... Enough to make anyone's head spin. But don't fret. These settings will give you a better experience once you get comfortable with them. In this guide, I'm going to talk about how to generate text2image artwork using Stable Diffusion. I'm going to go over basic prompting theory, what different settings do, and in what situations you might want to tweak the settings...
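For readers wondering what one of those sliders does: most Stable Diffusion front ends expose a guidance scale (often labeled "CFG scale"), which sets the weight s in classifier-free guidance. At each denoising step the model predicts the noise twice, once conditioned on the prompt c and once on an empty prompt, and extrapolates away from the unconditional prediction. The notation below is ours, and the guide's own terminology may differ:

```latex
\hat{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing) + s\,\bigl(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\bigr)
```

With s = 1 this reduces to the plain prompt-conditioned prediction; larger values follow the prompt more literally, usually at some cost in image variety.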

Data Science Twitch Streamers Round Up
Did you know there’s an entire world of absolutely free live-streamed data science content available to you almost 24/7 at twitch.tv? Me neither...I’m not the only one streaming data science content on Twitch! Although Twitch has yet to create a programming or data science or machine learning category (I’m not bitter cough pools, hot tubs, and beaches cough), you can find most of us under the Science & Technology tag...I’ll keep updating this list, but as of right now, here are some data science streamers you should check out, follow, and engage with!...

A Message from this week's Sponsor:

AI, BI, and Data Leaders: Dive Deep Into the Semantic Layer in a One-Day Virtual Summit

The semantic layer is what makes data discoverable and usable, if it’s designed correctly. Join Snowplow, Databricks, AtScale, and 30+ top industry technologists to learn best practices and discuss the latest developments in semantic layers for enterprise data.

Free registration closes soon. Save your spot at the Semantic Layer Summit 2022 (virtual)

Data Science Articles & Videos

Which fonts to use for your charts and tables and how to customize them
How should the text appear in your data visualizations? The possibilities are endless...This article explains many options — and shows how ignoring this advice can set your visualization apart from others....

AI Content Generation, Part 1: Machine Learning Basics
AI superpowers are already here for creators who are willing to invest a little time in understanding how these machine learning-based content tools work. In this new series of posts, I’ll give you an overview of the content generation space, covering everything from the ideas behind it to how to use specific tools...

The Hardest Things to Do in SQL
The five hardest things Josh Berry, an analytics professional with 15 years of experience, ran into while switching from Python to SQL, with examples, SQL code, and a resource for customizing the SQL to your own project...

Thoughts on ML Engineering After a Year of my PhD
Automating the end-to-end machine learning (ML) lifecycle, even for a specific prediction task, is neither easy nor obvious. People keep talking about how ML engineering (MLE) is a subset of software engineering or should be treated as such. But over the last 15 months of graduate school, I’ve been thinking about MLE through the lens of data engineering...

The spelled-out intro to language modeling: building makemore. Part 2: MLP by Andrej Karpathy
We implement a multilayer perceptron (MLP) character-level language model. In this video we also introduce many basics of machine learning (e.g. model training, learning rate tuning, hyperparameters, evaluation, train/dev/test splits, under/overfitting, etc.)...
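A minimal NumPy sketch of the forward pass of the kind of model the video builds: each character in a fixed-size context is looked up in an embedding table, the embeddings are concatenated, passed through a tanh hidden layer, and projected to next-character logits scored with cross-entropy. The video itself uses PyTorch and a much larger names dataset; the three-name toy dataset, layer sizes, and function names here are our own assumptions, and training is omitted.

```python
import numpy as np


def build_dataset(words, block_size=3):
    """Turn words into (context, next-char) pairs; '.' (index 0) marks start and end."""
    chars = sorted(set("".join(words)))
    stoi = {ch: i + 1 for i, ch in enumerate(chars)}
    stoi["."] = 0
    X, Y = [], []
    for w in words:
        context = [0] * block_size  # start with a context of all '.'
        for ch in w + ".":
            ix = stoi[ch]
            X.append(context)
            Y.append(ix)
            context = context[1:] + [ix]  # slide the window by one character
    return np.array(X), np.array(Y), stoi


def forward(X, Y, params):
    C, W1, b1, W2, b2 = params
    emb = C[X]                                       # (N, block_size, emb_dim)
    h = np.tanh(emb.reshape(len(X), -1) @ W1 + b1)   # concatenate, then hidden layer
    logits = h @ W2 + b2                             # (N, vocab_size)
    # Numerically stable log-softmax, then mean negative log-likelihood.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(Y)), Y].mean()


rng = np.random.default_rng(0)
X, Y, stoi = build_dataset(["emma", "olivia", "ava"])
vocab_size, block_size, emb_dim, hidden = len(stoi), 3, 2, 100
params = [
    rng.normal(size=(vocab_size, emb_dim)),               # embedding table C
    rng.normal(size=(block_size * emb_dim, hidden)),      # W1
    np.zeros(hidden),                                     # b1
    rng.normal(size=(hidden, vocab_size)) * 0.01,         # W2, small so logits start near 0
    np.zeros(vocab_size),                                 # b2
]
loss = forward(X, Y, params)
print(loss, np.log(vocab_size))
```

Because the output weights start near zero, the initial loss should sit close to ln(vocab_size), the loss of a uniform guess, which is a handy sanity check before training.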

Tracking Any Pixel in a Video
We propose Persistent Independent Particles (PIPs), a new particle video method. Our method takes a video as input, along with the (x,y) coordinate of a target to track, and produces the target’s trajectory as output. The model can be queried for any number of particles, at any positions...

Some Math behind Neural Tangent Kernel
Neural tangent kernel (NTK) (Jacot et al. 2018) is a kernel to explain the evolution of neural networks during training via gradient descent. It leads to great insights into why neural networks with enough width can consistently converge to a global minimum when trained to minimize an empirical loss. In the post, we will do a deep dive into the motivation and definition of NTK, as well as the proof of a deterministic convergence at different initializations of neural networks with infinite width by characterizing NTK in such a setting...
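The core definition, written out for a scalar-output network f(x; θ) trained by gradient flow on a squared loss over training points (x_i, y_i). The 1/2 factor and the unnormalized sum are our conventions; the post may normalize differently:

```latex
\begin{aligned}
\Theta(x, x'; \theta) &= \nabla_\theta f(x;\theta)^\top \nabla_\theta f(x';\theta), \\
\mathcal{L}(\theta) &= \tfrac{1}{2}\sum_{i=1}^{N} \bigl(f(x_i;\theta) - y_i\bigr)^2, \\
\frac{d\theta}{dt} &= -\nabla_\theta \mathcal{L}(\theta) = -\sum_{i=1}^{N} \nabla_\theta f(x_i;\theta)\,\bigl(f(x_i;\theta) - y_i\bigr), \\
\frac{d f(x;\theta)}{dt} &= \nabla_\theta f(x;\theta)^\top \frac{d\theta}{dt} = -\sum_{i=1}^{N} \Theta(x, x_i; \theta)\,\bigl(f(x_i;\theta) - y_i\bigr).
\end{aligned}
```

So the network's outputs evolve through the kernel alone. In the infinite-width limit Θ becomes deterministic and stays essentially fixed at its initial value during training, which makes these dynamics linear in the residuals.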

Slack Recommend API
Slack, as a product, presents many opportunities for recommendation, where we can make suggestions to simplify the user experience and make it more delightful. Each one seems like a terrific use case for machine learning, but it isn’t realistic for us to create a bespoke solution for each...Instead, we developed a unified framework we call the Recommend API, which allows us to quickly bootstrap new recommendation use cases behind an API which is easily accessible to engineers at Slack. Behind the scenes, these recommenders reuse a common set of infrastructure for every part of the recommendation engine, such as data processing, model training, candidate generation, and monitoring...
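To make the "one framework, many use cases" idea concrete, here is a toy registry in Python where each use case plugs in only its own candidate generator and scoring function, while the ranking step is shared. Every name here (UseCase, RecommendService, the channel example) is hypothetical and does not reflect Slack's actual API or infrastructure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class UseCase:
    # Hypothetical per-use-case hooks: how to find candidates and how to score them.
    candidates: Callable[[str], List[str]]
    score: Callable[[str, str], float]


class RecommendService:
    """Toy unified recommender: shared ranking logic, pluggable use cases."""

    def __init__(self) -> None:
        self._use_cases: Dict[str, UseCase] = {}

    def register(self, name: str, use_case: UseCase) -> None:
        self._use_cases[name] = use_case

    def recommend(self, name: str, user: str, k: int = 3) -> List[str]:
        use_case = self._use_cases[name]
        # Shared step: generate candidates, score each one, return the top k.
        ranked = sorted(
            use_case.candidates(user),
            key=lambda item: use_case.score(user, item),
            reverse=True,
        )
        return ranked[:k]


# Hypothetical use case: suggest channels, scored by a stand-in lookup table.
channel_scores = {"#data": 0.9, "#random": 0.2, "#ml-papers": 0.7}
service = RecommendService()
service.register(
    "channels",
    UseCase(
        candidates=lambda user: list(channel_scores),
        score=lambda user, item: channel_scores[item],
    ),
)
top = service.recommend("channels", user="U123", k=2)
print(top)  # ['#data', '#ml-papers']
```

In a real system the score function would be a trained model, and the shared layer would also own data processing, training, and monitoring, as the post describes; the sketch keeps only the ranking step.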

Learning with Differentiable Algorithms
While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts leading to more robust, better performing, more interpretable, more computationally efficient, and more data efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm...
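One widely used example of the idea is a relaxed sorting network: each hard compare-and-swap is replaced by a sigmoid-weighted blend, so the whole sort becomes differentiable in its inputs. The sketch below is our own minimal illustration, with beta as an assumed steepness parameter and odd-even transposition as the assumed network; it is not necessarily the thesis's exact formulation.

```python
import math
from typing import List, Tuple


def stable_sigmoid(z: float) -> float:
    # Avoids overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_swap(a: float, b: float, beta: float) -> Tuple[float, float]:
    # s -> 1 when b >> a (pair already ordered), s -> 0 when a >> b (pair swapped).
    s = stable_sigmoid(beta * (b - a))
    soft_min = s * a + (1.0 - s) * b
    soft_max = (1.0 - s) * a + s * b
    return soft_min, soft_max


def soft_sort(values: List[float], beta: float = 10.0) -> List[float]:
    """Relaxed odd-even transposition sort: n rounds of neighbouring soft swaps."""
    x = list(values)
    n = len(x)
    for rnd in range(n):
        # Even rounds compare pairs (0,1), (2,3), ...; odd rounds compare (1,2), (3,4), ...
        for i in range(rnd % 2, n - 1, 2):
            x[i], x[i + 1] = soft_swap(x[i], x[i + 1], beta)
    return x


print(soft_sort([3.0, 1.0, 2.0], beta=50.0))  # close to [1.0, 2.0, 3.0]
```

Lowering beta makes the outputs smoother blends of the inputs; raising it approaches a hard sort. Each soft swap preserves the sum of its pair, and every operation is smooth, so gradients can flow back to the unsorted inputs.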

DeepPicarMicro Crams NVIDIA's PilotNet Autonomous Vehicle Neural Network Into a Raspberry Pi Pico
A trio of scientists from the University of Kansas have published a paper on DeepPicarMicro, an autonomous vehicle testbed, which crams a fully-functional convolutional neural network (CNN) onto a Raspberry Pi Pico microcontroller board...

Clifford Neural Layers for PDE Modeling
This paper presents the first use of multivector representations together with Clifford convolutions and Clifford Fourier transforms in the context of deep learning. The resulting Clifford neural layers are universally applicable and will find direct use in the areas of fluid dynamics, weather forecasting, and the modeling of physical systems in general. We empirically evaluate the benefit of Clifford neural layers by replacing convolution and Fourier operations in common neural PDE surrogates by their Clifford counterparts on two-dimensional Navier-Stokes and weather modeling tasks, as well as three-dimensional Maxwell equations. Clifford neural layers consistently improve generalization capabilities of the tested neural PDE surrogates...
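The building block behind these layers is the geometric product of multivectors. In the 2D algebra Cl(2,0), a multivector has a scalar part, two vector parts (e1, e2), and one bivector part (e12), with e1² = e2² = 1 and e12² = -1. Below is a minimal Python sketch of that product; the tuple layout is our assumption, and the paper's layers apply this product inside convolutions and Fourier transforms rather than to single values.

```python
from typing import Tuple

# Multivector in Cl(2,0) stored as (scalar, e1, e2, e12); the layout is an assumption.
Multivector = Tuple[float, float, float, float]


def geometric_product(a: Multivector, b: Multivector) -> Multivector:
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 + a1 * b1 + a2 * b2 - a3 * b3,  # scalar part (uses e12 * e12 = -1)
        a0 * b1 + a1 * b0 - a2 * b3 + a3 * b2,  # e1 part
        a0 * b2 + a2 * b0 + a1 * b3 - a3 * b1,  # e2 part
        a0 * b3 + a3 * b0 + a1 * b2 - a2 * b1,  # e12 (bivector) part
    )


e1 = (0.0, 1.0, 0.0, 0.0)
e2 = (0.0, 0.0, 1.0, 0.0)
print(geometric_product(e1, e2))  # (0.0, 0.0, 0.0, 1.0): e1 e2 = e12
print(geometric_product(e2, e1))  # (0.0, 0.0, 0.0, -1.0): the basis vectors anticommute
```

Unlike an ordinary scalar product, this mixes the components of a multivector-valued field (for example a scalar pressure with a 2D velocity) in a structured way, which is the property the paper builds on.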

New Series: Creating Media with Machine Learning
Welcome to the first post in our multi-part series on how Netflix is developing and using machine learning (ML) to help creators make better media — from TV shows to trailers to movies to promotional art and so much more...This blog series will take you behind the scenes, showing you how we use the power of machine learning to create stunning media at a global scale...

Tool*

DataQA is a no-code tool for model error and quality analysis

Assessing the quality of a model is more than just looking at a few metrics; problems can often be hidden in biases or underperforming segments that are important to the business.

DataQA enables data science teams to accelerate their model QA with an intuitive no-code platform. With it, teams can quickly inspect model performance visually across different segments of the data. DataQA keeps non-technical domain experts involved in the process, replacing the need to send emails and spreadsheets.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist - Success Academy Charter Schools, Inc - NYC

This new Data Scientist role will be a key contributor to our mission of driving innovation across the organization. Reporting to the Leader of Enterprise Analytics, this role will be responsible for working with stakeholders in various functions to understand areas of opportunity, developing analytical solutions ranging from dashboards to sophisticated mathematical models, and helping functional teams adopt those solutions. This role will be part of a highly collaborative team of professionals with a wide range of skills including data science, data engineering, business analysis, and project management....

Want to post a job here? Email us for details --> team@datascienceweekly.org

Training & Resources

CSEP 590B Explainable AI
This course is about explainable artificial intelligence (XAI), a subfield of machine learning that provides transparency for complex models. Modern machine learning relies heavily on black-box models like tree ensembles and deep neural networks; these models provide state-of-the-art accuracy, but they make it difficult to understand the features, concepts, and data examples that drive their predictions. As a consequence, it's difficult for users, experts, and organizations to trust such models, and it's challenging to learn about the underlying processes we're modeling...
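As a taste of the model-agnostic methods in this area, here is permutation feature importance, a standard technique we picked as an illustration (the course may present it differently): shuffle one feature at a time and measure how much the model's error grows. The toy data and function names are our own; scikit-learn ships a fuller implementation as sklearn.inspection.permutation_importance.

```python
import numpy as np


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean increase in `metric` (an error, lower is better) when each column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its link to y while keeping its distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(metric(y, predict(X_perm)) - baseline)
        importances[j] = np.mean(increases)
    return importances


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


def black_box(data):
    # Toy "black box" that only uses the first feature.
    return 3.0 * data[:, 0]


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0]
imp = permutation_importance(black_box, X, y, mse)
print(imp)
```

The first feature gets a large importance and the second gets exactly zero, because the model never reads it; on real models, correlated features can share or mask importance, which is one of the caveats such courses discuss.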

Python Numpy Tutorial (with Jupyter and Colab)
This section will serve as a quick crash course on both the Python programming language and its use for scientific computing. We’ll also introduce notebooks, which are a very convenient way of tinkering with Python code...
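A small example of the kind of NumPy idiom such a crash course covers: broadcasting, which lets an operation between a matrix and a vector apply the vector to every row without an explicit Python loop. The arrays are illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([1, 0, 1])

# Explicit loop: add v to each row of x.
y_loop = np.empty_like(x)
for i in range(x.shape[0]):
    y_loop[i, :] = x[i, :] + v

# Broadcasting: v (shape (3,)) is stretched to x's shape (4, 3) automatically.
y_broadcast = x + v

print(y_broadcast)
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]
```

Both versions give the same result, but the broadcast form runs in compiled code and avoids materializing a tiled copy of v.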

Continual Learning
In this video, we cover what it takes to build a continual learning system around a machine learning model...

What you’re up to – notes from DSW readers

Vicki is working on NormConf - the normcore data takes conference for everyone. Free and online December 15.
Register here for free-> https://normconf.com/...

Keming is working on https://github.com/mosecorg/mosec: this library provides a Python interface for fast development of machine learning model services, with a Rust core for maximum serving efficiency. Core features such as dynamic batching, preprocess and post-process pipelines, and spawning multiprocessing workers are already supported. It can run easily on a local machine or in a pod inside a Kubernetes cluster...
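Dynamic batching means holding incoming requests briefly so several can run through the model together. Below is a generic standard-library sketch of the collection step: wait for a first request, then gather more until either a maximum batch size or a short deadline is reached. It is not mosec's API or implementation; collect_batch, max_batch_size, and max_wait_s are hypothetical names.

```python
import queue
import time
from typing import Any, List


def collect_batch(pending: queue.Queue, max_batch_size: int, max_wait_s: float) -> List[Any]:
    """Block for one request, then gather more until the batch is full or time runs out."""
    batch = [pending.get()]  # wait (indefinitely) for the first request
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            break  # deadline passed with no new request
    return batch


q = queue.Queue()
for request_id in range(5):
    q.put(request_id)

print(collect_batch(q, max_batch_size=4, max_wait_s=0.01))  # [0, 1, 2, 3]
print(collect_batch(q, max_batch_size=4, max_wait_s=0.01))  # [4], after waiting out the deadline
```

The model would then run once on the whole batch; the deadline trades a few milliseconds of latency for better accelerator throughput.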

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

Last Week's Newsletter's 3 Most Clicked Links

Organizations need to deliberately create data

The spelled-out intro to language modeling: building makemore

Data Activation In The Modern Data Stack

* Based on unique clicks.

** Find last week's newsletter here.

Cutting Room Floor

Exploratory data analysis using personal data from Strava and Apple Watch

What is Data Engineering?

Git Re-Basin: Merging Models modulo Permutation Symmetries

Confidential Computing for Machine Learning

How to Make a Donut Chart in ggplot

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
