Editor's Picks
- The underestimated importance of soft skills in data science
Soft Skills for Data Scientists, and Why They Need Them...When it comes to data scientists...you will find the following mentioned as crucial attributes: excellent communication, critical thinking, storytelling, the ability to work in a team, adaptability, knowledge of your brand, and an enduring sense of curiosity...
- How to get images that don't suck: a Beginner/Intermediate Guide to Getting Cool Images from Stable Diffusion [Reddit Discussion]
So you've taken the dive and installed Stable Diffusion. But this isn't quite like Dalle2. There's sliders everywhere, different diffusers, seeds... Enough to make anyone's head spin. But don't fret. These settings will give you a better experience once you get comfortable with them. In this guide, I'm going to talk about how to generate text2image artwork using Stable Diffusion. I'm going to go over basic prompting theory, what different settings do, and in what situations you might want to tweak the settings...
- Data Science Twitch Streamers Round Up
Did you know? There’s an entire world of absolutely free live-streamed data science content available to you almost 24/7 at twitch.tv? Me neither...I’m not the only one streaming data science content on Twitch! Although Twitch has yet to create a programming or data science or machine learning category (I’m not bitter cough pools, hot tubs, and beaches cough), you can find most of us under the Science & Technology tag...I’ll keep updating this list, but as of right now, here are some data science streamers you should check out, follow, and engage with!...
A Message from this week's Sponsor:
AI, BI, and Data Leaders: Dive Deep Into the Semantic Layer in a One-Day Virtual Summit
Our Semantic Layer is what makes data discoverable and usable - if it’s designed correctly. Join Snowplow, Databricks, AtScale, and 30+ top industry technologists to learn best practices and discuss the latest developments in semantic layers for enterprise data.
Free registration closes soon. Save your spot at the Semantic Layer Summit 2022 (virtual)
Data Science Articles & Videos
- AI Content Generation, Part 1: Machine Learning Basics
AI superpowers are already here for creators who are willing to invest a little time in understanding how these machine learning-based content tools work. In this new series of posts, I’ll give you an overview of the content generation space, covering everything from the ideas behind it to how to use specific tools...
- The Hardest Things to Do in SQL
The 5 hardest things Josh Berry, a 15-year analytics professional, experienced while switching from Python to SQL, with examples, SQL code, and a resource to customize the SQL to your own project...
- Thoughts on ML Engineering After a Year of my PhD
Automating the end-to-end machine learning (ML) lifecycle, even for a specific prediction task, is neither easy nor obvious. People keep talking about how ML engineering (MLE) is a subset of software engineering or should be treated as such. But over the last 15 months of graduate school, I’ve been thinking about MLE through the lens of data engineering...
- Tracking Any Pixel in a Video
We propose Persistent Independent Particles (PIPs), a new particle video method. Our method takes a video as input, along with the (x,y) coordinate of a target to track, and produces the target’s trajectory as output. The model can be queried for any number of particles, at any positions...
- Some Math behind Neural Tangent Kernel
Neural tangent kernel (NTK) (Jacot et al. 2018) is a kernel to explain the evolution of neural networks during training via gradient descent. It leads to great insights into why neural networks with enough width can consistently converge to a global minimum when trained to minimize an empirical loss. In the post, we will do a deep dive into the motivation and definition of NTK, as well as the proof of a deterministic convergence at different initializations of neural networks with infinite width by characterizing NTK in such a setting...
- Slack Recommend API
Slack, as a product, presents many opportunities for recommendation, where we can make suggestions to simplify the user experience and make it more delightful. Each one seems like a terrific use case for machine learning, but it isn’t realistic for us to create a bespoke solution for each...Instead, we developed a unified framework we call the Recommend API, which allows us to quickly bootstrap new recommendation use cases behind an API which is easily accessible to engineers at Slack. Behind the scenes, these recommenders reuse a common set of infrastructure for every part of the recommendation engine, such as data processing, model training, candidate generation, and monitoring...
- Learning with Differentiable Algorithms
While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts leading to more robust, better performing, more interpretable, more computationally efficient, and more data efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm...
- Clifford Neural Layers for PDE Modeling
This paper presents the first usage of such multivector representations together with Clifford convolutions and Clifford Fourier transforms in the context of deep learning. The resulting Clifford neural layers are universally applicable and will find direct use in the areas of fluid dynamics, weather forecasting, and the modeling of physical systems in general. We empirically evaluate the benefit of Clifford neural layers by replacing convolution and Fourier operations in common neural PDE surrogates by their Clifford counterparts on two-dimensional Navier-Stokes and weather modeling tasks, as well as three-dimensional Maxwell equations. Clifford neural layers consistently improve generalization capabilities of the tested neural PDE surrogates...
- New Series: Creating Media with Machine Learning
Welcome to the first post in our multi-part series on how Netflix is developing and using machine learning (ML) to help creators make better media — from TV shows to trailers to movies to promotional art and so much more...This blog series will take you behind the scenes, showing you how we use the power of machine learning to create stunning media at a global scale...
Tool*
DataQA is a no-code tool for model error and quality analysis
Assessing the quality of a model is more than just looking at a few metrics; problems can often be hidden in biases or underperforming segments that are important to the business.
DataQA enables data science teams to accelerate their model QA with an intuitive no-code platform. With it, teams can quickly inspect model performance visually across different segments of the data. DataQA keeps non-technical domain experts involved in the process, replacing the need to send emails and spreadsheets.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
- Data Scientist - Success Academy Charter Schools, Inc - NYC
This new Data Scientist role will be a key contributor to our mission of driving innovation across the organization. Reporting to the Leader of Enterprise Analytics, this role will be responsible for working with stakeholders in various functions to understand areas of opportunity, developing analytical solutions ranging from dashboards to sophisticated mathematical models, and helping functional teams adopt those solutions. This role will be part of a highly collaborative team of professionals with a wide range of skills including data science, data engineering, business analysis, and project management...
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
- CSEP 590B Explainable AI
This course is about explainable artificial intelligence (XAI), a subfield of machine learning that provides transparency for complex models. Modern machine learning relies heavily on black-box models like tree ensembles and deep neural networks; these models provide state-of-the-art accuracy, but they make it difficult to understand the features, concepts, and data examples that drive their predictions. As a consequence, it's difficult for users, experts, and organizations to trust such models, and it's challenging to learn about the underlying processes we're modeling...
- Python Numpy Tutorial (with Jupyter and Colab)
This section will serve as a quick crash course on both the Python programming language and its use for scientific computing. We’ll also introduce notebooks, which are a very convenient way of tinkering with Python code...
- Continual Learning
In this video, we cover what it takes to build a continual learning system around a machine learning model...
What you’re up to – notes from DSW readers
- Vicki is working on NormConf - the normcore data takes conference for everyone. Free and online December 15.
Register here for free --> https://normconf.com/...
- Keming is working on https://github.com/mosecorg/mosec: This library provides a Python interface for the fast development of machine learning model services and a Rust core for maximum serving efficiency. Core features such as dynamic batching, pre-processing and post-processing pipelines, and multiprocessing are already supported. It can easily be run on a local machine or in a pod inside a Kubernetes cluster...
* To share your projects and updates, share the details here.
** Want to chat with one of the above people? Hit reply and let us know :)
P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian