Data Science Weekly - Issue 449

Curated news, articles and jobs related to Data Science.
Keep up with all the latest developments

Issue #449

June 30 2022

Editor Picks

Pen and Paper Exercises in Machine Learning
This is a collection of (mostly) pen-and-paper exercises in machine learning. The exercises are on the following topics: linear algebra, optimisation, directed graphical models, undirected graphical models, expressive power of graphical models, factor graphs and message passing, inference for hidden Markov models, model-based learning (including ICA and unnormalised models), sampling and Monte-Carlo integration, and variational inference...

Seeing Like a Toolkit: How Toolkits Envision the Work of AI Ethics
Numerous toolkits have been developed to support ethical AI development. However, ethical AI toolkits, like all tools, encode assumptions in their design about what the work of “doing ethics” looks like—what work should be done, how, and by whom. We conduct a qualitative analysis of AI ethics toolkits to examine what their creators imagine to be the work of doing ethics, and the gaps that exist between the types of work that the toolkits imagine and support, and the way that the work of ethical AI actually occurs within technology companies and organizations...

The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
The grokking phenomenon, as reported by Power et al., refers to a regime where a long period of overfitting is followed by a seemingly sudden transition to perfect generalization. In this paper, we attempt to reveal the underpinnings of grokking via a series of empirical studies. Specifically, we uncover an optimization anomaly plaguing adaptive optimizers at extremely late stages of training, referred to as the Slingshot Mechanism...

A Message from this week's Sponsor:

Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.

Data Science Articles & Videos

What are the most common mistakes you see (junior) data scientists making? [Reddit Discussion]
E.g. mixing up correlation and causation, using accuracy to evaluate an ML model trained on imbalanced data, focussing on model performance and not on business impact etc.? Something else...

Interpretable Machine Learning in Natural and Social Sciences
This workshop will convene an interdisciplinary group of scholars to inspire clear foundational formulations of interpretability in a variety of domains where questions of interpretability arise in the application of machine learning, statistics, and data science more broadly...

Text Embeddings Visually Explained
We take a visual approach to gain an intuition behind text embeddings, what use cases they are good for, and how they can be customized using finetuning...

Ethical concerns with replacing human relations with humanoid robots
This paper considers ethical concerns with regard to replacing human relations with humanoid robots. Many have written about the impact that certain types of relations with robots may have on us, and why we should be concerned about robots replacing human relations...This paper first discusses what humanoid robots are, why and how humans tend to anthropomorphise them, and what the literature says about robots crowding out human relations...

Minerva: Solving Quantitative Reasoning Problems with Language Models
Language models have demonstrated remarkable performance on a variety of natural language tasks...Quantitative reasoning is one area in which language models still fall far short of human-level performance...In “Solving Quantitative Reasoning Problems With Language Models”, we present Minerva, a language model capable of solving mathematical and scientific questions using step-by-step reasoning...

DALL·E 2 Pre-Training Mitigations
In order to share the magic of DALL·E 2 with a broad audience, we needed to reduce the risks associated with powerful image generation models. To this end, we put various guardrails in place to prevent generated images from violating our content policy. This post focuses on pre-training mitigations, a subset of these guardrails which directly modify the data that DALL·E 2 learns from. In particular, DALL·E 2 is trained on hundreds of millions of captioned images from the internet, and we remove and reweight some of these images to change what the model learns...

Apple Privacy-Preserving Machine Learning Workshop 2022
Earlier this year, Apple hosted the Workshop on Privacy-Preserving Machine Learning (PPML). This virtual event brought Apple and members of the academic research communities together to discuss the state of the art in the field of privacy-preserving machine learning through a series of talks and discussions over two days...In this post we will introduce a new dataset for community benchmarking in PPML, and share highlights from workshop discussions and recordings of select workshop talks...

The Six Conundrums of Building and Deploying Language Technologies for Social Good
Many researchers, especially those working in core NLP/Speech domains, rely on a combination of individual expertise, experiences or ad hoc surveys for prioritizing between language technologies that provide social good to the end-users. This has been criticized by several scholars who argue that it is critical to include the target community during the LT’s design and development process. However, prioritization of communities, languages, technologies and design approaches presents a very large set of complex challenges to the technologists, for which there are no simple or off-the-shelf solutions. In this position paper, we distill our experiential insights into six fundamental conundrums that technologists face and must resolve while deciding which LT to build for which community, and by using what approach...

Neural-Implicit Representations for 3D Shapes and Scenes
Tracing the progress of deep learning-based solutions to computer graphics tasks...

Reducing gender-based harms in AI with Sunipa Dev
Grammar checkers use NLP to come up with grammar suggestions that help people write grammatically correct phrases. But it’s sometimes necessary to have human intervention to identify risks of unfair bias...Sunipa Dev is a research scientist at Google who focuses on Responsible AI. Some of her work focuses specifically on ways to evaluate unfair bias in NLP outcomes, reducing harms for people with queer and non-binary identities...

Artificial General Intelligence Is Not as Imminent as You Might Think
A close look reveals that the newest systems, including DeepMind’s much-hyped Gato, are still stymied by the same old problems...

Masked World Models for Visual Control
Masked autoencoders (MAE) have emerged as a scalable and effective self-supervised learning technique. Can MAE also be effective for visual model-based RL? Yes, with a recipe of convolutional feature masking and reward prediction to capture fine-grained and task-relevant information...

Course*

Business-Driven Data Analysis

Want to drive more value with your findings? Pragmatic Institute’s Business-Driven Data Analysis course empowers data practitioners to deliver timely analysis with actionable insights.

"This is an amazing course. Its live format provided an efficient environment with instant feedback from both sides. With the instructor's outstanding presenting skills and real-life insights, the course equipped us with a solid framework for tackling every stage of a data analysis project: Define, Prepare, Refine, Analyze, Present," said attendee Viorel Cazacu (Head of Controlling at Inditex).

The next 8-week, part-time session kicks off on July 18.

Register Now

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Senior Data Scientist, Startup Creation at Redesign Health - US

As our Senior Data Scientist for our Startup Creation team, you will set up and configure the data infrastructure for our startups, and work with the startup founding team to define data driven KPIs, and implement automated statistical analyses of customer behavior. Your goal is to make all of the companies that we launch data-driven from day one.

In this role, you will function as an in-house implementation team for the companies that Redesign Health launches (internally referred to as OpCos). We provide data strategy, data pipeline, data analytics and forecasting services to newly formed companies in a repeatable and scalable manner...

Want to post a job here? Email us for details --> team@datascienceweekly.org

Training & Resources

How to create a dashboard in Python with Jupyter Notebook
Would you like to build a data dashboard in 9 lines of Python code? I will show you how to create a dashboard in Python with Jupyter Notebook. The dashboard will present stock information for a selected ticker (data table and chart). The notebook will be published as a web application. I will use the open-source Mercury framework to convert a Python notebook into an interactive web application...

How to Read a Technical Paper
Multi-pass reading // Write as you read // When and where to read // Set aside time // Which parts to focus on // What to read...

3D Machine Learning 201 Guide: Point Cloud Semantic Segmentation
A complete Python tutorial on creating supervised learning AI systems for semantic segmentation of unstructured 3D LiDAR point cloud data...

What you’re up to – notes from DSW readers

Working on something cool? Let us know here :) ...

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

Last Week's Newsletter's 3 Most Clicked Links

How do you guys ace your SQL skills? [Reddit Discussion]

The State of Data Engineering 2022

In your experience, what's the thing that can boost an ML model's performance the most? Is it the hyperparameter tuning, feature engineering or ensembling? Or is it something else? [Reddit Discussion]

* Based on unique clicks.

** Find last week's newsletter here.

P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)

All the best,
Hannah & Sebastian
