- Pen and Paper Exercises in Machine Learning
This is a collection of (mostly) pen-and-paper exercises in machine learning. The exercises are on the following topics: linear algebra, optimisation, directed graphical models, undirected graphical models, expressive power of graphical models, factor graphs and message passing, inference for hidden Markov models, model-based learning (including ICA and unnormalised models), sampling and Monte-Carlo integration, and variational inference...
- Seeing Like a Toolkit: How Toolkits Envision the Work of AI Ethics
Numerous toolkits have been developed to support ethical AI development. However, ethical AI toolkits, like all tools, encode assumptions in their design about what the work of “doing ethics” looks like—what work should be done, how, and by whom. We conduct a qualitative analysis of AI ethics toolkits to examine what their creators imagine to be the work of doing ethics, and the gaps that exist between the types of work that the toolkits imagine and support, and the way that the work of ethical AI actually occurs within technology companies and organizations...
- The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
The grokking phenomenon, as reported by Power et al., refers to a regime where a long period of overfitting is followed by a seemingly sudden transition to perfect generalization. In this paper, we attempt to reveal the underpinnings of grokking via a series of empirical studies. Specifically, we uncover an optimization anomaly plaguing adaptive optimizers at extremely late stages of training, referred to as the Slingshot Mechanism...
A Message from this week's Sponsor:
Retool is the fast way to build an interface for any database
With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.
Data Science Articles & Videos
- Interpretable Machine Learning in Natural and Social Sciences
This workshop will convene an interdisciplinary group of scholars to inspire clear foundational formulations of interpretability in a variety of domains where questions of interpretability arise in the application of machine learning, statistics, and data science more broadly...
- Text Embeddings Visually Explained
We take a visual approach to gain an intuition behind text embeddings, what use cases they are good for, and how they can be customized using finetuning...
- Ethical concerns with replacing human relations with humanoid robots
This paper considers ethical concerns with regard to replacing human relations with humanoid robots. Many have written about the impact that certain types of relations with robots may have on us, and why we should be concerned about robots replacing human relations...This paper first discusses what humanoid robots are, why and how humans tend to anthropomorphise them, and what the literature says about robots crowding out human relations...
- Minerva: Solving Quantitative Reasoning Problems with Language Models
Language models have demonstrated remarkable performance on a variety of natural language tasks...Quantitative reasoning is one area in which language models still fall far short of human-level performance...In “Solving Quantitative Reasoning Problems With Language Models”, we present Minerva, a language model capable of solving mathematical and scientific questions using step-by-step reasoning...
- DALL·E 2 Pre-Training Mitigations
In order to share the magic of DALL·E 2 with a broad audience, we needed to reduce the risks associated with powerful image generation models. To this end, we put various guardrails in place to prevent generated images from violating our content policy. This post focuses on pre-training mitigations, a subset of these guardrails which directly modify the data that DALL·E 2 learns from. In particular, DALL·E 2 is trained on hundreds of millions of captioned images from the internet, and we remove and reweight some of these images to change what the model learns...
- Apple Privacy-Preserving Machine Learning Workshop 2022
Earlier this year, Apple hosted the Workshop on Privacy-Preserving Machine Learning (PPML). This virtual event brought Apple and members of the academic research communities together to discuss the state of the art in the field of privacy-preserving machine learning through a series of talks and discussions over two days...In this post we will introduce a new dataset for community benchmarking in PPML, and share highlights from workshop discussions and recordings of select workshop talks...
- The Six Conundrums of Building and Deploying Language Technologies for Social Good
Many researchers, especially those working in core NLP/Speech domains, rely on a combination of individual expertise, experiences or ad hoc surveys for prioritizing between language technologies that provide social good to the end-users. This has been criticized by several scholars who argue that it is critical to include the target community during the LT’s design and development process. However, prioritization of communities, languages, technologies and design approaches presents a very large set of complex challenges to the technologists, for which there are no simple or off-the-shelf solutions. In this position paper, we distill our experiential insights into six fundamental conundrums that technologists face and must resolve while deciding which LT technology to build for which community, and by using what approach. ...
- Reducing gender-based harms in AI with Sunipa Dev
Grammar checkers use NLP to come up with grammar suggestions that help people write grammatically correct phrases. But it’s sometimes necessary to have human intervention to identify risks of unfair bias...Sunipa Dev is a research scientist at Google who focuses on Responsible AI. Some of her work focuses specifically on ways to evaluate unfair bias in NLP outcomes, reducing harms for people with queer and non-binary identities. ...
- Masked World Models for Visual Control
Masked autoencoders (MAE) have emerged as a scalable and effective self-supervised learning technique. Can MAE also be effective for visual model-based RL? Yes, with the recipe of convolutional feature masking and reward prediction to capture fine-grained and task-relevant information...
Business-Driven Data Analysis
Want to drive more value with your findings? Pragmatic Institute’s Business-Driven Data Analysis course empowers data practitioners to deliver timely analysis with actionable insights.
"This is an amazing course. Its live format provided an efficient environment with instant feedback from both sides. With the instructor's outstanding presenting skills and real-life insights, the course equipped us with a solid framework for tackling every stage of a data analysis project: Define, Prepare, Refine, Analyze, Present," said attendee Viorel Cazacu (Head of Controlling at Inditex).
The next 8-week, part-time session kicks off on July 18.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
- Senior Data Scientist, Startup Creation at Redesign Health - US
As our Senior Data Scientist for our Startup Creation team, you will set up and configure the data infrastructure for our startups, and work with the startup founding team to define data driven KPIs, and implement automated statistical analyses of customer behavior. Your goal is to make all of the companies that we launch data-driven from day one.
In this role, you will function as an in-house implementation team for the companies that Redesign Health launches (internally referred to as OpCos). We provide data strategy, data pipeline, data analytics and forecasting services to newly formed companies in a repeatable and scalable manner...
Want to post a job here? Email us for details --> email@example.com
Training & Resources
- How to create a dashboard in Python with Jupyter Notebook
Would you like to build a data dashboard in 9 lines of Python code? I will show you how to create a dashboard in Python with Jupyter Notebook. The dashboard will present stock information for a selected ticker (data table and chart). The notebook will be published as a web application. I will use the open-source Mercury framework to convert the Python notebook into an interactive web application...
- How to Read a Technical Paper
Multi-pass reading // Write as you read // When and where to read // Set aside time // Which parts to focus on // What to read...
What you’re up to – notes from DSW readers
- Working on something cool? Let us know here :) ...
* To share your projects and updates, share the details here.
** Want to chat with one of the above people? Hit reply and let us know :)
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian