Data Science Weekly - Issue 412

Curated news, articles and jobs related to Data Science.
Keep up with all the latest developments

Issue #412

October 14 2021

Editor Picks

Machine learning is not nonparametric statistics
Many times in my career, I’ve been told by respected statisticians that machine learning is nothing more than nonparametric statistics. The longer I work in this field, the more I think this view is both misleading and unhelpful. Not only can I never get a consistent definition of what “nonparametric” means, but the jump from statistics to machine learning is considerably larger than most expect. Statistics is an important tool for understanding machine learning and randomness is valuable for machine learning algorithm design, but there is considerably more to machine learning than what we learn in elementary statistics...

State of AI Report 2021
Now in its fourth year, the State of AI Report 2021 is reviewed by AI practitioners in industry and research, and features invited contributions from a range of well-known and up-and-coming companies and research groups. The Report considers the following key dimensions: a) Research: Technology breakthroughs and capabilities, b) Talent: Supply, demand and concentration of AI talent, c) Industry: Areas of commercial application for AI and its business impact, d) Politics: Regulation of AI, its economic implications and the emerging geopolitics of AI, and e) Predictions: What we believe will happen and a performance review to keep us honest...

Deploying Machine Learning Models Safely and Systematically
The Data Exchange Podcast interviews Hamel Husain on CI/CD for ML, MLOps tools and processes, and how much software engineering data scientists should know... Hamel Husain, Staff Machine Learning Engineer at GitHub and a core developer for fastai, previously worked on machine learning applications and systems at Airbnb and DataRobot...

A Message from this week's Sponsor:

Live Webinar | How to leverage AI for BI at scale

Thursday, Oct 21, 2021 at 2PM ET (11AM PT)

Learn how to harness the power of AI for BI, democratize data, and improve analytics at scale with featured speakers from Snowflake and Cardinal Health.

Data Science Articles & Videos

FlingBot: The Unreasonable Effectiveness of Dynamic Manipulations for Cloth Unfolding
In this work, we demonstrate the effectiveness of dynamic flinging actions for cloth unfolding with our proposed self-supervised learning framework, FlingBot. Our approach learns how to unfold a piece of fabric from arbitrary initial configurations using a pick, stretch, and fling primitive for a dual-arm setup from visual observations...

OpenPrompt: An Open-Source Framework for Prompt-learning.
Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks, which modifies the input text with a textual template and directly uses PLMs to conduct pre-trained tasks. This library provides a standard, flexible and extensible framework to deploy the prompt-learning pipeline. OpenPrompt supports loading PLMs directly from huggingface transformers. In the future, we will also support PLMs implemented by other libraries...

The Dawn of Quantum Natural Language Processing
In this paper, we discuss the initial attempts at boosting the understanding of human language based on deep-learning models with quantum computing. We successfully train a quantum-enhanced Long Short-Term Memory network to perform the parts-of-speech tagging task via numerical simulations. Moreover, a quantum-enhanced Transformer is proposed to perform sentiment analysis based on an existing dataset...

Self-Supervised Learning Advances Medical Image Classification
In recent years, there has been increasing interest in applying deep learning to medical imaging tasks, with exciting progress in various applications like radiology, pathology and dermatology...In “Big Self-Supervised Models Advance Medical Image Classification”, to appear at ICCV 2021, we study the effectiveness of self-supervised contrastive learning as a pre-training strategy within the domain of medical image classification...

Balancing Average and Worst-case Accuracy in Multitask Learning
When training and evaluating machine learning models on a large number of tasks, it is important to not only look at average task accuracy -- which may be biased by easy or redundant tasks -- but also worst-case accuracy (i.e. the performance on the task with the lowest accuracy). In this work, we show how to use techniques from the distributionally robust optimization (DRO) literature to improve worst-case performance in multitask learning...

Neural Tangent Kernel Eigenvalues Accurately Predict Generalization
Finding a quantitative theory of neural network generalization has long been a central goal of deep learning research. We extend recent results to demonstrate that, by examining the eigensystem of a neural network's "neural tangent kernel", one can predict its generalization performance when learning arbitrary functions. Our theory accurately predicts not only test mean-squared-error but all first- and second-order statistics of the network's learned function. Furthermore, using a measure quantifying the "learnability" of a given target function, we prove a new "no-free-lunch" theorem characterizing a fundamental tradeoff in the inductive bias of wide neural networks: improving a network's generalization for a given target function must worsen its generalization for orthogonal functions...

A Few More Examples May Be Worth Billions of Parameters
We investigate the dynamics of increasing the number of model parameters versus the number of labeled examples across a wide variety of tasks. Our exploration reveals that while scaling parameters consistently yields performance improvements, the contribution of additional examples highly depends on the task's format. Specifically, in open question answering tasks, enlarging the training set does not improve performance. In contrast, classification, extractive question answering, and multiple choice tasks benefit so much from additional examples that collecting a few hundred examples is often "worth" billions of parameters...

Facebook Loves Self-Supervised Learning. Period.
What was once a research strategy for Facebook AI teams has, over the years, turned into an area of scientific breakthrough where they have been delivering strong internal results, with some self-supervised language understanding models, libraries, frameworks, and experiments consistently beating traditional systems or fully supervised models...

Duke Computer Scientist, Cynthia Rudin, Wins $1 Million AI Prize
Duke University computer scientist Cynthia Rudin wants AI to show its work. Especially when it’s making decisions that deeply affect people’s lives...She chose to pursue opportunities to apply machine learning techniques to important societal problems, and in the process, realized that AI’s potential is best unlocked when humans can peer inside and understand what it is doing...Now, after 15 years of advocating for and developing “interpretable” machine learning algorithms that allow humans to see inside AI, Rudin’s contributions to the field have earned her the $1 million Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity...

Primary visual cortex straightens natural video trajectories
Many sensory-driven behaviors rely on predictions about future states of the environment. Visual input typically evolves along complex temporal trajectories that are difficult to extrapolate. We test the hypothesis that spatial processing mechanisms in the early visual system facilitate prediction by constructing neural representations that follow straighter temporal trajectories...our findings reveal that the early visual system uses a set of specialized computations to build representations that can support prediction in the natural environment...

Tools*

Create AI-powered search and recommendation apps with Pinecone

Pinecone is a fully managed vector database that makes it easy to add vector search to production applications. It combines state-of-the-art vector search libraries, advanced features such as filtering, and distributed infrastructure to provide high performance and reliability at any scale. Get started now — it's free!

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Entry Level Data Scientist: 2022 - IBM - Multiple Locations

As a Data Scientist at IBM, you will help transform our clients’ data into tangible business value by analyzing information, communicating outcomes and collaborating on product development. Work with Best in Class open source and visual tools, along with the most flexible and scalable deployment options. Whether it’s investigating patient trends or weather patterns, you will work to solve real world problems for the industries transforming how we live.

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Machine Learning Formulas Explained: Binary Cross Entropy Loss
This is the formula for the Binary Cross Entropy Loss. This loss function is commonly used for binary classification problems...It may look super confusing, but I promise you that it is actually quite simple!...Let's go step by step...
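
For readers who want to see the loss in action before clicking through, here is a minimal sketch in plain Python. It assumes the standard textbook definition, the mean over examples of -(y*log(p) + (1-y)*log(1-p)), which may differ in notation from the linked article. The function name binary_cross_entropy and the eps clipping value are illustrative choices, not taken from the article.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross entropy for labels in {0, 1} and predicted probabilities.

    eps is an assumed clipping constant that keeps log() away from 0.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip the probability so that log(0) never occurs.
        p = min(max(p, eps), 1.0 - eps)
        # Only one of the two terms is non-zero for a hard 0/1 label.
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)

    # Negate and average over all examples.
    return -total / len(y_true)


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    confident = [0.9, 0.1, 0.8, 0.2]
    hesitant = [0.6, 0.4, 0.55, 0.45]
    print("confident predictions:", binary_cross_entropy(labels, confident))
    print("hesitant predictions: ", binary_cross_entropy(labels, hesitant))
```

Predictions that are confident and correct yield a lower loss than hesitant ones, and a prediction of 0.5 for a single example gives a loss of log(2), whichever label is true.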

Bayesian Optimization Book [semi-final-draft]
The book aims to provide a self-contained and comprehensive introduction to Bayesian optimization, starting “from scratch” and carefully developing all the key ideas along the way. The intended audience is graduate students and researchers in machine learning, statistics, and related fields. However, I also hope that practitioners and researchers from more distant fields will find some utility here...

NYU Deep Learning SP21 Class [YouTube Playlist]
Course Videos...

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
