Data Science Weekly - Issue 412

Curated news, articles and jobs related to Data Science.
Keep up with all the latest developments.

Issue #412

October 14 2021

Editor Picks
  • Machine learning is not nonparametric statistics
    Many times in my career, I’ve been told by respected statisticians that machine learning is nothing more than nonparametric statistics. The longer I work in this field, the more I think this view is both misleading and unhelpful. Not only can I never get a consistent definition of what “nonparametric” means, but the jump from statistics to machine learning is considerably larger than most expect. Statistics is an important tool for understanding machine learning and randomness is valuable for machine learning algorithm design, but there is considerably more to machine learning than what we learn in elementary statistics...
  • State of AI Report 2021
    Now in its fourth year, the State of AI Report 2021 is reviewed by AI practitioners in industry and research, and features invited contributions from a range of well-known and up-and-coming companies and research groups. The Report considers the following key dimensions: a) Research: Technology breakthroughs and capabilities, b) Talent: Supply, demand and concentration of AI talent, c) Industry: Areas of commercial application for AI and its business impact, d) Politics: Regulation of AI, its economic implications and the emerging geopolitics of AI, and e) Predictions: What we believe will happen and a performance review to keep us honest...
  • Deploying Machine Learning Models Safely and Systematically
    The Data Exchange Podcast interviews Hamel Husain on CI/CD for ML, MLOps tools and processes, and how much software engineering data scientists should know...Hamel Husain, Staff Machine Learning Engineer at GitHub and a core developer for fastai, previously worked on machine learning applications and systems at Airbnb and DataRobot...

A Message from this week's Sponsor:


Live Webinar | How to leverage AI for BI at scale

Thursday, Oct 21, 2021 at 2PM ET (11AM PT)

Learn how to harness the power of AI for BI, democratize data, and improve analytics at scale with featured speakers from Snowflake and Cardinal Health.



Data Science Articles & Videos

  • OpenPrompt: An Open-Source Framework for Prompt-learning
    Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks, which modifies the input text with a textual template and directly uses PLMs to conduct pre-trained tasks. This library provides a standard, flexible and extensible framework to deploy the prompt-learning pipeline. OpenPrompt supports loading PLMs directly from huggingface transformers. In the future, we will also support PLMs implemented by other libraries...
  • The Dawn of Quantum Natural Language Processing
    In this paper, we discuss the initial attempts at boosting the understanding of human language based on deep-learning models with quantum computing. We successfully train a quantum-enhanced Long Short-Term Memory network to perform the parts-of-speech tagging task via numerical simulations. Moreover, a quantum-enhanced Transformer is proposed to perform the sentiment analysis based on the existing dataset...
  • Self-Supervised Learning Advances Medical Image Classification
    In recent years, there has been increasing interest in applying deep learning to medical imaging tasks, with exciting progress in various applications like radiology, pathology and dermatology...In “Big Self-Supervised Models Advance Medical Image Classification”, to appear at ICCV 2021, we study the effectiveness of self-supervised contrastive learning as a pre-training strategy within the domain of medical image classification...
  • Balancing Average and Worst-case Accuracy in Multitask Learning
    When training and evaluating machine learning models on a large number of tasks, it is important to not only look at average task accuracy -- which may be biased by easy or redundant tasks -- but also worst-case accuracy (i.e. the performance on the task with the lowest accuracy). In this work, we show how to use techniques from the distributionally robust optimization (DRO) literature to improve worst-case performance in multitask learning...
  • Neural Tangent Kernel Eigenvalues Accurately Predict Generalization
    Finding a quantitative theory of neural network generalization has long been a central goal of deep learning research. We extend recent results to demonstrate that, by examining the eigensystem of a neural network's "neural tangent kernel", one can predict its generalization performance when learning arbitrary functions. Our theory accurately predicts not only test mean-squared-error but all first- and second-order statistics of the network's learned function. Furthermore, using a measure quantifying the "learnability" of a given target function, we prove a new "no-free-lunch" theorem characterizing a fundamental tradeoff in the inductive bias of wide neural networks: improving a network's generalization for a given target function must worsen its generalization for orthogonal functions...
  • A Few More Examples May Be Worth Billions of Parameters
    We investigate the dynamics of increasing the number of model parameters versus the number of labeled examples across a wide variety of tasks. Our exploration reveals that while scaling parameters consistently yields performance improvements, the contribution of additional examples highly depends on the task's format. Specifically, in open question answering tasks, enlarging the training set does not improve performance. In contrast, classification, extractive question answering, and multiple choice tasks benefit so much from additional examples that collecting a few hundred examples is often "worth" billions of parameters...
  • Facebook Loves Self-Supervised Learning. Period.
    What was once a research strategy for Facebook AI teams has, over the years, turned into an area of scientific breakthrough, where they have been delivering strong internal results, with some self-supervised language understanding models, libraries, frameworks, and experiments consistently beating traditional systems or fully supervised models...
  • Duke Computer Scientist, Cynthia Rudin, Wins $1 Million AI Prize
    Duke University computer scientist Cynthia Rudin wants AI to show its work. Especially when it’s making decisions that deeply affect people’s lives...She chose to pursue opportunities to apply machine learning techniques to important societal problems, and in the process, realized that AI’s potential is best unlocked when humans can peer inside and understand what it is doing...Now, after 15 years of advocating for and developing “interpretable” machine learning algorithms that allow humans to see inside AI, Rudin’s contributions to the field have earned her the $1 million Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity...
  • Primary visual cortex straightens natural video trajectories
    Many sensory-driven behaviors rely on predictions about future states of the environment. Visual input typically evolves along complex temporal trajectories that are difficult to extrapolate. We test the hypothesis that spatial processing mechanisms in the early visual system facilitate prediction by constructing neural representations that follow straighter temporal trajectories...our findings reveal that the early visual system uses a set of specialized computations to build representations that can support prediction in the natural environment...
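
A quick illustration for the "Balancing Average and Worst-case Accuracy in Multitask Learning" item above: the sketch below shows why average accuracy can hide a failing task, and how a reweighting step can shift attention toward the hardest task. This is a minimal sketch, not the paper's method. The function names, the made-up accuracies, the use of 1 - accuracy as a stand-in loss, and the exponentiated-gradient style update are all assumptions chosen for illustration.

```python
import math


def average_and_worst_case(task_accuracies):
    """Return (mean accuracy, lowest accuracy) for a dict of task -> accuracy."""
    values = list(task_accuracies.values())
    return sum(values) / len(values), min(values)


def reweight_tasks(weights, task_losses, step_size=1.0):
    """One multiplicative update: tasks with higher loss get more weight.

    Hypothetical helper for illustration; weights are renormalized to sum to 1.
    """
    scaled = {
        task: weights[task] * math.exp(step_size * task_losses[task])
        for task in weights
    }
    total = sum(scaled.values())
    return {task: value / total for task, value in scaled.items()}


if __name__ == "__main__":
    # Made-up accuracies: two easy tasks mask one hard task in the average.
    accuracies = {"task_a": 0.9, "task_b": 0.9, "task_c": 0.3}
    mean_acc, worst_acc = average_and_worst_case(accuracies)
    print(f"average={mean_acc:.2f} worst-case={worst_acc:.2f}")

    # Use 1 - accuracy as a stand-in loss (an assumption for this sketch).
    losses = {task: 1.0 - acc for task, acc in accuracies.items()}
    uniform = {task: 1.0 / len(losses) for task in losses}
    print(reweight_tasks(uniform, losses))
```

Repeating the reweighting step during training would concentrate weight on whichever task currently has the highest loss, which is the general idea behind optimizing for the worst case instead of the average.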



Create AI-powered search and recommendation apps with Pinecone

Pinecone is a fully managed vector database that makes it easy to add vector search to production applications. It combines state-of-the-art vector search libraries, advanced features such as filtering, and distributed infrastructure to provide high performance and reliability at any scale. Get started now — it's free!

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



Jobs

  • Entry Level Data Scientist: 2022 - IBM - Multiple Locations

    As a Data Scientist at IBM, you will help transform our clients’ data into tangible business value by analyzing information, communicating outcomes and collaborating on product development. Work with best-in-class open source and visual tools, along with the most flexible and scalable deployment options. Whether it’s investigating patient trends or weather patterns, you will work to solve real-world problems for the industries transforming how we live.

        Want to post a job here? Email us for details >>


Training & Resources

  • Bayesian Optimization Book [semi-final-draft]
    The book aims to provide a self-contained and comprehensive introduction to Bayesian optimization, starting “from scratch” and carefully developing all the key ideas along the way. The intended audience is graduate students and researchers in machine learning, statistics, and related fields. However, I also hope that practitioners and researchers from more distant fields will find some utility here...



Books

  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2021, All rights reserved.