Data Science Weekly - Issue 451

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #451

July 14 2022

Editor's Picks


  • The Data Science Trap [Reddit Discussion]
    It is no longer open to question that data scientists in the industry are merely glorified data analysts. Businesses are pouring money into STEM graduates to create colorful charts and BS reporting. Aside from hypothesis testing and linear or logistic regressions, nothing they do comes close to statistics or modeling. There have been several threads about how research scientists are the new data scientists - and these threads are full of scorn for the state of the data scientist job market...
  • The Data Science Trap: A Rebuttal [Reddit Discussion]
    More often than not, I see comments on this subreddit suggesting the dilution of the Data Science discipline into a glorified Data Analyst position. Maybe my 10 years in the Data Science field leads me to possessing a level of naivety, but I’ve concluded that Data Science in its academic interpretation is far from its practicality in application...
  • Prof. Noam Chomsky Machine Learning Street Talk Interview [Video]
    Prof. Noam Chomsky is the father of modern linguistics and the most important intellectual of the 20th century...We explore some of the profound misunderstandings of linguistics in general and Chomsky’s own work specifically which have persisted at the highest levels of academia for over sixty years...We have produced a significant introduction section where we discuss in detail Yann LeCun’s recent position paper on AGI, a recent paper on emergence in LLMs, empiricism related to cognitive science, cognitive templates, “the ghost in the machine” and language...

A Message from this week's Sponsor:


Pinecone vector database

The Pinecone vector database makes it easy to build high-performance vector search applications. Developer-friendly, fully managed, and easily scalable without infrastructure hassles.

Use Pinecone to build semantic search, object recognition, recommendations, anomaly detection, and other vector-based functionality into your applications.



Data Science Articles & Videos

  • Job Hunt as a PhD in AI / ML / RL: How it Actually Happens
    Combine a supposedly good tech hiring market with a Ph.D. from Berkeley AI Research and let's see what we get. In the vein of transparency, I wanted to share my experience in the job search and some notes specific to reinforcement learning as an area. Is there something you want to know about and I didn't share? Please let me know...
  • Critical Dataset Studies Reading List
    How should we study datasets in machine learning? As machine learning increasingly becomes a site of sociotechnical inquiry, invoking numerous social, political, legal, and ethical issues, datasets are a crucial component as they are core material used to train models. Inspired by Tarleton Gillespie and Nick Seaver’s Critical Algorithm Studies reading list, this collection is meant to serve as an entry point to the growing literature on ML datasets across the fields of computer science, human-computer interaction, science and technology studies, media studies, and histories of technology...
  • On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence
    Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general. We introduce two fundamental principles, Parsimony and Self-consistency, that we believe to be cornerstones for the emergence of Intelligence, artificial or natural...
  • Why every statistician should know about cross-validation
    Surprisingly, many statisticians see cross-validation as something data miners do, but not a core statistical technique. I thought it might be helpful to summarize the role of cross-validation in statistics, especially as it is proposed that the Q&A site at should be renamed
  • Deep Learning with a Small Training Batch (or Lack Thereof)
    This article covers two self-supervised approaches to tackling the image classification issue. Google and Facebook offer these methods. These methods are excellent when you only have a small training batch, and they will significantly simplify manual markup or even allow you to ditch it...
  • How Spotify Uses Semantic Search for Podcasts
    Spotify’s natural language search for podcasts is fascinating. In the past, users had to rely on keyword/term matching to find the podcast episodes they wanted. Now, they can search in natural language, in much the same way we might ask a real person where to find something...This technology relies on what we like to call semantic search. It enables a more intuitive search experience because we tend to have an idea of what we’re looking for, but rarely do we know precisely which terms appear in what we want...
  • BERTopic: Leverage BERT and c-TF-IDF to create easily interpretable topics
    BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions...BERTopic supports guided, (semi-) supervised, and dynamic topic modeling. It even supports visualizations similar to LDAvis!...
  • An Introduction to Lifelong Supervised Learning
    This primer is an attempt to provide a detailed summary of the different facets of lifelong learning...Chapter 2 provides a high-level overview of lifelong learning systems...Chapter 3 focuses on regularization-based approaches that do not assume access to any data from previous tasks. Chapter 4 discusses memory-based approaches that typically use a replay buffer or an episodic memory to save a subset of data across different tasks. Chapter 5 focuses on different architecture families (and their instantiations) that have been proposed for training lifelong learning systems. Following these different classes of learning algorithms, we discuss the commonly used evaluation benchmarks and metrics for lifelong learning (Chapter 6) and wrap up with a discussion of future challenges and important research directions in Chapter 7...
  • 4 Pandas Anti-Patterns to Avoid and How to Fix Them
    Pandas is a powerful data analysis library with a rich API that offers multiple ways to perform any given data manipulation task. Some of these approaches are better than others, and pandas users often learn suboptimal coding practices that become their default workflows. This post highlights four common pandas anti-patterns and outlines a complementary set of techniques that you should use instead...
  • awesome-data-leadership
    A curated list of awesome and useful posts, videos, and articles on leading a data team. This includes leadership at the middle-management, Director/VP, or C-suite level, for organizations both big and small. A few relevant engineering management articles are sprinkled in...
  • Intuitive physics learning in a deep-learning model inspired by developmental psychology
    ‘Intuitive physics’ enables our pragmatic engagement with the physical world and forms a key component of ‘common sense’ aspects of thought. Current artificial intelligence systems pale in their understanding of intuitive physics, in comparison to even very young children. Here we address this gap between humans and machines by drawing on the field of developmental psychology...

Webinar Panel*


Wednesday, July 27 @ 2PM ET / 11AM PT

Get practical advice from Data & Analytics Leaders from PayPal, Penguin Random House, & PartnerRe to learn about fostering an analytics-driven culture to drive better insights. Register Now!

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!




  • Senior Data Scientist, Startup Creation at Redesign Health - US

    As our Senior Data Scientist for our Startup Creation team, you will set up and configure the data infrastructure for our startups, and work with the startup founding team to define data driven KPIs, and implement automated statistical analyses of customer behavior. Your goal is to make all of the companies that we launch data-driven from day one.

    In this role, you will function as an in-house implementation team for the companies that Redesign Health launches (internally referred to as OpCos). We provide data strategy, data pipeline, data analytics and forecasting services to newly formed companies in a repeatable and scalable manner...


        Want to post a job here? Email us for details -->



Training & Resources

  • Transformers United (Stanford CS 25)
    Since their introduction in 2017, transformers have revolutionized Natural Language Processing (NLP). Now, transformers are finding applications all over Deep Learning, be it computer vision (CV), reinforcement learning (RL), Generative Adversarial Networks (GANs), Speech or even Biology...In this seminar, we examine the details of how transformers work, and dive deep into the different kinds of transformers and how they're applied in different fields. We do this through a combination of instructor lectures, guest lectures, and classroom discussions...
  • Introduction to K-Means Clustering
    While this article will focus most closely on K-means, there are other powerful types of clustering that can be used as well. Let’s take a look at the main ones like hierarchical, density-based, and partitional clustering...

What you’re up to – notes from DSW readers

  • Working on something cool? Let us know here :) ...

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)


Last Week's Newsletter's 3 Most Clicked Links


* Based on unique clicks.

** Find last week's newsletter here.


P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)

All the best,
Hannah & Sebastian
Copyright © 2013-2022, All rights reserved.
