Data Science Weekly - Issue 470

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #470

November 24 2022

Editor's Picks

  • What Does it Mean to Give Someone What They Want? The Nature of Preferences in Recommender Systems
    In practice, most recommenders optimize for engagement. This has been justified by the assumption that people always choose what they want, an idea from 20th-century economics called revealed preference. However, this approach to preferences can lead to a variety of unwanted outcomes including clickbait, addiction, or algorithmic manipulation...Doing better requires both a change in thinking and a change in approach. We’ll propose a more realistic definition of preferences, taking into account a century of interdisciplinary study, and two concrete ways to build better recommender systems...
  • The GitLab Data Team Handbook
    GitLab has two primary distinct groups within the Data Program who use data to drive insights and business decisions...The two teams are the (central) Data Team and, separately, Function Analytics Teams located in Sales, Marketing, Product, Engineering or Finance...The Data Team Handbook contains a large amount of information! To help you navigate the handbook we've organized it into the following major sections: a) Dashboards & Data you can use, b) How data works at GitLab, c) How the data team works, d) How the data platform works, and e) What the data team is working on...
  • Playtesting Candy Crush: Human-Like Playtesting with Deep Learning
    Today I learned that there's actually research in playtesting video games using deep learning...What's interesting is that the paper is actually written by actual employees from actual video game companies. But also that it decided to explore these techniques for Candy Crush Saga...


A Message from this week's Sponsor:


Pinecone vector database

The Pinecone vector database makes it easy to build high-performance vector search applications. Developer-friendly, fully managed, and easily scalable without infrastructure hassles.

Use Pinecone to build semantic search, object recognition, recommendations, anomaly detection, and other vector-based functionality into your applications.


Data Science Articles & Videos

  • World's Simplest Data Pipeline?
    How much you invest in your data engineering capability is dependent on your own ambition and needs and the risks of over- or under-investing are well documented. I believe there are some rules and guidelines that are universally applicable, regardless of your team size or tech stack, and that following these rules can save huge headaches in both teams of one and teams of one hundred...In order to demonstrate this, about a year ago I built the simplest data pipeline I could build while still adhering to my rules...
  • Explaining the Effects of Clouds on Remote Sensing Scene Classification
    Most of Earth is covered by haze or clouds, impeding the constant monitoring of our planet...little effort has been spent on understanding how exactly atmospheric disturbances impede the application of modern machine learning methods to Earth observation data...We provide a thorough investigation of how classifiers trained on cloud-free data fail once they encounter noisy imagery – a common scenario encountered when deploying pretrained models for remote sensing to real use cases...
  • A Short Guide for Feature Engineering and Feature Selection
    Feature engineering and selection is the art/science of converting data in the best way possible, which involves an elegant blend of domain expertise, intuition and mathematics. This guide is a concise reference for beginners, covering the simplest yet most widely used techniques for feature engineering and selection...
  • Demystifying ML PhD Admissions to US Universities [Video]
    This video is a recording of the panel discussion and Q & A on ML PhD admissions to US universities...Several faculty from various universities in the US took part in it including Tatsu Hashimoto (Stanford), Rada Mihalcea (UMichigan), Devi Parikh (Georgia Tech), Sameer Singh (UC Irvine), and James Zou (Stanford)...
  • Writing a scientific article: A step-by-step guide for beginners
    Many young researchers find it extremely difficult to write scientific articles, and few receive specific training in the art of presenting their research work in written format. Yet, publication is often vital for career advancement, to obtain funding, to obtain academic qualifications, or for all these reasons. We describe here the basic steps to follow in writing a scientific article. We outline the main sections that an average article should contain; the elements that should appear in these sections, and some pointers for making the overall result attractive and acceptable for publication...
  • Tools to Improve Training Data - Talking Language AI Episode #2 [Video]
    Vincent Warmerdam builds a lot of NLP tools. Many of these tools target the scikit-learn ecosystem and there's a theme of labeling across many of them. A recent focus of his stack of tools is to improve training data. In this video, Vincent and Jay discuss a few of these tools and show how they work together...These tools are discussed in the video: a) Human-learn: a toolkit to build human-based scikit-learn components, b) Doubtlab: a toolkit to help find doubtful labels in data, c) Embetter: a library that makes it very easy to use embeddings in scikit-learn, and d) Bulk: a library that uses embeddings to leverage bulk labeling...The talk includes live demos of each to show how some simple tricks can go a long way...
  • Planes are still decades away from displacing most bird jobs
    Here’s the thing: all human-built artificial flight (AF) machines are incredibly specialized and are far away from being able to perform most of the tasks birds – the only general flight (GF) machines we are aware of – can perform...
  • Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment
    To harness the capabilities of state-of-the-art robot learning models while embracing their imperfections, we present Sirius, a principled framework for humans and robots to collaborate through a division of work. In this framework, partially autonomous robots are tasked with handling a major portion of decision-making where they work reliably; meanwhile, human operators monitor the process and intervene in challenging situations...
  • General purpose visual recognition across modalities with limited supervision [Video]
    Ishan Misra, FAIR (Meta AI), presents on how modern computer vision models are good at specialized tasks...However, specialist models also have severe limitations: they can only do what they are trained for and require copious amounts of pristine supervision for it. In this talk, he focuses on two limitations: specialist models cannot work on tasks beyond what they saw training labels for, or on new types of visual data. He’ll present recent efforts that design better architectures, training paradigms and loss functions to address these issues...




Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!




SuperAnnotate Webinar

In December last year, SuperAnnotate hosted the webinar “2021 CV’s year retrospective and opportunities for 2022” to wrap up the year in AI and share its predictions for 2022.

We are excited to share that this year SuperAnnotate is again hosting an end-of-the-year webinar, reviewing developments in the AI space in 2022 and sharing what we can expect from the year ahead. The webinar will cover everything from generative models like Stable Diffusion, NLP with Large Language Models, DataOps and Data-Centricity, and Transformers expanding into CV to new models like YOLOv7, large partnerships in the A(G)I space, and more! Following that, SuperAnnotate's CTO and co-founder Vahan Petrosyan will share his predictions for 2023.

Join us to see which of their predictions from the previous webinar came true, sum up developments in AI this year, and see what to expect from 2023. Register Now.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!




  • Senior Data Analyst - Epic Games - New York

    Epic Games spans across 19 countries with 55 studios and 4,500+ employees globally. For over 25 years, we’ve been making award-winning games and engine technology that empowers others to make visually stunning games and 3D content that bring environments to life like never before.

    Use your expert experience in data & analytics to build powerful stories and visuals that inform the games we make, the technology we develop, and business decisions that drive Epic... Epic Games is looking for a Senior Data Analyst to help us create the models that fuel our creator economy. The successful candidate will have excellent SQL knowledge, and enjoy combining analytic skills with business acumen to provide the data and insights that will drive our continued success...


        Want to post a job here? Email us for details.



Training & Resources

  • Introduction to Robotics @ Princeton
    Lectures from "Introduction to Robotics" at Princeton University by instructor Anirudha Majumdar...This course will provide an introduction to the fundamental theoretical and algorithmic principles behind robotic systems. The course will also allow students to get hands-on experience through project-based assignments with the Crazyflie quadrotor...

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022, All rights reserved.
