Data Science Weekly - Issue 408

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #408

September 16 2021

Editor Picks
 
  • ICLR 2022 Call for Blog Posts
    This year, the ICLR 2022 main conference will host a blog post track. We invite both academic and industrial researchers to submit their posts on a previously published paper at ICLR. We particularly welcome submissions on papers that appeared last year at ICLR...
  • Our Journey towards Data-Centric AI: A Retrospective
    Starting in about 2016, researchers from our lab — the Hazy Research lab — circled through academia and industry giving talks about an intentionally provocative idea: machine learning (ML) models—long the darlings of researchers and practitioners—were no longer the center of AI. In fact, models were becoming commodities. Instead, we claimed that it was the training data that would drive progress towards more performant ML models and systems...
 
 

A Message from this week's Sponsor:

 

 
TransformX Conference: Driving AI from Experimentation to Reality

Join Scale AI for our two-day, virtual conference featuring 100+ speakers and 60+ sessions. We’re bringing together a community of leaders, visionaries, practitioners, and researchers across industries as we explore the shift from research to reality within AI and Machine Learning. Register now to secure your free ticket...
 

 

Data Science Articles & Videos

 
  • The mathematics of adversarial attacks in AI
    It is well established that the current DL methodology produces universally unstable neural networks (NNs). The instability problem has caused an enormous research effort -- with a vast literature on so-called adversarial attacks -- yet there has been no solution to the problem. Our paper addresses why there has been no solution to the problem, as we prove the following mathematical paradox: any training procedure based on training neural networks for classification problems with a fixed architecture will yield neural networks that are either inaccurate or unstable (if accurate) -- despite the provable existence of both accurate and stable neural networks for the same classification problems...
  • Parallelizing Python Code
    Python is great for tasks like training machine learning models...When performing these tasks, you also want to use your underlying hardware as much as possible for quick results. Parallelizing Python code enables this. However, using the standard CPython implementation means you cannot fully use the underlying hardware because of the global interpreter lock (GIL) that prevents running the bytecode from multiple threads simultaneously...This article reviews some common options for parallelizing Python code...
  • Using learning-to-rank to precisely locate where to deliver packages
    For delivery drivers, finding the doorstep where a package should be dropped off can be surprisingly hard. House numbers can be obscured by foliage, or they might be missing entirely; some neighborhoods use haphazard numbering systems that make house numbers hard to guess; and complexes of multiple buildings sometimes share a single street address...I adapt an idea from information retrieval — learning-to-rank — to the problem of predicting the coordinates of a delivery location from past GPS data...
  • Building a smart Robot AI using Hugging Face 🤗 and Unity
    Today we’re going to build this adorable smart robot that will perform actions based on player text input...It uses a deep language model to understand any text input and find the most appropriate action from its list...What’s interesting about that system, contrary to classical game development, is that you don’t need to hard-code every interaction. Instead, you use a language model that selects which of the robot’s possible actions is the most appropriate given the user input...
  • Bayesian Media Mix Modeling for Marketing Optimization
    A problem faced by many companies is how to allocate marketing budgets across different media channels. For example, how should funds be allocated across TV, radio, social media, direct mail, or daily deals?...So-called Media Mix Modelling (MMM) can estimate how effective each advertising channel is in gaining new customers. Once we have estimated each channel’s effectiveness we can optimize our budget allocation to maximize customer acquisition and sales...In this blog post, we outline what you can do with MMMs, introduce how they work, summarise some of the benefits they can provide, and cover some of the modeling challenges...
  • Bad Labels: GridSearch is Not Enough
    I write a lot of blog posts on why you need more than grid-search to properly judge a machine learning model. In this blog post I want to demonstrate yet another reason: labels often seem to be wrong...The issue here isn’t just that we might have bad labels in our training set; it’s that they also appear in the validation set. If a machine learning model can become state of the art by squeezing another 0.5% out of a validation set, one has to wonder: are we really making a better model? Or are we creating a model that is better able to overfit on the bad labels?...
  • Bad Labels: Introduction
    Even famous datasets have bad labels in them...Because it's such a big problem we wanted to spend a few videos on this topic. It'd be a shame if our machine learning models are merely optimal because they overfit on the bad labels. That's why we're going to explore heuristics to find bad labels in our training data so that we may try to improve the quality of our training data...
  • Embedding Values in Artificial Intelligence (AI) Systems
    Though there are numerous high-level normative frameworks, it is still quite unclear how or whether values can be implemented in AI systems. Van de Poel and Kroes (2014) have recently provided an account of how to embed values in technology. The current article proposes to expand that view to complex AI systems and explain how values can be embedded in technological systems that are “autonomous, interactive, and adaptive”...
  • How To Lead In Data Science
    The Data Exchange Podcast: Jike Chong and Yue Cathy Chang on helping data scientists increase their impact in business and in society...
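For readers curious about the "Parallelizing Python Code" item above, here is a minimal sketch of one common way around the GIL for CPU-bound work: running the work in separate processes with the standard library's concurrent.futures. It is not taken from the linked article; the prime-counting task and the function names (count_primes, split_range, count_primes_parallel) are hypothetical and chosen only for illustration.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """Count primes in the half-open range [lo, hi) by trial division."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        # n is prime if no d in [2, sqrt(n)] divides it
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def split_range(limit, parts):
    """Split [0, limit) into at most `parts` contiguous chunks."""
    step = -(-limit // parts)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def count_primes_parallel(limit, workers=4):
    """Count primes below `limit`, one chunk per worker process.

    Each process has its own interpreter and its own GIL, so the
    chunks can run on separate CPU cores at the same time.
    """
    chunks = split_range(limit, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, chunks))
```

When run as a script, the call to count_primes_parallel should sit under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module. Processes add start-up and data-transfer overhead, so this approach tends to pay off only when each chunk does a meaningful amount of computation.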
 
 

Training*

 

 
Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.

The course is broken down into three guides:
  1. Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

  2. Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

  3. Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Click here to learn more...

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
 

 

Jobs

 
  • Senior Data Scientist - TikTok - LA

    TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy by offering a home for creative expression and an experience that is genuine, joyful, and positive.
    • Generate useful features from large amounts of data
    • Apply supervised and unsupervised machine learning techniques, such as linear and logistic regression, decision trees, and k-means clustering
    • Develop segmentation models, classification models, propensity models, LTV models, experimental design, optimization models
    • Perform statistical analysis such as KPI deep dives, performance marketing efficiency, behavioral clustering, and user journey analytics
    • Curate audiences and inform engagement tactics to enable differentiated, relevant marketing touches across channels (social, email, in app, push)
    • Synthesize analytics and statistical approaches into easy-to-consume storylines, both visually and verbally, and provide indicated actions for executive audiences
    • Capture business requirements for data and analytic solutions and collaborate cross-functionally (XFN) to ensure business requirements align with business needs
    • Analyze creatives and surface insights that will help drive engagement and retention
    • Support day-to-day collaboration with performance marketing to communicate insights and recommend data informed strategies

        Want to post a job here? Email us for details >> team@datascienceweekly.org
 

 

Training & Resources

 
  • How percentile approximation works (and why it's more useful than averages)
    As I was researching this piece, I found a number of good blog posts (see examples from the folks at Dynatrace, Elastic, AppSignal, and Optimizely) about how averages aren’t great for understanding application performance, or other similar things, and why it’s better to use percentiles...I won’t spend too long on this, but I think it’s important to provide a bit of background on why and how percentiles can help us better understand our data...First off, let’s consider how percentiles and averages are defined. To understand this, let’s start by looking at a normal distribution...
  • State of PyTorch core: September 2021 edition
    There are a lot of projects currently going on in PyTorch core and it can be difficult to keep track of all of them or how they relate to each other. Here is my personal understanding of all the things that are going on, organized around the people who are working on these projects, and how I think about how they relate to each other...
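To make the percentile item above concrete, here is a small sketch of why percentiles can say more than an average. It computes exact nearest-rank percentiles, not the approximation techniques the linked post covers, and the latency numbers are made up for illustration: 95 fast requests and 5 very slow ones.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the data is less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds (not real measurements)
latencies_ms = [20] * 95 + [2000] * 5

average = mean(latencies_ms)        # 119
p50 = percentile(latencies_ms, 50)  # 20
p95 = percentile(latencies_ms, 95)  # 20
p99 = percentile(latencies_ms, 99)  # 2000
```

The average of 119 ms describes no request that actually happened, while the median shows that the typical request takes 20 ms and the 99th percentile exposes the 2000 ms tail.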
 
 

Books

 

  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.
