Data Science Weekly - Issue 440

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #440

April 28 2022

Editor Picks

 
  • DALL·E 2 and The Origin of Vibe Shifts
    The point of this essay isn’t about predicting next year’s design trends. To me the more interesting thing is to understand the ecological process that generates those trends, seeing the true signaling function of visual design, and learning why some corporate status signaling is so effective and why some isn’t. Projecting further out, it’s about trying to picture a world where it’s cheap and easy for anyone to generate just about any kind of image they want...To answer these questions, we’re going to tap the most well-developed pool of knowledge on the use of costly signals and their evolution over time: biology....
  • Learning with not Enough Data Part 3: Data Generation
    Part 3 of the “what if you don’t have enough training data” series, touching on creating more synthetic data through data augmentation or model generation, as well as some ideas on how to work with noisy labels (given that synthetic data might not be fully correct)...
 
 

A Message from this week's Sponsor:

 



ML and Data Developers Week (May 16~20)

Learn ML/data engineering from top practitioners at the ML and Data Developers Week, which is geared toward engineering teams discussing practical solutions and the challenges faced when building ML for the real world. With thousands of global ML devs and data scientists, deep-dive tech talks, and hands-on workshops, you can look forward to engaging conversations, insightful discussions, hands-on code labs, and peer networking. Free to join virtually and/or in person, with food, swag, and prizes.

 

 

Data Science Articles & Videos

 
  • Will It Scale? Applying Data, Science, and Economics to the Art of Ideas
    In this interview, John List [chief economist at Walmart & professor of economics at the University of Chicago] discusses some of his new book’s themes [The Voltage Effect: How to Make Good Ideas Great and Great Ideas Scale], such as the importance of knowing when to quit or pivot, and how practicing the science of scaling can help ensure an idea’s success. He also shares his thoughts on the relationship between economics and technology, the state of behavioral economics and data science, and the prospect of using AI to reanimate promising, but previously unsuccessful ideas...
  • Data Science at Stitch Fix
    Podcast Interview with Olivia Liao, Senior Director of Data Science at Stitch Fix, a company that uses data science and expert stylists to deliver personalization at scale. We discuss how they blend data science and domain expertise, how they tune recommendations in light of logistics and supply chain constraints, and how they incorporate new developments in large language models, multimodal models and Responsible AI....
  • Creating Confidence Intervals for Machine Learning Classifiers
    This article outlines different methods for creating confidence intervals for machine learning models. Note that these methods also apply to deep learning...it’s worth highlighting that the big picture is to measure and report uncertainty. Confidence intervals are one way to do that. However, it is also helpful to include the average performance over different dataset splits or random seeds with the variance or standard deviation – I sometimes adopt this simpler approach as it is more straightforward to explain. But since this article is about confidence intervals, let’s define what they are and how we can construct them....
  • It’s Our Moral Obligation to Make Data More Accessible
    Most of the world’s data is sitting on a shelf, being used in a very narrow domain. This data, if properly activated, could solve some of the world’s biggest problems and lead to more health, happiness, and love for society. We could use this data to uncover some of society’s biggest secrets...Like Marc Andreessen’s piece, It’s Time to Build, this piece is a full-throated argument to massively increase the accessibility of data. And we need to do it now...
  • The StatQuest Introduction to PyTorch [Video]
    PyTorch is one of the most popular tools for making Neural Networks. This StatQuest walks you through a simple example of how to use PyTorch one step at a time. By the end of this StatQuest, you'll know how to create a new neural network from scratch, make predictions and graph the output, and optimize a parameter using backpropagation. BAM!!!...
  • An arxiv-sanity-like view of ICLR 2022 papers
    Hi, I am a fan of www.arxiv-sanity.com and like to have similar summaries for conference papers. I have ordered all ICLR2022 papers by rating and created 8-page thumbnails. With ICLR2022 now in full swing, the project can be useful in getting a quick overview of the accepted publications...
  • Specification gaming: the flip side of AI ingenuity
    Specification gaming is a behaviour that satisfies the literal specification of an objective without achieving the intended outcome...This problem arises in the design of artificial agents. For example, a reinforcement learning agent can find a shortcut to getting lots of reward without completing the task as intended by the human designer. These behaviours are common, and we have collected around 60 examples so far (aggregating existing lists and ongoing contributions from the AI community)...In this post, we review possible causes for specification gaming, share examples of where this happens in practice, and argue for further work on principled approaches to overcoming specification problems...
  • More Than Meets the Eye: A Closer Look at Encodings in Visualization
    Encodings play a central role in visualization, but I believe our thinking about them is too simplistic. In a new paper, I argue that we need to distinguish between the encodings that specify how a visualization is drawn and the ones that are readable or actually read by an observer. While they largely or entirely overlap in some charts (like bar charts or scatterplots) they don’t in others (pie charts, line charts, etc.). And what exactly do you even specify in more complex visualizations like treemaps?...
  • What is the value of the p-value? [Slides from the talk]
    The debate over the value and interpretation of p-value has endured since the time of its inception nearly 100 years ago. The use and interpretation of p-values vary by a host of factors, especially by discipline. These differences have proven to be a barrier when developing and implementing boundary-crossing clinical and translational science. The purpose of this panel discussion is to discuss misconceptions, debates, and alternatives to the p-value...
  • Compact word vectors with Bloom embeddings
    A high-coverage word embedding table will usually be quite large. One million 32-bit floats occupies 4MB of memory, so one million 300-dimension vectors will be 1.2GB in size. Such a large model size is at least annoying for many applications, while for others it’s completely prohibitive...Probabilistic data structures are a natural fit for machine learning models, so they’re quite widely used. However, they’re definitely unintuitive, which is why we refer to this solution [using a probabilistic data structure] as a “cheat”. We’ll start by introducing the full algorithm, without dwelling too long on why it works. We’ll then go back and fill in more of the intuition, and then describe how we use it in practice in Thinc, spaCy and floret...
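
    The memory figures quoted in the Bloom embeddings item can be checked with a short Python sketch, which also illustrates the general idea of hashing each key to several rows of a much smaller table and summing them. This is only a rough illustration under stated assumptions: the function names, the 20,000-row table size, the four hashes per key, and the use of SHA-256 are all hypothetical choices made here, not the implementation used in Thinc, spaCy or floret.

```python
import hashlib
import random

BYTES_PER_FLOAT32 = 4  # a 32-bit float takes 4 bytes


def table_bytes(n_rows, n_dims):
    """Memory needed for an n_rows x n_dims table of 32-bit floats."""
    return n_rows * n_dims * BYTES_PER_FLOAT32


def row_indices(key, n_rows, n_hashes=4):
    """Map a key to n_hashes row indices (hypothetical helper).

    Each index comes from hashing the key together with a different seed.
    SHA-256 is an assumption chosen for reproducibility, not for speed.
    """
    indices = []
    for seed in range(n_hashes):
        digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
        indices.append(int.from_bytes(digest[:8], "little") % n_rows)
    return indices


def make_table(n_rows, n_dims, seed=0):
    """Create a small randomly initialised table (stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_dims)] for _ in range(n_rows)]


def embed(key, table, n_hashes=4):
    """Build a vector for key by summing the rows its hashes select."""
    n_dims = len(table[0])
    vector = [0.0] * n_dims
    for idx in row_indices(key, len(table), n_hashes):
        for d in range(n_dims):
            vector[d] += table[idx][d]
    return vector


# One million 300-dimension vectors versus a hypothetical 20,000-row table.
full_size = table_bytes(1_000_000, 300)
compact_size = table_bytes(20_000, 300)
print(f"full table:    {full_size / 1e9:.1f} GB")
print(f"compact table: {compact_size / 1e6:.1f} MB")
```

    Two keys may collide on one row, but they are unlikely to collide on all of their rows at once, which is why a table with far fewer rows than vocabulary entries can still give most keys a distinct vector.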
 
 

Conference*

 



Join us at apply(), the ML data engineering conference - it’s free.

Speakers include practitioners from the Wikimedia Foundation, Facebook, Gojek, Snapchat, Instacart, Walmart, Stripe, Uber, Volvo, Snowflake, Databricks, and more. We’d love for you to join us.

Agenda highlights:
  • Smitha Shyam, Director of Engineering at Uber: Uber's Michelangelo: Then and Now
  • Chris Albon, Director of Machine Learning at Wikimedia Foundation: More Ethical Machine Learning Using Model Card at Wikimedia
  • Matei Zaharia, Co-Founder and Chief Technologist at Databricks: The Future of Data for Machine Learning
  • Chip Huyen, Co-Founder at Claypot AI: Machine Learning Platform for Online Prediction and Continual Learning
  • Clem Delangue, CEO at Hugging Face: Is Open-Source Machine Learning Becoming the Most Impactful Technology of the Decade?

See the full agenda and register for free.


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Data Scientist - Hungryroot - Remote

    Hungryroot is looking for a Data Scientist to join our growing Data Team. As a Data Scientist, you will work closely with other Data Scientists and Data Engineers to develop various Machine Learning models that power Hungryroot and its AI functions. These models include traditional forecasting models, as well as more industry-specific optimization challenges.

    As a Data Scientist at Hungryroot, you will work on answering questions like: How do you tell what food someone would like to eat this week? How do you determine whether they enjoyed it or not? Maybe they liked their meals last week but are now looking for different options. Maybe they like the same food on Tuesdays but variety on Fridays. What about spicy food: is Green Chilly as spicy as Green Curry?

     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 
 

Training & Resources

 
  • Writing production grade code for ML in python [Reddit Discussion]
    I have been interviewing for a machine learning lead position. I have successfully passed 3 interview rounds (coding, HR, system design). I have my final interview with the VP of Engineering. When asked how best to prepare myself, they said they would like to test my ability to write "production quality" code in Python. While I do have some experience, the downside is I worked in small R&D teams for a long time. Though I am knowledgeable in Python, I might not have followed all the industry best practices...If you are a hiring manager or interviewer, how would you test this ability? How do I prepare myself to prove my ability to write production grade code?...
  • Parametric vs. Non-parametric tests, and when to use them
    Too often the statistical underpinnings of the data science community are overlooked. I’ve been lucky enough to have had both undergraduate and graduate courses dedicated solely to statistics, in addition to growing up with a statistician for a mother. So this article is what will likely be the first of several to share some basic statistical tests and when/where to use them!...A parametric test makes assumptions about a population’s parameters...
  • Mathematical Foundations of Monte Carlo Methods
    We will try to give a sense of what these Monte Carlo methods are, how they work, why, and what they are used for. This quick introduction is for readers who do not have the time or the desire to get any further. But you may need to read all the remaining chapters if you are serious about learning what these methods are...This lesson is more an introduction to the mathematical tools upon which the Monte Carlo methods are built. The methods themselves are explained in the next lesson (Monte Carlo Methods in Practice)...
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
