Data Science Weekly - Data Science Weekly - Issue 439

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments
Email not displaying correctly?
View it in your browser.

Issue #439

April 21 2022

Editor Picks

 
  • Real World Recommendation System - Part 1
    Training a collaborative filtering based recommendation system on a toy dataset is a sophomore year project in colleges these days. But where the rubber meets the road is building such a system at scale, deploying in production, and serving live requests within a few hundred milliseconds while the user is waiting for the page to load. To build a system like this, engineers have to make decisions spanning multiple moving layers like...
  • Advances in Understanding, Improving, and Applying Contrastive Learning
    Contrastive learning has emerged as a powerful method for training ML models. In this series of three blog posts, we’ll discuss recent advances in understanding the mechanisms behind contrastive learning. We’ll see how we can use those insights to get better learned representations out of supervised contrastive learning, and see how we can apply contrastive learning to improve long-tailed entity retrieval...
 
 

A Message from this week's Sponsor:

 



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.

 

 

Data Science Articles & Videos

 
  • Faking It: How to Simulate Complex Data Generation Processes in R, Tidyverse Edition
    Data simulation is easily near the top of the long list of useful skills that are seldom taught in social science graduate programs. This is unfortunate given the central role of simulation in model checking, sensitivity analysis, and developing a basic understanding of modeling assumptions and often complex relationships between the phenomena social scientists aspire to understand. My aim in this blog post is thus to provide a basic introduction to data simulation and parameter recovery in R for cross-sectional time series and non-nested data structures commonly encountered in political science and international relations...
  • Ever wondered how the probability of the null hypothesis being true changes given a significant result?
    In a recently accepted paper...we discuss how, using Bayes' rule, one can explore the change in the probability of a null hypothesis being true (call it theta) when you get a significant effect. The paper...shows that theta does not necessarily change much even if you get a significant result. The probability theta can change dramatically under certain conditions, but those conditions are either so stringent or so trivial that it renders many of the significance-based conclusions in psychology and psycholinguistics questionable at the very least...You can do your own simulations...using this shiny app below...
  • All the talks and the Q&As from the #Outlier2022 Data Viz Conference
    All the curated talks, lighting talks and the Q&As from the 2022 edition of the Outlier conference...For all the #dataviz enthusiasts out there. Bookmark this playlist by the @DataVizSociety and @OutlierConf. It contains all the curated talks, lighting talks and the Q&As from the #Outlier2022! #datajournalism...
  • Bad ML Abstractions I (Generative vs Discriminative Models)
    This post is part of a series on bad abstractions in machine learning...Bad Abstraction: There are two types of machine learning models. Discriminative models are trained to separate inputs into classes, while generative models learn a distribution from which they can draw new samples...These two categories are not actually distinct...
  • A Robot Web for Distributed Many-Device Localisation
    We show that a distributed network of robots or other devices which make measurements of each other can collaborate to globally localise via efficient ad-hoc peer to peer communication. Our Robot Web solution is based on Gaussian Belief Propagation on the fundamental non-linear factor graph describing the probabilistic structure of all of the observations robots make internally or of each other, and is flexible for any type of robot, motion or sensor...
  • Probability Distributions To Be Aware Of For Data Science (With Code)
    Knowing the distribution of data helps us better model the world around us. It helps us to determine the likeliness of various outcomes, or make an estimate of the variability of an occurrence. All of this makes knowing different probability distributions extremely valuable in data science & machine learning...In this article, we are going to cover a few distributions and share some Python code to display them visually...
  • A Tour of Visualization Techniques for Computer Vision Datasets
    We survey a number of data visualization techniques for analyzing Computer Vision (CV) datasets. These techniques help us understand properties and latent patterns in such data, by applying dataset-level analysis. We present various examples of how such analysis helps predict the potential impact of the dataset properties on CV models and informs appropriate mitigation of their shortcomings. Finally, we explore avenues for further visualization techniques of different modalities of CV datasets as well as ones that are tailored to support specific CV tasks and analysis needs...
  • On NYT Magazine on AI: Resist the Urge to be Impressed
    On April 15, 2022, Steven Johnson published a piece in the New York Times Magazine entitled “A.I. Is Mastering Language. Should We Trust What It Says?”...Emily M. Bender, Professor, Linguistics, University of Washington, unpacks a recent NYT Magazine article on the future of AI and language models...
  • The Distributed Information Bottleneck reveals the explanatory structure of complex systems
    The fruits of science are relationships made comprehensible, often by way of approximation. While deep learning is an extremely powerful way to find relationships in data, its use in science has been hindered by the difficulty of understanding the learned relationships. The Information Bottleneck (IB) is an information theoretic framework for understanding a relationship between an input and an output in terms of a trade-off between the fidelity and complexity of approximations to the relationship. Here we show that a crucial modification -- distributing bottlenecks across multiple components of the input -- opens fundamentally new avenues for interpretable deep learning in science...
  • Comprehensive Guide to GitHub for Data Scientists
    The purpose behind this article is to give data scientists / analysts (or any non engineering focused individual) the run down on how to use GitHub and what best practices to adhere too. The tutorial will consist of a combination guidelines using the UI and command line (terminal). The naming convention for Git commands are consistent across the platforms provided by GitHub so the skills should be exchangeable if you prefer to use Github desktop or GitLab instead of the web UI or command line. The following is the outline for the article...
 
 

Summit*

 



You're invited to the first-ever Metrics Store Summit

Transform is hosting the first-ever industry summit on the metrics layer. The first-ever Metrics Store Summit on April 26, 2022 will bring discussions around the semantic layer into one event—providing context with use cases for metrics stores, highlighting applications for metrics, and sharing ideas from leaders across the modern data stack.You can expect to hear from Airbnb, Slack, Spotify, Atlan, Hex, Mode, Hightouch, AtScale and many more in this action-packed 1-day event. We would love to see you there! Register today for free.


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Data Scientist - Hungryroot - Remote

    Hungryroot is looking for a Data Scientist to join our growing Data Team. As a Data Scientist, you will work closely with other Data Scientists and Data Engineers to develop various Machine Learning models that power Hungryroot and it’s AI functions. These models include traditional forecasting models, as well as more industry-specific optimization challenges.

    As a Data Scientist at Hungryroot, you will work on answering questions like: how do you tell what food someone would like to eat this week, how do you determine whether they enjoyed it or not, maybe they liked their means last week, but are now looking for different options, maybe they like the same food on Tuesdays, but variety on Fridays, what about spicy food, is Green Chilly as spicy as Green Curry?

     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 
 

Training & Resources

 
  • R Graphics Cookbook, 2nd edition
    Welcome to the R Graphics Cookbook, a practical guide that provides more than 150 recipes to help you generate high-quality graphs quickly, without having to comb through all the details of R’s graphing systems. Each recipe tackles a specific problem with a solution you can apply to your own project, and includes a discussion of how and why the recipe works...Read online here for free, or buy a physical copy...
  • What are Diffusion Models? [Video]
    This short tutorial covers the basics of diffusion models, a simple yet expressive approach to generative modeling. They've been behind a recent string of impressive results, including OpenAI's DALL-E 2...
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Follow on Twitter
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
unsubscribe from this list    update subscription preferences 

Older messages

Data Science Weekly - Issue 437

Thursday, April 7, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #437 April 07 2022 Editor Picks

Data Science Weekly - Issue 436

Thursday, March 31, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #436 March 31 2022 Editor Picks Stop

Data Science Weekly - Issue 435

Friday, March 25, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #435 March 24 2022 Editor Picks

Data Science Weekly - Issue 434

Thursday, March 17, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #434 March 17 2022 Editor Picks A Deep

Data Science Weekly - Issue 433

Friday, March 11, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #433 March 10 2022 Editor Picks Deep

You Might Also Like

📧 Building Async APIs in ASP.NET Core - The Right Way

Saturday, November 23, 2024

​ Building Async APIs in ASP .NET Core - The Right Way Read on: m​y website / Read time: 5 minutes The .NET Weekly is brought to you by: Even the smartest AI in the world won't save you from a

WebAIM November 2024 Newsletter

Friday, November 22, 2024

WebAIM November 2024 Newsletter Read this newsletter online at https://webaim.org/newsletter/2024/november Features Using Severity Ratings to Prioritize Web Accessibility Remediation When it comes to

➡️ Why Your Phone Doesn't Want You to Sideload Apps — Setting the Default Gateway in Linux

Friday, November 22, 2024

Also: Hey Apple, It's Time to Upgrade the Macs Storage, and More! How-To Geek Logo November 22, 2024 Did You Know Fantasy author JRR Tolkien is credited with inventing the main concept of orcs and

JSK Daily for Nov 22, 2024

Friday, November 22, 2024

JSK Daily for Nov 22, 2024 View this email in your browser A community curated daily e-mail of JavaScript news React E-Commerce App for Digital Products: Part 4 (Creating the Home Page) This component

Spyglass Dispatch: The Fate of Chrome • Amazon Tops Up Anthropic • Pros Quit Xitter • Brave Powers AI Search • Apple's Lazy AI River • RIP Enrique Allen

Friday, November 22, 2024

The Fate of Chrome • Amazon Tops Up Anthropic • Pros Quit Xitter • Brave Powers AI Search • Apple's Lazy AI River • RIP Enrique Allen The Spyglass Dispatch is a free newsletter sent out daily on

Charted | How the Global Distribution of Wealth Has Changed (2000-2023) 💰

Friday, November 22, 2024

This graphic illustrates the shifts in global wealth distribution between 2000 and 2023. View Online | Subscribe | Download Our App Presented by: MSCI >> Get the Free Investor Guide Now FEATURED

Daily Coding Problem: Problem #1616 [Easy]

Friday, November 22, 2024

Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Alibaba. Given an even number (greater than 2), return two prime numbers whose sum will

The problem to solve

Friday, November 22, 2024

​ Use problem framing to define the problem to solve This week, Tom Parson and Krishna Raha share tools and frameworks to identify and address challenges effectively, while Voltage Control highlights

Issue #568: Random mazes, train clock, and ReKill

Friday, November 22, 2024

View this email in your browser Issue #568 - November 22nd 2024 Weekly newsletter about Web Game Development. If you have anything you want to share with our community please let me know by replying to

Whats Next for AI: Interpreting Anthropic CEOs Vision

Friday, November 22, 2024

Top Tech Content sent at Noon! How the world collects web data Read this email in your browser How are you, @newsletterest1? 🪐 What's happening in tech today, November 22, 2024? The HackerNoon