Data Science Weekly - Issue 439

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments.

Issue #439

April 21 2022

Editor Picks

 
  • Real World Recommendation System - Part 1
    Training a collaborative filtering based recommendation system on a toy dataset is a sophomore year project in colleges these days. But where the rubber meets the road is building such a system at scale, deploying in production, and serving live requests within a few hundred milliseconds while the user is waiting for the page to load. To build a system like this, engineers have to make decisions spanning multiple moving layers like...
  • Advances in Understanding, Improving, and Applying Contrastive Learning
    Contrastive learning has emerged as a powerful method for training ML models. In this series of three blog posts, we’ll discuss recent advances in understanding the mechanisms behind contrastive learning. We’ll see how we can use those insights to get better learned representations out of supervised contrastive learning, and see how we can apply contrastive learning to improve long-tailed entity retrieval...
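For readers new to the idea, the core of contrastive learning is a loss that pulls two views of the same example together and pushes the other examples in the batch away. The sketch below is the basic self-supervised InfoNCE loss in plain Python, assuming cosine similarity and a temperature hyperparameter; it is not the supervised contrastive variant or code from the blog series above, and the function names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumes it is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean one-directional InfoNCE loss over a batch.

    anchors[i] and positives[i] are embeddings of two views of example i;
    every positives[j] with j != i acts as an in-batch negative for anchors[i].
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every candidate, divided by the temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability assigned to the matching positive.
        total += log_denom - logits[i]
    return total / len(a)

# Matched pairs score lower than deliberately mismatched pairs.
views = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce_loss(views, views, temperature=1.0))
print(info_nce_loss(views, views[::-1], temperature=1.0))
```

The loss is low when each anchor is most similar to its own positive; a lower temperature sharpens the softmax and penalizes hard negatives more strongly.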
 
 

A Message from this week's Sponsor:

 



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.

 

 

Data Science Articles & Videos

 
  • Faking It: How to Simulate Complex Data Generation Processes in R, Tidyverse Edition
    Data simulation is easily near the top of the long list of useful skills that are seldom taught in social science graduate programs. This is unfortunate given the central role of simulation in model checking, sensitivity analysis, and developing a basic understanding of modeling assumptions and often complex relationships between the phenomena social scientists aspire to understand. My aim in this blog post is thus to provide a basic introduction to data simulation and parameter recovery in R for cross-sectional time series and non-nested data structures commonly encountered in political science and international relations...
  • Ever wondered how the probability of the null hypothesis being true changes given a significant result?
    In a recently accepted paper...we discuss how, using Bayes' rule, one can explore the change in the probability of a null hypothesis being true (call it theta) when you get a significant effect. The paper...shows that theta does not necessarily change much even if you get a significant result. The probability theta can change dramatically under certain conditions, but those conditions are either so stringent or so trivial that it renders many of the significance-based conclusions in psychology and psycholinguistics questionable at the very least...You can do your own simulations...using this shiny app below...
  • All the talks and the Q&As from the #Outlier2022 Data Viz Conference
    All the curated talks, lightning talks and Q&As from the 2022 edition of the Outlier conference...For all the #dataviz enthusiasts out there: bookmark this playlist by @DataVizSociety and @OutlierConf. #datajournalism...
  • Bad ML Abstractions I (Generative vs Discriminative Models)
    This post is part of a series on bad abstractions in machine learning...Bad Abstraction: There are two types of machine learning models. Discriminative models are trained to separate inputs into classes, while generative models learn a distribution from which they can draw new samples...These two categories are not actually distinct...
  • A Robot Web for Distributed Many-Device Localisation
    We show that a distributed network of robots or other devices which make measurements of each other can collaborate to globally localise via efficient ad-hoc peer-to-peer communication. Our Robot Web solution is based on Gaussian Belief Propagation on the fundamental non-linear factor graph describing the probabilistic structure of all of the observations robots make internally or of each other, and is flexible for any type of robot, motion or sensor...
  • Probability Distributions To Be Aware Of For Data Science (With Code)
    Knowing the distribution of data helps us better model the world around us. It helps us determine the likelihood of various outcomes, or estimate the variability of an occurrence. All of this makes knowing different probability distributions extremely valuable in data science & machine learning...In this article, we are going to cover a few distributions and share some Python code to display them visually...
  • A Tour of Visualization Techniques for Computer Vision Datasets
    We survey a number of data visualization techniques for analyzing Computer Vision (CV) datasets. These techniques help us understand properties and latent patterns in such data, by applying dataset-level analysis. We present various examples of how such analysis helps predict the potential impact of the dataset properties on CV models and informs appropriate mitigation of their shortcomings. Finally, we explore avenues for further visualization techniques of different modalities of CV datasets as well as ones that are tailored to support specific CV tasks and analysis needs...
  • On NYT Magazine on AI: Resist the Urge to be Impressed
    On April 15, 2022, Steven Johnson published a piece in the New York Times Magazine entitled “A.I. Is Mastering Language. Should We Trust What It Says?”...Emily M. Bender, Professor, Linguistics, University of Washington, unpacks a recent NYT Magazine article on the future of AI and language models...
  • The Distributed Information Bottleneck reveals the explanatory structure of complex systems
    The fruits of science are relationships made comprehensible, often by way of approximation. While deep learning is an extremely powerful way to find relationships in data, its use in science has been hindered by the difficulty of understanding the learned relationships. The Information Bottleneck (IB) is an information theoretic framework for understanding a relationship between an input and an output in terms of a trade-off between the fidelity and complexity of approximations to the relationship. Here we show that a crucial modification -- distributing bottlenecks across multiple components of the input -- opens fundamentally new avenues for interpretable deep learning in science...
  • Comprehensive Guide to GitHub for Data Scientists
    The purpose of this article is to give data scientists / analysts (or any non-engineering-focused individual) the rundown on how to use GitHub and what best practices to adhere to. The tutorial will consist of a combination of guidelines using the UI and the command line (terminal). The naming conventions for Git commands are consistent across the platforms provided by GitHub, so the skills should transfer if you prefer to use GitHub Desktop or GitLab instead of the web UI or command line. The following is the outline for the article...
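The Bayes'-rule argument in the null-hypothesis item above can be illustrated with a simplified, fixed-value version: if theta is the prior probability that H0 is true, then after a significant result the updated probability is alpha * theta / (alpha * theta + power * (1 - theta)). The sketch below assumes single point values for alpha, power and theta; the paper's own analysis may treat these quantities differently, so this is only a back-of-the-envelope illustration, and `posterior_null` is a hypothetical helper name.

```python
def posterior_null(prior_null, alpha=0.05, power=0.80):
    """P(H0 true | significant result) by Bayes' rule.

    prior_null: theta, the probability that H0 is true before the study.
    alpha:      P(significant | H0 true), the Type I error rate.
    power:      P(significant | H0 false).
    """
    sig_and_null = alpha * prior_null          # P(significant and H0 true)
    sig_and_alt = power * (1.0 - prior_null)   # P(significant and H0 false)
    return sig_and_null / (sig_and_null + sig_and_alt)

# Prior theta = 0.5; how much does one significant result move it?
for power in (0.80, 0.10, 0.05):
    print(power, round(posterior_null(0.5, power=power), 3))
# 0.8 0.059
# 0.1 0.333
# 0.05 0.5
```

With high power a significant result moves theta a long way, but as power approaches alpha the update shrinks toward nothing, which is consistent with the point that significance alone need not change theta much.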
 
 

Summit*

 



You're invited to the first-ever Metrics Store Summit

Transform is hosting the first-ever industry summit on the metrics layer. The Metrics Store Summit on April 26, 2022 will bring discussions around the semantic layer into one event, providing context with use cases for metrics stores, highlighting applications for metrics, and sharing ideas from leaders across the modern data stack. You can expect to hear from Airbnb, Slack, Spotify, Atlan, Hex, Mode, Hightouch, AtScale and many more in this action-packed 1-day event. We would love to see you there! Register today for free.


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Data Scientist - Hungryroot - Remote

    Hungryroot is looking for a Data Scientist to join our growing Data Team. As a Data Scientist, you will work closely with other Data Scientists and Data Engineers to develop various Machine Learning models that power Hungryroot and its AI functions. These models include traditional forecasting models, as well as more industry-specific optimization challenges.

    As a Data Scientist at Hungryroot, you will work on answering questions like: How do you tell what food someone would like to eat this week? How do you determine whether they enjoyed it? Maybe they liked their meals last week but are now looking for different options. Maybe they like the same food on Tuesdays but variety on Fridays. And what about spicy food: is Green Chili as spicy as Green Curry?

     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 
 

Training & Resources

 
  • R Graphics Cookbook, 2nd edition
    Welcome to the R Graphics Cookbook, a practical guide that provides more than 150 recipes to help you generate high-quality graphs quickly, without having to comb through all the details of R’s graphing systems. Each recipe tackles a specific problem with a solution you can apply to your own project, and includes a discussion of how and why the recipe works...Read online here for free, or buy a physical copy...
  • What are Diffusion Models? [Video]
    This short tutorial covers the basics of diffusion models, a simple yet expressive approach to generative modeling. They've been behind a recent string of impressive results, including OpenAI's DALL-E 2...
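As a concrete anchor for the diffusion-models tutorial above, the sketch below shows only the forward (noising) half of a DDPM-style model: a variance schedule beta_t, its cumulative product alpha_bar_t, and the closed-form sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. The linear schedule from 1e-4 to 0.02 over 1000 steps is an assumed, commonly used setting, not something taken from the video; the learned reverse (denoising) network is omitted, and the function names are hypothetical.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_t spaced linearly (assumed DDPM-style defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out

def q_sample(x0, t, alpha_bars, noise=None):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot (t is 0-indexed)."""
    if noise is None:
        noise = [random.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]

random.seed(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
x0 = [1.0, -0.5, 0.25]
print(q_sample(x0, 0, alpha_bars))    # still very close to x0
print(q_sample(x0, 999, alpha_bars))  # essentially pure Gaussian noise
```

Because alpha_bar_t has a closed form, training can jump straight to any step t instead of adding noise one step at a time; the model is then trained to undo that noise.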
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
