Data Science Weekly - Issue 424

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #424

January 06 2022

Editor Picks
 
  • Managing the First Year - Thoughts on being a new data science manager
    It was during my second week that I met with my manager and understood that no, I’d been hired to replace her as the team’s manager...For the next 18 months I stayed in that role, directly managing a team of 4-8 data scientists. That time was a firehose of learning – some lessons I had sought out, and others that landed on my head without invitation...I’m not a management expert, but I did try really hard during my first year managing, and I’ve since spent time digesting the experience. My hope is that others will find a few of the things I learned useful when they’re at the start of their own management journey...
  • Chatbots: Still Dumb After All These Years
    Intelligence is more than statistically appropriate responses...I posed this commonsense question: "Is it safe to walk downstairs backwards if I close my eyes?"...Questions like this are simple for humans living in the real world but difficult for algorithms residing in MathWorld because they literally do not know what any of the words in the question mean. GPT-3’s answer was authoritative, confusing, and contradictory...
  • Real-time machine learning: challenges and solutions
    A year ago, I wrote a post on how machine learning is going real-time. The post must have captured many data scientists’ pain points because, after the post, many companies reached out to me sharing their pain points and discussing how to move their pipelines real time...In the last year, I’ve talked to ~30 companies in different industries about their challenges with real-time machine learning. I’ve also worked with quite a few to find the solutions. This post outlines the solutions for (1) online prediction and (2) continual learning, with step-by-step use cases, considerations, and technologies required for each level...
 
 

A Message from this week's Sponsor:

 



Free Course: Natural Language Processing (NLP) for Semantic Search

Learn how to build semantic search applications by making machines understand language as people do. This free course covers everything you need to build state-of-the-art language models, from machine translation to question-answering, and more. Brought to you by Pinecone. Start reading now.

 

 

Data Science Articles & Videos

 
  • Real-World Machine Learning Research To Production [Video]
    In this talk, Austin Huang (Vice President, AI & Machine Learning, Fidelity) explains how machine learning use cases have changed - evolving from batch prediction pipelines to real-time consumers of unstructured data. These use cases have also given rise to new opportunities for innovation in model development. Whereas in the past machine learning projects were often impeded by the availability of labeled data, we share examples of programmatic data generation such as simulation and distillation. Finally, we discuss human interfaces to machine learning models - highlighting considerations such as inference latency and aligning model architectures with user experience integration...
  • Defining AI in Policy versus Practice
    With an eye towards practical working definitions and a broader understanding of positions on these issues, we survey experts and review published policy documents to examine researcher and policy-maker conceptions of AI. We find that while AI researchers favor definitions of AI that emphasize technical functionality, policy-makers instead use definitions that compare systems to human thinking and behavior...
  • Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering
    Through systematic ablation experiments and qualitative visualizations, we verify that collective outliers are a general phenomenon responsible for degrading pool-based active learning. Notably, we show that active learning sample efficiency increases significantly as the number of collective outliers in the active learning pool decreases. We conclude with a discussion and prescriptive recommendations for mitigating the effects of these outliers in future work...
  • Neural Network From Scratch
    In this edition of Napkin Math, we'll invoke the spirit of the Napkin Math series to establish a mental model for how a neural network works by building one from scratch. In a future issue we will do napkin math on performance, as establishing the first-principle understanding is plenty of ground to cover for today!...
  • Cogram.ai: A Coding Assistant for Data Science and Machine Learning
    Since the publication and dissemination of GPT-3, coding assistants like GitHub Copilot, powered by OpenAI’s Codex API, have been on the radar of the machine learning community for quite a while. Recently, I came across this tool called Cogram, which seems to be a type of evolution of autocompletion, specialized for data science and machine learning, that runs directly on Jupyter Notebooks. In this article, I will show you how this tool works and share a little bit of my experience with it so far, generating machine learning code on Jupyter Notebooks...
  • “My data drifted. What’s next?” How to handle ML model drift in production.
    “I have a model in production, and the data is drifting. How to react?”...This data drift might be the only signal. You are predicting something, but don’t know the facts yet. Statistical change in model inputs and outputs is the proxy. The data has shifted, and you suspect a decay in the model performance...In other cases, you can know it for sure. You can calculate the model quality or business metrics. Accuracy, mean error, fraud rates, you name it. The performance got worse, and the data is different, too...What can you do next?...Here is an introductory overview of the possible steps...
  • Bayesian Statistics Overview and your first Bayesian Linear Regression Model
    A brief recap of Bayesian Learning followed by implementation of a Bayesian Linear Regression Model on NYC Airbnb open dataset...When I first started researching about this, I had many questions like, when is it beneficial to use Bayesian, how does the output differ from its non-Bayesian counterpart (Frequentist), how to define prior distribution, are there existing libraries in python for estimating posterior distribution, etc. I attempt to answer all these questions in this post, while keeping it brief...
  • Building models in JAX - Part 1
    I am starting a whole new series of tutorials where we will learn about the existing methods of building models in JAX. In this tutorial, we are going to build an image classifier purely in JAX. Here is the list of things that we will cover in this notebook: 1) Use the Cifar-10 dataset for training the classifier, 2) Build a classifier purely in JAX using no library other than JAX, 3) Data augmentation purely in JAX, 4) Create a custom training/testing loop in the most simplified manner, and 5) Discuss the pros and cons of this approach...
  • Effective Testing for Machine Learning (Part II)
    A progressive, step-by-step framework for developing robust ML projects...In this series’s first part, we started with a simple smoke testing strategy to ensure our code runs on every git push. Then, we built on top of it to ensure that our feature generation pipeline produced data with a minimum level of quality (integration tests) and verified the correctness of our data transformations (unit tests)...Now, we’ll add more robust tests: distribution changes, ensure that our training and serving logic is consistent, and check that our pipeline produces high-quality models...
  • The Magic of Integrating Factor
    One of the many techniques for solving ordinary differential equations involves using an integrating factor. An integrating factor is a function that we multiply a differential equation with to simplify it and make it integrable. It almost appears to work like magic!...
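
    For readers who want the mechanics behind the last item above, here is a minimal worked sketch of the integrating-factor technique for a first-order linear ODE. The symbols p, q, and \mu and the example equation are our own illustrative choices and are not taken from the linked article.

```latex
% General form (illustrative notation): y' + p(x) y = q(x)
% Integrating factor: \mu(x) = \exp\left(\int p(x)\,dx\right), so that \mu' = p\,\mu.
\begin{align*}
  \mu y' + \mu p\, y &= \mu q \\
  (\mu y)' &= \mu q \\
  y(x) &= \frac{1}{\mu(x)} \left( \int \mu(x)\, q(x)\, dx + C \right)
\end{align*}

% Worked example: y' + 2y = e^{-x}, so p(x) = 2 and \mu(x) = e^{2x}.
\begin{align*}
  (e^{2x} y)' &= e^{2x} e^{-x} = e^{x} \\
  e^{2x} y &= e^{x} + C \\
  y(x) &= e^{-x} + C e^{-2x}
\end{align*}
```

    Substituting back confirms the result: y' + 2y = (-e^{-x} - 2Ce^{-2x}) + (2e^{-x} + 2Ce^{-2x}) = e^{-x}.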
 
 

Tools*

 



High quality data labeling, consistently

Edge cases are the most common challenges that ML teams face when training their AI models, making it difficult to reach 95+% accuracy. This can be more complex once you need to scale and start working with 3rd party data labeling solutions. The evaluation metrics that we use to measure the quality of labeled data - Intersection over Union (IOU) and F1 score - have allowed us to make swift adjustments on the go and continuously improve the quality of our labeling standards. To find out more and start exploring our end-to-end data labeling service, speak to the team at Supahands today.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
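
For readers unfamiliar with the two labeling-quality metrics named in the Tools item above, here is a minimal sketch of how Intersection over Union and F1 score are commonly computed. The function names, the (x_min, y_min, x_max, y_max) box format, and the sample numbers are our own assumptions for illustration and do not describe any vendor's actual pipeline.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples.
    """
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp, fp, fn):
    """F1 from counts of true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical reference label vs. an annotator's box.
    reference = (0, 0, 2, 2)
    annotated = (1, 1, 3, 3)
    print(f"IoU: {iou(reference, annotated):.3f}")
    print(f"F1:  {f1_score(tp=8, fp=2, fn=2):.3f}")
```

A common convention is to count an annotation as a true positive when its IoU with the reference exceeds a chosen threshold, and then to summarize a batch with F1; the threshold is a project-specific choice.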

 
 

Jobs

 
  • (Senior) Analytics Engineer - Fabulous - Remote

    Fabulous is a mobile app helping thousands of people every day to change their lifestyles by integrating healthy habits into their lives. Fabulous is using a behavioral economics lens to help everyone achieve their fullest potential. We work closely with researchers based at Duke University and our advisor is Dan Ariely, author of NYT bestseller Predictably Irrational. We are looking for an experienced Analytics Engineer to consolidate the Data Science team and lead the development and enrichment of our Data Pipelines. We have a modern Data-Stack based on Fivetran, dbt, BigQuery, Amplitude, Metabase...

        Want to post a job here? Email us for details >> team@datascienceweekly.org

 
 

Training & Resources

 
  • ISLR tidymodels Labs
    This book aims to be a complement to the second edition of the An Introduction to Statistical Learning book, with translations of the labs into the tidymodels set of packages...The labs will be mirrored quite closely to stay true to the original material...
  • Deep Learning Interviews: Hundreds of fully solved job interview questions from a wide range of key topics in AI
    The second edition of Deep Learning Interviews is home to hundreds of fully-solved problems, from a wide range of key topics in AI. It is designed both to rehearse interview- or exam-specific topics and to provide machine learning MSc/PhD students, and those awaiting an interview, with a well-organized overview of the field. The problems it poses are tough enough to cut your teeth on and to dramatically improve your skills, but they're framed within thought-provoking questions and engaging stories...
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.
