Data Science Weekly - Data Science Weekly - Issue 424

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments
Email not displaying correctly?
View it in your browser.

Issue #424

January 06 2022

Editor Picks
 
  • Managing the First Year - Thoughts on being a new data science manager
    It was during my second week that I met with my manager and understood that no, I’d been hired to replace her as the team’s manager...For the next 18 months I stayed in that role, directly managing a team of 4-8 data scientists. That time was a firehose of learning – some lessons I had sought out, and others than landed on my head without invitation...I’m not a management expert, but I did try really hard during my first year managing, and I’ve since spent time digesting the experience. My hope is that others will find a few of the things I learned useful when they’re at the start of their own management journey...
  • Chatbots: Still Dumb After All These Years
    Intelligence is more than statistically appropriate responses...I posed this commonsense question: "Is it safe to walk downstairs backwards if I close my eyes?"...Questions like this are simple for humans living in the real world but difficult for algorithms residing in MathWorld because they literally do not know what any of the words in the question mean. GPT-3’s answer was authoritative, confusing, and contradictory...
  • Real-time machine learning: challenges and solutions
    A year ago, I wrote a post on how machine learning is going real-time. The post must have captured many data scientists’ pain points because, after the post, many companies reached out to me sharing their pain points and discussing how to move their pipelines real time...In the last year, I’ve talked to ~30 companies in different industries about their challenges with real-time machine learning. I’ve also worked with quite a few to find the solutions. This post outlines the solutions for (1) online prediction and (2) continual learning, with step-by-step use cases, considerations, and technologies required for each level...
 
 

A Message from this week's Sponsor:

 



Free Course: Natural Language Processing (NLP) for Semantic Search

Learn how to build semantic search applications by making machines understand language as people do. This free course covers everything you need to build state-of-the-art language models, from machine translation to question-answering, and more. Brought to you by Pinecone. Start reading now.

 

 

Data Science Articles & Videos

 
  • Real-World Machine Learning Research To Production [Video]
    In this talk, Austin Huang (Vice President, AI & Machine Learning, Fidelity) explains how machine learning use cases have changed - evolving from batch prediction pipelines to real-time consumers of unstructured data. These use cases have also given rise to new opportunities for innovation in model development. Whereas in the past machine learning projects were often impeded by the availability of labeled data, we share examples of programmatic data generation such as simulation and distillation. Finally, we discuss human interfaces to machine learning models - highlighting considerations such as inference latency and aligning model architectures with user experience integration...
  • Defining AI in Policy versus Practice
    With an eye towards practical working definitions and a broader understanding of positions on these issues, we survey experts and review published policy documents to examine researcher and policy-maker conceptions of AI. We find that while AI researchers favor definitions of AI that emphasize technical functionality, policy-makers instead use definitions that compare systems to human thinking and behavior...
  • Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering
    Through systematic ablation experiments and qualitative visualizations, we verify that collective outliers are a general phenomenon responsible for degrading pool-based active learning. Notably, we show that active learning sample efficiency increases significantly as the number of collective outliers in the active learning pool decreases. We conclude with a discussion and prescriptive recommendations for mitigating the effects of these outliers in future work...
  • Neural Network From Scratch
    In this edition of Napkin Math, we'll invoke the spirit of the Napkin Math series to establish a mental model for how a neural network works by building one from scratch. In a future issue we will do napkin math on performance, as establishing the first-principle understanding is plenty of ground to cover for today!...
  • Cogram.ai: A Coding Assistant for Data Science and Machine Learning
    Since the publication and dissemination of GPT-3, coding assistants like Github copilot, powered by OpenAi’s codex API have been on the radar of the machine learning community for quite a while. Recently, I came across this tool called Cogram, which seems to be a type of evolution of autocompletion, specialized for data science and machine learning that runs directly on Jupyter Notebooks. In this article, I will show you how this tool works and share a little bit of my experience with it so far, generating machine learning code on Jupyter Notebooks...
  • “My data drifted. What’s next?” How to handle ML model drift in production.
    I have a model in production, and the data is drifting. How to react?”...This data drift might be the only signal. You are predicting something, but don’t know the facts yet. Statistical change in model inputs and outputs is the proxy. The data has shifted, and you suspect a decay in the model performance...In other cases, you can know it for sure. You can calculate the model quality or business metrics. Accuracy, mean error, fraud rates, you name it. The performance got worse, and the data is different, too...What can you do next?...Here is an introductory overview of the possible steps...
  • Bayesian Statistics Overview and your first Bayesian Linear Regression Model
    A brief recap of Bayesian Learning followed by implementation of a Bayesian Linear Regression Model on NYC Airbnb open dataset...When I first started researching about this, I had many questions like, when is it beneficial to use Bayesian, how does the output differ from its non-Bayesian counterpart (Frequentist), how to define prior distribution, are there existing libraries in python for estimating posterior distribution, etc. I attempt to answer all these questions in this post, while keeping it brief...
  • Building models in JAX - Part 1
    I am starting a whole new series of tutorials where we will learn about the existing methods of building models in JAX. In this tutorial, we are going to build an image classifier purely in JAX. Here is the list of things that we will cover in this notebook: 1) Use the Cifar-10 dataset for training the classifier, 2) Build a classifier purely in JAX using no library other than JAX, 3) Data augmentation purely in JAX, 4) Create a custom training/testing loop in the most simplified manner, and 5) Discuss the pros and cons of this approach...
  • Effective Testing for Machine Learning (Part II)
    A progressive, step-by-step framework for developing robust ML projects...In this series’s first part, we started with a simple smoke testing strategy to ensure our code runs on every git push. Then, we built on top of it to ensure that our feature generation pipeline produced data with a minimum level of quality (integration tests) and verified the correctness of our data transformations (unit tests)...Now, we’ll add more robust tests: distribution changes, ensure that our training and serving logic is consistent, and check that our pipeline produces high-quality models...
  • The Magic of Integrating Factor
    One of the many techniques for solving ordinary differential equations involves using an integrating factor. An integrating factor is a function that we multiply a differential equation with to simplify it and make it integrable. It almost appears to work like magic!...
 
 

Tools*

 



High quality data labeling, consistently

Edge cases are the most common challenges that ML teams face when training their AI models, making it difficult to reach 95+% accuracy. This can be more complex once you need to scale and start working with 3rd party data labeling solutions. The evaluation metrics that we use to measure the quality of labeled data - Intersection over Union (IOU) and F1 score - has allowed us to make swift adjustments on the go and continuously improve the quality of our labeling standards. To find out more and start exploring our end-to-end data labeling service, speak to the team at Supahands today.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • (Senior) Analytics Engineer - Fabulous - Remote

    Fabulous is a mobile app helping thousands of people every day to change their lifestyles by integrating healthy habits into their lives. Fabulous is using a behavioral economics lens to help everyone achieve their fullest potential. We work closely with researchers based at Duke University and our advisor is Dan Ariely, author of NYT bestseller Predictably Irrational. We are looking for an experienced Analytics Engineer to consolidate the Data Science team and lead the development and enrichment of our Data Pipelines. We have a modern Data-Stack based on Fivetran, dbt, BigQuery, Amplitude, Metabase...

        Want to post a job here? Email us for details >> team@datascienceweekly.org

 
 

Training & Resources

 
  • ISLR tidymodels Labs
    This book aims to be a complement to the 2nd version An Introduction to Statistical Learning book with translations of the labs into using the tidymodels set of packages...The labs will be mirrored quite closely to stay true to the original material...
  • Deep Learning Interviews: Hundreds of fully solved job interview questions from a wide range of key topics in AI
    The second edition of Deep Learning Interviews is home to hundreds of fully-solved problems, from a wide range of key topics in AI. It is designed to both rehearse interview or exam specific topics and provide machine learning MSc / PhD. students, and those awaiting an interview a well-organized overview of the field. The problems it poses are tough enough to cut your teeth on and to dramatically improve your skills-but they're framed within thought-provoking questions and engaging stories...
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Follow on Twitter
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.
unsubscribe from this list    update subscription preferences 

Older messages

Data Science Weekly - Issue 423

Friday, December 31, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #423 December 30 2021 Editor Picks 2021:

Data Science Weekly - Issue 422

Friday, December 24, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #422 December 23 2021 Editor Picks

[in case you missed it] Data Science Weekly - Issue 421

Sunday, December 19, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #421 December 16 2021 Editor Picks Lee

Data Science Weekly - Issue 421

Friday, December 17, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #421 December 16 2021 Editor Picks Lee

Data Science Weekly - Issue 420

Friday, December 10, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #420 December 09 2021 Editor Picks D3

You Might Also Like

Re: Hackers may have stolen everyone's SSN!

Saturday, November 23, 2024

I wanted to make sure you saw Incogni's Black Friday deal, which is exclusively available for iPhone Life readers. Use coupon code IPHONELIFE to save 58%. Here's why we recommend Incogni for

North Korean Hackers Steal $10M with AI-Driven Scams and Malware on LinkedIn

Saturday, November 23, 2024

THN Daily Updates Newsletter cover Generative AI For Dummies ($18.00 Value) FREE for a Limited Time Generate a personal assistant with generative AI Download Now Sponsored LATEST NEWS Nov 23, 2024

📧 Building Async APIs in ASP.NET Core - The Right Way

Saturday, November 23, 2024

​ Building Async APIs in ASP .NET Core - The Right Way Read on: m​y website / Read time: 5 minutes The .NET Weekly is brought to you by: Even the smartest AI in the world won't save you from a

WebAIM November 2024 Newsletter

Friday, November 22, 2024

WebAIM November 2024 Newsletter Read this newsletter online at https://webaim.org/newsletter/2024/november Features Using Severity Ratings to Prioritize Web Accessibility Remediation When it comes to

➡️ Why Your Phone Doesn't Want You to Sideload Apps — Setting the Default Gateway in Linux

Friday, November 22, 2024

Also: Hey Apple, It's Time to Upgrade the Macs Storage, and More! How-To Geek Logo November 22, 2024 Did You Know Fantasy author JRR Tolkien is credited with inventing the main concept of orcs and

JSK Daily for Nov 22, 2024

Friday, November 22, 2024

JSK Daily for Nov 22, 2024 View this email in your browser A community curated daily e-mail of JavaScript news React E-Commerce App for Digital Products: Part 4 (Creating the Home Page) This component

Spyglass Dispatch: The Fate of Chrome • Amazon Tops Up Anthropic • Pros Quit Xitter • Brave Powers AI Search • Apple's Lazy AI River • RIP Enrique Allen

Friday, November 22, 2024

The Fate of Chrome • Amazon Tops Up Anthropic • Pros Quit Xitter • Brave Powers AI Search • Apple's Lazy AI River • RIP Enrique Allen The Spyglass Dispatch is a free newsletter sent out daily on

Charted | How the Global Distribution of Wealth Has Changed (2000-2023) 💰

Friday, November 22, 2024

This graphic illustrates the shifts in global wealth distribution between 2000 and 2023. View Online | Subscribe | Download Our App Presented by: MSCI >> Get the Free Investor Guide Now FEATURED

Daily Coding Problem: Problem #1616 [Easy]

Friday, November 22, 2024

Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Alibaba. Given an even number (greater than 2), return two prime numbers whose sum will

The problem to solve

Friday, November 22, 2024

​ Use problem framing to define the problem to solve This week, Tom Parson and Krishna Raha share tools and frameworks to identify and address challenges effectively, while Voltage Control highlights