Data Science Weekly - Issue 416

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #416

November 11 2021

Editor Picks
 
  • Lessons on ML Platforms — from Netflix, DoorDash, Spotify, and more
    Your data scientists produce wonderful models, but they can only deliver value once the models are integrated into your production systems...Through scouring conference talks and blog posts from the past several years, I’ve documented ML platforms’ common components and capabilities at eleven large tech companies...This post contains: a) A high-level overview of common ML platform components, b) A table of tools used by each company, c) Observations about the components, d) The platform user experience, and e) A summary of capabilities unique to certain companies...
  • From Data Engineer to SysAdmin: Put down the K8s cluster, your pipelines can run without it
    I’ve been operating Kubernetes (using EKS) in a data engineering team for almost three years now, and I’d be wary of using it if I had the choice in the future. This isn’t an anti-Kubernetes post, as I think Kubernetes (K8s) is a game-changing technology and would bet that our team’s investment in K8s will pay off over the longer term as our engineering headcount grows past one thousand. This post is a ‘be careful, you are not Google’ type post, with some specifics on how K8s ownership has proved frustrating and unsatisfying to an engineer whose actual goal is to help the business understand itself and build better products using data & ML...
  • GPT-3 is No Longer the Only Game in Town
    GPT-3 was by far the largest AI model of its kind last year. Now? Not so much...the ability of people to build upon GPT-3 was hampered by one major factor: it was not publicly released...So, since last year multiple organizations have worked towards creating their own version of GPT-3, and as I’ll go over in this article, at this point roughly half a dozen such gigantic GPT-3-esque models have been developed (though as with GPT-3, not yet publicly released)...
 
 

A Message from this week's Sponsor:

 



Pull data at any scale from your data warehouse

PostHog is an open-source product analytics platform that can ingest data at any scale, even from data warehouses based on BigQuery, Snowflake, S3 or Redshift.

Once your data is in PostHog you can analyse it using funnels, trends, pathing visualizations and more. You can even integrate with other platforms, creating a data pipeline for on-going analysis.

Best of all, you can deploy PostHog on your own infrastructure in minutes.

Deploy PostHog today for free.

 

 

Data Science Articles & Videos

 
  • The Turing Test Is Bad For Business
    Fears of artificial intelligence fill the news...The one group everyone assumes will benefit is business, but the data seems to disagree. Amid all the hype, US businesses have been slow in adopting the most advanced AI technologies, and there is little evidence that such technologies are contributing significantly to productivity growth or job creation...Turing himself, and other technology pioneers such as Douglas Engelbart and Norbert Wiener, understood that computers would be most useful to business and society when they augmented and complemented human capabilities, not when they competed directly with us...
  • The difference between outlier detection and data drift detection
    When monitoring ML models in production, we can apply different techniques...Data drift and outlier detection are among those. Both are useful when we do not have ground truth labels yet. The data is then the only thing to look at...There are various statistical approaches to detect either (an interesting discussion by itself!), but also a principal difference...
  • Improving a Machine Learning System (Part 1 - Broken Abstractions)
    Suppose you have been hired to apply state of the art machine learning technology to improve the Foo vs Bar classifier at FooBar International. Foo vs Bar classification is a critical business need for FooBar International, and the company has been using a simple system based on a decade-old machine learning technology to solve this problem for the last several years...To your surprise, your new model substantially underperforms compared to the existing system...This is a familiar story that anybody who has built machine learning models at a large company will recognize. Making measurable improvements to a mature machine learning system is extremely difficult. In this post, we will explore why...
  • Machine Learning from a Bayesian Perspective [PDF]
    I summarize a Bayesian perspective of machine learning. We view Bayes as an optimization problem whose solutions use the information-geometry of the posterior. Using this perspective, we can show that many machine-learning methods have a (more general) Bayesian side to them. I believe this perspective to be essential for bridging the gap between ‘artificial’ and ‘natural’ learning systems...
  • DALL·E mini: Zero-Shot Text-to-Image Generation [Video]
    The ability to control image generation with natural language is very fascinating and opens a lot of new opportunities in the field of multimodal machine learning. OpenAI's recent blog about their DALL·E project shows the potential of models, but unfortunately, the model has not been released...Our goal here with DALL·E mini is to show that one can still achieve reasonable performance on this multimodal task with far more accessible means of compute. Even though DALL·E mini is about 30 times smaller than the original and trained on a much smaller dataset, it demonstrates interesting zero-shot capabilities...In this talk, we will get to know DALL·E mini in detail, and explain how it is capable of achieving such results thanks to the use of pre-trained models such as the VQ-GAN and BART. We will dig deeper into the theoretical aspects of these models to understand what happens under the hood in the DALL·E mini pipeline...
  • Updates and Lessons from AI Forecasting
    Earlier this year, my research group commissioned 6 questions for professional forecasters to predict about AI. Broadly speaking, 2 were on geopolitical aspects of AI and 4 were on future capabilities...My overall take from this task and the previous one is that forecasters are pretty confident that we won't have the singularity before 2025, but at the same time there will be demonstrated progress in ML that I would expect to convince a significant fraction of skeptics (in the sense that it will look untenable to hold positions that "Deep learning can't do X")...
  • An Introduction to Language Models in NLP (Part 1: Intuition)
    This post provides an overview of a couple key concepts surrounding language models: a) We define a language model as an algorithm that scores how "human" a sentence is, b) We describe a way to train language models: by observing language and turning these observations into probabilities, and c) We discuss a couple approaches to evaluating the quality of language models: human evaluation (did the robot responses sound natural to a human?), downstream tasks (did the robot responses lead to actual food?), and intrinsic evaluations (how perplexed were the robots by the human utterances?)...
  • Gradients are Not All You Need
    Differentiable programming techniques are widely used in the community and are responsible for the machine learning renaissance of the past several decades. While these methods are powerful, they have limits. In this short report, we discuss a common chaos based failure mode which appears in a variety of differentiable circumstances, ranging from recurrent neural networks and numerical physics simulation to training learned optimizers. We trace this failure to the spectrum of the Jacobian of the system under study, and provide criteria for when a practitioner might expect this failure to spoil their differentiation based optimization algorithms...
  • Clarity and Aesthetics in Data Visualization: Guidelines
    We built an initial set of guidelines that are based on two elements. First, they come from observing actual problems we found over and over again in the solutions submitted to the mini-projects. In this sense the guidelines just emerged from practice. Second, they come from trying to justify our intuitions on notions of visual perception. In this sense the guidelines also rest on considerations stemming from visual perception. In this post I am going to focus on the guidelines...
 
 

Tools*

 


Create AI-powered search and recommendation apps with Pinecone

Pinecone is a fully managed vector database that makes it easy to add vector search to production applications. It combines state-of-the-art vector search libraries, advanced features such as filtering, and distributed infrastructure to provide high performance and reliability at any scale. Get started now — it's free!


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Entry Level Data Scientist: 2022 - IBM - Multiple Locations

    As a Data Scientist at IBM, you will help transform our clients’ data into tangible business value by analyzing information, communicating outcomes and collaborating on product development. Work with Best in Class open source and visual tools, along with the most flexible and scalable deployment options. Whether it’s investigating patient trends or weather patterns, you will work to solve real world problems for the industries transforming how we live.

        Want to post a job here? Email us for details >> team@datascienceweekly.org

 
 

Training & Resources

 
  • The Ancient Secrets of Computer Vision - An Introduction to Computer Vision
    This class is a general introduction to computer vision. It covers standard techniques in image processing like filtering, edge detection, stereo, flow, etc. (old-school vision), as well as newer, machine-learning based computer vision. It was originally offered in the spring of 2018 at the University of Washington...
  • Deep Learning With PyTorch - 5 Hour Full YouTube Course
    In this course you learn all the fundamentals to get started with PyTorch and Deep Learning: a) Intro, b) Installation, c) Tensor Basics, d) Autograd, e) Backpropagation, f) Gradient Descent, g) Training Pipeline, h) Linear Regression, i) Logistic Regression, j) Dataset and Dataloader, k) Dataset Transforms, l) Softmax and Crossentropy, m) Activation Functions, n) Feed Forward Net, o) CNN, p) Transfer Learning, q) Tensorboard, and r) Save & Load Models...
  • How to create a Hex Tile Grid Map in Excel
    In a previous blog post I showed you how to build a Grid Map with circles using Excel charting capability. In this blog post I’m going to start off from where we left it and use the same data and graph to transform it into the hex tile grid map—as per the below graph showing the US Death Penalty Status in 2020...
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.
