Data Science Weekly - Issue 441

Curated news, articles and jobs related to Data Science.
Keep up with all the latest developments


Issue #441

May 5 2022

Editor Picks

How Gaussian Is It?
This article is an excerpt from the current draft of my book Probably Overthinking It, to be published by the University of Chicago Press in early 2023...How tall are you? How long are your arms? How far is it from the radiale landmark on your right elbow to the stylion landmark on your right wrist?...

Democratizing access to large-scale language models with OPT-175B
In line with Meta AI’s commitment to open science, we are sharing Open Pretrained Transformer (OPT-175B), a language model with 175 billion parameters trained on publicly available data sets, to allow for more community engagement in understanding this foundational new technology...

OPT-175B Logbook [PDF]
[Editor's note: click on the download button]...Goal: Get a 175B dense model up and running by any means necessary...Purpose of this document: To provide a source of truth of what we did, when, and why, and any context that was important to those decisions. To provide each other with a clear place to find information about what is happening without having to ping....

A Message from this week's Sponsor:

Free Course: Natural Language Processing (NLP) for Semantic Search

Learn how to build semantic search applications by making machines understand language as people do. This free course covers everything you need to build state-of-the-art language models, from machine translation to question-answering, and more. Brought to you by Pinecone. Start reading now.

Data Science Articles & Videos

JAX vs Julia (vs PyTorch)
A while ago there was an interesting thread on the Julia Discourse about the “state of machine learning in Julia”. I posted a response discussing the differences between Julia and Python (both JAX and PyTorch), and it seemed to be really well received!...Since then this topic seems to keep coming up, so I thought I’d tidy up that post and put it somewhere I could link to easily...To my mind JAX and Julia are unquestionably the current state-of-the-art frameworks for autodifferentiation, scientific computing, and ML computing. So let’s dig into the differences....

Working on build systems full-time at Meta
Summary: I joined Meta 2.5 years ago to work on build systems. I’m enjoying it...I'll cover what I’ve learnt about build systems as well as what's different moving from finance to tech...

Advances in Neural Compression with Auke Wiggers
Today we’re joined by Auke Wiggers, an AI research scientist at Qualcomm...we discuss his team’s recent research on data compression using generative models. We discuss the relationship between historical compression research and the current trend of neural compression, and the benefit of neural codecs, which learn to compress data from examples. We also explore the performance evaluation process and the recent developments that show that these models can operate in real-time on a mobile device. Finally, we discuss another ICLR paper, “Transformer-based transform coding”, that proposes a vision transformer-based architecture for image and video coding...

Training Language Models with Natural Language Feedback
Pretrained language models often do not perform tasks in ways that are in line with our preferences, e.g., generating offensive text or factually incorrect summaries. Recent work approaches the above issue by learning from a simple form of human evaluation: comparisons between pairs of model-generated task outputs. Comparison feedback conveys limited information about human preferences per human evaluation. Here, we propose to learn from natural language feedback, which conveys more information per human evaluation. We learn from language feedback on model outputs using a three-step learning algorithm...

What Data Visualization Reveals: Elizabeth Palmer Peabody and the Work of Knowledge Production
This essay offers the chronological charts of Elizabeth Palmer Peabody (1804–1894), the 19th-century educator and intellectual, as early examples of how data visualization can reveal a range of forms of knowledge. It challenges the universality of the goals of clarity and efficiency when designing data visualizations, and argues for the value of visualizations that encourage sustained reflection and imaginative response...

Hiring Data Scientists With Intention
I met Tara Robertson in 2019 when I joined Mozilla, where she was the Global Diversity and Inclusion Lead at the time. When I needed to grow my team, Tara and I worked together to develop an inclusive hiring process. Since then, Tara and I have kept the conversation going and wanted to share some of our thoughts here!...

Handling and Presenting Harmful Text
Textual data can pose a risk of serious harm. These harms can be categorised along three axes: (1) the harm type, (2) whether it is elicited as a feature of the research design from directly studying harmful content, and (3) who it affects...It is an unsolved problem in NLP as to how textual harms should be handled, presented, and discussed; but, stopping work on content which poses a risk of harm is untenable. Accordingly, we provide practical advice and introduce HARMCHECK, a resource for reflecting on research into textual harms...

Datacast Episode 90: Operational Analytics, Reverse ETL, And Finding Product-Market Fit With Kashish Gupta
Our wide-ranging conversation touches on his education at the University of Pennsylvania studying Computer Science; his learning about venture capital at Bessemer Venture Partners; his first startup Carry that went through Y Combinator; his current journey with Hightouch building a data activation platform; lessons learned creating the Operational Analytics category, pivoting through various startup ideas, identifying design partners, hiring talent, fundraising; and much more...

MLOps principles I think every ML platform should have [Twitter Thread]
I probably should have written this years ago, but here are some MLOps principles I think every ML platform (codebase, data management platform) should have...

New from Anaconda: Python in the Browser
Say Hello to PyScript...PyScript is a framework that allows users to create rich Python applications in the browser using a mix of Python with standard HTML. PyScript aims to give users a first-class programming language that has consistent styling rules, is more expressive, and is easier to learn...What is PyScript? Well, here are some of the core components...

Vanishing Gradients Podcast Episode 7: The Evolution Of Python For Data Science
Hugo speaks with Peter Wang, CEO of Anaconda, about how Python became so big in data science, machine learning, and AI. They jump into many of the technical and sociological beginnings of Python being used for data science, a history of PyData, the conda distribution, and NumFOCUS...

Conference*

Join us at apply(), the ML data engineering conference - it’s free.

Speakers include practitioners from the Wikimedia Foundation, Facebook, Gojek, Snapchat, Instacart, Walmart, Stripe, Uber, Volvo, Snowflake, Databricks, and more. We’d love for you to join us.

Agenda highlights:

Smitha Shyam, Director of Engineering at Uber: Uber's Michelangelo: Then and Now
Chris Albon, Director of Machine Learning at Wikimedia Foundation: More Ethical Machine Learning Using Model Cards at Wikimedia
Matei Zaharia, Co-Founder and Chief Technologist at Databricks: The Future of Data for Machine Learning
Chip Huyen, Co-Founder at Claypot AI: Machine Learning Platform for Online Prediction and Continual Learning
Clem Delangue, CEO at Hugging Face: Is Open-Source Machine Learning Becoming the Most Impactful Technology of the Decade?

See the full agenda and register for free.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist - Hungryroot - Remote

Hungryroot is looking for a Data Scientist to join our growing Data Team. As a Data Scientist, you will work closely with other Data Scientists and Data Engineers to develop various Machine Learning models that power Hungryroot and its AI functions. These models include traditional forecasting models, as well as more industry-specific optimization challenges.

As a Data Scientist at Hungryroot, you will work on answering questions like: How do you tell what food someone would like to eat this week? How do you determine whether they enjoyed it or not? Maybe they liked their meals last week but are now looking for different options. Maybe they like the same food on Tuesdays but variety on Fridays. What about spicy food: is Green Chilly as spicy as Green Curry?

Want to post a job here? Email us for details --> team@datascienceweekly.org

Training & Resources

Hierarchical Time Series With Prophet and PyMC3 by Matthijs Brouns [Video]
When doing time-series modelling, you often end up in a situation where you want to make long-term predictions for multiple, related time-series. In this talk, we’ll build a hierarchical version of Facebook’s Prophet package to do exactly that...

How to reshape your data in R for analysis
Using tidyverse functions to switch between wide and long format data...
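
For readers unfamiliar with the two layouts, here is a minimal sketch of what wide and long format mean. It is not taken from the linked article, which works in R with tidyverse functions. The sketch uses plain Python, and the data and the helper names `to_long` and `to_wide` are hypothetical, chosen only for illustration.

```python
# Hypothetical example data in wide format: one row per student,
# one column per exam.
wide = [
    {"student": "a", "exam1": 90, "exam2": 85},
    {"student": "b", "exam1": 70, "exam2": 75},
]


def to_long(rows, id_key, names_to="variable", values_to="value"):
    """Wide -> long: emit one row per (id, column) pair."""
    return [
        {id_key: row[id_key], names_to: col, values_to: val}
        for row in rows
        for col, val in row.items()
        if col != id_key
    ]


def to_wide(rows, id_key, names_from="variable", values_from="value"):
    """Long -> wide: gather each id's rows back into a single row."""
    out = {}
    for row in rows:
        # Create the row for this id on first sight, then add the column.
        target = out.setdefault(row[id_key], {id_key: row[id_key]})
        target[row[names_from]] = row[values_from]
    return list(out.values())


long_rows = to_long(wide, "student")
for row in long_rows:
    print(row)
```

In long format each measurement gets its own row, which is often the shape that plotting and modelling tools expect. Calling `to_wide(long_rows, "student")` reverses the reshape.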

Scientific Visualization: Python + Matplotlib
This book is organized into four parts. The first part considers the fundamental principles of the Matplotlib library...The second part is dedicated to the actual design of a figure...The third part is dedicated to more advanced concepts, namely 3D figures, optimization & animation. The fourth and final part is a collection of showcases...

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)

All the best,
Hannah & Sebastian



