Data Science Weekly - Issue 441

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #441

May 5 2022

Editor Picks

 
  • How Gaussian Is It?
    This article is an excerpt from the current draft of my book Probably Overthinking It, to be published by the University of Chicago Press in early 2023...How tall are you? How long are your arms? How far is it from the radiale landmark on your right elbow to the stylion landmark on your right wrist?...
  • Democratizing access to large-scale language models with OPT-175B
    In line with Meta AI’s commitment to open science, we are sharing Open Pretrained Transformer (OPT-175B), a language model with 175 billion parameters trained on publicly available data sets, to allow for more community engagement in understanding this foundational new technology...
  • OPT-175B Logbook [PDF]
    [Editor's note: click on the download button]...Goal: Get a 175B dense model up and running by any means necessary...Purpose of this document: To provide a source of truth of what we did, when, and why, and any context that was important to those decisions. To provide each other with a clear place to find information about what is happening without having to ping....
 
 

A Message from this week's Sponsor:

 



Free Course: Natural Language Processing (NLP) for Semantic Search

Learn how to build semantic search applications by making machines understand language as people do. This free course covers everything you need to build state-of-the-art language models, from machine translation to question-answering, and more. Brought to you by Pinecone. Start reading now.

 

 

Data Science Articles & Videos

 
  • JAX vs Julia (vs PyTorch)
    A while ago there was an interesting thread on the Julia Discourse about the “state of machine learning in Julia”. I posted a response discussing the differences between Julia and Python (both JAX and PyTorch), and it seemed to be really well received!...Since then this topic seems to keep coming up, so I thought I’d tidy up that post and put it somewhere I could link to easily...To my mind JAX and Julia are unquestionably the current state-of-the-art frameworks for autodifferentiation, scientific computing, and ML computing. So let’s dig into the differences....
  • Working on build systems full-time at Meta
    Summary: I joined Meta 2.5 years ago to work on build systems. I’m enjoying it...I'll cover what I’ve learnt about build systems as well as what's different moving from finance to tech...
  • Advances in Neural Compression with Auke Wiggers
    Today we’re joined by Auke Wiggers, an AI research scientist at Qualcomm...we discuss his team’s recent research on data compression using generative models. We discuss the relationship between historical compression research and the current trend of neural compression, and the benefit of neural codecs, which learn to compress data from examples. We also explore the performance evaluation process and the recent developments that show that these models can operate in real-time on a mobile device. Finally, we discuss another ICLR paper, “Transformer-based transform coding”, that proposes a vision transformer-based architecture for image and video coding...
  • Training Language Models with Natural Language Feedback
    Pretrained language models often do not perform tasks in ways that are in line with our preferences, e.g., generating offensive text or factually incorrect summaries. Recent work approaches the above issue by learning from a simple form of human evaluation: comparisons between pairs of model-generated task outputs. Comparison feedback conveys limited information about human preferences per human evaluation. Here, we propose to learn from natural language feedback, which conveys more information per human evaluation. We learn from language feedback on model outputs using a three-step learning algorithm...
  • What Data Visualization Reveals: Elizabeth Palmer Peabody and the Work of Knowledge Production
    This essay offers the chronological charts of Elizabeth Palmer Peabody (1804–1894), the 19th-century educator and intellectual, as early examples of how data visualization can reveal a range of forms of knowledge. It challenges the universality of the goals of clarity and efficiency when designing data visualizations, and argues for the value of visualizations that encourage sustained reflection and imaginative response...
  • Hiring Data Scientists With Intention
    I met Tara Robertson in 2019 when I joined Mozilla, where she was the Global Diversity and Inclusion Lead at the time. When I needed to grow my team, Tara and I worked together to develop an inclusive hiring process. Since then, Tara and I have kept the conversation going and wanted to share some of our thoughts here!...
  • Handling and Presenting Harmful Text
    Textual data can pose a risk of serious harm. These harms can be categorised along three axes: (1) the harm type, (2) whether it is elicited as a feature of the research design from directly studying harmful content, and (3) who it affects...It is an unsolved problem in NLP as to how textual harms should be handled, presented, and discussed; but, stopping work on content which poses a risk of harm is untenable. Accordingly, we provide practical advice and introduce HARMCHECK, a resource for reflecting on research into textual harms...
  • Datacast Episode 90: Operational Analytics, Reverse ETL, and Finding Product-Market Fit with Kashish Gupta
    Our wide-ranging conversation touches on his education at the University of Pennsylvania studying Computer Science; his learning about venture capital at Bessemer Venture Partners; his first startup Carry that went through Y Combinator; his current journey with Hightouch building a data activation platform; lessons learned creating the Operational Analytics category, pivoting through various startup ideas, identifying design partners, hiring talent, fundraising; and much more...
  • New from Anaconda: Python in the Browser
    Say hello to PyScript...PyScript is a framework that allows users to create rich Python applications in the browser using a mix of Python with standard HTML. PyScript aims to give users a first-class programming language that has consistent styling rules, is more expressive, and is easier to learn...What is PyScript? Well, here are some of the core components...
 
 

Conference*

 



Join us at apply(), the ML data engineering conference - it’s free.

Speakers include practitioners from the Wikimedia Foundation, Facebook, Gojek, Snapchat, Instacart, Walmart, Stripe, Uber, Volvo, Snowflake, Databricks, and more. We’d love for you to join us.

Agenda highlights:
  • Smitha Shyam, Director of Engineering at Uber: Uber's Michelangelo: Then and Now
  • Chris Albon, Director of Machine Learning at Wikimedia Foundation: More Ethical Machine Learning Using Model Cards at Wikimedia
  • Matei Zaharia, Co-Founder and Chief Technologist at Databricks: The Future of Data for Machine Learning
  • Chip Huyen, Co-Founder at Claypot AI: Machine Learning Platform for Online Prediction and Continual Learning
  • Clem Delangue, CEO at Hugging Face: Is Open-Source Machine Learning Becoming the Most Impactful Technology of the Decade?

See the full agenda and register for free.


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Data Scientist - Hungryroot - Remote

    Hungryroot is looking for a Data Scientist to join our growing Data Team. As a Data Scientist, you will work closely with other Data Scientists and Data Engineers to develop various Machine Learning models that power Hungryroot and its AI functions. These models include traditional forecasting models, as well as more industry-specific optimization challenges.

    As a Data Scientist at Hungryroot, you will work on answering questions like: How do you tell what food someone would like to eat this week? How do you determine whether they enjoyed it or not? Maybe they liked their meals last week but are now looking for different options. Maybe they like the same food on Tuesdays but variety on Fridays. And what about spicy food: is Green Chili as spicy as Green Curry?

     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 
 

Training & Resources

 
  • Scientific Visualization: Python + Matplotlib
    This book is organized into four parts. The first part considers the fundamental principles of the Matplotlib library...The second part is dedicated to the actual design of a figure...The third part is dedicated to more advanced concepts, namely 3D figures, optimization & animation. The fourth and final part is a collection of showcases...
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
