Data Science Weekly - Issue 452

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #452

July 21 2022

Editor's Picks


  • Is Data Scientist Still the Sexiest Job of the 21st Century?
    Ten years ago, the authors posited that being a data scientist was the “sexiest job of the 21st century.” A decade later, does the claim stand up? The job has grown in popularity and is generally well-paid, and the field is projected to experience more growth than almost any other by 2029. But the job has changed, in both large and small ways...
  • Overview & Applications of Large Language Models (LLMs)
    Given the rapid pace of innovation in machine learning (ML), especially in LLMs, I believe that we are beginning a wave where many significant companies supplying or utilizing this technology will be built...I’m fascinated by all of the ways LLMs will continue to become part of our daily lives, regardless of whether an LLM application consumes model output from other companies' APIs or trains its own models for specific use cases. When thinking through whether an LLM application is feasible (or a good investment), I consider the following questions...

A Message from this week's Sponsor:


Mona is a flexible and intelligent monitoring platform for AI / ML

Data science teams leverage Mona’s powerful analytical engine to gain granular insights about the behavior of their data and models, in order to reduce business risk and pinpoint areas that need improvement.

Mona enables teams to continuously collect, transform and analyze data from all parts of the AI system to track custom metrics in a robust dashboard, be proactively alerted on anomalous behavior, diagnose model issues, conduct A/B tests, and more. Enterprises in a variety of industries leverage Mona for NLP/NLU, speech, computer vision, and machine learning use cases.



Data Science Articles & Videos

  • Modeling Short Time Series with Prior Knowledge
    It is generally difficult to model time series when there is insufficient data to model a (suspected) long seasonality. Here, we show how this difficulty can be overcome by learning a seasonality on a different, long, related time series and transferring the posterior as a prior distribution to the model of the short time series. The result is a forecast that is believable and can be used for decisions in a business context...
  • Interview with Sir David Cox [Video]
    Last fall, Trevor Hastie and Robert Tibshirani interviewed Sir David Cox for their online course on Statistical Learning. Sir David talked about a wide range of topics, including the history of the Cox model and his view of modern data science...
  • SQL: The Universal Solvent for REST APIs
    Data scientists working in Python or R typically acquire data by way of REST APIs. Both environments provide libraries that help you make HTTP calls to REST endpoints, then transform JSON responses into dataframes. But that’s never as simple as we’d like...What if there were a way of reading from APIs that abstracted all the low-level grunt work and worked the same way everywhere? Good news! That is exactly what Steampipe does. It’s a tool that translates REST API calls directly into SQL tables. Here are three examples of questions that you can ask and answer using Steampipe...
  • The Rise of Domain Experts in Deep Learning
    In this interview, Jeremy Howard, an AI researcher and the co-founder of fast.ai, discusses what it means for different industries, and even global regions, now that people without PhDs from specialized research labs can build and work with deep learning models. Among other topics under this broad umbrella, he shares his thoughts on how best to keep up with state-of-the-art techniques, prompt engineering as a new skill set, and the pros and cons of code-generation systems like Codex...
  • Why do tree-based models still outperform deep learning on tabular data?
    While deep learning has enabled tremendous progress on text and image datasets, its superiority on tabular data is not clear. We contribute extensive benchmarks of standard and novel deep learning methods as well as tree-based models such as XGBoost and Random Forests, across a large number of datasets and hyperparameter combinations...
  • Transformers in computer vision: ViT architectures, tips, tricks and improvements
    You are probably already aware of the Vision Transformer (ViT). What came after its initial submission is the story of this blog post. We will explore multiple orthogonal research directions on ViTs. Why? Because chances are that you are interested in a particular task like video summarization. We will address questions like how you can adapt or use ViT on your computer vision problem, what the best ViT-based architectures are, training tricks and recipes, scaling laws, supervised vs. self-supervised pre-training, etc...
  • Theseus, a library for encoding domain knowledge in end-to-end AI models
    Theseus is a library for an optimization technique called differentiable nonlinear least squares (NLS), which is particularly useful for applications like robotics and computer vision. Built on PyTorch, Theseus enables researchers to easily incorporate expert domain knowledge into modern AI architectures...
  • Is Integer Arithmetic Enough for Deep Learning Training?
    Our empirical and mathematical results reveal that integer arithmetic is enough to train deep learning models. Unlike recent proposals, instead of quantization, we directly switch the number representation of computations. Our novel training method forms a fully integer training pipeline that does not change the trajectory of the loss and accuracy compared to floating-point, nor does it need any special hyper-parameter tuning, distribution adjustment, or gradient clipping. Our experimental results show that our proposed method is effective in a wide variety of tasks such as classification (including vision transformers), object detection, and semantic segmentation...
  • Transformer models: an introduction and catalog — 2022 Edition
    Just 3 months after my previous update, I felt that a further update was already “long overdue.” That is how quickly the field is evolving. In this update I have added the new class of text-to-image transformers, all the way from diffusion models to DALL-E2, CLIP, and Imagen. Another class of Transformer models I added are those that use transformers to model an arbitrary agent in RL applications such as playing Atari or controlling a robotic arm. Those include Trajectory Transformers, Decision Transformers, and DeepMind’s Gato. I have also included the latest open-source BLOOM model. The full list of newcomers to the catalog: BLOOM, CLIP, DALL-E2, Decision and Trajectory Transformers, Flamingo, Gato, DQ-BART, GLaM, GLIDE, GC-ViT, Imagen, LaMDA, Minerva, OPT, PaLM, and Switch...
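To make the SQL-over-APIs idea from the Steampipe item concrete, here is a minimal Python sketch that does not use Steampipe at all: it loads a hard-coded, hypothetical JSON payload (standing in for a REST response) into an in-memory SQLite table with the standard-library `sqlite3` module, then answers a question with plain SQL. The helper `load_into_table` and the `repos` data are illustrative assumptions.

```python
import json
import sqlite3

# Hypothetical JSON payload standing in for a REST API response.
payload = json.loads("""
[
  {"name": "repo-a", "language": "Python", "stars": 120},
  {"name": "repo-b", "language": "R", "stars": 45},
  {"name": "repo-c", "language": "Python", "stars": 300}
]
""")

def load_into_table(conn, table, records):
    """Create a table whose columns are the JSON keys and insert every record."""
    columns = list(records[0].keys())
    # Names are interpolated directly, so this assumes a trusted payload.
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        [tuple(r[c] for c in columns) for r in records],
    )

conn = sqlite3.connect(":memory:")
load_into_table(conn, "repos", payload)

# Once the data is a table, the question becomes ordinary SQL.
rows = conn.execute(
    "SELECT language, SUM(stars) FROM repos GROUP BY language ORDER BY language"
).fetchall()
print(rows)  # [('Python', 420), ('R', 45)]
```

Real API responses also need pagination, authentication and handling of nested fields, none of which this toy version addresses.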
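For readers new to the Vision Transformer from the ViT item: its distinguishing first step is to cut an image into non-overlapping patches and flatten each one into a vector, so the image becomes a sequence of tokens. The sketch below shows only that patchify step in plain Python on nested lists; the function name `patchify` and the toy 4x4 image are illustrative, and a real model would use a tensor library.

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened patches, in raster order."""
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = []
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    patch.extend(image[r][c])  # append all channels of this pixel
            patches.append(patch)
    return patches

# Toy 4x4 single-channel image whose pixel value is row * 4 + col.
image = [[[r * 4 + c] for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)
print(len(tokens), len(tokens[0]))  # 4 4
print(tokens[0])  # [0, 1, 4, 5]
```

In the original ViT, each flattened patch is then linearly projected and combined with a position embedding before entering the transformer encoder; those steps are omitted here.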
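The Theseus item centers on nonlinear least squares. As a rough illustration of that inner problem only, the sketch below runs plain Gauss-Newton on a toy curve fit, y ≈ a * exp(b * x), in pure Python. It is not Theseus code and it is not differentiable end to end; the model, the noiseless data and the starting point are hypothetical choices made for the example.

```python
import math

def gauss_newton_exp(xs, ys, a, b, iters=20):
    """Fit y ~ a * exp(b * x) by Gauss-Newton, starting from the guess (a, b)."""
    for _ in range(iters):
        # Accumulate J^T J (2x2) and J^T r for residuals r_i = a*exp(b*x_i) - y_i.
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = a * e - y
            j = (e, a * x * e)  # partial derivatives of r with respect to a and b
            for p in range(2):
                jtr[p] += j[p] * r
                for q in range(2):
                    jtj[p][q] += j[p] * j[q]
        # Solve the 2x2 normal equations (J^T J) delta = -J^T r with Cramer's rule.
        det = jtj[0][0] * jtj[1][1] - jtj[0][1] * jtj[1][0]
        da = (-jtr[0] * jtj[1][1] + jtr[1] * jtj[0][1]) / det
        db = (-jtr[1] * jtj[0][0] + jtr[0] * jtj[1][0]) / det
        a, b = a + da, b + db
    return a, b

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.7 * x) for x in xs]  # synthetic data with a = 2, b = 0.7
a_hat, b_hat = gauss_newton_exp(xs, ys, a=1.8, b=0.65)
print(round(a_hat, 4), round(b_hat, 4))  # 2.0 0.7
```

Gauss-Newton repeatedly linearizes the residuals and solves the normal equations; a differentiable NLS library additionally lets gradients flow through such solves so that a surrounding model can be trained, which this sketch does not attempt.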
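The integer-arithmetic item argues that training can run on integers. The paper's own number representation is not described in the blurb, so the sketch below swaps in ordinary fixed-point arithmetic purely to show what an integer-only dot product (the core operation of a dense layer) can look like: reals are stored as integers scaled by 2^16, multiplied as integers, and rescaled with a bit shift. `SCALE_BITS` and the helper names are hypothetical.

```python
SCALE_BITS = 16  # hypothetical choice: 16 fractional bits
ONE = 1 << SCALE_BITS

def to_fixed(x):
    """Encode a real number as a scaled integer."""
    return round(x * ONE)

def from_fixed(q):
    """Decode a scaled integer back to a float (for display only)."""
    return q / ONE

def fixed_dot(a, b):
    """Dot product of two fixed-point vectors using only integer operations."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y  # each product carries 2 * SCALE_BITS fractional bits
    return acc >> SCALE_BITS  # rescale; >> rounds toward negative infinity

w = [0.5, -1.25, 2.0]
x = [1.5, 0.75, -0.25]
q = fixed_dot([to_fixed(v) for v in w], [to_fixed(v) for v in x])
print(q, from_fixed(q))  # -45056 -0.6875
```

All inputs here are exactly representable in this format, so the result matches floating point exactly; in general the final shift truncates and introduces rounding error.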
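Every model in the transformer catalog builds on attention. For orientation, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, for a single head with no masking or learned projections; the toy Q, K and V matrices are made up for the example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention, computed one query row at a time."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value rows, one output component at a time.
        out = [sum(wt * v[j] for wt, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = attention(Q, K, V)
print(out)
```

The query is most similar to the first key, so the output leans toward the first value row; because each set of attention weights sums to 1, the two output components still add up to 10.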



Become a Data Professional Without Paying a Dime

One Week Left to Apply for TDI’s Fall Data Bootcamps

Did you know that you can attend TDI’s bootcamp programs without paying a dime until you land a job and are earning over a certain threshold?

Apply now to work with expert, live instructors and our career services team to help you land your next data job—maybe with one of our exciting hiring partners.

Attend full-time for 8 weeks or part-time for 20 weeks, without paying anything until you’re working.

Applications close next week on July 29th!
Apply Now.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!




  • Senior Data Scientist, Startup Creation at Redesign Health - US

    As our Senior Data Scientist for our Startup Creation team, you will set up and configure the data infrastructure for our startups, work with startup founding teams to define data-driven KPIs, and implement automated statistical analyses of customer behavior. Your goal is to make all of the companies that we launch data-driven from day one.

    In this role, you will function as an in-house implementation team for the companies that Redesign Health launches (internally referred to as OpCos). We provide data strategy, data pipeline, data analytics and forecasting services to newly formed companies in a repeatable and scalable manner...


        Want to post a job here? Email us for details.



Training & Resources

  • Is learning TensorFlow & Keras still worth it? [Reddit Discussion]
    I recently acquired Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron. I've mainly worked with PyTorch, but I wanted to revise some ML/DL concepts. I probably should have thought about this before, but given the current trend of migrating from TensorFlow to PyTorch, is reading this book right now a step back? Thanks!...
  • Modeling Short Time Series with Prior Knowledge in PyMC
    In this notebook I want to reproduce in PyMC the methodology described in the amazing blog post Modeling Short Time Series with Prior Knowledge by Tim Radtke, to forecast short time series using Bayesian transfer learning 🚀...
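The Modeling Short Time Series with Prior Knowledge post (and the PyMC notebook reproducing it) transfers a seasonality posterior from a long series to a short one. The sketch below illustrates that idea with a deliberately simplified conjugate Gaussian model in plain Python, not the post's actual PyMC model. It assumes a purely seasonal series y_t = s[t mod period] + noise with known noise variance, seasons aligned so that both series start at season 0, and made-up data; `Normal` and `update_seasonal` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Normal:
    mean: float
    var: float

def update_seasonal(prior, series, period, noise_var):
    """Conjugate Gaussian update of per-season effects.

    Assumes y_t = s[t % period] + noise, noise ~ N(0, noise_var) with known
    variance, and independent Normal priors on each seasonal effect s[k].
    """
    sums = [0.0] * period
    counts = [0] * period
    for t, y in enumerate(series):
        sums[t % period] += y
        counts[t % period] += 1
    posterior = []
    for k in range(period):
        precision = 1.0 / prior[k].var + counts[k] / noise_var
        mean = (prior[k].mean / prior[k].var + sums[k] / noise_var) / precision
        posterior.append(Normal(mean, 1.0 / precision))
    return posterior

period = 4
vague = [Normal(0.0, 100.0) for _ in range(period)]

# Step 1: learn the seasonality on a long, related series (10 full cycles, hypothetical data).
long_series = [1.0, 3.0, -1.0, -3.0] * 10
transferred = update_seasonal(vague, long_series, period, noise_var=1.0)

# Step 2: use that posterior as the prior for a short series covering only half a cycle.
short_series = [2.0, 4.0]
short_post = update_seasonal(transferred, short_series, period, noise_var=1.0)

for k, p in enumerate(short_post):
    print(k, round(p.mean, 3), round(p.var, 4))
```

Seasons the short series never observes (indices 2 and 3 here) simply keep the transferred prior, which is what lets the forecast cover a full cycle from only half a cycle of data.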

What you’re up to – notes from DSW readers

  • Working on something cool? Let us know here :) ...

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)


Last Week's Newsletter's 3 Most Clicked Links


* Based on unique clicks.

** Find last week's newsletter here.


P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022, All rights reserved.
