Data Science Weekly - Issue 468

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #468

November 03 2022

Editor's Picks

  • Planning to leave Twitter?
    With all of the uncertainty around Twitter's future, many are considering leaving the platform. But before blindly jumping into the unknown, users should seriously consider downloading and saving their Twitter data to analyze it for important trends, insights and information that they can take with them...We created the free dataviz tool below to illustrate how data visualization can help better inform users before they decide to delete their Twitter accounts and abandon years of useful data. Without dataviz, these insights are nearly impossible for anyone to decipher from a data file alone...
  • Command-line data analytics made easy
    The command line is incredibly powerful when it comes to data processing. Still, many of us working with data do not take advantage of it...These motivated me to write a command-line tool that focuses on readability, ease of learning and modern data formats, while leveraging the command-line ecosystem. On top of that, it also leverages the Python ecosystem! Meet SPyQL - SQL with Python in the middle...
  • How Federated Learning Protects Privacy
    With federated learning, it’s possible to collaboratively train a model with data from multiple users without any raw data leaving their devices. If we can learn from data across many sources without needing to own or collect it, imagine what opportunities that opens!...Let’s explore how this technology works with a simple example we can all relate to: blocking spam messages...
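
To make the federated learning idea concrete, here is a minimal sketch using federated averaging, one common scheme that the linked article may or may not follow. Each device trains on its own messages and shares only model weights, which a server then averages. The two-feature spam data, the function names, and the learning rate are all hypothetical and chosen purely for illustration.

```python
import math
from typing import List, Tuple

# (feature vector, label) where label 1 = spam and 0 = not spam
Example = Tuple[List[float], int]


def local_update(weights: List[float], data: List[Example], lr: float = 0.5) -> List[float]:
    """Run one pass of logistic-regression SGD on a single device.

    Only the updated weights are returned; the raw messages stay on the device.
    """
    w = list(weights)
    for features, label in data:
        z = sum(wi * xi for wi, xi in zip(w, features))
        pred = 1.0 / (1.0 + math.exp(-z))
        for i, xi in enumerate(features):
            w[i] -= lr * (pred - label) * xi
    return w


def federated_average(updates: List[Tuple[List[float], int]]) -> List[float]:
    """Average device weights, weighting each device by its number of examples."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# Hypothetical per-device data; features are [contains "free", contains "meeting"].
devices = [
    [([1.0, 0.0], 1), ([0.0, 1.0], 0)],
    [([1.0, 0.0], 1), ([1.0, 1.0], 0), ([0.0, 1.0], 0)],
]

global_weights = [0.0, 0.0]
for _ in range(20):
    # Each device starts from the shared model and trains locally.
    updates = [(local_update(global_weights, data), len(data)) for data in devices]
    # The server only ever sees weights, never the messages themselves.
    global_weights = federated_average(updates)

print(global_weights)
```

In this toy setup the shared model ends up with a positive weight for "free" and a negative weight for "meeting", even though no device ever uploaded a message. Production systems typically add further protections (for example secure aggregation), which this sketch omits.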


A Message from this week's Sponsor:


Out now: new semantic layer whitepapers

Check out this bundle of Semantic Layer whitepapers by best selling authors - download here.

You'll learn the key value propositions to implement a semantic layer and best practices for analytics success with one.


Data Science Articles & Videos

  • Management Seat Time: Reflections on Management and Returning to Engineering
    In case you haven’t heard, there’s a (r)evolution going on in the modern data stack. There’s a new way of working in data, where software engineering best practices are the way the data team gets work done. I recently left my role as the manager of a large team to join Aula Education as a Sr Analytics Engineer — primarily to get in on the technical fun...This post is a reflection on my years in management and the lessons I learned from them...
  • Tutorial #17: Transformers III Training
    In part I of this tutorial we introduced the self-attention mechanism and the transformer architecture. In part II, we discussed position encoding and how to extend the transformer to longer sequence lengths. We also discussed connections between the transformer and other machine learning models...In this final part, we discuss challenges with transformer training dynamics and introduce some of the tricks that practitioners use to get transformers to converge...
  • Online internal speech decoding from single neurons in a human participant
    Speech brain-machine interfaces (BMIs) translate brain signals into words or audio outputs, enabling communication for people who have lost their speech abilities due to disease or injury...In this work, a tetraplegic participant with implanted microelectrode arrays located in the supramarginal gyrus (SMG) and primary somatosensory cortex (S1) performed internal and vocalized speech of six words and two pseudowords. We found robust internal speech decoding from SMG single neuron activity, achieving up to 91% classification accuracy during an online task (chance level 12.5%)...
  • Dashboard Design Patterns
    There are many high-level guidelines on dashboard design, including advice about visual perception, reducing information load, the use of interaction, and visualization literacy. Despite this, we know little about effective and applicable dashboard design, and about how to support rapid dashboard design...Our design patterns for dashboard design on this website aim to support creativity and to streamline dashboard design...
  • Experiences from Using Code Explanations Generated by Large Language Models in a Web Software Development E-Book
    In this paper, we report on our experiences generating multiple code explanation types using LLMs and integrating them into an interactive e-book on web software development. We modified the e-book to make LLM-generated code explanations accessible through buttons next to code snippets in the materials, which allowed us to track the use of the explanations as well as to ask for feedback on their utility...Our preliminary results show that all varieties of explanations were viewed by students and that the majority of students perceived the code explanations as helpful to them. However, student engagement appeared to vary by code snippet complexity, explanation type, and code snippet length...
  • The Use Case for Relative Position Embeddings
    We’re in 2022 but many of our most popular causal language models (LMs), including GPT-3, still use absolute positional embeddings. I believe we should stop using those and move to relative positional embeddings such as ALiBi. Deepmind’s Gopher and BigScience’s BLOOM already use relative positioning methods, and I’ve heard that multiple upcoming models also will, and so hopefully this post will help in encouraging the remaining holdouts to follow suit...
  • Datacast Episode 101: Scaling Data Engineering, Building Data Teams, and Managed Data Stack With Tarush Aggarwal
    This is my conversation with Tarush Aggarwal — the Founder and CEO of 5x, the modern data stack as a managed data service. He is one of the leading experts in leveraging data for exponential growth, with over ten years of experience in the field...Our wide-ranging conversation touches on his college experience at Carnegie Mellon University, his time at Salesforce as the first data engineer, lessons learned from building and managing a data team as a Data Manager at Wyng, his leadership role at WeWork scaling the data team and establishing the operations in the Chinese market, his current journey with 5x building the app store for the modern data stack, and much more....
  • Learning to Imitate
    Systems often require over 100 million interactions with an environment to train — equivalent of more than 100 years of human experience — to reach human-level performance. In contrast, a human can acquire new skills in relatively short amounts of time by observing an expert. How can we enable our artificial agents to similarly acquire such fast learning ability?...In this post, I’ll discuss several techniques being developed in a field called “Imitation Learning” (IL) to solve these sorts of problems and present a recent method from our lab, called Inverse Q-Learning — which was used to create the best AI agent for playing Minecraft using few expert demos...
  • Chelsea Finn, Stanford: On the biggest bottlenecks in robotics and reinforcement learning
    Chelsea Finn is an Assistant Professor at Stanford and part of the Google Brain team. She's interested in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction at scale. In this episode, we chat about some of the biggest bottlenecks in RL and robotics—including distribution shifts, Sim2Real transferability, and the inherent tradeoff of sample efficiency—as well as what makes a great researcher, why she aspires to build a robot that can make cereal, and much more...
  • Find All the Pangolins
    Videos from trail cameras are a useful tool for noninvasive observation of wildlife, but if you are studying a rare species, you might have to look at a lot of videos before you find it...In a previous article, we used probabilistic classification to remove blank videos, that is, videos that don't contain animals...An alternative is a targeted search. For each video in a dataset, we use probabilistic classification to compute the probability that it contains each category of animal (species or group of species). If we are looking for a particular species, we can sort the videos in descending order by the probability they contain the category that contains the target species...
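
The targeted search described in the pangolin item can be sketched in a few lines. This is only an illustration under stated assumptions: the classifier outputs, file names, and category labels below are hypothetical stand-ins, not data from the linked article, and `rank_videos` is a name invented here.

```python
from typing import Dict, List, Tuple


def rank_videos(probs: Dict[str, Dict[str, float]], target: str) -> List[Tuple[str, float]]:
    """Sort videos by the probability that they contain the target category.

    `probs` maps each video name to its per-category probabilities.
    A video with no score for the target category is treated as probability 0.
    """
    scored = [(video, categories.get(target, 0.0)) for video, categories in probs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical per-video classifier outputs (each row sums to 1).
example_probs = {
    "video_001.mp4": {"blank": 0.90, "pangolin": 0.02, "duiker": 0.08},
    "video_002.mp4": {"blank": 0.10, "pangolin": 0.75, "duiker": 0.15},
    "video_003.mp4": {"blank": 0.30, "pangolin": 0.40, "duiker": 0.30},
}

ranking = rank_videos(example_probs, "pangolin")
for video, probability in ranking:
    print(f"{video}: {probability:.2f}")
```

Reviewing videos from the top of this ranking means the most likely candidates are watched first, which is the point of a targeted search when the species is rare.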





Unlock the power of language models – No ML experience required. Cohere’s ready-to-use NLP toolkits can help you build and deploy your language AI projects at scale. Our pre-trained models enable developers to build AI-driven apps faster and more easily, from creating marketing copy and product descriptions to summarizing articles, categorizing text and much more! Whether you’re a beginner or an expert, Cohere is making NLP accessible to everyone.

Get started for free

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!





Bridge the Analytics to Action Gap with Hightouch + Looker

Want to get the most from your data in Looker, but don’t want to rewrite your Looks or :shudder: export a CSV? We have great news! With Hightouch, you can automatically sync your Looks directly into any business tool in minutes, so you and your stakeholders can easily take action with all the right context.

Ready to bridge the analytics to action gap? Learn more or get started for free.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!




  • Senior Data Analyst - Epic Games - New York

    Epic Games spans across 19 countries with 55 studios and 4,500+ employees globally. For over 25 years, we’ve been making award-winning games and engine technology that empowers others to make visually stunning games and 3D content that bring environments to life like never before.

    Use your expert experience in data & analytics to build powerful stories and visuals that inform the games we make, the technology we develop, and business decisions that drive Epic... Epic Games is looking for a Senior Data Analyst to help us create the models that fuel our creator economy. The successful candidate will have excellent SQL knowledge, and enjoy combining analytic skills with business acumen to provide the data and insights that will drive our continued success...


    Want to post a job here? Email us for details.



Training & Resources

  • Orchestrating Single-Cell Analysis with Bioconductor
    This is the landing page for the “Orchestrating Single-Cell Analysis with Bioconductor” book, which teaches users some common workflows for the analysis of single-cell RNA-seq data (scRNA-seq). This book will show you how to make use of cutting-edge Bioconductor tools to process, analyze, visualize, and explore scRNA-seq data. Additionally, it serves as an online companion for the paper of the same name...
  • Lovely Tensors - for PyTorch
    How often do you find yourself debugging PyTorch code? You dump a tensor to the cell output, and see this...Was it really useful for you, as a human, to see all these numbers?...What is the shape?...The size?...What are the statistics?...Are any of the values nan or inf?...Is it an image of a man holding a tench?...

P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)

All the best,
Hannah & Sebastian
Copyright © 2013-2022, All rights reserved.
