Editor's Picks
- Planning to leave Twitter?
With all of the uncertainty around Twitter's future, many are considering leaving the platform. But before blindly jumping into the unknown, users should seriously consider downloading and saving their Twitter data to analyze it for important trends, insights and information that they can take with them...We created the free dataviz tool below to illustrate how data visualization can help better inform users before they decide to delete their Twitter accounts and abandon years of useful data. Without dataviz, these insights are nearly impossible for anyone to decipher from a data file alone...
- Command-line data analytics made easy
The command line is incredibly powerful when it comes to data processing. Still, many of us working with data do not take advantage of it...These motivated me to write a command-line tool that focuses on readability, ease of learning, and modern data formats, while leveraging the command-line ecosystem. On top of that, it also leverages the Python ecosystem! Meet SPyQL - SQL with Python in the middle...
- How Federated Learning Protects Privacy
With federated learning, it’s possible to collaboratively train a model with data from multiple users without any raw data leaving their devices. If we can learn from data across many sources without needing to own or collect it, imagine what opportunities that opens!...Let’s explore how this technology works with a simple example we can all relate to: blocking spam messages...
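The federated learning piece describes training a shared model without any raw data leaving users' devices, using spam blocking as the example. A minimal sketch of that idea follows, assuming a federated-averaging (FedAvg-style) scheme and a toy logistic-regression spam filter. The feature names, client datasets, and hyperparameters are hypothetical and not taken from the linked article.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Probability that message features x are spam under weights w."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def local_update(global_w, data, lr=0.1, epochs=5):
    """Train on one client's private data with SGD on the log loss.

    Only the updated weights are returned; the raw (features, label)
    pairs never leave this function.
    """
    w = list(global_w)
    for _ in range(epochs):
        for x, y in data:
            error = predict(w, x) - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
    return w

def federated_round(global_w, clients):
    """One FedAvg-style round: each client trains locally, then the server
    averages the returned weights, weighted by client dataset size."""
    total = sum(len(data) for data in clients)
    new_w = [0.0] * len(global_w)
    for data in clients:
        client_w = local_update(global_w, data)
        for i, wi in enumerate(client_w):
            new_w[i] += wi * len(data) / total
    return new_w

# Hypothetical features: [bias, has_link, says_free, sender_in_contacts]
# Labels: 1 = spam, 0 = not spam. Each list stays on its own "device".
client_a = [([1, 1, 1, 0], 1), ([1, 0, 0, 1], 0), ([1, 1, 0, 0], 1)]
client_b = [([1, 0, 1, 0], 1), ([1, 0, 0, 1], 0), ([1, 1, 0, 1], 0)]
client_c = [([1, 1, 1, 0], 1), ([1, 0, 0, 1], 0)]

w = [0.0] * 4
for _ in range(20):
    w = federated_round(w, [client_a, client_b, client_c])

print(f"spam-like message: {predict(w, [1, 1, 1, 0]):.2f}")
print(f"known contact:     {predict(w, [1, 0, 0, 1]):.2f}")
```

Only model weights cross the client boundary here. Real deployments typically add further protections, such as secure aggregation, which this sketch omits.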
A Message from this week's Sponsor:
Out now: new semantic layer whitepapers
Check out this bundle of Semantic Layer whitepapers by best-selling authors - download here.
You'll learn the key value propositions of implementing a semantic layer and best practices for achieving analytics success with one.
Data Science Articles & Videos
- Management Seat Time: Reflections on Management and Returning to Engineering
In case you haven’t heard, there’s a (r)evolution going on in the modern data stack. There’s a new way of working in data, where software engineering best practices are the way the data team gets work done. I recently left my role as the manager of a large team to join Aula Education as a Sr Analytics Engineer — primarily to get in on the technical fun...This post is a reflection on my years in management and the lessons I learned from them...
- Tutorial #17: Transformers III Training
In part I of this tutorial we introduced the self-attention mechanism and the transformer architecture. In part II, we discussed position encoding and how to extend the transformer to longer sequence lengths. We also discussed connections between the transformer and other machine learning models...In this final part, we discuss challenges with transformer training dynamics and introduce some of the tricks that practitioners use to get transformers to converge...
- Online internal speech decoding from single neurons in a human participant
Speech brain-machine interfaces (BMIs) translate brain signals into words or audio outputs, enabling communication for people who have lost the ability to speak due to disease or injury...In this work, a tetraplegic participant with implanted microelectrode arrays located in the supramarginal gyrus (SMG) and primary somatosensory cortex (S1) performed internal and vocalized speech of six words and two pseudowords. We found robust internal speech decoding from SMG single neuron activity, achieving up to 91% classification accuracy during an online task (chance level 12.5%)...
- Dashboard Design Patterns
There are many high-level guidelines on dashboard design, including advice about visual perception, reducing information load, the use of interaction, and visualization literacy. Despite this, we know little about effective and applicable dashboard design, and about how to support rapid dashboard design...The dashboard design patterns on this website aim to support creativity and to streamline dashboard design...
- Experiences from Using Code Explanations Generated by Large Language Models in a Web Software Development E-Book
In this paper, we report on our experiences generating multiple code explanation types using LLMs and integrating them into an interactive e-book on web software development. We modified the e-book to make LLM-generated code explanations accessible through buttons next to code snippets in the materials, which allowed us to track the use of the explanations as well as to ask for feedback on their utility...Our preliminary results show that all varieties of explanations were viewed by students and that the majority of students perceived the code explanations as helpful to them. However, student engagement appeared to vary by code snippet complexity, explanation type, and code snippet length...
- The Use Case for Relative Position Embeddings
We’re in 2022, but many of our most popular causal language models (LMs), including GPT-3, still use absolute positional embeddings. I believe we should stop using those and move to relative positional embeddings such as ALiBi. DeepMind’s Gopher and BigScience’s BLOOM already use relative positioning methods, and I’ve heard that multiple upcoming models will too, so hopefully this post will help encourage the remaining holdouts to follow suit...
- Datacast Episode 101: Scaling Data Engineering, Building Data Teams, and Managed Data Stack With Tarush Aggarwal
This is my conversation with Tarush Aggarwal — the Founder and CEO of 5x, the modern data stack as a managed data service. He is one of the leading experts in leveraging data for exponential growth, with over ten years of experience in the field...Our wide-ranging conversation touches on his college experience at Carnegie Mellon University, his time at Salesforce as the first data engineer, lessons learned from building and managing a data team as a Data Manager at Wyng, his leadership role at WeWork scaling the data team and establishing the operations in the Chinese market, his current journey with 5x building the app store for the modern data stack, and much more...
- Learning to Imitate
Systems often require over 100 million interactions with an environment (the equivalent of more than 100 years of human experience) to reach human-level performance. In contrast, a human can acquire new skills in a relatively short time by observing an expert. How can we enable our artificial agents to acquire skills similarly fast?...In this post, I’ll discuss several techniques being developed in a field called “Imitation Learning” (IL) to solve these sorts of problems and present a recent method from our lab, called Inverse Q-Learning, which was used to create the best AI agent for playing Minecraft using few expert demos...
- Chelsea Finn, Stanford: On the biggest bottlenecks in robotics and reinforcement learning
Chelsea Finn is an Assistant Professor at Stanford and part of the Google Brain team. She's interested in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction at scale. In this episode, we chat about some of the biggest bottlenecks in RL and robotics—including distribution shifts, Sim2Real transferability, and the inherent tradeoff of sample efficiency—as well as what makes a great researcher, why she aspires to build a robot that can make cereal, and much more...
- Find All the Pangolins
Videos from trail cameras are a useful tool for noninvasive observation of wildlife, but if you are studying a rare species, you might have to look at a lot of videos before you find it...In a previous article, we used probabilistic classification to remove blank videos, that is, videos that don't contain animals...An alternative is a targeted search. For each video in a dataset, we use probabilistic classification to compute the probability that it contains each category of animal (species or group of species). If we are looking for a particular species, we can sort the videos in descending order by the probability they contain the category that contains the target species...
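The Transformers III tutorial covers tricks practitioners use to get transformers to converge. One widely used trick is learning-rate warmup. Below is a sketch of the inverse-square-root schedule with linear warmup from the original Transformer paper (Vaswani et al., 2017); the defaults `d_model=512` and `warmup_steps=4000` are that paper's base settings, used here for illustration and not necessarily what the tutorial recommends.

```python
def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Learning rate with linear warmup followed by inverse-square-root decay.

    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    The rate rises linearly for warmup_steps steps, peaks, then decays
    proportionally to 1/sqrt(step).
    """
    step = max(step, 1)  # step 0 would divide by zero in step ** -0.5
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

for step in (1, 1000, 4000, 16000, 64000):
    print(step, f"{transformer_lr(step):.2e}")
```

This schedule is usually paired with Adam; one common explanation for why warmup helps is that it keeps early updates small while the optimizer's running statistics are still noisy.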
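The Use Case for Relative Position Embeddings recommends ALiBi. As a rough sketch of the mechanism described in the ALiBi paper (Press et al.): no position embeddings are added to the inputs; instead, each attention head adds a penalty proportional to the query-key distance to its attention scores, with head-specific slopes forming a geometric sequence. The helper names below are hypothetical, and the slope formula assumes the number of heads is a power of two.

```python
import math

def alibi_slopes(num_heads):
    """Head-specific slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8).

    This simple form assumes num_heads is a power of two.
    """
    return [2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)]

def alibi_bias(seq_len, slope):
    """Causal bias matrix for one head.

    Entry [i][j] is -slope * (i - j) for keys at or before query i, so more
    distant keys are penalized linearly; future keys get -inf so softmax
    gives them zero weight.
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

slopes = alibi_slopes(8)
print(slopes)  # [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625]

# With identical raw query-key scores (all zero), the bias alone shapes attention.
raw_scores = [0.0, 0.0, 0.0, 0.0]
bias_row = alibi_bias(4, slopes[0])[3]  # last query attends to positions 0..3
weights = softmax([s + b for s, b in zip(raw_scores, bias_row)])
print([round(x, 3) for x in weights])
```

Attention weight decays toward older tokens, and heads with smaller slopes decay more slowly, so different heads attend over different effective distances.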
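The Learning to Imitate post presents Inverse Q-Learning, which is not reproduced here. For contrast, the simplest imitation learning baseline, behavioral cloning, treats expert demonstrations as supervised (state, action) pairs. The tabular sketch below fits a policy by majority vote over the expert's choices; the gridworld states and actions are hypothetical.

```python
from collections import Counter, defaultdict

def behavioral_cloning(demos):
    """Fit a tabular policy from expert (state, action) pairs.

    For each state, the policy picks the action the expert chose most often,
    i.e. plain supervised learning on the demonstrations.
    """
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}

# Hypothetical gridworld demonstrations: (x, y) position -> expert move.
demos = [
    ((0, 0), "right"), ((0, 0), "right"), ((0, 0), "up"),
    ((1, 0), "up"),
    ((1, 1), "right"),
]
policy = behavioral_cloning(demos)
print(policy)
print(policy.get((5, 5), "no expert data for this state"))
```

The lookup has nothing to offer in states the expert never visited, and small mistakes can push an agent into exactly such states; this distribution-shift problem is a common motivation for more sophisticated imitation learning methods.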
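The targeted search in Find All the Pangolins sorts videos by the probability that they contain the category of the target species. A minimal sketch, assuming a classifier has already produced per-video category probabilities (the video IDs, category names, and numbers here are made up):

```python
def rank_videos(predictions, target):
    """Return video IDs sorted by descending P(video contains target)."""
    return sorted(
        predictions,
        key=lambda vid: predictions[vid].get(target, 0.0),
        reverse=True,
    )

def expected_hits(predictions, ranked, target, k):
    """Expected number of target videos among the first k reviewed.

    Only meaningful if the classifier's probabilities are calibrated.
    """
    return sum(predictions[vid].get(target, 0.0) for vid in ranked[:k])

# Made-up classifier output: per-video probability for each category.
predictions = {
    "vid_001": {"blank": 0.90, "pangolin": 0.02},
    "vid_002": {"blank": 0.10, "pangolin": 0.71},
    "vid_003": {"bird": 0.60, "pangolin": 0.35},
    "vid_004": {"pangolin": 0.88},
}
ranked = rank_videos(predictions, "pangolin")
print(ranked)  # ['vid_004', 'vid_002', 'vid_003', 'vid_001']
print(f"{expected_hits(predictions, ranked, 'pangolin', 2):.2f}")  # 1.59
```

Reviewing videos in this order front-loads the likely hits; with calibrated probabilities, the running sum gives a rough estimate of how many target videos a reviewer should expect to find after watching the top k.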
Tool*
Co:here
Unlock the power of language models – No ML experience required. Cohere’s ready-to-use NLP toolkits can help you build and deploy your language AI projects at scale. Our pre-trained models enable developers to build AI-driven apps faster and more easily, from creating marketing copy and product descriptions to summarizing articles, categorizing text, and much more! Whether you’re a beginner or an expert, Cohere is making NLP accessible to everyone.
Get started for free
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Tool*
Bridge the Analytics to Action Gap with Hightouch + Looker
Want to get the most from your data in Looker, but don’t want to rewrite your Looks or :shudder: export a CSV? We have great news! With Hightouch, you can automatically sync your Looks directly into any business tool in minutes, so you and your stakeholders can easily take action with all the right context.
Ready to bridge the analytics to action gap? Check out Hightouch.com to learn more or get started for free.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
- Senior Data Analyst - Epic Games - New York
Epic Games spans across 19 countries with 55 studios and 4,500+ employees globally. For over 25 years, we’ve been making award-winning games and engine technology that empowers others to make visually stunning games and 3D content that bring environments to life like never before.
Use your expertise in data & analytics to build powerful stories and visuals that inform the games we make, the technology we develop, and business decisions that drive Epic... Epic Games is looking for a Senior Data Analyst to help us create the models that fuel our creator economy. The successful candidate will have excellent SQL knowledge, and enjoy combining analytic skills with business acumen to provide the data and insights that will drive our continued success...
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
- Orchestrating Single-Cell Analysis with Bioconductor
This is the landing page for the “Orchestrating Single-Cell Analysis with Bioconductor” book, which teaches users some common workflows for the analysis of single-cell RNA-seq data (scRNA-seq). This book will show you how to make use of cutting-edge Bioconductor tools to process, analyze, visualize, and explore scRNA-seq data. Additionally, it serves as an online companion for the paper of the same name...
- Lovely Tensors - for PyTorch
How often do you find yourself debugging PyTorch code? You dump a tensor to the cell output, and see this...Was it really useful for you, as a human, to see all these numbers?...What is the shape?...The size?...What are the statistics?...Are any of the values nan or inf?...Is it an image of a man holding a tench?...
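Lovely Tensors replaces raw tensor dumps with a compact, human-readable summary. The sketch below illustrates the idea with NumPy as a stand-in for PyTorch; `summarize` is a hypothetical helper, and its output format is not the library's.

```python
import numpy as np

def summarize(t):
    """Compact description of an array: shape, size, stats, and bad values."""
    t = np.asarray(t, dtype=float)
    finite = t[np.isfinite(t)]  # compute statistics over finite values only
    parts = [f"shape={t.shape}", f"n={t.size}"]
    if finite.size:
        parts += [
            f"min={finite.min():.3g}",
            f"max={finite.max():.3g}",
            f"mean={finite.mean():.3g}",
            f"std={finite.std():.3g}",
        ]
    n_nan = int(np.isnan(t).sum())
    n_inf = int(np.isinf(t).sum())
    if n_nan:
        parts.append(f"nan={n_nan}")
    if n_inf:
        parts.append(f"inf={n_inf}")
    return " ".join(parts)

x = np.array([[1.0, 2.0], [np.nan, np.inf]])
print(summarize(x))  # shape=(2, 2) n=4 min=1 max=2 mean=1.5 std=0.5 nan=1 inf=1
```

Computing statistics only over finite entries keeps a single NaN or inf from hiding the rest of the distribution, while the explicit counts still flag that bad values are present.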
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's newsletter here.
Cutting Room Floor
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian