Data Science Weekly - Issue 468

Curated news, articles and jobs related to Data Science.
Keep up with all the latest developments

Issue #468

November 03 2022

Editor's Picks

Planning to leave Twitter?
With all of the uncertainty around Twitter's future, many are considering leaving the platform. But before blindly jumping into the unknown, users should seriously consider downloading and saving their Twitter data to analyze it for important trends, insights and information that they can take with them...We created the free dataviz tool below to illustrate how data visualization can help better inform users before they decide to delete their Twitter accounts and abandon years of useful data. Without dataviz, these insights are nearly impossible for anyone to decipher from a data file alone...

Command-line data analytics made easy
The command line is incredibly powerful when it comes to data processing. Still, many of us working with data do not take advantage of it...These motivated me to write a command-line tool that focuses on readability, ease of learning and modern data formats, while leveraging the command-line ecosystem. On top of that, it also leverages the Python ecosystem! Meet SPyQL - SQL with Python in the middle...

How Federated Learning Protects Privacy
With federated learning, it’s possible to collaboratively train a model with data from multiple users without any raw data leaving their devices. If we can learn from data across many sources without needing to own or collect it, imagine what opportunities that opens!...Let’s explore how this technology works with a simple example we can all relate to: blocking spam messages...
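
To make the idea concrete, here is a minimal sketch of federated averaging, one common way to train without moving raw data. It is not taken from the linked article: the one-parameter model y = w * x, the function names and the toy data are all assumptions chosen for illustration. Each simulated device takes a gradient step on its own data, and the server only ever sees the resulting weights and sample counts.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (x, y) pair that stays on one device


def local_update(w: float, data: List[Example], lr: float = 0.05) -> Tuple[float, int]:
    """One least-squares gradient step for y = w * x on a device's private data."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad, len(data)


def federated_round(w: float, clients: List[List[Example]], lr: float = 0.05) -> float:
    """Server step: average the clients' weights, weighted by sample count."""
    updates = [local_update(w, data, lr) for data in clients]
    total = sum(n for _, n in updates)
    return sum(w_i * n for w_i, n in updates) / total


def train(clients: List[List[Example]], rounds: int = 50) -> float:
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, clients)
    return w


# Hypothetical data: two devices, both drawn from y = 2x.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
print(train(clients))  # approaches 2.0
```

A real deployment would add secure aggregation or noise on top of this, since model updates alone can still leak information about the underlying data.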

A Message from this week's Sponsor:

Out now: new semantic layer whitepapers

Check out this bundle of Semantic Layer whitepapers by best selling authors - download here.

You'll learn the key value propositions to implement a semantic layer and best practices for analytics success with one.

Data Science Articles & Videos

Management Seat Time: Reflections on Management and Returning to Engineering
In case you haven’t heard, there’s a (r)evolution going on in the modern data stack. There’s a new way of working in data, where software engineering best practices are the way the data team gets work done. I recently left my role as the manager of a large team to join Aula Education as a Sr Analytics Engineer — primarily to get in on the technical fun...This post is a reflection on my years in management and the lessons I learned from them...

Tutorial #17: Transformers III Training
In part I of this tutorial we introduced the self-attention mechanism and the transformer architecture. In part II, we discussed position encoding and how to extend the transformer to longer sequence lengths. We also discussed connections between the transformer and other machine learning models...In this final part, we discuss challenges with transformer training dynamics and introduce some of the tricks that practitioners use to get transformers to converge...

Online internal speech decoding from single neurons in a human participant
Speech brain-machine interfaces (BMIs) translate brain signals into words or audio outputs, enabling communication for people who have lost their speech abilities due to disease or injury...In this work, a tetraplegic participant with implanted microelectrode arrays located in the supramarginal gyrus (SMG) and primary somatosensory cortex (S1) performed internal and vocalized speech of six words and two pseudowords. We found robust internal speech decoding from SMG single neuron activity, achieving up to 91% classification accuracy during an online task (chance level 12.5%)...

Dashboard Design Patterns
There are many high-level guidelines on dashboard design, including advice about visual perception, reducing information load, the use of interaction, and visualization literacy. Despite this, we know little about effective and applicable dashboard design, and about how to support rapid dashboard design...Our design patterns for dashboard design on this website aim to support creativity and to streamline dashboard design...

Experiences from Using Code Explanations Generated by Large Language Models in a Web Software Development E-Book
In this paper, we report on our experiences generating multiple code explanation types using LLMs and integrating them into an interactive e-book on web software development. We modified the e-book to make LLM-generated code explanations accessible through buttons next to code snippets in the materials, which allowed us to track the use of the explanations as well as to ask for feedback on their utility...Our preliminary results show that all varieties of explanations were viewed by students and that the majority of students perceived the code explanations as helpful to them. However, student engagement appeared to vary by code snippet complexity, explanation type, and code snippet length...

Rise of Generative AI will be comparable to the rise of CGI in the early 90s
Runway, the content creation and video editing startup, has gotten much attention on Twitter over the past few weeks...I talked to Runway’s CEO, Cris, for a deep dive into their journey and to get his perspective on the recent developments and the future of generative AI...

Everything I know about Mastodon: A hastily written guide for data science folks trying to navigate the fediverse.
Hello there fellow data science person. Have you heard rumours that a lot of folks from our community are moving to use mastodon for social networking? Are you curious, but maybe not quite sure about how to get started? Have you been thinking “twitter is a hellsite and I need to escape” a lot lately?...If yes, this post is for you!...

The Use Case for Relative Position Embeddings
We’re in 2022 but many of our most popular causal language models (LMs), including GPT-3, still use absolute positional embeddings. I believe we should stop using those and move to relative positional embeddings such as ALiBi. Deepmind’s Gopher and BigScience’s BLOOM already use relative positioning methods, and I’ve heard that multiple upcoming models also will, and so hopefully this post will help in encouraging the remaining holdouts to follow suit...
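
For readers unfamiliar with ALiBi, the sketch below shows the core idea as described in the ALiBi paper: instead of adding position vectors to the token embeddings, a penalty proportional to the query-key distance is added to each attention score, with a different slope per head. The function names are hypothetical, and the slope formula shown assumes the number of heads is a power of two.

```python
from typing import List


def alibi_slopes(n_heads: int) -> List[float]:
    """Per-head slopes: a geometric sequence starting at 2 ** (-8 / n_heads).

    Assumes n_heads is a power of two.
    """
    start = 2 ** (-8 / n_heads)
    return [start ** (h + 1) for h in range(n_heads)]


def alibi_bias(seq_len: int, slope: float) -> List[List[float]]:
    """Bias added to the attention scores of one head in a causal LM.

    bias[i][j] = -slope * (i - j) for keys at or before the query position;
    future positions get -inf, which acts as the causal mask.
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)           # 1/2, 1/4, ..., 1/256
bias = alibi_bias(4, slopes[0])    # added to q @ k.T before the softmax
```

Because the penalty depends only on the distance i - j, nothing in the model is tied to an absolute position index.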

Datacast Episode 101: Scaling Data Engineering, Building Data Teams, and Managed Data Stack With Tarush Aggarwal
This is my conversation with Tarush Aggarwal — the Founder and CEO of 5x, the modern data stack as a managed data service. He is one of the leading experts in leveraging data for exponential growth, with over ten years of experience in the field...Our wide-ranging conversation touches on his college experience at Carnegie Mellon University, his time at Salesforce as the first data engineer, lessons learned from building and managing a data team as a Data Manager at Wyng, his leadership role at WeWork scaling the data team and establishing the operations in the Chinese market, his current journey with 5x building the app store for the modern data stack, and much more....

Learning to Imitate
Systems often require over 100 million interactions with an environment to train (the equivalent of more than 100 years of human experience) to reach human-level performance. In contrast, a human can acquire new skills in relatively short amounts of time by observing an expert. How can we enable our artificial agents to similarly acquire such fast learning ability?...In this post, I’ll discuss several techniques being developed in a field called “Imitation Learning” (IL) to solve these sorts of problems and present a recent method from our lab, called Inverse Q-Learning, which was used to create the best AI agent for playing Minecraft using few expert demos...

Chelsea Finn, Stanford: On the biggest bottlenecks in robotics and reinforcement learning
Chelsea Finn is an Assistant Professor at Stanford and part of the Google Brain team. She's interested in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction at scale. In this episode, we chat about some of the biggest bottlenecks in RL and robotics—including distribution shifts, Sim2Real transferability, and the inherent tradeoff of sample efficiency—as well as what makes a great researcher, why she aspires to build a robot that can make cereal, and much more...

Find All the Pangolins
Videos from trail cameras are a useful tool for noninvasive observation of wildlife, but if you are studying a rare species, you might have to look at a lot of videos before you find it...In a previous article, we used probabilistic classification to remove blank videos, that is, videos that don't contain animals...An alternative is a targeted search. For each video in a dataset, we use probabilistic classification to compute the probability that it contains each category of animal (species or group of species). If we are looking for a particular species, we can sort the videos in descending order by the probability they contain the category that contains the target species...
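
The targeted search described above comes down to a sort. A minimal sketch, assuming the classifier output is already available as a dictionary of per-category probabilities for each video (the file names, category labels and numbers below are made up for illustration):

```python
from typing import Dict, List


def rank_videos(predictions: Dict[str, Dict[str, float]], target: str) -> List[str]:
    """Return video names sorted by descending probability of the target category.

    Videos with no score for the target category are treated as probability 0.
    """
    return sorted(
        predictions,
        key=lambda name: predictions[name].get(target, 0.0),
        reverse=True,
    )


# Hypothetical classifier output: probability per category for each video.
predictions = {
    "cam01_0001.mp4": {"blank": 0.90, "pangolin": 0.02, "duiker": 0.08},
    "cam01_0002.mp4": {"blank": 0.10, "pangolin": 0.70, "duiker": 0.20},
    "cam02_0001.mp4": {"blank": 0.30, "pangolin": 0.15, "duiker": 0.55},
}

for name in rank_videos(predictions, "pangolin"):
    print(name, predictions[name]["pangolin"])
```

Reviewing videos in this order means the most likely sightings are watched first, so the search can stop once the hit rate drops off.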

Tool*

Co:here

Unlock the power of language models – No ML experience required. Cohere’s ready-to-use NLP toolkits can help you build and deploy your language AI projects at scale. Our pre-trained models enable developers to build AI-driven apps faster and more easily, from creating marketing copy and product descriptions to summarizing articles, categorizing text and much more! Whether you’re a beginner or an expert, Cohere is making NLP accessible to everyone.

Get started for free

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Tool*

Bridge the Analytics to Action Gap with Hightouch + Looker

Want to get the most from your data in Looker, but don’t want to rewrite your Looks or :shudder: export a CSV? We have great news! With Hightouch, you can automatically sync your Looks directly into any business tool in minutes, so you and your stakeholders can easily take action with all the right context.

Ready to bridge the analytics to action gap? Check out Hightouch.com to learn more or get started for free.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Senior Data Analyst - Epic Games - New York

Epic Games spans across 19 countries with 55 studios and 4,500+ employees globally. For over 25 years, we’ve been making award-winning games and engine technology that empowers others to make visually stunning games and 3D content that bring environments to life like never before.

Use your expert experience in data & analytics to build powerful stories and visuals that inform the games we make, the technology we develop, and business decisions that drive Epic... Epic Games is looking for a Senior Data Analyst to help us create the models that fuel our creator economy. The successful candidate will have excellent SQL knowledge, and enjoy combining analytic skills with business acumen to provide the data and insights that will drive our continued success...

Want to post a job here? Email us for details --> team@datascienceweekly.org

Training & Resources

Orchestrating Single-Cell Analysis with Bioconductor
This is the landing page for the “Orchestrating Single-Cell Analysis with Bioconductor” book, which teaches users some common workflows for the analysis of single-cell RNA-seq data (scRNA-seq). This book will show you how to make use of cutting-edge Bioconductor tools to process, analyze, visualize, and explore scRNA-seq data. Additionally, it serves as an online companion for the paper of the same name...

Lovely Tensors - for PyTorch
How often do you find yourself debugging PyTorch code? You dump a tensor to the cell output, and see this...Was it really useful for you, as a human, to see all these numbers?...What is the shape?...The size?...What are the statistics?...Are any of the values nan or inf?...Is it an image of a man holding a tench?...
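
To illustrate the kind of summary the post is asking for, here is a dependency-free sketch that answers those questions for a nested Python list. It does not use the Lovely Tensors API or PyTorch; the `summarize` helper and its output format are assumptions made for this example only.

```python
import math
from typing import Any, Dict, List


def shape_of(x: Any) -> tuple:
    """Shape of a regular (non-ragged) nested list."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0] if x else None
    return tuple(shape)


def flatten(x: Any) -> List[float]:
    """Flatten a nested list into a flat list of numbers."""
    if isinstance(x, list):
        return [v for item in x for v in flatten(item)]
    return [float(x)]


def summarize(values: Any) -> Dict[str, Any]:
    """Shape, size, basic statistics over finite values, and nan/inf flags."""
    flat = flatten(values)
    finite = [v for v in flat if math.isfinite(v)]
    return {
        "shape": shape_of(values),
        "n": len(flat),
        "min": min(finite) if finite else None,
        "max": max(finite) if finite else None,
        "mean": sum(finite) / len(finite) if finite else None,
        "has_nan": any(math.isnan(v) for v in flat),
        "has_inf": any(math.isinf(v) for v in flat),
    }


print(summarize([[1.0, 2.0], [3.0, float("nan")]]))
```

A one-line summary like this is usually quicker to read than the raw dump of values when checking for a shape mismatch or a stray nan.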

Build a Website to Talk to GPT-3 using Bubble and OpenAI
In this post, I’m going to show you how to use Bubble to connect to the OpenAI API and build a simple input and output page that you can ask any question to!...

Last Week's Newsletter's 3 Most Clicked Links

The past, present, and future of notebooks

What Good Data Self-Serve Looks Like

The biggest bottleneck for large language model startups is UX

* Based on unique clicks.

** Find last week's newsletter here.

Cutting Room Floor

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian

