Data Science Weekly - Issue 465

Curated news, articles and jobs related to Data Science.
Keep up with all the latest developments

Issue #465

October 20 2022

Editor's Picks

An AI Might Have Written This
Every has been building Lex, a word processor with AI baked in. I started working on this piece before we launched Lex, but testing out this tool (among others) has shaped my perspective on the role of AI writing assistants for creatives...Indie fiction writers are using AI assistants to write their novels faster, and a New York Times best-selling author, April Henry, is using AI to help generate story ideas...

How Transformers Seem to Mimic Parts of the Brain
For years, neuroscientists have harnessed many types of neural networks to model the firing of neurons in the brain. In recent work, researchers have shown that the hippocampus, a structure of the brain critical to memory, is basically a special kind of neural net, known as a transformer, in disguise. Their new model tracks spatial information in a way that parallels the inner workings of the brain. They’ve seen remarkable success...

State of AI Report 2022
Now in its fifth year, the State of AI Report 2022 is reviewed by leading AI practitioners in industry and research. It considers the following key dimensions, including a new Safety section: a) Research: Technology breakthroughs and their capabilities, b) Industry: Areas of commercial application for AI and its business impact, c) Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI, d) Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us, e) Predictions: What we believe will happen and a performance review to keep us honest...Key themes in the 2022 Report include...

A Message from this week's Sponsor:

Out now: new semantic layer whitepapers

Check out this bundle of Semantic Layer whitepapers by best-selling authors - download here.

You'll learn the key value propositions for implementing a semantic layer and best practices for achieving analytics success with one.

Data Science Articles & Videos

Building Transformers from Neurons and Astrocytes
Glial cells account for roughly 90% of all human brain cells, and serve a variety of important developmental, structural, and metabolic functions. Recent experimental efforts suggest that astrocytes, a type of glial cell, are also directly involved in core cognitive processes such as learning and memory. While it is well-established that astrocytes and neurons are connected to one another in feedback loops across many time scales and spatial scales, there is a gap in understanding the computational role of neuron-astrocyte interactions. To help bridge this gap, we draw on recent advances in artificial intelligence (AI) and astrocyte imaging technology. In particular, we show that neuron-astrocyte networks can naturally perform the core computation of a Transformer...
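For reference, the "core computation of a Transformer" is usually taken to be self-attention. Below is a minimal pure-Python sketch of standard scaled dot-product attention, softmax(QK^T / sqrt(d)) V, on small lists of vectors. It illustrates the textbook operation only and is not the paper's neuron-astrocyte model; the function names are our own.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V, one query at a time."""
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs
```

When every key is identical, the weights are uniform and each output is simply the mean of the values. Differences between query-key similarities shift that average toward the better-matching values.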

How We Enabled Dev and Data Science Independence With Clear API Boundaries Using Airflow and Databricks
Your dev team needs to use a data science algorithm to solve a real business problem, but how can you use this algorithm? Usually, data scientists write in R, Python, or Scala (Spark), and these do not expose a microservice you can consume using a clear API like any other service. So, you will often need someone (Dev/ML Platform) to wrap a data science artifact and expose it for consumption...In this post, I will show you how we enabled our data science team to expose their artifacts with a clear API, allowing them to take full ownership of the process from deployment to production...

Are you Data Scientists or Software Developers?!
In my recent talk ‘Really Useful Engines’ I rabbited on about how effective data science teams must necessarily engineer a domain specific capability layer of software functions or packages that become a force multiplier. The simple becomes trivial, and the hard becomes tractable. This makes headroom for the development of more capabilities still. A virtuous cycle. It’s either that or get snared in a quagmire of copy-pasta code tech debt...This post is about what that looks like, and how it can be made better with good data science tooling (or not)...

Minimax Estimation and Identity Testing of Markov Chains
We briefly review the two classical problems of distribution estimation and identity testing (in the context of property testing), then propose to extend them to a Markovian setting. We will see that the sample complexity depends not only on the number of states, but also on the stationary and mixing properties of the chains...
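A natural starting point for the estimation problem is the plain empirical (maximum-likelihood) estimate of a transition matrix from a single observed trajectory: count the transitions, then normalize each row. This is a minimal sketch under stated assumptions. The post's minimax analysis concerns how many samples estimators like this need, and the code makes no claim about those rates. States are assumed to be labeled 0..k-1. Falling back to a uniform row for states that are never left is our own arbitrary choice.

```python
from typing import List, Sequence


def estimate_transition_matrix(path: Sequence[int], num_states: int) -> List[List[float]]:
    """Empirical estimate of P[i][j] = Pr(next state = j | current state = i)."""
    counts = [[0] * num_states for _ in range(num_states)]
    # Count each observed transition i -> j along the trajectory.
    for i, j in zip(path, path[1:]):
        counts[i][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # State never left in the sample: fall back to uniform (arbitrary choice).
            matrix.append([1.0 / num_states] * num_states)
        else:
            matrix.append([c / total for c in row])
    return matrix
```

Rows for rarely visited states rest on only a few counts. This gives one intuition for why the chain's stationary and mixing behaviour enter the sample complexity.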

Exploring the Frontiers in Earth System Modeling with Machine Learning
Over the last decade, the volume of data from satellite sensors and Earth system models has increased by at least an order of magnitude...This workshop brings a small but varied group of geoscientists and climate modelers together with machine learners, statisticians, and representatives of other fields where ML has already had a big impact. Discussions will center on how innovative and efficient ML methods will provide new, innovative and transformative ways of modeling and projecting the Earth system and extracting information from massive data volumes...[Videos and PDFs from presentations available]...

John Schulman on TalkRL: The Reinforcement Learning Podcast
John Schulman, OpenAI cofounder and researcher and inventor of PPO/TRPO, talks RL from human feedback, tuning GPT-3 to follow instructions (InstructGPT) and answer long-form questions using the internet (WebGPT), AI alignment, AGI timelines, and more!...

Memorizing facts about systems I work with [Twitter Thread]
I've found it unexpectedly useful to memorize facts about systems I work with...Knowing these numbers allows one to 1. sanity check performance, 2. sketch out feasibility of technical solutions, and 3. reason about performance characteristics...Some examples below...

Bayesian Structural Timeseries - Forecasting
We want to show how Bayesian structural time series with autoregressive processes can be modeled in PyMC and used to predict future unobserved data. How these kinds of models can flexibly incorporate structural assumptions and project future outcomes is only sparsely covered in the PyMC documentation. Hopefully recording the full modeling and prediction loop here is useful for you...
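The kind of structure such models assume can be sketched without PyMC. The example below uses a deterministic linear trend plus an AR(1) component. It is a stdlib-only simulation with a conditional-mean forecast for *known* parameters. It is not the post's PyMC model and does no Bayesian inference, which would instead place priors on the intercept, slope, phi, and sigma and average forecasts over their posterior. The function names and parameterization are our own assumptions.

```python
import random
from typing import List


def simulate(n: int, intercept: float, slope: float, phi: float, sigma: float, seed: int = 0) -> List[float]:
    """y_t = intercept + slope * t + x_t, where x_t = phi * x_{t-1} + eps_t, eps_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    x = 0.0
    ys = []
    for t in range(n):
        x = phi * x + rng.gauss(0.0, sigma)  # AR(1) deviation from the trend
        ys.append(intercept + slope * t + x)
    return ys


def forecast(ys: List[float], horizon: int, intercept: float, slope: float, phi: float) -> List[float]:
    """Conditional-mean forecast: the trend continues and the last AR deviation decays as phi**h."""
    last_t = len(ys) - 1
    deviation = ys[-1] - (intercept + slope * last_t)  # AR component at the last observation
    return [intercept + slope * (last_t + h) + (phi ** h) * deviation for h in range(1, horizon + 1)]
```

With |phi| < 1 the forecast reverts toward the trend line as the horizon grows. That is the sense in which the structural assumptions shape the projected outcomes.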

How undesired goals can arise with correct rewards
Exploring examples of goal misgeneralisation – where an AI system's capabilities generalise but its goal doesn't...we explore a more subtle mechanism by which AI systems may unintentionally learn to pursue undesired goals: goal misgeneralisation (GMG)...

Obtaining genetics insights from deep learning via explainable artificial intelligence
AI models based on deep learning now represent the state of the art for making functional predictions in genomics research. However, the underlying basis on which predictive models make such predictions is often unknown. For genomics researchers, this missing explanatory information would frequently be of greater value than the predictions themselves, as it can enable new insights into genetic processes. We review progress in the emerging area of explainable AI (xAI), a field with the potential to empower life science researchers to gain mechanistic insights into complex deep learning models...
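In silico mutagenesis is one of the simplest perturbation-based attribution techniques used in this area. You mutate each position of an input DNA sequence and record how much the model's prediction changes. Below is a minimal sketch. `toy_model` is a hypothetical stand-in that counts a made-up "TATA" motif. A real analysis would call a trained deep learning model instead.

```python
from typing import Callable, Dict, List

BASES = "ACGT"


def in_silico_mutagenesis(seq: str, model: Callable[[str], float]) -> List[Dict[str, float]]:
    """For each position, the score change (mutant - reference) for every alternative base."""
    reference = model(seq)
    effects = []
    for i, ref_base in enumerate(seq):
        position_effects = {}
        for alt in BASES:
            if alt == ref_base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            position_effects[alt] = model(mutant) - reference
        effects.append(position_effects)
    return effects


def toy_model(seq: str) -> float:
    # Hypothetical stand-in for a trained network: counts occurrences of a made-up "TATA" motif.
    return float(sum(seq[i:i + 4] == "TATA" for i in range(len(seq) - 3)))
```

On "GGTATAGG", every substitution inside the motif drops the score by one and every substitution outside it has no effect. This is the kind of position-level map that points researchers toward the sequence features a model relies on.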

General-Purpose Pre-Trained Models in Robotics
The impressive generalization capabilities of large neural network models hinge on the ability to integrate enormous quantities of training data. This presents a major challenge for most downstream tasks where data is scarce...A central benefit of robotic learning should be in enabling rapid and autonomous acquisition of new tasks on command, but if each task requires either a large human-provided demonstration dataset or a long reinforcement learning training run, this benefit will be hard to realize. So how can we develop models and datasets that make it possible to pre-train for a broad range of downstream robotic skills?...

Introducing the Sequoia Generative AI Market Map [Twitter Thread]
Introducing the @sequoia Gen AI Market Map! 🌎 We’ve decided to map out this emerging frontier, thanks to all the contributions and feedback we’ve received...This space is moving quickly – this map is a living document...

Tool*

Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Conference*

Global AI Developer Days – 26 October 2022

Join the Global AI Community for a day of inspiring keynotes from industry leaders with a high focus on AI developers.

Highlights during this 3-hour conference include a session on responsible AI by Ruth Yakubu, Principal Cloud Advocate at Microsoft, who will talk about how to improve the fairness and reliability of AI solutions. Eric Boyd, Corporate Vice President at Microsoft, will show the latest innovations in Azure AI. Manuvir Das, Head of Enterprise Computing at NVIDIA, takes you on a journey through the new era of AI for developers, and many more leaders from the AI community will share their vision.

Don’t miss out on this free day of learning from top leaders in the AI space!

https://devdays.globalai.community

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist - Mount Sinai Data Commons - NYC

A position is available for an individual with skills in data science, bioinformatics and software engineering to play the key role in running and managing the Mount Sinai Data Commons – known as the Data Ark. The Data Ark team brings together all the most important data sets used by Sinai researchers (e.g. 1000G, GTEx, UK Biobank) in a single location on our HPC server (minvera.org), performs QA/QC processing of the data, conducts initial demographics analyses to showcase the different data sets, and will be tasked with expanding the data commons to host a large range of different data sets of different types (genotype, WES, WGS, RNA-seq, EHR-linked, imaging etc.), which will come with their own computational and platform challenges...

Want to post a job here? Email us for details --> team@datascienceweekly.org

Training & Resources

Napkin Math - Techniques and numbers for estimating system's performance
The goal of this project is to collect software, numbers, and techniques to quickly estimate the expected performance of systems from first-principles. For example, how quickly can you read 1 GB of memory? By composing these resources you should be able to answer interesting questions like: how much storage cost should you expect to pay for logging for an application with 100,000 RPS?...
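The logging question can be answered with a few lines of arithmetic. Below is a back-of-envelope sketch in the napkin-math spirit. Only the 100,000 RPS figure comes from the text. One log line per request, 200 bytes per line, 30-day retention, and $0.02 per GB-month are placeholder assumptions, not figures from the project or any provider.

```python
from typing import Dict

SECONDS_PER_DAY = 86_400


def logging_storage_estimate(rps: float, bytes_per_request: float, retention_days: float,
                             price_per_gb_month: float) -> Dict[str, float]:
    """Back-of-envelope log volume and monthly storage cost (decimal GB)."""
    bytes_per_day = rps * bytes_per_request * SECONDS_PER_DAY
    gb_per_day = bytes_per_day / 1e9
    # Steady state: retention_days worth of logs is stored at any given time.
    stored_gb = gb_per_day * retention_days
    return {
        "gb_per_day": gb_per_day,
        "stored_gb": stored_gb,
        "monthly_cost": stored_gb * price_per_gb_month,
    }


# 100,000 RPS from the question; the other three numbers are assumptions.
estimate = logging_storage_estimate(100_000, 200, 30, 0.02)
```

Under those assumptions this comes to roughly 1.7 TB of logs per day, about 52 TB retained, and on the order of $1,000 per month. Swapping in your own line size and price is the whole point of the exercise.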

Tutorial on Uncertainty Estimation for Natural Language Processing
This tutorial is intended for academic researchers and industry practitioners alike, and provides a comprehensive introduction to uncertainty estimation for NLP problems: from fundamentals in probability calibration, Bayesian inference, and confidence set (or interval) construction, to applied topics in modern out-of-distribution detection and selective inference...
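Expected calibration error (ECE) is one common way to quantify probability calibration, the first of the fundamentals listed. The sketch below takes the weighted average, over confidence bins, of the gap between accuracy and mean confidence. Using 10 equal-width bins is our assumption and not necessarily the tutorial's setup.

```python
from typing import List, Sequence


def expected_calibration_error(confidences: Sequence[float], correct: Sequence[bool],
                               num_bins: int = 10) -> float:
    """ECE = sum over bins of (bin size / n) * |accuracy(bin) - mean confidence(bin)|."""
    n = len(confidences)
    bins: List[List[int]] = [[] for _ in range(num_bins)]
    for idx, c in enumerate(confidences):
        # Map a confidence in [0, 1] to an equal-width bin; c == 1.0 goes into the last bin.
        b = min(int(c * num_bins), num_bins - 1)
        bins[b].append(idx)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        acc = sum(correct[i] for i in members) / len(members)
        conf = sum(confidences[i] for i in members) / len(members)
        ece += (len(members) / n) * abs(acc - conf)
    return ece
```

A model that says 0.95 but is right only half the time gets an ECE of 0.45. One that says 0.75 and is right three times out of four gets 0.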

Setting up R in Visual Studio Code
This post will show you how to set up Visual Studio Code as an integrated development environment for the statistical language R. This will include some useful features such as: a) plots that appear within a VS Code panel, b) a language server with autocomplete, c) syntax highlighting of R code in console and scripts, d) interactive window development...Of course, RStudio has all of these features for R too. However, Visual Studio Code does a lot more than just R, and has tons of cutting edge integrated development environment features that we’d like to make use of...

What you’re up to – notes from DSW readers

Fill out the form below to appear here :) ...

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

Last Week's Newsletter's 3 Most Clicked Links

CS197 Harvard: AI Research Experiences

Bad Data, or Interesting Fact?

Seven Sins of Numerical Linear Algebra

* Based on unique clicks.

** Find last week's newsletter here.

Cutting Room Floor

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
