Data Science Weekly - Issue 456

Curated news, articles and jobs related to Data Science.
Keep up with all the latest developments

Issue #456

August 18 2022

Editor's Picks

Inferring Concept Drift Without Labeled Data
After iterations of development and testing, deploying a well-fit machine learning model often feels like the final hurdle for an eager data science team. In practice, however, a trained model is never final. This milestone marks just the beginning of the perpetual maintenance race that is production machine learning. This is because most machine learning models are static, but the world we live in is dynamic...

Testing Firefox more efficiently with machine learning
A browser is an incredibly complex piece of software. With such enormous complexity, the only way to maintain a rapid pace of development is through an extensive CI system that can give developers confidence that their changes won’t introduce bugs. Given the scale of our CI, we’re always looking for ways to reduce load while maintaining a high standard of product quality. We wondered if we could use machine learning to reach a higher degree of efficiency...

The spelled-out intro to neural networks and backpropagation: building micrograd [YouTube Video]
Andrej Karpathy Video Tutorial - This is the most step-by-step spelled-out explanation of backpropagation and training of neural networks. It only assumes basic knowledge of Python and a vague recollection of calculus from high school...

A Message from this week's Sponsor:

Free Access to the Semantic Layer Summit with Bill Inmon, Kirk Borne, and 30+ Enterprise Data Leaders

You're invited to a free one-day virtual event. Explore the importance and impact of using a semantic layer for analytics with an all-star lineup of data leaders from Cigna, Starbucks, Bank of America, and more. Lots to look forward to!

Data Science Articles & Videos

Comparing quantiles at scale in online A/B-testing
Using the properties of the Poisson bootstrap algorithm and quantile estimators, we have been able to reduce the computational complexity of Poisson bootstrap difference-in-quantiles confidence intervals enough to unlock bootstrap inference for almost arbitrary large samples. At Spotify, we can now easily calculate bootstrap confidence intervals for difference-in-quantiles in A/B tests with hundreds of millions of observations...
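
For readers unfamiliar with the technique, here is a minimal sketch of a plain Poisson bootstrap for a difference in quantiles. It is not Spotify's optimized algorithm: it resamples naively by drawing a Poisson(1) weight for every observation, which is exactly the cost the article describes reducing. The function names (`poisson1`, `weighted_quantile`, `quantile_diff_ci`) and the default parameters are illustrative assumptions, not taken from the article.

```python
import math
import random
from bisect import bisect_left
from itertools import accumulate


def poisson1(rng):
    """Draw one Poisson(1) count using Knuth's multiplication method."""
    limit = math.exp(-1.0)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def weighted_quantile(sorted_values, weights, q):
    """Smallest value whose cumulative weight reaches a fraction q of the total."""
    cumulative = list(accumulate(weights))
    total = cumulative[-1]
    if total == 0:  # every observation received weight 0 in this resample
        return None
    return sorted_values[bisect_left(cumulative, q * total)]


def quantile_diff_ci(control, treatment, q=0.5, n_boot=500, alpha=0.05, seed=0):
    """Percentile interval for quantile_q(treatment) - quantile_q(control)."""
    rng = random.Random(seed)
    control, treatment = sorted(control), sorted(treatment)
    diffs = []
    for _ in range(n_boot):
        # Poisson(1) weights stand in for multinomial resampling counts.
        qc = weighted_quantile(control, [poisson1(rng) for _ in control], q)
        qt = weighted_quantile(treatment, [poisson1(rng) for _ in treatment], q)
        if qc is not None and qt is not None:
            diffs.append(qt - qc)
    diffs.sort()
    lower = diffs[math.floor((alpha / 2) * (len(diffs) - 1))]
    upper = diffs[math.ceil((1 - alpha / 2) * (len(diffs) - 1))]
    return lower, upper
```

Calling `quantile_diff_ci(control, treatment, q=0.5)` on two lists of numbers returns a (lower, upper) pair for the difference in medians. Each replicate touches every observation, so the work grows with sample size times `n_boot`, which is why this naive version does not scale to hundreds of millions of observations.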

In 2022, what is the proper way to get into machine/deep learning? [HN Discussion]
By getting into machine or deep learning I mean building up to a stage where one can do ML/DL research, whether applied research or core ML/DL theory. Of course, the path to each will be quite different. Standing in 2022, what are the best resources for a CS student/decent programmer to get into the field of ML and DL on their own? Resources can be either books or public courses...The target abilities: 1. To understand the theory behind the algorithms, 2. To implement an algorithm on a dataset of choice (data cleaning and management should also be learned), 3. To read research publications and try to implement them...

How to Build a GPT-3 for Science
Want to create an image of velociraptors working on a skyscraper, in the style of “Lunch Atop A Skyscraper” of 1932? Use DALL-E...Want to deeply understand COVID-19 research and answer your questions based on evidence? Learn how to do a Boolean search, read scientific papers, and maybe get a PhD, because there are no generative AI models trained on the vast body of scientific research publications...

LLM.int8() and Emergent Features
When I attended NAACL, I wanted to do a little test. I had two pitches for my LLM.int8() paper. One pitch is about how I use advanced quantization methods to achieve transformer inference at scale with no performance degradation, which makes large models more accessible. The other pitch talks about emergent outliers in transformers and how they radically change what transformers learn and how they function...This blog post will spill some mandatory details about quantization, but I want to mostly make it about these emergent features that I found in transformers at scale...
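
To make the link between quantization and outliers concrete, here is a small sketch of basic absmax INT8 quantization. This is a textbook baseline, not the LLM.int8() method itself, and the function names and sample values are made up for illustration. It shows how a single large-magnitude value stretches the quantization range and degrades the precision of everything else in the same vector.

```python
def absmax_quantize(values):
    """Map floats to integers in [-127, 127] using one scale per vector."""
    largest = max(abs(v) for v in values)
    if largest == 0:
        return [0 for _ in values], 1.0
    scale = 127.0 / largest
    return [round(v * scale) for v in values], scale


def dequantize(quantized, scale):
    """Recover approximate floats from the quantized integers."""
    return [x / scale for x in quantized]


def mean_abs_error(original, restored):
    """Average absolute round-trip error over the original values."""
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


# Hypothetical activations: the same small values, with and without an outlier.
small = [0.1, -0.4, 0.25, 0.8, -0.65]
with_outlier = small + [60.0]

q_plain, s_plain = absmax_quantize(small)
q_out, s_out = absmax_quantize(with_outlier)

# Compare the round-trip error on the small values only.
err_plain = mean_abs_error(small, dequantize(q_plain, s_plain))
err_outlier = mean_abs_error(small, dequantize(q_out, s_out)[: len(small)])
```

Here `err_outlier` comes out far larger than `err_plain`, because the outlier forces a much coarser scale on the whole vector. Handling such outlier dimensions separately from the rest is one way to avoid that loss.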

Unleashing the power of large language models
Maarten Grootendorst on applying large language models to topic models and fuzzy string matching...Maarten Grootendorst is a data scientist at IKNL, an institute that strives to reduce the impact of cancer by collecting and unlocking essential and reliable data. More importantly, he’s the author of a few open source libraries that I’ve come to enjoy: BERTopic (topic modeling with transformers and c-TF-IDF), PolyFuzz (fuzzy string matching), and KeyBERT (keyword extraction)...

Sipeed's TinyMaix Puts MNIST Digit Recognition on a Modest Microchip ATmega328 Microcontroller
Open source project, written during a hackathon weekend, adds INT8 and FP32 machine learning model support to low-end microcontrollers...

inControl Podcast - Sean Meyn: Markov chains, networks, reinforcement learning, beekeeping and jazz
inControl Podcast - a podcast on control theory and related topics, including feedback, decision making, artificial intelligence, robotics and much more...In this episode, our guest is Sean Meyn, Professor and Robert C. Pittman Eminent Scholar Chair in the Department of Electrical and Computer Engineering at the University of Florida. The episode features Sean’s adventures in the areas of Markov chains, networks and Reinforcement Learning (RL) as well as anecdotes and trivia about beekeeping and jazz...

NeuMan: Neural Human Radiance Field from a Single Video
Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences. We [Apple] propose a novel framework to reconstruct the human and the scene that can be rendered with novel human poses and views from just a single in-the-wild video. Given a video captured by a moving camera, we train two NeRF models: a human NeRF model and a scene NeRF model...

Doctor Penguin Newsletter - The Latest Healthcare + AI Research
With the goal of helping researchers keep up with the cutting edge of AI + Healthcare research, Doctor Penguin was born...

A Library for Representing Python Programs as Graphs for Machine Learning
Graph representations of programs are commonly a central element of machine learning for code research. We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python programs suitable for training machine learning models...

How a Biologist became a Data Scientist [YouTube Video]
In this video, Chanin Nantasenamat, Ph.D., AKA the Data Professor, shares his experiences of transitioning from biologist to data scientist (bioinformatician)...

What are some "important" problems in machine learning/AI? [Reddit Discussion]
I am not talking about "hot stuff" like self-driving cars or anything, but topics important to the field (like maybe interpretability of machine learning?) that are fundamental to the advancement of the field...

Course*

Data Science Specialities: What Are My Options in Data Science?

Data science is a rewarding career field full of opportunities for advancement. Specialized roles are fundamental to helping organizations maximize their ability to harness data for strategic planning. Want to know more about your options as a data scientist? Read our blog!

TDI’s Data Programs are intensive bootcamps that turn STEM academics into leading data professionals, providing expert training, live code, and real-world data sets. Each industry-leading principle is tailored to prepare you as you venture towards new career paths, advanced education, and overall skill refinement. Applications open next week!

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist - Success Academy Charter Schools, Inc - NYC

This new Data Scientist role will be a key contributor to our mission of driving innovation across the organization. Reporting to the Leader of Enterprise Analytics, this role will be responsible for working with stakeholders in various functions to understand areas of opportunity, developing analytical solutions ranging from dashboards to sophisticated mathematical models, and helping functional teams adopt those solutions. This role will be part of a highly collaborative team of professionals with a wide range of skills including data science, data engineering, business analysis, and project management....

Want to post a job here? Email us for details --> team@datascienceweekly.org

Training & Resources

AI Research Intensive
These lectures are part of the "AI Research Intensive", designed to teach fundamental skills involved in conducting cutting-edge AI research and writing a research paper...The AI Research Intensive was hosted by Rajpurkar Lab at Harvard Medical School on August 4 & 5, 2022...

Resources To Secure Your Next MLE / DS / SWE Job!
This repo contains cheat sheets + data structures & algorithms templates useful for MLE, DS, and SWE interviews. All cheat sheets were created by me and helped me secure multiple offers at big tech companies...

Cornell's Operations Research and Information Engineering 4741: Learning with Big Messy Data
Modern data sets...are often big, messy, and extremely useful. This course addresses scalable robust methods for learning from big messy data. We will cover techniques for learning with data that is messy — consisting of measurements that are continuous, discrete, boolean, categorical, or ordinal, or of more complex data such as graphs, texts, or sets, with missing entries and with outliers — and that is big — which means we can only use algorithms whose complexity scales linearly in the size of the data. We will cover techniques for cleaning data, supervised and unsupervised learning, finding similar items, model validation, and feature engineering...

What you’re up to – notes from DSW readers

Robert Ritz is working on Datafantic, a data blog for telling data-driven stories and sharing data science tutorials. The first entry is on Matplotlib stylesheets. The site is Datafantic.com...

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

Last Week's Newsletter's 3 Most Clicked Links

Data Engineers Spend Two Days Per Week Firefighting Bad Data, Data Quality Survey Says

The current and future state of AI/ML is shockingly demoralizing with little hope of redemption [Reddit Discussion]

What Did My AI Learn? How Data Scientists Make Sense of Model Behavior

* Based on unique clicks.

** Find last week's newsletter here.

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
