Editor's Picks
- Inferring Concept Drift Without Labeled Data
After iterations of development and testing, deploying a well-fit machine learning model often feels like the final hurdle for an eager data science team. In practice, however, a trained model is never final. This milestone marks just the beginning of the perpetual maintenance race that is production machine learning. This is because most machine learning models are static, but the world we live in is dynamic...
- Testing Firefox more efficiently with machine learning
A browser is an incredibly complex piece of software. With such enormous complexity, the only way to maintain a rapid pace of development is through an extensive CI system that can give developers confidence that their changes won’t introduce bugs. Given the scale of our CI, we’re always looking for ways to reduce load while maintaining a high standard of product quality. We wondered if we could use machine learning to reach a higher degree of efficiency...
A Message from this week's Sponsor:
Free Access to the Semantic Layer Summit with Bill Inmon, Kirk Borne, and 30+ Enterprise Data Leaders
You're invited to a free one-day virtual event. Explore the importance and impact of using a semantic layer for analytics with an all-star lineup of data leaders from Cigna, Starbucks, Bank of America, and more. Lots to look forward to!
Data Science Articles & Videos
- Comparing quantiles at scale in online A/B-testing
Using the properties of the Poisson bootstrap algorithm and quantile estimators, we have been able to reduce the computational complexity of Poisson bootstrap difference-in-quantiles confidence intervals enough to unlock bootstrap inference for almost arbitrarily large samples. At Spotify, we can now easily calculate bootstrap confidence intervals for difference-in-quantiles in A/B tests with hundreds of millions of observations...
- In 2022, what is the proper way to get into machine/deep learning? [HN Discussion]
By getting into machine or deep learning I mean building up to a stage to do ML/DL research, either applied research or core ML/DL theory. Of course, the paths to the two will be quite different. Standing in 2022, what are the best resources for a CS student/decent programmer to get into the field of ML and DL on their own? Resources can be either books or public courses...The target abilities: 1. To understand the theory behind the algorithms, 2. To implement an algorithm on a dataset of choice (data cleaning and management should also be learned), 3. To read research publications and try to implement them...
- How to Build a GPT-3 for Science
Want to create an image of velociraptors working on a skyscraper, in the style of “Lunch Atop A Skyscraper” of 1932? Use DALL-E...Want to deeply understand COVID-19 research and answer your questions based on evidence? Learn how to do a Boolean search, read scientific papers, and maybe get a PhD, because there are no generative AI models trained on the vast body of scientific research publications...
- LLM.int8() and Emergent Features
When I attended NAACL, I wanted to do a little test. I had two pitches for my LLM.int8() paper. One pitch is about how I use advanced quantization methods to achieve no performance degradation transformer inference at scale that makes large models more accessible. The other pitch talks about emergent outliers in transformers and how they radically change what transformers learn and how they function...This blog post will spill some mandatory details about quantization, but I want to mostly make it about these emergent features that I found in transformers at scale...
- Unleashing the power of large language models
Maarten Grootendorst on applying large language models to topic models and fuzzy string matching...Maarten Grootendorst is a data scientist at IKNL, an institute that strives to reduce the impact of cancer by collecting and unlocking essential and reliable data. More importantly, he’s the author of a few open source libraries that I’ve come to enjoy: BERTopic (topic modeling with transformers and c-TF-IDF), PolyFuzz (fuzzy string matching), and KeyBERT (keyword extraction)...
- inControl Podcast - Sean Meyn: Markov chains, networks, reinforcement learning, beekeeping and jazz
inControl Podcast - a podcast on control theory and related topics, including feedback, decision making, artificial intelligence, robotics and much more...In this episode, our guest is Sean Meyn, Professor and Robert C. Pittman Eminent Scholar Chair in the Department of Electrical and Computer Engineering at the University of Florida. The episode features Sean’s adventures in the areas of Markov chains, networks and Reinforcement Learning (RL) as well as anecdotes and trivia about beekeeping and jazz...
- NeuMan: Neural Human Radiance Field from a Single Video
Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences. We [Apple] propose a novel framework to reconstruct the human and the scene that can be rendered with novel human poses and views from just a single in-the-wild video. Given a video captured by a moving camera, we train two NeRF models: a human NeRF model and a scene NeRF model...
- A Library for Representing Python Programs as Graphs for Machine Learning
Graph representations of programs are commonly a central element of machine learning for code research. We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python programs suitable for training machine learning models...
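For readers curious about the Poisson bootstrap mentioned in the A/B-testing item above, here is a minimal sketch of the plain (unoptimized) idea: instead of resampling with replacement, each observation gets an independent Poisson(1) count, and the quantile is recomputed from those counts. This is not the faster method the Spotify post describes; the function names, the default parameters, and the simple percentile interval are all assumptions chosen for illustration.

```python
import math
import random


def poisson1(rng):
    """Draw one Poisson(1) variate using Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def resampled_quantile(sorted_values, q, rng):
    """Quantile of one Poisson-bootstrap resample of pre-sorted data.

    Each value appears Poisson(1) times; returns None if every count is 0.
    """
    counts = [poisson1(rng) for _ in sorted_values]
    total = sum(counts)
    if total == 0:
        return None
    target = q * total
    cumulative = 0
    for value, count in zip(sorted_values, counts):
        cumulative += count
        if count > 0 and cumulative >= target:
            return value
    return sorted_values[-1]


def quantile_diff_ci(control, treatment, q=0.5, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval for quantile(treatment) - quantile(control)."""
    rng = random.Random(seed)
    control, treatment = sorted(control), sorted(treatment)
    diffs = []
    for _ in range(n_boot):
        qc = resampled_quantile(control, q, rng)
        qt = resampled_quantile(treatment, q, rng)
        if qc is not None and qt is not None:
            diffs.append(qt - qc)
    diffs.sort()
    lower = diffs[int((alpha / 2) * len(diffs))]
    upper = diffs[min(len(diffs) - 1, int((1 - alpha / 2) * len(diffs)))]
    return lower, upper
```

Calling `quantile_diff_ci(control, treatment, q=0.5)` returns a `(lower, upper)` pair for the difference in medians. Because the Poisson counts are independent per observation, the resampling step can be computed in a single pass over the data, which is what makes the approach attractive at scale.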
Course*
Data Science Specialities: What Are My Options in Data Science?
Data science is a rewarding career field full of opportunities for advancement. Specialized roles are fundamental to helping organizations maximize their ability to harness data for strategic planning. Want to know more about your options as a data scientist? Read our blog!
TDI’s Data Programs are intensive bootcamps that turn STEM academics into leading data professionals, providing expert training, live code, and real-world data sets. Each industry-leading principle is tailored to prepare you as you venture towards new career paths, advanced education, and overall skill refinement. Applications open next week!
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
- Data Scientist - Success Academy Charter Schools, Inc - NYC
This new Data Scientist role will be a key contributor to our mission of driving innovation across the organization. Reporting to the Leader of Enterprise Analytics, this role will be responsible for working with stakeholders in various functions to understand areas of opportunity, developing analytical solutions ranging from dashboards to sophisticated mathematical models, and helping functional teams adopt those solutions. This role will be part of a highly collaborative team of professionals with a wide range of skills including data science, data engineering, business analysis, and project management....
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
- AI Research Intensive
These lectures are part of the "AI Research Intensive", designed to teach fundamental skills involved in conducting cutting-edge AI research and writing a research paper...The AI Research Intensive was hosted by Rajpurkar Lab at Harvard Medical School on August 4 & 5, 2022...
- Resources To Secure Your Next MLE / DS / SWE Job!
This repo contains cheat sheets + data structures & algorithms templates useful for MLE, DS, and SWE interviews. All cheat sheets were created by me and helped me secure multiple offers at big tech companies...
- Cornell's Operations Research and Information Engineering 4741: Learning with Big Messy Data
Modern data sets...are often big, messy, and extremely useful. This course addresses scalable robust methods for learning from big messy data. We will cover techniques for learning with data that is messy — consisting of measurements that are continuous, discrete, boolean, categorical, or ordinal, or of more complex data such as graphs, texts, or sets, with missing entries and with outliers — and that is big — which means we can only use algorithms whose complexity scales linearly in the size of the data. We will cover techniques for cleaning data, supervised and unsupervised learning, finding similar items, model validation, and feature engineering...
What you’re up to – notes from DSW readers
- Robert Ritz is working on Datafantic, a data blog, to tell data driven stories and share data science tutorials. First entry is on Matplotlib stylesheets. Site is Datafantic.com...
* To share your projects and updates, share the details here.
** Want to chat with one of the above people? Hit reply and let us know :)
P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian