Editor's Picks
- AI Forecasting: One Year In
Last August, my research group created a forecasting contest to predict AI progress on four benchmarks. Forecasters were asked to predict state-of-the-art performance (SOTA) on each benchmark for June 30th 2022, 2023, 2024, and 2025. It’s now past June 30th, so we can evaluate the performance of the forecasters so far...Forecasters were asked to provide probability distributions, so we can evaluate both their point estimates and their coverage (whether the true result was within their credible intervals). I’ll dive into the data in detail below, but my high-level takeaways were that...
- Angela Fan explains NLLB-200: High-quality Machine Translation for Low-Resource Languages [Video]
An interview with Angela Fan, one of the key research scientists behind the latest breakthrough in Machine Translation from Meta AI: No Language Left Behind (NLLB-200), a multilingual translation model across 200 low-resource languages, which has also been open-sourced. The interview covers the story behind the model, a breakdown of interesting technical details and the broader implications of high-quality Machine Translation models across low-resource languages...
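A quick aside on the forecasting item above: "coverage" is simply the share of forecasters whose credible interval contained the true result. Here is a minimal sketch of that idea, not the contest's actual scoring code. The `IntervalForecast` class, the `coverage` function, and every number below are hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class IntervalForecast:
    """A forecaster's credible interval for one benchmark (hypothetical type)."""
    lower: float
    upper: float


def coverage(forecasts, outcome):
    """Return the fraction of intervals that contain the observed outcome."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    hits = sum(1 for f in forecasts if f.lower <= outcome <= f.upper)
    return hits / len(forecasts)


# Invented intervals and outcome, NOT data from the contest.
forecasts = [
    IntervalForecast(10.0, 20.0),
    IntervalForecast(15.0, 40.0),
    IntervalForecast(30.0, 45.0),
    IntervalForecast(5.0, 12.0),
]
outcome = 35.0

print(coverage(forecasts, outcome))
```

If the intervals were meant to be, say, 90% credible intervals, well-calibrated forecasters would see coverage close to 0.9 across many questions; a much lower value suggests the intervals were too narrow.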
A Message from this week's Sponsor:

Don’t miss Ray Summit — the year’s top destination for scalable AI!
Why attend Ray Summit?
- Hear the latest developments in Ray and Ray libraries
- Learn about Ray use cases from engineers and researchers at Lyft, IBM, Riot Games, Verizon, and more
- Find out how teams at Spotify, Meta, Amazon, and other top companies are building next-gen ML platforms on Ray
- Participate in exclusive hands-on Ray training sessions
Register now — attend in-person in San Francisco or virtually.
Data Science Articles & Videos
- What I learned from Tecton’s apply() 2022 conference
Back in May, I attended apply(), Tecton’s second annual virtual event for data and ML teams to discuss the practical data engineering challenges faced when building ML for the real world. There were talks on best practice development patterns, tools of choice, and emerging architectures to successfully build and manage production ML applications...This long-form article dissects content from 14 sessions and lightning talks that I found most useful from attending apply(). These talks cover 3 major areas: industry trends, production use cases, and open-source libraries. Let’s dive in!...
- min-dalle: Fast, minimal port of DALL·E Mega to PyTorch
This is a fast, minimal port of Boris Dayma's DALL·E Mega. It has been stripped down for inference and converted to PyTorch. The only third-party dependencies are numpy, requests, pillow and torch...Generating a 4x4 grid of DALL·E Mega images takes 89 sec with a T4 in Colab, 48 sec with a P100 in Colab, and 14 sec with an A100 on Replicate...
- CVPR 2022 notes with focus on Medical Imaging
These are notes from some of the talks and papers/posters...The main platform for works on medical imaging at CVPR is the Medical Computer Vision Workshop, held this year for the ninth time. Bonus: All the lectures from the workshop are publicly available online with no need to buy a conference ticket...
- 5 Advanced SQL Concepts You Should Know in 2022
With the rising volume of data, the need for skilled data professionals is also increasing. Knowing advanced SQL concepts is not enough; you should also be able to implement them efficiently at work. And that is what is looked for in job interviews for data science positions!...Therefore, I listed here 5 advanced SQL concepts with explanations and query examples which you should know in 2022...
- NeRF at CVPR 2022
There are more than 50 papers related to Neural Radiance Fields (NeRFs) at the CVPR 2022 conference. With my former student and now colleague at Google Research, Andrew Marmon, we rounded up all papers we could find and organized them here for our edification, and your reading pleasure...
- Releasing Color.js: A library that takes color seriously
Chris and I started working on Color.js in 2020, over 2 years ago! It was shortly after I had finished the Color lecture for the class I was teaching at MIT and I was appalled by the lack of color libraries that did the things I needed for the demos in my slides. I asked Chris, “Hey, what if we make a Color library? You will bring your Color Science knowledge and I will bring my JS and API design knowledge. Wouldn’t this be the coolest color library ever?”...
- Narrative A.I. — with Hilary Mason [Video]
Hilary Mason, Co-Founder and CEO of Hidden Door, joins Jon Krohn for a live discussion that explores narrative A.I., emerging ML techniques, and how her OSEMN data science process developed...
- Women speakers for tech conferences and meetups
We created this platform to provide event organizers with a large pool of highly qualified women professionals to choose from, so we never have to attend another non-inclusive event again...Whether live in-person or remotely online, we have pooled these professionals for your next event...
- The Unmet Data Visualization Needs of Decision Makers within Organizations [PDF]
As most past research in visual analytics has focused on understanding the needs and challenges of data analysts, less is known about the tasks and challenges of organizational decision makers, and how visualization support tools might help. Here we characterize the decision maker as a domain expert, review relevant literature in management theories, and report the results of an empirical survey and interviews with people who make organizational decisions. We identify challenges and opportunities for novel visualization tools, including trade-off overviews, scenario-based analysis, interrogation tools, flexible data input and collaboration support...
WhitePapers*

Out Now: New Semantic Layer Whitepapers
Check out this bundle of Semantic Layer whitepapers by best-selling authors - download here.
You'll learn the key value propositions for implementing a semantic layer and best practices for analytics success with one.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
- Senior Data Scientist, Startup Creation at Redesign Health - US
As the Senior Data Scientist for our Startup Creation team, you will set up and configure the data infrastructure for our startups, work with the startup founding team to define data-driven KPIs, and implement automated statistical analyses of customer behavior. Your goal is to make all of the companies that we launch data-driven from day one.
In this role, you will function as an in-house implementation team for the companies that Redesign Health launches (internally referred to as OpCos). We provide data strategy, data pipeline, data analytics and forecasting services to newly formed companies in a repeatable and scalable manner...
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
- Rough Guide to Matplotlib (PyPlot)
Do you need to create graphs, visualize highly numerical data, or just look at beautifully generated graphs and get turned on? Well, like in any other situation, Python has you covered! Matplotlib provides an easy and efficient way of creating graphs...But we don’t need to import matplotlib itself; matplotlib.pyplot provides most of the features we need. The rest of matplotlib can be used to create extremely complex graphs. Pyplot is inspired by MATLAB. If you have used MATLAB before, its API will feel familiar. This guide focuses on the PyPlot API...
- How do you share big datasets with your team and others? [Reddit Discussion]
Looking for a bit of a discussion. I'm wondering how you collaborate on data... i.e. how do you share big datasets with data scientists/engineers, within and outside of your team? Do you just push it into a simple DB, do you upload it to Kaggle (if non-sensitive) or via Google Drive/OneDrive?...What if the dataset gets updated frequently?...I'm working with a customer and sharing data is a bit of a pain...
What you’re up to – notes from DSW readers
- Working on something cool? Let us know here :) ...
* To share your projects and updates, share the details here.
** Want to chat with one of the above people? Hit reply and let us know :)
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian