Editor's Picks
- Spotting Topographic Changes from 250 Miles Above
The Earth Science and Remote Sensing Unit at NASA’s Johnson Space Center in Houston, Texas, is using machine learning to sort through and identify photos taken from the orbital laboratory, making them more searchable and useful to scientists. The Gateway to Astronaut Photography of Earth service contains nearly 4 million astronaut-captured images. Using AI, researchers have categorized over 2 million photos of Earth’s geographic features, 250,000 images of auroras, 37,000 lightning photos, and 18,000 images of 64 different cities around the world...
- Do data-driven companies actually win?
Imagine, if you can, that you're a venture capitalist...Amid the various pitches that land in your inbox, an odd coincidence arrives: Five nearly identical companies reach out to you at once. They're all launching a new clothing line for working from home...They aren’t exactly the same though—different types of experts run each company...the first company has been working in fashion for decades...the second one believes in moving fast and making things...the third company is run by a thirty-year-old wonder kid...the fourth company emphasizes operational excellence...and the fifth company believes data will be their competitive edge...Who do you invest in?...
A Message from this week's Sponsor:
Free Access to the Semantic Layer Summit with Bill Inmon, Kirk Borne, and 30+ Enterprise Data Leaders
You're invited to a free one-day virtual event. Explore the importance and impact of using a semantic layer for analytics with an all-star lineup of data leaders from Cigna, Starbucks, Bank of America, and more. Lots to look forward to!
Data Science Articles & Videos
- Data Center Heatmap
At Automattic, our systems team manages over 10,000 physical servers located across 30 data centers on 6 continents...Normal data center operating temperatures tend to be between 20C-25C, but cooling failures are somewhat common (they even affect Google), so we have to monitor temperatures carefully...We are big fans of Prometheus and Grafana and for a few years we have had temperature graphs that look like this...
- What is in your Data Stack? [Reddit Discussion]
It would be really useful to get a sense of what data tools companies use to get an idea of what are the best options...To contribute, post info in the following format: 1) ETL, 2) Data Warehouse, 3) Data Transformation, 4) BI, 5) Exploratory Data Analysis, 6) Company Size (approx # employees) [optional], 7) Company Industry [optional], 8) Company HQ (city, country) [optional]...
- Graph Inverse Reinforcement Learning from Diverse Videos
To learn a reward function from diverse videos, we propose to perform graph abstraction on the videos followed by temporal matching in the graph space to measure the task progress. Our insight is that a task can be described by entity interactions that form a graph, and this graph abstraction can help remove irrelevant information such as textures, resulting in more robust reward functions. We evaluate our approach, GraphIRL, by learning from human demonstrations for real-robot manipulation and via cross-embodiment learning in X-MAGICAL. We show significant improvements in robustness to diverse video demonstrations over previous approaches, and even achieve better results than manual reward design on real robot tasks...
- An astronomer's introduction to NumPyro
In this post I’ll focus primarily on providing an introduction to NumPyro, which is a probabilistic programming library that provides an interface for defining probabilistic models and running inference algorithms. At this point, NumPyro is probably the most mature JAX-based probabilistic programming library, and its documentation page has a lot of examples, but I’ve found that these docs are not that user-friendly for my collaborators, so I wanted to provide a different perspective. In the following sections, I’ll present two examples...
- Escaping Poverty, Benchmarking ML Systems, and Advancing Data-Centric AI with Cody Coleman
The 97th episode of Datacast is my conversation with Cody Coleman — the Founder and CEO of Coactive AI...Our wide-ranging conversation touches on his remarkable childhood growing up in poverty and finding a few people who have made big differences in his story; his academic experience at MIT studying EE & CS; his industry experience interning at Google and working at JUMP Trading; his Ph.D. work on data selection for deep learning at Stanford; his current journey with Coactive AI; key developments for the Data-Centric AI community; similarities between being a researcher and a founder; and much more...
- Professional ML engineers: How much of your day to day job involves math and proofs? [Reddit Discussion]
If you are a professional ML engineer (not data engineer) how much of your day to day work involves doing math and proofs? I can 'do' linear algebra and statistics but I am not sure if doing math and writing proofs on a daily basis would be my cup of tea...EDIT: The reason I asked is because the MS program I am considering requires proofs to pass the ML related classes. I can do that for a couple of classes but not every day...
- Data Viz Today Podcast - Episode 76: Creativity Mini-Series with Andy Kirk
Welcome to episode 76 of Data Viz Today. We’re exploring creativity in information design from the perspective of amazingly creative people in the field! If it’s not a magical process, then what is it? Let’s hear how Andy Kirk approaches creativity. We dive into how he defines creativity, what his routines are, what kills his creativity, how he presents creative ideas to clients, where he finds inspiration…and more!...
- PETs Prize Challenge: Advancing Privacy-Preserving Federated Learning
Privacy-enhancing technologies (PETs) have the potential to unlock more trustworthy innovation in data analysis and machine learning. Federated learning is one such technology that enables organizations to analyze sensitive data while providing improved privacy protections...That’s why the U.S. and U.K. governments are partnering to deliver a set of prize challenges to unleash the potential of these democracy-affirming technologies to make a positive impact. In particular, this challenge will tackle two critical problems via separate data tracks: Data Track A will help with the identification of financial crime, while Data Track B will bolster pandemic responses...
- Explaining Complex Models in Production: SHAP Walkthrough
Understanding and explaining more complex models that we want to use in production is not only critical for legal and ethical reasons but also makes solid business sense before we hand off important decisions to automated systems. For these more complex models, we have to try some indirect approaches. Two of the more common and useful approaches are Shapley Values, which are a way of estimating a particular feature's effect on a specific prediction, and Partial Dependence and Individual Conditional Expectation plots, which are used to visualize the interaction between the features and the prediction values....
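To make the Shapley idea from the SHAP walkthrough above concrete, here is a minimal sketch that computes exact Shapley values for a single prediction by averaging a feature's marginal contribution over every coalition of the other features. The `toy_model`, the input `x`, and the all-zeros `baseline` are hypothetical stand-ins invented for illustration, not anything from the linked article. Exact enumeration grows exponentially with the number of features, so it is only practical for tiny models; libraries such as SHAP rely on approximations instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    A feature that is "absent" from a coalition is replaced by its
    baseline value (an assumption of this sketch).
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def toy_model(features):
    # Hypothetical model: two linear terms plus one interaction term.
    a, b, c = features
    return 2.0 * a + 1.0 * b + a * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions add up to the difference between the prediction for `x` and the prediction for `baseline`, and the interaction term `a * c` is split evenly between the two features involved.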
Tool*
Data Maturity Assessment
You already know that data is one of an organization’s most valuable assets. But is your organization harnessing the full power of its data? Take Pragmatic Institute’s complimentary Data Maturity Assessment to discover where your organization falls on the data maturity continuum and start building a data-driven culture.
The Data Maturity Assessment is a powerful tool for organizations that want to boost data literacy, democratize data, and leverage data in everyday decision making.
Take assessment
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
- Data Scientist - Success Academy Charter Schools, Inc - NYC
This new Data Scientist role will be a key contributor to our mission of driving innovation across the organization. Reporting to the Leader of Enterprise Analytics, this role will be responsible for working with stakeholders in various functions to understand areas of opportunity, developing analytical solutions ranging from dashboards to sophisticated mathematical models, and helping functional teams adopt those solutions. This role will be part of a highly collaborative team of professionals with a wide range of skills including data science, data engineering, business analysis, and project management....
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
- Maths for Machine Learning Map
This map presents a math-heavy approach, building from the ground up to give a deep understanding of the field. This deep understanding is required to go into Machine Learning research and is also valuable background knowledge if deploying models in production as a Machine Learning Engineer or improving their runtime efficiency as a Software Engineer...
- New Book: Understanding Deep Learning
I've been writing a new textbook. It's entitled "Understanding Deep Learning" and will be published by MIT Press... A partial draft is now available...
What you’re up to – notes from DSW readers
- Working on something cool? Let us know here :) ...
* To share your projects and updates, share the details here.
** Want to chat with one of the above people? Hit reply and let us know :)
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's newsletter here.
P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian