On being a Data Lead. Data 'creation' vs. 'extraction.' Categorizing ML interpretability approaches. Mapping wind data.

ISSUE 404 · September 13, 2022

Insight

Organizations need to deliberately create data

People sometimes say that "data is the new oil," but that line of thinking confines models to the data that's available for extraction. A better approach is to figure out what data you need and then figure out how to create it. This is a great post that explores the limitations of extracted data and how teams gain by deliberately creating the data they need.
Data Creation | Yali Sassoon

Takeaways from Gartner Data & Analytics Summit

Nice overview of highlights and four big ideas from the recent Gartner Data & Analytics Summit.
Prukalpa

Tutorials, Projects & Opinions

5 questions to categorize machine learning interpretability approaches

After reading hundreds of papers and writing a book on machine learning interpretation, Christoph Molnar has identified some useful categories of interpretation techniques. In this post, he organizes his thinking into five simple questions that will help you assess the ML interpretation approaches that are suitable for different use-cases.
Mindful Modeler | Christoph Molnar

Getting Started with Apache Arrow in R

Nice collection of R resources, cheatsheets, and a tutorial for using Apache Arrow to work with data that's larger than memory. This is aimed at experienced R users who are new to Arrow.
Voltron Data

Want a data science project?

This is the best take I've seen on the recently released treasure trove of hospital pricing data. Over 100 TB of data was released and, by all accounts, it's a mess. In this post, Randy Au explores what's available, what needs to happen next, and how, ultimately, the lack of tools and structure has more to do with real-world data handling issues than maliciousness. There are important problems and opportunities here.
Counting Stuff | Randy Au

Djinn by Tonic.ai - AI-driven synthetic data models

Whether it's privacy controls or a lack of high-quality data slowing you down, Djinn's AI-driven synthetic data models create private and augmented data within minutes of setup. Answer nuanced scientific questions, optimize business processes, and make better decisions.
// sponsored

Code & Tools

PySearch: Python Function Search by Description

PySearch is a free search engine for querying Python libraries using natural language descriptions. Just select the libraries you want to search and then use natural language or keywords to describe what you're looking for. Check out the examples to see it in action.
PySearch

Data Visualization

Mapping wind data with R

Great R tutorial that shows how to access, reshape and visualize wind data as streamlines. This is a step-by-step tutorial that includes code and links to key resources along the way.
Milos Popovic

Which fonts to use for your charts and tables

Sans-serif or serif typefaces? Lining or oldstyle figures? Narrow or wide? With lots of examples, this post explains which fonts work best for various types of data visualizations.
Datawrapper | Lisa Charlotte Muth

Career

The Difficult Life of the Data Lead

As data teams get bigger, more Data Leads are needed, but Data Leads have one of the hardest roles in data. They have to manage a team, work with stakeholders, and still stay hands-on. In this post, Mikkel Dengsøe explores the challenges and ideas for making the role better.
Mikkel Dengsøe

Join the Data Elixir Talent Collective

The Data Elixir Talent Collective is a reverse job board where top companies apply to you. Choose to be anonymous or public and get matched with opportunities that fit your specific interests.

This is a free resource but membership is limited. To apply, you need 3+ years of experience in data science, analytics, machine learning, visualization, or a related field. For more info, APPLY HERE.

If you’re hiring, apply now to find top candidates faster, sourced from the Data Elixir community. We're creating the highest signal-to-noise hiring resource for roles in the data ecosystem. Already, there are more than 100 mid- to senior-level candidates from a wide variety of organizations, from fast-moving startups to big companies like Google, Amazon, Apple, NVIDIA and more. If you're hiring, APPLY HERE.

Sponsored Link

Register: TechCrunch x iMerit ML DataOps Summit
