Data Elixir - Issue 367
ISSUE 367 · December 21, 2021

Note that Data Elixir is taking next week off and will be back in your Inbox on January 4th. In the meantime, stay safe out there and I hope you have great holidays! 🎉 -Lon

Insight

Do large language models understand us?
It’s sometimes claimed that machine learning is "just stats" and that AI can't really understand. But what does it really mean to "understand?" This is a great essay that will stretch your thinking about language, intelligence, understanding, sociality, and even personhood.

Sponsored Link

Start working towards your career in data science.
Springboard’s Data Science Career Track is designed to help you master job-ready skills and actually land your dream job. With 1-on-1 mentorship, career coaching, and personalized support, you’ll gain the portfolio, skills, and confidence you need to get hired in a new role. Their students aren’t just finding new jobs, they’re launching new careers. Apply today.

Tutorials, Projects & Opinions

Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research
Benchmark datasets play a key role in the organization of machine learning research but, more and more, widely-used datasets are introduced by only a handful of elite institutions. This Best Paper winner from the recent NeurIPS Conference explores how that creates practical and ethical problems in machine learning research.

Jupyter Games
Jupyter isn't usually thought of as an application for games but games are a great vehicle for learning how to push its edges. This tutorial shows how to build tiny games in Jupyter using the Box2D physics engine and Ipycanvas. Covers games like Rocket, Billiards, ~Angry Birds, and World of Goo, each in under 1000 lines of code.

The Guide to Data Versioning
There are a lot of reasons you might want to version your data. If you're already versioning your code with git, here's how data versioning works using the same abstractions.

How to make AI & BI work at scale

Why isn't there more automation in data labeling?
It makes sense to want to automate the labeling process as much as possible. Using human judgments seems expensive and inefficient. Industry experts weigh in on the pros and cons of applying automation in data labeling for machine learning.

Code & Tools

Top Python libraries of 2021
Tryolabs' annual list of top Python libraries is consistently a must-read post. This well-researched list includes tools for working with awkward arrays, versioning Jupyter Notebooks, monitoring ML models, working with time-series, interacting with SQL databases, and much more.

Sharing Data With the pins Package
The pins package publishes R objects, such as datasets and predictive models, on a virtual cork board so you can safely share and reuse them. In other words, pins makes it easy to share and update data across projects and people.

Resources

From R to Python 🐍
The latest chapter of Joscelin Rocha's Learning-R-Resources repo will help you learn the basics of Python if you already know R. Also, check out the index on the side. There's a lot in this repo for R learners.

Data Visualization

Science of Visual Data Communication: What Works.
Epic review of research-backed guidelines to determine what works and what doesn't work in visualizations. Includes lots of examples, linked references, and a summary of key guidelines. This is essential reading for leveling-up your data visualization proficiency.

Python power-up: new tool visualizes complex data
Napari is an interactive image viewer for Python that's designed for browsing, annotating, and analyzing large multi-dimensional images. It's important because Python has become a key language for scientific computing but it doesn't handle arbitrarily complex data. This article in Nature is a nice overview of Napari and the problems it solves.
To find specific content from prior issues or to research topics, check out the searchable Archives on Data Elixir's Search Page.