Data Elixir - Issue 450
ISSUE 450 · August 20, 2023

Posts & Tutorials

Handle big, ugly and bad CSV files
The CSV file format is one of the most common formats for storing and exchanging data, but it has issues. In this deep dive, Andrea Borruso explores the format and its problems, shows how to use DuckDB to analyze CSV files, and explains why the Parquet format is a good alternative. Note that the link goes to an automated English translation, which is pretty good but not perfect. The original is in Italian.

R for Sign Language Linguistics
A nice introduction to sign language data and how to work with it using R. Most people don't give much thought to sign language data, but there's actually a lot going on in this space, including an international data science workshop next month called Autumn School.

How to design useful color keys
A carefully designed color key can mean the difference between readers glancing at your visualization and deciding it's too hard to figure out, and readers actually reading it. This post shows how to create useful, truthful, easily skimmable color keys, starting with simple tricks and ending with a collection of complex, clever, and fun color keys.

Sponsored Link

Amazon Bedrock offers access to multiple generative AI models
The rise of open-source LLMs has brought with it the potential for toxic outputs. Amazon Bedrock, the latest step in the company's ongoing effort to democratize ML, uses Amazon's Titan FM to help customers detect and remove harmful content in inputs and filter model outputs.

Tools & Code

Code Llama, a state-of-the-art LLM for coding
Code Llama is a new state-of-the-art LLM designed to generate code from text prompts. It can also generate text about code, complete code, and help debug it, ultimately helping you write more robust and well-documented software. It's free to use and works with a variety of languages, including Python, C++, JavaScript, and more.
Dataherald
Dataherald is an open-source SQL engine that understands natural language. It's designed for enterprise-level Q&A and can be hosted locally, letting business users who don't know SQL answer ad-hoc questions on their own.

Papers

Computational reproducibility of Jupyter notebooks
The authors reviewed 27,271 Jupyter notebooks associated with 3,467 publications and found that only 1,203 of them ran without any errors. Of those, only 879 produced the expected results. The paper dives into the issues, highlights trends, and offers suggestions for improving Jupyter-related workflows.

A Survey on LLM-based Autonomous Agents
A great survey paper exploring the landscape and the possibilities of using large language models to power autonomous agents. It covers the construction of LLM-based agents and summarizes their applications in the social sciences, natural sciences, and engineering.

Resources

Geographic Data Science with Python
This book covers the tools, methods, and theory for solving geographic problems with data. It starts with a "Building Blocks" section that lays the groundwork for geographic thinking and then dives into a variety of topics in spatial data, mapping, and spatial statistics. Free to download.

Python for Data Science
This new online book teaches you how to load, transform, visualize, and understand your data using Python. The book is inspired by R for Data Science and assumes that readers have some coding experience but are new to data science.
Older messages
Data Elixir - Issue 449
Tuesday, August 22, 2023
Analysis of the data job market. LLM open challenges. Data viz with LLMs. Is probability frequentist or Bayesian? Opening career doors.
Data Elixir - Issue 448
Tuesday, August 15, 2023
The weird world of LLMs. Learning how DBs work under the hood. Probabilistic ML: Advanced Topics. Packaging C code for R. Tactile data viz.
Data Elixir - Issue 447
Tuesday, August 8, 2023
Finance Toolkit. Awesome Quarto. Do ML models memorize or generalize? ML⇄DB seminar series. Functions are vectors.
Data Elixir - Issue 446
Tuesday, August 1, 2023
Nix for data science. Telling Stories with Data. Design patterns for LLM systems & products. Practical guide to conjoint analysis. Treemaps.
Data Elixir - Issue 445
Tuesday, July 25, 2023
Salary Calculator. Python vector DBs. Visual superpowers. Polars for R Cookbook. Test Driven Data Analysis. Python cheatsheet.