Data Elixir - Issue 450
ISSUE 450 · August 20, 2023

Posts & Tutorials

Handle big, ugly and bad CSV files
The CSV file format is one of the most common formats for storing and exchanging data, but it has issues. In this deep dive, Andrea Borruso explores the format, its problems, how to use DuckDB to analyze CSV, and finally, why the Parquet format is a good alternative. Note that the link goes to an automated English translation, which is pretty good but not perfect. The original is in Italian >>

R for Sign Language Linguistics
Nice introduction to sign language data and how to work with it using R. Most people don't give much thought to sign language data but there's actually a lot going on in this space, including an international data science workshop next month called Autumn School.

How to design useful color keys
A carefully designed color key can mean the difference between readers glancing at your visualization and deciding it’s too hard to figure out, and readers actually reading it. This post shows how to create useful, truthful, easily skimmable color keys, starting with simple tricks and ending with a collection of complex, clever, and fun color keys.

Sponsored Link

Amazon Bedrock offers access to multiple generative AI models
The emergence of open source LLMs led to the potential generation of toxic outputs. Amazon Bedrock, the latest step in the company’s ongoing effort to democratize ML, uses Amazon’s Titan FM to help customers detect and remove harmful content in inputs and filter model outputs.

Tools & Code

Code Llama, a state-of-the-art LLM for coding
Code Llama is a new state-of-the-art LLM that's designed to generate code from text prompts. It can also generate text about code, generate code completions, help debug code, and ultimately, it can help you write more robust and well-documented software. It's free to use and works with a variety of languages, including Python, C++, JavaScript and more.

Dataherald
Dataherald is an open-source SQL engine that understands natural language. It's designed for enterprise-level Q/A and can be hosted locally, giving non-SQL business users the ability to answer ad-hoc questions on their own.

Papers

Computational reproducibility of Jupyter notebooks
After reviewing 27,271 Jupyter notebooks that were associated with 3,467 publications, only 1,203 notebooks ran without any errors. And of those, only 879 produced the expected results. This paper dives into the issues, highlights trends, and offers suggestions to improve Jupyter-related workflows.

A Survey on LLM-based Autonomous Agents
Great survey paper exploring the landscape and the possibilities for using large language models to power autonomous agents. It covers the construction of LLM-based agents, as well as a summary of applications in the social sciences, natural sciences, and engineering.

Resources

Geographic Data Science with Python
This book covers the tools, methods, and theory for solving geographic problems with data. It starts with a "Building Blocks" section that lays the groundwork for geographic thinking and then dives into a variety of topics in spatial data, mapping, and spatial statistics. Free to download.

Python for Data Science
This new online book will teach you how to load, transform, visualize, and understand your data using Python. The book is inspired by R for Data Science and assumes that readers have some coding experience but are new to data science.

Was this email forwarded to you? Sign up here >>