Data Elixir - Issue 450
ISSUE 450 · August 20, 2023

Posts & Tutorials

Handle big, ugly and bad CSV files
The CSV file format is one of the most common formats for storing and exchanging data, but it has issues. In this deep dive, Andrea Borruso explores the format, its problems, how to use DuckDB to analyze CSV files, and finally, why the Parquet format is a good alternative. Note that the link goes to an automated English translation, which is pretty good but not perfect. The original is in Italian.

R for Sign Language Linguistics
Nice introduction to sign language data and how to work with it using R. Most people don't give much thought to sign language data, but there's actually a lot going on in this space, including an international data science workshop next month called Autumn School.

How to design useful color keys
A carefully designed color key can mean the difference between readers glancing at your visualization and deciding it’s too hard to figure out, and readers actually reading it. This post shows how to create useful, truthful, easily skimmable color keys, starting with simple tricks and ending with a collection of complex, clever, and fun color keys.

Sponsored Link

Amazon Bedrock offers access to multiple generative AI models
The emergence of open source LLMs led to the potential generation of toxic outputs. Amazon Bedrock, the latest step in the company’s ongoing effort to democratize ML, uses Amazon’s Titan FM to help customers detect and remove harmful content in inputs and filter model outputs.

Tools & Code

Code Llama, a state-of-the-art LLM for coding
Code Llama is a new state-of-the-art LLM that's designed to generate code from text prompts. It can also generate text about code, generate code completions, help debug code, and ultimately help you write more robust and well-documented software. It's free to use and works with a variety of languages, including Python, C++, JavaScript, and more.

Dataherald
Dataherald is an open-source SQL engine that understands natural language. It's designed for enterprise-level Q/A and can be hosted locally, giving non-SQL business users the ability to answer ad-hoc questions on their own.

Papers

Computational reproducibility of Jupyter notebooks
After reviewing 27,271 Jupyter notebooks that were associated with 3,467 publications, only 1,203 notebooks ran without any errors. And of those, only 879 produced the expected results. This paper dives into the issues, highlights trends, and offers suggestions to improve Jupyter-related workflows.

A Survey on LLM-based Autonomous Agents
Great survey paper exploring the landscape and the possibilities for using large language models to power autonomous agents. It covers the construction of LLM-based agents, as well as a summary of applications in the social sciences, natural sciences, and engineering.

Resources

Geographic Data Science with Python
This book covers the tools, methods, and theory for solving geographic problems with data. It starts with a "Building Blocks" section that lays the groundwork for geographic thinking and then dives into a variety of topics in spatial data, mapping, and spatial statistics. Free to download.

Python for Data Science
This new online book will teach you how to load, transform, visualize, and understand your data using Python. The book is inspired by R for Data Science and assumes that readers have some coding experience but are new to data science.

Older messages
Data Elixir - Issue 449
Tuesday, August 22, 2023
Analysis of the data job market. LLM open challenges. Data viz with LLMs. Is probability frequentist or Bayesian? Opening career doors.
Data Elixir - Issue 448
Tuesday, August 15, 2023
The weird world of LLMs. Learning how DBs work under the hood. Probabilistic ML: Advanced Topics. Packaging C code for R. Tactile data viz.
Data Elixir - Issue 447
Tuesday, August 8, 2023
Finance Toolkit. Awesome Quarto. Do ML models memorize or generalize? ML⇄DB seminar series. Functions are vectors.
Data Elixir - Issue 446
Tuesday, August 1, 2023
Nix for data science. Telling Stories with Data. Design patterns for LLM systems & products. Practical guide to conjoint analysis. Treemaps.
Data Elixir - Issue 445
Tuesday, July 25, 2023
Salary Calculator. Python vector DBs. Visual superpowers. Polars for R Cookbook. Test Driven Data Analysis. Python cheatsheet.