Data Elixir - Issue 450
ISSUE 450 · August 20, 2023

Posts & Tutorials

Handle big, ugly and bad CSV files
The CSV file format is one of the most common formats for storing and exchanging data, but it has issues. In this deep dive, Andrea Borruso explores the format, its problems, how to use DuckDB to analyze CSV files, and finally, why the Parquet format is a good alternative. Note that the link goes to an automated English translation, which is pretty good but not perfect. The original is in Italian.

R for Sign Language Linguistics
Nice introduction to sign language data and how to work with it using R. Most people don't give much thought to sign language data, but there's actually a lot going on in this space, including an international data science workshop next month called Autumn School.

How to design useful color keys
A carefully designed color key can mean the difference between readers glancing at your visualization and deciding it’s too hard to figure out, and readers actually reading it. This post shows how to create useful, truthful, easily skimmable color keys, starting with simple tricks and ending with a collection of complex, clever, and fun color keys.

Sponsored Link

Amazon Bedrock offers access to multiple generative AI models
The emergence of open source LLMs led to the potential generation of toxic outputs. Amazon Bedrock, the latest step in the company’s ongoing effort to democratize ML, uses Amazon’s Titan FM to help customers detect and remove harmful content in inputs and filter model outputs.

Tools & Code

Code Llama, a state-of-the-art LLM for coding
Code Llama is a new state-of-the-art LLM that's designed to generate code from text prompts. It can also generate text about code, generate code completions, and help debug code, all of which can help you write more robust and well-documented software. It's free to use and works with a variety of languages, including Python, C++, JavaScript and more.

Dataherald
Dataherald is an open-source SQL engine that understands natural language. It's designed for enterprise-level Q/A and can be hosted locally, giving non-SQL business users the ability to answer ad-hoc questions on their own.

Papers

Computational reproducibility of Jupyter notebooks
After reviewing 27,271 Jupyter notebooks that were associated with 3,467 publications, only 1,203 notebooks ran without any errors. And of those, only 879 produced the expected results. This paper dives into the issues, highlights trends, and offers suggestions to improve Jupyter-related workflows.

A Survey on LLM-based Autonomous Agents
Great survey paper exploring the landscape and the possibilities for using large language models to power autonomous agents. It covers the construction of LLM-based agents, as well as a summary of applications in the social sciences, natural sciences, and engineering.

Resources

Geographic Data Science with Python
This book covers the tools, methods, and theory for solving geographic problems with data. It starts with a "Building Blocks" section that lays the groundwork for geographic thinking and then dives into a variety of topics in spatial data, mapping, and spatial statistics. Free to download.

Python for Data Science
This new online book will teach you how to load, transform, visualize, and understand your data using Python. The book is inspired by R for Data Science and assumes that readers have some coding experience but are new to data science.

Older messages
Data Elixir - Issue 449
Tuesday, August 22, 2023
Analysis of the data job market. LLM open challenges. Data viz with LLMs. Is probability frequentist or Bayesian? Opening career doors.
Data Elixir - Issue 448
Tuesday, August 15, 2023
The weird world of LLMs. Learning how DBs work under the hood. Probabilistic ML: Advanced Topics. Packaging C code for R. Tactile data viz.
Data Elixir - Issue 447
Tuesday, August 8, 2023
Finance Toolkit. Awesome Quarto. Do ML models memorize or generalize? ML⇄DB seminar series. Functions are vectors.
Data Elixir - Issue 446
Tuesday, August 1, 2023
Nix for data science. Telling Stories with Data. Design patterns for LLM systems & products. Practical guide to conjoint analysis. Treemaps.
Data Elixir - Issue 445
Tuesday, July 25, 2023
Salary Calculator. Python vector DBs. Visual superpowers. Polars for R Cookbook. Test Driven Data Analysis. Python cheatsheet.