📝 Guest post: Elemeta: Metafeature extraction for unstructured data*
In this guest post, Lior Durahly, data & ML engineer @Superwise, introduces Elemeta, a brand new open-source library, currently in beta, for metafeature extraction from unstructured data.

What is Elemeta

With more and more models like DALL·E and ChatGPT hitting the shelves, we've reached incredible capabilities and results, fundamentally changing our ability to tap into and leverage unstructured data in machine learning. That said, the general architectural understanding of, and intuition into, how these models make decisions is vague at best, and the models themselves are far from interpretable. So how can we as practitioners leverage NLP and vision while enjoying monitoring, interpretability, and explainability similar to what is available for their tabular counterparts? This is where Elemeta comes in!

We're excited to open source the first version of Elemeta (focused on NLP), which will allow you to extract metafeatures from unstructured data so you can explore, model, and monitor NLP use cases through enriched tabular representations. Let's dive in.

How to get started with Elemeta

To get started, simply install the library and use our getting started guide to get going. From there, you'll find a set of Colab notebooks that can help you dig deeper into the use cases and metafeatures and explore, model, and monitor NLP with Elemeta.

What can Elemeta be used for

We see Elemeta being applied to three core use cases: exploratory data analysis (EDA), modeling, and model monitoring. But we've already heard from beta testers about additional potential use cases we hadn't thought of, so don't feel bound by how we think Elemeta should be used; we're looking forward to seeing how the community puts it to use.
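As a rough illustration of what an "enriched tabular representation" can look like for these use cases, the sketch below computes a few simple statistical metafeatures in plain Python. It does not use Elemeta's API: the function name `extract_metafeatures` and the column names are hypothetical stand-ins chosen for this example only.

```python
import re


def extract_metafeatures(text: str) -> dict:
    """Compute a few simple statistical metafeatures for one text.

    Hypothetical helper for illustration; not part of Elemeta.
    """
    words = re.findall(r"\w+", text)
    return {
        "text_length": len(text),
        "word_count": len(words),
        # Guard against division by zero for empty or symbol-only texts.
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        # Count characters that are neither alphanumeric nor whitespace.
        "special_chars_count": sum(
            1 for ch in text if not ch.isalnum() and not ch.isspace()
        ),
    }


# Each text becomes one row; each metafeature becomes one column.
texts = ["Great product, works as expected!", "meh...", ""]
table = [{"text": t, **extract_metafeatures(t)} for t in texts]

for row in table:
    print(row)
```

Once every text is a row of numeric columns like these, the usual tabular tooling applies: you can plot distributions during EDA, feed the columns to a model as extra features, or track them over time to spot drift in production inputs.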
What are metafeatures

Elemeta already has an extensive set of out-of-the-box metafeatures such as SpecialCharsCount, EmojiCount, OutOfVocabularyCount, SentimentSubjectivity, etc. Additionally, you can create both low-level API extractors and custom metafeature extractors to fit your specific needs. For example, you could create an IsPalindromeExtractor that returns whether the given text is a palindrome.

Within Elemeta, metafeatures are currently split into two groups of metrics: statistical metrics and contextual metrics. Statistical metrics calculate technical values such as word length and word count, while contextual metrics extract information about the context of the text. Statistical metrics are language agnostic, while contextual metrics currently support English and, to some extent, other Indo-European languages (not tested).

What's on the roadmap for Elemeta

We've only just gotten started with Elemeta. And while there are already a few areas we know we're going to invest in, such as image extractors and additional language coverage, we've already had input from beta users on expansions that we didn't initially think about. That's precisely why we decided to shift Elemeta into a free, open-source project for the community. We want to know what metafeatures you need for your use cases and domains, and we are more than happy to accept community contributions!

So if you're working with NLP and need better exploratory data analysis, feature extraction, or monitoring, check out the Elemeta repo, take it for a spin with our Colab notebooks, and if you star/follow the repo (show some ♥️), you'll get notified as soon as there's a new release.

*This post was written by Lior Durahly, data & ML engineer at Superwise. We thank Superwise for their ongoing support of TheSequence.
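The custom extractor idea described under "What are metafeatures" can be sketched roughly as follows. This is a minimal, self-contained illustration and not Elemeta's actual API: `BaseExtractor` is a hypothetical stand-in for whatever base class the library provides, and the normalization rule (ignore case, spaces, and punctuation) is an assumption made for this example.

```python
from abc import ABC, abstractmethod


class BaseExtractor(ABC):
    """Hypothetical stand-in for a library base class; not Elemeta's real API."""

    def __init__(self, name=None):
        # Default the metafeature name to the class name.
        self.name = name or type(self).__name__

    @abstractmethod
    def extract(self, text: str):
        """Return the metafeature value for a single text."""

    def __call__(self, text: str):
        return self.extract(text)


class IsPalindromeExtractor(BaseExtractor):
    """Return True if the text reads the same forwards and backwards."""

    def extract(self, text: str) -> bool:
        # Assumption: compare only letters and digits, case-insensitively.
        # Under this rule an empty text counts as a palindrome.
        normalized = "".join(ch.lower() for ch in text if ch.isalnum())
        return normalized == normalized[::-1]


extractor = IsPalindromeExtractor()
for sample in ["A man, a plan, a canal: Panama", "Elemeta"]:
    print(f"{extractor.name}({sample!r}) = {extractor(sample)}")
```

Keeping the logic inside a single `extract` method means the same extractor can be applied to one string or mapped over a whole column of texts, which is what makes a custom metafeature usable alongside the built-in ones.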