🎙 Google’s Allen Day on Using ML in the Cryptocurrency Space
It’s so inspiring to learn from practitioners and thinkers. Getting to know the experience gained by researchers, engineers, and entrepreneurs doing real ML work is an excellent source of insight and inspiration.

👤 Quick bio
Allen Day (AD): I work in Google Cloud’s developer relations team. Our mission is to build a best-in-class experience for cloud devs. Within this team, I advocate for Google Cloud's web3 and data & analytics products. These span the range of engineering data pipelines, from ingest through analytics and machine learning. I spend most of my time with the data processing and transformation products. As for how I got into machine learning, it wasn’t through deliberate intention but rather the result of my lifelong interest in exploring and building at the intersection of computer code and DNA-based biocode. This started with self-study and learning to program a computer at six years old, and led me to pursue a graduate degree in bioinformatics, during which I learned how to use distributed systems to implement machine learning algorithms for research in human genetics.

🛠 ML Work
AD: I got interested in cryptocurrencies in 2013 but didn't get around to learning about the blockchain data structures until Ethereum's ICO boom in 2017. I noticed there were some structural parallels: the blockchain transaction graph looks like the graph of genetic interactions inside a cell. So I decided to apply some simple analyses to find, for example, central nodes, and to write a blog post about how I did it. Getting to a few charts in a Jupyter notebook took more data engineering work than I expected. I decided that nobody should need to do that work again, so I open-sourced the ETL and put the processed data into a free-to-access BigQuery dataset. Then I wrote my blog post. It was very well received by the blockchain community. Many analysts and engineers reached out to me, and a community formed around the open data. It became clear that we needed to address two key challenges to meet the community's needs: (1) a robust DevOps architecture (Kubernetes, Docker) to keep up to date with a blockchain network's consensus state, and (2) an extensible architecture for ETLing complex streaming data (Pub/Sub, Dataflow, Airflow) so that we could work with other blockchains such as Ethereum. I teamed up with a talented data engineer, Evgeny Medvedev, and we built the Blockchain ETL community and open-source software project. Today at GCP we maintain ~20 of these datasets in BigQuery. There's a Kaggle community analyzing them, and Evgeny went on to build a blockchain analytics company, Nansen, based on our work.
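To make the "central nodes" idea concrete, here is a minimal sketch that ranks addresses in a tiny transaction graph with a hand-rolled PageRank. The edge list, the address names, and the choice of PageRank as the centrality measure are all assumptions for illustration; the interview does not say which measure or data was actually used.

```python
from collections import defaultdict

def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (sender, receiver) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank

# Hypothetical toy graph: three wallets pay one "exchange" address.
toy_edges = [
    ("wallet_a", "exchange"),
    ("wallet_b", "exchange"),
    ("wallet_c", "exchange"),
    ("exchange", "wallet_a"),
]
scores = pagerank(toy_edges)
most_central = max(scores, key=scores.get)
print(most_central)
```

On real ledger data the same ranking would normally be computed by a graph library or in the data warehouse itself rather than in a pure-Python loop.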
AD: If we consider all of the data on all of the public blockchains, there are indeed some small areas that are effectively invisible. For the majority of the data, though, we can see the transactions. Some blockchains are account-based, so we can directly see system actors. Other blockchains are transaction-based, and we need to use clustering methods to build synthetic identities. In all cases, we can reduce the ledger activity to a working set of system actors. From here, it's common to create continuous features, for example via dispersion modeling to estimate contamination from a ransomware payment address. It's also common to use public label data to create categorical features, for example using a random forest to find look-alikes to known labeled actors (miners, traders) based on their activity aggregated over time.
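One widely used way to build synthetic identities on a transaction-based chain is the common-input-ownership heuristic: addresses that are spent together as inputs of one transaction are assumed to share an owner. The sketch below applies it with a union-find structure. The heuristic is named here as an assumption, since the interview does not specify which clustering method was used, and the addresses are made up.

```python
def cluster_addresses(transactions):
    """Group addresses that ever appear together as inputs of one transaction.

    `transactions` is a list of input-address lists, one list per transaction.
    """
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue  # e.g. a coinbase transaction with no input addresses
        first = inputs[0]
        find(first)  # register single-input transactions too
        for other in inputs[1:]:
            union(first, other)

    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(clusters.values(), key=lambda c: sorted(c))

# Hypothetical input lists: addr2 links the first two transactions.
toy_transactions = [["addr1", "addr2"], ["addr2", "addr3"], ["addr4"]]
identities = cluster_addresses(toy_transactions)
print(identities)
```

Each resulting set is one synthetic actor, and per-actor aggregates over time can then serve as features for a classifier.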
AD: Yes, definitely! Investment in graph databases and the popularity of graph analytics workloads continue to grow. Their data access capabilities are on the cusp of being generally usable, and there is an opportunity to apply graph databases to blockchain data structures. Why do we care about graph data structures at all? A graph is the ultimate generalized data structure. It can represent the blockchain data with high fidelity, and it can encode rich relationships between nodes (temporal, semantic, social, spatial, functional). We've already demonstrated that there's useful inductive bias for non-graph-based methods. It seems reasonable to expect that graph-aware models like GNNs will outperform the more basic methods. I also think it's the right time to be thinking about this. As I described earlier, most of the activity on-chain is open for all to see. But we should expect these data to become more obfuscated and opaque over time. After all, one of the fundamental technologies upon which blockchains are based is cryptography. So more hiding capabilities will be introduced, and the awareness of on-chain actors that they're living in a dark forest will also increase. This becomes an adversarial ML problem. So we'll need the more powerful capabilities that are unlocked with GNNs, like identifying anomalous transactions and, conversely, which transactions don't exist (...yet) that should. Classifying nodes with GNN embeddings and applying graph kernels to characterize neighborhoods will also prove useful.
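As a rough intuition for what a graph-aware model adds, the sketch below runs one round of mean neighborhood aggregation, the message-passing step that GNN layers build on. It deliberately leaves out learned weights and nonlinearities, so it is not a trained GNN; the node names and feature vectors are hypothetical.

```python
def aggregate_neighbors(features, edges):
    """Replace each node's feature vector with the mean over itself and its neighbors."""
    neighbors = {node: {node} for node in features}  # include a self-loop
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)  # treat edges as undirected

    updated = {}
    for node, group in neighbors.items():
        dim = len(features[node])
        updated[node] = [
            sum(features[n][i] for n in group) / len(group) for i in range(dim)
        ]
    return updated

# Hypothetical per-address features, e.g. [is_labeled_miner, is_labeled_trader].
toy_features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
toy_graph = [("a", "b")]
embeddings = aggregate_neighbors(toy_features, toy_graph)
print(embeddings)
```

After one round, connected nodes share information while the isolated node is unchanged; stacking such rounds with learned transformations is what lets a GNN classify nodes by their neighborhood.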
AD: The theme of your question seems to be about building ML microservices that use a blockchain backplane. We're already seeing this today with blockchain oracles: middleware solutions that address the software oracle problem. I pioneered the concept of hybrid blockchain/cloud applications with Chainlink, and the essential problem we solved was how to run intense workloads by decoupling the on-chain compute for logging the transaction from the resources needed to deliver the result. As a concrete example, this design pattern allows spinning up a Docker container to train a model or perform inference using a GPU and get results delivered on-chain. Blockchains employ checksums everywhere, so a nice feature that you get for free by doing this is responsible AI: the input dataset can be transparent and verified, and the model training/inference processes are deterministic and reproducible. Regarding federated learning, I haven't seen an implementation that coordinates through a blockchain, but it seems possible. We can reuse the same oracle-based worker pattern described above, and converge with a MapReduce orchestrator. The techniques used to survive in the dark forest, like zero-knowledge proofs, may also be helpful here for managing privacy as blockchain-integrated ML models are brought to market.
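The checksum point can be sketched as follows: an off-chain worker hashes its input dataset and compares the digest with a previously recorded value before training or inference. In this sketch the recorded digest is just a local variable standing in for a value that a contract or oracle might store; no real blockchain or Chainlink API is used, and the row format is hypothetical.

```python
import hashlib

def dataset_digest(rows):
    """Return a SHA-256 hex digest over the rows, in order."""
    hasher = hashlib.sha256()
    for row in rows:
        hasher.update(row.encode("utf-8"))
        hasher.update(b"\n")  # delimiter so row boundaries affect the digest
    return hasher.hexdigest()

def verify_dataset(rows, recorded_digest):
    """Check that the rows match a digest recorded earlier."""
    return dataset_digest(rows) == recorded_digest

# Hypothetical training rows: sender, receiver, amount.
training_rows = ["addr1,addr2,0.5", "addr2,addr3,1.25"]
recorded = dataset_digest(training_rows)  # stand-in for an on-chain record

tampered_rows = ["addr1,addr2,0.5", "addr2,addr3,9.99"]
print(verify_dataset(training_rows, recorded))
print(verify_dataset(tampered_rows, recorded))
```

The digest only proves the input is unchanged; reproducible results additionally require the training or inference step itself to be deterministic, as the answer notes.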
AD: With regard to ML and NFTs, we're seeing NFTs that grant the owner access to ML-linked products and experiences, acting like a license key or a config file. ML is already being used in off-chain trading systems, and I expect we'll also see the on-chain equivalent of this, where oracle-linked ML models are integral to the automated protocols that power decentralized finance and games. It's a great time to get involved at the intersection of ML and crypto, and it's been an honor to share with your audience some current market opportunities and areas of open inquiry. I'm excited to see more ML practitioners get involved and see what they'll create.

💥 Miscellaneous – a set of rapid-fire questions
AD: Elements of Statistical Learning (free PDF) by Trevor Hastie, Robert Tibshirani, Jerome Friedman; Introduction to Information Retrieval (free PDF) by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze; Foundations of Statistical Natural Language Processing by Chris Manning and Hinrich Schütze.
AD: I don’t think they’re equal, no. If P=NP there are of course major ramifications for cryptography and the entire stack of blockchain applications built on top of it. But that’s a tiny disruption relative to all of the assumed limitations that would be broken. Perhaps this question is so captivating because of how close it is to the human condition. We want unlimited reach (P=NP) while operating from a place of total safety (P!=NP). But the math sublimely indicates we can’t have it both ways; this is both beautiful and terrifying.