🎙 Doug Downey/Semantic Scholar: Applying Cutting-Edge NLP at Scale
It's so inspiring to learn from practitioners. Hearing directly from researchers, engineers, and entrepreneurs doing real ML work is an excellent source of insight and inspiration. Share this interview if you find it enriching. 👤 Quick bio / Doug Downey
Doug Downey (DD): I received my Ph.D. in Computer Science and Engineering from the University of Washington in 2008, focusing on AI, specifically information extraction from large-scale text. Since then, I have been a professor at Northwestern University, but I have been away from the university since 2019, working full-time at the Allen Institute for AI (AI2). I manage the research unit of AI2's Semantic Scholar group, which focuses on machine learning, natural language processing, and human-computer interaction in service of Semantic Scholar's mission of accelerating scientific breakthroughs with AI. 🛠 ML Work
DD: Semantic Scholar aims to radically improve people's ability to identify and understand relevant research. In the past couple of years, we've rolled out new capabilities like automatically generated "TLDRs" for papers, adaptive recommendation feeds for staying up to date with recent research, and improvements to core capabilities like search and author disambiguation. The tool we're most excited about today is the Semantic Reader, which aims to revolutionize reading by making it more accessible and contextual. The Reader already provides a seamless way to look up references while reading without losing your place, and it is available for thousands of papers. We'll soon make it available for hundreds of thousands of papers, and we are exploring new features like assisted skimming, on-demand symbol and term definitions, and more.
DD: For TLDRs, interestingly, the long-form input doesn't change the problem too much. Our production model only uses the abstract, intro, and conclusion as input, which is not too long and tends to be sufficient for generating good TLDRs. But TLDRs only scratch the surface of what we might want to summarize from scientific papers. For example, say you're reading a paper, and it says, "we use the same experimental setup as reference 17." Wouldn't it be great if you could click on that statement and immediately get a concise summary of the relevant part of reference 17's experimental design? This is something we're working on. For that, we may need to model whole documents using a tool like Longformer, and even model multiple documents at a time, as in the cross-document language models that we introduced this past year.
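For readers curious what the "abstract + intro + conclusion" setup looks like in practice, here is a minimal sketch using a Longformer Encoder-Decoder (LED) checkpoint from AI2 via Hugging Face transformers. Note the assumption: "allenai/led-base-16384" is a generic public long-input seq2seq model, not the production TLDR model, so its output is only illustrative of the input pipeline.

```python
# A minimal sketch of TLDR-style generation from abstract + intro +
# conclusion with a Longformer Encoder-Decoder (LED). Assumption:
# "allenai/led-base-16384" is a generic long-input checkpoint, not
# the production TLDR model, so the output is illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "allenai/led-base-16384"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

def tldr(abstract: str, intro: str, conclusion: str) -> str:
    # Use only the sections that tend to carry the paper's gist,
    # which keeps the input well under LED's 16K-token limit.
    text = "\n\n".join([abstract, intro, conclusion])
    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=4096)
    # LED uses sparse local attention; giving the first token global
    # attention lets every position attend to it.
    global_attention = torch.zeros_like(inputs["input_ids"])
    global_attention[:, 0] = 1
    ids = model.generate(inputs["input_ids"],
                         attention_mask=inputs["attention_mask"],
                         global_attention_mask=global_attention,
                         max_length=64, num_beams=4)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```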
DD: The Semantic Scholar website and the Reader face a difficult and expensive extraction challenge as a first step. Given a PDF, we have to pull out all of the basic paper elements – title, authors, citations, equations, figures, and so on. We recently introduced a new technique for this task called VILA (for VIsual LAyout), which uses a simple intuition that in typical scientific paper layouts, semantically-related elements tend to appear in the same visual block on the page. Fairly simple models that encode this intuition are more efficient than previous work and still get high accuracy. We've released a version of those and plan to improve them further in 2022.
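To make the VILA intuition concrete, here is a toy sketch, not AI2's released code: tokens with bounding boxes are first grouped into visual blocks, and one semantic prediction per block is then shared by all of its tokens. The gap threshold and the `classify_block` rule are hypothetical stand-ins.

```python
# A toy re-implementation of the VILA intuition, not AI2's released
# model: group tokens into visual blocks by layout, then make one
# semantic prediction per block instead of per token. The y_gap
# threshold and classify_block rule below are hypothetical.
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    x0: float; y0: float; x1: float; y1: float  # bounding box in points

def group_into_blocks(tokens, y_gap=6.0):
    """Start a new block whenever the vertical gap between consecutive
    tokens exceeds y_gap (a crude single-column layout heuristic)."""
    if not tokens:
        return []
    tokens = sorted(tokens, key=lambda t: (t.y0, t.x0))
    blocks, current = [], [tokens[0]]
    for tok in tokens[1:]:
        if tok.y0 - current[-1].y1 > y_gap:
            blocks.append(current)
            current = []
        current.append(tok)
    blocks.append(current)
    return blocks

def classify_block(block):
    # Placeholder classifier: a real system would run a learned model
    # once per block (title, author, citation, ...), which is where
    # the efficiency gain over per-token prediction comes from.
    text = " ".join(t.text for t in block)
    return "title" if text.isupper() else "body"

def label_page(tokens):
    # Every token inherits the label of its enclosing visual block.
    return [(block, classify_block(block))
            for block in group_into_blocks(tokens)]
```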
DD: Few-shot learning techniques are very relevant to Semantic Scholar in settings like feeds, where users might give just a handful of example papers and want to get high-quality recommendations right away, or in domain transfer, where we might have a model built for one scientific domain like computer science and want to quickly adapt it to bioinformatics. To help support additional few-shot research by the community, we recently established a new benchmark, FLEX, that provides a standard and realistic challenge workload for few-shot NLP. One limitation of current few-shot learning work, including FLEX, is that it tends to focus on classification, and other settings like few-shot text generation and summarization are far less studied. We've recently demonstrated a few-shot summarization technique (PRIMER), but more work is needed in this direction.
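As a concrete illustration of the feeds scenario, here is a minimal few-shot sketch: a handful of user-liked papers are embedded, averaged into a centroid, and candidates are ranked by cosine similarity. The encoder checkpoint is a generic public model chosen for illustration; Semantic Scholar's actual feed model is not shown.

```python
# A minimal few-shot sketch for the feeds scenario: rank candidate
# papers by cosine similarity to the centroid of a handful of
# user-liked papers. Assumption: "all-MiniLM-L6-v2" is a generic
# public sentence encoder, not Semantic Scholar's feed model.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def recommend(liked_abstracts, candidate_abstracts, top_k=5):
    # Embed the few positive examples and average into one centroid.
    liked = encoder.encode(liked_abstracts, normalize_embeddings=True)
    centroid = liked.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    # Cosine similarity reduces to a dot product on unit vectors.
    cands = encoder.encode(candidate_abstracts,
                           normalize_embeddings=True)
    scores = cands @ centroid
    top = np.argsort(-scores)[:top_k]
    return [(candidate_abstracts[i], float(scores[i])) for i in top]
```

This centroid rule is essentially a prototypical-network-style baseline; benchmarks like FLEX measure how much learned few-shot methods improve on simple baselines of this kind.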
DD: Great question. We'd like to get to a point where our systems understand science well enough to verify a claim in one paper by reading other papers, or suggest the best tools for a given task, or recommend new hypotheses for human scientists to investigate. Reaching that level of reasoning requires not just scientific knowledge but also vast commonsense knowledge. Semantic Scholar has collaborated with the MOSAIC team here at AI2 on a variety of commonsense challenge tasks, including ones that are critical for science like abductive reasoning (reasoning to the best explanation). Recent years have shown that large-scale language modeling approaches, especially when trained on large and varied commonsense datasets, can perform fairly well on the commonsense question-answering benchmarks that we devise. But it's not obvious how to convert performance on those constructed QA tasks into a successful real-life application, like a scientific assistant that suggests hypotheses. I hope we can work more with MOSAIC in this direction. 💥 Miscellaneous – a set of rapid-fire questions
The "surprise test" paradox.
Weapons of Math Destruction by Cathy O'Neil, although my own views on predictive modeling are more optimistic.
It has flaws, but yes, it's still relevant. In particular, the value of an interactive test is underappreciated by today's benchmark-focused AI research. To really evaluate an AI system, you have to interrogate it, not just test it on a fixed dataset.
Seems like not, and it would be so great to understand exactly why.