🎙 Doug Downey/Semantic Scholar: Applying Cutting Edge NLP at scale
It’s so inspiring to learn from practitioners. Getting to know the experience gained by researchers, engineers, and entrepreneurs doing real ML work is an excellent source of insight and inspiration.

👤 Quick bio / Doug Downey
Doug Downey (DD): I received my Ph.D. in Computer Science and Engineering from the University of Washington in 2008, focusing on AI – specifically, information extraction from large-scale text. Since then, I have been a professor at Northwestern University but have been away from the university since 2019, working full-time at the Allen Institute for AI (AI2). I manage the research unit of AI2's Semantic Scholar group, focused on machine learning, natural language processing, and human-computer interaction in service of Semantic Scholar's mission of accelerating scientific breakthroughs with AI.

🛠 ML Work
DD: Semantic Scholar aims to radically improve people's ability to identify and understand relevant research. In the past couple of years, we've rolled out new capabilities like automatically generated "TLDRs" for papers, adaptive recommendation feeds for staying up to date with recent research, and improvements to core capabilities like search and author disambiguation. The tool we're most excited about today is the Semantic Reader, which aims to revolutionize reading by making it more accessible and contextual. The Reader already provides a seamless way to look up references while reading without losing your place, and it is available for thousands of papers. We'll soon make it available for hundreds of thousands of papers, and we are exploring new features like assisted skimming, on-demand symbol and term definitions, and more.
DD: For TLDRs, interestingly, the long-form input doesn't change the problem too much. Our production model only uses the abstract, intro, and conclusion as input, which is not too long and tends to be sufficient for generating good TLDRs. But TLDRs only scratch the surface of what we might want to summarize from scientific papers. For example, say you're reading a paper, and it says, "we use the same experimental setup as reference 17." Wouldn't it be great if you could click on that statement and immediately get a concise summary of the relevant part of reference 17's experimental design? This is something we're working on. For that, we may need to model whole documents using a tool like Longformer, and even model multiple documents at a time, as in the cross-document language models that we introduced this past year.
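The input-selection step Doug describes – feeding only the abstract, intro, and conclusion to the TLDR model – can be sketched in a few lines. This is an illustrative sketch, not Semantic Scholar's actual pipeline: the section names and the word budget are assumptions.

```python
# Hypothetical sketch: assemble the truncated input for TLDR generation
# (abstract + introduction + conclusion), then cap it at a word budget
# so it fits a standard summarization model's context window.

def build_tldr_input(sections, max_words=1500):
    """sections: dict mapping section name -> section text (assumed names)."""
    parts = [sections.get(name, "") for name in ("abstract", "introduction", "conclusion")]
    text = "\n\n".join(p for p in parts if p)  # skip sections the parser missed
    words = text.split()
    return " ".join(words[:max_words])  # stay within the model's input budget
```

The resulting string would then be passed to whatever abstractive summarizer generates the TLDR; the point is that the long-document problem is largely sidestepped by choosing the right slices of the paper.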
DD: The Semantic Scholar website and the Reader face a difficult and expensive extraction challenge as a first step. Given a PDF, we have to pull out all of the basic paper elements – title, authors, citations, equations, figures, and so on. We recently introduced a new technique for this task called VILA (for VIsual LAyout), which uses a simple intuition: in typical scientific paper layouts, semantically related elements tend to appear in the same visual block on the page. Fairly simple models that encode this intuition are more efficient than previous work and still achieve high accuracy. We've released a version of those models and plan to improve them further in 2022.
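The VILA intuition – that tokens sharing a visual block share a semantic role – can be illustrated with a toy sketch: group PDF tokens into blocks by vertical proximity, then assign one label per block instead of per token. The grouping rule (a shared y-band) and the majority-vote labeling are simplifying assumptions for illustration, not the published model.

```python
# Toy illustration of the VILA idea: classify per visual block, not per token.
from collections import Counter

def group_by_block(tokens, y_tol=5):
    """Group (text, x, y) tokens into visual blocks by vertical proximity."""
    blocks = []
    for tok in sorted(tokens, key=lambda t: t[2]):  # sort by y-coordinate
        if blocks and abs(tok[2] - blocks[-1][-1][2]) <= y_tol:
            blocks[-1].append(tok)  # same visual band -> same block
        else:
            blocks.append([tok])    # start a new block
    return blocks

def label_blocks(blocks, token_labeler):
    """One label per block: majority vote over any per-token classifier."""
    labeled = []
    for block in blocks:
        votes = Counter(token_labeler(t[0]) for t in block)
        labeled.append((block, votes.most_common(1)[0][0]))
    return labeled
```

Enforcing block-level consistency like this is what lets a simple model recover much of the accuracy of heavier token-level approaches.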
DD: Few-shot learning techniques are very relevant to Semantic Scholar in settings like feeds, where users might give just a handful of example papers and want to get high-quality recommendations right away, or in domain transfer, where we might have a model built for one scientific domain like computer science and want to quickly adapt it to bioinformatics. To help support additional few-shot research by the community, we recently established a new benchmark, FLEX, that provides a standard and realistic challenge workload for few-shot NLP. One limitation of current few-shot learning work, including FLEX, is that it tends to focus on classification, and other settings like few-shot text generation and summarization are far less studied. We've recently demonstrated a few-shot summarization technique (PRIMER), but more work is needed in this direction.
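A concrete example of the few-shot classification setting that benchmarks like FLEX evaluate is nearest-centroid classification over fixed embeddings: average a handful of example vectors per label and assign a query to the closest centroid. This is a generic baseline sketch, not a method from FLEX or Semantic Scholar; the embeddings are assumed to come from some upstream encoder.

```python
# Hedged sketch of a simple few-shot baseline: nearest-centroid
# classification over precomputed embedding vectors.
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_centroid_classify(query, support):
    """support: label -> a few embedding vectors (the 'shots') per label."""
    centroids = {label: centroid(vecs) for label, vecs in support.items()}
    return min(centroids, key=lambda label: math.dist(query, centroids[label]))
```

With only a handful of example papers per feed, a user's recommendation problem looks much like `support` here: a few vectors per label, and every new paper is a `query`.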
DD: Great question. We'd like to get to a point where our systems understand science well enough to verify a claim in one paper by reading other papers, or suggest the best tools for a given task, or recommend new hypotheses for human scientists to investigate. Reaching that level of reasoning requires not just scientific knowledge but also vast commonsense knowledge. Semantic Scholar has collaborated with the MOSAIC team here at AI2 on a variety of commonsense challenge tasks, including ones that are critical for science like abductive reasoning (reasoning to the best explanation). Recent years have shown that large-scale language modeling approaches, especially when trained on large and varied commonsense datasets, can perform fairly well on the commonsense question-answering benchmarks that we devise. But it's not obvious how to convert performance on those constructed QA tasks into a successful real-life application, like a scientific assistant that suggests hypotheses. I hope we can work more with MOSAIC in this direction.

💥 Miscellaneous – a set of rapid-fire questions
The "surprise test" paradox.
Weapons of Math Destruction by Cathy O'Neil, although my own views on predictive modeling are more optimistic.
It has flaws, but yes, it's still relevant. In particular, the value of an interactive test is underappreciated by today's benchmark-focused AI research. To really evaluate an AI system, you have to interrogate it – not just test it on a fixed dataset.
Seems like not, and it would be so great to understand exactly why.