Luca Beurer-Kellner: ETH Zürich, Creator of the Language Model Query Language (LMQL), on language model programming and the future of LLMs.

👤 Quick bio
I am currently doing my PhD (3rd year) in Computer Science at ETH Zürich, where I focus on the intersection of machine learning (ML) and programming language (PL) research. Before that, I did my BSc in CS at HU Berlin and, later on, my MSc at ETH as well. In between studies, I worked at a few smaller companies doing software and compiler engineering. Throughout my academic and programming life, I have always been fascinated by the design and implementation of programming languages. Only later, during my master's studies, did I get more exposure to the machine learning world, and I quickly became interested in combining the two in my research and other projects. In this context, I have worked on both PL-informed machine learning and ML-focused PL projects, such as differentiable programming and, more recently, language model programming.

🛠 ML Work
The origin story of LMQL is always a fun one to tell. We started working on the project in July 2022, months before ChatGPT came out. Even then, we observed with great interest how recent LLMs had become more and more powerful and had started to exhibit a level of programmability not previously known from other models. In contrast to traditional single-task ML models, LLMs can be prompted to perform all sorts of tasks. More recent developments indicate that they may even have the potential to become general-purpose reasoning engines. As PL researchers, this is super exciting to see, as it allows for fundamentally new forms of programming, where code is still needed but is tightly interwoven with an LLM, which acts as a form of text computer that can do all sorts of computations that previously were very hard to do. Based on this perspective, we started to explore how LLMs could be used as primitive building blocks to construct and program a novel form of system. During the summer of 2022, we built the first version of LMQL, which happened to be completed right before NeurIPS, around the same time ChatGPT was announced. Unfortunately, we could not release LMQL back then, because anonymous peer review had to conclude first. Still, we felt empowered and validated by that generation of (RLHF) models and continued to build LMQL out into the open-source project we lead today. Our core vision is to further explore and facilitate LLM programming, and to provide good infrastructure for this evolving space, focusing on language abstractions, interface robustness, types, and vendor compatibility.
Fundamentally, LMQL separates LLM programming into three orthogonal dimensions: (1) how text is generated, in terms of the decoding algorithm you use (e.g., argmax, sampling, or beam search); (2) what kind of (multi-part) prompt you use to call the model; and (3) what kind of constraints and formatting requirements you have on the model's response. Decoding algorithms (1) and constraints (3) are relatively declarative aspects of this process; prompting itself, however, is more of an imperative, programmatic concept, i.e., you imperatively provide the model with instructions and examples of how to respond. Based on this understanding, LMQL adopts declarative elements to enable the specification of decoders and constraints, and allows imperative Python code for the actual prompting logic of your program. If your focus is on prompting alone, we also provide a reduced syntax mode that feels very much like standard Python. Overall, I think this separation of declarative vs. imperative makes a lot of sense and maps well to the programming models we observe in LLM configuration vs. LLM prompting itself. In our initial paper on LMQL, we defined this exact form of programming as language model programming (LMP). However, since our work was initially published, we have seen a plethora of different approaches emerge. More specifically, we observe compositional frameworks that focus on retrieval and chaining, and more template-based frameworks that focus mostly on output formatting. LMQL sits somewhat outside that spectrum, as it also emphasises constrained templates but puts algorithmic LLM use at the centre, i.e., the top-level statements in LMQL are code, not prompt. This allows LMQL to tightly integrate and optimize inline LLM calls in your code while providing the same outside interface as a standard Python function. Overall, this enables the use of an LMQL program as a functional component in your existing compositional frameworks, while also benefiting from the concise syntax and runtime optimizations.
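To make the three dimensions concrete, here is a minimal sketch of an LMQL query, loosely following the clause syntax from the LMQL paper (the model name and prompt are illustrative, and exact constraint functions vary across versions):

    argmax                                  # (1) decoding algorithm
        "Q: Name a classic PL conference.\n"
        "A: [ANSWER]"                       # (2) prompt; [ANSWER] is a hole the model fills in
    from
        "openai/text-davinci-003"           # model backend (illustrative)
    where
        STOPS_AT(ANSWER, "\n") and len(TOKENS(ANSWER)) < 20   # (3) constraints on the response

The argmax clause picks the decoder, the quoted strings form the multi-part prompt, and the where clause restricts what the model may generate for ANSWER.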
While I think the fundamental reasoning capabilities of LLMs lie in the model weights, I also think constraining and decoding are important aspects from a programming perspective. For us, constraining mostly serves the purpose of establishing interface robustness, i.e., you want your LLM to provide you with an answer to a specific query. However, to reliably process this answer in an automated system, you need it to be in a very specific, parsable format every time. LMQL constraints afford you this by making sure the model's vocabulary is limited in a way that only allows it to produce (at least syntactically) correct outputs. If you instead rely on prompting alone, you will end up with a small error rate per LLM call, simply because of unexpected output formatting on the model side. In production, when you issue many consecutive LLM calls in a single process, this compounds into unacceptable error rates for your overall system. In that sense, I think constraining is a cornerstone of robust LLM use that enables a form of reliable programmability not otherwise possible. Decoding is generally known to improve overall output quality at the price of more model calls. However, in the presence of constraints and multi-part LLM use and reasoning, I think it will play an increasingly important role when it comes to backtracking forms of reasoning, as recently shown with tree-of-thought. Especially when you externally enforce constraints on LLM output, this form of backtracking can become crucial, as constraints can turn out to be unsatisfiable quite late during generation, which can then only be resolved by stepping back out of the current trajectory of reasoning.
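To see how quickly per-call format errors compound, here is a small back-of-the-envelope calculation in Python (the 2% per-call failure rate is an illustrative assumption, not a measurement):

    # How small per-call formatting-error rates compound across chained LLM calls.
    per_call_error = 0.02  # assume 2% of calls return an unparsable format

    for n_calls in (1, 5, 10, 20):
        # probability that at least one call in the chain produces unparsable output
        chain_failure = 1 - (1 - per_call_error) ** n_calls
        print(f"{n_calls:2d} chained calls -> {chain_failure:.1%} chance of failure")

Under this assumption, a pipeline of 20 calls already fails about a third of the time, which is exactly the kind of error rate that constrained decoding is meant to eliminate at the source.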
At all levels, LMQL is extensible and allows for custom user functions and functionality. First, our constraint-decoding engine provides custom evaluation semantics for an expressive constraint language that can be extended with custom constraints, as long as they satisfy the internally used interface. This system even comes with proven guarantees on the soundness of the resulting validation behaviour. With respect to the programs users write, LMQL is fully interoperable with Python. This means it embeds seamlessly in your existing Python program: you can call any of your existing Python functions, and you can call LMQL programs opaquely, just like standard Python functions. With respect to model backends, we currently support OpenAI, Hugging Face Transformers, and llama.cpp. We have also started to generalize our backend infrastructure, with the aim of standardizing the LLM backend interface beyond just mocking the OpenAI API. We call this the Language Model Transport Protocol, and we hope it can benefit the broader community, unlock more backends for LMQL, and also transfer some of the optimizations we enable internally to other projects.
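As a sketch of that Python interop, the snippet below embeds a query in an ordinary Python module, loosely following LMQL's documented @lmql.query decorator (exact syntax and return values vary across versions, so treat this as an approximation rather than the definitive API):

    import lmql

    # An LMQL query wrapped as a regular (async) Python function.
    @lmql.query
    async def classify(review: str):
        '''lmql
        argmax
            "Review: {review}\n"
            "Sentiment:[LABEL]"
        from
            "openai/text-davinci-003"
        where
            LABEL in ["positive", "neutral", "negative"]
        '''

    # From the calling side, this looks like any other async Python function,
    # e.g.:  result = await classify("The pasta was excellent.")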
Most existing frameworks (with very few notable exceptions) consider LLMs to be magic boxes with a purely text-based interface: they pass in some prompt and get some textual response back. If you look more closely, however, there is more to the internals of an LLM that you can leverage. This is what enables constrained decoding, advanced caching, or distributions in LMQL. However, since LMQL mostly operates inside this "magic box", it is fully compatible with existing frameworks like LangChain and can be used in conjunction with such compositional layers. In the long run, we intend to extend LMQL beyond the scope of what a query program or prompt template may be considered today. However, at the time of writing, I generally advise people to embrace both: LMQL in the loop, and other frameworks outside of the loop or for retrieval. LMQL on its own also provides a very simple text-based interface to your calling code, which is a very simple and powerful model to work with yourself. What LMQL contributes here are optimizations, model backends, decoders, scripting, constraining, error handling, and lower-level convenience functionality. So I definitely also encourage everyone to write their own custom code to chain calls, which has often been shown to be much simpler than existing do-it-all-style frameworks. On this level, many abstractions still seem very early, and trying many different variants for yourself is likely the fastest way to identify the best solution.
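To illustrate how little machinery hand-rolled chaining actually needs, here is a hypothetical sketch (summarize and classify stand in for whatever LMQL query functions or text-in/text-out calls you define yourself; they are not real library functions):

    # Custom chaining with plain Python control flow, no framework required.
    async def review_pipeline(document: str) -> str:
        summary = await summarize(document)   # first LLM call (hypothetical query function)
        label = await classify(summary)       # second call, fed the output of the first
        if label == "neutral":                # ordinary branching between calls
            label = await classify(document)  # e.g., retry on the full document
        return label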
💥 Miscellaneous – a set of rapid-fire questions

Having done some work there, I generally find differentiable programming and algorithmically guided neural networks a very interesting research discipline.
I think it is currently very clear that OpenAI has the best models and the highest usage. However, open-source models are catching up, and I am very optimistic about their future. LMQL is vendor neutral, so you can use them all, although I have to say that I prefer open-source models, as they allow us full access and we do not have to work around very restricted, proprietary APIs.
I think hallucinations are *the* biggest issue here. Hopefully retrieval and augmentation will help eventually, but ultimately I think it will require a very big and fundamentally different modelling decision at the level of model training and architecture.
I think conversational models like ChatGPT and GPT-4 offer a very interesting future for programming in general. I think we will have a lot of neuro-symbolic systems that heavily rely on models as primitive building blocks. This is an exciting prospect for programming-language development and something we definitely want to contribute to, with LMQL and all the features and updates we have planned. If models continue the trend of getting more and more capable, the resulting programmability will be very fruitful to build upon.