Luca Beurer-Kellner: ETH Zürich, Creator of the Language Model Query Language (LMQL), on language model programming and the future of LLMs.

👤 Quick bio
I am currently in the third year of my PhD in Computer Science at ETH Zürich, where I focus on the intersection of machine learning (ML) and programming language (PL) research. Before that, I did my BSc in CS at HU Berlin and later my MSc at ETH as well. In between studies, I worked at a few smaller companies doing software and compiler engineering. Throughout my academic and programming life, I have always been fascinated with the design and implementation of programming languages. Only later, during my master’s studies, did I get more exposed to the machine learning world, and I quickly found an interest in combining both in my research and other projects. In this context, I have worked on both PL-informed machine learning and machine-learning-focused PL projects, such as differentiable programming and, more recently, language model programming.

🛠 ML Work
The origin story of LMQL is always a fun one to tell when we get around to it. We started working on the project in July 2022, months before ChatGPT came out. Already at that time, we observed with great interest how recent LLMs had become more and more powerful and had started to exhibit a level of programmability not known from earlier models. In contrast to traditional single-task ML models, LLMs can be prompted to perform all sorts of tasks. More recent developments indicate that they may even have the potential to become general-purpose reasoning engines. As PL researchers this is super exciting to see, as it allows for fundamentally new forms of programming, where code is still needed but is tightly interwoven with an LLM, which acts as a form of text computer that can do all sorts of computations that were previously very hard to do. Based on this perspective, we started to explore how LLMs could be used as primitive building blocks to construct and program a novel form of systems. During the summer of 2022, we built the first version of LMQL, which happened to be completed right before NeurIPS, around the same time ChatGPT was announced. Unfortunately, we could not release LMQL back then, because of the anonymous peer review process that had to conclude first. Still, we felt empowered and validated by that generation of (RLHF) models, and continued to build LMQL out into the open source project we lead today. Our core vision is to further explore and facilitate LLM programming, and to provide good infrastructure for this evolving space, focusing on language abstractions, interface robustness, types and vendor compatibility.
Fundamentally, LMQL separates LLM programming into three orthogonal dimensions: (1) how text is generated, in terms of the decoding algorithm you use (e.g. argmax, sampling or beam search); (2) what kind of (multi-part) prompt you use to call the model; and (3) what kind of constraints and formatting requirements you have on the model’s response. Decoding algorithms (1) and constraints (3) are relatively declarative aspects of this process; prompting itself, however, is more of an imperative, programmatic concept, i.e. you imperatively provide the model with instructions and examples on how to respond. Based on this understanding, LMQL adopts declarative elements to enable the specification of the decoder and constraints, and allows imperative Python code for the actual prompting logic of your program. If your focus is on prompting alone, we also provide a reduced syntax mode that feels very much like standard Python to do just the prompting. Overall, I think this separation of declarative vs. imperative makes a lot of sense, and maps well to the programming models we observe in LLM configuration vs. LLM prompting itself. In our initial paper on LMQL we defined this exact form of programming as language model programming (LMP). However, since our work was initially published, we have seen a plethora of different approaches emerge. More specifically, we observe compositional frameworks that focus on retrieval and chaining, and more template-based frameworks that focus mostly on output formatting. LMQL sits somewhat outside of that spectrum, as it also emphasises constrained templates, but puts algorithmic LLM use at the centre, i.e. the top-level statements in LMQL are code, not prompt. This allows LMQL to tightly integrate and optimize inline LLM calls in your code, while providing the same outside interface as a standard Python function. Overall, this enables the use of an LMQL program as a functional component in your existing compositional frameworks, while also benefiting from the concise syntax and runtime optimizations.
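To make these three dimensions concrete, here is a minimal sketch of what a query can look like in LMQL's standalone syntax, following the style of the LMQL paper and documentation (the model identifier and constraint details are illustrative and may differ between LMQL versions): the decoder keyword selects the decoding algorithm, the prompt interleaves fixed text with a typed hole, and the where clause declares constraints on the generated variable.

```
# illustrative sketch; exact syntax may vary across LMQL releases
argmax
    "Q: What is the capital of France?\n"
    "A: [ANSWER]"
from
    "openai/text-davinci-003"
where
    len(TOKENS(ANSWER)) < 20 and STOPS_AT(ANSWER, "\n")
```

Swapping the decoder (e.g. for sampling or beam search) or tightening the constraints does not require touching the prompting logic, which is exactly the declarative/imperative separation described above.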
While I think the fundamental reasoning capabilities of LLMs lie in the model weights, I also think constraining and decoding are important aspects from a programming perspective. For us, constraining mostly serves the purpose of establishing interface robustness: you want your LLM to provide an answer to a specific query, but to reliably process this answer in an automated system, you need it to be in a very specific, parsable format every time. LMQL constraints afford you this by making sure the model’s vocabulary is limited in a way that only allows it to produce (at least syntactically) correct outputs. If you instead rely on prompting alone, you will end up with a small error rate per LLM call, just because of unexpected output formatting on the model side. In production, when you issue many consecutive LLM calls in a single process, this will compound exponentially into unacceptable error rates for your overall system. In that sense, I think constraining is a cornerstone of robust LLM use that enables a reliable form of programmability that is otherwise not possible. Decoding is generally known to improve overall output quality at the price of more model calls. However, in the presence of constraints and multi-part LLM use and reasoning, I think it will play an increasingly important role when it comes to backtracking forms of reasoning, as recently shown with tree-of-thought. Especially when you externally enforce constraints on LLM output, this form of backtracking can become crucial, as constraints can turn out to be unsatisfiable quite late during generation, which can then only be solved by stepping back out of the current trajectory of reasoning.
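To put an illustrative number on that compounding effect (this back-of-the-envelope calculation is ours, not from the interview): if each unconstrained LLM call independently returns output in an unexpected format just 2% of the time, a pipeline of 20 chained calls succeeds end to end with probability 0.98^20 ≈ 0.67, i.e. roughly one in three runs fails on formatting alone. Constrained decoding pushes the per-call formatting failure rate toward zero, so the end-to-end pipeline stays reliable regardless of its length.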
At all levels, LMQL is extensible and allows for custom user functions and functionality. First, our constraint decoding engine provides custom evaluation semantics for an expressive constraint language that can be extended with custom constraints, as long as they satisfy the internally used interface. This system even comes with proven guarantees on the soundness of the resulting validation behaviour. With respect to the programs users write, LMQL is fully interoperable with Python. This means it embeds seamlessly in your existing Python program: you can call any of your existing Python functions, and you can call LMQL programs opaquely, just like standard Python functions. With respect to model backends, we currently support OpenAI, Hugging Face Transformers and llama.cpp. We have also started to generalize our backend infrastructure, with the aim of standardizing the LLM backend interface beyond just mocking the OpenAI API. We call this the Language Model Transport Protocol, and we hope it can benefit the broader community, unlock more backends for LMQL, and also make some of the optimizations we enable internally available to other projects.
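As a rough sketch of what this Python interoperability can look like, the snippet below follows LMQL's documented decorator pattern, where a query is defined as a Python function and called like one (the exact decorator behaviour, calling convention and in-query syntax may differ between LMQL releases, and the helper function and labels here are illustrative assumptions):

```python
import lmql

def normalize(label: str) -> str:
    # ordinary Python helper, callable from inside the query
    return label.strip().lower()

@lmql.query
def classify(review):
    '''lmql
    # illustrative query; syntax details may vary across LMQL versions
    "Review: {review}\n"
    "Sentiment: [LABEL]" where LABEL in ["positive", "neutral", "negative"]
    return normalize(LABEL)
    '''

# from the caller's point of view this behaves like a plain Python function
print(classify("The prose is crisp and the examples are useful."))
```

The constraint on LABEL restricts generation to one of the listed strings, so the calling code never has to handle free-form output for this field.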
Most existing frameworks (with very few notable exceptions) consider LLMs to be magic boxes with a purely text-based interface: they pass in some prompt and get back some textual response. If you look more closely, however, there is more to the internals of an LLM that you can leverage. This is what enables constrained decoding, advanced caching or distributions in LMQL. However, since LMQL mostly operates inside this “magic box”, it is fully compatible with existing frameworks like LangChain, and can be used in conjunction with such compositional layers. In the long run, we intend to extend LMQL beyond the scope of what a query program or prompt template may be considered today. At the time of writing, however, I generally advise people to embrace both: LMQL in the loop, and other frameworks outside of the loop or for retrieval. LMQL on its own also provides a simple text-based interface to your calling code, which is a simple and powerful model to work with yourself. What LMQL contributes here are optimizations, model backends, decoders, scripting, constraining, error handling and lower-level convenience functionality. So I definitely also encourage everyone to write their own custom code to chain calls, which has often been shown to be much simpler than existing do-it-all-style frameworks. On this level, many abstractions still seem very early, and trying many different variants for yourself is likely the fastest way to identify the best solution.

💥 Miscellaneous – a set of rapid-fire questions
Having done some work there, I generally find differentiable programming and algorithmically guided neural networks to be a very interesting research area.
I think it is currently very clear that OpenAI has the best models and the widest adoption. However, open source models are catching up, and I am very optimistic about their future. LMQL is vendor neutral, so you can use them all, although I have to say that I prefer open source models, as they give us full access and we do not have to work around very restricted, proprietary APIs.
I think hallucinations are *the* biggest issue here. Hopefully retrieval and augmentation will help eventually, but ultimately I think it will require very big and fundamentally different modelling decisions at the level of model training and architecture.
I think conversational models like ChatGPT and GPT-4 offer a very interesting future for programming in general. I think we will see a lot of neuro-symbolic systems that rely heavily on models as primitive building blocks. This is an exciting prospect for programming language development and something we definitely want to contribute to, with LMQL and all the features and updates we have planned so far. If models continue the trend of getting more and more capable, the resulting programmability will be very fruitful to build upon.