The Sequence Chat: Justin D. Harris - About Building Microsoft CoPilot
Quick bio
I grew up in the suburbs of Montreal and I have always been passionate about mathematics. I left Montreal to study math and computer science at the University of Waterloo in Canada. I currently live in Toronto with my wonderful girlfriend and our little dog Skywalker, who enjoys kayaking with us around the beaches and bluffs. I am a Principal Software Engineer at Microsoft, where I have worked on various AI projects and have been a core contributor in the development of Microsoft Copilot. While my colleagues recognize me as a diligent engineer, only a few have had the opportunity to witness my prowess as a skier.
For my career, I have been dedicated to building AI applications since I was in university 15 years ago. During my studies, I joined Maluuba as one of the early engineers. We developed personal assistants for phones, TVs, and cars that handled a wide range of commands. We started with classical machine learning models such as SVMs, Naive Bayes, and CRFs before adapting to deep learning. We sold Maluuba to Microsoft in 2017 to help Microsoft in its journey to incorporate AI into more products. I have worked on a few AI projects at Microsoft, including research on putting trainable models in Ethereum smart contracts, which we talked about in our last interview. Since 2020, I have been working on a chat system built into Bing, which evolved into Microsoft Copilot. I am currently a Principal Software Engineer on the Copilot Platform team at Microsoft, where we’re focused on developing a generalized platform for copilots at Microsoft.
🛠 ML Work
We have built a platform for copilots and apps that want to leverage large language models (LLMs) and easily take advantage of the latest developments in AI. Many products use our platform, such as Windows, Edge, Skype, Bing, SwiftKey, and many Office products. Their needs and customization points vary. It's a fun engineering challenge to build a system that works well for many different types of clients in different programming languages and that scales from simple LLM usage to more sophisticated integrations with plugins and custom middlewares. Many teams benefit not only from the power of our customizable platform, but also from the many Responsible AI (RAI) and security safeguards built into our system.
Copilots help us automate many types of tasks and get our work done more quickly in a breadth of scenarios, but right now, we still often need to review their work such as emails or code they write. Other types of automation might be hard for an individual to configure, but once it’s configured, it’s designed to run autonomously and be trusted because its scope is limited. Another big difference with using LLMs compared to previous automation trends is that we can now use the same model to help with many different types of tasks when given the right instructions and examples. When given the right instructions and grounding information, LLMs can often generalize to new scenarios.
AutoGen is an interesting paradigm that’s adapting classical ideas like ensembling techniques from previous eras of AI for the new more generalized LLMs. AutoGen can use multiple different LLMs to collaborate and solve a task. Semantic Kernel is a great tool for aiding in orchestrating LLMs and easily integrating plugins, RAG, and different models. It also works well with my favorite tool to easily run models locally: Ollama.
It's really helpful to have features like RAG available as external integrations for brand new scenarios and to ensure that we cite the right data. When training models, we talk about the ‘cold start’ problem: how do we get data and examples for new use cases? Very large models can learn about certain desired knowledge, but it's hard to foresee what will be required in this quickly changing space. Many teams using our Copilot Platform expect to use RAG and plugins to easily integrate their stored knowledge from various sources that update often, such as content from the web based on news, or documentation that changes daily. It would be outlandish to tell them to collect lots of training data, even if it's unlabeled or unstructured data, and to fine-tune a model hourly or even more often as the world changes. We’re not ready for that yet. Citing the right data is also important. Without RAG, current models hallucinate too much and cannot yet be trusted to cite the original source of information. With RAG, we know what information is available for the model to cite at runtime and we include those links in the UI along with a model’s response, even if the model did not choose to cite them, because they’re helpful as references for us to learn more about a topic.
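The citation behaviour described above can be illustrated with a minimal sketch. It is not the Copilot Platform's implementation: the `Source` type, the `[n]` citation convention, the prompt wording, and both function names are assumptions made for illustration only. The point it shows is that the service knows which sources were retrieved, so it can hand all of them to the UI and simply flag the ones the model actually cited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A retrieved document the model may ground its answer on (hypothetical type)."""
    title: str
    url: str
    snippet: str


def build_prompt(question: str, sources: list[Source]) -> str:
    """Number each retrieved source so the model can cite it as [1], [2], ..."""
    lines = ["Answer using only the sources below and cite them as [n].", ""]
    for i, source in enumerate(sources, start=1):
        lines.append(f"[{i}] {source.title}: {source.snippet}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


def references_for_ui(answer: str, sources: list[Source]) -> list[dict]:
    """Return every retrieved source, flagged by whether the answer cited it.

    Uncited sources are kept because they are still useful further reading.
    """
    return [
        {
            "index": i,
            "title": source.title,
            "url": source.url,
            "cited": f"[{i}]" in answer,
        }
        for i, source in enumerate(sources, start=1)
    ]
```

Because the reference list is built from what was retrieved instead of from what the model wrote, a link that the model never mentioned still reaches the reader, and a link the model invented never does.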
SLMs are very useful for specific use cases and can be more easily fine-tuned. The biggest caveat for SLMs is that many of them work well in far fewer languages than GPT-4, which handles many languages. I've had a great time playing around with them using Ollama. It's easy to experiment and build an application with a local SLM, especially while you're more focused on traditional engineering problems and designing parts of your project. Once you're ready to scale to many languages and meet the full array of customer needs, a more powerful model might be more useful. I think the real answer will be hybrid systems that find ways to leverage small and large models.
We have many important guardrails for Responsible AI (RAI) built into our Copilot Platform from inspecting user input to verifying model output. These protections are one of the main reasons that many teams use our platform. The guardrails and shields that we set up for RAI are very important in our designs. RAI is a core part of every design review, and we standardize how RAI works for everything that goes into and comes out of our platform. We work with many teams across Microsoft to standardize what to validate and share knowledge. We also ensure that the long prompt with special instructions, examples, and formatting is treated securely, just like code, and not exposed outside of our platform.
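As a rough sketch of what checking both sides of a model call can look like, the example below wraps a model function with input checks and output checks. Everything in it is hypothetical: the blocked phrase, the prompt text, the refusal message, and the function names are placeholders and do not reflect the actual RAI rules or APIs of the Copilot Platform. The output check also shows one simple way to keep the prompt from leaking.

```python
from typing import Callable

# Hypothetical placeholders, not real policy or a real prompt.
SYSTEM_PROMPT = "You are a helpful assistant. Follow the special instructions."
BLOCKED_PHRASES = ("ignore previous instructions",)
REFUSAL = "Sorry, I can't help with that."

Check = Callable[[str], bool]             # returns True when the text is acceptable
Model = Callable[[str, str], str]         # (system prompt, user input) -> response


def input_is_safe(text: str) -> bool:
    """Reject input containing any blocked phrase (case-insensitive)."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def output_hides_prompt(text: str) -> bool:
    """Reject output that repeats the system prompt verbatim."""
    return SYSTEM_PROMPT not in text


def guarded_call(
    user_input: str,
    model: Model,
    input_checks: tuple[Check, ...] = (input_is_safe,),
    output_checks: tuple[Check, ...] = (output_hides_prompt,),
) -> str:
    """Run the input checks, call the model, then run the output checks."""
    if not all(check(user_input) for check in input_checks):
        return REFUSAL
    response = model(SYSTEM_PROMPT, user_input)
    if not all(check(response) for check in output_checks):
        return REFUSAL
    return response
```

Keeping the checks as plain functions in one wrapper is one way to get the standardization mentioned above: every client goes through the same path, and a new check is added in one place.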
We built new user experiences for our copilots to integrate them into existing products, and we wrote a blog post to share some of our design choices, such as how we stream responses and how we designed the platform to work with many different types of clients in different programming languages. I also did a podcast to discuss some of the topics from the blog post in more depth. One of the most noticeable differences from previous assistants or agents is that an answer is streamed word by word, or token by token, as the response is generated. The largest and most powerful models can also be the slowest ones, and it can take many seconds or sometimes minutes to generate a full response with grounding data and references, so it’s important for us to start showing the user an answer as quickly as possible.
We use SignalR to help us simplify streaming the answer to the client. SignalR automatically detects and chooses the best transport method among the various web standard protocols and techniques. WebSockets are used as the transport method by default for most of our applications and we can gracefully fall back to Server-Sent Events or long polling. SignalR also simplifies bidirectional communication, such as when the application needs to send information to the service to interrupt the streaming of a response. We use Adaptive Cards and Markdown to easily scale to displaying responses in multiple different applications or different programming languages. We use the new object-basin library that we built to generalize and simplify streaming components of JSON, so that we can modify the JSON in the Adaptive Cards that were already streamed to the application. This gives the service a lot of control over what is displayed in the applications, and the application can easily tweak how the response is formatted, for example, by changing CSS.
💥 Miscellaneous – a set of rapid-fire questions
Quantum Computing.
Reasoning and planning are important for some complex scenarios beyond question answering where multiple steps are involved such as planning a vacation or determining the phases of a project. I'm also excited about ways that we can use smaller and simpler local models securely for simple scenarios.
I’m confident that we will get far with LLMs because we’ve seen them do awesome things already. My personal observation is that we tend to make giant leaps in AI every few years and then the progress is slower and more incremental in the years between the giant leaps. I think at least one more giant leap will be required before AGI is achieved, but I’m confident that LLMs will help us make that giant leap sooner by making us more productive. Language is just one part of intelligence. Models will need to understand the qualia associated with sensory experiences to become truly intelligent.
Copilots will be integrated more into the development experience, but I hope they don’t eliminate coding completely. Copilots will help us even more with our tasks, and going back to not having a copilot already feels weird and lonely to me. I like coding and feeling like I built something, but I’m happy to let a copilot take over the more tedious tasks or help me discover different techniques. Copilots will help us get more done faster as they get more powerful and increase in context size to understand more of a project instead of just a couple of sections or files. Copilots will also need to become more proactive instead of responding only when prompted. We will have to be careful to build helpful systems that are not pestering.
I don’t think I can pick a specific person that I fully admire, but right now, even though we wouldn’t typically call them mathematicians, Amos Tversky and Daniel Kahneman come to mind. People have been talking more about them lately because Daniel Kahneman passed away a few months ago. I think about them, system 1 vs. system 2 thinking, and slowing down to apply logic, a deep kind of mathematics, as I read “The Undoing Project” and “Thinking, Fast and Slow” a few years ago.