|
Hello! Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
If this newsletter is helpful to your job, please become a paid subscriber here: https://datascienceweekly.substack.com/subscribe :)
And now…let's dive into some interesting links from this week.
Andrej Karpathy’s Intro to Large Language Models This is a 1-hour general-audience introduction to Large Language Models: the core technical component behind systems like ChatGPT, Claude, and Bard. What they are, where they are headed, comparisons and analogies to present-day operating systems, and some of the security-related challenges of this new computing paradigm…
The background needed to understand the "Attention Is All You Need" Paper In my opinion the "Attention Is All You Need" paper is one of the most important papers for understanding how LLMs are built and work. However, my background is woefully inadequate to understand the mathematics of it. What are some books and papers that I should read to be able to grok the paper, especially attention, the K, Q, V matrices, and how it all operates? I like to think that I have fairly good mathematical maturity, so don't hesitate to throw standard and difficult references at me. I don't want to read a common-language explainer; I want to be able to write my own LLM, even though I might never have the budget to actually train it…
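For readers in the same boat, the computation at the heart of the paper is compact enough to sketch directly. This is a simplified single-head version in NumPy (the real Transformer adds learned projections, multiple heads, and masking):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # softmax over the key dimension (shifted for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted average of the value rows

# Toy example: 3 tokens, dimension 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
```

The intuition: each token's query is compared against every token's key, and the resulting weights decide how much of each token's value flows into the output.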
minimax: Efficient Baselines for Autocurricula in JAX Unsupervised environment design (UED) is a form of automatic curriculum learning for training robust decision-making agents to zero-shot transfer into unseen environments. Such autocurricula have received much interest from the RL community. However, UED experiments, based on CPU rollouts and GPU model updates, have often required several weeks of training. This compute requirement is a major obstacle to rapid innovation for the field. This work introduces the minimax library for UED training on accelerated hardware…
Hex is a collaborative workspace for data science and analytics. Now data teams can run their queries, notebooks, and interactive reports — all in one place. Hex has Magical AI tools that can generate queries and code, create visualizations, and even kickstart a whole analysis, all from natural language prompts, allowing teams to accelerate work and focus on what matters. Join hundreds of data teams like Notion, AllTrails, Loom, Brex, and Algolia using Hex every day to make their work more impactful. Sign up today at hex.tech/datascienceweekly to get a 30-day free trial of the Hex Team plan!
* Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
Process Hundreds of GB of Data in the Cloud with Polars Local machines can struggle to process large datasets due to memory and network limitations. Coiled Functions provide a cloud-based solution that allows for efficient and cost-effective handling of such extensive datasets, overcoming the constraints of local hardware for complex data processing tasks. Incorporating libraries like Polars can further enhance this approach, leveraging optimized computation capabilities to process data more quickly and efficiently. In this post we’ll use Coiled Functions to process the 150 GB Uber-Lyft dataset on a single cloud machine with Polars…
Adding support for polynomials to Numba In this blog post, I will be talking about my experience as a summer intern at Quansight Labs working on enhancing NumPy support in Numba…(Numba is a just-in-time compiler for Python that translates a subset of Python and NumPy into fast machine code. The most common way to use Numba is through its collection of decorators that can be applied to your functions to instruct Numba to compile them.)…In particular, my work was focused on adding support for the Polynomial class as well as other functions from the polynomial API…
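The post is about making this API work inside Numba-compiled functions; the API itself is plain NumPy. A quick sketch of the Polynomial class the work targets (note: NumPy stores coefficients lowest degree first):

```python
import numpy as np
from numpy.polynomial import Polynomial

# p(x) = 1 + 2x + 3x^2, coefficients in lowest-degree-first order
p = Polynomial([1, 2, 3])

value = p(2.0)    # evaluate at x = 2: 1 + 4 + 12 = 17
dp = p.deriv()    # derivative: 2 + 6x
roots = p.roots() # the (here complex) roots of p
```

With the intern's additions, code like this can appear inside an `@numba.njit`-decorated function and be compiled to machine code rather than falling back to the interpreter.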
Skill Creep in ML/DL Roles - is the field getting not just more competitive, but more difficult? [Reddit] At what point do you think there was an inflection point for the technical expertise and credentials required for mid-to-top tier ML roles? Or was there never one? To be specific, would knowing simple scikit-learn algorithms, or the basics of decision trees/SVMs, qualify you for full-fledged roles only in the past, or does it still today? At what point did FAANGs boldly state "preferred (required) to have publications at top-tier venues (ICLR, ICML, CVPR, NIPS, etc.)" in their job postings?…
JAX, M.D. - Accelerated, Differentiable, Molecular Dynamics Molecular dynamics is a workhorse of modern computational condensed matter physics. It is frequently used to simulate materials to observe how small scale interactions can give rise to complex large-scale phenomenology…recent work in machine learning has led to significant software developments that might make it possible to write more concise molecular dynamics simulations that offer a range of benefits. Here we target JAX, which allows us to write python code that gets compiled to XLA and allows us to run on CPU, GPU, or TPU. Moreover, JAX allows us to take derivatives of python code. Thus, not only is this molecular dynamics simulation automatically hardware accelerated, it is also end-to-end differentiable. This should allow for some interesting experiments that we're excited to explore…
Data Internships Newsletter - A curated list of Data Science internships I have created this website to help the hundreds of thousands of students who want to get their first job as a Data Scientist…
MLOps Community: LLMs Mini Summit In his session, Thomas Capelle, ML Engineer at Weights & Biases, focuses on understanding the ins and outs of fine-tuning LLMs. We all have a lot of questions during the fine-tuning process. How do you prepare your data? How much data do you need? Do you need to use a high-level API, or can you do this in PyTorch? During this talk, we will try to answer these questions. Thomas will share some tips and tricks on his journey in the LLM fine-tuning landscape. What worked and what did not, and hopefully, you will learn from his experience and the mistakes he made…
GAIA: a benchmark for General AI Assistants We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins…
Adversarial Attacks on LLMs The use of large language models in the real world has been strongly accelerated by the launch of ChatGPT. We (including my team at OpenAI, shoutout to them) have invested a lot of effort to build default safe behavior into the model during the alignment process (e.g. via RLHF). However, adversarial attacks or jailbreak prompts could potentially trigger the model to output something undesired…
When is Automation “Not Possible?” [Reddit] I came from a company where the data engineer I was working with told me that automation was “not possible.” I am of the opinion that automation, at least partial automation, is a possibility in most data warehousing and ETL processes. Could someone tell me when automation isn’t possible?…
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model Using reinforcement learning with human feedback (RLHF) has shown significant promise in fine-tuning diffusion models…However, crafting an efficient reward model demands extensive datasets, optimal architecture, and manual hyperparameter tuning, making the process both time and cost-intensive. The direct preference optimization (DPO) method, effective in fine-tuning large language models, eliminates the necessity for a reward model…To address this issue, we introduce the Direct Preference for Denoising Diffusion Policy Optimization (D3PO) method to directly fine-tune diffusion models…
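For context on the mechanism D3PO builds on: the standard DPO objective scores a preferred/rejected pair directly from policy and reference log-probabilities, with no reward model in the loop (the paper adapts this to denoising steps; the variable names below are illustrative, not from the paper):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (winner, loser) preference pair.

    logp_w, logp_l         : policy log-probs of the preferred / rejected sample
    ref_logp_w, ref_logp_l : the same quantities under the frozen reference model
    beta                   : strength of the implicit KL constraint
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# If the policy already prefers the winner more than the reference does,
# the margin is positive and the loss drops below log(2).
loss = dpo_loss(logp_w=-1.0, logp_l=-2.0, ref_logp_w=-1.5, ref_logp_l=-1.5, beta=0.5)
```

Minimizing this pushes the policy to assign relatively more probability to preferred samples than the reference model does, which is exactly the job a learned reward model would otherwise do.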
Equivalence Between Policy Gradients and Soft Q-Learning Two of the leading approaches for model-free reinforcement learning are policy gradient methods and Q-learning methods. Q-learning methods can be effective and sample-efficient when they work, however, it is not well-understood why they work, since empirically, the Q-values they estimate are very inaccurate. A partial explanation may be that Q-learning methods are secretly implementing policy gradient updates: we show that there is a precise equivalence between Q-learning and policy gradient methods in the setting of entropy-regularized reinforcement learning, that "soft" (entropy-regularized) Q-learning is exactly equivalent to a policy gradient method…
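The entropy-regularized setting the paper works in has a concrete fingerprint: the soft-optimal policy is just a softmax over Q-values, and the "soft" state value replaces the hard max with a log-sum-exp. A small NumPy sketch of those two objects (an illustration of the setting, not the paper's proof):

```python
import numpy as np

def soft_policy(q, tau=1.0):
    """Boltzmann policy pi(a) proportional to exp(Q(a)/tau), the optimum
    under entropy regularization with temperature tau."""
    z = q / tau
    z = z - z.max()  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def soft_value(q, tau=1.0):
    """Soft value V = tau * logsumexp(Q/tau); recovers max(Q) as tau -> 0."""
    z = q / tau
    m = z.max()
    return tau * (np.log(np.exp(z - m).sum()) + m)

q = np.array([1.0, 2.0, 0.5])  # Q-values for three actions in one state
pi = soft_policy(q, tau=0.5)
v = soft_value(q, tau=0.5)
```

The equivalence result says that updating Q-values toward this soft objective moves the induced softmax policy exactly as a policy gradient step would.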
Practical Tips for Finetuning LLMs Using LoRA (Low-Rank Adaptation) Low-rank adaptation (LoRA) is among the most widely used and effective techniques for efficiently training custom LLMs…For those interested in open-source LLMs, it's an essential technique worth familiarizing oneself with…Last month, I shared an article with several LoRA experiments, based on the open-source Lit-GPT repository that I co-maintain with my colleagues at Lightning AI. This article aims to discuss the primary lessons I derived from my experiments. Additionally, I'll address some of the frequently asked questions related to the topic. If you are interested in finetuning custom LLMs, I hope these insights will save you some time in "the long run" (no pun intended)…
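The core trick in LoRA is small enough to sketch: freeze the pretrained weight and learn only a low-rank update. A minimal NumPy illustration of the idea (not Lit-GPT's actual implementation; shapes and the `alpha` scaling follow the common convention):

```python
import numpy as np

rng = np.random.default_rng(42)
d_in, d_out, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, rank r
B = np.zeros((d_out, r))                   # trainable, initialized to zero

def lora_forward(x):
    # Base output plus the low-rank update; (alpha / r) scales the adapter.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = lora_forward(x)
```

Because B starts at zero, the adapted model is initially identical to the pretrained one, and training only touches the r * (d_in + d_out) adapter parameters instead of the full d_in * d_out matrix.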
We are BCG X. BCG X is the tech build & design unit of BCG. Turbocharging BCG’s deep industry and functional expertise, BCG X brings together advanced tech knowledge and ambitious entrepreneurship to help organizations enable innovation at scale. With nearly 3,000 technologists, scientists, programmers, engineers, and human-centered designers located across 80+ cities, BCG X builds and designs platforms and software to address the world’s most important challenges and opportunities. Our BCG X teams own the full analytics value chain end to end: framing new business challenges, designing innovative algorithms, implementing and deploying scalable solutions, and enabling colleagues and clients to fully embrace AI. Our product offerings span from fully custom builds to industry-specific, leading-edge AI software solutions. Our Data Scientists and Senior Data Scientists are part of our rapidly growing team, applying data science methods and analytics to real-world business situations across industries to drive significant business impact. You'll have the chance to partner with clients in a variety of BCG regions and industries, and on key topics like climate change, enabling them to design, build, and deploy new and innovative solutions.
Apply here Want to post a job here? Email us for details --> team@datascienceweekly.org
Simplest mathematical example of a function that can only be solved by gradient descent [Reddit] I'm trying to teach a lesson on gradient descent from a more statistical and theoretical perspective, and need a good example to show its usefulness. What is the simplest possible algebraic function that would be impossible, or rather difficult, to optimize by setting its first derivative to 0, but easily doable with gradient descent? I'd preferably like to demonstrate this in the context of linear regression or some extremely simple machine learning model…
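One candidate answer, sketched here as an example of our own choosing: f(x) = x² + eˣ is smooth and convex, but f′(x) = 2x + eˣ = 0 has no elementary closed-form solution (it requires the Lambert W function), so "set the derivative to zero and solve" fails while gradient descent converges in a handful of lines:

```python
import math

# f(x) = x**2 + exp(x): convex, but f'(x) = 2x + exp(x) = 0 cannot be
# solved in elementary closed form, so gradient descent earns its keep.
def grad(x):
    return 2 * x + math.exp(x)

x, lr = 0.0, 0.1
for _ in range(200):
    x -= lr * grad(x)  # standard gradient descent step

# x converges to roughly -0.3517, where the gradient vanishes
```

It also connects to the asker's ML framing: logistic regression has the same character, since its log-likelihood is convex but its score equations have no closed-form solution.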
Testing Language Model Agents Safely in the Wild A prerequisite for safe autonomy-in-the-wild is safe testing-in-the-wild. Yet real-world autonomous tests face several unique safety challenges, both due to the possibility of causing harm during a test, as well as the risk of encountering new unsafe agent behavior through interactions with real-world and potentially malicious actors. We propose a framework for conducting safe autonomous agent tests on the open internet: agent actions are audited by a context-sensitive monitor that enforces a stringent safety boundary to stop an unsafe test, with suspect behavior ranked and logged to be examined by humans…
Faith and Fate: Limits of Transformers on Compositionality Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly trivial problems. This begs the question: Are these errors incidental, or do they signal more substantial limitations? In an attempt to demystify transformer LLMs, we investigate the limits of these models across three representative compositional tasks -- multi-digit multiplication, logic grid puzzles, and a classic dynamic programming problem…
Find last week's issue #521 here.
Looking to hire? Hit reply to this email and let us know.
Looking to get a job? Check out our “Get A Data Science Job” Course: a comprehensive course that teaches you everything related to getting a data science job, based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to put together a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.
Promote yourself to ~60,000 subscribers by sponsoring this newsletter.
Thank you for joining us this week :) All our best, Hannah & Sebastian
P.S. Was today’s newsletter helpful to your job?
Consider becoming a paid subscriber here: https://datascienceweekly.substack.com/subscribe :)
Copyright © 2013-2023 DataScienceWeekly.org, All rights reserved.
Invite your friends and earn rewards: If you enjoy the Data Science Weekly Newsletter, share it with your friends and earn rewards when they subscribe.