Editor's Picks
- On Analogy-Making in Large Language Models
I read with great interest a recent paper by cognitive scientists Taylor Webb, Keith Holyoak, and Hongjing Lu, entitled “Emergent Analogical Reasoning in Large Language Models.” This paper investigates zero-shot analogical reasoning abilities in GPT-3...In this article I give some of my own perspectives on the Webb et al. paper’s results and claims. I discuss the analogy problems that Webb et al. gave to GPT-3 (in this paper, “GPT-3” will refer to text-davinci-003), do some of my own experiments on letter-string analogies (one of their problem types), and draw some conclusions about the robustness and generality of GPT-3’s analogy-making abilities...
- Prompt Engineering 101: Introduction and resources
Generative AI models interface with the user through mostly textual input. You tell the model what to do through a textual interface, and the model tries to accomplish the task. What you tell the model to do in a broad sense is the prompt...In this article we'll cover: a) What is a prompt?, b) Elements of a prompt, c) Basic prompt examples, d) So, what is prompt engineering anyways?, e) Some more advanced prompt examples, and f) Resources...
A Message from this week's Sponsor:
Pinecone vector database
The Pinecone vector database makes it easy to build high-performance vector search applications. Developer-friendly, fully managed, and easily scalable without infrastructure hassles.
Use Pinecone to build semantic search, object recognition, recommendations, anomaly detection, and other vector-based functionality into your applications.
Data Science Articles & Videos
- What Do You Median?
Most empirical studies, even to this day, use Ordinary Least Squares (OLS) to estimate regression models. Many of us even have a modicum of understanding as to why: OLS is "great." Some may even know in what sense OLS is "great" ... it is BLUE, where BLUE stands for the Best Linear Unbiased Estimator. Unfortunately, that may be where most understanding stops...But, this begs two questions that empirical researchers and consumers of empirical research ought to understand: 1) What does it mean to be BLUE and should we care? and 2) What assumptions are required for OLS to be BLUE and what happens if they fail?...
- Data Pipeline Design Patterns: #1 - Data flow patterns
Data pipelines can become flakey over time if the data pipeline design foundations are not solid...This post will cover the typical data flow design patterns. We will learn about the pros and cons of each design pattern, when to use them, and, more importantly, when not to use them...
- Towards Deployable RL - What’s Broken with RL Research and a Potential Fix
Reinforcement learning (RL) has demonstrated great potential, but is currently full of overhyping and pipe dreams. We point to some difficulties with current research which we feel are endemic to the direction taken by the community. To us, the current direction is not likely to lead to “deployable” RL: RL that works in practice and can work in practical situations yet still is economically viable. We also propose a potential fix to some of the difficulties of the field...
- ShinyConf 2023 Call For Speakers
We invite members of the R community to submit talks for this year’s all-virtual ShinyConf on March 15-17, 2023!...Any talks relating to R Shiny are acceptable for consideration – whether the talk is about a Shiny app you’ve created, an introduction to a package you have developed, or an explanation of how you are using Shiny in your research or business...
- How Shapley Values Work
Shapley values - and their popular extension, SHAP - are machine learning explainability techniques that are easy to use and interpret. However, trying to make sense of their theory can be intimidating. In this article, we will explore how Shapley values work - not using cryptic formulae, but by way of code and simplified explanations...
- Writing a Python SQL engine from scratch
This post will cover why I went through the effort of creating a Python SQL engine and how a simple query goes from a string to actually transforming data. The following steps are briefly summarized: a) Tokenizing, b) Parsing, c) Optimizing, d) Planning, and e) Executing...
- Datacast Episode 106: Advancing AI Adoption with Dania Meira
Dania Meira is the founding member/director of AI Guild - the go-to community for data and business professionals advancing AI adoption...Our wide-ranging conversation touches on her upbringing and education in Brazil, her early career in marketing intelligence, her move to Berlin to work as a data scientist in different startups, her current journey with AI Guild building the go-to community for data professionals advancing AI adoption, the evolution of the data field over the past decade, and much more...
- A Tale of Two Means
This article is dedicated to learning how to compare two populations, building on our knowledge of a sample mean compared to the population...We may want to investigate if there is a difference on some facet between two populations (or two samples). For example, is there a difference in the age of those attending medical school at Northwestern University or the University of Chicago? Here we seek to compare the two means, whereas in the previous bootcamp we tried to see whether the mean of a sample differed from the mean of the population...
- Explaining Reinforcement Learning with Human Feedback
Reinforcement learning with human feedback is a new technique for training next-gen language models like ChatGPT. Instead of training LLMs merely to predict the next word, we train them to understand instructions and generate helpful responses...Want to learn more about RLHF and how it works? Read on!...
- 2022 Top Papers in AI — A Year of Generative Models
This year, we see significant progress in the field of generative models. Stable Diffusion 🎨 creates hyperrealistic art. ChatGPT 💬 answers questions to the meaning of life. Galactica 🧬 learns humanity’s scientific knowledge but also reveals the limitations of large language models...This article is my take on the 20 most impactful AI papers of 2022...
Tool*
Build powerful ML visualizations with Comet
With just 2 lines of code, Comet automatically logs metrics, hyperparameters, libraries, and more. This means automatic chart generation so you can easily manage training runs in real time. When you combine that with:
- built-in visualizations (like the image panel),
- custom project views, and
- your own Python panels,
Comet is a powerful tool for optimizing your ML workflow. All for free! Less friction, more ML.
Create your free account.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
- Data Scientist / Machine Learning Engineer - Epsilon - NYC
Epsilon Strategy and Insights, Data Sciences team is looking for a talented team player in a Data Scientist/Machine Learning Engineer role. You are an expert, mentor and advocate. You have a strong machine learning and deep learning background and are passionate about transforming data into ML models. You welcome the challenge of data science and are proficient in Python, Spark MLlib, TensorFlow, Keras, ML algorithms and Deep Neural Networks, Big Data. You must be self-driven, take initiative and want to work in a dynamic, busy and innovative group...
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
- Probabilistic Machine Learning: Advanced Topics
I am delighted to announce that the "real" camera-ready version of my new book, "Probabilistic Machine Learning: Advanced Topics", is now available. It will appear in print this summer, but it is already freely available online at...
- The Illustrated Machine Learning Website
Welcome to our website, where we strive to make the complex world of Machine Learning more approachable through clear and concise illustrations. Our goal is to provide a visual aid for students, professionals, and anyone preparing for a technical interview to better understand the underlying concepts of Machine Learning...
- Diffusion Models - Live Coding Tutorial [YouTube]
This is my live (for the most part) coding video, where I implement from scratch a diffusion model that generates 32 x 32 RGB images. The tutorial assumes a basic knowledge of deep learning and Python...
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's newsletter here.
Cutting Room Floor
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian