Hello and thank you for tuning in to Issue #511!
Once a week we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
Seeing this for the first time? Subscribe here:
If you find this newsletter helpful to your job, consider becoming a paid subscriber here: https://datascienceweekly.substack.com/subscribe :)
If you don’t find this email useful, please unsubscribe here.
And now, let's dive into some interesting links from this week :)
MVDream: Multi-view Diffusion for 3D Generation
We propose MVDream, a multi-view diffusion model that is able to generate geometrically consistent multi-view images from a given text prompt. By leveraging image diffusion models pre-trained on large-scale web datasets and a multi-view dataset rendered from 3D assets, the resulting multi-view diffusion model can achieve both the generalizability of 2D diffusion and the consistency of 3D data…
The Road to Composable Data Systems: Thoughts on the Last 15 Years and the Future
A new joint VLDB paper on Composable Data Management Systems with Meta, Databricks, Sundeck, and others is out! This post is a reflection on how I arrived at thinking about these problems and what the future might look like. Enjoy…
Absolutely true. Studies have found that companies utilizing both these powerful tools not only maintain absolute accuracy in data, but also experience an incredible surge in the speed of self-serve analytics compared to traditional business intelligence methods.
Translation: Total accuracy and speedy results, at your fingertips.
But how can a typical business gain access to such cutting-edge tools that have traditionally been the realm of tech giants and industry leaders? The answer is Zenlytic, an award-winning Business Intelligence solution.
Not only is Zenlytic user-friendly, but it also significantly reduces the load on your data analysts, with an average of 90% reduction in ad hoc data requests from your business team. It's like having a supercharged analytics team, but without the added costs or complexity.
Recent users of Zenlytic have reported major improvements in their data handling and analysis process.
Today, Data Science Weekly readers can gain priority access to this transformative platform by skipping the waitlist with this exclusive link.
* Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
Can LLMs learn from a single example?
While fine-tuning a large language model (LLM) on multiple-choice science exam questions, we observed some unusual training loss curves… it appeared the model was able to rapidly memorize examples from the dataset after seeing them just once. This astonishing feat contradicts most prior wisdom about neural network sample efficiency. Intrigued by this result, we conducted a series of experiments to validate and better understand this phenomenon. It’s early days, but the experiments support the hypothesis that the models are able to rapidly remember inputs. This might mean we have to re-think how we train and use LLMs.
Mapping oak wilt disease from space using land surface phenology
Protecting the future of forests relies on our ability to observe changes in forest health. Thus, developing tools for sensing diseases in a timely fashion is critical for managing threats at broad scales. Oak wilt, a disease caused by a pathogenic fungus (Bretziella fagacearum), is threatening oaks, killing thousands yearly while negatively impacting the ecosystem services they provide. Here we propose a novel workflow for mapping oak wilt by targeting temporal disease progression through symptoms using land surface phenology (LSP) from spaceborne observations…
Introduction to Hilbert Space Gaussian Processes in PyMC
Gaussian processes (GPs) are a versatile tool in the Bayesian modeler’s toolbox, in theory. In practice, for all but the smallest data sets, one needs to resort to approximations to actually fit GPs in any reasonable amount of time… The Hilbert Space Gaussian Process (HSGP) approximation works well with any likelihood and scales as O(nm + m). In this talk I’ll introduce a PyMC HSGP implementation and show via case studies how it fills a few key gaps in the PyMC GP library: fast GPs as model subcomponents, and fast GPs with non-Gaussian likelihoods. I’ll also cover tips and tricks for applying HSGPs effectively in practice…
10 hard-earned lessons from shipping generative AI products over the past 18 months [Reddit Discussion]
I'm the founder of a generative AI consultancy and we build Gen AI powered products for other companies. We've been doing this for 18 months now and I thought I'd share our learnings - it might help others…
Communicative Agents for Software Development
In this paper, we present an innovative paradigm that leverages large language models (LLMs) throughout the entire software development process, streamlining and unifying key processes through natural language communication, thereby eliminating the need for specialized models at each phase. At the core of this paradigm lies ChatDev, a virtual chat-powered software development company that mirrors the established waterfall model, meticulously dividing the development process into four distinct chronological stages: designing, coding, testing, and documenting. Each stage engages a team of agents, such as programmers, code reviewers, and test engineers, fostering collaborative dialogue and facilitating a seamless workflow. The chat chain acts as a facilitator, breaking down each stage into atomic subtasks…
How Principal Are Your Components?
In a previous post I explored the correlations between measurements in the ANSUR-II dataset, which includes 93 measurements from a sample of U.S. military personnel…A friend of mine, and co-developer of the Modeling and Simulation class I taught at Olin, asked whether I had tried running principal component analysis (PCA). I had not, but now I have. Let’s look at the results…The principal components of human dimensions are height, girth, torso length, hands and feet, shoulders, head, pelvis, and ears…
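For readers who have not run PCA before, here is a minimal sketch of the technique the post applies. It is not the author's code and it does not use the ANSUR-II data: the three "measurements" below are synthetic stand-ins driven by one hypothetical "overall size" factor, and the function name `pca` is our own. It assumes NumPy is installed.

```python
import numpy as np


def pca(X):
    """PCA via eigendecomposition of the correlation matrix.

    X has one row per sample and one column per measurement.
    Returns (components, explained_variance_ratio), sorted so the
    component explaining the most variance comes first.
    """
    # Standardize each column so measurements in different units are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance of standardized columns is the correlation matrix.
    corr = np.cov(Z, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Rows of the returned matrix are the principal components.
    return eigvecs.T, eigvals / eigvals.sum()


# Synthetic stand-in data (an assumption for illustration, not ANSUR-II).
rng = np.random.default_rng(0)
n = 500
size = rng.normal(size=n)  # hypothetical latent "overall size" factor
X = np.column_stack([
    170 + 8 * size + rng.normal(scale=2, size=n),  # height-like column
    80 + 6 * size + rng.normal(scale=2, size=n),   # second length-like column
    25 + 2 * size + rng.normal(scale=1, size=n),   # third length-like column
])

components, ratio = pca(X)
print("explained variance ratio:", np.round(ratio, 3))
```

Because all three synthetic columns share the same latent factor, the first component should absorb most of the variance. In real body-measurement data the later components are where the interesting structure shows up, which is what the post explores.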
UK’s Frontier AI Taskforce: first progress report
The Taskforce is a start-up inside government, delivering on the ambitious mission given to us by the Prime Minister: to build an AI research team that can evaluate risk at the frontier of AI. As AI systems become more capable they may significantly augment risks…
Generating Conversation: RLHF and LLM Evaluations with Nathan Lambert
This week on Generating Conversation, we have Nathan Lambert with us. Nathan is a research scientist and RLHF team lead at HuggingFace. Nathan did his PhD at UC Berkeley working on reinforcement learning. After graduating, he joined HuggingFace, where he’s been working on RL with human feedback (RLHF) and LLM evaluations. He’s also the author of the popular blog Interconnects…
How would YOU handle Data Science recruitment? [Reddit Discussion]
Let's say you're the hiring manager for a Data Science role that you've benchmarked as needing someone with ~1 to 2 years' experience. The job role automatically closes after it's got 1000 applicants... which you get in about a day.
How do you handle those 1000 applicants?…
Code Llama: Open Foundation Models for Code
We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B and 34B parameters each…
Can robots find their own reward functions?
In reinforcement learning, an agent learns behaviors to maximize expected cumulative rewards. An open issue in reinforcement learning applications is how to design a reward function for a desired behavior. A related issue in neuroscience is what rewards really are in animals and humans. This talk will present embodied evolution experiments to test whether robots can acquire their own reward functions for survival and reproduction, and ongoing research on evolving intrinsic rewards to promote directed exploration…
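As a small illustration of the "expected cumulative rewards" objective mentioned above, here is a sketch that scores reward sequences. The discount factor `gamma` and both function names are our assumptions, not something taken from the talk; setting `gamma=1.0` gives the plain undiscounted sum.

```python
def discounted_return(rewards, gamma=0.99):
    """Cumulative reward of one episode: r0 + gamma*r1 + gamma^2*r2 + ..."""
    total = 0.0
    # Work backwards so each step is a single multiply-and-add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def expected_return(episodes, gamma=0.99):
    """Monte Carlo estimate of the expected return: the mean over sampled episodes."""
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)


# Two hypothetical episodes: steady small rewards vs. one delayed large reward.
episodes = [[1, 1, 1], [0, 0, 4]]
print(discounted_return(episodes[0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
print(expected_return(episodes, gamma=0.5))       # (1.75 + 1.0) / 2 = 1.375
```

The reward numbers here are hand-written. The design question the talk raises is where those numbers should come from in the first place.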
At LVMH, San Francisco, we are looking for a Manager of CRM Data Science Analytics to join our team and help us transform our customer data into insights and strategies for our luxury brands.
The successful candidate will have an extensive background in data science, analytics and customer relationship management (CRM), as well as a strong understanding of the luxury industry and how it applies to customer data analysis.
The Manager of CRM Data Science Analytics will be responsible for leveraging customer data and analytics to inform CRM strategies, drive customer engagement and ensure our brands’ success. In addition, this individual will support the development and implementation of data-driven strategies across the entire LVMH group.
Apply here
Want to post a job here? Email us for details --> team@datascienceweekly.org
Teaching statistics interactively with webR
I really enjoyed talking about using webR for interactive teaching at RSSAnnualConf today!…Slides (with webR demo) available…
Applied Demographic Data Analysis
My goal for this book is to take the lessons I’ve learned teaching statistics to a diverse and often cursorily trained group of students who have problems they care about, that they need to bring demographic data to bear upon. This is a challenge, and I have always been a stalwart proponent of teaching statistics and data analysis in a very applied manner. As such, this book won’t be going into rigorous proofs of estimators or devoting pages to expositions of icky algebra; instead it will focus on exploring modern methods of data analysis that are used by demographers every day, but not always taught in our training programs…
Markov Chains: Why Walk When You Can Flow?
If you are still using a Gibbs sampler, you are working too hard for too little result. Newer, better algorithms trade random walks for frictionless flow…
* Based on unique clicks.
** Find last week's issue #510 here.
Thank you for joining us this week :)
All our best,
Hannah & Sebastian
P.S.
If you found this newsletter helpful to your job, please consider becoming a paid subscriber here: https://datascienceweekly.substack.com/subscribe :)
Copyright © 2013-2023 DataScienceWeekly.org, All rights reserved.