Hello and thank you for tuning in to Issue #500.
Once a week we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
***
And now, let's dive into some interesting links from this week:
:)
People: The API User’s Guide
A keynote I gave at PyCon 2023…This is the overall gist: the stereotype of engineers is that we are bad at people interactions because we are engineers, but my claim is that we can use our engineering skills to improve. We can be good at people interactions because we are engineers. Here’s the video, or you can read the text below on this page…
The New New Moats: Why Systems of Intelligence are still the next defensible business model
Six years ago I published “The New Moats: Why Systems of Intelligence are the next defensible business model.” In that blog, I postulated that startups would be able to build defensible moats using AI. In light of all the developments in the past year, I want to revisit this framework and see what still holds true and what has changed…To illustrate my thinking, I’ve taken a red pen to the original “Systems of Intelligence” framing, updating and amending predictions, posing new questions, and throwing out others. I hope this exercise helps to keep us grounded as we navigate the current AI hype cycle…
Exploring your datasets and augmenting them has never been easier. Generative AI is here to help you explore and share data in a privacy-safe way. Rebalance, impute, and sample in any way you want. You can sign up for MOSTLY AI's advanced AI-powered synthetic data generator for free to get:
Data anonymization without utility loss, even for time-series data
Rebalancing for advanced analytics & ML
Data imputation with statistically relevant synthetic data for better readability
Bigger or smaller (yet representative!) versions of your datasets
Conservative, representative, or creative generation moods for fun data exploration
Are you ready to experience synthetic data? Sign up for free! No credit card required.
Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
Using Large Language Models With Care
LLMs carry risks that have already led to real harm, and while it shouldn’t be the responsibility of the user to figure out these risks on their own, current tools often don’t explain these risks or provide safeguards. With these concerns in mind, we’re sharing an introductory outline of the risks of LLMs, written for the everyday user. We focus on risks of current text-based systems that can directly affect users, leaving the discussion of societal risks and risks posed by image generation tools to other writers…
Some Moral and Technical Consequences of Automation (Norbert Wiener, 1960)
As machines learn they may develop unforeseen strategies at rates that baffle their programmers…
AI Getting Started
A JavaScript AI getting-started stack for weekend projects, including image/text models, vector stores, auth, and deployment configs.
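For readers new to the "vector stores" piece of such a stack, the core idea is nearest-neighbour search over embedding vectors. Below is a minimal sketch in Python (not JavaScript), assuming brute-force cosine similarity and hand-written toy vectors standing in for real model embeddings. The class name InMemoryVectorStore is hypothetical and is not part of the linked stack; many production vector stores use approximate indexes rather than scanning every item.

```python
import math


class InMemoryVectorStore:
    """Hypothetical toy vector store: brute-force cosine-similarity search."""

    def __init__(self):
        self._items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self._items.append((doc_id, list(vector)))

    @staticmethod
    def _cosine(a, b):
        # Cosine similarity; returns 0.0 if either vector has zero length
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        if norm_a == 0 or norm_b == 0:
            return 0.0
        return dot / (norm_a * norm_b)

    def query(self, vector, k=1):
        # Score every stored vector, then return the ids of the k most similar
        scored = [(self._cosine(vector, v), doc_id) for doc_id, v in self._items]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:k]]


# Usage with toy 3-dimensional "embeddings"
store = InMemoryVectorStore()
store.add("cats", [1.0, 0.0, 0.1])
store.add("dogs", [0.9, 0.1, 0.0])
store.add("cars", [0.0, 1.0, 0.0])
top_two = store.query([1.0, 0.0, 0.0], k=2)
```

The linear scan keeps the idea visible; its cost grows with the number of stored items, which is why larger deployments reach for dedicated indexes.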
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that typical 3- to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (below 10 million total parameters), or have much simpler architectures (with only one transformer block), yet still produce fluent and consistent stories with several paragraphs that are diverse and have almost perfect grammar, and demonstrate reasoning capabilities…
Adversarial Collaboration Project
The Adversarial Collaboration Project supports scholars with clashing theoretical-ideological views to engage in best practices for resolving scientific disputes. There are many ongoing debates in the social and behavioral sciences that influence policy and organizational decision-making in which both sides have become entrenched, research findings have become politicized, and scientific progress has come to a halt. We seek to stimulate a culture shift among social and behavioral scientists whose work touches on polarizing topics with policy significance by encouraging disagreeing scholars to work together to make scientific progress…
Immersive 3D Rendering from Casual Videos
We present an algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video. The task poses two core challenges. First, most existing radiance field reconstruction approaches rely on accurate pre-estimated camera poses from Structure-from-Motion algorithms, which frequently fail on in-the-wild videos. Second, using a single, global radiance field with finite representational capacity does not scale to longer trajectories in an unbounded scene…
Infinite Photorealistic Worlds using Procedural Generation
We introduce Infinigen, a procedural generator of photorealistic 3D scenes of the natural world. Infinigen is entirely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source and allowing infinite variation and composition…
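As a toy illustration of what "randomized mathematical rules" can mean, the sketch below generates a 1D terrain profile with midpoint displacement, a classic procedural-terrain trick. It is an assumption chosen for brevity, not a description of how Infinigen works, and the function name and parameters are hypothetical. A seeded random generator makes each output reproducible, while a different seed gives a new variation.

```python
import random


def midpoint_displacement(seed, iterations=4, roughness=0.5, initial_spread=1.0):
    """Return a 1D height profile with 2**iterations + 1 points."""
    rng = random.Random(seed)  # seeded RNG: same seed -> same terrain
    heights = [0.0, 0.0]       # the two endpoints stay fixed at height 0
    spread = initial_spread
    for _ in range(iterations):
        refined = []
        for left, right in zip(heights, heights[1:]):
            refined.append(left)
            # New midpoint: average of its neighbours plus a random offset
            refined.append((left + right) / 2 + rng.uniform(-spread, spread))
        refined.append(heights[-1])
        heights = refined
        spread *= roughness  # smaller bumps at finer scales
    return heights


profile = midpoint_displacement(seed=42)
```

Calling it with seeds 1, 2, 3, and so on yields an unbounded stream of distinct but reproducible profiles, which is, in miniature, the property that lets a procedural generator offer unlimited variation without external assets.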
Are you passionate about data and would like to help The LEGO Group to discover deeper insights, make better predictions, or generate relevant product recommendations?
This is your chance to apply data science in a real business context and contribute to one of the world’s best-loved brands. Our team is responsible for the LEGO Builder app (digital building instructions). Help us become data-driven in our development by finding patterns in the data, giving us insight into usage and user affinity and thereby helping us build an even more engaging and proven experience for the Builders of tomorrow.
Apply here
Want to post a job here? Email us for details --> team@datascienceweekly.org
Prompt Engineering 201: Advanced methods and toolkits
Six months ago I published my Prompt Engineering 101 post. That later turned into a pretty popular LinkedIn course that has been taken by over 20k people at this point. That post and course were conceived as a gentle introduction to the topic. If you are new to prompt engineering, please start there. Also, a lot has happened in the world of LLMs since then. So, now is a good time to complement Prompt Engineering 101 with an up-to-date and somewhat more advanced post, which I’ll call Prompt Engineering 201. I will start off by revisiting a “classic” technique that was already covered in the “advanced section” of the original post, Chain of Thought, but will build up from there…
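Chain of Thought, the starting point named in the post, amounts to prompting the model to write out intermediate reasoning before its answer. Here is a minimal sketch of assembling such a prompt, assuming a plain text-completion interface. The function name build_cot_prompt is hypothetical, no model API is called, and the post itself may use different templates. "Let's think step by step." is the widely cited zero-shot CoT trigger phrase; the worked example shows the few-shot variant, where the model imitates the reasoning format of the examples.

```python
def build_cot_prompt(question, examples=()):
    """Assemble a chain-of-thought prompt from (question, reasoning, answer) examples."""
    parts = []
    for q, reasoning, answer in examples:
        # Each few-shot example shows the reasoning before the final answer
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    # End with the new question and a cue to reason step by step
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)


examples = [(
    "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?",
    "Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.",
    "11",
)]
prompt = build_cot_prompt(
    "A cafeteria had 23 apples, used 20, and bought 6 more. How many apples are there now?",
    examples,
)
```

Passing no examples gives the zero-shot form: the question followed only by the trigger phrase.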
Polars Cookbook
This is a fork of the pandas-cookbook, modified to use the polars library instead of pandas. polars is a Python library for doing data analysis. It's really fast and lets you do exploratory work incredibly quickly. The goal of this cookbook is to give you some concrete examples for getting started with polars. The docs are really comprehensive. However, I've often had people tell me that they have some trouble getting started, so these are examples with real-world data, and all the bugs and weirdness that entails…
Why aren't gaussian processes used more often? [ Reddit Discussion ]
I didn't learn GPs in school; rather, I came across a YouTube video on them, found the concept interesting, and dug deeper. I think I have a good intuition for GPs, and they seem incredibly good for modeling nonlinear data. I know they have disadvantages, since fitting a GP is O(n^3) and tuning the hyper-parameters is almost an art, but still... why aren't they used more often? The assumptions we make in other regression models don't come into play here, stuff like linear dependency, homoscedasticity, etc., and we get a free uncertainty estimate "custom" to each prediction…
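To see where the O(n^3) cost and the per-prediction uncertainty mentioned in the thread come from, here is a minimal pure-Python sketch of GP regression with a zero-mean prior and an RBF kernel. The kernel choice, the fixed hyper-parameters (length_scale, variance, noise), and all function names are assumptions for illustration. Real work would more likely use a library such as scikit-learn's GaussianProcessRegressor and tune hyper-parameters, for example by maximizing the marginal likelihood.

```python
import math


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    # Squared-exponential (RBF) covariance between two scalar inputs
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    # Factor K = L L^T; this triple loop is the O(n^3) cost of fitting a GP
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def forward_sub(lower, b):
    # Solve L y = b for y
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(lower[i][k] * y[k] for k in range(i))) / lower[i][i])
    return y


def backward_sub(lower, y):
    # Solve L^T x = y for x
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / lower[i][i]
    return x


def gp_predict(x_train, y_train, x_test, noise=1e-2, **kernel_args):
    """Posterior mean and latent-function variance at each test point."""
    K = [[rbf_kernel(a, b, **kernel_args) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, y_train))  # alpha = K^{-1} y
    results = []
    for xs in x_test:
        k_star = [rbf_kernel(xs, a, **kernel_args) for a in x_train]
        mean = sum(k * w for k, w in zip(k_star, alpha))
        v = forward_sub(L, k_star)  # v = L^{-1} k_*
        var = rbf_kernel(xs, xs, **kernel_args) - sum(vi * vi for vi in v)
        results.append((mean, var))
    return results


x_train = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_train = [math.sin(x) for x in x_train]
preds = gp_predict(x_train, y_train, [0.5, 5.0], noise=1e-2, length_scale=1.0)
```

The returned variance excludes observation noise. It is small at 0.5, between training points, and climbs back toward the prior variance of 1.0 at 5.0, far from the data: this is the "custom" uncertainty per prediction the post describes.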
* Based on unique clicks.
** Find last week's issue #499 here.
Thanks for joining us this week :)
All our best,
Hannah & Sebastian
P.S.,
If you found this newsletter helpful, consider supporting us by becoming a paid subscriber here: https://datascienceweekly.substack.com/subscribe :)
Copyright © 2013-2023 DataScienceWeekly.org, All rights reserved.