Data Science Weekly - Issue 500

Curated news, articles and jobs related to Data Science

Jun 23

Issue #500
June 22 2023

Hello and thank you for tuning in to Issue #500.

Once a week we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.

***

Seeing this for the first time? Subscribe here:

***

Want to support us? Become a paid subscriber here.

***

If you don’t find this email useful, please unsubscribe here.

***

And now, let's dive into some interesting links from this week:

:)

Editor's Picks

AI will soon be able to cover public meetings. But should it?
“Is it ready for primetime, ready to be released to the masses? Absolutely not…But can it be done? Can you design an AI system that attends a city meeting and generates a story? Yeah, I did it.”…

People: The API User’s Guide
A keynote I gave at PyCon 2023…This is the overall gist: the stereotype of engineers is that we are bad at people interactions because we are engineers, but my claim is that we can use our engineering skills to improve. We can be good at people interactions because we are engineers. Here’s the video, or you can read the text below on this page….

The New New Moats: Why Systems of Intelligence are still the next defensible business model
Six years ago I published “The New Moats: Why Systems of Intelligence are the next defensible business model.” In that blog, I postulated that startups would be able to build defensible moats using AI. In light of all the developments in the past year, I want to revisit this framework and see what still holds true and what has changed…To illustrate my thinking, I’ve taken a red pen to the original “Systems of Intelligence” framing, updating and amending predictions, posing new questions, and throwing out others. I hope this exercise helps to keep us grounded as we navigate the current AI hype cycle…

A Message from this week's Sponsor:

Generative AI for data science

Exploring your datasets and augmenting them has never been easier. Generative AI is here to help you explore and share data in a privacy safe way. Rebalance, impute, and sample in any way you want. You can sign up for MOSTLY AI's advanced AI-powered synthetic data generator for free to get:

Data anonymization without utility loss, even for time-series data
Rebalancing for advanced analytics & ML
Data imputation with statistically relevant synthetic data for better readability
Bigger or smaller (yet representative!) versions of your datasets
Conservative, representative, or creative generation moods for fun data exploration

Are you ready to experience synthetic data? Sign up for free! No credit card required.

Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org

Data Science Articles & Videos

Hand-Crafted Transformers
I explain a transformer as a kind of "virtual machine" and then choose weights to make one execute long-form addition the way humans do…
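For readers who want the target algorithm in front of them: "long-form addition the way humans do" is presumably the schoolbook method, working right to left and carrying a digit into the next column. The sketch below shows only that procedure in plain Python. It is not the article's transformer or its weights, and the function name `long_addition` is made up for illustration.

```python
def long_addition(a: str, b: str) -> str:
    """Add two non-negative decimal integers given as digit strings.

    Hypothetical helper: mirrors the schoolbook column-by-column method,
    not the hand-crafted transformer described in the linked article.
    """
    # Pad the shorter number with leading zeros so the columns line up.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    carry = 0
    digits = []
    # Walk the columns from the least significant digit to the most.
    for digit_a, digit_b in zip(reversed(a), reversed(b)):
        total = int(digit_a) + int(digit_b) + carry
        digits.append(str(total % 10))  # digit written under the column
        carry = total // 10             # digit carried to the next column

    # A leftover carry becomes a new leading digit.
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))
```

Each column depends only on two digits and the incoming carry, which is what makes the procedure a natural candidate for a fixed, step-by-step "program".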

Why professors are so bad at giving advice
I was recently on a panel with several other professors and we were asked to give some tips to graduate students in machine learning. It got me thinking about why professors are so bad at giving advice. So here are some reasons why you should not take advice from professors…

Using Large Language Models With Care
LLMs carry risks that have already led to real harm, and while it shouldn’t be the responsibility of the user to figure out these risks on their own, current tools often don’t explain these risks or provide safeguards. With these concerns in mind, we’re sharing an introductory outline of the risks of LLMs, written for the everyday user. We focus on risks of current text-based systems that can directly affect users, leaving the discussion of societal risks and risks posed by image generation tools to other writers…

Some Moral and Technical Consequences of Automation (Norbert Wiener, 1960)
As machines learn they may develop unforeseen strategies at rates that baffle their programmers…

AI Getting Started
A JavaScript AI getting started stack for weekend projects, including image/text models, vector stores, auth, and deployment configs…

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that typical 3- to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (below 10 million total parameters), or have much simpler architectures (with only one transformer block), yet still produce fluent and consistent stories with several paragraphs that are diverse and have almost perfect grammar, and demonstrate reasoning capabilities…

Adversarial Collaboration Project
The Adversarial Collaboration Project supports scholars with clashing theoretical-ideological views to engage in best practices for resolving scientific disputes. There are many ongoing debates in the social and behavioral sciences that influence policy and organizational decision-making in which both sides have become entrenched, research findings have become politicized, and scientific progress has come to a halt. We seek to stimulate a culture shift among social and behavioral scientists whose work touches on polarizing topics with policy significance by encouraging disagreeing scholars to work together to make scientific progress…

Freakonomics Podcast: Satya Nadella’s Intelligence Is Not Artificial
As C.E.O. of the resurgent Microsoft, he is firmly at the center of the A.I. revolution. We speak with him about the perils and blessings of A.I., Google vs. Bing, the Microsoft succession plan — and why his favorite use of ChatGPT is translating poetry…

Immersive 3D Rendering from Casual Videos
We present an algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video. The task poses two core challenges. First, most existing radiance field reconstruction approaches rely on accurate pre-estimated camera poses from Structure-from-Motion algorithms, which frequently fail on in-the-wild videos. Second, using a single, global radiance field with finite representational capacity does not scale to longer trajectories in an unbounded scene…

Infinite Photorealistic Worlds using Procedural Generation
We introduce Infinigen, a procedural generator of photorealistic 3D scenes of the natural world. Infinigen is entirely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source and allowing infinite variation and composition…

Designing artificial conversational agents to train children's curiosity during learning, a proof of concept through the Kids Ask project
Our “Kids Ask” project leverages new technologies and proposes a curiosity training based on the awareness of “knowledge gaps”; paying particular attention to self-assessment and how to avoid the “knowledge illusion” trap. In particular, we implement conversational agents capable of stimulating questioning and exploration motivated by the desire to compensate for specific missing information…

The AI Apocalypse: A Scorecard
How worried are top AI experts about the threat posed by large language models like GPT-4?…

Jobs

Data Scientist at The LEGO Group

Are you passionate about data and would like to help The LEGO Group to discover deeper insights, make better predictions, or generate relevant product recommendations?

This is your chance to apply data science in a real business context to contribute to one of the world’s best-loved brands. Our team is responsible for the LEGO Builder app (Digital building instructions). Help us become data driven in our development by finding patterns in the data, giving us insight into usage and user affinity and thereby helping us build an even more engaging and proven experience for the Builders of tomorrow.

Apply here

Want to post a job here? Email us for details --> team@datascienceweekly.org

Training & Resources

Prompt Engineering 201: Advanced methods and toolkits
Six months ago I published my Prompt Engineering 101 post. That later turned into a pretty popular LinkedIn course that has been taken by over 20k people at this point. That post and course were thought out as a gentle introduction to the topic. If you are new to prompt engineering, please start there. Also, a lot has happened in the world of LLMs since then. So, now is a good time to complement the Prompt Engineering 101 with an up-to-date and a bit more advanced post. I’ll call it Prompt Engineering 201. I will start off by repeating a “classic” technique that was already covered in the “advanced section” of the original post, Chain of Thought, but will build up from there…

Polars Cookbook
This is a fork of the pandas-cookbook, modified to use the polars library instead of pandas. polars is a Python library for doing data analysis. It's really fast and lets you do exploratory work incredibly quickly. The goal of this cookbook is to give you some concrete examples for getting started with polars. The docs are really comprehensive. However, I've often had people tell me that they have some trouble getting started, so these are examples with real-world data, and all the bugs and weirdness that entails…

Why aren't gaussian processes used more often? [ Reddit Discussion ]
I didn't learn GPs in school, rather I came across a Youtube video on it and found the concept interesting and dug deeper. I think I have a good intuition into GPs, and they seem incredibly good for modeling non linear data. I know they have disadvantages as fitting a GP is O(n^3) and tuning the hyper-parameters is almost an art, but still... why aren't they used more often? the assumptions we make on other regression models don't come into play here, stuff like linear dependency, homoscedasticity, etc... and we get a free uncertainty estimation "custom" to each prediction…
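The two points in that question, the O(n^3) fitting cost and the per-prediction uncertainty, can be seen in a small sketch. The code below is a minimal pure-Python GP regression that assumes an RBF kernel, fixed hyper-parameters and a tiny noise term. Every name in it (`rbf`, `cholesky`, `gp_predict`, `length_scale`, `noise`) is our own choice for illustration. None of it comes from the Reddit thread or from any particular library.

```python
import math

def rbf(x1, x2, length_scale=1.0):
    """Squared-exponential (RBF) kernel; the kernel choice is an assumption."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A. The three nested loops (two
    explicit, one inside sum) are where the O(n^3) fitting cost comes from."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_from_lower(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / L[i][i]
    return x

def gp_predict(xs, ys, x_star, noise=1e-6, length_scale=1.0):
    """Posterior mean and variance at x_star for a zero-mean GP prior."""
    n = len(xs)
    # Kernel matrix with a small noise term on the diagonal for stability.
    K = [[rbf(xs[i], xs[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_from_lower(L, solve_lower(L, ys))  # K^{-1} y
    k_star = [rbf(x, x_star, length_scale) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_star, x_star, length_scale) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)  # clamp tiny negative round-off
```

Calling `gp_predict([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], 1.0)` returns a variance close to zero, because the query sits on a training point. Querying far from the data, for example at `10.0`, returns a variance close to the prior variance of 1. That is the "free" uncertainty estimate the post describes. The Cholesky step is the part that becomes expensive as the number of training points grows.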

Last Week's Newsletter's 3 Most Clicked Links

You don't need the Modern Data Stack to get sh*t done

Does anyone else hate Pandas?

Knowledge Graphs & LLMs: Multi-Hop Question Answering

* Based on unique clicks.
** Find last week's issue #499 here.

Cutting Room Floor

Thanks for joining us this week :)

All our best,
Hannah & Sebastian

P.S.,
If you found this newsletter helpful, consider supporting us by becoming a paid subscriber here: https://datascienceweekly.substack.com/subscribe :)
