Data Science Weekly - Issue 494

Curated news, articles and jobs related to Data Science

May 12

Issue #494
May 11 2023

Hello and thank you for tuning in to Issue #494.

Once a week we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.

***

Seeing this for the first time? Subscribe here:

***

We run a subscriber-only Slack community where we tackle learning the latest tools, keeping up with the latest techniques, career entry & growth, and anything else that's stressing you out at the office.

Become a paid subscriber here if this newsletter is helpful to your work:
https://datascienceweekly.substack.com/subscribe

Let’s build great Data/ML products, drive results, and accelerate your career.

***

Lastly, if you don’t find this email useful, please unsubscribe here.

***

And now, let's dive into some interesting links from this week:

Hope you enjoy it!

:)

Editor's Picks

The leaked Google memo is a great overview of what open source AI has achieved... but the conclusion is wrong [Twitter]
OSS enables permissionless innovation. This is going to be the driving force that makes models small, fast and installed everywhere ...but the open source models will always be playing catchup for the best quality…

Data is not available upon request
Many journals now require data sharing and require articles to include a Data Availability Statement. However, several studies over the past two decades have shown that promissory notes about data sharing are rarely abided by, and that data is generally not available upon request. This has negative consequences for many essential aspects of scientific knowledge production, including independent verification of results, efficient secondary use of data, and knowledge synthesis. Here, I assessed the prevalence of data sharing upon request in articles employing the Implicit Relational Assessment Procedure published within the last 5 years…

Haiku Imagined
Great poems are timeless. We wanted to illustrate these classic and modern haiku poems with the latest technical innovations in a way that preserves and extends their artistry...The visual assets, including the letter shapes and icons, the music clips, and the video, were created using Google AI generation research. The interface was constructed with three.js…

A Message from this week's Sponsor:

Track every customer interaction in real-time and gain a deep understanding of your customers’ behavior

Segment Unify allows you to unite online and offline customer data in real-time across every platform and channel. Use Segment Profiles Sync to send identity resolved customer profiles to your data warehouse, where they can be used for advanced analytics and enhanced with valuable data-at-rest. Then use Segment Reverse ETL to immediately activate your ‘golden’ profiles across your CX tools of choice.

Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org

Data Science Articles & Videos

“Last Week in Computer Vision” Newsletter
There's a new champion in town for state-of-the-art (SOTA) object detection models - YOLO-NAS 🚀. Midjourney has rolled out another version - V5.1. While it's indeed a marvel, text rendering remains its Achilles' heel 🥲. OpenAI has dropped a new generative model for 3D asset creation 🦾. I've also got a bunch of learning resources and insights to share. Hope you find them as exciting as I do! 🤗 Happy reading!…

AI girlfriends are going to be a huge market [Twitter]
Influencer Caryn Marjorie trained a voice chatbot on thousands of hours of her videos. She started charging $1/minute for access - and made $72k in the first week…

The Complete Introduction to Survival Analysis in Python
Understand survival analysis, its use in the industry, and how to apply it in Python…

The System Model and the User Model: Exploring AI Dashboard Design
This is a speculative essay on interface design and artificial intelligence. Recently there has been a surge of attention to chatbots based on large language models, including widely reported unsavory interactions. We contend that part of the problem is that text is not all you need: sophisticated AI systems should have dashboards, just like all other complicated devices. Assuming the hypothesis that AI systems based on neural networks will contain interpretable models of aspects of the world around them, we discuss what data such dashboards might display. We conjecture that, for many systems, the two most important models will be of the user and of the system itself…

A Practical Guide to Feature Embeddings for ML Engineers
In this practical guide, we'll delve into the various techniques and methodologies used for feature engineering with embeddings, including popular techniques such as pre-trained convolutional neural networks (CNNs) and autoencoders. We will also explore the advantages and disadvantages of different embedding techniques, as well as how to optimize them for specific tasks…

Introduction to autoencoders
Autoencoders are an unsupervised learning technique in which we leverage neural networks for the task of representation learning. Specifically, we'll design a neural network architecture such that we impose a bottleneck in the network which forces a compressed knowledge representation of the original input. If the input features were each independent of one another, this compression and subsequent reconstruction would be a very difficult task. However, if some sort of structure exists in the data (i.e., correlations between input features), this structure can be learned and consequently leveraged when forcing the input through the network's bottleneck…
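
To make the bottleneck idea in this blurb concrete, here is a minimal sketch of our own (it is not taken from the linked article). It assumes a toy setting: 2-D inputs, a purely linear encoder and decoder with a single bottleneck unit, and plain gradient descent. The function names, data, and hyperparameters are all hypothetical choices for illustration.

```python
import random


def reconstruction_error(data, e, d):
    """Mean squared reconstruction error of x_hat = d * (e . x)."""
    total = 0.0
    for x in data:
        z = e[0] * x[0] + e[1] * x[1]  # bottleneck activation (one number)
        total += (d[0] * z - x[0]) ** 2 + (d[1] * z - x[1]) ** 2
    return total / len(data)


def train_linear_autoencoder(data, steps=1000, lr=0.05):
    """Fit encoder e and decoder d by full-batch gradient descent.

    Returns the final mean squared reconstruction error.
    """
    e = [0.1, 0.1]  # encoder weights: 2 inputs -> 1 bottleneck unit
    d = [0.1, 0.1]  # decoder weights: 1 bottleneck unit -> 2 outputs
    n = len(data)
    for _ in range(steps):
        grad_e = [0.0, 0.0]
        grad_d = [0.0, 0.0]
        for x in data:
            z = e[0] * x[0] + e[1] * x[1]
            r = [d[0] * z - x[0], d[1] * z - x[1]]  # reconstruction residual
            dr = d[0] * r[0] + d[1] * r[1]
            for i in range(2):
                grad_d[i] += 2 * r[i] * z / n
                grad_e[i] += 2 * dr * x[i] / n
        for i in range(2):
            e[i] -= lr * grad_e[i]
            d[i] -= lr * grad_d[i]
    return reconstruction_error(data, e, d)


random.seed(0)

# Correlated features: the second is roughly twice the first.
correlated = []
for _ in range(200):
    t = random.uniform(-1, 1)
    correlated.append((t, 2 * t + random.gauss(0, 0.05)))

# Independent features: no shared structure to exploit.
independent = [
    (random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)
]

err_correlated = train_linear_autoencoder(correlated)
err_independent = train_linear_autoencoder(independent)
print("correlated features, reconstruction error:", err_correlated)
print("independent features, reconstruction error:", err_independent)
```

In this toy setup the correlated data should come back through the bottleneck with a much lower error than the independent data, since a single number cannot summarize two unrelated features. Real autoencoders use nonlinear layers and a deep learning library, so treat this only as an illustration of the principle.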

Learning Physically Simulated Tennis Skills from Broadcast Videos
We present a system that learns diverse, physically simulated tennis skills from large-scale demonstrations of tennis play harvested from broadcast videos. Our approach is built upon hierarchical models, combining a low-level imitation policy and a high-level motion planning policy to steer the character in a motion embedding learned from broadcast videos…We demonstrate that our system produces controllers for physically-simulated tennis players that can hit the incoming ball to target positions accurately using a diverse array of strokes (serves, forehands, and backhands), spins (topspins and slices), and playing styles (one/two-handed backhands, left/right-handed play). Overall, our system can synthesize two physically simulated characters playing extended tennis rallies with simulated racket and ball dynamics…

R for the Rest of Us Podcast Episode #9: Pull in survey results (Meghan Harris)
In this episode, I speak with Meghan Harris, a data integration specialist at the Primary Care Research Institute, University at Buffalo. There, she brings together data from multiple sources to create insights that benefit people affected by opioid use disorder. Meghan talks about how she uses R to pull data directly from Google Sheets, and highlights the advantages of this workflow as opposed to working on a manually downloaded Google Sheets file…

Building Better Data Warehouses with Dimensional Modeling: A Guide for Data Engineers
Data model design is crucial for setting the foundation for any data warehouse system. I want to bring the community’s attention to the essentials: Building Better Data Warehouses with Dimensional Modeling: A Guide for Data Engineers…

Attention Viz
Attention Viz is an interactive tool that visualizes global attention patterns for transformer models. To create this tool, we visualize the joint embeddings of query and key vectors. Click a button below to learn more...

Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? [PDF]
Newly-developed large language models (LLM)—because of how they are trained and designed—are implicit computational models of humans—a homo silicus. LLMs can be used like economists use homo economicus: they can be given endowments, information, preferences, and so on, and then their behavior can be explored in scenarios via simulation. Experiments using this approach, derived from Charness and Rabin (2002), Kahneman, Knetsch and Thaler (1986), and Samuelson and Zeckhauser (1988) show qualitatively similar results to the original, but it is also easy to try variations for fresh insights. LLMs could allow researchers to pilot studies via simulation first, searching for novel social science insights to test in the real world…

Improving Deep Reinforcement Learning via Quality Diversity, Open-Ended and AI-Generating Algorithms
Quality Diversity (QD) algorithms are those that seek to produce a diverse set of high-performing solutions to problems. I will describe them and a number of their positive attributes. I will summarize how they enable robots, after being damaged, to adapt in 1-2 minutes in order to continue performing their mission. I will next describe our QD-based Go-Explore algorithm, which dramatically improves the ability of deep reinforcement learning algorithms to solve previously unsolvable problems wherein reward signals are sparse, meaning that intelligent exploration is required…

Jobs

Game Data Pros: Data Scientist

Do you have expertise in experimental design and Bayesian statistics? Experience with Stan (we're a Stan shop) or a comparable PPL? Want to work with awesome people on cool projects in the video game industry? We're hiring Data Scientists!

As part of our Data Services team, you will work with senior scientists and business intelligence analysts from the games and media industries. If you have the technical chops, can communicate what you are doing and why, and love working with others to answer interesting questions with data, this team’s for you!

About Game Data Pros:

Game Data Pros is a data application consultancy working in digital entertainment fields like video games and streaming video. We work with established global games and media companies, helping them to define experimentation and cross-promotion strategies. We are responsible for data science initiatives and also building data-aware tools that help manage data, run experiments, and perform analyses.

Apply here

Want to post a job here? Email us for details --> team@datascienceweekly.org

Training & Resources

Stanford’s CS25: Transformers United v2
In this seminar, we examine the details of how transformers work, and dive deep into the different kinds of transformers and how they're applied in different fields. We do this through a combination of instructor lectures, guest lectures, and classroom discussions. We will invite people at the forefront of transformers research across different domains for guest lectures…The bulk of this class will consist of talks from researchers discussing the latest breakthroughs with transformers and explaining how they apply them to their fields of research. The objective of the course is to bring together the ideas from ML, NLP, CV, biology and other communities on transformers, understand their broad implications, and spark cross-collaborative research…

Full Stack LLM Bootcamp
tl;dr We're releasing our lectures on building LLM-powered apps, for FREE…
🚀 Launch an LLM App in One Hour
✨ Prompt Engineering
🗿 LLM Foundations
🔨 Augmented LLMs
🤷 UX for LUIs
🏎️ LLMOps
🔮 What's Next?
👷 Project Walkthrough

Microsoft’s The art of the prompt: How to get the best out of generative AI
As generative AI tools become increasingly popular for work and play, it’s helpful to know how to get the most out of them. Crafting the right prompt is essential, but it can be a give-and-take. Here are a few of Marsman’s top tips and tricks for writing effective prompts…

Last Week's Newsletter's 3 Most Clicked Links

Advice for data scientists

Google "We Have No Moat, And Neither Does OpenAI"

Are you curious what firms have been charging for "data and analytics engineering" consulting? [Twitter Thread]

* Based on unique clicks.
** Find last week's issue #493 here.

Cutting Room Floor

Thanks for joining us this week :)

All our best,
Hannah & Sebastian

P.S.,
If this newsletter is helpful to your job, please consider supporting us by becoming a paid subscriber here: https://datascienceweekly.substack.com/subscribe

:)
