Hello and thank you for tuning in to Issue #502.
Once a week we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
Seeing this for the first time? Subscribe here:
Want to support us? Become a paid subscriber here.
If you don’t find this email useful, please unsubscribe here.
And now, let's dive into some interesting links from this week:
:)
Where is generative design in drug discovery today?
In this blog post, I share the current state of generative molecular design and offer my perspective on its progress to date. I will endeavor to explain why some past criticisms are no longer relevant and highlight the remaining challenges that need to be overcome to further improve the impact that generative design can have on the drug discovery pipeline…
What should the UK’s £100 million Foundation Model Taskforce do?
The UK government has recently established a ‘Foundation Model Taskforce’, appointed a savvy technologist named Ian Hogarth to run it, and allegedly allocated ~£100 million in funding to it. Later this year, the UK plans to hold a global summit on AI and AI safety, which will likely leverage the taskforce as well…Given that, what should the taskforce do, and what kind of impact might it have? That’s what I try to sketch out in this essay…
Statistical Excellence in Journalism Awards: 2023 winners
We are pleased to announce this year’s Statistical Excellence in Journalism Award winners. Awards were presented at a ceremony at Errol Street, in four categories: ‘explaining the facts’, ‘data visualization’, ‘investigative journalism’ and ‘best statistical commentary by a non-journalist’…
Mm-hmm, sure. So, what’s the catch?
We know it may sound too good to be true. But thousands of investors are already smiling all the way to the bank. All thanks to the fine-art investing platform Masterworks.
These results aren’t cherry-picked. This is the whole bushel. Masterworks has built a track record of 13 exits, realizing +10.4%, +27.3%, and +35.0% net returns, even while financial markets plummeted.
Offerings can sell out in just minutes, but as a trusted partner, Data Science Weekly readers are invited to skip the waitlist with this exclusive link.
See important disclosures at masterworks.com/cd
Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
Large Disagreement Modelling
In this blogpost I’d like to talk about large language models. There's a bunch of hype, sure, but there's also an opportunity to revisit one of my favourite machine learning techniques: disagreement…Let’s say that you’re interested in running an NLP model. You have text as input and you’d like to emit some structured information from it. Things like named entities, categories, spans ... that sort of thing. Then you could try to leverage a large language model, armed with a prompt, to fetch this information. It might work, but there’s a fair amount of evidence that you might be better off training a custom model in the long run, especially if you’re in a specific domain. Not to mention the costs that might be involved in running an LLM, the latency involved, or the practical constraints of working with a text-to-text system. So instead of fully relying on a large language model, how might we use it effectively in existing pipelines?…
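The core disagreement idea is simple enough to sketch in a few lines: run two models over the same texts and flag the examples where their labels differ, since at least one of them must be wrong there. The two rule-based "models" below are hypothetical stand-ins (one for an LLM-prompted labeller, one for a custom classifier), not anything from the post:

```python
def llm_label(text: str) -> str:
    # stand-in for an LLM prompted to classify a support ticket
    return "urgent" if "asap" in text.lower() or "now" in text.lower() else "normal"

def custom_label(text: str) -> str:
    # stand-in for a small custom model trained on your domain
    return "urgent" if "refund" in text.lower() else "normal"

def disagreements(texts):
    """Return the texts where the two models disagree - prime
    candidates for human annotation."""
    return [t for t in texts if llm_label(t) != custom_label(t)]

texts = [
    "Please send the refund asap",  # both say urgent -> agree
    "I need my refund",             # only the custom model says urgent -> disagree
    "Thanks, all good here",        # both say normal -> agree
]
print(disagreements(texts))  # -> ['I need my refund']
```

Routing only the disagreements to annotators is what makes this cheap: wherever the models agree, a human look-over buys you relatively little.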
Diffusion Models in Generative Chemistry for Drug Design
This article aims to simplify and summarize recent developments in generative models, specifically focusing on small molecule drug design using diffusion models. It takes a mostly technical approach, catering to readers without a background in machine learning. While briefly touching on challenges like data and evaluation, the main focus of this article is not on those aspects…
NLP - What should I work on?
Recent progress in large language models has enabled the deployment of many generative NLP applications. At the same time, it has also led to a misleading public discourse that “it's all been solved.” Not surprisingly, this has in turn made many NLP researchers -- especially those at the beginning of their career -- wonder about what NLP research area they should focus on. This document is a compilation of NLP research directions that are rich for exploration, reflecting the views of a diverse group of PhD students in an academic research lab…
Which AI startups are winning the race to hire the best technical talent?
In this report, based on proprietary research and public data, we break down where top talent is going in order to be a part of the AI transformation. Start your engines…
Why you should use Topological Data Analysis over t-SNE or UMAP?
We compare the results generated from TDA with results from t-SNE and UMAP packages…
State of Computer Vision 2023
This year, CVPR 2023 has accepted an impressive total of 2359 papers, resulting in a vast array of posters for participants to explore. As I navigated through the sea of posters and engaged in conversations with attendees and researchers, I found that most research focused on one of the following 4 themes:
Vision Transformers
Generative AI for Vision: Diffusion Models and GANs
NeRF: Neural Radiance Fields
Object Detection and Segmentation
Below, I will be giving a brief introduction to these four subfields. Subsequently, I will also highlight an intriguing paper selected from the conference proceedings, which relates to each field…
What if we set GPT-4 free in Minecraft? ⛏️
I’m excited to announce Voyager, the first lifelong learning agent that plays Minecraft purely in-context. Voyager continuously improves itself by writing, refining, committing, and retrieving *code* from a skill library. GPT-4 unlocks a new paradigm: “training” is code execution rather than gradient descent. “Trained model” is a codebase of skills that Voyager iteratively composes, rather than matrices of floats. We are pushing no-gradient architecture to its limit…
FurnitureBench: Real-World Furniture Assembly Benchmark
FurnitureBench is a real-world furniture assembly benchmark that aims to provide a reproducible and easy-to-use platform for long-horizon, complex robotic manipulation. Furniture assembly poses integral manipulation challenges that autonomous robots must be capable of: long-horizon planning, dexterous control, and robust visual perception. By presenting a well-defined suite of tasks with a lower barrier to entry (large-scale human teleoperation data and standardized configurations), we encourage the research community to push the boundaries of current robotic systems toward automating everyday activities…
Introduction to g-methods: time-fixed treatments
New blog post on estimating treatment effects using the g-methods. Introduces the g-formula, IPTW, and doubly robust extensions (augmented IPW and TMLE). Examples of each "by hand" and with R packages, plus machine learning algorithms with SuperLearner…This is the first post in a series looking at g-methods and their doubly robust extensions for estimating causal effects in epidemiological research. This one focuses on time-fixed treatments (or exposures); the second will cover the case of time-varying treatments…
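The post works its examples in R; purely as a flavour of the "by hand" IPTW step, here is a hypothetical Python sketch for a single binary confounder L, binary treatment A, and outcome Y, where propensity scores are just stratum-specific treatment rates (the toy records are made up, not from the post):

```python
data = [  # (L, A, Y) - toy records for illustration only
    (0, 0, 1.0), (0, 0, 2.0), (0, 1, 3.0), (0, 1, 3.0),
    (1, 0, 2.0), (1, 1, 5.0), (1, 1, 6.0), (1, 1, 7.0),
]

def propensity(l):
    """Propensity score e(L) = P(A=1 | L), estimated within each stratum of L."""
    rows = [(a, y) for (li, a, y) in data if li == l]
    return sum(a for a, _ in rows) / len(rows)

n = len(data)
# Horvitz-Thompson style inverse-probability-weighted means of Y
# under treatment (A=1) and under control (A=0)
mean_y1 = sum(a * y / propensity(l) for (l, a, y) in data) / n
mean_y0 = sum((1 - a) * y / (1 - propensity(l)) for (l, a, y) in data) / n
ate = mean_y1 - mean_y0  # average treatment effect
print(round(ate, 3))
```

Each treated subject is up-weighted by 1/e(L) and each control by 1/(1−e(L)), which rebalances the confounder across the two arms before the means are compared; the R packages in the post automate exactly this bookkeeping (plus standard errors).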
Do you study outside of work? [Reddit Discussion]
What is your study routine like when you are already employed? Do you study on the weekend or after working hours? Or just during work? And if it's during work, do you try to research and implement the new concepts in some project at work, or do you really study by taking a course, etc.? Thanks!…
ELI5: Why is the GPT family of models based on the decoder-only architecture?
I recently dove into the nitty-gritty of these models for the first time…The transformer architecture was intriguing on its own but learning about its variations is fascinating…I am curious to understand the reason behind the choice of GPT models' architecture if that's possible to explain coherently here. I understand particular architectures are better suited for certain tasks but the GPT and similar LLMs are doing really well across a variety of tasks…If scale has something to do with it, does decoder-only architecture scale better?…
* Define and deploy new approaches that exploit the wealth of data and the power of associated technologies to address the problems of teams across regions: Americas, China, Japan, Europe, South Asia and North Asia
* Collaborate on a daily basis with the teams on their needs and build ready-to-use algorithms to support their business challenges, and customer experience in particular
* Own all stages of data science projects: framing and management, implementation, development and operation, adoption and commitment
* Support the business teams in using the tools put in place to serve their challenges, enabling them to act more and more independently
* Ensure the governance of data science projects according to the defined principles
* Monitor, maintain and improve the models and tools in place
Apply here
Want to post a job here? Email us for details --> team@datascienceweekly.org
Attention is All You Need Tutorial
“Attention Is All You Need” is a paper from Google Brain and Google Research that proposed the Transformer as a replacement for RNNs in natural language processing. In this blog we will understand the theoretical idea behind the Transformer and later implement it in PyTorch…
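The tutorial builds the full model in PyTorch; as a warm-up, here is the paper's central operation, scaled dot-product attention, softmax(QKᵀ/√d_k)·V, written in plain Python lists so the arithmetic is visible (a didactic sketch, not the tutorial's code):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # subtract max for numerical stability
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention. Q, K, V are lists of vectors (n, d_k)."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # output = attention-weighted sum of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # the query attends more to the first key/value pair
```

Since the query here matches the first key, the softmax puts more weight on the first value vector; multi-head attention in the paper is just several such operations run in parallel on learned projections of Q, K and V.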
What is "the right way" to install Python on a new M2 MacBook?
I assume it isn't the system Python3, right? Maybe Homebrew?…
Unlock Advanced dbt Use Cases with the Meta Config and the Graph Variable
The meta config can unlock very powerful use cases for your dbt project, but working with them during runtime can be tricky. In this post, we’ll be exploring how to access the meta config via the output of dbt’s graph context variable. We'll consider how to safely iterate over all models where only a few have meta configs defined. To do that, we’ll learn about dot notation and get() as handy methods for accessing JSON objects…
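The post is about dbt's Jinja context, but the safe-access pattern it recommends carries over to any nested JSON-like structure. This hypothetical Python snippet (the node layout is a simplified stand-in for dbt's `graph.nodes`) shows why chaining `.get()` with defaults beats raw key lookups when only some models define a meta config:

```python
nodes = {  # simplified stand-in for graph.nodes: only some models define meta
    "model.proj.orders":    {"name": "orders",    "config": {"meta": {"owner": "data-eng"}}},
    "model.proj.customers": {"name": "customers", "config": {}},
}

owners = {}
for node in nodes.values():
    # .get() with a {} default lets us chain lookups without KeyError,
    # even for models whose config has no "meta" entry at all
    meta = node.get("config", {}).get("meta", {})
    owners[node["name"]] = meta.get("owner", "unowned")

print(owners)  # -> {'orders': 'data-eng', 'customers': 'unowned'}
```

Indexing with `node["config"]["meta"]["owner"]` instead would blow up on the first model without a meta block, which is exactly the runtime trickiness the post is working around.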
* Based on unique clicks.
** Find last week's issue #501 here.
Thanks for joining us this week :)
All our best,
Hannah & Sebastian
P.S.,
If you found this newsletter helpful, consider supporting us by becoming a paid subscriber here: https://datascienceweekly.substack.com/subscribe :)
Copyright © 2013-2023 DataScienceWeekly.org, All rights reserved.