Hello and thank you for tuning in to Issue #502.
Once a week we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
Seeing this for the first time? Subscribe here:
Want to support us? Become a paid subscriber here.
If you don’t find this email useful, please unsubscribe here.
And now, let's dive into some interesting links from this week:
:)
Where is generative design in drug discovery today?
In this blog post, I share the current state of generative molecular design and offer my perspective on its progress to date. I will endeavor to explain why some past criticisms are no longer relevant and highlight the remaining challenges that need to be overcome to further improve the impact that generative design can have on the drug discovery pipeline…
What should the UK’s £100 million Foundation Model Taskforce do?
The UK government has recently established a ‘Foundation Model Taskforce’, appointed a savvy technologist named Ian Hogarth to run it, and allegedly allocated ~£100 million in funding to it. Later this year, the UK plans to hold a global summit on AI and AI safety, which will likely leverage the taskforce as well…Given that, what should the taskforce do, and what kind of impact might it have? That’s what I try to sketch out in this essay…
Statistical Excellence in Journalism Awards: 2023 winners
We are pleased to announce this year’s Statistical Excellence in Journalism Award winners. Awards were presented at a ceremony at Errol Street, in four categories: ‘explaining the facts’, ‘data visualization’, ‘investigative journalism’ and ‘best statistical commentary by a non-journalist’…
Mm-hmm, sure. So, what’s the catch?
We know it may sound too good to be true. But thousands of investors are already smiling all the way to the bank. All thanks to the fine-art investing platform Masterworks.
These results aren’t cherry-picked. This is the whole bushel. Masterworks has built a track record of 13 exits, realizing +10.4%, +27.3%, and +35.0% net returns, even while financial markets plummeted.
Offerings can sell out in just minutes, but as a trusted partner, Data Science Weekly readers are invited to skip the waitlist with this exclusive link.
See important disclosures at masterworks.com/cd
Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
Large Disagreement Modelling
In this blogpost I’d like to talk about large language models. There's a bunch of hype, sure, but there's also an opportunity to revisit one of my favourite machine learning techniques: disagreement…Let’s say that you’re interested in running an NLP model. You have text as input and you’d like to emit some structured information from it. Things like named entities, categories, spans ... that sort of thing. Then you could try to leverage a large language model, armed with a prompt, to fetch this information. It might work, but there’s a fair amount of evidence that you might be better off training a custom model in the long run, especially if you’re in a specific domain. Not to mention the costs that might be involved in running an LLM, the latency involved, or the practical constraints of working with a text-to-text system. So instead of fully relying on a large language model, how might we use it effectively in existing pipelines?…
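The core disagreement idea is simple enough to sketch in a few lines: run two models over the same texts and flag the examples where their labels differ, since at least one of them must be wrong there. The two rule-based "models" below are hypothetical stand-ins (one for an LLM-prompted labeller, one for a custom classifier), not anything from the post:

```python
def llm_label(text: str) -> str:
    # stand-in for an LLM prompted to classify a support ticket
    return "urgent" if "asap" in text.lower() or "now" in text.lower() else "normal"

def custom_label(text: str) -> str:
    # stand-in for a small custom model trained on your domain
    return "urgent" if "refund" in text.lower() else "normal"

def disagreements(texts):
    """Return the texts where the two models disagree - prime
    candidates for human annotation."""
    return [t for t in texts if llm_label(t) != custom_label(t)]

texts = [
    "Please send the refund asap",  # both say urgent -> agree
    "I need my refund",             # only the custom model says urgent -> disagree
    "Thanks, all good here",        # both say normal -> agree
]
print(disagreements(texts))  # -> ['I need my refund']
```

Routing only the disagreements to annotators is what makes this cheap: wherever the models agree, a human look-over buys you relatively little.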
Diffusion Models in Generative Chemistry for Drug Design
This article aims to simplify and summarize recent developments in generative models, specifically focusing on small molecule drug design using diffusion models. It takes a mostly technical approach, catering to readers without a background in machine learning. While briefly touching on challenges like data and evaluation, the main focus of this article is not on those aspects…
NLP - What should I work on?
Recent progress in large language models has enabled the deployment of many generative NLP applications. At the same time, it has also led to a misleading public discourse that “it's all been solved.” Not surprisingly, this has in turn made many NLP researchers -- especially those at the beginning of their career -- wonder about what NLP research area they should focus on. This document is a compilation of NLP research directions that are rich for exploration, reflecting the views of a diverse group of PhD students in an academic research lab…
Which AI startups are winning the race to hire the best technical talent?
In this report, based on proprietary research and public data, we break down where top talent is going in order to be a part of the AI transformation. Start your engines…
Why you should use Topological Data Analysis over t-SNE or UMAP?
We compare the results generated from TDA with results from t-SNE and UMAP packages…
State of Computer Vision 2023
This year, CVPR 2023 has accepted an impressive total of 2359 papers, resulting in a vast array of posters for participants to explore. As I navigated through the sea of posters and engaged in conversations with attendees and researchers, I found that most research focused on one of the following 4 themes:
Vision Transformers
Generative AI for Vision: Diffusion Models and GANs
NeRF: Neural Radiance Fields
Object Detection and Segmentation
Below, I will be giving a brief introduction to these four subfields. Subsequently, I will also highlight an intriguing paper selected from the conference proceedings, which relates to each field…
What if we set GPT-4 free in Minecraft? ⛏️
I’m excited to announce Voyager, the first lifelong learning agent that plays Minecraft purely in-context. Voyager continuously improves itself by writing, refining, committing, and retrieving *code* from a skill library. GPT-4 unlocks a new paradigm: “training” is code execution rather than gradient descent. “Trained model” is a codebase of skills that Voyager iteratively composes, rather than matrices of floats. We are pushing no-gradient architecture to its limit…
FurnitureBench: Real-World Furniture Assembly Benchmark
FurnitureBench is a real-world furniture assembly benchmark that aims to provide a reproducible and easy-to-use platform for long-horizon, complex robotic manipulation. Furniture assembly poses integral manipulation challenges that autonomous robots must be capable of: long-horizon planning, dexterous control, and robust visual perception. By presenting a well-defined suite of tasks with a lower barrier to entry (large-scale human teleoperation data and standardized configurations), we encourage the research community to push the boundaries of current robotic systems toward automating everyday activities…
Introduction to g-methods: time-fixed treatments
New blog post on estimating treatment effects using the g-methods. Introduces the g-formula, IPTW, and doubly robust extensions (augmented IPW and TMLE). Examples of each "by hand" and with R packages, plus machine learning algorithms with SuperLearner…This is the first post in a series looking at g-methods and their doubly robust extensions for estimating causal effects in epidemiological research. This one focuses on time-fixed treatments (or exposures); the second will cover the case of time-varying treatments…
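The post works its examples in R; purely as a flavour of the "by hand" IPTW step, here is a hypothetical Python sketch for a single binary confounder L, binary treatment A, and outcome Y, where propensity scores are just stratum-specific treatment rates (the toy records are made up, not from the post):

```python
data = [  # (L, A, Y) - toy records for illustration only
    (0, 0, 1.0), (0, 0, 2.0), (0, 1, 3.0), (0, 1, 3.0),
    (1, 0, 2.0), (1, 1, 5.0), (1, 1, 6.0), (1, 1, 7.0),
]

def propensity(l):
    """Propensity score e(L) = P(A=1 | L), estimated within each stratum of L."""
    rows = [(a, y) for (li, a, y) in data if li == l]
    return sum(a for a, _ in rows) / len(rows)

n = len(data)
# Horvitz-Thompson style inverse-probability-weighted means of Y
# under treatment (A=1) and under control (A=0)
mean_y1 = sum(a * y / propensity(l) for (l, a, y) in data) / n
mean_y0 = sum((1 - a) * y / (1 - propensity(l)) for (l, a, y) in data) / n
ate = mean_y1 - mean_y0  # average treatment effect
print(round(ate, 3))
```

Each treated subject is up-weighted by 1/e(L) and each control by 1/(1−e(L)), which rebalances the confounder across the two arms before the means are compared; the R packages in the post automate exactly this bookkeeping (plus standard errors).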
Do you study outside of work? [Reddit Discussion]
What is your study routine like when you are already employed? Do you study on the weekend or after working hours? Or just during work? And if it's during work, do you try to research and implement the new concepts in some project at work, or do you really study by taking a course, etc.? Thanks!…
ELI5: Why is the GPT family of models based on the decoder-only architecture?
I recently dove into the nitty-gritty of these models for the first time…The transformer architecture was intriguing on its own but learning about its variations is fascinating…I am curious to understand the reason behind the choice of GPT models' architecture if that's possible to explain coherently here. I understand particular architectures are better suited for certain tasks but the GPT and similar LLMs are doing really well across a variety of tasks…If scale has something to do with it, does decoder-only architecture scale better?…
* Define and deploy new approaches that exploit the wealth of data and the power of associated technologies to address the problems of teams across regions: Americas, China, Japan, Europe, South Asia and North Asia
* Collaborate on a daily basis with the teams on their needs and build ready-to-use algorithms to support their business challenges, and customer experience in particular
* Own all stages of data science projects: framing and management, implementation, development and operation, adoption and commitment
* Support the business teams in using the tools put in place to serve their challenges, enabling them to act more and more independently
* Ensure the governance of data science projects according to the defined principles
* Monitor, maintain and improve the models and tools in place
Apply here
Want to post a job here? Email us for details --> team@datascienceweekly.org
Attention is All You Need Tutorial
“Attention Is All You Need” is a paper from Google Brain and Google Research that proposed the Transformer as a replacement for RNNs in natural language processing. In this blog we will understand the theoretical idea behind the Transformer and later implement it in PyTorch…
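The tutorial builds the full model in PyTorch; as a warm-up, here is the paper's central operation, scaled dot-product attention, softmax(QKᵀ/√d_k)·V, written in plain Python lists so the arithmetic is visible (a didactic sketch, not the tutorial's code):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # subtract max for numerical stability
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention. Q, K, V are lists of vectors (n, d_k)."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # output = attention-weighted sum of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # the query attends more to the first key/value pair
```

Since the query here matches the first key, the softmax puts more weight on the first value vector; multi-head attention in the paper is just several such operations run in parallel on learned projections of Q, K and V.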
What is "the right way" to install Python on a new M2 MacBook?
I assume it isn't the system Python3, right? Maybe Homebrew?…
Unlock Advanced dbt Use Cases with the Meta Config and the Graph Variable
The meta config can unlock very powerful use cases for your dbt project, but working with them during runtime can be tricky. In this post, we’ll be exploring how to access the meta config via the output of dbt’s graph context variable. We'll consider how to safely iterate over all models where only a few have meta configs defined. To do that, we’ll learn about dot notation and get() as handy methods for accessing JSON objects…
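The post is about dbt's Jinja context, but the safe-access pattern it recommends carries over to any nested JSON-like structure. This hypothetical Python snippet (the node layout is a simplified stand-in for dbt's `graph.nodes`) shows why chaining `.get()` with defaults beats raw key lookups when only some models define a meta config:

```python
nodes = {  # simplified stand-in for graph.nodes: only some models define meta
    "model.proj.orders":    {"name": "orders",    "config": {"meta": {"owner": "data-eng"}}},
    "model.proj.customers": {"name": "customers", "config": {}},
}

owners = {}
for node in nodes.values():
    # .get() with a {} default lets us chain lookups without KeyError,
    # even for models whose config has no "meta" entry at all
    meta = node.get("config", {}).get("meta", {})
    owners[node["name"]] = meta.get("owner", "unowned")

print(owners)  # -> {'orders': 'data-eng', 'customers': 'unowned'}
```

Indexing with `node["config"]["meta"]["owner"]` instead would blow up on the first model without a meta block, which is exactly the runtime trickiness the post is working around.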
* Based on unique clicks.
** Find last week's issue #501 here.
Thanks for joining us this week :)
All our best,
Hannah & Sebastian
P.S.,
If you found this newsletter helpful, consider supporting us by becoming a paid subscriber here: https://datascienceweekly.substack.com/subscribe :)
Copyright © 2013-2023 DataScienceWeekly.org, All rights reserved.