Hello! Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
And now…let's dive into some interesting links from this week.
Camera Calibration in Sports with Keypoints Camera calibration is essential for accurate vision AI systems that analyse sports. It allows mapping a player's movement in a video frame to real movement on the field, and thus tracking the distance they cover, the direction, and the speed at which they move…Homography is commonly used for this purpose. It is a geometric transformation that maps points from one plane to another, enabling the correction of perspective distortions…we will train an Ultralytics YOLOv8 keypoint detection model to automatically identify specific characteristic points on the soccer field within each video frame. By detecting these points in the video frame and knowing their corresponding locations on the actual field, we can establish the necessary source and target points required for homography calculation…
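To make the homography step concrete, here is a minimal sketch using OpenCV. It assumes the keypoint-detection step has already produced pixel coordinates (in the article these come from the YOLOv8 model; the coordinates and field dimensions below are made-up placeholders):

```python
import numpy as np
import cv2

# Keypoints detected in the video frame (pixels). In the article these come
# from a YOLOv8 keypoint model; the values here are hypothetical.
frame_pts = np.array([[120, 80], [860, 95], [900, 500], [100, 480]], dtype=np.float32)

# Known locations of the same landmarks on the real field (meters).
field_pts = np.array([[0, 0], [52.5, 0], [52.5, 34], [0, 34]], dtype=np.float32)

# Estimate the homography. With more than four point pairs, pass cv2.RANSAC
# as the method to stay robust to a few bad detections.
H, _ = cv2.findHomography(frame_pts, field_pts)

# Map a player's pixel position to field coordinates (meters).
player_px = np.array([[[430.0, 300.0]]], dtype=np.float32)
player_field = cv2.perspectiveTransform(player_px, H)
print(player_field)  # approximate (x, y) position on the pitch
```

Tracking that mapped position across frames then gives distance, direction, and speed in real-world units.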
Responsible Datasets in Context Understanding the social and historical context of data is essential for all responsible data work…We host datasets that are paired with rich documentation, data essays, and teaching resources, all of which draw on context and humanities perspectives and methods…We provide models for responsible data curation, documentation, story-telling, and analysis…
The Open Encyclopedia of Cognitive Science The Open Encyclopedia of Cognitive Science is a new, multidisciplinary guide to understanding the mind: a freely-available, growing collection of peer-reviewed articles introducing key topics to a broad audience of students and scholars…
Everyone tells you to learn AI, but no one tells you where. We have partnered with Growthschool to bring this ChatGPT & AI Workshop to our readers. It is usually $199, but free for you because you are our loyal readers 🎁 Register here for free – valid for the next 24 hours only! This course on AI has been taken by 1 million people across the globe, who have been able to:
- Improve forecasting by 30%, helping them stay ahead of the competition
- Make quick & smarter decisions using AI-led data insights
- Automate data processing and almost 50% of their workflow
- Write Python code to process data and produce analytical output
You’ll wish you knew about this FREE AI Training sooner (Btw, it’s rated 9.8/10 ⭐) Save your seat for $0 now! (Valid for 100 people only)

* Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
Predistribution over Redistribution: Beyond the Windfall Clause If AI could bring enormous benefits to humanity, but also threatens to put us all out of work and concentrate economic production in the hands of a few, what should our policy approach be? Perhaps more importantly, what world does that policy approach presume, and what alternative world does it seek to bring about?…While economic redistribution might be essential to ensure widespread prosperity in a world with advanced AI systems, we argue that focusing on predistribution — proactively ensuring widespread opportunity to benefit from AI — can reduce the likelihood that AI exacerbates inequality in the first place. To make this argument, we start by examining the Windfall Clause — a prominent policy proposal for distributing AI company profits in a world where extremely powerful AI systems are developed and economic power is heavily concentrated in developers’ hands…

How does backpropagation find the *global* loss minimum? [Reddit Discussion] From what I understand, gradient descent / backpropagation makes small changes to weights and biases, akin to a ball slowly traveling down a hill; given how many epochs are necessary to train the neural network, and how many training-data batches there are within each epoch, changes are small…I don't understand how the neural network automatically 'works through' local minima somehow? Is it only by periodically making the learning rate large enough that the changes required to escape a local minimum become possible?…
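For readers who want to poke at the question, here is a toy sketch (my own, not from the thread) of plain gradient descent on a non-convex one-dimensional loss. A small fixed step size gets trapped in the shallower basin, while a larger, decaying step size can hop over the barrier and then settle into the deeper one:

```python
import numpy as np

# Non-convex toy loss: local minimum near x = -1.35, global minimum near x = 1.47.
def loss(x):
    return x**4 - 4 * x**2 - x

def grad(x):
    return 4 * x**3 - 8 * x - 1

def descend(x0, lr, steps=200):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def descend_decay(x0, lr0, steps=200, decay=0.02):
    # Start with big steps, shrink them over time (a crude learning-rate schedule).
    x = x0
    for t in range(steps):
        x -= lr0 / (1 + decay * t) * grad(x)
    return x

print(descend(-2.0, lr=0.01))          # stuck at the local minimum (~ -1.35)
print(descend_decay(-2.0, lr0=0.115))  # escapes, settles near the global minimum (~ 1.47)
```

This is exactly the intuition in the thread: small steps alone just follow the local slope, and it takes extra mechanisms (large or scheduled learning rates, momentum, or minibatch noise) to move between basins; none of them guarantee reaching the global minimum.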
Levels of Autonomy in AI-enhanced Software Engineering Many or even most developers are already using AI in their everyday development cycles. These include code-completion tools such as GitHub Copilot or Cursor, frameworks designed to solve specific varieties of software development tasks end-to-end such as DiffBlue for unit test generation or TransCoder for code porting, as well as general-purpose software development agents such as Devin or OpenDevin. What are some different levels of autonomy in software development agents, and how do they relate to existing tools? Read on to learn more!…
Active Statistics This book provides statistics instructors and students with complete classroom material for a one- or two-semester course on applied regression and causal inference. It is built around 52 stories, 52 class-participation activities, 52 hands-on computer demonstrations, and 52 discussion problems that let instructors and students explore the real-world complexity of the subject in a fun way. The book fosters an engaging ‘flipped classroom’ environment with a focus on visualization and understanding…
Designing Complex Experiments: Some Recent Developments “These recent slides from Susan Athey and Guido Imbens at NBER are a great review of the most valuable data science methods I'm aware of. They cover tons of ground with lots of pointers”…
Why ridge regression typically beats linear regression Linear models are great, but not every linear model is the same. As this video explains, there are small changes you can make to help the algorithm perform better in practice…
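As a quick illustration of the idea (my own sketch, not from the video): with few observations and strongly correlated features, ridge's penalty typically wins on held-out data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Few samples, near-duplicate columns: the setting where plain OLS overfits.
n, p = 40, 10
base = rng.normal(size=(n, 1))
X = base + 0.1 * rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

# cross_val_score uses R^2 by default for regressors.
ols_r2 = cross_val_score(LinearRegression(), X, y, cv=5).mean()
ridge_r2 = cross_val_score(Ridge(alpha=1.0), X, y, cv=5).mean()
print(f"OLS   CV R^2: {ols_r2:.3f}")
print(f"Ridge CV R^2: {ridge_r2:.3f}")  # usually higher in this setup
```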
A Reliable Contextual Bandit Algorithm: LinUCB A user visits a news website. Which articles should they be shown?…This question was the target of the paper “A Contextual-Bandit Approach to Personalized News Article Recommendation”, which introduced the now-famous LinUCB contextual bandit algorithm. In fact, personalizing news is just one application. Others include:
- Dynamic Pricing: Which discounts should be offered to maximize profit?
- Personalized Advertising: Which advertisements should be shown to maximize clicks?
- Medical Trials: Which treatments should be prescribed to maximize survival?
To see how it applies generally, it helps to understand the personalized news application in more detail…
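In the meantime, here is a minimal sketch of the disjoint LinUCB variant (my own condensation of the paper's update rule; the feature dimension, alpha, and reward model are arbitrary):

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm, plus an
    upper-confidence bonus that encourages exploring uncertain arms."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # X^T X + I per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # X^T rewards per arm

    def choose(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # uncertainty bonus
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy usage: 3 "articles", 5-dimensional user context, made-up linear rewards.
rng = np.random.default_rng(1)
true_theta = rng.normal(size=(3, 5))
bandit = LinUCB(n_arms=3, dim=5)
for _ in range(1000):
    x = rng.normal(size=5)
    arm = bandit.choose(x)
    reward = true_theta[arm] @ x + 0.1 * rng.normal()
    bandit.update(arm, x, reward)
```

The same loop works for pricing, advertising, or trials: only the meaning of the context x, the arms, and the reward changes.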
How to Interpret Interactions, Polynomials, and Splines with {marginaleffects} Heterogeneity is present in virtually all empirical domains, where the effect of an intervention is stronger in some groups or contexts than in others…For instance, a new treatment might significantly reduce blood pressure in younger adults, but have a weaker effect on older ones…Or a marketing campaign may increase sales in rural areas but not urban ones. This post shows how to use marginaleffects to report strata-specific effects, gauge whether the impact of a variable is moderated by another, and gain a deeper understanding of context conditionality…We will focus on three strategies to account for heterogeneity and increase the flexibility of our models: multiplicative interactions, polynomials, and splines…
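The post itself works in R with marginaleffects; as a rough Python analogue of the first strategy (a multiplicative interaction), here is a sketch with statsmodels on simulated data mirroring the blood-pressure example (all coefficients made up):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 500

# Simulated data where the treatment effect weakens with age.
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "age": rng.uniform(20, 80, n),
})
df["bp_change"] = (
    -8 * df["treated"] + 0.1 * df["treated"] * df["age"]
    - 0.05 * df["age"] + rng.normal(0, 2, n)
)

# 'treated * age' expands to both main effects plus their interaction.
fit = smf.ols("bp_change ~ treated * age", data=df).fit()
print(fit.params)

# The effect of treatment at a given age is beta_treated + beta_interaction * age:
# stronger (more negative) for younger patients in this simulation.
```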
A notebook for seeing with language models I’ve spent my career investigating how computers could help us not just store the outputs of our thinking, but actively aid in our thinking process. Recently, this has involved building on top of advancements in machine learning…Today, I want to share with you some of my early design explorations for what future creative thinking tools may look like, based on these techniques and ideas. This particular exploration involves what I’ve labelled a computational notebook for ideas…This computational notebook is designed based on one cornerstone principle: documents aren’t a collection of words, but a collection of concepts…
Best Big Company Data Engineering Blogs? [Reddit Discussion] I'm looking to stay updated on the latest in data engineering, especially new implementations and design patterns. Can anyone recommend some excellent blogs from big companies that focus on these topics?…I’m interested in posts that cover innovative solutions, practical examples, and industry trends in batch processing pipelines, orchestration, data quality checks and anything around end-to-end data platform building…
A brief tutorial on information theory At the 2023 Les Houches Summer School on Theoretical Biological Physics, several students asked for some background on information theory, and so we added a tutorial to the scheduled lectures. This is largely a transcript of that tutorial, lightly edited. It covers basic definitions and context rather than detailed calculations. We hope to have maintained the informality of the presentation, including exchanges with the students, while still being useful…
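As a reminder of the kind of basic definition the tutorial covers, here is the entropy of a discrete distribution computed two ways (a standalone sketch, not from the tutorial):

```python
import numpy as np
from scipy.stats import entropy

# Entropy of a discrete distribution: H(X) = -sum_x p(x) log2 p(x).
p = np.array([0.5, 0.25, 0.125, 0.125])

h_manual = -np.sum(p * np.log2(p))
h_scipy = entropy(p, base=2)
print(h_manual, h_scipy)  # both 1.75 bits
```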
What is the hardest thing as a machine learning engineer [Reddit Discussion] I have just begun my journey into machine learning. For practice, I obtain data from Kaggle.com, but I decided to challenge myself further by collecting data on my own. I discovered that gathering a substantial amount of data is quite challenging. How is data typically collected, and is there anything harder than that?…
Statistical rethinking 2 with rstan and the tidyverse This book is based on the second edition of Richard McElreath’s (2020) text, Statistical rethinking: A Bayesian course with examples in R and Stan. My contributions show how to fit the models he covered with rstan (Stan Development Team, 2024a), which allows one to fit Bayesian models in R (R Core Team, 2022) using Hamiltonian Monte Carlo. I also prefer plotting and data wrangling with the packages from the tidyverse (Wickham et al., 2019; Wickham, 2022), so we’ll be using those methods, too…
A Course in Dynamic Optimization These lecture notes are derived from a graduate-level course in dynamic optimization, offering an introduction to techniques and models extensively used in management science, economics, operations research, engineering, and computer science. The course emphasizes the theoretical underpinnings of discrete-time dynamic programming models and advanced algorithmic strategies for solving these models…The course also delves into the properties of value and policy functions, leveraging classical results (Topkis, 1998) and recent developments. Additionally, it offers an introduction to reinforcement learning, including a formal proof of the convergence of Q-learning algorithms…
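As a minimal taste of the reinforcement-learning material, here is the tabular Q-learning update on a toy chain MDP (my own toy example; the course covers the convergence proof, not this particular code):

```python
import numpy as np

# Toy chain MDP: states 0..4, actions 0 (left) and 1 (right).
# Reaching state 4 yields reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2
GAMMA, LR = 0.9, 0.1

def step(s, a):
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done

rng = np.random.default_rng(3)
Q = np.zeros((N_STATES, N_ACTIONS))
for _ in range(2000):
    s, done = 0, False
    while not done:
        # Behave uniformly at random: Q-learning is off-policy, so it still
        # converges to the optimal Q* given enough visits to every (s, a).
        a = int(rng.integers(N_ACTIONS))
        s_next, r, done = step(s, a)
        # The Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        target = r + GAMMA * np.max(Q[s_next]) * (not done)
        Q[s, a] += LR * (target - Q[s, a])
        s = s_next

print(Q)  # action 1 ("right") should score higher in every non-terminal state
```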
Topological Deep Learning: Going Beyond Graph Data Want to get started in topological deep learning and not sure where to begin? We decided to make our unifying Topological Deep Learning (TDL) framework available as a book and make it freely accessible online…
* Based on unique clicks. ** Find last week's issue #558 here.
Looking to get a job? Check out our “Get A Data Science Job” Course. It is a comprehensive course that teaches you everything related to getting a data science job, based on answers to thousands of emails from readers like you. The course has 3 sections: Section 1 covers how to get started, Section 2 covers how to assemble a portfolio to showcase your experience (even if you don’t have any), and Section 3 covers how to write your resume.

Promote yourself/organization to ~63,000 subscribers by sponsoring this newsletter. 35-45% weekly open rate.
Thank you for joining us this week! :) Stay Data Science-y! All our best, Hannah & Sebastian