Editor's Picks
- What Does it Mean to Give Someone What They Want? The Nature of Preferences in Recommender Systems
In practice, most recommenders optimize for engagement. This has been justified by the assumption that people always choose what they want, an idea from 20th-century economics called revealed preference. However, this approach to preferences can lead to a variety of unwanted outcomes including clickbait, addiction, or algorithmic manipulation...Doing better requires both a change in thinking and a change in approach. We’ll propose a more realistic definition of preferences, taking into account a century of interdisciplinary study, and two concrete ways to build better recommender systems...
- The GitLab Data Team Handbook
GitLab has two primary distinct groups within the Data Program who use data to drive insights and business decisions...The two teams are the (central) Data Team and, separately, Function Analytics Teams located in Sales, Marketing, Product, Engineering or Finance...The Data Team Handbook contains a large amount of information! To help you navigate the handbook we've organized it into the following major sections: a) Dashboards & Data you can use, b) How data works at GitLab, c) How the data team works, d) How the data platform works, and e) What the data team is working on...
- Playtesting Candy Crush: Human-Like Playtesting with Deep Learning
Today I learned that there's actually research in playtesting video games using deep learning...What's interesting is that the paper is actually written by actual employees from actual video game companies. But also that it decided to explore these techniques for Candy Crush Saga...
A Message from this week's Sponsor:
Pinecone vector database
The Pinecone vector database makes it easy to build high-performance vector search applications. Developer-friendly, fully managed, and easily scalable without infrastructure hassles.
Use Pinecone to build semantic search, object recognition, recommendations, anomaly detection, and other vector-based functionality into your applications.
Data Science Articles & Videos
- World's Simplest Data Pipeline?
How much you invest in your data engineering capability is dependent on your own ambition and needs and the risks of over- or under-investing are well documented. I believe there are some rules and guidelines that are universally applicable, regardless of your team size or tech stack, and that following these rules can save huge headaches in both teams of one and teams of one hundred...In order to demonstrate this, about a year ago I built the simplest data pipeline I could build while still adhering to my rules...
- Datacast Episode 102: Early-Stage Investing, Modern Venture Capital, and Trends in Enterprise Infrastructure With Astasia Myers
Astasia Myers is a Partner on Quiet Capital's enterprise team leading investments in ML, data infrastructure, open-source, developer tools, and security. She focuses on pre-seed, seed, and Series A...Our wide-ranging conversation touches on early-stage hiring, product-led growth, community-led sales, angel investing; her enthusiasm for data-centric ML; and much more...
- Explaining the Effects of Clouds on Remote Sensing Scene Classification
Most of Earth is covered by haze or clouds, impeding the constant monitoring of our planet...little effort has been spent on understanding how exactly atmospheric disturbances impede the application of modern machine learning methods to Earth observation data...We provide a thorough investigation of how classifiers trained on cloud-free data fail once they encounter noisy imagery – a common scenario encountered when deploying pretrained models for remote sensing to real use cases...
- A Short Guide for Feature Engineering and Feature Selection
Feature engineering and selection is the art/science of converting data into the best possible form, which involves an elegant blend of domain expertise, intuition and mathematics. This guide is a concise reference for beginners, covering the simplest yet most widely used techniques for feature engineering and selection...
- Demystifying ML PhD Admissions to US Universities [Video]
This video is a recording of the panel discussion and Q & A on ML PhD admissions to US universities...Several faculty from various universities in the US took part in it including Tatsu Hashimoto (Stanford), Rada Mihalcea (UMichigan), Devi Parikh (Georgia Tech), Sameer Singh (UC Irvine), and James Zou (Stanford)...
- Writing a scientific article: A step-by-step guide for beginners
Many young researchers find it extremely difficult to write scientific articles, and few receive specific training in the art of presenting their research work in written format. Yet, publication is often vital for career advancement, to obtain funding, to obtain academic qualifications, or for all these reasons. We describe here the basic steps to follow in writing a scientific article. We outline the main sections that an average article should contain; the elements that should appear in these sections, and some pointers for making the overall result attractive and acceptable for publication...
- Tools to Improve Training Data - Talking Language AI Episode #2 [Video]
Vincent Warmerdam builds a lot of NLP tools. Many of these tools target the scikit-learn ecosystem and there's a theme of labeling across many of them. A recent focus of his stack of tools is to improve training data. In this video, Vincent and Jay discuss a few of these tools and show how they work together...These tools are discussed in the video: a) Human-learn: a toolkit to build human-based scikit-learn components, b) Doubtlab: a toolkit to help find doubtful labels in data, c) Embetter: a library that makes it very easy to use embeddings in scikit-learn, and d) Bulk: a library that uses embeddings to leverage bulk labeling...The talk includes live demos of each tool to show how some simple tricks can go a long way...
- Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment
To harness the capabilities of state-of-the-art robot learning models while embracing their imperfections, we present Sirius, a principled framework for humans and robots to collaborate through a division of work. In this framework, partially autonomous robots are tasked with handling a major portion of decision-making where they work reliably; meanwhile, human operators monitor the process and intervene in challenging situations...
- General purpose visual recognition across modalities with limited supervision [Video]
Ishan Misra, FAIR (Meta AI), presents on how modern computer vision models are good at specialized tasks...However, specialist models also have severe limitations — they can only do what they are trained for and require copious amounts of pristine supervision for it. In this talk, he focuses on two limitations: specialist models cannot work on tasks beyond what they saw training labels for, or on new types of visual data. He’ll present our recent efforts that design better architectures, training paradigms and loss functions to address these issues...
Tool*
Retool is the fast way to build an interface for any database
With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.
Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Webinar*
SuperAnnotate Webinar
In December last year SuperAnnotate hosted a webinar “2021 CV’s year retrospective and opportunities for 2022” to wrap up the passing year in AI and share their predictions for 2022.
We are excited to share that this year SuperAnnotate is again hosting an end-of-the-year webinar, reviewing the developments in the AI space in 2022 and sharing what we can expect from the year ahead. This webinar will cover everything from generative models like Stable Diffusion, NLP with Large Language Models, DataOps and Data-Centricity, Transformers expanding into CV, new models like YOLOv7, large partnerships in the A(G)I space, and more! Following that, SuperAnnotate's CTO and co-founder Vahan Petrosyan will share his predictions for 2023.
Join us to see which of their predictions from the previous webinar came true, sum up developments in AI this year, and see what to expect from 2023. Register Now.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
- Senior Data Analyst - Epic Games - New York
Epic Games spans across 19 countries with 55 studios and 4,500+ employees globally. For over 25 years, we’ve been making award-winning games and engine technology that empowers others to make visually stunning games and 3D content that bring environments to life like never before.
Use your expert experience in data & analytics to build powerful stories and visuals that inform the games we make, the technology we develop, and business decisions that drive Epic... Epic Games is looking for a Senior Data Analyst to help us create the models that fuel our creator economy. The successful candidate will have excellent SQL knowledge, and enjoy combining analytic skills with business acumen to provide the data and insights that will drive our continued success...
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
- Introduction to Robotics @ Princeton
Lectures from "Introduction to Robotics" at Princeton University by instructor Anirudha Majumdar...This course will provide an introduction to the fundamental theoretical and algorithmic principles behind robotic systems. The course will also allow students to get hands-on experience through project-based assignments with the Crazyflie quadrotor...
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's newsletter here.
Cutting Room Floor
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian