Data Science Weekly - Data Science Weekly - Issue 434

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments
Email not displaying correctly?
View it in your browser.

Issue #434

March 17 2022

Editor Picks

 
  • A Deep Dive into NLP Tokenization and Encoding with Word and Sentence Embeddings
    This is a deep dive: over 8,000 words long. Don’t be afraid to bookmark this article and read it in pieces. There is a lot to cover. We will start with basic One-Hot encoding, move on to word2vec word and sentence embeddings, build our own custom embeddings using R, and finally, work with the cutting-edge BERT model and its contextual embeddings...
  • Making Deep Learning Go Brrrr From First Principles
    So, you want to improve the performance of your deep learning model. How might you approach such a task? Often, folk fall back to a grab-bag of tricks that might've worked before or saw on a tweet. "Use in-place operations! Set gradients to None! Install PyTorch 1.10.0 but not 1.10.1!"...It's understandable why users often take such an ad-hoc approach performance on modern systems (particularly deep learning) often feels as much like alchemy as it does science. That being said, reasoning from first principles can still eliminate broad swathes of approaches, thus making the problem much more approachable...So, if you want to keep your GPUs going brrrr, let's discuss the three components your system might be spending time on - compute, memory bandwidth, and overhead...
  • Announcing the 2022 AI Index Report
    The AI Index is an independent initiative at the Stanford Institute for Human-Centered Artificial Intelligence (HAI), led by the AI Index Steering Committee, an interdisciplinary group of experts from across academia and industry. The annual report tracks, collates, distills, and visualizes data relating to artificial intelligence, enabling decision-makers to take meaningful action to advance AI responsibly and ethically with humans in mind...
 
 

A Message from this week's Sponsor:

 



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.

 

 

Data Science Articles & Videos

 
  • The “0 / 1 / Done” Strategy for Data Science
    To achieve operational excellence in applied data science delivery aim for: 0-day Handovers, 1-day Prototyping and to declare projects as Done when Completely Done. Work backwards from these goals, conducting a gap analysis between those and current team capabilities, to identify process, tooling and governance initiatives to reach them...
  • What Good Data Product Managers Do – And Why You Probably Need One
    We are seeing data departments modernize their team structure with data product managers at the helm of such projects...In this article, we’ll walk through: a) What is a data product manager? How did the role evolve?, b) What is a data product?, c) What does a data product manager do? What skills do they need?, d) What background do product managers need? Who do they report to?, e) Data product manager vs product manager, f) Data product manager vs data scientist, and g) The future of the data product manager...
  • How to Build Effective (and Useful) Dashboards
    With practice I have been developing a four-step approach (that I am still fine-tuning) to build dashboards that are, first of all, effective, but also useful...In this article I want to take you through these four steps. Whether you are an experienced analyst building data visualizations all day long or a business user using dashboards from time to time, I hope you will find these guidelines useful...
  • Future ML Systems Will Be Qualitatively Different
    In 1972, the Nobel prize-winning physicist Philip Anderson wrote the essay "More Is Different". In it, he argues that quantitative changes can lead to qualitatively different and unexpected phenomena...In this post, I'll argue that emergence often occurs in the field of AI, and that this should significantly affect our intuitions about the long-term development and deployment of AI systems. We should expect weird and surprising phenomena to emerge as we scale up systems. This presents opportunities, but also poses important risks...
  • Building systems to securely reason over private data
    People today rely on AI systems such as assistants and chatbots to help with countless tasks...For systems to execute these tasks, users must provide them with relevant information — such as one’s location or work calendar. In some cases, however, people would prefer to keep information private, which means not uploading it to cloud-based AI systems or sharing it with others...Meta AI is releasing ConcurrentQA, the first public data set for studying information retrieval and question answering (QA) with data from multiple privacy scopes. Alongside the data set and problem exploration, we have developed a new methodology as a starting point for thinking about privacy in retrieval-based settings called Public-Private Autoregressive Information Retrieval (PAIR)...
  • Generative Flow Networks
    I [Yoshua Bengio] have rarely been as enthusiastic about a new research direction. We call them GFlowNets, for Generative Flow Networks. They live somewhere at the intersection of reinforcement learning, deep generative models and energy-based probabilistic modelling...What I find exciting is that they open so many doors, but in particular for implementing the system 2 inductive biases I have been discussing in many of my papers and talks since 2017, that I argue are important to incorporate causality and deal with out-of-distribution generalization in a rational way...
  • The promise of AI with Demis Hassabis
    Last episode of Season 2 of DeepMind: The Podcast...Hannah wraps up the series by meeting DeepMind co-founder and CEO, Demis Hassabis. In an extended interview, Demis describes why he believes AGI is possible, how we can get there, and the problems he hopes it will solve. Along the way, he highlights the important role of consciousness and why he’s so optimistic that AI can help solve many of the world’s major challenges. As a final note, Demis shares the story of a personal meeting with Stephen Hawking to discuss the future of AI and discloses Hawking’s parting message...
  • Winning at Competitive ML in 2022
    Hoping to win a machine learning competition in 2022? Here’s what you need to know. I collaborated with ML Contests, using their database of over 80 competitions that took place in 2021 across Kaggle, DrivenData, AICrowd, Zindi, and 13 other platforms. Wherever the information was available, we categorized winners to figure out what made them win...
 
 

Summit*

 



You're invited to the first-ever Metrics Store Summit

Transform is hosting the first-ever industry summit on the metrics layer. The first-ever Metrics Store Summit on April 26, 2022 will bring discussions around the semantic layer into one event—providing context with use cases for metrics stores, highlighting applications for metrics, and sharing ideas from leaders across the modern data stack.You can expect to hear from Airbnb, Slack, Spotify, Atlan, Hex, Mode, Hightouch, AtScale and many more in this action-packed 1-day event. We would love to see you there! Register today for free.



*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Sr. Machine Learning Engineer - eBay - NYC

    The eBay Ads team focuses on building data / ML services for our advertiser sellers to guide them to optimize for their Ads budget and goals, for example by recommending the right inventory, keywords and Ads bid rate to apply for their campaigns, and eventually create campaigns automatically for advertisers. This is a relatively new area but with a very high business potential and need. It would allow you to work with massive amounts of data and use a variety of data science techniques.

    We are looking for someone passionate about deploying reliable and efficient ML services to a production environment at scale. You will collaborate with researchers to invent, design and implement end-to-end production services in Python and Java/Scala using state-of-the-art big data and ML tools. Come and help us blow away the boundaries of e-commerce through AI!

    If you are interested in applying, please contact the hiring manager, Dan Schonfeld, at dschonfeld@ebay.com.

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 
 

Training & Resources

 
  • Efficient Transformers: A Survey
    Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning...Recently, a dizzying number of "X-former" models have been proposed - Reformer, Linformer, Performer, Longformer, to name a few - which improve upon the original Transformer architecture, many of which make improvements around computational and memory efficiency. With the aim of helping the avid researcher navigate this flurry, this paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models, providing an organized and comprehensive overview of existing work and models across multiple domains...
  • GFlowNet Tutorial
    A GFlowNet is a trained stochastic policy or generative model, trained such that it samples objects $x$ through a sequence of constructive steps, with probability proportional to $R(x)$, where $R$ is some given non-negative integrable reward function...
  • The Bayesian Learning Rule
    We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide-range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout...
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Follow on Twitter
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.
unsubscribe from this list    update subscription preferences 

Older messages

Data Science Weekly - Issue 433

Friday, March 11, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #433 March 10 2022 Editor Picks Deep

Data Science Weekly - Issue 432

Thursday, March 3, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #432 March 03 2022 Editor Picks The

Data Science Weekly - Issue 431

Friday, February 25, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #431 February 24 2022 Editor Picks A

Data Science Weekly - Issue 430

Thursday, February 17, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #430 February 17 2022 Editor Picks The

Data Science Weekly - Issue 429

Thursday, February 10, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #429 February 10 2022 Editor Picks

You Might Also Like

Quick question

Sunday, April 28, 2024

I want to learn how I can better serve you ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌

Kotlin Weekly #404 (NOT FOUND)

Sunday, April 28, 2024

ISSUE #404 28st of April 2024 Announcements Kotlin Multiplatform State of the Art Survey 2024 Help to shape and understand the Kotlin Multiplatform Ecosystem! It takes 4 minutes to fill this survey.

📲 Why Is It Called Bluetooth? — Check Out This AI Text to Song Generator

Sunday, April 28, 2024

Also: What to Know About Emulating Games on iPhone, and More! How-To Geek Logo April 28, 2024 📩 Get expert reviews, the hottest deals, how-to's, breaking news, and more delivered directly to your

Daily Coding Problem: Problem #1425 [Easy]

Sunday, April 28, 2024

Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Microsoft. Suppose an arithmetic expression is given as a binary tree. Each leaf is an

PD#571 Software Design Principles I Learned the Hard Way

Sunday, April 28, 2024

If there's two sources of truth, one is probably wrong. And yes, please repeat yourself. ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌

When Procrastination is Productive & Ghost integrating with ActivityPub

Sunday, April 28, 2024

Automattic, Texts, and Beeper join forces to build world's best inbox, Reflect launches its iOS app, how to start small rituals, and a lot more in this week's issue of Creativerly. Creativerly

C#503 Building pipelines with System.Threading.Channels

Sunday, April 28, 2024

Concurrent programming challenges can be effectively addressed using channels ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌

RD#453 Get your codebase ready for React 19

Sunday, April 28, 2024

Is your app ready for what's coming up in React 19's release ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌

☁️ Azure Weekly #464 - 28th April 2024

Sunday, April 28, 2024

Azure Weekly Newsletter Issue #464 powered by endjin Welcome to issue 464 of the Azure Weekly Newsletter. In AI we have a good mix of high-level and deep-dive technical articles. Next-Gen Customer

Tesla profits tumble, Fisker flatlines, and California cities battle for control of AVs

Sunday, April 28, 2024

Plus, an up-close look at the all-electric Mercedes G-Wagen and more View this email online in your browser By Kirsten Korosec Sunday, April 28, 2024 Welcome back to TechCrunch Mobility — your central