Data Science Weekly - Issue 456

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #456

August 18 2022

Editor's Picks

 

 
  • Inferring Concept Drift Without Labeled Data
    After iterations of development and testing, deploying a well-fit machine learning model often feels like the final hurdle for an eager data science team. In practice, however, a trained model is never final. This milestone marks just the beginning of the perpetual maintenance race that is production machine learning. This is because most machine learning models are static, but the world we live in is dynamic...
  • Testing Firefox more efficiently with machine learning
    A browser is an incredibly complex piece of software. With such enormous complexity, the only way to maintain a rapid pace of development is through an extensive CI system that can give developers confidence that their changes won’t introduce bugs. Given the scale of our CI, we’re always looking for ways to reduce load while maintaining a high standard of product quality. We wondered if we could use machine learning to reach a higher degree of efficiency...
 
 

A Message from this week's Sponsor:

 



Free Access to the Semantic Layer Summit with Bill Inmon, Kirk Borne, and 30+ Enterprise Data Leaders

You're invited to a free one-day virtual event. Explore the importance and impact of using a semantic layer for analytics with an all-star lineup of data leaders from Cigna, Starbucks, Bank of America, and more. Lots to look forward to!

 

 

Data Science Articles & Videos

 
  • Comparing quantiles at scale in online A/B-testing
    Using the properties of the Poisson bootstrap algorithm and quantile estimators, we have been able to reduce the computational complexity of Poisson bootstrap difference-in-quantiles confidence intervals enough to unlock bootstrap inference for almost arbitrarily large samples. At Spotify, we can now easily calculate bootstrap confidence intervals for difference-in-quantiles in A/B tests with hundreds of millions of observations...
  • In 2022, what is the proper way to get into machine/deep learning? [HN Discussion]
    By getting into machine or deep learning I mean building up to a stage to do ML/DL research. Applied research or core theory of ML/DL research. Of course, the path to each will be quite different. Standing in 2022, what are the best resources for a CS student/decent programmer to get into the field of ML and DL on their own? Resources can be either books or public courses...The target ability: 1. To understand the theory behind the algorithms, 2. To implement an algorithm on a dataset of choice. (Data cleaning and management should also be learned), 3. Read research publications and try to implement them...
  • How to Build a GPT-3 for Science
    Want to create an image of velociraptors working on a skyscraper, in the style of “Lunch Atop A Skyscraper” of 1932? Use DALL-E...Want to deeply understand COVID-19 research and answer your questions based on evidence? Learn how to do a Boolean search, read scientific papers, and maybe get a PhD, because there are no generative AI models trained on the vast body of scientific research publications...
  • LLM.int8() and Emergent Features
    When I attended NAACL, I wanted to do a little test. I had two pitches for my LLM.int8() paper. One pitch is about how I use advanced quantization methods to achieve transformer inference at scale with no performance degradation, which makes large models more accessible. The other pitch talks about emergent outliers in transformers and how they radically change what transformers learn and how they function...This blog post will spill some mandatory details about quantization, but I want to mostly make it about these emergent features that I found in transformers at scale...
  • Unleashing the power of large language models
    Maarten Grootendorst on applying large language models to topic models and fuzzy string matching...Maarten Grootendorst is a data scientist at IKNL, an institute that strives to reduce the impact of cancer by collecting and unlocking essential and reliable data. More importantly, he’s the author of a few open source libraries that I’ve come to enjoy: BERTopic (topic modeling with transformers and c-TF-IDF), PolyFuzz (fuzzy string matching), and KeyBERT (keyword extraction)...
  • inControl Podcast - Sean Meyn: Markov chains, networks, reinforcement learning, beekeeping and jazz
    inControl Podcast - a podcast on control theory and related topics, including feedback, decision making, artificial intelligence, robotics and much more...In this episode, our guest is Sean Meyn, Professor and Robert C. Pittman Eminent Scholar Chair in the Department of Electrical and Computer Engineering at the University of Florida. The episode features Sean’s adventures in the areas of Markov chains, networks and Reinforcement Learning (RL) as well as anecdotes and trivia about beekeeping and jazz...
  • NeuMan: Neural Human Radiance Field from a Single Video
    Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences. We [Apple] propose a novel framework to reconstruct the human and the scene that can be rendered with novel human poses and views from just a single in-the-wild video. Given a video captured by a moving camera, we train two NeRF models: a human NeRF model and a scene NeRF model...
  • A Library for Representing Python Programs as Graphs for Machine Learning
    Graph representations of programs are commonly a central element of machine learning for code research. We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python programs suitable for training machine learning models...
 
 

Course*

 


Data Science Specialities: What Are My Options in Data Science?

Data science is a rewarding career field full of opportunities for advancement. Specialized roles are fundamental to helping organizations maximize their ability to harness data for strategic planning. Want to know more about your options as a data scientist? Read our blog!

TDI’s Data Programs are intensive bootcamps that turn STEM academics into leading data professionals, providing expert training, live code, and real-world data sets. Each industry-leading principle is tailored to prepare you as you venture towards new career paths, advanced education, and overall skill refinement. Applications open next week!

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 

 

Jobs

 
  • Data Scientist - Success Academy Charter Schools, Inc - NYC

    This new Data Scientist role will be a key contributor to our mission of driving innovation across the organization. Reporting to the Leader of Enterprise Analytics, this role will be responsible for working with stakeholders in various functions to understand areas of opportunity, developing analytical solutions ranging from dashboards to sophisticated mathematical models, and helping functional teams adopt those solutions. This role will be part of a highly collaborative team of professionals with a wide range of skills including data science, data engineering, business analysis, and project management...
     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 

 

Training & Resources

 
  • AI Research Intensive
    These lectures are part of the "AI Research Intensive", designed to teach fundamental skills involved in conducting cutting-edge AI research and writing a research paper...The AI Research Intensive was hosted by Rajpurkar Lab at Harvard Medical School on August 4 & 5, 2022...
  • Resources To Secure Your Next MLE / DS / SWE Job!
    This repo contains cheat sheets + data structures & algorithms templates useful for MLE, DS, and SWE interviews. All cheat sheets were created by me and helped me secure multiple offers at big tech companies...
  • Cornell's Operations Research and Information Engineering 4741: Learning with Big Messy Data
    Modern data sets...are often big, messy, and extremely useful. This course addresses scalable robust methods for learning from big messy data. We will cover techniques for learning with data that is messy  —  consisting of measurements that are continuous, discrete, boolean, categorical, or ordinal, or of more complex data such as graphs, texts, or sets, with missing entries and with outliers  —  and that is big  —  which means we can only use algorithms whose complexity scales linearly in the size of the data. We will cover techniques for cleaning data, supervised and unsupervised learning, finding similar items, model validation, and feature engineering...
 
 

What you’re up to – notes from DSW readers

 
  • Robert Ritz is working on Datafantic, a data blog, to tell data-driven stories and share data science tutorials. The first entry is on Matplotlib stylesheets. The site is Datafantic.com...

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

 


 

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
