Data Science Weekly - Issue 456

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #456

August 18 2022

Editor's Picks

 

 
  • Inferring Concept Drift Without Labeled Data
    After iterations of development and testing, deploying a well-fit machine learning model often feels like the final hurdle for an eager data science team. In practice, however, a trained model is never final. This milestone marks just the beginning of the perpetual maintenance race that is production machine learning. This is because most machine learning models are static, but the world we live in is dynamic...
  • Testing Firefox more efficiently with machine learning
    A browser is an incredibly complex piece of software. With such enormous complexity, the only way to maintain a rapid pace of development is through an extensive CI system that can give developers confidence that their changes won’t introduce bugs. Given the scale of our CI, we’re always looking for ways to reduce load while maintaining a high standard of product quality. We wondered if we could use machine learning to reach a higher degree of efficiency...
 
 

A Message from this week's Sponsor:

 



Free Access to the Semantic Layer Summit with Bill Inmon, Kirk Borne, and 30+ Enterprise Data Leaders

You're invited to a free one-day virtual event. Explore the importance and impact of using a semantic layer for analytics with an all-star lineup of data leaders from Cigna, Starbucks, Bank of America, and more. Lots to look forward to!

 

 

Data Science Articles & Videos

 
  • Comparing quantiles at scale in online A/B-testing
    Using the properties of the Poisson bootstrap algorithm and quantile estimators, we have been able to reduce the computational complexity of Poisson bootstrap difference-in-quantiles confidence intervals enough to unlock bootstrap inference for almost arbitrary large samples. At Spotify, we can now easily calculate bootstrap confidence intervals for difference-in-quantiles in A/B tests with hundreds of millions of observations...
  • In 2022, what is the proper way to get into machine/deep learning? [HN Discussion]
    By getting into machine or deep learning I mean building up to the stage of doing ML/DL research, whether applied research or core theory. Of course, the paths to the two will be quite different. Standing in 2022, what are the best resources for a CS student or decent programmer to get into ML and DL on their own? Resources can be books or public courses...The target abilities: 1. understand the theory behind the algorithms; 2. implement an algorithm on a dataset of choice (data cleaning and management should also be learned); 3. read research publications and try to implement them....
  • How to Build a GPT-3 for Science
    Want to create an image of velociraptors working on a skyscraper, in the style of “Lunch Atop A Skyscraper” of 1932? Use DALL-E...Want to deeply understand COVID-19 research and answer your questions based on evidence? Learn how to do a Boolean search, read scientific papers, and maybe get a PhD, because there are no generative AI models trained on the vast body of scientific research publications...
  • LLM.int8() and Emergent Features
    When I attended NAACL, I wanted to do a little test. I had two pitches for my LLM.int8() paper. One pitch is about how I use advanced quantization methods to achieve transformer inference at scale with no performance degradation, which makes large models more accessible. The other pitch talks about emergent outliers in transformers and how they radically change what transformers learn and how they function...This blog post will spill some mandatory details about quantization, but I want to mostly make it about these emergent features that I found in transformers at scale...
  • Unleashing the power of large language models
    Maarten Grootendorst on applying large language models to topic models and fuzzy string matching...Maarten Grootendorst is a data scientist at IKNL, an institute that strives to reduce the impact of cancer by collecting and unlocking essential and reliable data. More importantly, he’s the author of a few open source libraries that I’ve come to enjoy: BERTopic (topic modeling with transformers and c-TF-IDF), PolyFuzz (fuzzy string matching), and KeyBERT (keyword extraction)...
  • inControl Podcast - Sean Meyn: Markov chains, networks, reinforcement learning, beekeeping and jazz
    inControl Podcast - a podcast on control theory and related topics, including feedback, decision making, artificial intelligence, robotics and much more...In this episode, our guest is Sean Meyn, Professor and Robert C. Pittman Eminent Scholar Chair in the Department of Electrical and Computer Engineering at the University of Florida. The episode features Sean’s adventures in the areas of Markov chains, networks and Reinforcement Learning (RL) as well as anecdotes and trivia about beekeeping and jazz...
  • NeuMan: Neural Human Radiance Field from a Single Video
    Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences. We [Apple] propose a novel framework to reconstruct the human and the scene that can be rendered with novel human poses and views from just a single in-the-wild video. Given a video captured by a moving camera, we train two NeRF models: a human NeRF model and a scene NeRF model...
  • A Library for Representing Python Programs as Graphs for Machine Learning
    Graph representations of programs are commonly a central element of machine learning for code research. We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python programs suitable for training machine learning models...
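For readers new to the Spotify quantile piece above: a Poisson bootstrap gives every observation an independent Poisson(1) weight instead of resampling n rows with replacement, which is what makes it easy to distribute. The sketch below is the plain, unoptimized version of that idea for a difference in medians. It does not reproduce Spotify's complexity reduction, and the function names (`poisson1`, `weighted_quantile`, `poisson_bootstrap_quantile_diff_ci`) are hypothetical.

```python
import math
import random


def poisson1(rng: random.Random) -> int:
    """Draw from Poisson(lambda=1) using Knuth's multiplication method."""
    limit = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def weighted_quantile(sorted_values, counts, q):
    """q-quantile of a bootstrap sample holding counts[i] copies of sorted_values[i]."""
    n = sum(counts)
    target = int(q * (n - 1))  # 0-based rank, lower interpolation
    seen = 0
    for value, c in zip(sorted_values, counts):
        seen += c
        if seen > target:
            return value
    return sorted_values[-1]


def poisson_bootstrap_quantile_diff_ci(control, treatment, q=0.5,
                                       n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for quantile(treatment) - quantile(control)."""
    rng = random.Random(seed)
    c_sorted, t_sorted = sorted(control), sorted(treatment)
    diffs = []
    for _ in range(n_boot):
        # Each observation appears Poisson(1) times in this replicate.
        c_counts = [poisson1(rng) for _ in c_sorted]
        t_counts = [poisson1(rng) for _ in t_sorted]
        if sum(c_counts) == 0 or sum(t_counts) == 0:
            continue  # skip (vanishingly rare) empty replicates
        diffs.append(weighted_quantile(t_sorted, t_counts, q)
                     - weighted_quantile(c_sorted, c_counts, q))
    diffs.sort()
    lo = diffs[int(alpha / 2 * (len(diffs) - 1))]
    hi = diffs[int((1 - alpha / 2) * (len(diffs) - 1))]
    return lo, hi


control = [float(i) for i in range(200)]
treatment = [x + 50.0 for x in control]
print(poisson_bootstrap_quantile_diff_ci(control, treatment, n_boot=500))
```

Because the data are sorted once and each replicate only draws weights, the quantile lookup is a single cumulative-count scan; this naive version still costs O(n_boot × n), which is the cost the Spotify write-up sets out to cut.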
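The LLM.int8() post above combines two ideas, as we understand them: vector-wise absmax quantization to int8, and a mixed-precision decomposition that keeps the few outlier feature dimensions in higher precision. Here is a toy pure-Python sketch under those assumptions. The threshold of 6.0 and all function names are illustrative, and real implementations run this as fused GPU kernels with int32 accumulation.

```python
def absmax_scale(values):
    """Scale factor mapping the largest |value| to 127 (vector-wise absmax)."""
    m = max(abs(v) for v in values)
    return 127.0 / m if m > 0 else 1.0


def quantize(values, scale):
    """Round scaled values into the int8 range [-127, 127]."""
    return [max(-127, min(127, round(v * scale))) for v in values]


def int8_matmul_with_outliers(X, W, threshold=6.0):
    """Approximate X @ W: outlier feature dims in float, the rest in int8."""
    n_tok, n_feat, n_out = len(X), len(W), len(W[0])
    # A feature dimension is an "outlier" if any token has |value| >= threshold there.
    outlier_dims = [k for k in range(n_feat)
                    if any(abs(row[k]) >= threshold for row in X)]
    regular_dims = [k for k in range(n_feat) if k not in outlier_dims]

    # High-precision part: only the few outlier feature dimensions.
    out = [[sum(X[i][k] * W[k][j] for k in outlier_dims) for j in range(n_out)]
           for i in range(n_tok)]

    if regular_dims:
        # Row-wise scales for X, column-wise scales for W.
        x_rows = [[X[i][k] for k in regular_dims] for i in range(n_tok)]
        w_cols = [[W[k][j] for k in regular_dims] for j in range(n_out)]
        cx = [absmax_scale(r) for r in x_rows]
        cw = [absmax_scale(c) for c in w_cols]
        qx = [quantize(r, s) for r, s in zip(x_rows, cx)]
        qw = [quantize(c, s) for c, s in zip(w_cols, cw)]
        for i in range(n_tok):
            for j in range(n_out):
                acc = sum(a * b for a, b in zip(qx[i], qw[j]))  # integer dot product
                out[i][j] += acc / (cx[i] * cw[j])  # dequantize
    return out, outlier_dims


X = [[1.0, 0.5, 20.0], [-0.3, 2.0, -15.0]]
W = [[0.1, -0.2], [0.3, 0.4], [0.05, 0.01]]
approx, dims = int8_matmul_with_outliers(X, W)
print(dims, approx)
```

Pulling the large-magnitude dimension out before computing the scales keeps it from inflating each row's absmax, so the remaining small values keep more of their int8 resolution.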
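The fuzzy string matching mentioned in the Maarten Grootendorst interview above can be approximated with character n-gram vectors and cosine similarity. This minimal sketch uses raw trigram counts rather than TF-IDF weights and is not PolyFuzz's API; `char_ngrams`, `cosine`, and `best_matches` are hypothetical helpers.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Counter of character n-grams, with spaces padding the word boundaries."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[g] for g, count in a.items() if g in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_matches(queries, candidates, n=3):
    """Map each query to (most similar candidate, similarity score)."""
    cand_vecs = [(c, char_ngrams(c, n)) for c in candidates]
    matches = {}
    for q in queries:
        qv = char_ngrams(q, n)
        matches[q] = max(((c, cosine(qv, cv)) for c, cv in cand_vecs),
                         key=lambda pair: pair[1])
    return matches


print(best_matches(["apple inc", "microsft"], ["Apple", "Microsoft", "Amazon"]))
```

Character n-grams make the comparison tolerant of typos such as "microsft", since most trigrams still overlap with the correct spelling.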
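To get a feel for what a "graph representation of a Python program" looks like, here is a small sketch using only Python's standard `ast` module. It builds AST parent-child edges plus a crude "last lexical use" edge between consecutive occurrences of the same variable name. This is not the python_graphs API, which also builds control-flow and data-flow graphs; `program_graph` and the edge labels are hypothetical.

```python
import ast


def program_graph(source):
    """Return (nodes, edges): node id -> AST type name, and (src, dst, label) edges."""
    tree = ast.parse(source)
    ids, nodes, edges = {}, {}, []

    def node_id(node):
        # Assign a small integer id to each AST node the first time we see it.
        if id(node) not in ids:
            ids[id(node)] = len(ids)
            nodes[ids[id(node)]] = type(node).__name__
        return ids[id(node)]

    # 1) Syntax edges: parent -> child, labeled with the AST field name.
    for parent in ast.walk(tree):
        if isinstance(parent, ast.expr_context):
            continue  # skip Load/Store/Del markers
        pid = node_id(parent)
        for field, value in ast.iter_fields(parent):
            for child in (value if isinstance(value, list) else [value]):
                if isinstance(child, ast.AST) and not isinstance(child, ast.expr_context):
                    edges.append((pid, node_id(child), field))

    # 2) Crude data-flow proxy: link consecutive uses of the same variable name.
    names = sorted((n for n in ast.walk(tree) if isinstance(n, ast.Name)),
                   key=lambda n: (n.lineno, n.col_offset))
    last_seen = {}
    for name in names:
        if name.id in last_seen:
            edges.append((node_id(last_seen[name.id]), node_id(name), "last_lexical_use"))
        last_seen[name.id] = name
    return nodes, edges


nodes, edges = program_graph("x = 1\ny = x + 2\n")
for src, dst, label in edges:
    print(f"{nodes[src]}#{src} -[{label}]-> {nodes[dst]}#{dst}")
```

Typed, labeled edges like these are what graph neural networks for code consume; richer libraries add control-flow and proper data-flow edges on top of the syntax tree.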
 
 

Course*

 


Data Science Specialities: What Are My Options in Data Science?

Data science is a rewarding career field full of opportunities for advancement. Specialized roles are fundamental to helping organizations maximize their ability to harness data for strategic planning. Want to know more about your options as a data scientist? Read our blog!

TDI’s Data Programs are intensive bootcamps that turn STEM academics into leading data professionals, providing expert training, live code, and real-world data sets. Each industry-leading principle is tailored to prepare you as you venture towards new career paths, advanced education, and overall skill refinement. Applications open next week!

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 

 

Jobs

 
  • Data Scientist - Success Academy Charter Schools, Inc - NYC

    This new Data Scientist role will be a key contributor to our mission of driving innovation across the organization. Reporting to the Leader of Enterprise Analytics, this role will be responsible for working with stakeholders in various functions to understand areas of opportunity, developing analytical solutions ranging from dashboards to sophisticated mathematical models, and helping functional teams adopt those solutions. This role will be part of a highly collaborative team of professionals with a wide range of skills including data science, data engineering, business analysis, and project management....
     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 

 

Training & Resources

 
  • AI Research Intensive
    These lectures are part of the "AI Research Intensive", designed to teach fundamental skills involved in conducting cutting-edge AI research and writing a research paper...The AI Research Intensive was hosted by Rajpurkar Lab at Harvard Medical School on August 4 & 5, 2022...
  • Resources To Secure Your Next MLE / DS / SWE Job!
    This repo contains cheat sheets + data structures & algorithms templates useful for MLE, DS, and SWE interviews. All cheat sheets were created by me and helped me secure multiple offers at big tech companies...
  • Cornell's Operations Research and Information Engineering 4741: Learning with Big Messy Data
    Modern data sets...are often big, messy, and extremely useful. This course addresses scalable robust methods for learning from big messy data. We will cover techniques for learning with data that is messy (consisting of measurements that are continuous, discrete, boolean, categorical, or ordinal, or of more complex data such as graphs, texts, or sets, with missing entries and with outliers) and that is big, which means we can only use algorithms whose complexity scales linearly in the size of the data. We will cover techniques for cleaning data, supervised and unsupervised learning, finding similar items, model validation, and feature engineering...
 
 

What you’re up to – notes from DSW readers

 
  • Robert Ritz is working on Datafantic, a data blog that tells data-driven stories and shares data science tutorials. The first entry is on Matplotlib stylesheets. Site is Datafantic.com...

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

 

Last Week's Newsletter's 3 Most Clicked Links

 

* Based on unique clicks.

** Find last week's newsletter here.

 

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.