Data Science Weekly - Issue 459

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments.

Issue #459

September 08 2022

Editor's Picks

 

 
  • New (1h57m) video lecture from Andrej Karpathy
    "The spelled-out intro to language modeling: building makemore"...We build a neural net bigram language model (working up to transformers). Micrograd was fun, now things complexify: tensors, broadcasting, training, sampling...
  • Organizations need to deliberately create data
    In this post, I’m going to try and persuade you that organizations should invest in creating better data - and that investment can directly drive value and competitive advantage...
  • Data Activation In The Modern Data Stack
    The activation layer of the modern data stack is my favorite since it allows you to take action on the data, in the tools you depend on, to build personalized, data-powered experiences...You finally get to go beyond looking at dashboards and utilize data in a meaningful manner, and in the process, do more impactful work...With so many companies innovating and building products to activate data, it’s not straightforward to ascertain which of the processes, tools, and technologies should fall under data activation...After talking to many founders and giving it a lot of thought, here’s what I recommend the activation layer should comprise...
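
For readers new to the bigram idea mentioned in the Karpathy lecture item above, here is a minimal sketch. It is a plain count-based bigram model, not the neural net version the lecture builds, and the names `train_bigram` and `sample_word` and the tiny word list are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
import random


def train_bigram(words):
    """Count how often each character follows another; '.' marks word start/end."""
    counts = defaultdict(lambda: defaultdict(int))
    for w in words:
        chars = ["."] + list(w) + ["."]
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return counts


def sample_word(counts, rng, max_len=20):
    """Sample characters one at a time, weighted by the bigram counts."""
    out = []
    ch = "."
    for _ in range(max_len):
        followers = counts[ch]
        ch = rng.choices(list(followers.keys()),
                         weights=list(followers.values()), k=1)[0]
        if ch == ".":  # end-of-word marker reached
            break
        out.append(ch)
    return "".join(out)


# Hypothetical toy data set (assumption, not from the lecture).
names = ["emma", "olivia", "ava"]
counts = train_bigram(names)
print(sample_word(counts, random.Random(0)))
```

Each sampled character depends only on the one before it, which is the whole model; a neural version would replace the count table with learned parameters.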
 
 

A Message from this week's Sponsor:

 



Pinecone vector database

The Pinecone vector database makes it easy to build high-performance vector search applications. Developer-friendly, fully managed, and easily scalable without infrastructure hassles.

Use Pinecone to build semantic search, object recognition, recommendations, anomaly detection, and other vector-based functionality into your applications.

 

 

Data Science Articles & Videos

 
  • Applied NLP Research at Primer
    John Bohannon is a Senior Director of Data Science and Head of Research at Primer AI, an end-to-end machine intelligence solution for textual data. We discussed their process of translating ML research into ML products, through the lens of the following examples: a) Zero shot entity recognition, b) Inference triage, c) Tools for detecting synthetic text, d) Text Summarization, e) End-to-end platforms for NLP applications...
  • Using Web Server Logs to Answer Product and Business Questions
    This tutorial demonstrates how to set up a relatively lightweight data stack that will serve as a platform to answer questions from web server access logs about who is using your product and how they are using it. This data stack can run on any cloud, scale with your business, and potentially provide all the capabilities you require; this ain’t no toy data stack...
  • MuJoCo Menagerie
    Menagerie is a collection of high-quality models for the MuJoCo physics engine, curated by DeepMind...A physics simulator is only as good as the model it is simulating, and in a powerful simulator like MuJoCo with many modeling options, it is easy to create "bad" models which do not behave as expected. The goal of this collection is to provide the community with a curated library of well-designed models that work well right out of the gate...
  • AudioLM: a Language Modeling Approach to Audio Generation
    We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and we propose a hybrid tokenization scheme to achieve both objectives. Namely, we leverage the discretized activations of a masked language model pre-trained on audio to capture long-term structure and the discrete codes produced by a neural audio codec to achieve high-quality synthesis...
  • What songs were popular when I was in high school?
    Why do algorithmic recommendations leave so much to be desired?...you start scrolling through your algorithmically created playlists. The ones Spotify makes just for you. But they don't feel like they are for you, but instead for someone who looks like you, the stereotypical you...like any data-oriented person, I decided to do entirely too much work to learn a few things. In the end, I did come up with the insight I was looking for...
  • Why Momentum Really Works
    We often think of Momentum as a means of dampening oscillations and speeding up the iterations, leading to faster convergence. But it has other interesting behavior. It allows a larger range of step-sizes to be used, and creates its own oscillations. What is going on?...
  • A Review of Sparse Expert Models in Deep Learning
    Sparse expert models are a thirty-year old concept re-emerging as a popular architecture in deep learning. This class of architecture encompasses Mixture-of-Experts, Switch Transformers, Routing Networks, BASE layers, and others, all with the unifying idea that each example is acted on by a subset of the parameters...We review the concept of sparse expert models, provide a basic description of the common algorithms, contextualize the advances in the deep learning era, and conclude by highlighting areas for future work...
  • Stop Pickling your ML Models. Use ONNX instead!
    When you pickle a model, you are serializing a Python object so it can be stored in a file...In contrast, when you export a model to ONNX, you are converting it to a set of operations that can be executed directly by the framework...What this means is that your model is no longer strongly coupled to your specific Python environment. In fact, it’s no longer coupled with Python at all, because ONNX models are portable to many different languages...let’s now get into an example of how you can convert your models to both pickle and ONNX...
  • A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification
    This hands-on introduction aims to give the reader a working understanding of conformal prediction and related distribution-free uncertainty quantification techniques in one self-contained document. We lead the reader through practical theory for and examples of conformal prediction and describe its extensions to complex machine learning tasks involving structured outputs, distribution shift, time-series, outliers, models that abstain, and more...
  • Analyzing Employee Attrition in Healthcare Data and Predicting Outcomes
    Healthcare employers can use their proprietary data, much of which contains insightful signals on causes of attrition and burnout. This is where data analytics and predictive modeling can be useful. For example, data analytics can aid employers in identifying employees and departments at high risk of attrition. Further, this can aid employers in determining the factors that contribute to high attrition rates...
  • Finding a picture in an image without marking it up?
    We often see pictures in images: comics, for example, combine several pictures into one. And if you have an entertainment app where people post memes, like our iFunny, you’re going to run into that all the time. Neural networks are already capable of finding animals, people, or other objects, but what if we need to find another image in the image? Let’s take a closer look at our algorithm so that you can test it with a notebook in Google Colaboratory and even implement it in your project...
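
The "Why Momentum Really Works" item above says momentum allows a larger range of step sizes. A minimal sketch of that claim follows, assuming the heavy-ball form of momentum on a one-dimensional quadratic; the function name `minimize_quadratic` and all parameter values are hypothetical and chosen only for illustration. For curvature 1, plain gradient descent diverges once the step size exceeds 2, while momentum 0.5 still converges at step size 2.5.

```python
def minimize_quadratic(step, beta, curvature=1.0, x0=1.0, iters=100):
    """Heavy-ball gradient descent on f(x) = 0.5 * curvature * x**2."""
    x, z = x0, 0.0
    for _ in range(iters):
        grad = curvature * x   # derivative of the quadratic at x
        z = beta * z + grad    # velocity: decayed history plus new gradient
        x = x - step * z       # move against the accumulated velocity
    return x


# Same step size, with and without momentum (illustrative values).
plain = minimize_quadratic(step=2.5, beta=0.0)
heavy = minimize_quadratic(step=2.5, beta=0.5)
print(f"no momentum:   |x| = {abs(plain):.3e}")
print(f"momentum 0.5:  |x| = {abs(heavy):.3e}")
```

Setting `beta=0.0` reduces the loop to ordinary gradient descent, so the two calls differ only in the momentum term.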
 
 

Summit*

 


Register for IMPACT 2022: The Data Observability Summit

Join thousands of professionals for a virtual event October 25-26 to learn how to drive real-world impact with your data at scale.

Get inspired with virtual keynotes from Nate Silver, the FiveThirtyEight founder and editor-in-chief, and Daniel Kahneman, the Nobel Prize-winning psychologist, economist, and author of Thinking, Fast and Slow. Hear from the founders and chief executives of Databricks, Looker, Confluent, dbt Labs, and Fivetran about the industry's hottest technologies. Leverage best practices from leaders heading the industry’s top data organizations including The New York Times, Roche, and GitLab.

RSVP at impactdatasummit.com/2022


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 

 

Jobs

 
  • Data Scientist - Success Academy Charter Schools, Inc - NYC

    This new Data Scientist role will be a key contributor to our mission of driving innovation across the organization. Reporting to the Leader of Enterprise Analytics, this role will be responsible for working with stakeholders in various functions to understand areas of opportunity, developing analytical solutions ranging from dashboards to sophisticated mathematical models, and helping functional teams adopt those solutions. This role will be part of a highly collaborative team of professionals with a wide range of skills including data science, data engineering, business analysis, and project management....
     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 

 

Training & Resources

 
  • Deep Learning Paper Implementations
    59 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
  • Cycle-GAN implemented in PyTorch
    This repository contains an implementation of the Cycle-GAN architecture for style transfer along with instructions to train on your own dataset...
  • 6.S965 • Fall 2022 • MIT: TinyML and Efficient Deep Learning
    This course is a deep dive into efficient machine learning techniques that enable powerful deep learning applications on resource-constrained devices. Topics cover efficient inference techniques, including model compression, pruning, quantization, neural architecture search, and distillation; and efficient training techniques, including gradient compression and on-device transfer learning; followed by application-specific model optimization techniques for videos, point cloud, and NLP; and efficient quantum machine learning...
 
 

What you’re up to – notes from DSW readers

   

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

 

Last Week's Newsletter's 3 Most Clicked Links

 

* Based on unique clicks.

** Find last week's newsletter here.

 

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
