Data Science Weekly - Issue 414

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #414

October 28 2021

Editor Picks
 
  • Reimagining Philippine mythical creatures using VQGAN+CLIP
    Most of what I know from Philippine folklore came from stories that were passed down from one generation to the next. I knew that a kapre is a large creature smoking a cigar because the “auntie of my mother’s friend” said so...If we provide a machine learning model with text descriptions of folk creatures, what images can it conjure?...I used a neural network called VQGAN+CLIP...and supplied it with descriptions of Philippine folk creatures. The resulting images can then be thought of as what the model “imagined” upon reading them...and they’re a bit surreal and creepy!...
  • Dirty Data Science: Machine Learning on non-curated data
    These slides are a one-hour course on machine learning with non-curated data...According to industry surveys, the number one hassle of data scientists is cleaning the data to analyze it. Here, I survey which kinds of "dirtiness" force time-consuming cleaning. We will then cover two specific aspects of dirty data: non-normalized entries and missing values. I show how, for these two problems, machine-learning practice can be adapted to work directly on a data table without curation. The normalization problem can be tackled by adapting methods from natural language processing. The missing-values problem will lead us to revisit classic statistical results in the setting of supervised learning...
  • Just Ask for Generalization
    This blog post outlines a key engineering principle I’ve come to believe strongly in for building general AI systems with deep learning. This principle guides my present-day research tastes and day-to-day design choices in building large-scale, general-purpose ML systems...Generalizing to what you want may be easier than optimizing directly for what you want...
 
 

A Message from this week's Sponsor:

 



Quit writing SQL. Find answers faster.

Tired of building dashboards and writing SQL queries for colleagues? PostHog enables teams to get answers by themselves quickly and easily, without needing to write any code.

And it can be deployed on your own infrastructure, which is nice.

PostHog offers everything product-led teams need to grow, including funnel analysis, session recordings and feature flags — all in one platform, all without SQL.

Deploy PostHog today for free.

 

 

Data Science Articles & Videos

 
  • A Behind-the-Scenes Look at How Postman’s Data Team Works
    Postman is no stranger to scale. What started out as a side project six years ago is now one of India’s latest unicorns with a $5.6 billion valuation...Through a series of conversations with Prudhvi Vasa, Postman’s Analytics Leader, I’ve written this article to dive into a behind-the-scenes view of Postman’s data team — how it’s structured, who they hire for different roles, how they plan and prioritize their work democratically, and how they use sprints to constantly identify problems and make improvements...
  • Machine learning is just statistics + quantifier reversal
    In a recent blog post titled “Machine learning is not nonparametric statistics”, Ben Recht speaks to some of the difficulties in applying classical statistical tools to understand why machine learning works. A core piece of his argument goes as follows: Given a fixed classifier, we can assess the classifier’s population error rate using a sample of data and basic statistics. But in machine learning there’s a switcheroo—we select the sample of data first, and then we use that data to select the classifier. This means those classical statistical tools don’t work anymore. What gives?...It turns out that back in 1998, David McAllester worked out an elegant way to deal with this switcheroo that he called quantifier reversal. By applying quantifier reversal, those classical statistical tools become useful again. So what is quantifier reversal? How can it make me a million dollars? And what can it tell me about why machine learning works? That’s what I’m going to answer in this post!...
  • Apple: On-device Panoptic Segmentation for Camera Using Transformers
    The Apple Camera App (in iOS and iPadOS) relies on a wide range of scene-understanding technologies to develop images. In particular, pixel-level understanding of image content, also known as image segmentation, is behind many of the app's front-and-center features...Panoptic segmentation unifies scene-level and subject-level understanding by predicting two attributes for each pixel: a categorical label and a subject label...In this post, we walk through the technical details of how we designed a neural architecture for panoptic segmentation, based on Transformers, that is accurate enough to use in the camera pipeline but compact and efficient enough to execute on-device with negligible impact on battery life...
  • Parameter Prediction for Unseen Deep Architectures
    The algorithms optimizing neural network parameters remain largely hand-designed and computationally inefficient. We study if we can use deep learning to directly predict these parameters by exploiting the past knowledge of training other networks. We introduce a large-scale dataset of diverse computational graphs of neural architectures - DeepNets-1M - and use it to explore parameter prediction on CIFAR-10 and ImageNet. By leveraging advances in graph neural networks, we propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks...
  • How to deploy machine learning with differential privacy?
    In many applications of machine learning, such as machine learning for medical diagnosis, we would like to have machine learning algorithms that do not memorize sensitive information about the training set, such as the specific medical histories of individual patients. Differential privacy is a notion that allows quantifying the degree of privacy protection provided by an algorithm on the underlying (sensitive) data set it operates on. Through the lens of differential privacy, we can design machine learning algorithms that responsibly train models on private data...
  • Applications and Techniques for Fast Machine Learning in Science
    We discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material...covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions...
  • Modern Data Stack Conference (MDSCON) 2021: The Top 5 Takeaways You Should Know
    A few weeks ago, Fivetran hosted the Modern Data Stack Conference (MDSCON) 2021, a virtual conference to empower data-driven decisions that transform businesses, teams, and careers...For those who missed the conference, and for those who were there but couldn’t attend every session, here are five key ideas and takeaways from MDSCON 2021...
  • The Future of the Data Engineer
    Maxime Beauchemin, one of the first data engineers at Facebook and Airbnb, wrote and open sourced the wildly popular orchestrator, Apache Airflow, followed shortly thereafter by Apache Superset, a data exploration tool that’s taking the data viz landscape by storm...he also wrote the landmark 2017 blog post, The Rise of the Data Engineer...So, five years later, where do we stand?...I sat down with Maxime to discuss the current state of affairs, including the decentralization of the modern data stack, the fragmentation of the data team, the rise of the cloud, and how all these factors have changed the role of the data engineer forever...
  • A First-Principles Theory of Neural Network Generalization
    Deep learning has proven a stunning success for countless problems of interest, but this success belies the fact that, at a fundamental level, we do not understand why it works so well...Perhaps the greatest of these mysteries has been the question of generalization: why do the functions learned by neural networks generalize so well to unseen data?...in our recent paper, we derive a first-principles theory that allows one to make accurate predictions of neural network generalization (at least in certain settings)...
  • Declutter and Focus: Empirically Evaluating Design Guidelines for Effective Data Communication
    The visualization practitioner community prescribes two popular guidelines for creating clear and efficient visualizations: declutter and focus. The declutter guidelines suggest removing non-critical gridlines, excessive labeling of data values, and color variability to improve aesthetics and to maximize the emphasis on the data relative to the design itself. The focus guidelines for explanatory communication recommend including a clear headline that describes the relevant data pattern, highlighting a subset of relevant data values with a unique color, and connecting those values to written annotations that contextualize them in a broader argument. We evaluated how these recommendations impact recall of the depicted information across cluttered, decluttered, and decluttered+focused designs of six graph topics...
 
 

Tools*

 



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Entry Level Data Scientist: 2022 - IBM - Multiple Locations

    As a Data Scientist at IBM, you will help transform our clients’ data into tangible business value by analyzing information, communicating outcomes and collaborating on product development. Work with Best in Class open source and visual tools, along with the most flexible and scalable deployment options. Whether it’s investigating patient trends or weather patterns, you will work to solve real world problems for the industries transforming how we live.

        Want to post a job here? Email us for details >> team@datascienceweekly.org

 
 

Training & Resources

 
  • Reddit Discussion: A Guide to Tesla’s Configurable Floating Point Formats & Arithmetic
    Tesla just randomly dropped a PDF with details of the custom floating point formats they've created for their Dojo training hardware...I think it's pretty interesting. They want to eliminate 32 bit floating point from training almost entirely, using custom 16-bit and even 8-bit floating point formats instead, with a configurable "exponent bias" that is shared between many numbers and can apparently be learned during training. Also, they have stochastic rounding which seems like a great idea for low precision formats. Worth a glance if you care about hardware....
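
For readers unfamiliar with the stochastic rounding mentioned above, here is a minimal sketch of the general idea in Python. It is not taken from Tesla's PDF: the function name `stochastic_round` and the uniform grid spacing `step` are assumptions made for illustration, and real low-precision floating point formats have non-uniform spacing. A value is rounded up with probability equal to its fractional distance from the grid point below, so the rounding error averages out to zero over many operations.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of `step` (hypothetical uniform grid).

    Rounds up with probability equal to the fractional remainder,
    so the result equals x in expectation.
    """
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower  # in [0, 1): distance above the lower grid point
    round_up = rng.random() < frac
    return (lower + (1 if round_up else 0)) * step


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
print(f"mean of stochastically rounded 0.3: {mean:.3f}")
```

Round-to-nearest would map 0.3 to 0 every time and lose the value entirely, whereas the average of the stochastically rounded samples stays close to 0.3. That is the usual argument for using it when accumulating many small updates in a low-precision format.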
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.
