Data Science Weekly - Issue 413

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #413

October 21 2021

Editor Picks
  • The philosophical and musical failings of “Beethoven X: The AI Project”
    When VAN asked me to do a review of an artificial-intelligence-created realization of Beethoven’s Tenth Symphony called “Beethoven X: The AI Project,” which is based on the skimpy sketches he left when he died, I more or less groaned in my reply. “Not for me,” I said. “I know pretty much what I’ll think about it, and my review could get snarky.” “If so, that would be all right with us,” VAN said. “Well, OK,” I groaned back. So here I am and here goes...At the end of the symphony I found myself more philosophical than annoyed. I’ll start with that...
  • MIT's "The Missing Semester of Your CS Education" Class
    Classes teach you all about advanced topics within CS, from operating systems to machine learning, but there’s one critical subject that’s rarely covered, and is instead left to students to figure out on their own: proficiency with their tools. We’ll teach you how to master the command-line, use a powerful text editor, use fancy features of version control systems, and much more!...
  • Predicting Spreadsheet Formulas from Semi-structured Contexts
    We describe a new model that learns to automatically generate formulas based on the rich context around a target cell. When a user starts writing a formula with the “=” sign in a target cell, the system generates possible relevant formulas for that cell by learning patterns of formulas in historical spreadsheets....

A Message from this week's Sponsor:


Kickstart Your New Career with a Data Science & Analytics Bootcamp

Don’t miss your chance to join a Data Scientist-led, online Metis bootcamp plus get career support until you’re hired. Bootcamps are starting soon! Ready to take your data science or analytics career to the next level? Learn more about the Metis Online Data Science & Analytics Bootcamps.



Data Science Articles & Videos

  • Explaining in Style: Training a GAN to explain a classifier in StyleSpace
    We propose a training procedure for a StyleGAN, which incorporates the classifier model, in order to learn a classifier-specific StyleSpace. Explanatory attributes are then selected from this space. These can be used to visualize the effect of changing multiple attributes per image, thus providing image-specific explanations. We apply StylEx to multiple domains, including animals, leaves, faces and retinal images. For these, we show how an image can be modified in different ways to change its classifier output. Our results show that the method finds attributes that align well with semantic ones, generate meaningful image-specific explanations, and are human-interpretable as measured in user-studies....
  • Challenges in Detoxifying Language Models
    Large language models (LM) generate remarkably fluent text and can be efficiently adapted across NLP tasks. Measuring and guaranteeing the safety of generated text is imperative for deploying LMs in the real world; to this end, prior work often relies on automatic evaluation of LM toxicity...We demonstrate that while basic intervention strategies can effectively optimize previously established automatic metrics on the RealToxicityPrompts dataset, this comes at the cost of reduced LM coverage for both texts about, and dialects of, marginalized groups. Additionally, we find that human raters often disagree with high automatic toxicity scores for texts generated by models with strong toxicity reduction interventions...
  • Composability in Julia: Implementing Deep Equilibrium Models via Neural ODEs
    In this blog post we will show how to easily, efficiently, and robustly use steady state nonlinear solvers with neural networks in Julia. We will showcase the relationship between steady states and ODEs, thus making a connection between the methods for Deep Equilibrium Models (DEQs) and Neural ODEs...
  • ETL Pipelines with Airflow: the Good, the Bad and the Ugly
    In this article, we review how to use Airflow ETL operators to transfer data from Postgres to BigQuery with the ETL and ELT paradigms. Then, we share some challenges you may encounter when attempting to load data incrementally with Airflow DAGs. Finally, we argue why Airflow ETL operators won’t be able to cover the long tail of integrations for your business data...
  • Considerations Before Pushing Machine Learning Models to Production
    As a data scientist, I see daily the challenges that come with putting AI-based solutions in production. These challenges are numerous and cover a variety of aspects: modeling and system design, data engineering, resource management, SLA, etc...I don’t pretend mastery in any of those fields. I do however know that implementing some software engineering principles and using the right tools helped me a lot in making my work reproducible and ready for production...In this article, I’ll share with you 7 of the considerations I have in mind before productionizing my models....
  • Generative art resources in R
    An extremely incomplete (and probably biased) list of resources to help an aspiring generative artist get started making pretty pictures in R...
  • Who is a Data Scientist in 2021?
    Every year we publish a study on 1,001 data scientist profiles. The information is collected from public LinkedIn profiles, assuming that the information posted on the social media platform is an unbiased estimator of their resume...This research allows us to gain insights, with a reasonable degree of certainty, about who is a data scientist in 2021. We present only aggregate data to highlight important trends that can be useful to anyone who wants to break into the field, as well as to organizations looking to hire data scientists....



Create AI-powered search and recommendation apps with Pinecone

Pinecone is a fully managed vector database that makes it easy to add vector search to production applications. It combines state-of-the-art vector search libraries, advanced features such as filtering, and distributed infrastructure to provide high performance and reliability at any scale. Get started now — it's free!

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



  • Entry Level Data Scientist: 2022 - IBM - Multiple Locations

    As a Data Scientist at IBM, you will help transform our clients’ data into tangible business value by analyzing information, communicating outcomes and collaborating on product development. Work with Best in Class open source and visual tools, along with the most flexible and scalable deployment options. Whether it’s investigating patient trends or weather patterns, you will work to solve real world problems for the industries transforming how we live.

        Want to post a job here? Email us for details >>


Training & Resources

  • Carnegie Mellon University 10721: Philosophical Foundations of Machine Intelligence
    What is this field? What are its normative aims? What are its modes of inquiry? What are (and have been) its intellectual and ideological commitments? What foundational questions is it in dialogue with, and what foundational obstacles obstruct its progress? Finally: What are our responsibilities as researchers & practitioners deploying this technology?...
  • SHAP: Explain Any Machine Learning Model in Python
    Imagine you are trying to train a machine learning model to predict whether an ad is clicked by a particular person. After receiving some information about a person, the model predicts that the person will not click on an ad...But why does the model predict that? How much does each feature contribute to the prediction? Wouldn’t it be nice if you could see a plot indicating how much each feature contributes to the prediction?...That is when the Shapley value comes in handy...
  • Random Forests Algorithm explained with a real-life example and some Python code
    Random Forests is a Machine Learning algorithm that tackles one of the biggest problems with Decision Trees: variance...Even though the Decision Trees algorithm is simple and flexible, it is a greedy algorithm. It focuses on optimizing for the node split at hand, rather than taking into account how that split impacts the entire tree. A greedy approach makes Decision Trees run faster, but makes it prone to overfitting...An overfit tree is highly optimized for predicting the values in the training dataset, resulting in a learning model with high variance. How you calculate variance in a Decision Tree depends on the problem you’re solving...
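
    To make the "greedy" point in the Random Forests item concrete, here is a minimal sketch of how a single node split might be chosen. It is not taken from the linked article: the function names, the use of Gini impurity as the score, and the toy data are all assumptions made for illustration. The search scores each candidate threshold only by the impurity of the two groups it creates right now and never looks ahead to later splits.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_greedy_split(xs: List[float], ys: List[int]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted impurity) of the best single split on one feature.

    Hypothetical helper: it is greedy because it scores only this one split
    and ignores how any later splits further down the tree would fare.
    """
    n = len(xs)
    # Start from the impurity of the unsplit node; a split must beat it.
    best_threshold, best_score = None, gini(ys)
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score


# Toy data (made up): small feature values are class 0, large ones class 1.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_greedy_split(xs, ys)
print(threshold, impurity)
```

    A full tree would repeat this search inside each resulting group until a stopping rule is met. A random forest, roughly speaking, averages many such trees grown on resampled data, which is how it reduces the variance the blurb describes.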



  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2021, All rights reserved.
