[in case you missed it] Data Science Weekly - Issue 413

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments.

Issue #413

October 21 2021

Editor Picks
 
  • The philosophical and musical failings of “Beethoven X: The AI Project”
    When VAN asked me to do a review of an artificial-intelligence-created realization of Beethoven’s Tenth Symphony called “Beethoven X: The AI Project,” which is based on the skimpy sketches he left when he died, I more or less groaned in my reply. “Not for me,” I said. “I know pretty much what I’ll think about it, and my review could get snarky.” “If so, that would be all right with us,” VAN said. “Well, OK,” I groaned back. So here I am and here goes...At the end of the symphony I found myself more philosophical than annoyed. I’ll start with that...
  • MIT's "The Missing Semester of Your CS Education" Class
    Classes teach you all about advanced topics within CS, from operating systems to machine learning, but there’s one critical subject that’s rarely covered, and is instead left to students to figure out on their own: proficiency with their tools. We’ll teach you how to master the command-line, use a powerful text editor, use fancy features of version control systems, and much more!...
  • Predicting Spreadsheet Formulas from Semi-structured Contexts
    We describe a new model that learns to automatically generate formulas based on the rich context around a target cell. When a user starts writing a formula with the “=” sign in a target cell, the system generates possible relevant formulas for that cell by learning patterns of formulas in historical spreadsheets....
 
 

A Message from this week's Sponsor:

 



Kickstart Your New Career with a Data Science & Analytics Bootcamp

Don’t miss your chance to join a Data Scientist-led, online Metis bootcamp plus get career support until you’re hired. Bootcamps are starting soon! Ready to take your data science or analytics career to the next level? Learn more about the Metis Online Data Science & Analytics Bootcamps.

 

 

Data Science Articles & Videos

 
  • Explaining in Style: Training a GAN to explain a classifier in StyleSpace
    We propose a training procedure for a StyleGAN, which incorporates the classifier model, in order to learn a classifier-specific StyleSpace. Explanatory attributes are then selected from this space. These can be used to visualize the effect of changing multiple attributes per image, thus providing image-specific explanations. We apply StylEx to multiple domains, including animals, leaves, faces and retinal images. For these, we show how an image can be modified in different ways to change its classifier output. Our results show that the method finds attributes that align well with semantic ones, generate meaningful image-specific explanations, and are human-interpretable as measured in user-studies....
  • Challenges in Detoxifying Language Models
    Large language models (LM) generate remarkably fluent text and can be efficiently adapted across NLP tasks. Measuring and guaranteeing the safety of generated text is imperative for deploying LMs in the real world; to this end, prior work often relies on automatic evaluation of LM toxicity...We demonstrate that while basic intervention strategies can effectively optimize previously established automatic metrics on the RealToxicityPrompts dataset, this comes at the cost of reduced LM coverage for both texts about, and dialects of, marginalized groups. Additionally, we find that human raters often disagree with high automatic toxicity scores for texts generated by models with strong toxicity reduction interventions...
  • Composability in Julia: Implementing Deep Equilibrium Models via Neural ODEs
    In this blog post we will show how to easily, efficiently, and robustly use steady state nonlinear solvers with neural networks in Julia. We will showcase the relationship between steady states and ODEs, thus making a connection between the methods for Deep Equilibrium Models (DEQs) and Neural ODEs...
  • ETL Pipelines with Airflow: the Good, the Bad and the Ugly
    In this article, we review how to use Airflow ETL operators to transfer data from Postgres to BigQuery with the ETL and ELT paradigms. Then, we share some challenges you may encounter when attempting to load data incrementally with Airflow DAGs. Finally, we argue why Airflow ETL operators won’t be able to cover the long tail of integrations for your business data...
  • Considerations Before Pushing Machine Learning Models to Production
    As a data scientist, I see daily the challenges that come with putting AI-based solutions in production. These challenges are numerous and cover a variety of aspects: modeling and system design, data engineering, resource management, SLA, etc...I don’t claim mastery in any of those fields. I do, however, know that implementing some software engineering principles and using the right tools helped me a lot in making my work reproducible and ready for production...In this article, I’ll share with you 7 of the considerations I have in mind before productionizing my models....
  • Generative art resources in R
    An extremely incomplete (and probably biased) list of resources to help an aspiring generative artist get started making pretty pictures in R...
  • Who is a Data Scientist in 2021?
    Every year we publish a study on 1,001 data scientist profiles. The information is collected from public LinkedIn profiles, assuming that the information posted on the social media platform is an unbiased estimator of their resume...This research allows us to gain insights, with a reasonable degree of certainty, about who is a data scientist in 2021. We present only aggregate data to highlight important trends that can be useful to anyone who wants to break into the field, as well as to organizations looking to hire data scientists....
 
 

Tools*

 



Create AI-powered search and recommendation apps with Pinecone

Pinecone is a fully managed vector database that makes it easy to add vector search to production applications. It combines state-of-the-art vector search libraries, advanced features such as filtering, and distributed infrastructure to provide high performance and reliability at any scale. Get started now — it's free!

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Entry Level Data Scientist: 2022 - IBM - Multiple Locations

    As a Data Scientist at IBM, you will help transform our clients’ data into tangible business value by analyzing information, communicating outcomes and collaborating on product development. Work with Best in Class open source and visual tools, along with the most flexible and scalable deployment options. Whether it’s investigating patient trends or weather patterns, you will work to solve real world problems for the industries transforming how we live.

        Want to post a job here? Email us for details >> team@datascienceweekly.org

 
 

Training & Resources

 
  • Carnegie Mellon University 10721: Philosophical Foundations of Machine Intelligence
    What is this field? What are its normative aims? What are its modes of inquiry? What are (and have been) its intellectual and ideological commitments? What foundational questions is it in dialogue with, and what foundational obstacles obstruct its progress? Finally: What are our responsibilities as researchers & practitioners deploying this technology?...
  • SHAP: Explain Any Machine Learning Model in Python
    Imagine you are trying to train a machine learning model to predict whether an ad is clicked by a particular person. After receiving some information about a person, the model predicts that a person will not click on an ad...But why does the model predict that? How much does each feature contribute to the prediction? Wouldn’t it be nice if you can see a plot indicating how much each feature contributes to the prediction?...That is when Shapley value comes in handy...
  • Random Forests Algorithm explained with a real-life example and some Python code
    Random Forests is a Machine Learning algorithm that tackles one of the biggest problems with Decision Trees: variance...Even though Decision Trees is simple and flexible, it is a greedy algorithm. It focuses on optimizing for the node split at hand, rather than taking into account how that split impacts the entire tree. A greedy approach makes Decision Trees run faster, but makes it prone to overfitting...An overfit tree is highly optimized for predicting the values in the training dataset, resulting in a learning model with high variance. How you calculate variance in a Decision Tree depends on the problem you’re solving...
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.

