Data Science Weekly - Issue 455

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #455

August 11 2022

Editor's Picks

 

 
  • Data Engineers Spend Two Days Per Week Firefighting Bad Data, Data Quality Survey Says
    Everyone who talks about data quality (including us!) cites the Gartner survey that poor data quality costs organizations an average of $12.9 million every year...So we decided to partner with Wakefield Research to survey more than 300 data professionals...The resulting 2022 data quality survey found data professionals are spending a whopping 40% of their time evaluating or checking data quality and that poor data quality impacts 26% of their companies’ revenue, among other key findings...
  • What Did My AI Learn? How Data Scientists Make Sense of Model Behavior
    Data scientists require rich mental models of how AI systems behave to effectively train, debug, and work with them. Despite the prevalence of AI analysis tools, there is no general theory describing how people make sense of what their models have learned. We frame this process as a form of sensemaking and derive a framework describing how data scientists develop mental models of AI behavior. To evaluate the framework, we show how existing AI analysis tools fit into this sensemaking process...
 
 

A Message from this week's Sponsor:

 



Vector Search in Retail (Free Online Workshop)

Retailers are increasingly deploying vector search to improve their revenue, operational efficiency, customer satisfaction, and customer loyalty. Long the “secret sauce” of the most advanced retailers, vector search is now something any ML and engineering team can leverage. Jacob Zweig from Strong Analytics and Mark Moyou from NVIDIA will explain the applications of vector search in retail, the challenges you might face, and how to overcome them to deliver impactful applications like product recognition, semantic and multi-modal search, session-based recommenders, and data curation. You will also learn about tools for deploying vector search pipelines including NVIDIA Merlin and Triton + TensorRT and the Pinecone vector database. Register Now -->
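At its core, vector search ranks stored embedding vectors by their similarity to a query vector. The sketch below is a minimal brute-force version in plain Python, assuming cosine similarity as the metric and using hypothetical three-dimensional "embeddings" for a made-up catalogue; production tools such as those named above typically use learned embeddings and approximate nearest-neighbour indexes instead of scanning every item.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Exhaustive (brute-force) search: score every stored vector, return the k most similar."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings for a toy product catalogue (illustrative values only).
catalogue = {
    "red running shoe": [0.9, 0.1, 0.0],
    "blue running shoe": [0.8, 0.2, 0.1],
    "leather handbag": [0.0, 0.9, 0.4],
}

print(top_k([1.0, 0.1, 0.0], catalogue, k=2))
```

With these made-up vectors, a shoe-like query ranks both running shoes above the handbag; an approximate index trades a little of that exactness for much faster lookups over millions of items.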

 

 

Data Science Articles & Videos

 
  • Differential Neurotechnology Development
    A neurotechnology is any tool that directly, exogenously observes or manipulates the state of biological nervous systems, especially the human brain. Brain-computer interfaces and antidepressant drugs are familiar examples...
  • Building production-ready machine learning pipelines
    Hamza Tahir and Adam Probst are co-creators of ZenML, an extensible open source framework for building reproducible pipelines. We discuss the current state of ZenML, the many use cases that ZenML has been designed for, and its near-term roadmap. We also dive into MLOps (trends, challenges, and opportunities) as well as the ecosystem of tools and processes for productionizing machine learning pipelines...
  • EvoTorch
    EvoTorch is an advanced evolutionary computation library built directly on top of PyTorch, created at NNAISENSE...
  • The current and future state of AI/ML is shockingly demoralizing with little hope of redemption [Reddit Discussion]
    I recently encountered the PaLM (Scaling Language Modeling with Pathways) paper from Google Research and it opened up a can of worms of ideas I’ve felt I’ve intuitively had for a while, but have been unable to express – and I know I can’t be the only one. Sometimes I wonder what the original pioneers of AI – Turing, Neumann, McCarthy, etc. – would think if they could see the state of AI that we’ve gotten ourselves into. 67 authors, 83 pages, 540B parameters in a model, the internals of which no one can say they comprehend with a straight face, 6144 TPUs in a commercial lab that no one has access to, on a rig that no one can afford, trained on a volume of data that a human couldn’t process in a lifetime, 1 page on ethics with the same ideas that have been rehashed over and over elsewhere with no attempt at a solution – bias, racism, malicious use, etc. – for purposes that who asked for?...
  • Nobody talks about all of the waiting in Data Science [Reddit Discussion]
    All of the waiting, sometimes hours, that you do when you are running queries or training models with huge datasets...I am currently on hour two of waiting for a query that works with a table with billions of rows to finish running. I basically have nothing to do until it finishes. I guess this is just the nature of working with big data...
  • Foundation Models: A Primer for Investors and Builders
    Foundation models (FM) are a class of machine learning models that are trained on diverse data and can be adapted or fine-tuned for a wide range of downstream tasks. The term “foundation” is controversial among some researchers, but setting aside disagreements over terminology, these models already have had a significant impact. They are already used in large-scale applications in various areas including search, natural language processing, and software development...A non-technical guide and market map...
  • Speech Synthesis with Mixed Emotions
    Emotional speech synthesis aims to synthesize human voices with various emotional effects. The current studies are mostly focused on imitating an averaged style belonging to a specific emotion type. In this paper, we seek to generate speech with a mixture of emotions at run-time. We propose a novel formulation that measures the relative difference between the speech samples of different emotions...
  • Stable Diffusion
    Stable Diffusion is a latent text-to-image diffusion model. Thanks to a generous compute donation from Stability AI and support from LAION, we were able to train a Latent Diffusion Model on 512x512 images from a subset of the LAION-5B database. Similar to Google's Imagen, this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM...
  • PySearch: Python Function Search by Description
    PySearch is a completely free search engine for querying Python libraries using natural language descriptions of the properties of the functions you are looking for. The goal is to help you find the function you are looking for when you know what library it's in, but not what its name is...
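EvoTorch, listed above, is a library for evolutionary computation: optimising an objective by repeatedly mutating candidate solutions and keeping the fittest. To make the idea concrete without presuming anything about EvoTorch's own API (which builds on PyTorch), here is a library-independent sketch of one of the simplest such algorithms, a (1+λ) evolution strategy, in plain Python. The sphere objective, the fixed step size `sigma`, and all parameter values are illustrative assumptions.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def sphere(x: Vector) -> float:
    """Toy objective to minimise: sum of squares, with its optimum (0) at the origin."""
    return sum(v * v for v in x)


def one_plus_lambda_es(
    objective: Callable[[Vector], float],
    start: Vector,
    sigma: float = 0.5,
    offspring: int = 10,
    generations: int = 200,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """(1+lambda) evolution strategy with a fixed Gaussian mutation step size."""
    rng = random.Random(seed)  # seeded for reproducible runs
    parent = list(start)
    parent_f = objective(parent)
    for _ in range(generations):
        # Mutation: sample `offspring` children around the current parent.
        children = [[v + rng.gauss(0.0, sigma) for v in parent] for _ in range(offspring)]
        # Selection: the best child replaces the parent only if it is strictly better (elitism).
        best = min(children, key=objective)
        best_f = objective(best)
        if best_f < parent_f:
            parent, parent_f = best, best_f
    return parent, parent_f


solution, fitness = one_plus_lambda_es(sphere, start=[3.0, 3.0, 3.0])
print(solution, fitness)
```

Because the parent is only ever replaced by a better child, the best fitness never gets worse. Practical evolution strategies usually also adapt the mutation step size during the run (CMA-ES is a well-known example); the fixed `sigma` here just keeps the sketch short.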
 
 

Data Collaboration Tool*

 


Explore, analyze, and explain data. As a team.

Collaborate to uncover new insights and make better decisions. Visualize data to communicate clearly. Share findings with transparency and context. Get support and inspiration from the community.

Uncover new insights, answer more questions, and make better decisions.

Sign Up For Free


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 

 

Jobs

 
  • Data Scientist - Success Academy Charter Schools, Inc - NYC

    This new Data Scientist role will be a key contributor to our mission of driving innovation across the organization. Reporting to the Leader of Enterprise Analytics, this role will be responsible for working with stakeholders in various functions to understand areas of opportunity, developing analytical solutions ranging from dashboards to sophisticated mathematical models, and helping functional teams adopt those solutions. This role will be part of a highly collaborative team of professionals with a wide range of skills including data science, data engineering, business analysis, and project management...
     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 

 

Training & Resources

 
  • The Singular Value Decomposition
    In the previous parts of this series, we learned that principal components are eigenvectors. Specifically, they are the eigenvectors of the covariance matrix 𝐒 of our data 𝐗...In this part, we’ll develop a slightly different perspective: that the principal components are singular vectors. Not of the covariance matrix 𝐒, but of the data matrix 𝐗 itself. Singular vectors, which we will define below, are closely related to eigenvectors, but unlike eigenvectors they are defined for all matrices, even non-square ones...
  • How to get started with NLP? [Reddit Discussion]
    I want the topics to learn to start and progress in natural language processing. I prefer to not watch videos because I can learn faster by reading...If there is a website where I can learn NLP please let me know! or just giving me the modular topic names is perfect too...
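The SVD article's claim, that principal components are singular vectors of the data matrix, can be checked numerically. The sketch below is plain Python on a small made-up dataset; it assumes X has been mean-centred (standard practice in PCA) and uses power iteration to find the top eigenvector of the covariance matrix 𝐒, which is our choice for the illustration rather than the article's method. If v is that eigenvector with eigenvalue λ, then u = Xv/σ with σ = ‖Xv‖ satisfies Xᵀu = σv, and σ² = (n − 1)λ.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def center(X: Matrix) -> Matrix:
    """Subtract each column's mean so the data are mean-centred."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def matvec(A: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def top_eigenvector(S: Matrix, iters: int = 500) -> Vector:
    """Power iteration: converges to the unit eigenvector with the largest eigenvalue."""
    v = [1.0] * len(S)
    for _ in range(iters):
        w = matvec(S, v)
        length = norm(w)
        v = [x / length for x in w]
    return v


# Small made-up 2-D dataset (rows are observations).
X = [[2.0, 1.9], [0.5, 0.7], [1.5, 1.3], [3.0, 3.2], [1.0, 0.9]]
Xc = center(X)
n = len(Xc)

# Covariance S = Xc^T Xc / (n - 1); its top eigenvector is the first principal component.
S = [[x / (n - 1) for x in row] for row in matmul(transpose(Xc), Xc)]
v = top_eigenvector(S)
eigenvalue = sum(a * b for a, b in zip(v, matvec(S, v)))  # Rayleigh quotient v^T S v

# The same v is a right singular vector of Xc: Xc v = sigma u and Xc^T u = sigma v.
Xv = matvec(Xc, v)
sigma = norm(Xv)
u = [x / sigma for x in Xv]
XTu = matvec(transpose(Xc), u)

print("sigma^2 / (n - 1) =", sigma ** 2 / (n - 1), " eigenvalue =", eigenvalue)
print("Xc^T u =", XTu, " sigma * v =", [sigma * x for x in v])
```

Unlike the eigendecomposition of 𝐒, the SVD route never forms the covariance matrix explicitly, which is one reason numerical libraries generally compute PCA from the SVD of the centred data.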
 
 

What you’re up to – notes from DSW readers

 
  • Alex Martinelli is working on a series of tutorials/blog entries about synthetic data generation using Blender. Here is the first entry...
  • Andrew Engel is working on a PostgreSQL C extension to allow tsfresh functions to be run in the database...

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

 

Last Week's Newsletter's 3 Most Clicked Links

 

* Based on unique clicks.

** Find last week's newsletter here.

 

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.