Data Science Weekly - Issue 455

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #455

August 11 2022

Editor's Picks

 

 
  • Data Engineers Spend Two Days Per Week Firefighting Bad Data, Data Quality Survey Says
    Everyone who talks about data quality (including us!) cites the Gartner survey that poor data quality costs organizations an average of $12.9 million every year... So we decided to partner with Wakefield Research to survey more than 300 data professionals... The resulting 2022 data quality survey found that data professionals are spending a whopping 40% of their time evaluating or checking data quality and that poor data quality impacts 26% of their companies’ revenue, among other key findings...
  • What Did My AI Learn? How Data Scientists Make Sense of Model Behavior
    Data scientists require rich mental models of how AI systems behave to effectively train, debug, and work with them. Despite the prevalence of AI analysis tools, there is no general theory describing how people make sense of what their models have learned. We frame this process as a form of sensemaking and derive a framework describing how data scientists develop mental models of AI behavior. To evaluate the framework, we show how existing AI analysis tools fit into this sensemaking process...
 
 

A Message from this week's Sponsor:

 



Vector Search in Retail (Free Online Workshop)

Retailers are increasingly deploying vector search to improve their revenue, operational efficiency, customer satisfaction, and customer loyalty. Long the “secret sauce” of the most advanced retailers, vector search is now something any ML and engineering team can leverage. Jacob Zweig from Strong Analytics and Mark Moyou from NVIDIA will explain the applications of vector search in retail, the challenges you might face, and how to overcome them to deliver impactful applications like product recognition, semantic and multi-modal search, session-based recommenders, and data curation. You will also learn about tools for deploying vector search pipelines, including NVIDIA Merlin, Triton + TensorRT, and the Pinecone vector database.

 

 

Data Science Articles & Videos

 
  • Differential Neurotechnology Development
    A neurotechnology is any tool that directly, exogenously observes or manipulates the state of biological nervous systems, especially the human brain. Brain-computer interfaces and antidepressant drugs are familiar examples...
  • Building production-ready machine learning pipelines
    Hamza Tahir and Adam Probst are co-creators of ZenML, an extensible open source framework for building reproducible pipelines. We discuss the current state of ZenML, the many use cases that ZenML has been designed for, and its near-term roadmap. We also dive into MLOps (trends, challenges, and opportunities) as well as the ecosystem of tools and processes for productionizing machine learning pipelines...
  • EvoTorch
    EvoTorch is an advanced evolutionary computation library built directly on top of PyTorch, created at NNAISENSE...
  • The current and future state of AI/ML is shockingly demoralizing with little hope of redemption [Reddit Discussion]
    I recently encountered the PaLM (Scaling Language Modeling with Pathways) paper from Google Research and it opened up a can of worms of ideas I’ve felt I’ve intuitively had for a while, but have been unable to express – and I know I can’t be the only one. Sometimes I wonder what the original pioneers of AI – Turing, Neumann, McCarthy, etc. – would think if they could see the state of AI that we’ve gotten ourselves into. 67 authors, 83 pages, 540B parameters in a model, the internals of which no one can say they comprehend with a straight face, 6144 TPUs in a commercial lab that no one has access to, on a rig that no one can afford, trained on a volume of data that a human couldn’t process in a lifetime, 1 page on ethics with the same ideas that have been rehashed over and over elsewhere with no attempt at a solution – bias, racism, malicious use, etc. – for purposes that who asked for?...
  • Nobody talks about all of the waiting in Data Science [Reddit Discussion]
    All of the waiting, sometimes hours, that you do when you are running queries or training models with huge datasets...I am currently on hour two of waiting for a query that works with a table with billions of rows to finish running. I basically have nothing to do until it finishes. I guess this is just the nature of working with big data...
  • Foundation Models: A Primer for Investors and Builders
    Foundation models (FM) are a class of machine learning models that are trained on diverse data and can be adapted or fine-tuned for a wide range of downstream tasks. The term “foundation” is controversial among some researchers, but setting aside disagreements over terminology, these models already have had a significant impact. They are already used in large-scale applications in various areas including search, natural language processing, and software development...A non-technical guide and market map...
  • Speech Synthesis with Mixed Emotions
    Emotional speech synthesis aims to synthesize human voices with various emotional effects. The current studies are mostly focused on imitating an averaged style belonging to a specific emotion type. In this paper, we seek to generate speech with a mixture of emotions at run-time. We propose a novel formulation that measures the relative difference between the speech samples of different emotions...
  • Stable Diffusion
    Stable Diffusion is a latent text-to-image diffusion model. Thanks to a generous compute donation from Stability AI and support from LAION, we were able to train a Latent Diffusion Model on 512x512 images from a subset of the LAION-5B database. Similar to Google's Imagen, this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM...
  • PySearch: Python Function Search by Description
    PySearch is a completely free search engine for querying Python libraries using natural language descriptions of the properties of the functions you are looking for. The goal is to help you find the function you are looking for when you know which library it's in, but not what its name is...
 
 

Data Collaboration Tool*

 


Explore, analyze, and explain data. As a team.

Collaborate to uncover new insights and make better decisions. Visualize data to communicate clearly. Share findings with transparency and context. Get support and inspiration from the community.



*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 

 

Jobs

 
  • Data Scientist - Success Academy Charter Schools, Inc - NYC

    This new Data Scientist role will be a key contributor to our mission of driving innovation across the organization. Reporting to the Leader of Enterprise Analytics, this role will be responsible for working with stakeholders in various functions to understand areas of opportunity, developing analytical solutions ranging from dashboards to sophisticated mathematical models, and helping functional teams adopt those solutions. This role will be part of a highly collaborative team of professionals with a wide range of skills including data science, data engineering, business analysis, and project management...
     

    Want to post a job here? Email us for details: team@datascienceweekly.org

 

 

Training & Resources

 
  • The Singular Value Decomposition
    In the previous parts of this series, we learned that principal components are eigenvectors. Specifically, they are the eigenvectors of the covariance matrix 𝐒 of our data 𝐗...In this part, we’ll develop a slightly different perspective: that the principal components are singular vectors. Not of the covariance matrix 𝐒, but of the data matrix 𝐗 itself. Singular vectors, which we will define below, are closely related to eigenvectors, but unlike eigenvectors they are defined for all matrices, even non-square ones...
  • How to get started with NLP? [Reddit Discussion]
    I want the topics to learn to start and progress in natural language processing. I prefer to not watch videos because I can learn faster by reading...If there is a website where I can learn NLP please let me know! or just giving me the modular topic names is perfect too...
 
 

What you’re up to – notes from DSW readers

 
  • Alex Martinelli is working on a series of tutorials and blog entries about synthetic data generation using Blender. Here is the first entry...
  • Andrew Engel is working on a PostgreSQL C extension that allows tsfresh functions to be run in-database...

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

 


 

P.S. Enjoy the newsletter? Please forward it to your friends and colleagues. We'd love to have them on board :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
