Data Science Weekly - Issue 431

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #431

February 24 2022

Editor Picks
 
  • A Gentle Introduction to Vector Databases
    In this blog post, I’ll introduce concepts related to the vector database, a new type of technology designed to store, manage, and search embedding vectors. Vector databases are being used in an increasingly large number of applications, including but not limited to image search, recommender systems, text understanding, video summarization, drug discovery, stock market analysis, and much more...
 
 

A Message from this week's Sponsor:

 



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.

 

 

Data Science Articles & Videos

 
  • One Voice Detector to Rule Them All
    In this article we will tell you about Voice Activity Detection in general, describe our approach to VAD metrics, and show how to use our VAD and test it on your own voice...
  • Tools and Recommendations for Reproducible Teaching
    It is recommended that teacher-scholars of data science adopt reproducible workflows in their research as scholars and teach reproducible workflows to their students. In this paper, we propose a third dimension to reproducibility practices and recommend that regardless of whether they teach reproducibility in their courses or not, data science instructors adopt reproducible workflows for their own teaching. We consider computational reproducibility, documentation, and openness as three pillars of reproducible teaching framework. We share tools, examples, and recommendations for the three pillars...
  • Beyond Precision: Expressiveness in Visualization
    In recent years, I have grown increasingly dissatisfied with the way we teach and talk about data visualization – at least from what I observe in academic settings. In particular, I am concerned with the predominant paradigm that visualization can and should be designed according to how precisely a given visual encoding can represent data. The story we tell ourselves (and the same story I tell with increasing discomfort to my students) goes a little like this...
  • An introduction to the deceit of statistical significance without p-values
    A recent Twitter quiz asked “what is a powerful concept from your field that, if more people understood it, their lives would be better?” Unambiguously, the answer from my field is statistical significance...Here, I’ll explain in as plain terms as I can what statistical significance means in almost every published scientific study. I’ll do this without ever defining a p-value, as p-values have nothing to do with the way significance testing is used. Instead, significance testing amounts to hand-wavy arguments about precision and variability. Laying it out this way shows why the authority granted to significance testing is so suspect and unearned...
  • Transfer Learning on Greyscale Images: How to Fine-Tune Pretrained Models on Black-and-White Datasets
    In this article, we shall attempt to demystify all of the considerations needed when finetuning with black-and-white images by exploring the difference between RGB and greyscale images, and how these formats affect the processing operations done by convolutional neural network models, before demonstrating how to use greyscale images with pretrained models. We shall finish by examining the performance of the different approaches explored on some open source datasets and compare this to training from scratch on greyscale images...
  • Graph Theory and Linear Algebra
    Graphs are an incredibly versatile structure insofar as they can model everything from the modernity of computer science and complexity of geography, to the intricacy of linguistic relationships and the universality of chemical structures...This paper explores the relationships between graph theory, their associated matrix representations, and the matrix properties found in linear algebra...In order to achieve this goal, this paper presents some of the most interesting theorems regarding matrix representations of graphs, and ties these theorems back to questions in graph theory itself....
  • An Introduction to Neural Data Compression
    Neural compression is the application of neural networks and other machine learning methods to data compression. While machine learning deals with many concepts closely related to compression, entering the field of neural compression can be difficult due to its reliance on information theory, perceptual metrics, and other knowledge specific to the field. This introduction hopes to fill in the necessary background by reviewing basic coding topics such as entropy coding and rate-distortion theory, related machine learning ideas such as bits-back coding and perceptual metrics, and providing a guide through the representative works in the literature so far...
  • What are the Most Important Statistical Ideas of the Past 50 Years?
    We review the most important statistical ideas of the past half century, which we categorize as: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, Bayesian multilevel models, generic computation algorithms, adaptive decision analysis, robust inference, and exploratory data analysis. We discuss key contributions in these subfields, how they relate to modern computing and big data, and how they might be developed and extended in future decades. The goal of this article is to provoke thought and discussion regarding the larger themes of research in statistics and data science...
  • DeepMind - The Podcast, Episode: Me, myself and AI
    AI doesn’t just exist in the lab, it’s already solving a range of problems in the real world. In this episode, Hannah encounters a realistic recreation of her voice by WaveNet, the voice synthesising system that powers the Google Assistant and helps people with speech difficulties and illnesses regain their voices. Hannah also discovers how ‘deepfake’ technology can be used to improve weather forecasting and how DeepMind researchers are collaborating with Liverpool Football Club, aiming to take sports to the next level...
  • Dive into Deep Learning Compilers
    This project is for readers who are interested in high-performance implementation of their programs utilizing deep learning techniques...In the first part, we will introduce how to implement and optimize operators, such as matrix multiplication and convolution, for various hardware platforms...In the second part, we will show how to convert neural network models from various deep learning frameworks and further optimize them at the program level. In the last part, we will address how to deploy the optimized program into various environments such as mobile phones...In addition, at the end of the book, we plan to cover some of the latest advances in the deep learning compiler domain...
  • Things that upset you as a Data scientist [Reddit Discussion]
    I have been a data scientist for seven years. There are several challenges we face every day and, to this day, something that absolutely upsets me is not having a single good IDE for prototyping and production development. I constantly see myself switching between Jupyterlab and VScode and it's really annoying!...Anyways, I just want to hear what other big pain points you face as a data scientist in your everyday work that absolutely upset you!...
 
 

Forum*

 



Check out the new Anaconda Community for all-things data!

Want insights into the newest developments in the world of data, or need help getting “unstuck” on a problem?

Our Community Forums is the place to go! Be the first to engage with other professionals and ask questions to the broader data community. Users can join in conversations around trends, debate new features, post questions to the community, and more. Plus, it’s another avenue for technical help!

Create your free Anaconda Community account now.


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • (Senior) Analytics Engineer - Fabulous - Remote

    Fabulous is a mobile app helping thousands of people every day to change their lifestyles by integrating healthy habits into their lives. Fabulous is using a behavioral economics lens to help everyone achieve their fullest potential. We work closely with researchers based at Duke University and our advisor is Dan Ariely, author of NYT bestseller Predictably Irrational. We are looking for an experienced Analytics Engineer to consolidate the Data Science team and lead the development and enrichment of our Data Pipelines. We have a modern Data-Stack based on Fivetran, dbt, BigQuery, Amplitude, Metabase...

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 
 

Training & Resources

 
  • Linear & Polynomial Regression: Exploring Some Red Flags For Models That Underfit
    The purpose of this project is to observe some of the red flags for a model that is severely underfitting to the data and how these red flags change when fitting a more appropriate model...The red flags that I’ll be considering are: a) MSE and R-squared – these are common performance metrics used in linear models, b) Residual plot – this plot will show us if some of the assumptions of linear regression have been violated, and c) Learning curves – this plot will show us how well the model fits to the data and usually gives a good indication of over/under fitting...
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.
