Data Science Weekly - Issue 431

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #431

February 24 2022

Editor Picks
 
  • A Gentle Introduction to Vector Databases
    In this blog post, I’ll introduce concepts related to the vector database, a new type of technology designed to store, manage, and search embedding vectors. Vector databases are being used in an increasingly large number of applications, including image search, recommender systems, text understanding, video summarization, drug discovery, stock market analysis, and much more...
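To make "searching embedding vectors" concrete, here is a minimal Python sketch. It is not taken from the linked post: the `TinyVectorStore` class, the three-dimensional vectors, and the labels are hypothetical. It ranks items by brute-force cosine similarity, whereas real vector databases typically use approximate nearest-neighbour indexes to scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store: keeps vectors and scans all of them per query."""

    def __init__(self):
        self._items = {}

    def add(self, key, vector):
        self._items[key] = list(vector)

    def search(self, query, k=1):
        # Score every stored vector against the query, highest similarity first.
        scored = [(cosine_similarity(query, v), key) for key, v in self._items.items()]
        scored.sort(reverse=True)
        return [key for _, key in scored[:k]]


# Made-up embeddings for illustration only.
store = TinyVectorStore()
store.add("cat photo", [0.9, 0.1, 0.0])
store.add("dog photo", [0.8, 0.3, 0.1])
store.add("stock chart", [0.0, 0.2, 0.9])

nearest = store.search([1.0, 0.0, 0.0], k=2)
print(nearest)  # ['cat photo', 'dog photo']
```

The full scan costs one similarity computation per stored vector, which is fine for a toy example but is the part a dedicated vector database replaces with an index.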
 
 

A Message from this week's Sponsor:

 



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.

 

 

Data Science Articles & Videos

 
  • One Voice Detector to Rule Them All
    In this article we will tell you about Voice Activity Detection in general, describe our approach to VAD metrics, and show how to use our VAD and test it on your own voice...
  • Tools and Recommendations for Reproducible Teaching
    It is recommended that teacher-scholars of data science adopt reproducible workflows in their research as scholars and teach reproducible workflows to their students. In this paper, we propose a third dimension to reproducibility practices and recommend that regardless of whether they teach reproducibility in their courses or not, data science instructors adopt reproducible workflows for their own teaching. We consider computational reproducibility, documentation, and openness as the three pillars of a reproducible teaching framework. We share tools, examples, and recommendations for the three pillars...
  • Beyond Precision: Expressiveness in Visualization
    In recent years, I have grown increasingly dissatisfied with the way we teach and talk about data visualization – at least from what I observe in academic settings. In particular, I am concerned with the predominant paradigm that visualization can and should be designed according to how precisely a given visual encoding can represent data. The story we tell ourselves (and the same story I tell with increasing discomfort to my students) goes a little like this...
  • An introduction to the deceit of statistical significance without p-values
    A recent Twitter quiz asked “what is a powerful concept from your field that, if more people understood it, their lives would be better?” Unambiguously, the answer from my field is statistical significance...Here, I’ll explain in as plain terms as I can what statistical significance means in almost every published scientific study. I’ll do this without ever defining a p-value, as p-values have nothing to do with the way significance testing is used. Instead, significance testing amounts to hand-wavy arguments about precision and variability. Laying it out this way shows why the authority granted to significance testing is so suspect and unearned...
  • Transfer Learning on Greyscale Images: How to Fine-Tune Pretrained Models on Black-and-White Datasets
    In this article, we shall attempt to demystify all of the considerations needed when finetuning with black-and-white images by exploring the difference between RGB and greyscale images, and how these formats affect the processing operations done by convolutional neural network models, before demonstrating how to use greyscale images with pretrained models. We shall finish by examining the performance of the different approaches explored on some open source datasets and compare this to training from scratch on greyscale images...
  • Graph Theory and Linear Algebra
    Graphs are an incredibly versatile structure insofar as they can model everything from the modernity of computer science and complexity of geography, to the intricacy of linguistic relationships and the universality of chemical structures...This paper explores the relationships between graph theory, their associated matrix representations, and the matrix properties found in linear algebra...In order to achieve this goal, this paper presents some of the most interesting theorems regarding matrix representations of graphs, and ties these theorems back to questions in graph theory itself...
  • An Introduction to Neural Data Compression
    Neural compression is the application of neural networks and other machine learning methods to data compression. While machine learning deals with many concepts closely related to compression, entering the field of neural compression can be difficult due to its reliance on information theory, perceptual metrics, and other knowledge specific to the field. This introduction hopes to fill in the necessary background by reviewing basic coding topics such as entropy coding and rate-distortion theory, related machine learning ideas such as bits-back coding and perceptual metrics, and providing a guide through the representative works in the literature so far...
  • What are the Most Important Statistical Ideas of the Past 50 Years?
    We review the most important statistical ideas of the past half century, which we categorize as: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, Bayesian multilevel models, generic computation algorithms, adaptive decision analysis, robust inference, and exploratory data analysis. We discuss key contributions in these subfields, how they relate to modern computing and big data, and how they might be developed and extended in future decades. The goal of this article is to provoke thought and discussion regarding the larger themes of research in statistics and data science...
  • DeepMind - The Podcast, Episode: Me, myself and AI
    AI doesn’t just exist in the lab, it’s already solving a range of problems in the real world. In this episode, Hannah encounters a realistic recreation of her voice by WaveNet, the voice synthesising system that powers the Google Assistant and helps people with speech difficulties and illnesses regain their voices. Hannah also discovers how ‘deepfake’ technology can be used to improve weather forecasting and how DeepMind researchers are collaborating with Liverpool Football Club, aiming to take sports to the next level...
  • Dive into Deep Learning Compilers
    This project is for readers who are interested in high-performance implementation of their programs utilizing deep learning techniques...In the first part, we will introduce how to implement and optimize operators, such as matrix multiplication and convolution, for various hardware platforms...In the second part, we will show how to convert neural network models from various deep learning frameworks and further optimize them at the program level. In the last part, we will address how to deploy the optimized program into various environments such as mobile phones...In addition, at the end of the book, we plan to cover some of the latest advances in the deep learning compiler domain...
  • Things that upset you as a Data scientist [Reddit Discussion]
    I have been a data scientist for seven years. There are several challenges we face every day, and to this day, something that absolutely upsets me is not having a single good IDE for prototyping and production development. I constantly find myself switching between JupyterLab and VS Code and it's really annoying!...Anyway, I just want to hear what other big pain points you face as a data scientist in your everyday work that absolutely upset you!...
 
 

Forum*

 



Check out the new Anaconda Community for all-things data!

Want insights into the newest developments in the world of data, or need help getting “unstuck” on a problem?

Our Community Forums is the place to go! Be the first to engage with other professionals and ask questions to the broader data community. Users can join in conversations around trends, debate new features, post questions to the community, and more. Plus, it’s another avenue for technical help!

Create your free Anaconda Community account now.


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • (Senior) Analytics Engineer - Fabulous - Remote

    Fabulous is a mobile app helping thousands of people every day to change their lifestyles by integrating healthy habits into their lives. Fabulous is using a behavioral economics lens to help everyone achieve their fullest potential. We work closely with researchers based at Duke University and our advisor is Dan Ariely, author of NYT bestseller Predictably Irrational. We are looking for an experienced Analytics Engineer to consolidate the Data Science team and lead the development and enrichment of our Data Pipelines. We have a modern Data-Stack based on Fivetran, dbt, BigQuery, Amplitude, Metabase...

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 
 

Training & Resources

 
  • Linear & Polynomial Regression: Exploring Some Red Flags For Models That Underfit
    The purpose of this project is to observe some of the red flags for a model that is severely underfitting to the data and how these red flags change when fitting a more appropriate model...The red flags that I’ll be considering are: a) MSE and R-squared – these are common performance metrics used in linear models, b) Residual plot – this plot will show us if some of the assumptions of linear regression have been violated, and c) Learning curves – this plot will show us how well the model fits to the data and usually gives a good indication of over/under fitting...
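As a rough illustration of the first two red flags, the sketch below fits a straight line and a quadratic to noise-free quadratic data and compares MSE, R-squared, and a few residuals. It is not the linked project's code: the data, the `fit_polynomial` helper, and the other function names are hypothetical, and learning curves are left out to keep it short.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first) via normal equations."""
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def r_squared(ys, preds):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data: a noise-free quadratic sampled symmetrically around zero.
xs = [k / 2 for k in range(-6, 7)]
ys = [x ** 2 for x in xs]

results = {}
residuals = {}
for degree in (1, 2):
    coeffs = fit_polynomial(xs, ys, degree)
    preds = [predict(coeffs, x) for x in xs]
    results[degree] = (mse(ys, preds), r_squared(ys, preds))
    residuals[degree] = [y - p for y, p in zip(ys, preds)]
    print(f"degree {degree}: MSE={results[degree][0]:.3f}, R^2={results[degree][1]:.3f}")

# Residuals of the straight-line fit at the left edge, centre, and right edge.
print([round(residuals[1][i], 2) for i in (0, 6, 12)])
```

On this data the straight line has a high MSE and an R-squared near zero, and its residuals are positive at both edges and negative in the centre. That U-shaped pattern is the kind of structure a residual plot exposes when a model underfits. The quadratic fit drives both the error and the pattern to essentially zero.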
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.
