Data Science Weekly - Issue 422

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #422

December 23 2021

Editor Picks
 
  • Spreadsheet Games: All Playable in Excel or Google Sheets
    Everyone knows game designers love working with spreadsheets, but there aren't enough games that run *in* spreadsheets...But my students are helping set things right. Check out some of their amazing games, all playable in Excel or Google Sheets...
  • Ways I Use Testing as a Data Scientist
    As a data scientist, I wear many different hats, which also made learning about testing difficult. There’s plenty of material on testing from a software development perspective, but if I’m doing an analysis and not developing software, I found many of those concepts difficult to translate and apply in my work...In that spirit, I thought I would write a blog post on the many ways I use testing in my work, in hopes that other data scientists will find it helpful when they’re trying to figure out what to test and how to test in the code they write...
  • To Understand Language is to Understand Generalization
    Like the parable of the blind men and the elephant, computer scientists have come up with different abstract frameworks to describe what it would take to make our machines smarter...I’d like to throw in another take on the elephant: the aforementioned properties of generalization we seek can be understood as nothing more than the structure of human language. Before you think “ew, linguistics” and close this webpage, I promise that I’m not advocating for hard-coding formal grammars as inductive biases into our neural networks (see paragraph 1). To the contrary, I argue that considering generalization as being equivalent to language opens up exciting opportunities to scale up non-NLP models the way we have done for language...
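As a rough illustration of the kind of check the testing post has in mind, here is a minimal sketch of unit-testing a small analysis helper. The function `normalize_column` is hypothetical, not taken from the post; the tests use plain `assert`, so they run under pytest or on their own.

```python
def normalize_column(values):
    """Hypothetical helper: rescale a list of numbers to the range [0, 1].

    Missing values (None) are passed through unchanged.
    """
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    lo, hi = min(present), max(present)
    span = hi - lo
    return [
        None if v is None else (0.0 if span == 0 else (v - lo) / span)
        for v in values
    ]


def test_range_is_unit_interval():
    # Every non-missing output should lie in [0, 1].
    out = normalize_column([3, 7, 5, None, 11])
    assert all(0.0 <= v <= 1.0 for v in out if v is not None)


def test_missing_values_preserved():
    # Missing entries should stay missing, in the same positions.
    out = normalize_column([None, 2, None, 4])
    assert [v is None for v in out] == [True, False, True, False]


def test_constant_column_does_not_divide_by_zero():
    # A constant column is an edge case that would otherwise raise ZeroDivisionError.
    assert normalize_column([5, 5, 5]) == [0.0, 0.0, 0.0]


if __name__ == "__main__":
    test_range_is_unit_interval()
    test_missing_values_preserved()
    test_constant_column_does_not_divide_by_zero()
    print("all tests passed")
```

Edge cases such as missing values and constant columns are often where analysis code silently goes wrong, which is why they get their own tests here.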
 
 

A Message from this week's Sponsor:

 



High quality data labeling, consistently

Edge cases are the most common challenges that ML teams face when training their AI models, making it difficult to reach 95+% accuracy. This becomes even more complex once you need to scale and start working with third-party data labeling solutions. The evaluation metrics we use to measure the quality of labeled data, Intersection over Union (IoU) and F1 score, have allowed us to make swift adjustments on the go and continuously improve our labeling standards. To find out more and start exploring our end-to-end data labeling service, speak to the team at Supahands today.
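For readers unfamiliar with the two metrics mentioned above, a minimal sketch of each: IoU for two axis-aligned bounding boxes, assumed here to be given as (x_min, y_min, x_max, y_max), and F1 computed from true-positive, false-positive and false-negative counts. The box format and function names are assumptions for illustration, not Supahands' tooling.

```python
def box_iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    # Overlap rectangle; width and height clamp to 0 when the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
# 8 correct labels, 2 spurious, 2 missed: F1 = 16 / 20 = 0.8.
print(f1_score(8, 2, 2))
```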

 

 

Data Science Articles & Videos

 
  • Weisfeiler and Leman go Machine Learning: The Story so far
    In recent years, algorithms and neural architectures based on the Weisfeiler-Leman algorithm, a well-known heuristic for the graph isomorphism problem, emerged as a powerful tool for machine learning with graphs and relational data. Here, we give a comprehensive overview of the algorithm's use in a machine learning setting, focusing on the supervised regime...Moreover, we give an overview of current applications and future directions to stimulate further research...
  • The Second Egress: Building a Code Change
    This website is a tool to make sense of the wicked problem of the second egress in Canada and prepare a building code change...The first section documents the history of the building code and two means of egress in Canada, situates the problem of the second egress within the imperative of missing middle densification, and calls upon architects to challenge the legislative conditions of their work. The next section compares jurisdictions to better understand the Canadian code relative to its peers, followed by the proposed code change. The third section reimagines what could and should be built if it were legal, illustrating these architectural opportunities with a series of case studies in alternative circulation...
  • MCMC for big datasets -- faster sampling with JAX and the GPU
    You’ll often hear people say that MCMC is too slow for big datasets. For the very biggest datasets with millions of observations, there may be some truth to that. But the developers of PyMC and Stan are constantly refining their samplers, and it’s now possible to fit models to much bigger datasets than you might think...But how much faster is MCMC with JAX, and with a GPU? This blog post explores this question on a single example. It’s limited, of course – maybe other models will see more or less of a gain – and, although I did my best to write code efficiently, things could probably be optimised further. Still, I hope you’ll agree that there are some interesting results...
  • The Mathematics of Linear Distortion
    The mathematics of linear distortion applies only to linear time-invariant (LTI) systems. Therefore, these systems and their translation to the frequency domain, where the mathematical analysis is simplified, are briefly summarized. The article then discusses how the theory can be applied to real transmission media and/or electronic components. Finally, the mathematics of all possible cases of linear distortion is summarized in a table, and each case is explained individually...
  • Introducing Skippa - Scikit-learn Pre-processing Pipelines in Pandas
    Skippa is a package designed to: a) ✨ drastically simplify development, b) 📦 package / serialize all data cleaning, pre-processing together with your model algorithm into a single pipeline file, c) 😌 reuse the interface/components from pandas & scikit-learn that you’re already familiar with, and more...Skippa helps you to easily define data cleaning & pre-processing operations on a pandas DataFrame and combine it with a scikit-learn model/algorithm into a single executable pipeline. It works roughly like this...
  • Programming as a Vehicle for Math
    In March 2020, I gave a talk at Math for America, an organization that fosters professional development for K-12 math teachers in the New York City area. It was part of the “book tour” for my book A Programmer's Introduction to Mathematics...The MfA organizers never posted my talk online, and at this point I’ve lost hope that they will (thanks, Covid). So I’ll recap the content of the talk, linking to my slides (click there for nice images and gifs) and the transcript I prepared in advance of that talk. This post will summarize the main ideas and provide some extra color...
  • Algorithmic Trading Models - Machine Learning
    I’ve written 4 articles on theoretical concepts behind algorithmic trading models. The previous articles have covered breakouts, moving averages, oscillators and cyclical methods. The 5th model type, machine learning methods, is considerably more involved due to the scope of the topic and so this article is definitely not designed to be a white paper on the only way ML can be used in algorithmic trading. My goal in this article is to provide one framework that incorporates some form of computer learning to predict future prices of the GBP/USD rate. You can consider this part 1 of Algorithmic Trading Models - Machine Learning, because there’s a huge scope that can be covered in this topic that I wouldn’t be able to in one article and I will be writing more with alternate ideas in the future...
  • On Bayesian Geometry: Geometric interpretation of probability distributions
    The idea behind Bayes Geometry is simple: what if we represent any function on the parameter space as a vector in a certain vector space? Examples of such functions are prior and posterior distributions and likelihood functions. We can then define an inner product on that space, which lets us calculate the angle between two distributions and interpret that angle as a measure of how different they are. In my discussion of this subject I will follow a paper by de Carvalho et al...
  • How Should Organizations Structure their Data?
    Since the rise of computing in the 90’s there have been heated debates over the best data structuring techniques. However, two have reigned supreme: the ideas of Bill Inmon and Ralph Kimball. Both define ETL pipelines that bring data from a variety of sources into the same location for access by stakeholders within the organization...However, in the early 2000’s, Dan Linstedt invented another data pipeline structure called a data vault...In this post we will review a comparison from a 2021 paper that outlines each method and explains the pros and cons of each. Please note that each topic is complex, so we only cover the very basics; more resources are linked throughout the post and in the comments...
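The Weisfeiler-Leman survey above builds on 1-WL color refinement, the graph isomorphism heuristic it names. Below is a minimal sketch in plain Python, assuming unlabeled undirected graphs given as adjacency dicts and a fixed number of refinement rounds; the function name `wl_histograms` is our own, not from the paper. Each round, a node's new color is determined by its old color plus the multiset of its neighbors' colors. If two graphs end up with different color histograms they cannot be isomorphic, while equal histograms are inconclusive.

```python
from collections import Counter


def wl_histograms(graphs, iterations=3):
    """1-WL color refinement run jointly on several graphs.

    graphs: list of adjacency dicts {node: [neighbors]}.
    Returns one Counter of final colors per graph.
    """
    # Start with every node in the same color.
    colorings = [{v: 0 for v in g} for g in graphs]
    for _ in range(iterations):
        palette = {}  # shared across graphs so color ids stay comparable
        new_colorings = []
        for g, colors in zip(graphs, colorings):
            new = {}
            for v in g:
                # Signature: own color plus sorted multiset of neighbor colors.
                sig = (colors[v], tuple(sorted(colors[u] for u in g[v])))
                new[v] = palette.setdefault(sig, len(palette))
            new_colorings.append(new)
        colorings = new_colorings
    return [Counter(c.values()) for c in colorings]


hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

h_hex, h_tri = wl_histograms([hexagon, two_triangles])
print(h_hex == h_tri)  # True: 1-WL cannot separate these two graphs
h_path, h_triangle = wl_histograms([path, triangle])
print(h_path == h_triangle)  # False: node degrees already differ after one round
```

A 6-cycle and two disjoint triangles are the classic pair that 1-WL cannot tell apart: every node has degree 2, so the histograms match even though one graph is connected and the other is not.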
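The frequency-domain facts the linear distortion article relies on can be checked numerically. For an LTI system, convolving the input with the impulse response in time equals multiplying by the transfer function in frequency, and the system adds no linear distortion when it only scales and delays the signal (constant magnitude, phase linear in frequency). The NumPy sketch below uses discrete, periodic signals as a simplification of the continuous-time treatment; the function names and example impulse responses are assumptions for illustration.

```python
import numpy as np


def lti_output_freq(x, h):
    """Output of a discrete LTI system computed in the frequency domain.

    Multiplying the DFTs, Y[k] = H[k] * X[k], equals circular convolution
    of x with the impulse response h in the time domain.
    """
    n = len(x)
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, n)))


def lti_output_time(x, h):
    """The same circular convolution computed directly in the time domain."""
    n = len(x)
    h = np.concatenate([h, np.zeros(n - len(h))])
    return np.array([sum(h[m] * x[(k - m) % n] for m in range(n)) for k in range(n)])


rng = np.random.default_rng(0)
x = rng.standard_normal(64)

# An arbitrary (distorting) impulse response: both domains must agree.
h_smear = np.array([0.5, 0.3, 0.2])
assert np.allclose(lti_output_freq(x, h_smear), lti_output_time(x, h_smear))

# A distortionless channel: constant gain K and a pure delay of d samples.
# Its transfer function is H[k] = K * exp(-2j * pi * k * d / n):
# constant magnitude K and phase linear in frequency.
K, d = 0.8, 5
h_ideal = np.zeros(d + 1)
h_ideal[d] = K
y = lti_output_freq(x, h_ideal)
assert np.allclose(y, K * np.roll(x, d))  # same shape, only scaled and shifted
```

The smeared impulse response, by contrast, has a frequency-dependent magnitude, so its output is not just a scaled, shifted copy of the input.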
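To make the Bayesian geometry idea concrete, here is a minimal NumPy sketch that treats densities on a one-dimensional parameter space as vectors, assuming the standard L2 inner product ⟨f, g⟩ = ∫ f(θ) g(θ) dθ approximated on a uniform grid. The cosine of the angle between two densities then measures how similar they are (1 means proportional). The grid, the two normal densities and the function names are assumptions for illustration; see the linked post for the exact construction it follows.

```python
import numpy as np


def normal_pdf(theta, mu, sigma):
    return np.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def inner(f, g, dtheta):
    """L2 inner product <f, g> = integral of f * g, approximated on a uniform grid."""
    return np.sum(f * g) * dtheta


def angle_cosine(f, g, dtheta):
    """Cosine of the angle between two functions viewed as vectors in L2."""
    return inner(f, g, dtheta) / np.sqrt(inner(f, f, dtheta) * inner(g, g, dtheta))


theta = np.linspace(-10, 10, 4001)
dtheta = theta[1] - theta[0]
prior = normal_pdf(theta, 0.0, 1.0)
posterior = normal_pdf(theta, 1.0, 1.0)

cos_pp = angle_cosine(prior, posterior, dtheta)
# For two normals with equal sigma the closed form is exp(-(mu1 - mu2)**2 / (4 * sigma**2)).
print(cos_pp, np.exp(-0.25))
print(np.degrees(np.arccos(cos_pp)))  # the angle itself, in degrees
```

The closed form used as a check follows from the Gaussian product integral: ⟨f, g⟩ = exp(-(μ₁ - μ₂)²/(4σ²)) / (2σ√π), and each norm squared is 1/(2σ√π).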
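To make the data-structuring comparison concrete, here is a minimal sketch of the dimensional (star-schema) model associated with Kimball: a central fact table of measurements keyed to small dimension tables that describe them. It uses Python's built-in `sqlite3`; all table names, columns and rows are invented for illustration, and the Inmon and data vault designs the post compares are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);
INSERT INTO dim_date VALUES (1, '2021-11'), (2, '2021-12');
INSERT INTO dim_product VALUES (10, 'books'), (20, 'games');
INSERT INTO fact_sales VALUES (1, 10, 12.0), (2, 10, 8.0), (2, 20, 30.0), (2, 20, 5.0);
""")

# Analysts slice the central fact table by joining out to the dimensions.
rows = cur.execute("""
SELECT d.month, p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.month, p.category
ORDER BY d.month, p.category
""").fetchall()
print(rows)
conn.close()
```

Keeping descriptive attributes in the dimensions and only keys plus measures in the fact table is what makes queries like this a simple join-and-aggregate.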
 
 

Tools*

 


Free Course: Natural Language Processing (NLP) for Semantic Search

Learn how to build semantic search applications by making machines understand language as people do. This free course covers everything you need to build state-of-the-art language models, from machine translation to question-answering, and more. Brought to you by Pinecone. Start reading now.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Data Scientist, Decisions - Lyft - New York, NY

    Data Science is at the heart of Lyft’s products and decision-making. As a member of the Science team, you will work in a dynamic environment, where we embrace moving quickly to build the world’s best transportation. Data Scientists take on a variety of problems ranging from shaping critical business decisions to building algorithms that power our internal and external products. We’re looking for passionate, driven Data Scientists to take on some of the most interesting and impactful problems in ridesharing...

        Want to post a job here? Email us for details >> team@datascienceweekly.org

 
 

Training & Resources

 
  • Relationship between SVD and PCA. How to use SVD to perform PCA?
    Principal component analysis (PCA) is usually explained via an eigen-decomposition of the covariance matrix. However, it can also be performed via singular value decomposition (SVD) of the data matrix 𝐗. How does it work? What is the connection between these two approaches? What is the relationship between SVD and PCA?...Or in other words, how to use SVD of the data matrix to perform dimensionality reduction?...
  • Implementing Naive Bayes From Scratch
    In the following sections, we will implement the Naive Bayes Classifier from scratch in a step-by-step fashion using just Python and NumPy...But, before we get started coding, let’s talk briefly about the theoretical background and assumptions underlying the Naive Bayes Classifier...
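The answer to the SVD/PCA question can be derived in one line and checked numerically. For a column-centered data matrix Xc = U S Vᵀ with n rows, the covariance matrix is Xcᵀ Xc / (n - 1) = V S² Vᵀ / (n - 1). So the right singular vectors are the principal axes, the covariance eigenvalues are the squared singular values divided by (n - 1), and projecting onto the first k axes gives the scores U_k S_k. A minimal NumPy sketch on synthetic data (the data and k = 2 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
# Correlated synthetic data: 200 samples, 3 features.
X = rng.standard_normal((200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                               [1.0, 1.0, 0.0],
                                               [0.0, 0.5, 0.2]])
n = X.shape[0]

# PCA needs centered data: subtract each column's mean.
Xc = X - X.mean(axis=0)

# Route 1: eigen-decomposition of the covariance matrix C = Xc^T Xc / (n - 1).
C = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Route 2: SVD of the centered data matrix, Xc = U S V^T.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Covariance eigenvalues are squared singular values divided by (n - 1).
assert np.allclose(S**2 / (n - 1), eigvals)
# Principal axes agree up to the sign of each vector.
assert np.allclose(np.abs(Vt), np.abs(eigvecs.T))

# Dimensionality reduction: keep k components; scores = Xc V_k = U_k S_k.
k = 2
scores = U[:, :k] * S[:k]
assert np.allclose(scores, Xc @ Vt[:k].T)
```

In practice the SVD route is usually preferred because it never forms Xcᵀ Xc explicitly, which avoids squaring the condition number of the data.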
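The tutorial's own code is not reproduced here. As a rough sketch of what a from-scratch NumPy classifier can look like, below is the Gaussian variant, which assumes each feature is normally distributed within each class and independent of the others given the class (the tutorial may use a different likelihood). It scores each class by its log prior plus the sum of per-feature log likelihoods and predicts the highest-scoring class. The class name and the `var_smoothing` parameter are our own choices.

```python
import numpy as np


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: features assumed conditionally independent
    and normally distributed within each class."""

    def fit(self, X, y, var_smoothing=1e-9):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # A small constant keeps every variance strictly positive.
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + var_smoothing
        self.log_priors_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def _joint_log_likelihood(self, X):
        # log p(c) + sum_j log N(x_j; mu_cj, var_cj), for all classes at once.
        X = X[:, None, :]  # shape (n_samples, 1, n_features)
        log_pdf = -0.5 * (np.log(2 * np.pi * self.vars_) + (X - self.means_) ** 2 / self.vars_)
        return self.log_priors_ + log_pdf.sum(axis=2)  # shape (n_samples, n_classes)

    def predict(self, X):
        return self.classes_[np.argmax(self._joint_log_likelihood(X), axis=1)]


# Usage: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), rng.normal(4.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = GaussianNaiveBayes().fit(X, y)
print(model.predict(np.array([[0.0, 0.0], [4.0, 4.0]])))  # expected [0 1]
```

Working in log space turns the product of per-feature likelihoods into a sum, which avoids numerical underflow when there are many features.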
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.