Data Science Weekly - Issue 422

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #422

December 23 2021

Editor Picks
 
  • Spreadsheet Games: All Playable in Excel or Google Sheets
    Everyone knows game designers love working with spreadsheets, but there aren't enough games that run *in* spreadsheets...But my students are helping set things right. Check out some of their amazing games, all playable in Excel or Google Sheets...
  • Ways I Use Testing as a Data Scientist
    As a data scientist, I wear many different hats, which also made learning about testing difficult. There’s plenty of material on testing from a software development perspective, but if I’m doing an analysis and not developing software, I found many of those concepts difficult to translate and apply in my work...In that spirit, I thought I would write a blog post on the many ways I use testing in my work, in hopes that other data scientists will find it helpful when they’re trying to figure out what to test and how to test in the code they write...
  • To Understand Language is to Understand Generalization
    Like the parable of the blind men and the elephant, computer scientists have come up with different abstract frameworks to describe what it would take to make our machines smarter...I’d like to throw in another take on the elephant: the aforementioned properties of generalization we seek can be understood as nothing more than the structure of human language. Before you think “ew, linguistics” and close this webpage, I promise that I’m not advocating for hard-coding formal grammars as inductive biases into our neural networks (see paragraph 1). To the contrary, I argue that considering generalization as being equivalent to language opens up exciting opportunities to scale up non-NLP models the way we have done for language...
 
 

A Message from this week's Sponsor:

 



High quality data labeling, consistently

Edge cases are the most common challenges that ML teams face when training their AI models, making it difficult to reach 95+% accuracy. This becomes more complex once you need to scale and start working with 3rd party data labeling solutions. The evaluation metrics that we use to measure the quality of labeled data - Intersection over Union (IoU) and F1 score - have allowed us to make swift adjustments on the go and continuously improve the quality of our labeling standards. To find out more and start exploring our end-to-end data labeling service, speak to the team at Supahands today.

 

 

Data Science Articles & Videos

 
  • Weisfeiler and Leman go Machine Learning: The Story so far
    In recent years, algorithms and neural architectures based on the Weisfeiler-Leman algorithm, a well-known heuristic for the graph isomorphism problem, emerged as a powerful tool for machine learning with graphs and relational data. Here, we give a comprehensive overview of the algorithm's use in a machine learning setting, focusing on the supervised regime...Moreover, we give an overview of current applications and future directions to stimulate further research...
  • The Second Egress: Building a Code Change
    This website is a tool to make sense of the wicked problem of the second egress in Canada and prepare a building code change...The first section documents the history of the building code and two means of egress in Canada, situates the problem of the second egress within the imperative of missing middle densification and calls upon architects to challenge the legislative conditions of their work. The next section compares jurisdictions to better understand the Canadian code relative to its peers, followed by the proposed code change. The third section reimagines what could and should be built if it were legal, and illustrates these architectural opportunities with a series of case studies in alternative circulation...
  • MCMC for big datasets -- faster sampling with JAX and the GPU
    You’ll often hear people say that MCMC is too slow for big datasets. For the very biggest datasets with millions of observations, there may be some truth to that. But the developers of PyMC and Stan are constantly refining their samplers, and it’s now possible to fit models to much bigger datasets than you might think...But how much faster is MCMC with JAX, and with a GPU? This blog post explores this question on a single example. It’s limited, of course – maybe other models will see more or less of a gain – and, although I did my best to write code efficiently, things could probably be optimised further. Still, I hope you’ll agree that there are some interesting results...
  • The Mathematics of Linear Distortion
    The mathematics of linear distortion only applies to linear and time invariant systems. Therefore, these systems and their translation to the frequency domain, where the mathematical analysis is simplified, are briefly summarized. Then it is discussed how the presented theory can be applied to real transmission media and/or electronic components. Finally, the mathematics of all possible cases of linear distortion are summarized in a table, and each case is explained individually...
  • Introducing Skippa - Scikit-learn Pre-processing Pipelines in Pandas
    Skippa is a package designed to: a) ✨ drastically simplify development, b) 📦 package / serialize all data cleaning, pre-processing together with your model algorithm into a single pipeline file, c) 😌 reuse the interface/components from pandas & scikit-learn that you’re already familiar with, and more...Skippa helps you to easily define data cleaning & pre-processing operations on a pandas DataFrame and combine it with a scikit-learn model/algorithm into a single executable pipeline. It works roughly like this...
  • Programming as a Vehicle for Math
    In March 2020, I gave a talk at Math for America, an organization that fosters professional development for K-12 math teachers in the New York City area. It was part of my A Programmer's Introduction to Mathematics “book tour,”...The MfA organizers never posted my talk online, and at this point I’ve lost hope that they will (thanks, Covid). So I’ll recap the content of the talk, linking to my slides (click there for nice images and gifs) and the transcript I prepared in advance of that talk. This post will summarize the main ideas and provide some extra color...
  • Algorithmic Trading Models - Machine Learning
    I’ve written 4 articles on theoretical concepts behind algorithmic trading models. The previous articles have covered breakouts, moving averages, oscillators and cyclical methods. The 5th model type, machine learning methods, is considerably more involved due to the scope of the topic, so this article is definitely not designed to be a white paper on the only way ML can be used in algorithmic trading. My goal in this article is to provide one framework that incorporates some form of computer learning to predict future prices of the GBP/USD rate. You can consider this part 1 of Algorithmic Trading Models - Machine Learning, because the topic has a huge scope that I wouldn’t be able to cover in one article. I will be writing more with alternate ideas in the future...
  • On Bayesian Geometry: Geometric interpretation of probability distributions
    The idea behind Bayes Geometry is simple: what if we represent any function in the parameter space as a vector in a certain vector space. Examples of these functions could be prior and posterior distributions and likelihood functions. Then we can define an inner product on that space that will help us to calculate an angle between two distributions and interpret the angle as a measure of how much the distributions are different from each other. In my discussion on this subject I will follow a paper by de Carvalho et al...
  • How Should Organizations Structure their Data?
    Since the rise of computing in the 1990s, there have been heated debates over the best data structuring techniques. However, two have reigned supreme: the ideas of Bill Inmon and Ralph Kimball. Both define ETL pipelines that bring data from a variety of sources into the same location for access by stakeholders within the organization...However, in the early 2000s, Dan Linstedt invented another data pipeline structure called a data vault...In this post we will review a comparison from a 2021 paper that outlines each method and explains the pros and cons of each. Please note that each topic is complex, so we only cover the very basics; more resources are linked throughout the post and in the comments...
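
The Bayesian Geometry item above describes treating distributions as vectors and measuring the angle between them. As a rough illustration only, the sketch below assumes the ordinary L2 inner product (the integral of the product of two functions over the parameter space), approximated on a grid; the linked post and the de Carvalho et al. paper may define things differently. The function name `angle_between` and the Beta-shaped example curves are hypothetical choices, not taken from the post.

```python
import numpy as np

def angle_between(g, h, dx):
    """Angle in degrees between two functions sampled on a uniform grid.

    Assumes the L2 inner product <g, h> = integral of g * h, approximated
    here by a Riemann sum with grid spacing dx.
    """
    inner = np.sum(g * h) * dx
    norm_g = np.sqrt(np.sum(g * g) * dx)
    norm_h = np.sqrt(np.sum(h * h) * dx)
    # Clip to guard against rounding slightly outside [-1, 1].
    cosine = np.clip(inner / (norm_g * norm_h), -1.0, 1.0)
    return np.degrees(np.arccos(cosine))

# Parameter grid for a probability theta in (0, 1).
theta = np.linspace(0.0005, 0.9995, 1000)
dx = theta[1] - theta[0]

# Unnormalized curves: the angle does not depend on positive scaling.
prior = theta * (1 - theta)               # shape of a Beta(2, 2) prior
likelihood = theta**7 * (1 - theta)**3    # 7 successes, 3 failures
posterior = prior * likelihood            # shape of a Beta(9, 5) posterior

prior_vs_posterior = angle_between(prior, posterior, dx)
prior_vs_itself = angle_between(prior, prior, dx)
print(f"prior vs posterior: {prior_vs_posterior:.1f} degrees")
```

A small angle means the two curves point in nearly the same direction (the data barely moved the prior); because all the curves here are non-negative, the angle stays between 0 and 90 degrees.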
 
 

Tools*

 


Free Course: Natural Language Processing (NLP) for Semantic Search

Learn how to build semantic search applications by making machines understand language as people do. This free course covers everything you need to build state-of-the-art language models, from machine translation to question-answering, and more. Brought to you by Pinecone. Start reading now.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Data Scientist, Decisions - Lyft - New York, NY

    Data Science is at the heart of Lyft’s products and decision-making. As a member of the Science team, you will work in a dynamic environment, where we embrace moving quickly to build the world’s best transportation. Data Scientists take on a variety of problems ranging from shaping critical business decisions to building algorithms that power our internal and external products. We’re looking for passionate, driven Data Scientists to take on some of the most interesting and impactful problems in ridesharing...

        Want to post a job here? Email us for details >> team@datascienceweekly.org

 
 

Training & Resources

 
  • Relationship between SVD and PCA. How to use SVD to perform PCA?
    Principal component analysis (PCA) is usually explained via an eigen-decomposition of the covariance matrix. However, it can also be performed via singular value decomposition (SVD) of the data matrix 𝐗. How does it work? What is the connection between these two approaches? What is the relationship between SVD and PCA?...Or in other words, how to use SVD of the data matrix to perform dimensionality reduction?...
  • Implementing Naive Bayes From Scratch
    In the following sections, we will implement the Naive Bayes Classifier from scratch in a step-by-step fashion using just Python and NumPy...But, before we get started coding, let’s talk briefly about the theoretical background and assumptions underlying the Naive Bayes Classifier...
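
For readers curious about the SVD/PCA question above, here is a minimal NumPy sketch of the standard relationship, not the linked answer's code: center the data matrix, take its SVD, read the principal directions off the right singular vectors, and convert singular values to variances by squaring and dividing by n - 1. The function name `pca_via_svd` and the random test data are made up for illustration.

```python
import numpy as np

def pca_via_svd(X, n_components):
    """PCA of an (n_samples, n_features) matrix using the SVD of the centered data."""
    Xc = X - X.mean(axis=0)                      # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # principal directions (rows)
    scores = U[:, :n_components] * s[:n_components]   # projected data
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance

# Illustrative data: 200 samples, 5 correlated features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

scores, components, explained_variance = pca_via_svd(X, n_components=2)

# Cross-check against the eigen-decomposition of the covariance matrix.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]   # descending
print(np.allclose(explained_variance, eigvals[:2]))
```

The cross-check at the end compares the SVD-derived variances with the largest eigenvalues of the sample covariance matrix, which is the connection the question asks about.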
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.
