Data Science Weekly - Issue 466

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments.

Issue #466

October 27 2022

Editor's Picks

 

  • The Scientific Virtues
    Science education usually starts with teaching students different tools and techniques, methods for conducting research...This is wrong. Science education should begin with the scientific virtues...The scientific virtues are: a) Stupidity, b) Arrogance, c) Laziness, d) Carefreeness, e) Beauty, f) Rebellion, g) Humor...
  • Using a data dictionary as your roadmap to quality data
    A data dictionary, a rectangular format collection of names, definitions, and attributes about variables in a dataset, is arguably the single most important piece of documentation you will create...While a data dictionary, sometimes also called a codebook or variable information log, is often used as a tool to help you and others interpret your data at the end of your project, it is actually even more powerful if created before you ever collect a single piece of data, serving as a roadmap as you design your data collection tools and clean your data...
  • The Farama Foundation: The future of open source reinforcement learning
    Today we’re announcing the Farama Foundation – a new nonprofit organization designed in part to house major existing open source reinforcement learning (“RL”) libraries in a neutral nonprofit body. We aim to provide standardization and long term maintenance to these projects, as well as improvements to their reproducibility, performance, and quality of life features. We are also working to develop key pieces of missing software for the open source reinforcement learning ecosystem...This post explains who we are, what we’re working on right now, and what our long term goals and vision are. This post also publicly announces the release of Gymnasium, a library where the future maintenance of OpenAI Gym will be taking place...
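
    A quick illustration of the data dictionary pick above: the sketch below shows one way a dictionary written before data collection could double as a validation roadmap. It is not taken from the linked article. The variable names, the entry fields (`required`, `min`, `max`, `allowed`) and the `validate_row` helper are all hypothetical.

```python
# Hypothetical data dictionary: one entry per variable, drafted before any data is collected.
DATA_DICTIONARY = [
    {"name": "participant_id", "type": int, "required": True},
    {"name": "age", "type": int, "required": True, "min": 18, "max": 99},
    {"name": "condition", "type": str, "required": True,
     "allowed": {"control", "treatment"}},
    {"name": "notes", "type": str, "required": False},
]


def validate_row(row, dictionary=DATA_DICTIONARY):
    """Return a list of human-readable problems found in one record."""
    problems = []
    known = {entry["name"] for entry in dictionary}

    # Flag variables that were collected but never documented.
    for key in row:
        if key not in known:
            problems.append(f"{key}: not in data dictionary")

    # Check each documented variable against its declared attributes.
    for entry in dictionary:
        name = entry["name"]
        value = row.get(name)
        if value is None:
            if entry["required"]:
                problems.append(f"{name}: missing required value")
            continue
        if not isinstance(value, entry["type"]):
            problems.append(f"{name}: expected {entry['type'].__name__}")
            continue
        if "min" in entry and value < entry["min"]:
            problems.append(f"{name}: below minimum {entry['min']}")
        if "max" in entry and value > entry["max"]:
            problems.append(f"{name}: above maximum {entry['max']}")
        if "allowed" in entry and value not in entry["allowed"]:
            problems.append(f"{name}: value not in allowed set")
    return problems


good = {"participant_id": 1, "age": 30, "condition": "control"}
bad = {"participant_id": 2, "age": 17, "condition": "placebo", "height": 180}

print(validate_row(good))
print(validate_row(bad))
```

    The same dictionary could also drive form design or cleaning scripts, since every rule lives in one documented place.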
 
 

A Message from this week's Sponsor:

 



Learn and Practice AI/ML with Global Communities

Join the largest AI/ML/Data developers community globally (180K+ developers in 150+ countries) to learn and practice AI, machine learning, deep learning, and data science technologies. A few upcoming learning events:
  • Nov 1st (Austin): Build Image Recognition System with Kafka
  • Nov 2nd (Silicon Valley, NYC, Bengaluru): Google Data Stream Processing Night
  • Nov 10th (Seattle, Boston, New York): AWS Dev Day on Cloud Data Lakehouse
  • Nov 15th (Virtual): MLOps Platform - Notebook to Production (Expert Level Workshop)
  • And 20+ more on the website


 
 

Data Science Articles & Videos

 
  • Create Data-Rich Presentation from Jupyter Notebook
    A presentation is a great way to share your results and findings with a non-technical audience. A data-rich presentation with charts, tables, and code may be tedious to create. The good news is that you can create a presentation directly from Jupyter Notebook!...
  • The Russian Roulette: An Unbiased Estimator of the Limit
    The Russian Roulette offers a simple way to construct an unbiased estimator for the limit of a sequence. It can be used, for example, to construct an unbiased estimator of the pseudoinverse of a matrix, which is otherwise difficult to obtain. We'll first show that the estimator is unbiased. Then we'll discuss one of the original applications of this method: an unbiased estimator of the matrix pseudoinverse. Finally, we'll discuss its limitations and practical issues through a variance analysis...
  • The most important recent developments in AI
    From solving maths and science problems to translating with astonishing accuracy between hundreds of languages – not to mention generating images and videos based on a natural language prompt – AI is making strides pretty much across the board...In this article, I’ll briefly discuss some of the most recent (and the most exciting!) developments that you should know about...
  • A Transformer That Solves Small Tabular Classification Problems in a Second
    This may revolutionize data science: we introduce TabPFN, a new tabular data classification method that takes < 1 second & yields SOTA performance (competitive with the best AutoML pipelines in an hour)...So far, it is limited in scale, though: it can only tackle problems up to 1000 training examples, 100 features and 10 classes...TabPFN is radically different from previous ML methods. It is a meta-learned algorithm and it provably approximates Bayesian inference with a prior for principles of causality and simplicity. Qualitatively, its resulting predictions are very intuitive as well, with very smooth uncertainty estimates...
  • Math of Gaussian Mixture Model Clustering
    The math of Gaussian Mixture Model Clustering can be tough for undergrads to grasp, but it gives a TON of insight into how GMM works!...I made this GMM math worksheet to do with my class...
  • Generalizing in the Real World with Representation Learning
    As applications of ML, particularly in AI systems, become more pervasive in the real world, we need to critically examine these assumptions, norms, and problem settings, as well as the methods that have become de-facto standards. There is much we still do not understand about how and why deep networks trained with stochastic gradient descent are able to generalize as well as they do, why they fail when they do, and how they will perform on out-of-distribution data. In this thesis I cover some of my work towards better understanding deep net generalization, identify several ways assumptions and problem settings fail to generalize to the real world, and propose ways to address those failures in practice...
  • Coding for Economists: Common Plots
    In this chapter, we’ll look at some of the most common plots that you might want to make–and how to create them using the most popular data visualisations libraries, including matplotlib, plotnine, seaborn, altair, and plotly...
  • LangChain - Building applications with LLMs through composability
    Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. But using these LLMs in isolation is often not enough to create a truly powerful app - the real power comes when you are able to combine them with other sources of computation or knowledge...This library is aimed at assisting in the development of those types of applications. It aims to create: a) a comprehensive collection of pieces you would ever want to combine, b) a flexible interface for combining pieces into a single comprehensive "chain", and c) a schema for easily saving and sharing those chains...
  • Low-Rank Approximation Toolbox: Nyström, Cholesky, and Schur
    In this post, we will draw a connection between low-rank approximation by Nyström approximation and solving linear systems of equations by Gaussian elimination. The connection between these two seemingly unrelated areas of matrix computations will pay dividends, leading to effective algorithms to compute Nyström approximations by the (partial) Cholesky factorization of a positive (semi)definite matrix and an elegant description of the residual of the Nyström approximation as the Schur complement...
  • Deep Whole-Body Control: Learning a Unified Policy for Manipulation and Locomotion
    In this work, we propose to learn a unified policy for whole-body control of a legged manipulator using reinforcement learning. We propose Regularized Online Adaptation to bridge the Sim2Real gap for high-DoF control, and Advantage Mixing exploiting the causal dependency in the action space to overcome local minima during training the whole-body system. We also present a simple design for a low-cost legged manipulator, and find that our unified policy can demonstrate dynamic and agile behaviors across several task setups...
  • Optimisation & Generalisation in Networks of Neurons
    The goal of this thesis is to develop the optimisation and generalisation theoretic foundations of learning in artificial neural networks. On optimisation, a new theoretical framework is proposed for deriving architecture-dependent first-order optimisation algorithms. The approach works by combining a "functional majorisation" of the loss function with "architectural perturbation bounds" that encode an explicit dependence on neural architecture. The framework yields optimisation methods that transfer hyperparameters across learning problems. On generalisation, a new correspondence is proposed between ensembles of networks and individual networks...
  • The Unreasonable Effectiveness of Data Pipeline Smoke Tests
    Data practitioners waste time writing unit tests to catch bugs they could have caught with smoke tests...In this post, we’ll discuss a powerful technique for speeding up data pipeline development: the data pipeline smoke test. You write your smoke test just once: you don’t need to write a test for every newly derived data asset. It can complete in a few seconds and exercises every transformation inside your data pipeline...The idea of the data pipeline smoke test is to automatically run all your data transformations on empty or synthetic data...
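
    To make the smoke test idea in the last item concrete, here is a minimal sketch of what "run all your data transformations on empty or synthetic data" might look like. It is not the linked post's implementation. The three transformation functions, the `PIPELINE` list, the `smoke_test` helper and the synthetic rows are all hypothetical, and a real pipeline would likely discover its steps from an orchestrator rather than from a hand-written list.

```python
# Hypothetical transformations; each takes and returns a list of dict rows.
def clean_orders(rows):
    return [r for r in rows if r["amount"] > 0]


def add_tax(rows, rate=0.1):
    return [{**r, "total": r["amount"] * (1 + rate)} for r in rows]


def revenue_by_customer(rows):
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["total"]
    return [{"customer": c, "revenue": v} for c, v in sorted(totals.items())]


# Assumed ordering: each step consumes the previous step's output.
PIPELINE = [clean_orders, add_tax, revenue_by_customer]


def smoke_test(pipeline, rows):
    """Run every step once in order; return (step name, error) for the first failure."""
    failures = []
    data = rows
    for step in pipeline:
        try:
            data = step(data)
        except Exception as exc:
            failures.append((step.__name__, repr(exc)))
            break  # downstream steps have no valid input to run on
    return failures


# Tiny synthetic input that exercises both branches of the filter.
SYNTHETIC = [
    {"customer": "a", "amount": 10.0},
    {"customer": "b", "amount": -1.0},
]

print(smoke_test(PIPELINE, []))         # empty data
print(smoke_test(PIPELINE, SYNTHETIC))  # synthetic data
```

    Written once, a check like this covers newly added steps automatically. Note that empty input only proves each step runs without touching a row, so a few synthetic rows tend to surface more errors, such as a misspelled column name.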
 
 

Tool*

 



Jumpstart your data science journey and master the foundations of our data-driven world with Anaconda.

If you're looking to learn essential data science skills, there’s no need to sort through countless tools, guides, and boot camps that overpromise and underdeliver—Anaconda is here! With an Anaconda subscription, you can now access on-demand data science training and cloud-hosted notebooks. Learn from experts in the field and spin up data science projects anytime, anywhere—with all the packages and computing power you need. Whether you’re just getting started or ready to take your data science skills to the next level, Anaconda provides the building blocks you need to make sense of our data-driven world.

Get started at Anaconda.cloud.


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

   
 

Jobs

 
  • Data Scientist - Mount Sinai Data Commons - NYC

    A position is available for an individual with skills in data science, bioinformatics and software engineering to play the key role in running and managing the Mount Sinai Data Commons – known as the Data Ark. The Data Ark team brings together all the most important data sets used by Sinai researchers (e.g. 1000G, GTEx, UK Biobank) in a single location on our HPC server (minvera.org), performs QA/QC processing of the data, conducts initial demographics analyses to showcase the different data sets, and will be tasked with expanding the data commons to host a large range of different data sets of different types (genotype, WES, WGS, RNA-seq, EHR-linked, imaging etc.), which will come with their own computational and platform challenges...
     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 

 

Training & Resources

 
  • Understanding ShinyApps
    Today, we’ll discover how you can use the power of R (and RStudio) to create, for instance, an interactive visualization with the ShinyApp framework...
  • An Introduction to Poisson Flow Generative Models
    Poisson Flow Generative Models (PFGMs) are a new type of generative Deep Learning model, taking inspiration from physics much like Diffusion Models. Learn the theory behind PFGMs and how to generate images with them in this easy-to-follow guide...
 

What you’re up to – notes from DSW readers

 
  • Fill out the form below to appear here :) ...
 

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

 


 


 


P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)
All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.