Data Science Weekly - Issue 419

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #419

December 02 2021

Editor Picks
 
  • Flux <3 NumFOCUS
    We are very excited to announce that FluxML is partnering with NumFOCUS as an affiliated project to further the cause of open and reproducible science and growing the adoption of the FluxML ecosystem. Flux has always had the mission of being a simple, hackable and performant approach to machine learning, which is extended to a number of domains in science by means of differentiable programming...This milestone is the result of the coming together of the Julia community to support the vision of producing high performance machine learning tools which are flexible towards the needs of novel use cases such as: graph neural networks, scientific machine learning, and differentiable programming...
  • 30 days and as many maps
    Writing this, looking back on the last 30 days, I realize how much of the fun I have writing these notebooks depends on the existence and vibrance of a large community and on the activity of all the people (past and present) who did research, published it, compiled datasets, created software, created visualizations, and spread all kinds of enthusiasm; people who enjoy sharing, explaining what they do and encouraging others to try and make stuff, and get excited about new ways of seeing and representing spatial (or non-spatial) data...
  • The Impending Cloud Reshuffle
    Here's a theory I have about cloud vendors (AWS, Azure, GCP): 1) Cloud vendors will increasingly focus on the lowest layers in the stack: basically leasing capacity in their data centers through an API and 2) Other pure-software providers will build all the stuff on top of it. Databases, running code, you name it...let me walk you through my thinking—I think some of it is quite well illustrated through the story of Redshift...
 
 

A Message from this week's Sponsor:

 


High quality data labeling, consistently

Edge cases are among the most common challenges that ML teams face when training their AI models, making it difficult to reach 95+% accuracy. This becomes more complex once you need to scale and start working with 3rd party data labeling solutions.

The evaluation metrics that we use to measure the quality of labeled data - Intersection over Union (IOU) and F1 score - have allowed us to make swift adjustments on the go and continuously improve the quality of our labeling standards. To find out more and start exploring our end-to-end data labeling service, speak to the team at Supahands today.
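
For readers unfamiliar with the two metrics named above, here is a minimal sketch of how they are commonly computed. It is not the sponsor's implementation: the function names `iou` and `f1_score` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed format).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives.

    Equivalent to the harmonic mean of precision and recall.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # A labeled box compared against a reference box it partly overlaps.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    # 8 correct labels, 2 spurious labels, 2 missed labels.
    print(f1_score(8, 2, 2))
```

An IoU near 1 means a labeled box almost coincides with the reference box, while F1 summarizes how many labels were correct versus spurious or missed.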

 

 

Data Science Articles & Videos

 
  • OpenAI Residency
    As part of our effort to support and develop AI talent, we’re excited to announce the OpenAI Residency. This new program offers a pathway to a full-time role at OpenAI for researchers and engineers who don’t currently focus on artificial intelligence. We are excited to get applications from everyone, and will make a special effort to hear from underrepresented groups in technology...
  • Exploring the beauty of pure mathematics in novel ways
    As part of DeepMind's mission to solve intelligence, we explored the potential of machine learning (ML) to recognize mathematical structures and patterns, and help guide mathematicians toward discoveries they may otherwise never have found — demonstrating for the first time that AI can help at the forefront of pure mathematics...Our research paper, published today in the journal Nature, details our collaboration with top mathematicians to apply AI toward discovering new insights in two areas of pure mathematics: topology and representation theory...
  • Bird-inspired dynamic grasping and perching in arboreal environments
    Birds take off and land on a wide range of complex surfaces. In contrast, current robots are limited in their ability to dynamically grasp irregular objects. Leveraging recent findings on how birds take off, land, and grasp, we developed a biomimetic robot that can dynamically perch on complex surfaces and grasp irregular objects. To accommodate high-speed collisions, the robot’s two legs passively transform impact energy into grasp force, while the underactuated grasping mechanism wraps around irregularly shaped objects in less than 50 milliseconds...
  • Predicting long-term user engagement from short-term behavior
    A problem that a company may want to address is how to derive insights from data on already engaged users to identify any common behavior patterns that can be leveraged to promote the same level of engagement in new users...In discussing the problem, they can identify two segments of their long-term engaged user population base that they want to understand: a) “Day One” users (consistent, regular users from day one) and b) “Late Bloomer” users (sporadic early users whose engagement increases at a later date)...The behaviors of these two segments can be seen in the following figures...
  • MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
    The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In comparison, limited effort has been made at assessing the fitness of these datasets for the video-language grounding task...In this work, we present MAD (Movie Audio Descriptions), a novel benchmark that departs from the paradigm of augmenting existing video datasets with text annotations and focuses on crawling and aligning available audio descriptions of mainstream movies...
  • What Data Science candidates can and cannot control in their job hunt
    Having been involved in quite a few rounds of hiring data scientists in a biomedical research context, I'd like to share some perspectives that may help candidates who desire a move into a data science role in biomedical research. I'll start off with the usual disclaimer that these are personal observations and thoughts; they may not apply uniformly to all biomedical data science teams, and may reflect personal biases. With that disclaimer out of the way, here are my observations...
  • Kinematic self-replication in reconfigurable organisms
    Here we show that clusters of cells, if freed from a developing organism, can similarly find and combine loose cells into clusters that look and move like they do, and that this ability does not have to be specifically evolved or introduced by genetic manipulation. Finally, we show that artificial intelligence can design clusters that replicate better, and perform useful work as they do so. This suggests that future technologies may, with little outside guidance, become more useful as they spread, and that life harbors surprising behaviors just below the surface, waiting to be uncovered...
  • Path integral control theory
    Control theory is a theory from engineering that gives a formal description of how a system, such as a robot or animal, can move from a current state to a future state at minimal cost, where cost can mean time spent, or energy spent or any other quantity. Control theory is used traditionally to control industrial plants, airplanes or missiles, but is also the natural framework to model intelligent behavior in animals or robots. The mathematical formulation of deterministic control theory is very similar to classical mechanics. In fact, classical mechanics can be viewed as a special case of control theory...
  • Gaussian Process: First Step Towards Active Learning in Physics
    Despite the extreme disparity in terms of objects and study methods, some tasks are common across multiple scientific fields. One such task is interpolation...This can be approached using multiple methods including splines, kernel density approximations, neural network fits, and many others. However, when doing so, the second natural question is the uncertainty of these interpolated values, or to what extent they can be trusted...Finally, the third and perhaps most interesting question is whether we can use the knowledge of the interpolated function and its uncertainty to guide our search strategy...All these problems can be addressed in a principled manner using Gaussian Process (GP) and GP-based Bayesian Optimization...
  • Procedural storytelling is exploding the possibilities of video game narratives
    Procedural stories in video games often induce a specific kind of delight. You’ll know when it hits — a realization that the code and algorithms of the game seem to be generating a coherent narrative from your own impulsive, seemingly chaotic actions...Drama, as video games continue to prove, is harder to convince players of than space itself, which makes procedural successes all the more eye-catching — from mainstream hits such as The Sims to cult classics like Rimworld. Now it feels like this sandbox approach to storytelling is starting to bear even greater narrative fruit...
 
 

Tools*

 



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • R&D Data Scientist - Danaher - Port Washington, NY

    As a Data Scientist at IBM, you will help transform our clients’ data into tangible business value by analyzing information, communicating outcomes and collaborating on product development. Work with Best in Class open source and visual tools, along with the most flexible and scalable deployment options. Whether it’s investigating patient trends or weather patterns, you will work to solve real world problems for the industries transforming how we live.

        Want to post a job here? Email us for details >> team@datascienceweekly.org

 
 

Training & Resources

 
  • Pytorch Conv2d Weights Explained
    Understanding weight dimensions, visualization, number of parameters and the infamous size mismatch...One of the most common problems I have found in my journey with PyTorch is the size mismatch error when loading weights into my models. As you know, PyTorch does not save the computational graph of your model when you save the model weights (unlike TensorFlow). So when you train multiple models with different configurations (different depths, width, resolution…) it is very common to mistype the weights file name and load the wrong weights for your target model...This mistake translates into the infamous PyTorch error for the Conv2d weights: the size mismatch...
  • Random Forests Algorithm explained with a real-life example and some Python code
    Random Forests is a Machine Learning algorithm that tackles one of the biggest problems with Decision Trees: variance...To address overfitting, and reduce the variance in Decision Trees, Leo Breiman developed the Random Forests algorithm. This was an innovative algorithm because it utilized, for the first time, the statistical technique of Bootstrapping and combined the results of training multiple models into a single, more powerful learning model...But before you see Random Forests in action, and code, let’s take a detour to explore what makes Random Forests unique...
  • pybaobabdt - Python implementation of visualization technique for (sklearn) decision trees
    The pybaobabdt package provides a Python implementation for the visualization of decision trees. The technique is based on the scientific paper BaobabView: Interactive construction and analysis of decision trees developed by the TU/e. A typical decision tree is visualized using a standard node link diagram...The problem, however, is that information is not easily extracted from this: for example, which classes are easy to separate, which classes are similar, and where the main flow of items goes. Therefore, we developed techniques to answer these questions with a scalable visualization...
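
As a companion to the Conv2d weights item in this list, here is a rough sketch of where a size mismatch comes from. It relies on the documented PyTorch layout of a Conv2d weight tensor, `(out_channels, in_channels / groups, kernel_height, kernel_width)`, but deliberately avoids importing PyTorch. The helper names `conv2d_weight_shape`, `conv2d_param_count` and `describe_mismatch` are hypothetical and are not taken from the linked article.

```python
def _pair(value):
    """Expand an int kernel size to (h, w); pass tuples through."""
    return (value, value) if isinstance(value, int) else tuple(value)


def conv2d_weight_shape(in_channels, out_channels, kernel_size, groups=1):
    """Shape of a Conv2d weight tensor: (out, in // groups, kH, kW)."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channels must be divisible by groups")
    kh, kw = _pair(kernel_size)
    return (out_channels, in_channels // groups, kh, kw)


def conv2d_param_count(in_channels, out_channels, kernel_size,
                       groups=1, bias=True):
    """Number of learnable parameters: weight entries plus optional bias."""
    out_c, in_c, kh, kw = conv2d_weight_shape(
        in_channels, out_channels, kernel_size, groups)
    return out_c * in_c * kh * kw + (out_channels if bias else 0)


def describe_mismatch(expected_shape, loaded_shape):
    """Return a short message if two weight shapes differ, else None."""
    if tuple(expected_shape) == tuple(loaded_shape):
        return None
    return f"size mismatch: model expects {tuple(expected_shape)}, file has {tuple(loaded_shape)}"


if __name__ == "__main__":
    # A 7x7 convolution from 3 input channels to 64 output channels.
    print(conv2d_weight_shape(3, 64, 7))
    print(conv2d_param_count(3, 64, 7))
    # Weights saved from a wider variant (96 output channels) do not fit.
    print(describe_mismatch(conv2d_weight_shape(3, 64, 7),
                            conv2d_weight_shape(3, 96, 7)))
```

Comparing expected and saved shapes layer by layer before loading is one way to spot that a weights file belongs to a different model configuration.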
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.
