Data Science Weekly - Issue 420

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #420

December 09 2021

Editor Picks
 
  • D3 and Data Visualization Insights with Mike Bostock
    What’s the secret for D3’s long-time success? Mike Bostock, the creator of D3, shares the reasons for his data visualization tool’s longevity, and why it won the 10-year Test-of-Time award from the IEEE. Mike goes deep on D3 and Observable, which he also founded, and talks about all things visualization with The Data Wranglers Joe Hellerstein and Jeffrey Heer, including when it’s OK to use a bar chart for getting quick data insights and the applications of time zone wrangling...
  • A Call to Build Models Like We Build Open-Source Software
    This post argues that we should develop tools that will allow us to build pre-trained models in the same way that we build open-source software. Specifically, models should be developed by a large community of stakeholders who continually update and improve them. Realizing this goal will require porting many ideas from open-source software development to building and training models, which motivates many threads of interesting research....
  • AI-DR Program Automated Decision-Making and the Law Clearinghouse Project
    One public perception is that automated decision-making is fairer, or could even be more lawful. This perception stems from the belief that human bias may be eliminated in automated decisions. However, as emerging research has shown, unlawful discrimination can flow from the bias that remains encoded in automated decision-making systems...The aim of this clearinghouse project thus is to highlight seminal and impactful articles focused on issues of AI Decision-Making and the law. The AI-DR Program is pleased to share a searchable database of legal scholarly articles related to AI, automated decision-making and the law...
 
 

A Message from this week's Sponsor:

 



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.

 

 

Data Science Articles & Videos

 
  • Learning with not Enough Data Part 1: Semi-Supervised Learning
    The performance of supervised learning tasks improves with more high-quality labels available. However, it is expensive to collect a large number of labeled samples. There are several paradigms in machine learning to deal with the scenario when the labels are scarce. Semi-supervised learning is one candidate, utilizing a large amount of unlabeled data in conjunction with a small amount of labeled data...
  • Automated Story Generation as Question-Answering
    We propose a novel approach to automated story generation that treats the problem as one of generative question-answering. Our proposed story generation system starts with sentences encapsulating the final event of the story. The system then iteratively (1) analyzes the text describing the most recent event, (2) generates a question about "why" a character is doing the thing they are doing in the event, and then (3) attempts to generate another, preceding event that answers this question...
  • Cloud Wars: The Attack of Snowflakes
    Erik Bern wrote a post last week, combining the counterintuitive ideas that (a) the lowest cloud infrastructure layers are not commodity services, and (b) this means that the cloud providers could be happy ceding ground to others for higher level services, turning into pure play infrastructure platforms...I’m in violent agreement with the first premise that the lowest cloud infra layers are not commodity services. But I think it’s unlikely that cloud providers would be happy ceding ground to others on higher level services...
  • Visualize Data on Spirals
    In this vignette, I describe the package spiralize, which visualizes data along an Archimedean spiral. It has two major advantages for visualization: a) it can visualize data with a very long axis at high resolution and b) it is efficient at revealing periodic patterns in time series data...
  • Language Modelling at Scale: Gopher, Ethical considerations, and Retrieval
    Today we [DeepMind] are releasing three papers on language models that reflect this interdisciplinary approach. They include a detailed study of a 280 billion parameter transformer language model called Gopher, a study of ethical and social risks associated with large language models, and a paper investigating a new architecture with better training efficiency...
  • Updated spaCy NLP Course
    We've updated our interactive NLP course for spaCy v3!...💬 The updated course is available in English, Spanish, German and Japanese...📚 4 interactive chapters: from the first steps to your own spaCy model...🍰 New exercises about the training CLI & config...
  • A Cartel of Influential Datasets Is Dominating Machine Learning Research, New Study Suggests
    A new paper from the University of California and Google Research has found that a small number of ‘benchmark’ machine learning datasets, largely from influential western institutions, and frequently from government organizations, are increasingly dominating the AI research sector...the authors contend that ‘widely-used datasets are introduced by only a handful of elite institutions’, and that this ‘consolidation’ has increased to 80% in recent years...
  • PyTorch: Where we are headed and why it looks a lot like Julia (but not exactly like Julia)
    When trying to predict how PyTorch would itself get disrupted, we used to joke a bit about the next version of PyTorch being written in Julia. This was not very serious: a huge factor in moving PyTorch from Lua to Python was to tap into Python’s immense ecosystem (an ecosystem that shows no signs of going away) and even today it is still hard to imagine how a new language can overcome the network effects of Python...However, recently, I have been thinking about various projects we have going on in PyTorch...
  • minitorch
    MiniTorch is a DIY teaching library for machine learning engineers who wish to learn about the internal concepts underlying deep learning systems. It is a pure Python re-implementation of the Torch API designed to be simple, easy-to-read, tested, and incremental. The final library can run Torch code. The project was developed for the course 'Machine Learning Engineering' at Cornell Tech...
  • Building a recommendation engine inside Postgres with Python and Pandas
    Earlier today I was starting to wonder why I couldn't do more machine learning directly inside the Postgres database. Yeah, there is madlib, but what if I wanted to write my own recommendation engine? So I set out on a total detour of a few hours and lo and behold, I can probably do a lot more of this in Postgres than I realized before. What follows is a quick walkthrough of getting a recommendation engine set up directly inside Postgres on top of Crunchy Bridge, our database as a service...
 
 

Tools*

 


What's a vector database, and how can you use it for AI/ML applications?

Vector databases help data scientists and ML engineers implement NLP into search, personalization, security, analytics, and monitoring applications. Learn all about them, their use cases, their core components, and how to get started. (It's easy.) Start here: What is a vector database?

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • R&D Data Scientist - Danaher - Port Washington, NY

    As a Data Scientist at IBM, you will help transform our clients’ data into tangible business value by analyzing information, communicating outcomes and collaborating on product development. Work with Best in Class open source and visual tools, along with the most flexible and scalable deployment options. Whether it’s investigating patient trends or weather patterns, you will work to solve real world problems for the industries transforming how we live.

        Want to post a job here? Email us for details >> team@datascienceweekly.org

 
 

Training & Resources

 
  • Intuitive Bayes Introductory Course
    Have you found most statistics books overly theoretical? Math-heavy? Or lacking a clear focus on application?...Want to keep your skills sharp to improve your career prospects?...Have you heard about these newfangled Probabilistic Programming Languages and want to know what they're all about?...Then this course is for you...
  • How a Kalman filter works, in pictures
    You can use a Kalman filter in any place where you have uncertain information about some dynamic system, and you can make an educated guess about what the system is going to do next. Even if messy reality comes along and interferes with the clean motion you guessed about, the Kalman filter will often do a very good job of figuring out what actually happened. And it can take advantage of correlations between crazy phenomena that you maybe wouldn’t have thought to exploit!...I’ll start with a loose example of the kind of thing a Kalman filter can solve, but if you want to get right to the shiny pictures and math, feel free to jump ahead...
  • Reddit Discussion: Why are Einstein Sum Notations not popular in ML? They changed my life.
    I recently discovered `torch.einsum` and now I am mad at every friend, mentor, acquaintance for not telling me about it...They are just way more intuitive and can handle most operations that I would want to do with tensors so elegantly...It takes only 30 mins or so to learn the notation and become somewhat proficient but then you are sorted for life...What are the arguments for and against using einstein notations for everything?...
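
For readers who have not met the notation from the Reddit item above, here is a minimal pure-Python sketch of what one einsum spec means. It assumes the spec `"ij,jk->ik"` (an ordinary matrix product): an index that appears in the inputs but not in the output, here `j`, is summed over. The function name `einsum_ij_jk_ik` is made up for illustration and is not part of any library.

```python
def einsum_ij_jk_ik(a, b):
    """Explicit-loop reading of the einsum spec "ij,jk->ik".

    Hypothetical helper for illustration only. `a` and `b` are
    non-empty nested lists (a is i x j, b is j x k). The index j is
    absent from the output, so it is summed over.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(inner):      # summed index
            for k in range(cols):
                out[i][k] += a[i][j] * b[j][k]
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(einsum_ij_jk_ik(a, b))  # [[19, 22], [43, 50]]
```

With PyTorch installed, the same contraction on tensors would be written `torch.einsum("ij,jk->ik", a, b)`; changing the spec string (for example `"ij->ji"` for a transpose) changes the operation without writing new loops.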
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.
