Data Science Weekly - Data Science Weekly - Issue 421

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments
Email not displaying correctly?
View it in your browser.

Issue #421

December 16 2021

Editor Picks
 
  • Lee Wilkinson’s contribution to interactive visualization
    Upon learning this morning that Lee Wilkinson passed away I also felt compelled to write something on the extent to which his work has influenced interactive visualization research...The Grammar of Graphics was an incredibly ambitious undertaking – Wilkinson set out to create a system that could produce any statistical graphic he’d ever seen, and that could deepen understanding of the meaning of graphics...
  • Announcing the Transactions on Machine Learning Research
    With this post, we’re happy to announce that we (Raia Hadsell, Kyunghyun Cho, Hugo Larochelle) are founding a new journal...the review process will be hosted by OpenReview, and therefore will be open and transparent to the community. Another differentiation from JMLR will be the use of double blind reviewing, the consequence being that the submission of previously published research, even with extension, will not be allowed. Finally, we intend to work hard on establishing a fast-turnaround review process, focusing in particular on shorter-form submissions that are common at machine learning conferences...
  • The Death of Feature Engineering is Greatly Exaggerated
    One of the most exciting aspects of deep learning’s emergence in computer vision a few years ago was that it didn’t appear to require any feature engineering, unlike previous techniques like histograms-of-gradients or Haar cascades. As neural networks ate up other fields like NLP and speech, the hope was that feature engineering would become unnecessary for those domains too. At first I fully bought into this idea, and saw any remaining manually-engineered feature pipelines as legacy code that would soon be subsumed by more advanced models...Over the last few years of working with product teams to deploy models in production I’ve realized I was wrong...
 
 

A Message from this week's Sponsor:

 



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.

 

 

Data Science Articles & Videos

 
  • PyTorch vs TensorFlow in 2022
    Should you use PyTorch vs TensorFlow in 2022? This guide walks through the major pros and cons of PyTorch vs TensorFlow, and how you can pick the right framework...
  • Improving the factual accuracy of language models through web browsing
    We’ve fine-tuned GPT-3 to more accurately answer open-ended questions using a text-based web browser. Our prototype copies how humans research answers to questions online – it submits search queries, follows links, and scrolls up and down web pages. It is trained to cite its sources, which makes it easier to give feedback to improve factual accuracy. We’re excited about developing more truthful AI,1 but challenges remain, such as coping with unfamiliar types of questions...
  • Optimization Nuggets: Exponential Convergence of SGD
    This is the first of a series of blog posts on short and beautiful proofs in optimization (let me know what you think in the comments!). For this first post in the series I'll show that stochastic gradient descent (SGD) converges exponentially fast to a neighborhood of the solution...
  • Data with a Purpose with Moritz Stefaner
    Meet Moritz Stefaner, a data designer who uses data for storytelling and who helped design the official German Covid-19 vaccine data dashboard. Moritz tells The Data Wranglers — Jeffrey Heer and Adam Wilson — how he creates a character from a dataset to give it emotional meaning and talks about the Covid vaccine clock he created. And, he dives into his data visualizations for train traffic on a German railroad network, the promises and pitfalls of using machine learning for data design, and what it took to visualize 175 years of text from Scientific American...
  • Using AI to bring children’s drawings to life
    We’re excited to announce a first-of-its-kind method for automatically animating children’s hand-drawn figures of people and humanlike characters (i.e., a character with two arms, two legs, a head, etc.) that bring these drawings to life in a matter of minutes using AI. By uploading them to our prototype system, parents and children can experience the excitement of watching their drawings become moving characters that dance, skip, and jump...
  • Best of the visualisation web - August 2021
    Since 2010 I have compiled and published monthly collections of links to some of the best, most interesting, or thought-provoking data visualisation-related content I come across. These collections are not always published immediately after the month in question has ended, but I try to do so as soon as my workload permits! Here's a collection of some of the best content I encountered during August 2021...
  • How AI Happens Podcast
    How AI Happens is a podcast featuring experts and practitioners explaining their work at the cutting edge of Artificial Intelligence. Tune in to hear AI Researchers, Data Scientists, ML Engineers, and the leaders of today’s most exciting AI companies explain the newest and most challenging facets of their field...
  • The Science of Visual Data Communication: What Works
    Effectively designed data visualizations allow viewers to use their powerful visual systems to understand patterns in data across science, education, health, and public policy. But ineffectively designed visualizations can cause confusion, misunderstanding, or even distrust—especially among viewers with low graphical literacy. We review research-backed guidelines for creating effective and intuitive visualizations oriented toward communicating data to students, coworkers, and the general public...
  • Modern Experimentation Platforms
    Che Sharma is the founder and CEO of Eppo, an experimentation framework that integrates with modern data platforms (cloud lakehouses and cloud data warehouses). We discuss the importance of investing in experimentation tools and the power of having a well-oiled experimentation culture within an organization. Che also explains how modern data platforms enable a variety of applications, including experimentation frameworks like Eppo...
  • Training Machine Learning Models More Efficiently with Dataset Distillation
    For a machine learning (ML) algorithm to be effective, useful features must be extracted from (often) large amounts of training data. However, this process can be made challenging due to the costs associated with training on such large datasets, both in terms of compute requirements and wall clock time. The idea of distillation plays an important role in these situations by reducing the resources required for the model to be effective. The most widely known form of distillation is model distillation (a.k.a. knowledge distillation), where the predictions of large, complex teacher models are distilled into smaller models...An alternative option to this model-space approach is dataset distillation, in which a large dataset is distilled into a synthetic, smaller dataset....
 
 

Tools*

 


Free Course: Natural Language Processing (NLP) for Semantic Search

Learn how to build semantic search applications by making machines understand language as people do. This free course covers everything you need to build state-of-the-art language models, from machine translation to question-answering, and more. Brought to you by Pinecone. Start reading now.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Data Scientist, Decisions - Lyft - New York, NY

    Data Science is at the heart of Lyft’s products and decision-making. As a member of the Science team, you will work in a dynamic environment, where we embrace moving quickly to build the world’s best transportation. Data Scientists take on a variety of problems ranging from shaping critical business decisions to building algorithms that power our internal and external products. We’re looking for passionate, driven Data Scientists to take on some of the most interesting and impactful problems in ridesharing...

        Want to post a job here? Email us for details >> team@datascienceweekly.org

 
 

Training & Resources

 
  • Python Practice Problems for Beginner Coders
    From sifting through Twitter data to making your own Minecraft modifications, Python is one of the most versatile programming languages at a coder’s disposal. The open-source, object-oriented language is also quickly becoming one of the most-used languages in data science...To help readers practice the Python fundamentals, datascience@berkeley gathered six coding problems, including some from the W200: Introduction to Data Science Programming course. The questions below cover concepts ranging from basic data types to object-oriented programming using classes....
  • Book Draft: Distributional Reinforcement Learning
    By considering the return distribution, rather than just the expected return, we gain a fresh perspective on the fundamental problems of reinforcement learning. This includes understanding of how optimal decisions should be made, methods for creating effective representations of an agent’s state, and the consequences of interacting with other learningagents. In fact, many of the tools we develop here are useful beyond reinforcement learning and decision making. We call the process of computing return distributions distributional dynamic programming; it can be applied in any situation where probability distributionsshould be propagated within some dependency structure...
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Follow on Twitter
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.
unsubscribe from this list    update subscription preferences 

Older messages

Data Science Weekly - Issue 420

Friday, December 10, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #420 December 09 2021 Editor Picks D3

Data Science Weekly - Issue 419

Friday, December 3, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #419 December 02 2021 Editor Picks Flux

Data Science Weekly - Issue 418

Thursday, November 25, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #418 November 25 2021 Editor Picks The

Data Science Weekly - Issue 417

Friday, November 19, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #417 November 18 2021 Editor Picks To Be

[in case you missed it] Data Science Weekly - Issue 416

Sunday, November 14, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #416 November 11 2021 Editor Picks

You Might Also Like

Kotlin Weekly #407

Sunday, May 19, 2024

ISSUE #407 19th of May 2024 Hello Kotliners! The Google I/O just finished this week with a huge announcement for us, with Google supporting now Kotlin Multiplatform on Android, and the KotlinConf will

Learn How to Use AI to Reach Your Full Potential, newsletterest1!

Sunday, May 19, 2024

3 Ways AI Can Help Your Writing ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌ ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌ ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌ ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌  ͏ ‌

Software Testing Weekly - Issue 220

Saturday, May 18, 2024

Software Testing Conferences 📚 View on the Web Archives ISSUE 220 May 18th 2024 COMMENT Welcome to the 220th issue! Have you ever been to a testing conference? They're a great way to learn about

📶 Is a Cellular iPad Worth It? — How to Prevent YouTube From Taking Over Your Screensaver

Saturday, May 18, 2024

Also: This Robot Vacuum Can Clean Stairs, and More! How-To Geek Logo May 18, 2024 📩 Get expert reviews, the hottest deals, how-to's, breaking news, and more delivered directly to your inbox by

Weekend Reading — Objection-oriented programming

Saturday, May 18, 2024

This week we find a power-up box, replace GitHub Actions with Maven XMLs, avoid the worst website in the world, revisit RTO policies, “listen” to OpenAI employees, watch our Slack private messages, do

Daily Coding Problem: Problem #1445 [Easy]

Saturday, May 18, 2024

Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Jane Street. The United States uses the imperial system of weights and measures, which

You don’t have to take our word for it…

Saturday, May 18, 2024

You can probably tell how excited we are to re-launch our Gigantic courses – which bring on-demand product management training for today's modern Product Managers and Product Leaders. In fact, we

🐍 New Python tutorials on Real Python

Saturday, May 18, 2024

Hey there, There's always something going on over at realpython.com as far as Python tutorials go. Here's what you may have missed this past week: What Is the __pycache__ Folder in Python? In

Visualized | Life Expectancy by Region (1950-2050F) 📊

Saturday, May 18, 2024

This map shows life expectancy at birth for key global regions, from 1950 to 2050F. View Online | Subscribe Presented by Voronoi: The App Where Data Tells the Story FEATURED STORY Life Expectancy by

New Wi-Fi Vulnerability Enables Network Eavesdropping via Downgrade Attacks

Saturday, May 18, 2024

THN Daily Updates Newsletter cover The DevSecOps Playbook: Deliver Continuous Security at Speed ($19.00 Value) FREE for a Limited Time A must-read guide to a new and rapidly growing field in