[in case you missed it] Data Science Weekly - Issue 421

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments
Email not displaying correctly?
View it in your browser.

Issue #421

December 16 2021

Editor Picks
 
  • Lee Wilkinson’s contribution to interactive visualization
    Upon learning this morning that Lee Wilkinson passed away I also felt compelled to write something on the extent to which his work has influenced interactive visualization research...The Grammar of Graphics was an incredibly ambitious undertaking – Wilkinson set out to create a system that could produce any statistical graphic he’d ever seen, and that could deepen understanding of the meaning of graphics...
  • Announcing the Transactions on Machine Learning Research
    With this post, we’re happy to announce that we (Raia Hadsell, Kyunghyun Cho, Hugo Larochelle) are founding a new journal...the review process will be hosted by OpenReview, and therefore will be open and transparent to the community. Another differentiation from JMLR will be the use of double blind reviewing, the consequence being that the submission of previously published research, even with extension, will not be allowed. Finally, we intend to work hard on establishing a fast-turnaround review process, focusing in particular on shorter-form submissions that are common at machine learning conferences...
  • The Death of Feature Engineering is Greatly Exaggerated
    One of the most exciting aspects of deep learning’s emergence in computer vision a few years ago was that it didn’t appear to require any feature engineering, unlike previous techniques like histograms-of-gradients or Haar cascades. As neural networks ate up other fields like NLP and speech, the hope was that feature engineering would become unnecessary for those domains too. At first I fully bought into this idea, and saw any remaining manually-engineered feature pipelines as legacy code that would soon be subsumed by more advanced models...Over the last few years of working with product teams to deploy models in production I’ve realized I was wrong...
 
 

A Message from this week's Sponsor:

 



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.

 

 

Data Science Articles & Videos

 
  • PyTorch vs TensorFlow in 2022
    Should you use PyTorch vs TensorFlow in 2022? This guide walks through the major pros and cons of PyTorch vs TensorFlow, and how you can pick the right framework...
  • Improving the factual accuracy of language models through web browsing
    We’ve fine-tuned GPT-3 to more accurately answer open-ended questions using a text-based web browser. Our prototype copies how humans research answers to questions online – it submits search queries, follows links, and scrolls up and down web pages. It is trained to cite its sources, which makes it easier to give feedback to improve factual accuracy. We’re excited about developing more truthful AI,1 but challenges remain, such as coping with unfamiliar types of questions...
  • Optimization Nuggets: Exponential Convergence of SGD
    This is the first of a series of blog posts on short and beautiful proofs in optimization (let me know what you think in the comments!). For this first post in the series I'll show that stochastic gradient descent (SGD) converges exponentially fast to a neighborhood of the solution...
  • Data with a Purpose with Moritz Stefaner
    Meet Moritz Stefaner, a data designer who uses data for storytelling and who helped design the official German Covid-19 vaccine data dashboard. Moritz tells The Data Wranglers — Jeffrey Heer and Adam Wilson — how he creates a character from a dataset to give it emotional meaning and talks about the Covid vaccine clock he created. And, he dives into his data visualizations for train traffic on a German railroad network, the promises and pitfalls of using machine learning for data design, and what it took to visualize 175 years of text from Scientific American...
  • Using AI to bring children’s drawings to life
    We’re excited to announce a first-of-its-kind method for automatically animating children’s hand-drawn figures of people and humanlike characters (i.e., a character with two arms, two legs, a head, etc.) that bring these drawings to life in a matter of minutes using AI. By uploading them to our prototype system, parents and children can experience the excitement of watching their drawings become moving characters that dance, skip, and jump...
  • Best of the visualisation web - August 2021
    Since 2010 I have compiled and published monthly collections of links to some of the best, most interesting, or thought-provoking data visualisation-related content I come across. These collections are not always published immediately after the month in question has ended, but I try to do so as soon as my workload permits! Here's a collection of some of the best content I encountered during August 2021...
  • How AI Happens Podcast
    How AI Happens is a podcast featuring experts and practitioners explaining their work at the cutting edge of Artificial Intelligence. Tune in to hear AI Researchers, Data Scientists, ML Engineers, and the leaders of today’s most exciting AI companies explain the newest and most challenging facets of their field...
  • The Science of Visual Data Communication: What Works
    Effectively designed data visualizations allow viewers to use their powerful visual systems to understand patterns in data across science, education, health, and public policy. But ineffectively designed visualizations can cause confusion, misunderstanding, or even distrust—especially among viewers with low graphical literacy. We review research-backed guidelines for creating effective and intuitive visualizations oriented toward communicating data to students, coworkers, and the general public...
  • Modern Experimentation Platforms
    Che Sharma is the founder and CEO of Eppo, an experimentation framework that integrates with modern data platforms (cloud lakehouses and cloud data warehouses). We discuss the importance of investing in experimentation tools and the power of having a well-oiled experimentation culture within an organization. Che also explains how modern data platforms enable a variety of applications, including experimentation frameworks like Eppo...
  • Training Machine Learning Models More Efficiently with Dataset Distillation
    For a machine learning (ML) algorithm to be effective, useful features must be extracted from (often) large amounts of training data. However, this process can be made challenging due to the costs associated with training on such large datasets, both in terms of compute requirements and wall clock time. The idea of distillation plays an important role in these situations by reducing the resources required for the model to be effective. The most widely known form of distillation is model distillation (a.k.a. knowledge distillation), where the predictions of large, complex teacher models are distilled into smaller models...An alternative option to this model-space approach is dataset distillation, in which a large dataset is distilled into a synthetic, smaller dataset....
 
 

Tools*

 


Free Course: Natural Language Processing (NLP) for Semantic Search

Learn how to build semantic search applications by making machines understand language as people do. This free course covers everything you need to build state-of-the-art language models, from machine translation to question-answering, and more. Brought to you by Pinecone. Start reading now.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Data Scientist, Decisions - Lyft - New York, NY

    Data Science is at the heart of Lyft’s products and decision-making. As a member of the Science team, you will work in a dynamic environment, where we embrace moving quickly to build the world’s best transportation. Data Scientists take on a variety of problems ranging from shaping critical business decisions to building algorithms that power our internal and external products. We’re looking for passionate, driven Data Scientists to take on some of the most interesting and impactful problems in ridesharing...

        Want to post a job here? Email us for details >> team@datascienceweekly.org

 
 

Training & Resources

 
  • Python Practice Problems for Beginner Coders
    From sifting through Twitter data to making your own Minecraft modifications, Python is one of the most versatile programming languages at a coder’s disposal. The open-source, object-oriented language is also quickly becoming one of the most-used languages in data science...To help readers practice the Python fundamentals, datascience@berkeley gathered six coding problems, including some from the W200: Introduction to Data Science Programming course. The questions below cover concepts ranging from basic data types to object-oriented programming using classes....
  • Book Draft: Distributional Reinforcement Learning
    By considering the return distribution, rather than just the expected return, we gain a fresh perspective on the fundamental problems of reinforcement learning. This includes understanding of how optimal decisions should be made, methods for creating effective representations of an agent’s state, and the consequences of interacting with other learningagents. In fact, many of the tools we develop here are useful beyond reinforcement learning and decision making. We call the process of computing return distributions distributional dynamic programming; it can be applied in any situation where probability distributionsshould be propagated within some dependency structure...
 
 

Books

 

 
  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits


    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Follow on Twitter
Copyright © 2013-2021 DataScienceWeekly.org, All rights reserved.
unsubscribe from this list    update subscription preferences 

Older messages

Data Science Weekly - Issue 421

Friday, December 17, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #421 December 16 2021 Editor Picks Lee

Data Science Weekly - Issue 420

Friday, December 10, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #420 December 09 2021 Editor Picks D3

Data Science Weekly - Issue 419

Friday, December 3, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #419 December 02 2021 Editor Picks Flux

Data Science Weekly - Issue 418

Thursday, November 25, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #418 November 25 2021 Editor Picks The

Data Science Weekly - Issue 417

Friday, November 19, 2021

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #417 November 18 2021 Editor Picks To Be

You Might Also Like

Happening TUESDAY! Follow Our Coverage of Apple’s Spring Announcement

Monday, May 6, 2024

iPhone Life magazine Follow Our Coverage of Apple's Latest Announcement. twitter facebook YouTube Podcast Tune in for Apple's 'Let Loose' Event Tomorrow! Surprise! Just a month before

Who wants a new iPad?

Monday, May 6, 2024

Plus: OpenAI and Stack Overflow partner and LockBit's website returns View this email online in your browser By Christine Hall Monday, May 6, 2024 Good afternoon, and welcome back to TechCrunch PM.

🔋 Why You Need More Than One Power Bank — Things We Want to See in Windows 12

Monday, May 6, 2024

Also: 7 Samsung Messages Features You Should Be Using, and More! How-To Geek Logo May 6, 2024 Did You Know You can find all manner of canned vegetables, but not broccoli: the temperatures required for

Launch pad decongestion

Monday, May 6, 2024

We've got some very cool news from Hubble Networks, which became the first company to connect a Bluetooth chip to a satellite. View this email online in your browser By Aria Alamalhodaei Monday,

Daily Coding Problem: Problem #1433 [Medium]

Monday, May 6, 2024

Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Nest. Create a basic sentence checker that takes in a stream of characters and

Want to become an AI consultant?

Monday, May 6, 2024

My take on this new industry ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌

Visualized | Interest Rate Forecasts for Advanced Economies 📈📉

Monday, May 6, 2024

In this graphic, we show the IMF's interest rate forecast for the US, Europe, the UK, and Japan for the next five years ahead. View Online | Subscribe Presented by Voronoi: The App Where Data Tells

⚙️ Apple AI updates

Monday, May 6, 2024

Plus: X AI stories & YouTube "skip to the good part" ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌

Unlock Time Series Data, FTC Chair Joins StrictlyVC & More

Monday, May 6, 2024

TechCrunch Events Roundup | May 6 TechCrunch Events TechCrunch events roundup Unlock the power of time series data with industry experts from AWS and InfluxDB on May 16. Join us next week for this free

Deepdive – product strategy, AI, leadership, emotional intelligence

Monday, May 6, 2024

Earlier this month, we presented our Virtual edition of INDUSTRY: The Product Conference, featuring some of our favorite product leaders worldwide. There were seven great keynote presentations, live