Data Science Weekly - Issue 467

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #467

November 03 2022

Editor's Picks

 
  • UX Challenges in Large Language Models
    The biggest bottleneck for large language model startups is UX...This article is based on a year of meeting probably 50+ startups in this category plus having invested in a few directly... I worry that many startups in this space are focusing on the wrong things early on. Specifically, after having met and looked into numerous companies in this space, it seems that UX and product design is the predominant bottleneck holding back most applied large language model startups, not data or modeling...I explain why I think this is the case, highlight many of the key UX issues I observe, and offer recommendations for how a founder building on top of LLMs might account for this...
  • The past, present, and future of notebooks
    Data science notebooks have come a long way since first introduced back in 1988. Here's the 101 on how we got here, where the market is at, and predictions for the future...
  • OpenAI Startup Fund
    OpenAI will give roughly 10 AI startups $1M each and early access to its systems...
 
 

A Message from this week's Sponsor:

 



Out now: new semantic layer whitepapers

Check out this bundle of Semantic Layer whitepapers by best selling authors - download here.

You'll learn the key value propositions to implement a semantic layer and best practices for analytics success with one.


 

 

Data Science Articles & Videos

 
  • Satellite-image-deep-learning Newsletter
    Welcome to the very first of my newsletters on the topic of deep learning applied to satellite and aerial imagery 🛰️ Email is a tried & tested format, and I hope to reach a wider audience by embracing it. The material will be curated highlights from my ‘New discovery’ posts, as well as upcoming events and anything else that I think may be of interest. I hope you enjoy this first post!...
  • What Good Data Self-Serve Looks Like
    I once was tasked with figuring out how to ‘democratize data’ for internal employees. No other instructions, solely a general pain point of ‘the data team is stuck doing ad-hoc tickets’ and ‘stakeholders want to get data on their own.’ After floundering for a while, I set out to figure out what data self-serve looked like at other companies. Seemed simple enough. But I quickly learned things aren’t that simple...
  • Large language models are not zero-shot communicators
    Understanding of pragmatics is an essential and ubiquitous part of human communication. We show large language models (LLMs) mostly don’t capture this aspect of language, hindering their applicability in the real world. Our analysis indicates where the largest room for improvement is to ultimately make this technology more useful....
  • Simple data analysis (SDA) in JavaScript
    Easy-to-use JavaScript library for the most common data analysis tasks...This project's goals are: a) To ease the way for non-coders (especially journalists) into the beautiful world of data analysis and data visualization in JavaScript and b) To standardize and accelerate frontend/backend workflows with a simple-to-use library working both in the browser and with NodeJS...
  • WeightedSHAP: analyzing and improving Shapley based feature attributions
    This repository provides an implementation of the paper WeightedSHAP: analyzing and improving Shapley based feature attributions, accepted at NeurIPS 2022. We show the suboptimality of SHAP and propose a new feature attribution method called WeightedSHAP. WeightedSHAP is a generalization of SHAP and is more effective at capturing influential features...
  • PyData Impact Scholarship - Application Form
    Are you a member of an underrepresented group in technology and/or open source and looking for ways to develop your career further and increase your professional impact?...Through the PyData Impact Scholarship, you will get connected to other professionals in the field and learn from their experiences. Scholars will have their own private track during the conference, with workshops designed to help you increase your visibility and impact, and you will join the Impact Scholars community with regular meetups afterwards...
  • Automated classification of videos from trail cameras
    If you've looked at videos collected from trail cameras, you might have found that a large fraction of them contain no visible animals. And if you've spent much time looking at blank videos, you might wish there was a better way!...Well, there is. Using automatic classification from Zamba, an AI tool for wildlife research and conservation, you can eliminate a substantial fraction of blank videos, sight unseen, while losing only a small fraction of videos that actually contain animals...
  • Monolith: Real Time Recommendation System With Collisionless Embedding Table
    In this paper, we present Monolith, a system tailored for online training. Our design has been driven by observations of our application workloads and production environment that reflects a marked departure from other recommendation systems. Our contributions are manifold: first, we crafted a collisionless embedding table with optimizations such as expirable embeddings and frequency filtering to reduce its memory footprint; second, we provide a production-ready online training architecture with high fault-tolerance; finally, we proved that system reliability could be traded off for real-time learning. Monolith has successfully landed in the BytePlus Recommend product...
  • Unsupervised visualization of image datasets using contrastive learning
    Visualization methods based on the nearest neighbor graph, such as t-SNE or UMAP, are widely used for visualizing high-dimensional data. Yet, these approaches only produce meaningful results if the nearest neighbors themselves are meaningful...Here, we present a new method, called t-SimCNE, for unsupervised visualization of image data...We show that the resulting 2D embeddings achieve classification accuracy comparable to the state-of-the-art high-dimensional SimCLR representations, thus faithfully capturing semantic relationships. Using t-SimCNE, we obtain informative visualizations of the CIFAR-10 and CIFAR-100 datasets, showing rich cluster structure and highlighting artifacts and outliers...
 
 

Tool*

 



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

   
 

Tool*

 



Instant Access to Web Data | Bright Data’s End to End Solution

When it comes to data analysis you’re #1 but your skills are only as good as your data. Bright Data is the world's leading web data collection platform covering everything from ready-made datasets to web scraping and proxies. Just for you, an exclusive offer for Data Science Weekly subscribers - 1 dataset refresh free of charge to make sure your data is as fresh as you are ;)


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

   
 

Jobs

 
  • Senior Data Analyst - Epic Games - New York

    Epic Games spans across 19 countries with 55 studios and 4,500+ employees globally. For over 25 years, we’ve been making award-winning games and engine technology that empowers others to make visually stunning games and 3D content that bring environments to life like never before.

    Use your expert experience in data & analytics to build powerful stories and visuals that inform the games we make, the technology we develop, and business decisions that drive Epic... Epic Games is looking for a Senior Data Analyst to help us create the models that fuel our creator economy. The successful candidate will have excellent SQL knowledge, and enjoy combining analytic skills with business acumen to provide the data and insights that will drive our continued success...

     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 

 

Training & Resources

 
  • Geometric Kernels
    Geometric Kernels is a library that implements natural kernels (Heat, Matérn) on such non-Euclidean spaces as Riemannian manifolds, graphs and meshes...
  • DSC 223: Introduction to Data Science
    I recently taught my Intro to Data Science course and created a course website for the first time...It's still rough around the edges - progress though. Content is primarily from Data Science in a Box and the website uses a template, both by Mine Çetinkaya-Rundel...
  • Best NLP Papers — October 2022
    If you work in NLP, it's important to keep up to date with the latest research. In this post, we look at some of the best papers on NLP that were published in October 2022...
 



P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)
All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
