[in case you missed it] Data Science Weekly - Issue 477

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #477

January 11 2023

Editor's Picks

 
  • The Economics of Maps
    For centuries, maps have codified the extent of human geographic knowledge and shaped discovery and economic decision-making...In this essay, we first review and unify recent literature in a variety of different fields that highlights the economic and social consequences of maps, along with an overview of the modern geospatial industry. We then outline our economic framework in which a given map is the result of economic choices around map data and designs, resulting in variations in private and social returns to mapmaking. We highlight five important economic and institutional factors shaping mapmakers' data and design choices...
  • Cinematic Techniques in Narrative Visualization
    The many genres of narrative visualization (e.g. data comics, data videos) each offer a unique set of affordances and constraints. To better understand a genre that we call cinematic visualizations (3D visualizations that make highly deliberate use of a camera to convey a narrative), we gathered 50 examples and analyzed their traditional cinematic aspects to identify the benefits and limitations of the form. While the cinematic visualization approach can violate traditional rules of visualization, we find that through careful control of the camera, cinematic visualizations enable immersion in data-driven, anthropocentric environments, and can naturally incorporate in-situ narrators, concrete scales, and visual analogies...
  • NLP Startup Funding in 2022
    I track company funding and acquisitions in the natural language processing space. In 2022, I found just over 340 relevant funding events, ranging from pre-seed funding all the way through to late-stage Series E and F rounds. In this article, I focus in on early-stage companies: specifically, those who reported pre-seed funding, seed funding or Series A funding rounds...I attempt to impose some organisation and structure over the offerings of these companies, with the aim of highlighting the technology and application areas that have been considered worthy of investment over the last twelve months...


 

A Message from this week's Sponsor:

 



Get Your Models Into Production Faster With Encord

Forget about fragmented tools and notebooks for creating your active learning pipelines.

Encord is a single integrated platform that makes it quicker and easier to build production computer vision models using active learning pipelines.

Encord helps you streamline your machine learning projects, giving you a single platform for labeling any visual data, managing annotators, improving training data quality and debugging your datasets and models.

Get in touch to arrange your free trial of Encord and see how we can help you get your models into production faster.


 

Data Science Articles & Videos

 
  • State Space Model Book Club
    Our causal inference book club was a success. Over 300 people took part and every session was well attended! So we're going to do this again. This time we'll focus on State Space Models, and specifically on Dynamax, a new library that makes these simple to use in a modern data stack...read on for why we picked this topic and the details of this next phase of our book club...
  • Bringing "balance" to your data
    In research and data science, we sometimes encounter biased data: that is, data that has not been sampled completely randomly and suffers from an over- or under-indexing toward the population of interest...With survey data playing a key role in research and product work at Meta, we observed a growing need for software tools that make survey statistics methods accessible for researchers and engineers. This has led us to develop “balance”: A Python package for adjusting biased data samples. In balance we introduce a simple easy-to-use framework for weighting data and evaluating its biases with and without adjustments...
  • nanoGPT
    The simplest, fastest repository for training/finetuning medium-sized GPTs. It is a rewrite of minGPT that prioritizes teeth over education. Still under active development, but currently the file train.py reproduces GPT-2 (124M) on OpenWebText, running on a single 8XA100 40GB node in 38 hours of training. The code itself is plain and readable: train.py is a ~300-line boilerplate training loop and model.py a ~300-line GPT model definition, which can optionally load the GPT-2 weights from OpenAI. That's it...
  • Language Models are Drummers: Drum Composition with Natural Language Pre-Training
    Automatic music generation with artificial intelligence typically requires a large amount of data which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances...
  • Understanding Inverse Probability of Treatment Weighting (IPTW) in Causal Inference
    In this post I will provide an intuitive and illustrated explanation of inverse probability of treatment weighting (IPTW), which is one of various propensity score (PS) methods. IPTW is an alternative to multivariate linear regression in the context of causal inference, since both attempt to ascertain the effect of a treatment on an outcome in the presence of confounds. It is important to note the current evidence does not support the claim that IPTW is superior to multivariate linear models (Glynn et al., 2006). However, IPTW does confer certain theoretical and practical benefits that we will review in this post...
  • Seven ways humanists are using computers to understand text
    The image below is a map of a few things you might do with text...The idea is to give you a loose sense of how different activities are related to different disciplinary traditions. We’ll start in the center, and spiral out; this is just a way to organize discussion, and isn’t necessarily meant to suggest a sequential work flow...
  • How to Objectively Compare Two Ranked Lists in Python
    A simplified explanation and implementation of Rank Biased Overlap...imagine you and your friend have both watched all 8 Harry Potter movies...But there’s a catch: you have watched each movie the day it was released, without missing a single premiere...Your friend, however, watched the 2nd movie first, then the 4th and 5th, and then binge-watched the rest when it was available on Netflix...Theoretically, you and your friend are on an equal footing, since both have watched all the movies of the series...Is it really equal though?...
  • Forecasting Potential Misuses of Language Models for Disinformation Campaigns — and How to Reduce Risk
    OpenAI researchers collaborated with Georgetown University’s Center for Security and Emerging Technology and the Stanford Internet Observatory to investigate how large language models might be misused for disinformation purposes. The collaboration included an October 2021 workshop bringing together 30 disinformation researchers, machine learning experts, and policy analysts, and culminated in a co-authored report building on more than a year of research. This report outlines the threats that language models pose to the information environment if used to augment disinformation campaigns and introduces a framework for analyzing potential mitigations...
  • Announcing new R Shiny UI components
    I’m thrilled to share that the latest release of the {bslib} R package introduces a new Card API, Value boxes, and a responsive grid-like layout. These new UI components work in Shiny, R Markdown, Quarto (or really any R-based HTML project) and work best alongside the new {bsicons} package (an R interface to Bootstrap icons) as well as the latest versions of {htmlwidgets} and {shiny}...
  • Numerical Marvels Inside Python [Video]
    Speaker Raymond Hettinger has been a prolific contributor to the CPython project for over a decade, having implemented and maintained many of Python's great features. He has been instrumental in modules like bisect, collections, decimal, functools, itertools, math, random, with types like namedtuple, sets, dictionaries, and in many other places around the codebase. He has contributed to the modification of nearly 90,000 lines of code in the CPython repository, and has made over 160 changes in the PEP repository...
  • Superposition, Memorization, and Double Descent
    In this note, we offer a very preliminary investigation of training the same toy models in our previous paper on limited datasets. Despite being extremely simple, the toy model turns out to be a surprisingly rich case study for overfitting. In particular, we find the following: a) Overfitting corresponds to storing data points, rather than features, in superposition, b) Depending on dataset size, our models fall into two different regimes: an overfitting regime (characterized by storing data points in superposition), and a generalizing regime (characterized by storing features in superposition), and c) We observe double descent as the model transitions between these regimes...
  • Self-serve feature platforms: architectures and APIs
    This post consists of two parts. The first part discusses the evolution of feature platforms and how they differ from model platforms and feature stores. The second part discusses the core challenges of making feature platforms self-serve for data scientists and increasing the iteration speed for feature engineering...
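
For readers curious about the Rank Biased Overlap item above, here is a minimal Python sketch of the idea. It is not the linked article's code. It computes only the truncated form of the measure (no extrapolation beyond the shorter list), and the function name `rbo_truncated`, the default `p=0.9`, and the friend's viewing order after the first three movies are all assumptions made for illustration.

```python
def rbo_truncated(list_a, list_b, p=0.9):
    """Truncated Rank Biased Overlap of two duplicate-free ranked lists.

    At each depth d, the agreement is the share of items common to the
    top-d prefixes of both lists. Agreements are averaged with
    geometrically decaying weights p**(d-1), so the top ranks count most.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")

    depth = min(len(list_a), len(list_b))
    seen_a, seen_b = set(), set()
    overlap = 0          # size of the intersection of the two prefixes
    weighted_sum = 0.0

    for d in range(1, depth + 1):
        item_a, item_b = list_a[d - 1], list_b[d - 1]
        if item_a == item_b:
            overlap += 1
        else:
            # Each new item adds to the overlap if the other list
            # has already ranked it at an earlier depth.
            overlap += item_a in seen_b
            overlap += item_b in seen_a
        seen_a.add(item_a)
        seen_b.add(item_b)
        weighted_sum += p ** (d - 1) * overlap / d

    return (1 - p) * weighted_sum


# Viewing orders from the Harry Potter example (movie numbers).
you = [1, 2, 3, 4, 5, 6, 7, 8]
# Assumption: after movies 2, 4 and 5, the friend watched the rest in order.
friend = [2, 4, 5, 1, 3, 6, 7, 8]

score = rbo_truncated(you, friend)
print(f"Truncated RBO: {score:.3f}")
```

Because the sum stops at the length of the shorter list, two identical lists of length n score 1 - p**n rather than exactly 1. Values of p closer to 1 spread the weight further down the lists.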


 

Tool*

 



Build powerful ML visualizations with Comet

With just 2 lines of code, Comet automatically logs metrics, hyperparameters, libraries, and more. This means automatic chart generation so you can easily manage training runs in real time. When you combine that with:
  • built-in visualizations (like the image panel),
  • custom project views, and
  • your own python panels,
Comet is a powerful tool for optimizing your ML workflow. All for free! Less friction, more ML.

Create your free account.


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



 

Jobs

 
  • Data Scientist / Machine Learning Engineer - Epsilon - NYC

    Epsilon Strategy and Insights, Data Sciences team is looking for a talented team player in a Data Scientist/Machine Learning Engineer role. You are an expert, mentor and advocate. You have a strong machine learning and deep learning background and are passionate about transforming data into ML models. You welcome the challenge of data science and are proficient in Python, Spark MLlib, TensorFlow, Keras, ML algorithms and Deep Neural Networks, Big Data. You must be self-driven, take initiative and want to work in a dynamic, busy and innovative group...
     
Want to post a job here? Email us for details --> team@datascienceweekly.org



 

Training & Resources

 
  • University of Washington's LING 575: NLP for Cultural Analytics
    Surveys tools, frameworks, and skills needed to apply natural language processing methods to applications in the humanities and social sciences, with a focus on the analysis of large digital text corpora, including social media, literature, and historical documents. Topics will include data collection, text processing and machine learning techniques, data visualization, and ethical considerations...
  • Stanford's CS324 - Large Language Models
    In this course, students will learn the fundamentals about the modeling, theory, ethics, and systems aspects of large language models, as well as gain hands-on experience working with them...
  • Software Engineering at Google
    In March, 2020, we published a book titled “Software Engineering at Google” curated by Titus Winters, Tom Manshreck and Hyrum Wright...The Software Engineering at Google book (“SWE Book”) is not about programming, per se, but about the engineering practices utilized at Google to make their codebase sustainable and healthy. (These practices are paramount for common infrastructural code such as Abseil.)...We are happy to announce that we are providing a digital version of this book in HTML free of charge...
 


 



 


P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)
All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
