Data Science Weekly - Issue 448

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #448

June 23 2022

Editor Picks

 
  • Machine Learning and Tax Enforcement
    Each year, the Internal Revenue Service receives over 3 billion information returns, such as W-2s and 1099-INTs, from employers, banks, and other entities...In 2021, the Biden administration proposed that a portion of its request for a 55 percent boost (after adjusting for inflation) to the IRS budget over the next decade be used for developing machine learning. If successful, machine learning would marshal the vast trove of data currently received by the IRS to achieve more targeted and productive enforcement actions...
  • The Annotated Diffusion Model
    In this blog post, we'll take a deeper look into Denoising Diffusion Probabilistic Models (also known as DDPMs, diffusion models, score-based generative models or simply autoencoders), as researchers have been able to achieve remarkable results with them for (un)conditional image/audio/video generation. Popular examples (at the time of writing) include GLIDE and DALL-E 2 by OpenAI, Latent Diffusion by the University of Heidelberg and Imagen by Google Brain...
  • How fast can we perform a forward pass?
    Over the last month, I’ve spent a lot of time trying to answer the following question: How quickly can we perform one forward pass in a transformer model?...By a transformer model, I mean BERT, GPT-3, T5, Chinchilla, or other large language models that use a transformer architecture. By a forward pass, I mean the computation needed to generate the next token given all the tokens so far.[1] By “how quickly”, I mean how much wall clock time elapses between the call to the forward pass and its completion...
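    The core of a DDPM is a fixed forward process that gradually adds Gaussian noise to the data. Because that process has a closed form, a noisy sample at any step t can be drawn directly as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, where alpha_bar_t is the cumulative product of (1 - beta_t). The NumPy sketch below illustrates only this forward step; it is not the post's PyTorch code, and the linear schedule from 1e-4 to 0.02 over 1,000 steps is an assumption borrowed from the original DDPM paper. A denoising network would then be trained to predict eps from x_t and t.

```python
import numpy as np


def linear_beta_schedule(timesteps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    """Noise variances beta_1..beta_T, spaced linearly (assumed schedule)."""
    return np.linspace(beta_start, beta_end, timesteps)


def q_sample(x0: np.ndarray, t: int, alphas_cumprod: np.ndarray,
             noise: np.ndarray) -> np.ndarray:
    """Draw x_t ~ q(x_t | x_0) in closed form for a single (0-indexed) step t."""
    alpha_bar = alphas_cumprod[t]
    # Scale the clean signal down and mix in Gaussian noise so that the
    # total variance stays ~1 when x0 has unit variance.
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise


T = 1000
betas = linear_beta_schedule(T)
alphas_cumprod = np.cumprod(1.0 - betas)  # alpha_bar_t for t = 0..T-1

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))      # stand-in for a normalized image
noise = rng.standard_normal(x0.shape)
x_early = q_sample(x0, 0, alphas_cumprod, noise)     # almost the clean image
x_late = q_sample(x0, T - 1, alphas_cumprod, noise)  # almost pure noise
```

    Sampling any x_t in one shot, rather than looping through t steps, is what makes training cheap: each minibatch can pick random timesteps independently.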
 
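    A common back-of-envelope model for the forward-pass question above (not necessarily the author's own analysis) bounds one decoding step from below by two costs: reading every weight from GPU memory once, and doing roughly 2 floating-point operations per parameter per token. The Python sketch below takes the larger of the two. The hardware figures in the example (about 2 TB/s of memory bandwidth and 312 TFLOP/s of fp16 compute per GPU, roughly A100-class) and the 8-way split are illustrative assumptions, and the model ignores the attention KV cache, inter-GPU communication, and kernel overheads.

```python
def forward_pass_lower_bound(n_params: float, bytes_per_param: float,
                             n_gpus: int, mem_bw: float, peak_flops: float,
                             batch_size: int = 1) -> float:
    """Rough lower bound, in seconds, on one forward pass (one new token per sequence).

    mem_bw is per-GPU memory bandwidth in bytes/s; peak_flops is per-GPU
    peak throughput in FLOP/s. Weights are assumed to be split evenly
    across GPUs with perfect parallelism.
    """
    # Memory-bound term: every weight is streamed from memory once per step.
    memory_time = n_params * bytes_per_param / (n_gpus * mem_bw)
    # Compute-bound term: ~2 FLOPs (one multiply, one add) per parameter
    # for each token in the batch.
    compute_time = 2 * n_params * batch_size / (n_gpus * peak_flops)
    # The step cannot finish faster than the slower of the two.
    return max(memory_time, compute_time)


# Illustrative: a 175B-parameter model in fp16 (2 bytes/param) on 8 GPUs.
t_single = forward_pass_lower_bound(175e9, 2, n_gpus=8, mem_bw=2e12,
                                    peak_flops=312e12)
print(f"batch 1: {t_single * 1e3:.1f} ms")     # prints "batch 1: 21.9 ms"
t_batch = forward_pass_lower_bound(175e9, 2, n_gpus=8, mem_bw=2e12,
                                   peak_flops=312e12, batch_size=1024)
print(f"batch 1024: {t_batch * 1e3:.1f} ms")   # prints "batch 1024: 143.6 ms"
```

    At batch size 1 the memory term dominates, which is why small-batch decoding is usually described as memory-bandwidth bound; at large batch sizes the compute term takes over.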
 

A Message from this week's Sponsor:

 



Online Data Science Programs from Drexel University

Find your algorithm for success with an online data science degree from Drexel University. Gain essential skills in tool creation and development, data and text mining, trend identification, and data manipulation and summarization, using leading industry technology that you can apply to your career. Learn more.

 

 

Data Science Articles & Videos

 
  • Condemning the deployment of GPT-4chan
    The deployment of GPT-4chan is a clear example of irresponsible practice. GPT-4chan is a language model that Yannic Kilcher trained on over three million 4chan threads from the Politically Incorrect /pol/ board, a community full of racist, sexist, xenophobic, and hateful speech that has been linked to white-supremacist violence such as the Buffalo shooting last month. He then used GPT-4chan to generate and deceptively post over 30,000 posts on 4chan mimicking the hateful comments it was trained on without identifying the model as a bot. Kilcher now claims that the release of “the most horrible model on the internet” was “a prank and light-hearted trolling.”...Kilcher’s decision to deploy this bot does not meet any test of reasonableness. His actions deserve censure. He undermines the responsible practice of AI science. If you agree with this statement, please fill out this form to sign it...
  • Mapping Urban Trees Across North America with the Auto Arborist Dataset
    Today we introduce the Auto Arborist Dataset, a multiview urban tree classification dataset that, at ~2.6 million trees and >320 genera, is two orders of magnitude larger than those in prior work. To build the dataset, we pulled from public tree censuses from 23 North American cities and merged these records with Street View and overhead RGB imagery. As the first urban forest dataset to cover multiple cities, it allows us to analyze in detail how forest models generalize with respect to geographic distribution shifts, which is crucial to building systems that scale. We are releasing all 2.6M tree records publicly, along with aerial and ground-level imagery for 1M trees...
  • Learning to Infer Structures of Network Games
    Strategic interactions between a group of individuals or organisations can be modelled as games played on networks, where a player's payoff depends not only on their actions but also on those of their neighbours. Inferring the network structure from observed game outcomes (equilibrium actions) is an important problem with numerous potential applications in economics and social sciences...
  • How do you ace your SQL skills? [Reddit Discussion]
    I am asking about mastering them, like writing queries with varying levels of complexity. Some of the technical analysts I've worked with have written the most mind-blowing scripts with ease. I work with databases daily and want to reach that level of proficiency. I am familiar with SQL, but I want to take it to the next level. Could you suggest the best places to start exploring, and the strategies that worked for you to improve your SQL skills...
  • Lyapunov Density Models: Constraining Distribution Shift in Learning-Based Control
    When deploying learning-based controllers, we seek a mechanism to constrain the agent to states and actions that resemble those in the training data...However, in order for an agent to remain in-distribution throughout its trajectory, the agent must not only avoid visiting states and actions that are out-of-distribution...We present Lyapunov density models (LDMs): a generalization of control Lyapunov functions and density models that provides guarantees on an agent's ability to stay in-distribution over its entire trajectory...
  • Diagram as Code
    Diagrams lets you draw cloud system architecture in Python code. It was born for prototyping a new system architecture design without any design tools. You can also describe or visualize an existing system architecture. Diagrams currently supports major providers, including AWS, Azure, GCP, Kubernetes, Alibaba Cloud, Oracle Cloud, etc. It also supports on-premise nodes, SaaS, and major programming frameworks and languages...
  • Parti - Pathways Autoregressive Text-to-Image Model
    We introduce the Pathways Autoregressive Text-to-Image model (Parti), an autoregressive text-to-image generation model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge...
  • The State of Data Engineering 2022
    A year has passed since we shared the State of Data Engineering 2021...It was another year worthy of its own prime-time drama, and we’re back to share our updated, digestible snapshot of it all!...
  • Outsider Oversight: Designing a Third Party Audit Ecosystem for AI Governance
    Much attention has focused on algorithmic audits and impact assessments to hold developers and users of algorithmic systems accountable. But existing algorithmic accountability policy approaches have neglected the lessons from non-algorithmic domains: notably, the importance of interventions that allow for the effective participation of third parties. Our paper synthesizes lessons from other fields on how to craft effective systems of external oversight for algorithmic deployments...
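    To make the network-games setup concrete, consider a linear-quadratic network game, a standard textbook model assumed here for illustration (the paper may consider other game types). Player i chooses an action a_i with payoff b_i * a_i - a_i^2 / 2 + beta * a_i * sum_j G_ij * a_j, so the first-order conditions give a = b + beta * G a, i.e. the equilibrium a = (I - beta * G)^(-1) b, which is unique when beta times the spectral radius of G is below 1. The NumPy sketch below simulates equilibria on a small random network and then recovers G by ordinary least squares. This inverse step is a simple baseline that assumes the marginal benefits b and the strength beta are observed; it is not the learning method proposed in the paper, and the sizes n, K, and beta are hypothetical.

```python
import numpy as np


def equilibrium_actions(G: np.ndarray, b: np.ndarray, beta: float) -> np.ndarray:
    """Nash equilibrium of a linear-quadratic network game.

    Solves (I - beta * G) a = b; b may be a vector or an (n, K) matrix
    holding K independent games played on the same network.
    """
    n = G.shape[0]
    return np.linalg.solve(np.eye(n) - beta * G, b)


def infer_network(A: np.ndarray, B: np.ndarray, beta: float) -> np.ndarray:
    """Least-squares estimate of G from K observed games (simple baseline).

    A, B: (n, K) equilibrium actions and marginal benefits. From
    A = B + beta * G @ A it follows that G @ A = (A - B) / beta.
    """
    Y = (A - B) / beta
    # Solve A.T @ G.T = Y.T in the least-squares sense, then transpose back.
    G_T, *_ = np.linalg.lstsq(A.T, Y.T, rcond=None)
    return G_T.T


rng = np.random.default_rng(42)
n, K, beta = 6, 40, 0.1  # hypothetical; beta * max degree < 1 keeps equilibria unique

# Random undirected network without self-loops.
upper = np.triu((rng.random((n, n)) < 0.5).astype(float), k=1)
G = upper + upper.T

B = rng.standard_normal((n, K))      # marginal benefits for K games
A = equilibrium_actions(G, B, beta)  # observed equilibrium actions
G_hat = infer_network(A, B, beta)
print(np.round(G_hat, 3))
```

    With noise-free equilibria and more games than players (K > n), this estimate matches G up to floating-point error; with noisy observations or unobserved b, the simple inversion no longer applies directly.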
 
 

Course*

 


Land Your Dream Job with TDI

One Week Left for Priority Enrollment to Our Data Bootcamps!

Apply by July 1 to earn our coveted priority enrollment package and you’ll get:
  • Up to $2k of tuition
  • Early access to our 12-day Python bootcamp
  • Premier access to our resume review services
  • An early chance to join our Discord to chat with peers before the course even starts.
  • Did we mention you can also increase your chances of getting a full-tuition scholarship?
What are you waiting for? Early application closes on July 1 so don’t wait!
Apply Now.


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Senior Data Scientist, Startup Creation at Redesign Health - US

    As the Senior Data Scientist on our Startup Creation team, you will set up and configure the data infrastructure for our startups, work with the startup founding team to define data-driven KPIs, and implement automated statistical analyses of customer behavior. Your goal is to make every company we launch data-driven from day one.

    In this role, you will function as an in-house implementation team for the companies that Redesign Health launches (internally referred to as OpCos). We provide data strategy, data pipeline, data analytics and forecasting services to newly formed companies in a repeatable and scalable manner...

     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 
 

Training & Resources

 
  • OpenFold - Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
    OpenFold carefully reproduces (almost) all of the features of the original open source inference code (v2.0.1). The sole exception is model ensembling, which fared poorly in DeepMind's own ablation testing and is being phased out in future DeepMind experiments. It is omitted here for the sake of reducing clutter. In cases where the Nature paper differs from the source, we always defer to the latter...
 
 

What you’re up to – notes from DSW readers

 
  • Working on something cool? Let us know here :) ...
 

* To share your projects and updates, submit the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

 

Last Week's Newsletter's 3 Most Clicked Links

 

* Based on unique clicks.

** Find last week's newsletter here.

 

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
