Data Science Weekly - Issue 448

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #448

June 23 2022

Editor Picks

 
  • Machine Learning and Tax Enforcement
    Each year, the Internal Revenue Service receives over 3 billion information returns, such as W-2s and 1099-INTs, from employers, banks, and other entities...In 2021, the Biden administration proposed that a portion of its request for a 55 percent boost (after adjusting for inflation) to the IRS budget over the next decade be used for developing machine learning. If successful, machine learning would marshal the vast trove of data currently received by the IRS to achieve more targeted and productive enforcement actions...
  • The Annotated Diffusion Model
    In this blog post, we'll take a deeper look into Denoising Diffusion Probabilistic Models (also known as DDPMs, diffusion models, score-based generative models or simply autoencoders) as researchers have been able to achieve remarkable results with them for (un)conditional image/audio/video generation. Popular examples (at the time of writing) include GLIDE and DALL-E 2 by OpenAI, Latent Diffusion by the University of Heidelberg and Imagen by Google Brain...
  • How fast can we perform a forward pass?
    Over the last month, I’ve spent a lot of time trying to answer the following question: How quickly can we perform one forward pass in a transformer model?...By a transformer model, I mean BERT, GPT-3, T5, Chinchilla, or other large language models that use a transformer architecture. By a forward pass, I mean the computation needed to generate the next token given all the tokens so far. By “how quickly”, I mean how much wall clock time elapses between the call to the forward pass and its completion...
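The third pick defines "how quickly" as the wall clock time between the call to the forward pass and its completion. A minimal sketch of that measurement follows. It is not the article's method: `time_forward_pass` and `toy_forward` are hypothetical names, and the toy function is only a stand-in for a real transformer call.

```python
import statistics
import time


def time_forward_pass(forward, *args, warmup=3, repeats=10):
    """Return the median wall-clock seconds taken by one call to `forward`."""
    # Untimed warm-up calls so one-off setup costs do not skew the samples.
    for _ in range(warmup):
        forward(*args)

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        forward(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def toy_forward(tokens):
    # Hypothetical stand-in for a model: maps the tokens seen so far
    # to a "next token" id. A real model call would go here instead.
    return sum(t * t for t in tokens) % 50257


if __name__ == "__main__":
    seconds = time_forward_pass(toy_forward, list(range(1024)))
    print(f"median forward pass: {seconds * 1e6:.1f} us")
```

The median is used rather than the mean so that a single slow run has little effect on the reported figure. When timing work on an accelerator, the device would also need to be synchronized before each clock read.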
 
 

A Message from this week's Sponsor:

 



Online Data Science Programs from Drexel University

Find your algorithm for success with an online data science degree from Drexel University. Gain essential skills in tool creation and development, data and text mining, trend identification, and data manipulation and summarization by using leading industry technology to apply to your career. Learn more.

 

 

Data Science Articles & Videos

 
  • Condemning the deployment of GPT-4chan
    The deployment of GPT-4chan is a clear example of irresponsible practice. GPT-4chan is a language model that Kilcher trained on over three million 4chan threads from the Politically Incorrect /pol/ board, a community full of racist, sexist, xenophobic, and hateful speech that has been linked to white-supremacist violence such as the Buffalo shooting last month. He then used GPT-4chan to generate and deceptively post over 30,000 posts on 4chan mimicking the hateful comments it was trained on without identifying the model as a bot. Kilcher now claims that the release of “the most horrible model on the internet” was “a prank and light-hearted trolling.”...Kilcher’s decision to deploy this bot does not meet any test of reasonableness. His actions deserve censure. He undermines the responsible practice of AI science. If you agree with this statement, please fill out this form to sign it...
  • Mapping Urban Trees Across North America with the Auto Arborist Dataset
    Today we introduce the Auto Arborist Dataset, a multiview urban tree classification dataset that, at ~2.6 million trees and >320 genera, is two orders of magnitude larger than those in prior work. To build the dataset, we pulled from public tree censuses from 23 North American cities (shown above) and merged these records with Street View and overhead RGB imagery. As the first urban forest dataset to cover multiple cities, we analyze in detail how forest models can generalize with respect to geographic distribution shifts, crucial to building systems that scale. We are releasing all 2.6M tree records publicly, along with aerial and ground-level imagery for 1M trees...
  • Learning to Infer Structures of Network Games
    Strategic interactions between a group of individuals or organisations can be modelled as games played on networks, where a player's payoff depends not only on their actions but also on those of their neighbours. Inferring the network structure from observed game outcomes (equilibrium actions) is an important problem with numerous potential applications in economics and social sciences...
  • How do you ace your SQL skills? [Reddit Discussion]
    I am asking about mastering them, as in writing queries with varying levels of complexity. Some of the Technical Analysts I've worked with have written the most mind-blowing scripts with ease. I work with databases daily and want to reach that level of proficiency. I am familiar with SQL but I want to take it to the next level. Could you suggest the best places to start exploring, and the strategies that worked for you to improve your SQL skills...
  • Lyapunov Density Models: Constraining Distribution Shift in Learning-Based Control
    When deploying learning-based controllers, we seek a mechanism to constrain the agent to states and actions that resemble those in the training data...However, in order for an agent to remain in-distribution throughout its trajectory, the agent must not only avoid visiting states and actions that are out-of-distribution...We present Lyapunov density models (LDMs): a generalization of control Lyapunov functions and density models that provides guarantees on an agent's ability to stay in-distribution over its entire trajectory...
  • Diagram as Code
    Diagrams lets you draw the cloud system architecture in Python code. It was born for prototyping a new system architecture design without any design tools. You can also describe or visualize an existing system architecture. Diagrams currently supports the major providers, including AWS, Azure, GCP, Kubernetes, Alibaba Cloud, Oracle Cloud, etc... It also supports on-premise nodes, SaaS, and major programming frameworks and languages...
  • Parti - Pathways Autoregressive Text-to-Image Model
    We introduce the Pathways Autoregressive Text-to-Image model (Parti), an autoregressive text-to-image generation model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge...
  • The State of Data Engineering 2022
    A year has passed since we shared the State of Data Engineering 2021...It was another year worthy of its own prime-time drama, and we’re back to share our updated, digestible snapshot of it all!...
  • Outsider Oversight: Designing a Third Party Audit Ecosystem for AI Governance
    Much attention has focused on algorithmic audits and impact assessments to hold developers and users of algorithmic systems accountable. But existing algorithmic accountability policy approaches have neglected the lessons from non-algorithmic domains: notably, the importance of interventions that allow for the effective participation of third parties. Our paper synthesizes lessons from other fields on how to craft effective systems of external oversight for algorithmic deployments...
 
 

Course*

 


Land Your Dream Job with TDI

One Week Left for Priority Enrollment to Our Data Bootcamps!

Apply by July 1 to earn our coveted priority enrollment package and you’ll get:
  • Up to $2k of tuition
  • Early access to our 12-day Python bootcamp
  • Premier access to our resume review services
  • The early chance to join our Discord to chat with peers before the course even starts.
  • Did we mention you can also increase your chances of getting a full-tuition scholarship?
What are you waiting for? Early application closes on July 1 so don’t wait!
Apply Now.


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Senior Data Scientist, Startup Creation at Redesign Health - US

    As our Senior Data Scientist for our Startup Creation team, you will set up and configure the data infrastructure for our startups, and work with the startup founding team to define data driven KPIs, and implement automated statistical analyses of customer behavior. Your goal is to make all of the companies that we launch data-driven from day one.

    In this role, you will function as an in-house implementation team for the companies that Redesign Health launches (internally referred to as OpCos). We provide data strategy, data pipeline, data analytics and forecasting services to newly formed companies in a repeatable and scalable manner...

     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 
 

Training & Resources

 
  • OpenFold - Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
    OpenFold carefully reproduces (almost) all of the features of the original open source inference code (v2.0.1). The sole exception is model ensembling, which fared poorly in DeepMind's own ablation testing and is being phased out in future DeepMind experiments. It is omitted here for the sake of reducing clutter. In cases where the Nature paper differs from the source, we always defer to the latter...
 
 

What you’re up to – notes from DSW readers

 
  • Working on something cool? Let us know here :) ...
 

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

 


 

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
