Data Science Weekly - Data Science Weekly - Issue 464

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments
Email not displaying correctly?
View it in your browser.

Issue #464

October 13 2022

Editor's Picks

 

  • R @ AZ: Building a Community in the Pharmaceutical Industry
    Back in early 2021, a few of us at AstraZeneca got together and reflected on the growing importance of R in our organization. R was far from being a new thing at AstraZeneca and a long way from being a fad...However, we didn’t know who the R users at AstraZeneca were, and there wasn’t any kind of forum where R users could meet...So we decided it was time to start a community of R users at AstraZeneca...then the question was: how to get started?...
  • A Framework for Generating Research Ideas [Google Doc]
    Coming up with good research ideas, especially when you’re new to a field, is tough – it requires an understanding of gaps in literature. However, the process of generating research ideas can start after reading a single research paper. In this lecture, I share a set of frameworks with you to help you generate your own research ideas. First, you will learn to apply a framework to identify gaps in a research paper, including in the research question, experimental setup, and findings. Then, you will learn to apply a framework to generate ideas to build on a research paper, thinking about the elements of the task of interest, evaluation strategy and the proposed method. Finally, you will learn to apply a framework to iterate on your ideas to improve their quality...
  • Seven Sins of Numerical Linear Algebra
    In numerical linear algebra we are concerned with solving linear algebra problems accurately and efficiently and understanding the sensitivity of the problems to perturbations. We describe seven sins, whereby accuracy or efficiency is lost or misleading information about sensitivity is obtained...
 
 

A Message from this week's Sponsor:

 



Pinecone vector database

The Pinecone vector database makes it easy to build high-performance vector search applications. Developer-friendly, fully managed, and easily scalable without infrastructure hassles.

Use Pinecone to build semantic search, object recognition, recommendations, anomaly detection, and other vector-based functionality into your applications.

 

 

Data Science Articles & Videos

 
  • The Contract-Powered Data Platform
    A very significant amount of my time over the past five years has been dedicated to data platform automation, reducing cross-team friction, and improving data quality...Schemas have played a critical role in the process; this post outlines the why and the how. But before diving straight into the role of schemas (er... contracts) let's talk data platforms...
  • How dbt fails
    In the data industry, no company is more likely to end up in an HBS classroom than dbt Labs...Though I don’t think that it’ll happen, there is some dark timeline out there that ends with a room full of twenty-something MBA students having a snotty debate about how they would’ve saved dbt Labs from its eventual fall from grace. Prior to that discussion, they’ll all read a case study that chronicles what went wrong...This is my guess as to what it’ll say...
  • Introduction to the Conjugate Gradient Method Without the Agonizing Pain
    The Conjugate Gradient Method is the most prominent iterative method for solving sparse systems of linear equations. Unfortunately, many textbook treatments of the topic are written with neither illustrations nor intuition, and their victims can be found to this day babbling senselessly in the corners of dusty libraries. For this reason, a deep, geometric understanding of the method has been reserved for the elite brilliant few who have painstakingly decoded the mumblings of their forebears. Nevertheless,the Conjugate Gradient Method is a composite ofsimple, elegantideas that almost anyone can understand. Of course, a reader as intelligent as yourself will learn them almost effortlessly...
  • Why Data Cleaning is Failing Your ML Models – And What To Do About It
    Model accuracy doesn’t start or end with data cleaning in your notebook with the few tables you use to inform, train, and validate your model. It starts with the ETL pipeline and the instant you choose what to measure to solve your problem...Let’s walk through a semi-hypothetical scenario that contains real examples we’ve seen in the wild to highlight some common failure points. We’ll then discuss how they can be avoided with an organizational commitment to high-quality data...
  • Bad Data, or Interesting Fact?
    Data quality is definitely boring...What makes things interesting is the combination of problem solving and interesting facts...Data quality techniques can be used to find interesting insights...
  • How the Guardian approaches quote extraction with NLP
    A recent trend for media companies is to explore how fields like Natural Language Processing (NLP) and Information Extraction (IE) can modularize content like a long-form article as reusable elements for different storytelling formats (e.g., a podcast, information graphic, or blog). This push is called modular journalism and many media companies are building towards it to automate customized stories to meet individual user needs for a variety of media forms...
  • Microsoft adds DALL-E to its Office suite
    Microsoft is adding AI-generated art to its suite of Office software with a new app named Microsoft Designer...The app functions the same way as AI text-to-image models like DALL-E and Stable Diffusion, letting users type prompts to “instantly generate a variety of designs with minimal effort.”...
  • Pooling In Convolutional Neural Networks
    In this article, we explore the whys and the hows behind the fundamental process of pooling in CNN architectures, and then compare two common techniques: max and average pooling...
  • Low-Rank Approximation Toolbox: Nyström Approximation
    As I discussed in a previous post, many matrices we encounter in applications are well-approximated by a matrix with a small rank. Efficiently computing low-rank approximations has been a major area of research, with applications in everything from classical problems in computational physics and signal processing to trendy topics like data science. In this series, I want to explore some broadly useful algorithms and theoretical techniques in the field of low-rank approximation...
  • Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning
    No-press Diplomacy is a complex strategy game involving both cooperation and competition that has served as a benchmark for multi-agent AI research. While self-play reinforcement learning has resulted in numerous successes in purely adversarial games like chess, Go, and poker, self-play alone is insufficient for achieving optimal performance in domains involving cooperation with humans. We address this shortcoming by first introducing a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy. We prove that this is a no-regret learning algorithm under a modified utility function...
  • Building Billion-Scale Vector Search
    Did you know that algorithms and data structures for approximate vector search can search across billions of vectors in high dimensional space in a few milliseconds?...In this blog post, we look at how fast is fast enough in the context of vector search, and how the answer to this question impacts how we design and build a billion-scale vector search solution...
  • gganimate
    gganimate extends the grammar of graphics as implemented by ggplot2 to include the description of animation. It does this by providing a range of new grammar classes that can be added to the plot object in order to customise how it should change with time...
 
 

Summit*

 


Register for IMPACT 2022: The Data Observability Summit

Join thousands of professionals for a virtual event October 25-26 to learn how to drive real-world impact with your data at scale.

Get inspired with virtual keynotes from Nate Silver, the FiveThirtyEight founder and editor-in-chief, Daniel Kahneman, the Nobel Prize-winning psychologist, economist, and author of Thinking, Fast and Slow. Hear from the founders and chief executives of Databricks, Looker, Confluent, dbt Labs, and Fivetran about the industry's hottest technologies. Leverage best practices from leaders heading the industry’s top data organizations including The New York Times, Roche, and GitLab.

RSVP at impactdatasummit.com/2022



*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 

 

Conference*

 



Global AI Developer Days – 26 October 2022

Join the Global AI Community for a day of inspiring keynotes from industry leaders with a high focus on AI developers.

Highlights during this 3-hour conference include responsible AI by Ruth Yakubu Principal Cloud Advocate at Microsoft. She will talk about how to improve fairness and reliability of AI solutions. Eric Boyd, Corporate Vice President at Microsoft, will show all the latest inventions in Azure AI. Manuvir Das, Head of Enterprise Computing at NVIDIA, takes you on a journey through the new era of AI for developers and many more leaders from the AI community will share their vision.

Don’t miss out on this free day of learning from top leaders in the AI space!

https://devdays.globalai.community


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 

 

Jobs

 
  • Data Scientist - Mount Sinai Data Commons - NYC

    A position is available for an individual with skills in data science, bioinformatics and software engineering to play the key role in running and managing the Mount Sinai Data Commons – known as the Data Ark. The Data Ark team brings together all the most important data sets used by Sinai researchers (e.g. 1000G, GTEx, UK Biobank) in a single location on our HPC server (minvera.org), performs QA/QC processing of the data, conducts initial demographics analyses to showcase the different data sets, and will be tasked with expanding the data commons to host a large range of different data sets of different types (genotype, WES, WGS, RNA-seq, EHR-linked, imaging etc.), which will come with their own computational and platform challenges...
     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 

 

Training & Resources

 
  • Great RStudio Shortcuts
    There are so many RStudio shortcuts out there but these are my TOP5 that I regularly use 😊...I love them because they usually make your life easier...
  • Andrej Karpathy - Becoming a Backprop Ninja [Video]
    We take the 2-layer MLP (with BatchNorm) from the previous video and backpropagate through it manually without using PyTorch autograd's loss.backward(): through the cross entropy loss, 2nd linear layer, tanh, batchnorm, 1st linear layer, and the embedding table. Along the way, we get a strong intuitive understanding about how gradients flow backwards through the compute graph and on the level of efficient Tensors, not just individual scalars like in micrograd. This helps build competence and intuition around how neural nets are optimized and sets you up to more confidently innovate on and debug modern neural networks...
 
 

What you’re up to – notes from DSW readers

 
  • Fill out the form below to appear here :) ...
 

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

 

Last Week's Newsletter's 3 Most Clicked Links

   

* Based on unique clicks.

** Find last week's newsletter here.



 

Cutting Room Floor

 


P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Follow on Twitter
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
unsubscribe from this list    update subscription preferences 

Older messages

Data Science Weekly - Issue 463

Thursday, October 6, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #463 October 06 2022 Editor's Picks

Data Science Weekly - Issue 462

Thursday, September 29, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #462 September 29 2022 Editor's

Data Science Weekly - Issue 461

Friday, September 23, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #461 September 22 2022 Editor's

Data Science Weekly - Issue 460

Thursday, September 15, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #460 September 15 2022 Editor's

Data Science Weekly - Issue 459

Thursday, September 8, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #459 September 08 2022 Editor's

You Might Also Like

📧 Building Async APIs in ASP.NET Core - The Right Way

Saturday, November 23, 2024

​ Building Async APIs in ASP .NET Core - The Right Way Read on: m​y website / Read time: 5 minutes The .NET Weekly is brought to you by: Even the smartest AI in the world won't save you from a

WebAIM November 2024 Newsletter

Friday, November 22, 2024

WebAIM November 2024 Newsletter Read this newsletter online at https://webaim.org/newsletter/2024/november Features Using Severity Ratings to Prioritize Web Accessibility Remediation When it comes to

➡️ Why Your Phone Doesn't Want You to Sideload Apps — Setting the Default Gateway in Linux

Friday, November 22, 2024

Also: Hey Apple, It's Time to Upgrade the Macs Storage, and More! How-To Geek Logo November 22, 2024 Did You Know Fantasy author JRR Tolkien is credited with inventing the main concept of orcs and

JSK Daily for Nov 22, 2024

Friday, November 22, 2024

JSK Daily for Nov 22, 2024 View this email in your browser A community curated daily e-mail of JavaScript news React E-Commerce App for Digital Products: Part 4 (Creating the Home Page) This component

Spyglass Dispatch: The Fate of Chrome • Amazon Tops Up Anthropic • Pros Quit Xitter • Brave Powers AI Search • Apple's Lazy AI River • RIP Enrique Allen

Friday, November 22, 2024

The Fate of Chrome • Amazon Tops Up Anthropic • Pros Quit Xitter • Brave Powers AI Search • Apple's Lazy AI River • RIP Enrique Allen The Spyglass Dispatch is a free newsletter sent out daily on

Charted | How the Global Distribution of Wealth Has Changed (2000-2023) 💰

Friday, November 22, 2024

This graphic illustrates the shifts in global wealth distribution between 2000 and 2023. View Online | Subscribe | Download Our App Presented by: MSCI >> Get the Free Investor Guide Now FEATURED

Daily Coding Problem: Problem #1616 [Easy]

Friday, November 22, 2024

Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Alibaba. Given an even number (greater than 2), return two prime numbers whose sum will

The problem to solve

Friday, November 22, 2024

​ Use problem framing to define the problem to solve This week, Tom Parson and Krishna Raha share tools and frameworks to identify and address challenges effectively, while Voltage Control highlights

Issue #568: Random mazes, train clock, and ReKill

Friday, November 22, 2024

View this email in your browser Issue #568 - November 22nd 2024 Weekly newsletter about Web Game Development. If you have anything you want to share with our community please let me know by replying to

Whats Next for AI: Interpreting Anthropic CEOs Vision

Friday, November 22, 2024

Top Tech Content sent at Noon! How the world collects web data Read this email in your browser How are you, @newsletterest1? 🪐 What's happening in tech today, November 22, 2024? The HackerNoon