Data Science Weekly - Issue 464

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #464

October 13 2022

Editor's Picks

 

  • R @ AZ: Building a Community in the Pharmaceutical Industry
    Back in early 2021, a few of us at AstraZeneca got together and reflected on the growing importance of R in our organization. R was far from being a new thing at AstraZeneca and a long way from being a fad...However, we didn’t know who the R users at AstraZeneca were, and there wasn’t any kind of forum where R users could meet...So we decided it was time to start a community of R users at AstraZeneca...then the question was: how to get started?...
  • A Framework for Generating Research Ideas [Google Doc]
    Coming up with good research ideas, especially when you’re new to a field, is tough – it requires an understanding of gaps in literature. However, the process of generating research ideas can start after reading a single research paper. In this lecture, I share a set of frameworks with you to help you generate your own research ideas. First, you will learn to apply a framework to identify gaps in a research paper, including in the research question, experimental setup, and findings. Then, you will learn to apply a framework to generate ideas to build on a research paper, thinking about the elements of the task of interest, evaluation strategy and the proposed method. Finally, you will learn to apply a framework to iterate on your ideas to improve their quality...
  • Seven Sins of Numerical Linear Algebra
    In numerical linear algebra we are concerned with solving linear algebra problems accurately and efficiently and understanding the sensitivity of the problems to perturbations. We describe seven sins, whereby accuracy or efficiency is lost or misleading information about sensitivity is obtained...
 
 

A Message from this week's Sponsor:

 



Pinecone vector database

The Pinecone vector database makes it easy to build high-performance vector search applications. Developer-friendly, fully managed, and easily scalable without infrastructure hassles.

Use Pinecone to build semantic search, object recognition, recommendations, anomaly detection, and other vector-based functionality into your applications.

 

 

Data Science Articles & Videos

 
  • The Contract-Powered Data Platform
    A very significant amount of my time over the past five years has been dedicated to data platform automation, reducing cross-team friction, and improving data quality...Schemas have played a critical role in the process; this post outlines the why and the how. But before diving straight into the role of schemas (er... contracts) let's talk data platforms...
  • How dbt fails
    In the data industry, no company is more likely to end up in an HBS classroom than dbt Labs...Though I don’t think that it’ll happen, there is some dark timeline out there that ends with a room full of twenty-something MBA students having a snotty debate about how they would’ve saved dbt Labs from its eventual fall from grace. Prior to that discussion, they’ll all read a case study that chronicles what went wrong...This is my guess as to what it’ll say...
  • Introduction to the Conjugate Gradient Method Without the Agonizing Pain
    The Conjugate Gradient Method is the most prominent iterative method for solving sparse systems of linear equations. Unfortunately, many textbook treatments of the topic are written with neither illustrations nor intuition, and their victims can be found to this day babbling senselessly in the corners of dusty libraries. For this reason, a deep, geometric understanding of the method has been reserved for the elite brilliant few who have painstakingly decoded the mumblings of their forebears. Nevertheless, the Conjugate Gradient Method is a composite of simple, elegant ideas that almost anyone can understand. Of course, a reader as intelligent as yourself will learn them almost effortlessly...
  • Why Data Cleaning is Failing Your ML Models – And What To Do About It
    Model accuracy doesn’t start or end with data cleaning in your notebook with the few tables you use to inform, train, and validate your model. It starts with the ETL pipeline and the instant you choose what to measure to solve your problem...Let’s walk through a semi-hypothetical scenario that contains real examples we’ve seen in the wild to highlight some common failure points. We’ll then discuss how they can be avoided with an organizational commitment to high-quality data...
  • Bad Data, or Interesting Fact?
    Data quality is definitely boring...What makes things interesting is the combination of problem solving and interesting facts...Data quality techniques can be used to find interesting insights...
  • How the Guardian approaches quote extraction with NLP
    A recent trend for media companies is to explore how fields like Natural Language Processing (NLP) and Information Extraction (IE) can modularize content like a long-form article as reusable elements for different storytelling formats (e.g., a podcast, information graphic, or blog). This push is called modular journalism and many media companies are building towards it to automate customized stories to meet individual user needs for a variety of media forms...
  • Microsoft adds DALL-E to its Office suite
    Microsoft is adding AI-generated art to its suite of Office software with a new app named Microsoft Designer...The app functions the same way as AI text-to-image models like DALL-E and Stable Diffusion, letting users type prompts to “instantly generate a variety of designs with minimal effort.”...
  • Pooling In Convolutional Neural Networks
    In this article, we explore the whys and the hows behind the fundamental process of pooling in CNN architectures, and then compare two common techniques: max and average pooling...
  • Low-Rank Approximation Toolbox: Nyström Approximation
    As I discussed in a previous post, many matrices we encounter in applications are well-approximated by a matrix with a small rank. Efficiently computing low-rank approximations has been a major area of research, with applications in everything from classical problems in computational physics and signal processing to trendy topics like data science. In this series, I want to explore some broadly useful algorithms and theoretical techniques in the field of low-rank approximation...
  • Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning
    No-press Diplomacy is a complex strategy game involving both cooperation and competition that has served as a benchmark for multi-agent AI research. While self-play reinforcement learning has resulted in numerous successes in purely adversarial games like chess, Go, and poker, self-play alone is insufficient for achieving optimal performance in domains involving cooperation with humans. We address this shortcoming by first introducing a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy. We prove that this is a no-regret learning algorithm under a modified utility function...
  • Building Billion-Scale Vector Search
    Did you know that algorithms and data structures for approximate vector search can search across billions of vectors in high dimensional space in a few milliseconds?...In this blog post, we look at how fast is fast enough in the context of vector search, and how the answer to this question impacts how we design and build a billion-scale vector search solution...
  • gganimate
    gganimate extends the grammar of graphics as implemented by ggplot2 to include the description of animation. It does this by providing a range of new grammar classes that can be added to the plot object in order to customise how it should change with time...
 
 

Summit*

 


Register for IMPACT 2022: The Data Observability Summit

Join thousands of professionals for a virtual event October 25-26 to learn how to drive real-world impact with your data at scale.

Get inspired with virtual keynotes from Nate Silver, the FiveThirtyEight founder and editor-in-chief, and Daniel Kahneman, the Nobel Prize-winning psychologist, economist, and author of Thinking, Fast and Slow. Hear from the founders and chief executives of Databricks, Looker, Confluent, dbt Labs, and Fivetran about the industry's hottest technologies. Leverage best practices from leaders heading the industry’s top data organizations including The New York Times, Roche, and GitLab.

RSVP at impactdatasummit.com/2022



*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 

 

Conference*

 



Global AI Developer Days – 26 October 2022

Join the Global AI Community for a day of inspiring keynotes from industry leaders with a strong focus on AI developers.

Highlights of this 3-hour conference include a session on responsible AI by Ruth Yakubu, Principal Cloud Advocate at Microsoft, who will talk about how to improve the fairness and reliability of AI solutions. Eric Boyd, Corporate Vice President at Microsoft, will show the latest inventions in Azure AI. Manuvir Das, Head of Enterprise Computing at NVIDIA, takes you on a journey through the new era of AI for developers, and many more leaders from the AI community will share their vision.

Don’t miss out on this free day of learning from top leaders in the AI space!

https://devdays.globalai.community


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 

 

Jobs

 
  • Data Scientist - Mount Sinai Data Commons - NYC

    A position is available for an individual with skills in data science, bioinformatics and software engineering to play the key role in running and managing the Mount Sinai Data Commons – known as the Data Ark. The Data Ark team brings together all the most important data sets used by Sinai researchers (e.g. 1000G, GTEx, UK Biobank) in a single location on our HPC server (minvera.org), performs QA/QC processing of the data, conducts initial demographics analyses to showcase the different data sets, and will be tasked with expanding the data commons to host a large range of different data sets of different types (genotype, WES, WGS, RNA-seq, EHR-linked, imaging etc.), which will come with their own computational and platform challenges...
     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 

 

Training & Resources

 
  • Great RStudio Shortcuts
    There are so many RStudio shortcuts out there but these are my TOP5 that I regularly use 😊...I love them because they usually make your life easier...
  • Andrej Karpathy - Becoming a Backprop Ninja [Video]
    We take the 2-layer MLP (with BatchNorm) from the previous video and backpropagate through it manually without using PyTorch autograd's loss.backward(): through the cross entropy loss, 2nd linear layer, tanh, batchnorm, 1st linear layer, and the embedding table. Along the way, we get a strong intuitive understanding about how gradients flow backwards through the compute graph and on the level of efficient Tensors, not just individual scalars like in micrograd. This helps build competence and intuition around how neural nets are optimized and sets you up to more confidently innovate on and debug modern neural networks...
 
 

What you’re up to – notes from DSW readers

 
  • Fill out the form below to appear here :) ...
 

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

 




 


 


P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)

All the best,
Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.