Data Science Weekly - Issue 473

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #473

December 15 2022

Editor's Picks

 
  • Failed ML Project - How bad is the real estate market getting?
    There aren't enough failed data science projects out there. Usually, projects only show up in public if they work. I think that's a shame. If we learn more from our failures than our successes, it makes sense to share more failures to help those around us...TL;DR I made several mistakes in this machine learning project that led to its failure. I pick these apart in this article...
  • Data lineage is a cartography problem
    Data lineage can become overwhelming if you don’t know what you are looking at — there are too many layers affecting the different abstractions of data, and it can be difficult to understand when data (and the organisation) scales...In this post we’ll discuss how we can learn from the field of cartography and Google Maps to extract the untapped potential of data lineage, and build this ideal interface to improve data literacy and observability...
  • Building a synth with ChatGPT
    I know everyone is talking about ChatGPT right now. One aspect I haven’t seen much of is what it actually looks like to build software using it. There’s a bunch of articles about cool things you can do with it, but what are you going to run into when you build real-world software?..My idea was to build a synth using the Web Audio API. You can watch the recording below. Early results were impressive; we were able to get noise without knowing anything about the Web Audio API...


 

A Message from this week's Sponsor:

 



Pinecone vector database

The Pinecone vector database makes it easy to build high-performance vector search applications. Developer-friendly, fully managed, and easily scalable without infrastructure hassles.

Use Pinecone to build semantic search, object recognition, recommendations, anomaly detection, and other vector-based functionality into your applications.



 

Data Science Articles & Videos

 
  • PubMed GPT: a Domain-Specific Large Language Model for Biomedical Text
    We evaluated PubMed GPT on several question and answer (QA) benchmarks, and manually assessed its generations for a question summarization task. One key benchmark was MedQA-USMLE, which consists of question and answer pairs taken from previous Medical Licensing Exams given to doctors in the United States...
  • What do Vision Transformers Learn? A Visual Exploration
    Vision transformers (ViTs) are quickly becoming the de-facto architecture for computer vision, yet we understand very little about why they work and what they learn. While existing studies visually analyze the mechanisms of convolutional neural networks, an analogous exploration of ViTs remains challenging. In this paper, we first address the obstacles to performing visualizations on ViTs. Assisted by these solutions, we observe that neurons in ViTs trained with language model supervision (e.g., CLIP) are activated by semantic concepts rather than visual features...
  • A/B Testing with Holm's Procedure
    Holm's procedure is helpful for multiple tests with any degree of correlation. It's less conservative than Bonferroni because once you achieve a success comparing with the smallest p-value from your trial, you'll be able to benchmark your other tests at a larger test-level alpha...Here's how it works...
  • Streamlining Machine Learning In Production with Ran Romano
    Our wide-ranging conversation touches on his time in the Israeli army, his engineering experience at VMware, his time at Wix building their internal ML platform, his current journey with Qwak building an end-to-end ML engineering platform to automate the MLOps processes, the evolution of feature stores, the MLOps community in Israel, and much more...
  • Web scraping and text analysis in R and GGplot2
    I recently needed to learn text mining for a project at work. I generally learn more quickly with a real-world project. So, I turned to a topic I love: Wilderness, to see how I could apply the skills of text scrubbing and natural language processing...The first portion of this post will cover web scraping, then text mining, and finally analysis and visualization...
  • Don’t Start Your SQL Queries with the ‘Select’ Statement
    The majority of developers start writing their SQL queries with the ‘SELECT’ clause, then write ‘FROM’, ‘WHERE’, ‘HAVING’, and so on. But this is not the ‘right’ way of writing your SQL queries, as it is very prone to syntax errors, especially if you are a beginner in SQL...
  • What's unsolved in generative AI?
    Much has already been written about the market opportunities for the Cambrian explosion of startups now emerging, which aim to remake everything from creative tools to software development...But equally important is what’s missing today. Other computing paradigm shifts necessitated new infrastructure, from cloud tools at the advent of SaaS to MLOps in the early deep learning era. What problems are unsolved in the generative AI stack? And what tools and infrastructure does the ecosystem need to truly take off?...
  • The Importance of Data Analytics for Marketers
    Data analytics is the not-so-secret weapon that businesses are leveraging to derive key insights into their audiences and expand in saturated markets...Marketers now have highly specialized and segmented insights into market trends, consumer behavior, demographics, and purchase preferences. This allows them to craft personalized messages to target the right consumer at the right time and on the platforms of their choice...
  • Awesome ChatGPT Prompts
    Welcome to the "Awesome ChatGPT Prompts" repository! This is a collection of prompt examples to be used with the ChatGPT model...The ChatGPT model is a large language model trained by OpenAI that is capable of generating human-like text. By providing it with a prompt, it can generate responses that continue the conversation or expand on the given prompt...In this repository, you will find a variety of prompts that can be used with ChatGPT...
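
For readers curious about the Holm's procedure item above, here is a minimal sketch of the step-down rule it describes. It is not taken from the linked article: the function name `holm_reject` and the example p-values are hypothetical, and the sketch assumes the standard textbook form in which the k-th smallest of m p-values is compared against alpha / (m - k + 1).

```python
from typing import List


def holm_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return one reject/keep flag per hypothesis using Holm's step-down rule.

    Hypothetical helper for illustration; not the linked article's code.
    """
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens after each success: alpha/m, alpha/(m-1), ...
        threshold = alpha / (m - rank)
        if p_values[idx] <= threshold:
            reject[idx] = True
        else:
            # Stop at the first failure; all larger p-values are kept.
            break
    return reject


# Made-up p-values from four A/B test variants.
example_p = [0.001, 0.013, 0.04, 0.02]
holm_flags = holm_reject(example_p)
# Bonferroni for comparison: every test uses the same alpha/m threshold.
bonferroni_flags = [p <= 0.05 / len(example_p) for p in example_p]
print(holm_flags)
print(bonferroni_flags)
```

With these made-up values, Bonferroni rejects only the first hypothesis, while Holm rejects all four, because each success raises the threshold for the next comparison.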


 

Tool*

 



Now where can I find that query... 🔍🔍

Did you put it in a doc? Slack? Teams? Notes?? Make searching for a query a thing of the past with Sherloq. Sherloq helps data analysts save, organize, and share their metrics, most used or complex queries for seamless collaboration within their organization. It’s a secure add-on (no integrations or permissions necessary) that works on the popular query editors. Start organizing your query repository in a shared workspace with Sherloq beta.

Try Sherloq For Free


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



 

Tool*

 



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!




 

Jobs

 
  • Senior Data Analyst - Epic Games - New York

    Epic Games spans across 19 countries with 55 studios and 4,500+ employees globally. For over 25 years, we’ve been making award-winning games and engine technology that empowers others to make visually stunning games and 3D content that bring environments to life like never before.

    Use your expert experience in data & analytics to build powerful stories and visuals that inform the games we make, the technology we develop, and business decisions that drive Epic... Epic Games is looking for a Senior Data Analyst to help us create the models that fuel our creator economy. The successful candidate will have excellent SQL knowledge, and enjoy combining analytic skills with business acumen to provide the data and insights that will drive our continued success...

     
Want to post a job here? Email us for details --> team@datascienceweekly.org



 

Training & Resources

 
  • Stanford CS229M: Machine Learning Theory [YouTube Playlist]
    This course focuses on developing a theoretical understanding of the statistical properties of learning algorithms. Topics Include: a) Generalization bounds via uniform convergence, b) Theory for deep learning, c) Non-convex optimization, d) Neural tangent kernel, e) Implicit/algorithmic regularization, and f) Unsupervised learning and domain adaptation...
  • High-Dimensional Probability and Applications in Data Science
    This course builds probabilistic foundations for theoretical research in modern data science. You will learn some methods that form an essential toolbox for anyone looking to do mathematical work in machine learning, theoretical computer science, theoretical statistics, signal processing, etc...
  • Survival Analysis: Optimize the Partial Likelihood of the Cox Model
    Survival analysis encompasses a collection of statistical methods for describing time to event data...In this post, we introduce a popular survival analysis algorithm, the Cox proportional hazards model¹. Then, we define its log-partial likelihood and the gradient, and optimize it to find the best set of model parameters through a practical Python example...
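
As a rough companion to the Cox model item above, the sketch below computes the log-partial likelihood and its gradient for a single covariate. It is an assumption-laden illustration, not the linked post's code: the function name `cox_log_partial_likelihood` and the toy data are hypothetical, and it assumes no tied event times and a risk set made up of every subject whose time is at least the event time.

```python
import math
from typing import List, Tuple


def cox_log_partial_likelihood(
    times: List[float],
    events: List[int],
    x: List[float],
    beta: float,
) -> Tuple[float, float]:
    """Log-partial likelihood and its derivative with respect to beta.

    Hypothetical helper: one covariate, no tied event times.
    """
    loglik = 0.0
    grad = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # Censored subjects contribute only through risk sets.
            continue
        # Risk set: subjects still under observation at time times[i].
        risk = [j for j in range(n) if times[j] >= times[i]]
        weights = [math.exp(beta * x[j]) for j in risk]
        total = sum(weights)
        weighted_x = sum(w * x[j] for w, j in zip(weights, risk))
        loglik += beta * x[i] - math.log(total)
        # Observed covariate minus its risk-set weighted average.
        grad += x[i] - weighted_x / total
    return loglik, grad


# Toy data: three subjects, all with observed events.
ll, g = cox_log_partial_likelihood(
    times=[1.0, 2.0, 3.0], events=[1, 1, 1], x=[1.0, 0.0, 2.0], beta=0.0
)
print(ll, g)
```

An optimizer would move beta in the direction of the gradient (or minimize the negated log-likelihood) until the gradient is close to zero.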
 


P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
