Data Science Weekly - Issue 478

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments.

Issue #478

January 18 2023

Editor's Picks

 
  • Data is Rectangular and other Limiting Misconceptions
    In software we express our ideas through tools. In data, those tools think in rectangles. From spreadsheets to data warehouses, to do any analytical calculation, you must first go through a rectangle. Forcing data through a rectangle shapes the way we solve problems (for example, dimensional fact tables, OLAP cubes)...But really, most data isn’t rectangular. Most data exists in hierarchies (orders, items, products, users). Most query results are better returned as a hierarchy (category, brand, product). Can we escape the rectangle?...


 

A Message from this week's Sponsor:

 



Pinecone vector database

The Pinecone vector database makes it easy to build high-performance vector search applications. Developer-friendly, fully managed, and easily scalable without infrastructure hassles.

Use Pinecone to build semantic search, object recognition, recommendations, anomaly detection, and other vector-based functionality into your applications.




 

Data Science Articles & Videos

 
  • My journey from R to Julia
    For 15 years I taught “Applied Epidemiology using R” at the UC Berkeley School of Public Health...I started dabbling in Python...but eventually I discovered Julia—a programming language designed for scientific computing with the intuition of Python or R, but with the speed of C++. I fell in love with Julia and I gave up on learning Python...As I learned more Julia, I became convinced that for me learning Julia was a better long term investment than sticking with R...
  • Finding Modes and Antimodes
    How can I find the least frequent value (antimode) between 2 modes in a bimodal distribution?...One option is to use kernel density estimation (KDE)...
  • Let's build GPT: from scratch, in code, spelled out. [YouTube Video]
    Andrej Karpathy...builds a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3. We talk about connections to ChatGPT, which has taken the world by storm. We watch GitHub Copilot, itself a GPT, help us write a GPT (meta :D!)...
  • Pretraining quadrupeds: a case study in RL as an engineering tool
    How an unlikely corner of robotics research, locomotion, defined RL's new notion of success...This article’s goal is to try and shift the narrative around what success is in RL: how we don’t need to be talking about AGI, how RL can be used to solve specific problems like robotic locomotion, and how we should react to future examples of RL’s success...
  • The State of Data Testing
    You would be hard-pressed to find anyone who doesn’t believe that it’s important to test your data, especially given the increased reliance on data to drive decision-making at companies. Data must be well-tested, accurate, and reliable, but all too often, data is broken, incorrect, and untested...This guide to data testing hopes to change that. We’ll cover the current state of data testing and bring the Datafold view on what data testing should look like with Data-Diff, to ensure your pipelines run smoothly through automated and proactive data testing...
  • Large Transformer Model Inference Optimization
    In this post, we will look into several approaches for making transformer inference more efficient. Some are general network compression methods, while others are specific to transformer architecture...
  • Google Research, 2022 & Beyond: Language, Vision and Generative Models
    With this post, I am kicking off a series in which researchers across Google will highlight some exciting progress we've made in 2022 and present our vision for 2023 and beyond. I will begin with a discussion of language, computer vision, multi-modal models, and generative machine learning models. Over the next several weeks, we will discuss novel developments in research topics ranging from responsible AI to algorithms and computer systems to science, health and robotics...
  • Welcome to the jungle, we got fun and frames
    I’m still in the early stages of a project and doing data analysis on the input data, a dump of 10GB from Goodreads in JSON...When I tried to use read_json in Pandas, the kernel took over 2 minutes just to read the data in. Actually, initially I wasn’t sure if it would even be able to read the entire file, and I’d need to read it in again and again during the exploration process, so this wasn’t an optimal way to go...But to understand the performance constraints, I had to go pretty deep into the Pandas ecosystem...


 

Tool*

 



Full Transparency for ML Experiment Tracking

With just 2 lines of code, Comet automatically logs metrics, hyperparameters, libraries, and more. Comet works with Keras, TensorFlow, PyTorch, Hugging Face and 15+ more popular tools and frameworks. Check out our GitHub repo for Comet examples. With Comet, you can:
  • Diff up to 4 experiments
  • Review model predictions across experiments with built-in visualizations
  • Save models to a model registry
Get started with your free account today


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



 

Jobs

 
  • Data Scientist / Machine Learning Engineer - Epsilon - NYC

    The Data Sciences team at Epsilon Strategy and Insights is looking for a talented team player in a Data Scientist/Machine Learning Engineer role. You are an expert, mentor and advocate. You have a strong machine learning and deep learning background and are passionate about transforming data into ML models. You welcome the challenge of data science and are proficient in Python, Spark MLlib, TensorFlow, Keras, ML algorithms, Deep Neural Networks and Big Data. You must be self-driven, take initiative and want to work in a dynamic, busy and innovative group...
     
Want to post a job here? Email us for details: team@datascienceweekly.org



 

Training & Resources

 
  • Resources for Learning Computational Complexity Theory
    Computational complexity theory studies the feasibility of solving computational problems and the resources required to solve them, and is useful to any field that thinks about the analysis and design of algorithms (which is much broader than one may first think). While a good number of notes and lectures are available online, they are scattered across university course pages, YouTube, etc. This guide aims to bring this material together for learning computational complexity theory at the introductory graduate level, especially for those without a formal CS background...
 

Last Week's Newsletter's 3 Most Clicked Links

 
* Based on unique clicks.
** Find last week's newsletter here.

 


Cutting Room Floor

 


P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
