[in case you missed it] Data Science Weekly - Issue 478

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments
Email not displaying correctly?
View it in your browser.

Issue #478

January 18 2023

Editor's Picks

 
  • Data is Rectangular and other Limiting Misconceptions
    In software we express our ideas through tools. In data, those tools think in rectangles. From spreadsheets to the data warehouses, to do any analytical calculation, you must first go through a rectangle. Forcing data through a rectangle shapes the way we solve problems (for example, dimensional fact tables, OLAP Cubes)...But really, most data isn’t rectangular. Most data exists in hierarchies (orders, items, products, users). Most query results are better returned as a hierarchy (category, brand, product). Can we escape the rectangle?...


 

A Message from this week's Sponsor:

 



Pinecone vector database

The Pinecone vector database makes it easy to build high-performance vector search applications. Developer-friendly, fully managed, and easily scalable without infrastructure hassles.

Use Pinecone to build semantic search, object recognition, recommendations, anomaly detection, and other vector-based functionality into your applications.




 

Data Science Articles & Videos

 
  • My journey from R to Julia
    For 15 years I taught “Applied Epidemiology using R” at the UC Berkeley School of Public Health...I started dabbling in Python...but eventually I discovered Julia—a programming language designed for scientific computing with the intuition of Python or R, but with the speed of C++. I fell in love with Julia and I gave up on learning Python...As I learned more Julia, I became convinced that for me learning Julia was a better long term investment than sticking with R...
  • Finding Modes and Antimodes
    How can I find the least frequent value (antimode) between 2 modes in a bimodal distribution?...One option is to use kernel density estimation KDE...
  • Let's build GPT: from scratch, in code, spelled out. [YouTube Vide]
    Andrej Karpathy...builds a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3. We talk about connections to ChatGPT, which has taken the world by storm. We watch GitHub Copilot, itself a GPT, help us write a GPT (meta :D!)...
  • Pretraining quadrupeds: a case study in RL as an engineering tool
    How an unlikely corner of robotics research, locomotion, defined RL's new notion of success...This article’s goal is to try and shift the narrative around what success is in RL: how we don’t need to be talking about AGI, how RL can be used to solve specific problems like robotic locomotion, and how we should react to future examples of RL’s success...
  • The State of Data Testing
    You would be hard-pressed to find anyone who doesn’t believe that it’s important to test your data, especially given the increased reliance on data to drive decision-making at companies. Data must be well-tested, accurate, and reliable, but all too often, data is broken, incorrect, and untested...This guide to data testing hopes to change that. We’ll cover the current state of data testing and bring the Datafold view on what data testing should look like with Data -Diff, to ensure your pipelines run smoothly through automated and proactive data testing...
  • Large Transformer Model Inference Optimization
    In this post, we will look into several approaches for making transformer inference more efficient. Some are general network compression methods, while others are specific to transformer architecture...
  • Google Research, 2022 & Beyond: Language, Vision and Generative Models
    With this post, I am kicking off a series in which researchers across Google will highlight some exciting progress we've made in 2022 and present our vision for 2023 and beyond. I will begin with a discussion of language, computer vision, multi-modal models, and generative machine learning models. Over the next several weeks, we will discuss novel developments in research topics ranging from responsible AI to algorithms and computer systems to science, health and robotics...
  • Welcome to the jungle, we got fun and frames
    I’m still in the early stages of a project and doing data analysis on the input data, a dump of 10GB from Goodreads in JSON...When I tried to use read_json in Pandas, the kernel took over 2 minutes just to read the data in. Actually, initially I wasn’t sure if it would even be able to read the entire file, and I’d need to read it in again and again during the exploration process, so this wasn’t an optimal way to go...But to understand the performance constraints, I had to go pretty deep into the Pandas ecosystem...


 

Tool*

 



Full Transparency for ML Experiment Tracking

With just 2 lines of code, Comet automatically logs metrics, hyperparameters, libraries, and more. Comet works with Keras, Tensorflow, Pytorch, Hugging Face and 15+ more popular tools and frameworks. Check out our GitHub repo for Comet examples. With Comet, you can:
  • Diff up to 4 experiments
  • Review model predictions across experiments with built-in visualizations
  • Save models to a model registry
Get started with your free account today


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



 

Jobs

 
  • Data Scientist / Machine Learning Engineer - Epsilon - NYC

    Epsilon Strategy and Insights, Data Sciences team is looking for a talented team player in a Data Scientist/Machine Learning Engineer role. You are an expert, mentor and advocate. You have strong machine learning and deep learning background and are passionate about transforming data into ml models. You welcome the challenge of data science and are proficient in Python, Spark MLLib, Tensorflow, Keras, ML algorithms and Deep Neural Networks, Big Data. You must be self-driven, take initiative and want to work in a dynamic, busy and innovative group...
     
Want to post a job here? Email us for details --> team@datascienceweekly.org



 

Training & Resources

 
  • Resources for Learning Computational Complexity Theory
    Computational complexity theory studies the feasibility of solving and resources required to solve computational problems and is useful to any field that thinks about the analysis and design of algorithms (which is much more broad than one may first think). While there are a good bit of notes and lectures available online, these are scattered across university course pages, YouTube, etc. This guide aims to bring this material together for learning computational complexity theory at the introductory graduate level, especially for those without a formal CS background...
 

Last Week's Newsletter's 3 Most Clicked Links

 
* Based on unique clicks.
** Find last week's newsletter here.

 


Cutting Room Floor

 


P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Follow on Twitter
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
unsubscribe from this list    update subscription preferences 

Key phrases

Older messages

Data Science Weekly - Issue 477

Friday, January 20, 2023

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #477 January 11 2023 Editor's Picks

[in case you missed it] Data Science Weekly - Issue 477

Friday, January 20, 2023

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #477 January 11 2023 Editor's Picks

Data Science Weekly - Issue 478

Friday, January 20, 2023

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #478 January 18 2023 Editor's Picks

Data Science Weekly - Issue 476

Friday, January 6, 2023

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #476 January 05 2023 Editor's Picks

Data Science Weekly - Issue 475

Thursday, December 29, 2022

Curated news, articles and jobs related to Data Science. Keep up with all the latest developments Email not displaying correctly? View it in your browser. Issue #475 December 29 2022 Editor's Picks

You Might Also Like

Dot Leap 2024-7: Fat Stacks and Race Tracks

Saturday, April 20, 2024

Thank you for taking the Dot Leap!We cover Polkadot, Kusama, Polkadot-SDK, and all related Web 3.0 projects! Want your content featured? X ... ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌

📧 Introduction to Distributed Tracing With OpenTelemetry in .NET

Saturday, April 20, 2024

​ Introduction to Distributed Tracing With OpenTelemetry in .NET Read on: m​y website / Read time: 5 minutes BROUGHT TO YOU BY ​ Shesha: The .NET Open-Source Low-Code Framework ​ Introducing Shesha, a

a16z’s Infrastructure team gets a new general partner

Friday, April 19, 2024

Post News is shutting down and Wall Street isn't feeling a Salesforce-Informatica pairing View this email online in your browser By Christine Hall Friday, April 19, 2024 Image Credits: Andreessen

New Roundtable! Additive for Mass Production Applications

Friday, April 19, 2024

The Outlook for the Future View this email in your browser engineering.com Roundtable - Additive for Mass Production Applications: The Outlook for the Future 6 Considerations for Choosing the Right

📷 What to Know About Macro Photography — Why You Should Buy a Budget Motherboard

Friday, April 19, 2024

Also: How to Automatically Highlight Values in Excel, and More! How-To Geek Logo April 19, 2024 📩 Get expert reviews, the hottest deals, how-to's, breaking news, and more delivered directly to your

Is the wind going out of the AI sails?

Friday, April 19, 2024

Rippling vacuums up venture capital and Ramp bags more millions View this email online in your browser By Haje Jan Kamps Friday, April 19, 2024 Image Credits: Getty Images / Carol Yepes Welcome to

Llama 3 is out - Weekly News Roundup - Issue #463

Friday, April 19, 2024

Plus: brand-new, all-electric Atlas; AI Index Report 2024; Microsoft pitched GenAI tools to US military; Humane AI Pin reviews are in; debunking Devin; and more! ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏

Daily Coding Problem: Problem #1417 [Easy]

Friday, April 19, 2024

Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Wayfair. You are given a 2 x N board, and instructed to completely cover the board with

Charted | How Hard Is It to Get Into an Ivy League School? 🎓

Friday, April 19, 2024

We detail the admission rates and average annual cost for Ivy League schools, as well as the median SAT scores required to be accepted. View Online | Subscribe Presented by: Discover the motivations

Dark Matter & Tortured Poets

Friday, April 19, 2024

New music releases aren't what they used to be -- for good and bad. Dark Matter & Tortured Poets By MG Siegler • 19 Apr 2024 View in browser View in browser New music releases in 2024 are a