[in case you missed it] Data Science Weekly - Issue 478

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments.

Issue #478

January 18 2023

Editor's Picks

 
  • Data is Rectangular and other Limiting Misconceptions
    In software we express our ideas through tools. In data, those tools think in rectangles. From spreadsheets to data warehouses, to do any analytical calculation, you must first go through a rectangle. Forcing data through a rectangle shapes the way we solve problems (for example, dimensional fact tables, OLAP Cubes)...But really, most data isn’t rectangular. Most data exists in hierarchies (orders, items, products, users). Most query results are better returned as a hierarchy (category, brand, product). Can we escape the rectangle?...


 

A Message from this week's Sponsor:

 



Pinecone vector database

The Pinecone vector database makes it easy to build high-performance vector search applications. Developer-friendly, fully managed, and easily scalable without infrastructure hassles.

Use Pinecone to build semantic search, object recognition, recommendations, anomaly detection, and other vector-based functionality into your applications.




 

Data Science Articles & Videos

 
  • My journey from R to Julia
    For 15 years I taught “Applied Epidemiology using R” at the UC Berkeley School of Public Health...I started dabbling in Python...but eventually I discovered Julia—a programming language designed for scientific computing with the intuition of Python or R, but with the speed of C++. I fell in love with Julia and I gave up on learning Python...As I learned more Julia, I became convinced that for me learning Julia was a better long term investment than sticking with R...
  • Finding Modes and Antimodes
    How can I find the least frequent value (antimode) between 2 modes in a bimodal distribution?...One option is to use kernel density estimation (KDE)...
  • Let's build GPT: from scratch, in code, spelled out. [YouTube Video]
    Andrej Karpathy...builds a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3. We talk about connections to ChatGPT, which has taken the world by storm. We watch GitHub Copilot, itself a GPT, help us write a GPT (meta :D!)...
  • Pretraining quadrupeds: a case study in RL as an engineering tool
    How an unlikely corner of robotics research, locomotion, defined RL's new notion of success...This article’s goal is to try and shift the narrative around what success is in RL: how we don’t need to be talking about AGI, how RL can be used to solve specific problems like robotic locomotion, and how we should react to future examples of RL’s success...
  • The State of Data Testing
    You would be hard-pressed to find anyone who doesn’t believe that it’s important to test your data, especially given the increased reliance on data to drive decision-making at companies. Data must be well-tested, accurate, and reliable, but all too often, data is broken, incorrect, and untested...This guide to data testing hopes to change that. We’ll cover the current state of data testing and bring the Datafold view on what data testing should look like with Data-Diff, to ensure your pipelines run smoothly through automated and proactive data testing...
  • Large Transformer Model Inference Optimization
    In this post, we will look into several approaches for making transformer inference more efficient. Some are general network compression methods, while others are specific to transformer architecture...
  • Google Research, 2022 & Beyond: Language, Vision and Generative Models
    With this post, I am kicking off a series in which researchers across Google will highlight some exciting progress we've made in 2022 and present our vision for 2023 and beyond. I will begin with a discussion of language, computer vision, multi-modal models, and generative machine learning models. Over the next several weeks, we will discuss novel developments in research topics ranging from responsible AI to algorithms and computer systems to science, health and robotics...
  • Welcome to the jungle, we got fun and frames
    I’m still in the early stages of a project and doing data analysis on the input data, a dump of 10GB from Goodreads in JSON...When I tried to use read_json in Pandas, the kernel took over 2 minutes just to read the data in. Actually, initially I wasn’t sure if it would even be able to read the entire file, and I’d need to read it in again and again during the exploration process, so this wasn’t an optimal way to go...But to understand the performance constraints, I had to go pretty deep into the Pandas ecosystem...
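
For readers curious about the "Finding Modes and Antimodes" item above, here is a minimal sketch of the KDE idea it mentions. It is not taken from the linked discussion: the function names (`gaussian_kde`, `find_antimode`), the normal-reference bandwidth rule, the grid size, and the synthetic sample are all assumptions made for illustration, and it uses only the Python standard library.

```python
import math
import statistics


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates a Gaussian KDE of `sample` at a point x."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return density


def find_antimode(sample, grid_size=200):
    """Estimate the antimode: the lowest-density grid point between the two tallest KDE peaks."""
    n = len(sample)
    # Normal-reference rule of thumb for the bandwidth (an assumption; tune for real data).
    bandwidth = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    density = gaussian_kde(sample, bandwidth)

    # Evaluate the density on an evenly spaced grid spanning the data.
    lo, hi = min(sample), max(sample)
    step = (hi - lo) / (grid_size - 1)
    xs = [lo + i * step for i in range(grid_size)]
    ys = [density(x) for x in xs]

    # Interior local maxima of the gridded density are the candidate modes.
    peaks = [i for i in range(1, grid_size - 1) if ys[i - 1] < ys[i] >= ys[i + 1]]
    if len(peaks) < 2:
        raise ValueError("fewer than two modes found; try a smaller bandwidth")

    # Keep the two tallest peaks and search for the minimum between them.
    left, right = sorted(sorted(peaks, key=lambda i: ys[i], reverse=True)[:2])
    valley = min(range(left, right + 1), key=lambda i: ys[i])
    return xs[valley]


# Synthetic bimodal sample: evenly spaced quantiles of N(0, 1) and N(6, 1),
# used instead of random draws so the example is deterministic.
quantiles = [(i + 0.5) / 200 for i in range(200)]
sample = [statistics.NormalDist(0, 1).inv_cdf(q) for q in quantiles]
sample += [statistics.NormalDist(6, 1).inv_cdf(q) for q in quantiles]

antimode = find_antimode(sample)
print(f"Estimated antimode: {antimode:.2f}")
```

The estimate depends heavily on the bandwidth: too large and the two modes merge into one (the sketch then raises an error), too small and noise creates spurious peaks. For real data, a library implementation such as SciPy's `gaussian_kde` is likely the more practical choice.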


 

Tool*

 



Full Transparency for ML Experiment Tracking

With just 2 lines of code, Comet automatically logs metrics, hyperparameters, libraries, and more. Comet works with Keras, TensorFlow, PyTorch, Hugging Face and 15+ more popular tools and frameworks. Check out our GitHub repo for Comet examples. With Comet, you can:
  • Diff up to 4 experiments
  • Review model predictions across experiments with built-in visualizations
  • Save models to a model registry
Get started with your free account today


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



 

Jobs

 
  • Data Scientist / Machine Learning Engineer - Epsilon - NYC

    Epsilon Strategy and Insights, Data Sciences team is looking for a talented team player in a Data Scientist/Machine Learning Engineer role. You are an expert, mentor and advocate. You have a strong machine learning and deep learning background and are passionate about transforming data into ML models. You welcome the challenge of data science and are proficient in Python, Spark MLlib, TensorFlow, Keras, ML algorithms and Deep Neural Networks, Big Data. You must be self-driven, take initiative and want to work in a dynamic, busy and innovative group...
     
Want to post a job here? Email us for details --> team@datascienceweekly.org



 

Training & Resources

 
  • Resources for Learning Computational Complexity Theory
    Computational complexity theory studies the feasibility of solving computational problems and the resources required to solve them, and is useful to any field that thinks about the analysis and design of algorithms (which is much broader than one may first think). While there are a good number of notes and lectures available online, these are scattered across university course pages, YouTube, etc. This guide aims to bring this material together for learning computational complexity theory at the introductory graduate level, especially for those without a formal CS background...
 

Last Week's Newsletter's 3 Most Clicked Links

 
* Based on unique clicks.
** Find last week's newsletter here.

 


Cutting Room Floor

 


P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)
All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
