Hello and thank you for tuning in to Issue #489.
Once a week we write this email to share the links we (Hannah and Sebastian) thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
If you find this useful, please consider becoming a paid subscriber here:
https://datascienceweekly.substack.com/subscribe
If you don’t find this useful, unsubscribe here.
Hope you enjoy it!
:)
And now, let's dive into some interesting links from this week:
Linear programming: Theory and applications
Linear optimization main concepts and implementation in Python…Throughout this article, some of the main theoretical aspects of linear programming will be covered, besides applications in classical problems using Python. To do this, we will use the libraries scipy and pyomo…
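For a flavor of what a linear program looks like in code, here is a minimal sketch using scipy.optimize.linprog. The toy problem (its objective, constraints, and variable names) is an assumption made up for illustration and is not taken from the article. Since linprog minimizes, the objective is negated to maximize.

```python
from scipy.optimize import linprog

# Hypothetical toy problem (not from the article):
#   maximize   3x + 5y
#   subject to  x        <= 4
#                   2y   <= 12
#              3x + 2y   <= 18
#               x, y     >= 0

# linprog minimizes, so negate the objective coefficients to maximize.
c = [-3.0, -5.0]

# Inequality constraints in the form A_ub @ [x, y] <= b_ub.
A_ub = [
    [1.0, 0.0],
    [0.0, 2.0],
    [3.0, 2.0],
]
b_ub = [4.0, 12.0, 18.0]

# Both variables are non-negative with no upper bound.
bounds = [(0, None), (0, None)]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")

x_opt, y_opt = result.x
max_value = -result.fun  # undo the sign flip applied to the objective

print("solved:", result.success)
print("x =", x_opt, "y =", y_opt, "objective =", max_value)
```

The same model could be written in pyomo with named variables and constraints, which tends to read better once a problem has many of them.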
MLOps is Mostly Data Engineering
MLOps emerged as a new category of tools for managing data infrastructure, specifically for ML use cases with the main assumption being that ML has unique needs…After a few years and with the hype gone, it has become apparent that MLOps overlap more with Data Engineering than most people believed. Let’s see why and what that means for the MLOps ecosystem…
Unify real-time customer data with every interaction, and personalize experiences at scale.
Unify makes Segment’s real-time identity resolved profiles completely portable. Sync them to your warehouse to perform advanced analytics, enhance them with data from multiple sources, and enable the entire business by activating them across your CX tools of choice.
Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
pandas 2.0 and the Arrow revolution (part I)
At the time of writing this post, we are in the process of releasing pandas 2.0…A recent change that may seem subtle and is easy to overlook, but is actually very important, is the new Apache Arrow backend for pandas data. To understand this change, let's quickly summarize how pandas works…
Python is Enough to Simulate Life Itself: Interview with Allen B. Downey
Get ready for an exclusive, thought-provoking interview with Dr. Allen B. Downey, an MIT graduate, renowned computer scientist, and current member of the Brilliant.org team. We dive deep into his insights on the future of education, the potential dangers of artificial intelligence, and his latest book, "Modeling and Simulation in Python." Dr. Downey also shares his unique connection to NASA's Mars 2020 program, offering a fascinating glimpse into the world of space exploration…
Your guide to AI: April 2023
Welcome to the latest issue of your guide to AI, an editorialized newsletter covering key developments in AI research, industry, geopolitics and startups during March 2023…
Putting the human touch on LLMs
In this piece, I dive into the buzzy, complicated world of applying the human touch to LLMs, providing some background, an overview of the technique, and its applications and implications for AI startups and the ecosystem more broadly…
Rulers, NER, and data iteration
The SpanRuler component of spaCy allows you to create rules to recognize spans or entities within your data. Lj and I created a spaCy project to showcase the functionality of the SpanRuler within a NER pipeline, but when we didn’t see the improvement we were looking for in the initial pipeline evaluation…
LLMs in Production Conference (free, virtual) - 13 April 2023
Large Language Models have taken the world by storm. But what are the real use cases? What are the challenges in productionizing them? In this event, you will hear from practitioners about how they are dealing with things such as cost optimization, latency requirements, trust of output and debugging. You will also get the opportunity to join workshops that will teach you how to set up your use cases and skip over all the headaches…
Machine Learning for Partial Differential Equations
This review will examine several promising avenues of PDE research that are being advanced by machine learning, including: 1) the discovery of new governing PDEs and coarse-grained approximations for complex natural and engineered systems, 2) learning effective coordinate systems and reduced-order models to make PDEs more amenable to analysis, and 3) representing solution operators and improving traditional numerical algorithms. In each of these fields, we summarize key advances, ongoing challenges, and opportunities for further development…
What Language Should You Use for Econometrics?
There are plenty of tools and languages you can use these days for econometrics. What are they, and what are they good for (or not good for)? In this video I cover six popular options: Stata, R, Python, Matlab, Julia, and Excel, and discuss the kinds of tasks each is best suited for, and the places where they run into problems. Which one best fits the work you need to do?…
Awesome Twitter Algo
An annotated look through the release of the Twitter algorithm, through the context of engineering and recsys, with notes from repo creators on significance of specific parts of the code. Since it can be hard to parse through so much code and derive meaning and context, we do it for you!
This code focuses on the services used to build the Home timeline For You feed, the algorithmic tab that is now served first on both web and mobile next to the Following feed…
From Deep to Long Learning?
For the last two years, a line of work in our lab has been to increase sequence length…As the GPT-4 press release noted, this has allowed almost 50 pages of text as context–and tokenization/patching ideas like those in DeepMind’s Gato are able to use images as context. So many amazing ideas coming together, awesome! This article is about another approach to increasing sequence length at a high level, and the connection to a new set of primitives…
Am I kidding myself to think that this is doable? [Reddit Discussion]
How feasible is it to improve and become a data scientist on the job? And are there any books or YouTube videos (I am a fan of learning through these two methods) that stand out when it comes to learning data science? By this, I mean more technical knowledge and less on how to do particular tasks or analyses in a coding language. Any guidance on how to become a better data scientist is also welcome…
At Inceptive, we don't have titles. The title you see at the top of this post is merely our pragmatic way to make sure we reach you. Instead of topic experts working in silos, we are building an antedisciplinary team where everyone is gaining deep knowledge outside of their traditional area of expertise.
You will be part of an antedisciplinary team building our biological software. You will be involved in designing, testing and scaling novel experimental protocols to characterize molecules both in vitro and in silico. You will be responsible for deriving insights from our experimental data and enabling these to drive future development of our models and our protocols. You will ensure that Inceptive tracks and contributes to relevant developments in data science particularly as it relates to biology, and you will help shape the direction of the company in these areas. You will work effectively in an agile setting with recurring status updates and continuous knowledge sharing.
We are happy to consider candidates at various stages in their career and the position could be tailored to your experience and career preferences.
Apply here
Want to post a job here? Email us for details --> team@datascienceweekly.org
crème de la crème of AI courses
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI). Whether you're a beginner or an experienced learner, there's something here for everyone!…
NLP Course | For You
This is an extension to the (ML for) Natural Language Processing course I have taught at the Yandex School of Data Analysis (YSDA) since fall 2018 (from 2022, in the Israel branch)…
Transformer Taxonomy (lit review)
This document is my running literature review for people trying to catch up on AI. It covers 22 models, 11 architectural changes, 7 post-pre-training techniques and 3 training techniques (and 5 things that are none of the above)…
* Based on unique clicks.
** Find last week's issue #488 here.
Thanks for joining us this week :)
All our best,
Hannah & Sebastian
P.S.,
Please consider becoming a paid subscriber here: https://datascienceweekly.substack.com/subscribe
:)
Copyright © 2013-2023 DataScienceWeekly.org, All rights reserved.