Data Science Weekly - Issue 475

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #475

December 29 2022

Editor's Picks

 
  • Jürgen Schmidhuber's Annotated History of Modern AI and Deep Learning
    Machine learning is the science of credit assignment: finding patterns in observations that predict the consequences of actions and help to improve future performance. Credit assignment is also required for human understanding of how the world works, not only for individuals navigating daily life, but also for academic professionals like historians who interpret the present in light of past events. Here I focus on the history of modern artificial intelligence (AI) which is dominated by artificial neural networks (NNs) and deep learning, both conceptually closer to the old field of cybernetics than to what's been called AI since 1956...
  • Cats, Pi, and Machine Learning [Project]
    There is a cat that wanders around my neighbourhood. I wanted to build something that would notify me whenever it came to my backyard...I thought to myself, if I can get a picture of my backyard as the input I can process the image by checking if there is a cat in it and send a notification to my phone as the output...For the picture input, I attached a camera module to a Raspberry Pi 4. For the processing, I wrote some code that would run on the Pi, which would periodically take an image and run object detection on it. For the output, I relied on sending a message via Signal to my phone...
  • Another Year of #TidyTuesday
    After another 52 data visualisations created for #TidyTuesday, it's time for the annual round-up! Read this blog post for some interesting R packages discovered, a few new tricks I've learnt, and the data visualisations I'd like to do again...


 

A Message from this week's Sponsor:

 



Ilum the Spark cluster manager and monitoring tool

With Ilum's solution, everyone can now quickly and easily deploy Apache Spark on any Kubernetes cluster. Our software eliminates the need for tedious configuration and reduces the time needed for deployment from days to minutes. By leveraging the power of container orchestration and Apache Spark's scalability and reliability, we are making it easier than ever to stay ahead of the curve and explore the future of Big Data.

Ilum provides an all-in-one solution for:
  • Data Science on Kubernetes
  • Hadoop replacement
  • Apache Livy alternative
  • Integration with Jupyter and Apache Zeppelin
It's free! Unlock the power of Big Data today with Ilum.

Learn more about Ilum!



 

Data Science Articles & Videos

 
  • The Build vs. Buy Guide for the Modern Data Stack
    Nishith Agarwal, Head of Data & ML Platforms at Lyra Health and creator of the popular open source data management framework, Apache Hudi, outlines his blueprint for building (and buying) the data stack of your dreams. In Part I of this series, Nishith discusses some initial considerations and shares a framework for getting started...
  • Forward-Forward Algorithm App
    This app implements a complete open-source version of Geoffrey Hinton's Forward-Forward algorithm, an alternative approach to backpropagation...The Forward-Forward algorithm is a method for training deep neural networks that replaces the forward and backward passes of backpropagation with two forward passes, one with positive (i.e., real) data and the other with negative data that could be generated by the network itself...
  • Large Language Models Encode Clinical Knowledge
    We present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online...we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%...
  • Partially renaming columns using a lookup table
    Usually data sets come with short column names, which makes it easy to clean and manipulate the data. However, when presenting the data to stakeholders, in the form of tables or plots, we often need longer, meaningful names. In many cases we have a lookup table which contains long and short versions of the column names so that we can “easily” replace the names when needed...Below we’ll look at how to rename columns using different approaches in R...
  • Top Python libraries of 2022
    Welcome to the 8th edition of our Top Python Libraries list!...We are excited to present this year's picks for the most innovative developments in the Python ecosystem. From this edition, we are expanding our list to include not only libraries per-se, but also tools that are built to belong in the Python ecosystem — some of which are not written in Python as you’ll see...
  • Ask HN: Upskilling as a Data Engineer [HN Discussion]
    What should a data engineer learn as a part of upskilling in 2022/2023?...New languages like Rust/OCaml/Nim...if yes then which?...I don't think learning an ETL tool will be helpful because essentially they are all one and the same...Any tips?...
  • Vanishing Gradients Podcast #15: Uncertainty, Risk, and Simulation in Data Science
    Hugo speaks with JD Long, agricultural economist, quant, and stochastic modeler, about decision making under uncertainty and how we can use our knowledge of risk, uncertainty, probabilistic thinking, causal inference, and more to help us use data science and machine learning to make better decisions in an uncertain world...This is part 1 of a two part conversation. In this, part 1, we discuss risk, uncertainty, probabilistic thinking, and simulation, all with a view towards improving decision making and we draw on examples from our personal lives, the pandemic, our jobs, the reinsurance space, and the corporate world. In part 2, we’ll get into the nitty gritty of decision making under uncertainty...
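
For the "Partially renaming columns using a lookup table" item above, the core idea can be sketched in a few lines. The linked post works in R; this is a Python rendering for illustration only, and the function name `rename_columns` and the sample column names are hypothetical, not taken from the post. Columns found in the lookup get their long name, and everything else is left untouched.

```python
def rename_columns(columns, lookup):
    """Return column names with short names replaced by long ones.

    Names missing from the lookup table are kept as-is, which is what
    makes the renaming "partial".
    """
    return [lookup.get(name, name) for name in columns]


# Hypothetical lookup table: short name -> presentation name
lookup = {"mpg": "Miles per gallon", "hp": "Horsepower"}
columns = ["mpg", "cyl", "hp"]

print(rename_columns(columns, lookup))
# ['Miles per gallon', 'cyl', 'Horsepower']
```

Using `dict.get` with the original name as the fallback avoids a separate membership check and never raises on columns the lookup does not cover.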


 

Tool*

 



Build powerful ML visualizations with Comet

With just 2 lines of code, Comet automatically logs metrics, hyperparameters, libraries, and more. This means automatic chart generation so you can easily manage training runs in real time. When you combine that with:
  • built-in visualizations (like the image panel),
  • custom project views, and
  • your own Python panels,
Comet is a powerful tool for optimizing your ML workflow. All for free! Less friction, more ML.

Create your free account.


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



 

Tool*

 



Now where can I find that query... 🔍🔍

Did you put it in a doc? Slack? Teams? Notes?? Make searching for a query a thing of the past with Sherloq. Sherloq helps data analysts save, organize, and share their metrics, most used or complex queries for seamless collaboration within their organization. It’s a secure add-on (no integrations or permissions necessary) that works on the popular query editors. Start organizing your query repository in a shared workspace with Sherloq beta.

Try Sherloq For Free


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!




 

Jobs

 
  • Data Scientist / Machine Learning Engineer - Epsilon - NYC

    Epsilon Strategy and Insights, Data Sciences team is looking for a talented team player in a Data Scientist/Machine Learning Engineer role. You are an expert, mentor and advocate. You have a strong machine learning and deep learning background and are passionate about transforming data into ML models. You welcome the challenge of data science and are proficient in Python, Spark MLlib, TensorFlow, Keras, ML algorithms and Deep Neural Networks, Big Data. You must be self-driven, take initiative and want to work in a dynamic, busy and innovative group...
     
Want to post a job here? Email us for details --> team@datascienceweekly.org



 

Training & Resources

 
  • Transformers from Scratch
    I procrastinated a deep dive into transformers for a few years. Finally the discomfort of not knowing what makes them tick grew too great for me. Here is that dive...Transformers were introduced in this 2017 paper as a tool for sequence transduction: converting one sequence of symbols to another. The most popular examples of this are translation, as in English to German. They have also been modified to perform sequence completion: given a starting prompt, carry on in the same vein and style. They have quickly become an indispensable tool for research and product development in natural language processing...
  • An overview of gradient descent optimization algorithms
    Gradient descent is the preferred way to optimize neural networks and many other machine learning algorithms but is often used as a black box. This post explores how many of the most popular gradient-based optimization algorithms such as Momentum, Adagrad, and Adam actually work...
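
To make the gradient descent overview above a little more concrete, here is a minimal sketch comparing plain gradient descent with the Momentum variant it mentions. The toy objective f(x) = (x - 3)^2, the function names, and the hyperparameter values are all assumptions chosen for illustration; they are not taken from the linked post.

```python
def grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


def gradient_descent(x0, lr=0.1, steps=300):
    """Plain gradient descent: step against the gradient each iteration."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def momentum_descent(x0, lr=0.1, gamma=0.9, steps=300):
    """Gradient descent with momentum.

    The velocity v accumulates a decaying sum of past gradients
    (decay factor gamma), and the parameter moves by v each step.
    """
    x, v = x0, 0.0
    for _ in range(steps):
        v = gamma * v + lr * grad(x)
        x -= v
    return x


# Both runs should end close to the minimiser x = 3.
print(gradient_descent(0.0))
print(momentum_descent(0.0))
```

On a one-dimensional quadratic like this, momentum offers no real advantage; its benefit shows up on ill-conditioned problems where plain gradient descent oscillates across steep directions.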
 


 



 


P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
