What are you up to?
Hi Friends - this past week someone asked us to share their news in the newsletter. So we're going to try something for the next few weeks...
Fill out this form with what you're currently up to and we'll include your responses in the newsletter. Here's the link => https://forms.gle/vemHV4F27zqUaNCj7
If we get thousands of responses, we'll split them up over a few weeks.
Looking forward to hearing from you :)
Editor Picks
- Software Development Resources for Data Scientists
Rachael Dempsey recently asked the Twitter community for suggestions on resources that data scientists can use to improve their software development skill set...We received so many great recommendations that we wanted to summarize and share them here. This blog post walks through software development best practices that your team may want to adopt and where to find out more...The areas discussed below are: a) Project structure, b) Automatic testing, c) Reproducible environments, d) Version control....
- Introducing Accelerated PyTorch Training on Mac
In collaboration with the Metal engineering team at Apple, we are excited to announce support for GPU-accelerated PyTorch training on Mac. Until now, PyTorch training on Mac only leveraged the CPU, but with the upcoming PyTorch v1.12 release, developers and researchers can take advantage of Apple silicon GPUs for significantly faster model training. This unlocks the ability to perform machine learning workflows like prototyping and fine-tuning locally, right on Mac...
A Message from this week's Sponsor:
Online Data Science Programs from Drexel University
Find your algorithm for success with an online data science degree from Drexel University. Gain essential skills in tool creation and development, data and text mining, trend identification, and data manipulation and summarization by using leading industry technology to apply to your career. Learn more.
Data Science Articles & Videos
- Exploring Clusters of Research in Three Areas of AI Safety
Problems of AI safety are the subject of increasing interest for engineers and policymakers alike. This brief uses the CSET Map of Science to investigate how research into three areas of AI safety — robustness, interpretability and reward learning — is progressing. It identifies eight research clusters that contain a significant amount of research relating to these three areas and describes trends and key papers for each of them...
- Preston’s Paradox
Suppose every woman has fewer children than her mother. Average fertility would decrease and population growth would slow, right?..Actually, no...According to Preston's paradox, fertility could increase, decrease, or stay the same...I explain here...
- Automatic Differentiation: Forward and Reverse
Deriving derivatives is not fun. In this post, I will deep dive into the methods for automatic differentiation (abbreviated as AD by many). After reading this post, you should feel confident with using the various AD techniques, and hopefully never manually calculate derivatives again. Note that this post is not a comparison between AD libraries...
- Wobbly tables and the intermediate value theorem
Tomorrow I’ll be introducing the intermediate value theorem (IVT) to my calculus class...My new favorite application of the IVT is the wobbly table theorem: every rectangular table placed on uneven ground can be stabilized by rotating it. This was proved in 2005 by Baritompa, Löwen, Polster, and Ross. Here is an excerpt from their paper...
- The Future of Data Catalogs
Let’s visit a website just to “browse the metadata,” said no one ever...we started as a data team, and we failed three times at implementing a data catalog. As a data leader who saw these projects fail, I found that the biggest reason data catalogs fail is the user experience. This isn’t just about a beautiful user interface though. It’s about truly understanding how people work and giving them the best possible experience...
- Data Journey with Victoria Bukta (Shopify) - Apache Iceberg and data ingestion
Viktoria works as a senior data engineer at Shopify. Shopify is one of the most well-known e-commerce companies and it is a very early adopter of big data & cloud technologies. We talk with Viktoria about how her team ingests data at Shopify using a mix of open-source and cloud-native technologies such as Apache Iceberg, Debezium, Kafka, and GCP...
- Collaborative Data Workspace, The Sharing Gap, And Engineering Management With Caitlin Colgrove (Hex)
The 91st episode of Datacast is with Caitlin Colgrove, the Co-Founder and CTO of Hex, a collaborative data workspace for building and sharing data projects using SQL and Python...Our wide-ranging conversation touches on her Computer Science education at Stanford; her 6-year engineering career at Palantir; her stint as a Data Engineering Manager at Remix; her current journey with Hex building a Data Workspace for teams; lessons learned serving the “analytically technical”, addressing the sharing gap, narrowing down user profiles, hiring for ownership, fundraising from complementary investors; and much more...
- The Big Six Matrix Factorizations
Six matrix factorizations dominate in numerical linear algebra and matrix analysis: for most purposes one of them is sufficient for the task at hand. We summarize them here...For each factorization we give the cost in flops for the standard method of computation, stating only the highest order terms. We also state the main uses of each factorization...
- Friendlier SQL with DuckDB
An elegant user experience is a key design goal of DuckDB. This goal guides much of DuckDB’s architecture: it is simple to install, seamless to integrate with other data structures like Pandas, Arrow, and R Dataframes, and requires no dependencies. Parallelization occurs automatically, and if a computation exceeds available memory, data is gracefully buffered out to disk. And of course, DuckDB’s processing speed makes it easier to get more work accomplished...
Tools*
Retool is the fast way to build an interface for any database
With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.
Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
- Data Scientist - Hungryroot - Remote
Hungryroot is looking for a Data Scientist to join our growing Data Team. As a Data Scientist, you will work closely with other Data Scientists and Data Engineers to develop various Machine Learning models that power Hungryroot and its AI functions. These models include traditional forecasting models, as well as more industry-specific optimization challenges.
As a Data Scientist at Hungryroot, you will work on answering questions like: How do you tell what food someone would like to eat this week? How do you determine whether they enjoyed it or not? Maybe they liked their meals last week, but are now looking for different options. Maybe they like the same food on Tuesdays, but variety on Fridays. What about spicy food: is Green Chili as spicy as Green Curry?
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
- Data visualization standards for SF.gov
These guidelines outline best practices for public reporting of data dashboards and visualizations...This guide was created collaboratively in 2021 by DataSF, Digital Services, Controller’s Office, and several expert volunteers...We aim to ensure data visualizations created for the public are: a) Thoughtfully designed, b) Accessible and enjoyable, and c) Mobile-responsive...For dashboards or data visuals going on a public website, implementing these guidelines should be considered a minimum criteria for your dashboard or product...
- Artificial Intelligence and Machine Learning – Explained
Hundreds of billions in public and private capital is being invested in Artificial Intelligence (AI) and Machine Learning companies. The number of patents filed in 2021 is more than 30 times higher than in 2015 as companies and countries across the world have realized that AI and Machine Learning will be a major disruptor and potentially change the balance of military power...Until recently, the hype exceeded reality. Today, however, advances in AI in several important areas (here, here, here, here and here) equal and even surpass human capabilities. If you haven’t paid attention, now’s the time...
- Pandas Tutor: Using Pyodide to Teach Data Science at Scale
Hi, we’re Sam Lau and Philip Guo, and we teach data science classes at UC San Diego. In this guest post we’ll tell you about our free educational tool, Pandas Tutor, that helps students learn data science using the popular pandas library. The above screenshot shows how you can use it to write Python and pandas code in a web-based editor and see visualizations of what your code does step-by-step...After giving an overview of Pandas Tutor, we’ll dive into a case study of how we ported it to Pyodide and why we feel that Pyodide is amazing for educational use cases like ours...
Books
-
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian