Editor's Picks
- Exploratory programming: what it is, why it matters
Data teams are different from dev teams. Tools and tactics that help us build software aren’t designed for exploring data and sharing insights. Exploratory programming is. This post looks at why exploratory programming is ideal for data teams, and how to unlock its value...
- Are your data normal? Hint: no
One of the frequently-asked questions over at the statistics subreddit (reddit.com/r/statistics) is how to test whether a dataset is drawn from a particular distribution, most often the normal distribution...There are standard tests for this sort of thing, many with double-barreled names like Anderson-Darling, Kolmogorov-Smirnov, Shapiro-Wilk, Ryan-Joiner, etc...But these tests are almost never what you really want...
- Is AI the new crypto?
The nuclear winter techpocalypse arrived, sparing only artificial intelligence. Peak AI indicators are everywhere. Can it maintain the faith that crypto lost?...
A Message from this week's Sponsor:
Tell us how MLOps is more than just tools and you could win $100
At Toloka, we’re exploring what MLOps culture looks like across the industry at the start of 2023.
A huge variety of tools are available for ML development, but the culture and practices still have some catching up to do.
How do you see MLOps evolving this year?
Share your thoughts in our 5-minute survey. We’ll follow up to share the research results and pick a random winner for a $100 Amazon certificate!
Data Science Articles & Videos
- What we look for in a resume
Resume screening is pretty much a black box for most candidates. Few hiring managers have publicly discussed this...Whether or not you're interested in our startup, I hope my perspective can shed some light on what is happening on the other side of the table...
- Composing music with cellular automata
Musical composition is not solely inspirational. Composers use frameworks. Some use Western music theory and its numerous genres and forms, each with its own rich history, culture, rules and patterns. This notebook shows how musical rules and patterns can be extracted from particular mathematical objects called cellular automata...
- Chick-fil-A Restaurant's Enterprise Restaurant Compute
We have integrated with several of our restaurant systems to assist with Kitchen Production processes or onboarding mobile payment terminals used in our Drive Thru. In total, there are tens-of-thousands of devices deployed across our restaurants that are actively providing telemetry data from a wide variety of smart equipment devices (fryers, grills, etc)...Our purpose today is to catch readers up to our current state and share what has changed over the past 4 years...
- A Watermark for Large Language Models
We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of whitelist tokens before a word is generated, and then softly promoting use of whitelist tokens during sampling...
- A Survey of Meta-Reinforcement Learning
In this survey, we describe the meta-RL problem setting in detail as well as its major variations. We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task. Using these clusters, we then survey meta-RL algorithms and applications. We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner...
- Demystifying efficient self-attention, a practical overview
This blog post aims to provide a comprehensive overview of the different types of efficient attention with intuitive explanations. This is not a complete overview of every paper that has been written, but instead a coverage of the underlying methods and techniques, with in-depth examples...
- The misuse of colour in science communication [PDF]
The accurate representation of data is essential in science communication. However, colour maps that visually distort data through uneven colour gradients or are unreadable to those with colour-vision deficiency remain prevalent in science. These include, but are not limited to, rainbow-like and red–green colour maps. Here, we present a simple guide for the scientific use of colour. We show how scientifically derived colour maps report true data variations, reduce complexity, and are accessible for people with colour-vision deficiencies. We highlight ways for the scientific community to identify and prevent the misuse of colour in science, and call for a proactive step away from colour misuse among the community, publishers, and the press...
- Using new shipping data to improve government understanding of trade flows
Shipping instructions (or Bill of Lading) data detail the type, quantity and destination of goods being shipped in containers. These data were procured for UK imports and exports by DIT and the Department for Transport (DfT) in 2021...This blog post demonstrates the uniqueness of this data source and how data science techniques are being used to determine its data quality and potential cross-government uses to understand trade...
- Visual Studio Code for Data Science
In this article, we will discuss the three major benefits and features of VS Code: 1. Extensions (and how to install them) 2. Connection with remote servers (Google Colab) 3. Managing Python virtual environments...
- Data Wrangling Functions
For 2 years I have worked on this #rstats resource. It's a collection of functions used to wrangle data, especially in the field of ed research. Functions are organized by task (such as naming variables), and examples of how to use functions are provided...This repository contains examples of packages::functions() I commonly use when wrangling education research data...
- Career chat with Philip Robinson
Philip transitioned from working in computer security research to working on environmental and satellite imaging challenges at Global Fishing Watch...
Tool*
Sync customer data from your warehouse to any SaaS tool with Hightouch
Hightouch is the leading Data Activation platform, powered by Reverse ETL. Sync customer data from your warehouse into the tools your business teams rely on.
Get started for free at app.hightouch.io, or book a demo to see how it can work for your team.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
- Data Scientist / Machine Learning Engineer - Epsilon - NYC
The Data Sciences team at Epsilon Strategy and Insights is looking for a talented team player in a Data Scientist/Machine Learning Engineer role. You are an expert, mentor and advocate. You have a strong machine learning and deep learning background and are passionate about transforming data into ML models. You welcome the challenge of data science and are proficient in Python, Spark MLlib, TensorFlow, Keras, ML algorithms, deep neural networks, and big data. You must be self-driven, take initiative and want to work in a dynamic, busy and innovative group...
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
- Harvard's Foundations of Biomedical Informatics II, Spring 2023
This introductory course provides a survey of artificial intelligence for biomedical informatics, covering methods for key data modalities: clinical data, networks, language, and images. It introduces machine learning problems from a practical perspective, focusing on tasks that drive the adoption of machine learning in biology and medicine...
- Caltech's Advanced Topics in Machine Learning CS 159
The goal of the class is to bring students up to speed in two topics in modern machine learning research through a series of lectures. Students will then go on to conduct a mini research project at the end of the class. The two topics are: a) Predictive control & model-based reinforcement learning; and b) Neural network theory: learning & generalisation...
- Kane’s Free Data Science Course
Kane’s (Free) Data Science Course is offered occasionally, always running four weeks. The next session starts on Monday, January 23, 2023. The workload is 10 hours per week: 3 hours in class with former Harvard Preceptor David Kane, lecturing live on Zoom at around 8:00 PM EDT and 7 hours of work completed by students on their own. By the end of this course, students will be able to do basic data science! Although the course is normally restricted to high school students, all ages are welcome for the next session, subject to enrollment. There is no charge...
P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)
All the best,
Hannah & Sebastian