Editor's Picks
- Dissecting the satellite-image-deep-learning repo
I maintain a popular repository on GitHub called satellite-image-deep-learning, currently sitting close to 4k stars. The repository lists useful references I have found related to satellite imagery and deep learning, but also goes beyond this to include sections on deploying machine learning models, and even a section on ‘movers and shakers’ on GitHub. This post provides a brief history of this repository, how I find material, and why you should create something similar...
- Unique Python Packages To Improve Your Data Workflow
Various Python packages have been developed to help data people in their work. In my experience, many useful data Python packages lack recognition or are still growing in popularity...That is why, in this article, I want to introduce you to several unique Python packages that would help your data workflow in many ways...
A Message from this week's Sponsor:
Retool is the fast way to build an interface for any database
With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.
Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.
Data Science Articles & Videos
- The Technology Behind BLOOM Training
In recent years, training ever larger language models has become the norm. While the issue of those models not being released for further study is frequently discussed, the hidden knowledge about how to train such models rarely gets any attention. This article aims to change this by shedding some light on the technology and engineering behind training such models, both in terms of hardware and software, using the example of the 176B parameter language model BLOOM...
- AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts
Although large language models (LLMs) have demonstrated impressive potential on simple tasks, their breadth of scope, lack of transparency, and insufficient controllability can make them less effective when assisting humans on more complex tasks. In response, we introduce the concept of Chaining LLM steps together, where the output of one step becomes the input for the next, thus aggregating the gains per step...
- Brickit: Build new things from your old Lego bricks
The main thing that makes Brickit possible is a wonder of machine learning. We make an infinite number of 3D models of piled bricks and let our algorithm recognise each and every brick in them. Of course, sometimes it makes mistakes, and we politely point them out to it, increasing its accuracy. But it’s only the first step! Then our algorithm meets the real world and the real photos of our audience. And it learns from them too! In the end we have a well-educated algorithm, which can recognise the 1,600 most widespread bricks from even the tiniest bits of them. ...
- Cognitive Science and AI [Twitter Thread]
Mentally preparing a class on "Cognitive Science and AI" for 1st year AI students and looking for examples of how knowledge of human cognition can help curb over-hyped AI claims of "sentience" or "human-level AI". A 🧵with some ideas I have so far. What else would you suggest?...
- Which Datawarehouse & ELT tool is best and economical for a startup? [Reddit Discussion]
We are currently using a managed data warehouse that uses Redshift and provides an in-built ETL tool. The prices have gone through the roof so we are planning to look into cheaper alternatives...I have been looking into DW alternatives like BigQuery, and Snowflake, & keeping the Redshift instance. I wanted to know which DW seems good and cheapest for our requirements in the long term. I read that BigQuery would be the cheapest and best (managed) but wanted to know if there are any downsides or disadvantages...For ELT, I am looking into open source options like Airbyte, Meltano, and Singer. Any recommendations from people who are using these would be welcome...
- Launch: Building Modern Data Teams
Years ago I started collecting my favorite data-related links in a GitLab snippet and thanks to my incredible team members at @AmplifyPartners it has become way more!...
- What’s It Like to Work in Applied AI?
Understanding the role of a Machine Learning Engineer from an early-career perspective...This article is written from the perspective of an early-career stage: 2+ years in the tech space, 1+ years of applied AI/ML experience, with an (expected) bachelor’s level degree in engineering. The bulk of experience lies in start-up, small, and medium-sized environments, which are inherently less structured than established organizations...
- Supercharging A/B Testing at Uber
Uber’s Experimentation Team dives into the architecture of our new A/B experimentation platform upleveling product development agility by providing correct, reliable, flexible, and easy-to-use experimentation to teams across Uber...
- Is there a way I can use Data Science to help my mom with her small shop? [Reddit Discussion]
Hello, I am still studying to become a data scientist; I am a beginner in the field...My mom has a small underwear and sportswear shop in a small town. Things haven't been going great economically in our country (Brazil), so I wanted to help somehow...Considering the small scale of the business, what are some projects that could have an impact? Either economically or even just making her life easier/more efficient...
- ML-Enhanced Code Completion Improves Developer Productivity
We describe how we combined ML and semantic engines (SEs) to develop a novel Transformer-based hybrid semantic ML code completion, now available to internal Google developers. We discuss how ML and SEs can be combined by (1) re-ranking SE single token suggestions using ML, (2) applying single and multi-line completions using ML and checking for correctness with the SE, or (3) using single and multi-line continuation by ML of single token semantic suggestions. We compare the hybrid semantic ML code completion of 10k+ Googlers (over three months across eight programming languages) to a control group and see a 6% reduction in coding iteration time (time between builds and tests) and a 7% reduction in context switches (i.e., leaving the IDE) when exposed to single-line ML completion...
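The AI Chains item above rests on one simple mechanism: the output of each step becomes the input of the next. As a minimal sketch of that idea only (not the paper's implementation), the Python below chains plain functions. `run_chain`, `extract_problems`, and `count_problems` are hypothetical names, and the two steps are toy stand-ins for what would be separate LLM prompts in the paper's setting; no model is called.

```python
from typing import Callable, List

# A "step" maps one string to another. In the AI Chains setting each step
# would wrap its own LLM prompt; here the steps are plain stand-in functions.
Step = Callable[[str], str]


def run_chain(steps: List[Step], initial_input: str) -> List[str]:
    """Run steps in order, feeding each step's output into the next step.

    Returns every intermediate output (not just the final one) so each
    stage of the chain can be inspected.
    """
    outputs: List[str] = []
    current = initial_input
    for step in steps:
        current = step(current)
        outputs.append(current)
    return outputs


# Hypothetical stand-ins for prompted LLM calls.
def extract_problems(text: str) -> str:
    # Split feedback into sentences and join the non-empty ones with "; ".
    return "; ".join(s.strip() for s in text.split(".") if s.strip())


def count_problems(text: str) -> str:
    # Count the "; "-separated items produced by the previous step.
    return str(len(text.split("; ")))


if __name__ == "__main__":
    trace = run_chain([extract_problems, count_problems], "Too long. Too vague.")
    print(trace)  # ['Too long; Too vague', '2']
```

Returning the full trace is a choice made here to mirror the transparency argument: every intermediate result is visible, rather than only the final answer.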
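The ML-enhanced code completion item above lists ways to combine ML with a semantic engine (SE). A minimal sketch of approaches (1) and (2) might look like the following, assuming the ML model's output is already available as a score dictionary or a list of candidate strings. All function names and the toy `hypothetical_se_check` validator are illustrative assumptions, not Google's implementation or any real SE API.

```python
from typing import Callable, Dict, List

# Toy stand-in for a semantic engine's knowledge (hypothetical).
KNOWN_LIST_METHODS = {"append", "extend", "clear"}


def rerank_se_suggestions(se_tokens: List[str], ml_scores: Dict[str, float]) -> List[str]:
    """Approach (1): keep only the SE's suggestions, ordered by ML score.

    Tokens the model did not score fall back to 0.0. Python's sort stays
    stable with reverse=True, so ties keep the SE's original order.
    """
    return sorted(se_tokens, key=lambda tok: ml_scores.get(tok, 0.0), reverse=True)


def filter_ml_completions(ml_completions: List[str], is_valid: Callable[[str], bool]) -> List[str]:
    """Approach (2): keep the ML model's completions, dropping any the SE rejects."""
    return [c for c in ml_completions if is_valid(c)]


def hypothetical_se_check(completion: str) -> bool:
    # Accept "<name>.<method>(...)" only when <method> is a known list method.
    _, _, rest = completion.partition(".")
    method = rest.split("(", 1)[0]
    return method in KNOWN_LIST_METHODS


if __name__ == "__main__":
    print(rerank_se_suggestions(["append", "clear", "extend"], {"extend": 0.7, "append": 0.9}))
    print(filter_ml_completions(["items.append(x)", "items.push(x)"], hypothetical_se_check))
```

The split mirrors the blurb's framing: in (1) the SE decides *what* is offered and ML decides the *order*; in (2) ML proposes and the SE acts as a correctness gate.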
Course*
Don’t Miss Your Chance to Jumpstart Your Data Career
TDI Fall Cohort Applications Close TOMORROW
Complete your application today and you could be on your way to becoming a leading data scientist. It’s as easy as 1, 2, 3:
- Attend part-time or full-time and master the in-demand skills employers are craving
- Work with our career services team and land a job with one of our hiring partners
- ????
- Profit and take over the data world
Apply today and don’t pay a dime until you get a job. Applications close July 29th. Apply Now.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
- Data Scientist - Success Academy Charter Schools, Inc - NYC
This new Data Scientist role will be a key contributor to our mission of driving innovation across the organization. Reporting to the Leader of Enterprise Analytics, this role will be responsible for working with stakeholders in various functions to understand areas of opportunity, developing analytical solutions ranging from dashboards to sophisticated mathematical models, and helping functional teams adopt those solutions. This role will be part of a highly collaborative team of professionals with a wide range of skills including data science, data engineering, business analysis, and project management...
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
- AMMI Geometric Deep Learning Course - Second Edition (2022) [YouTube Series]
Video recording of the Second Edition of the course "Geometric Deep Learning" taught in the African Master in Machine Intelligence in July 2022...Lecturers: Michael Bronstein (Oxford/Twitter) • Joan Bruna (NYU) • Taco Cohen (Qualcomm) • Petar Veličković (DeepMind)...Seminar speakers: Russ Bates (DeepMind) • Cristian Bodnar (Cambridge) • Fabrizio Frasca (Twitter/Imperial College) • Francesco Di Giovanni (Twitter) • Geordie Williamson (U Sydney)...
What you’re up to – notes from DSW readers
- Working on something cool? Let us know here :) ...
* To share your projects and updates, share the details here.
** Want to chat with one of the above people? Hit reply and let us know :)
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's newsletter here.
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian