Editor's Picks
- A CSV file with 100,000 rows?
If someone gives you a CSV file with 100,000 rows in it, what tools do you use to start exploring and understanding that data? [Twitter thread and responses]...
- Collective Intelligence for Deep Learning: A Survey of Recent Developments
We survey ideas from complex systems such as swarm intelligence, self-organization, and emergent behavior that are gaining traction in ML...In the last few years, I have been noticing many works in deep learning research pop up that have been using some of these ideas from collective intelligence, in particular, the area of emergent complex systems. Recently, Yujin Tang and I put together a survey paper called Collective intelligence for deep learning: A survey of recent developments about this topic, and in this post, I will summarize the key themes in our paper...
- Build a Career in Data Science Podcast
Build a Career in Data Science teaches you what data science courses leave out: from how to land your first job to the lifecycle of a data science project and even how to become a manager. This is a true how-to on obtaining and then navigating a data science career, filled with real stories from data scientists. This podcast is an extension of the similarly named book: Build a Career in Data Science...
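For the first pick, the 100,000-row CSV thread, the responses cover many tools. Purely as an illustration (not drawn from the thread), here is a minimal standard-library Python sketch of a first pass over such a file: count the rows, then report empty cells, distinct values and the most common values per column. The `profile_csv` helper and the tiny sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def profile_csv(fileobj, max_examples=3):
    """Return the row count plus per-column empty counts, distinct counts and top values."""
    reader = csv.DictReader(fileobj)
    columns = reader.fieldnames or []
    empty = Counter()
    values = {name: Counter() for name in columns}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in columns:
            cell = row.get(name)  # None when a row is shorter than the header
            if cell is None or cell.strip() == "":
                empty[name] += 1
            else:
                values[name][cell] += 1
    summary = {
        name: {
            "empty": empty[name],
            "distinct": len(values[name]),
            "top": values[name].most_common(max_examples),
        }
        for name in columns
    }
    return n_rows, summary


# Hypothetical sample; for a real file pass open(path, newline="") instead.
sample = io.StringIO("city,temp\nParis,21\nOslo,\nParis,19\n")
n, cols = profile_csv(sample)
print(n)             # 3
print(cols["city"])  # {'empty': 0, 'distinct': 2, 'top': [('Paris', 2), ('Oslo', 1)]}
```

Because only counters are kept, 100,000 rows fit comfortably in memory, though a column of unique IDs will hold one counter entry per row.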
A Message from this week's Sponsor:
Out now: new semantic layer whitepapers
Check out this bundle of Semantic Layer whitepapers by best-selling authors: download here.
You'll learn the key value propositions of implementing a semantic layer, plus best practices for achieving analytics success with one.
Data Science Articles & Videos
- Getting tabular data from unstructured text with GPT-3: an ongoing experiment
One of the most exciting applications of AI in journalism is the creation of structured data from unstructured text...Government reports, legal documents, emails, memos...these are rich with content like names, organizations, dates, and prices. But getting them into a format that can be analyzed and counted, like a spreadsheet, usually involves days or weeks of tedious manual data entry...Large language models like GPT-3 from OpenAI have the potential to greatly speed up this awful slog. Because these models have such a deep grasp of language (GPT-3 was trained on basically the entire internet, including all of English Wikipedia), they can understand commands and pick out the right elements from text...
- Andrej Karpathy: Building makemore Part 3: Activations & Gradients, BatchNorm
We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward pass activations, backward pass gradients, and some of the pitfalls when they are improperly scaled. We also look at the typical diagnostic tools and visualizations you'd want to use to understand the health of your deep network. We learn why training deep neural nets can be fragile and introduce the first modern innovation that made doing so much easier: Batch Normalization...
- Predicting the Future of AI with AI: High-quality link prediction in an exponentially growing knowledge network
A tool that could suggest new personalized research directions and ideas by taking insights from the scientific literature could significantly accelerate the progress of science. A field that might benefit from such an approach is artificial intelligence (AI) research, where the number of scientific publications has been growing exponentially in recent years, making it challenging for human researchers to keep track of the progress. Here, we use AI techniques to predict the future research directions of AI itself...
- Discovering novel algorithms with AlphaTensor
In our paper, published today in Nature, we introduce AlphaTensor, the first artificial intelligence (AI) system for discovering novel, efficient, and provably correct algorithms for fundamental tasks such as matrix multiplication. This sheds light on a 50-year-old open question in mathematics about finding the fastest way to multiply two matrices...
- DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics
We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that image. The significance is that we achieve this zero-shot using DALL-E, without needing any further data collection or training...
- latexify_py
A Python package that generates LaTeX math descriptions from Python functions...
- End-to-end Neural Coreference Resolution in spaCy
Coreference resolution is something all of us do instinctively many times every day even though most of us haven’t heard the term before. People use language to talk about entities, events and the relationships between them. When we mention the same thing multiple times throughout a discourse we tend to use different expressions...
- Erik Bernhardsson and Etienne Dilocker on Vector Search in Production
This is a really special episode with Erik Bernhardsson! Erik is one of the early thought leaders on Approximate Nearest Neighbor (ANN) Search, creating the Annoy library at Spotify. Erik shared incredible insights about vector search at Spotify, such as the role of offline and online machine learning inference and the role of multi-stage re-ranking pipelines. Erik has also done massively impactful work on benchmarking ANN algorithms! We really hope you enjoy the podcast and would be thrilled to answer any questions you have about the conversation topics!...
- Blueprint for an AI Bill of Rights - Making Automated Systems Work For The American People
The White House Office of Science and Technology Policy has identified five principles that should guide the design, use, and deployment of automated systems to protect the American public in the age of artificial intelligence. The Blueprint for an AI Bill of Rights is a guide for a society that protects all people from these threats, and uses technologies in ways that reinforce our highest values...
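Karpathy's makemore Part 3 builds batch normalization in PyTorch. As a rough NumPy sketch of just the training-time forward pass (assuming a 2-D `(batch, features)` input and omitting the running mean and variance used at inference time), the normalization looks like this; `batchnorm_forward` is a hypothetical name, not code from the video.

```python
import numpy as np


def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for a (batch, features) array."""
    mean = x.mean(axis=0, keepdims=True)     # per-feature mean over the batch
    var = x.var(axis=0, keepdims=True)       # biased per-feature variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # roughly zero mean, unit variance
    return gamma * x_hat + beta              # learnable scale and shift


rng = np.random.default_rng(0)
h = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # badly scaled pre-activations
out = batchnorm_forward(h, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))  # means near 0, standard deviations near 1
```

With `gamma` at 1 and `beta` at 0 the layer simply standardizes each feature; during training the network learns how far to move away from that.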
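The "Predicting the Future of AI with AI" paper frames its task as link prediction: given a network of AI research concepts, which unconnected pairs will become linked? Purely to make that task concrete, here is a common-neighbours score, a classic textbook heuristic and not the models evaluated in the paper; the concept names and edges are invented toy data and `common_neighbor_scores` is a hypothetical helper.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Score every unconnected node pair by how many neighbours they share."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if v in neighbours[u]:
            continue  # already linked, nothing to predict
        shared = len(neighbours[u] & neighbours[v])
        if shared:
            scores[(u, v)] = shared
    # Highest score first; ties keep their alphabetical pair order.
    return sorted(scores.items(), key=lambda item: -item[1])


# Hypothetical toy concept network.
edges = [("attention", "transformer"), ("attention", "graph"),
         ("transformer", "vision"), ("graph", "vision"), ("graph", "chemistry")]
for pair, score in common_neighbor_scores(edges):
    print(pair, score)
```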
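The AlphaTensor item concerns schemes that multiply matrices using fewer scalar multiplications. The classic instance is Strassen's 1969 algorithm, which multiplies two 2x2 matrices with 7 multiplications instead of 8; AlphaTensor searches for schemes of this kind. The sketch below is Strassen's scheme, not one of the algorithms AlphaTensor discovered, and `strassen_2x2` is a hypothetical name.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with Strassen's 7 multiplications instead of 8."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine the seven products using only additions and subtractions.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]


print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Applied recursively to matrix blocks, saving one multiplication per level brings the cost down from O(n^3) to about O(n^2.81).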
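The vector-search episode centres on approximate nearest neighbour (ANN) libraries such as Annoy, which trade a little accuracy for speed. As a point of reference, this NumPy sketch does exact cosine-similarity search by brute force, the kind of ground truth against which ANN recall is measured; `exact_top_k` and the random catalogue are hypothetical.

```python
import numpy as np


def exact_top_k(query, vectors, k=3):
    """Exact cosine-similarity search: the baseline that ANN indexes approximate."""
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = vectors @ query                     # cosine similarity to every item
    top = np.argpartition(-scores, k - 1)[:k]    # indices of the k best, unordered
    return top[np.argsort(-scores[top])]         # order those k by score


rng = np.random.default_rng(0)
catalog = rng.standard_normal((1000, 64))        # hypothetical item embeddings
print(exact_top_k(catalog[42], catalog, k=3))    # first index is 42 (the query itself)
```

This costs O(N·d) per query, which is what tree- or graph-based ANN indexes avoid once N grows into the millions.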
Tool*
Retool is the fast way to build an interface for any database
With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.
Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Conference*
Global AI Developer Days – 26 October 2022
Join the Global AI Community for a day of inspiring keynotes from industry leaders with a high focus on AI developers.
Highlights of this 3-hour conference include a talk on responsible AI by Ruth Yakubu, Principal Cloud Advocate at Microsoft, on how to improve the fairness and reliability of AI solutions. Eric Boyd, Corporate Vice President at Microsoft, will show the latest innovations in Azure AI; Manuvir Das, Head of Enterprise Computing at NVIDIA, will take you on a journey through the new era of AI for developers; and many more leaders from the AI community will share their vision.
Don’t miss out on this free day of learning from top leaders in the AI space!
https://devdays.globalai.community
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Jobs
- Data Scientist - Mount Sinai Data Commons - NYC
A position is available for an individual with skills in data science, bioinformatics and software engineering to play the key role in running and managing the Mount Sinai Data Commons – known as the Data Ark.
The Data Ark team brings together the most important data sets used by Sinai researchers (e.g. 1000G, GTEx, UK Biobank) in a single location on our HPC server (minvera.org), performs QA/QC processing of the data, and conducts initial demographic analyses to showcase the different data sets. The team will also be tasked with expanding the data commons to host a wide range of data sets of different types (genotype, WES, WGS, RNA-seq, EHR-linked, imaging, etc.), which will come with their own computational and platform challenges...
Want to post a job here? Email us for details --> team@datascienceweekly.org
Training & Resources
- The Illustrated Stable Diffusion
The ability to create striking visuals from text descriptions has a magical quality to it and points clearly to a shift in how humans create art. The release of Stable Diffusion is a clear milestone in this development because it made a high-performance model available to the masses (performance in terms of image quality, as well as speed and relatively low resource/memory requirements)...After experimenting with AI image generation, you may start to wonder how it works...This is a gentle introduction to how Stable Diffusion works...
- How diffusion models work: the math from scratch
In this blog post, we will dig our way up from the basic principles. There are already a bunch of different diffusion-based architectures. We will focus on the most prominent one, Denoising Diffusion Probabilistic Models (DDPM), as first introduced by Sohl-Dickstein et al. and later proposed in their modern form by Ho et al. (2020). Various other approaches, such as stable diffusion and score-based models, will be discussed to a lesser extent...
- Global Pooling in Convolutional Neural Networks
In this article, we explore what global average and max pooling entail. We discuss why they have come to be used and how they measure up against one another. We also develop an intuition for why they work by performing a biopsy of our convnets and visualizing intermediate layers...
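The diffusion-math post centres on DDPM, whose forward (noising) process has a closed form: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 - ᾱ_t)·ε, with ᾱ_t = ∏_{s≤t}(1 - β_s) and ε ~ N(0, I), so any timestep can be sampled directly from x_0. A minimal NumPy sketch, assuming the linear β schedule from 1e-4 to 0.02 over T = 1000 steps used by Ho et al. (2020); `q_sample` is a hypothetical name and timesteps are 0-indexed here.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # linear variance schedule
alphas_bar = np.cumprod(1.0 - betas)  # alpha_bar_t = product of (1 - beta_s) for s <= t


def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in one step using the closed-form marginal."""
    noise = rng.standard_normal(x0.shape)
    a_bar = alphas_bar[t]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise
    return x_t, noise


rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # toy "image" scaled to [-1, 1]
x_mid, eps = q_sample(x0, 500, rng)       # partially noised sample at step 500
```

In DDPM training, a network is then asked to predict `noise` from `(x_t, t)`; that part is beyond this sketch.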
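For the global pooling article, both operations collapse each feature map to a single number per channel: the mean for global average pooling, the maximum for global max pooling. A minimal NumPy sketch, assuming the channels-first `(batch, channels, height, width)` layout (the article's framework may use channels-last instead); the function names are hypothetical.

```python
import numpy as np


def global_avg_pool(feature_maps):
    """(batch, channels, height, width) -> (batch, channels) by averaging each map."""
    return feature_maps.mean(axis=(2, 3))


def global_max_pool(feature_maps):
    """(batch, channels, height, width) -> (batch, channels) by taking each map's peak."""
    return feature_maps.max(axis=(2, 3))


x = np.arange(2 * 3 * 4 * 4, dtype=float).reshape(2, 3, 4, 4)
print(global_avg_pool(x).shape)  # (2, 3)
print(global_avg_pool(x)[0, 0], global_max_pool(x)[0, 0])  # 7.5 15.0
```

Either output can feed a final dense classifier directly, which is why global pooling often replaces flattening at the end of a convnet.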
What you’re up to – notes from DSW readers
- Fill out the form below to appear here :) ...
* To share your projects and updates, share the details here.
** Want to chat with one of the above people? Hit reply and let us know :)
Last Week's Newsletter's 3 Most Clicked Links
* Based on unique clicks.
** Find last week's newsletter here.
Cutting Room Floor
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian