Data Science Weekly - Issue 463

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #463

October 06 2022

Editor's Picks

 

  • A CSV file with 100,000 rows?
    If someone gives you a CSV file with 100,000 rows in it, what tools do you use to start exploring and understanding that data? [Twitter thread and responses]...
  • Collective Intelligence for Deep Learning: A Survey of Recent Developments
    We survey ideas from complex systems such as swarm intelligence, self-organization, and emergent behavior that are gaining traction in ML...In the last few years, I have been noticing many works in deep learning research pop up that have been using some of these ideas from collective intelligence, in particular, the area of emergent complex systems. Recently, Yujin Tang and I put together a survey paper called Collective intelligence for deep learning: A survey of recent developments about this topic, and in this post, I will summarize the key themes in our paper...
  • Build a Career in Data Science Podcast
    Build a Career in Data Science teaches you what data science courses leave out: from how to land your first job to the lifecycle of a data science project and even how to become a manager. This is a true how-to on obtaining and then navigating a data science career--filled with real stories from data scientists. This podcast is an extension of the similarly named book: Build a Career in Data Science...
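
For the CSV question in the first pick above, one possible first pass needs nothing beyond the Python standard library. This is a minimal sketch, not a recommendation taken from the thread: the helper name `profile_csv`, its `top_n` parameter, and the sample data are all hypothetical.

```python
import csv
import io
from collections import Counter


def profile_csv(handle, top_n=3):
    """Return row count, missing cells, distinct counts and top values per column.

    `handle` is any open text file (or file-like object) containing CSV
    with a header row. Hypothetical helper, for illustration only.
    """
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    missing = Counter()
    values = {name: Counter() for name in columns}
    n_rows = 0

    for row in reader:
        n_rows += 1
        for name in columns:
            # Short rows yield None for absent cells; treat them as empty.
            cell = (row.get(name) or "").strip()
            if cell == "":
                missing[name] += 1
            else:
                values[name][cell] += 1

    return {
        "rows": n_rows,
        "columns": columns,
        "missing": {name: missing[name] for name in columns},
        "distinct": {name: len(values[name]) for name in columns},
        "top": {name: values[name].most_common(top_n) for name in columns},
    }


# Made-up sample standing in for a real file opened with open(path, newline="").
sample = io.StringIO("city,price\nNYC,10\nNYC,\nBoston,12\n")
report = profile_csv(sample)
print(report)
```

A streaming pass like this keeps only per-column counters in memory, so 100,000 rows is comfortably within reach; for anything richer, a dataframe library or a SQL engine is the more usual choice.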
 
 

A Message from this week's Sponsor:

 



Out now: new semantic layer whitepapers

Check out this bundle of Semantic Layer whitepapers by best-selling authors - download here.

You'll learn the key value propositions to implement a semantic layer and best practices for analytics success with one.

 

 

Data Science Articles & Videos

 
  • Getting tabular data from unstructured text with GPT-3: an ongoing experiment
    One of the most exciting applications of AI in journalism is the creation of structured data from unstructured text...Government reports, legal documents, emails, memos...these are rich with content like names, organizations, dates, and prices. But to get them into a format that can be analyzed and counted, like a spreadsheet, usually involves days or weeks of tedious manual data entry...Large language models like GPT-3 from OpenAI have the potential to greatly speed up this awful slog. Because these models have such a deep grasp of language (GPT-3 was trained on basically the entire internet — at least all of English Wikipedia), they can understand commands and pick out the right elements from text....
  • Andrej Karpathy: Building makemore Part 3: Activations & Gradients, BatchNorm
    We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward pass activations, backward pass gradients, and some of the pitfalls when they are improperly scaled. We also look at the typical diagnostic tools and visualizations you'd want to use to understand the health of your deep network. We learn why training deep neural nets can be fragile and introduce the first modern innovation that made doing so much easier: Batch Normalization...
  • Predicting the Future of AI with AI: High-quality link prediction in an exponentially growing knowledge network
    A tool that could suggest new personalized research directions and ideas by taking insights from the scientific literature could significantly accelerate the progress of science. A field that might benefit from such an approach is artificial intelligence (AI) research, where the number of scientific publications has been growing exponentially over the last years, making it challenging for human researchers to keep track of the progress. Here, we use AI techniques to predict the future research directions of AI itself...
  • Discovering novel algorithms with AlphaTensor
    In our paper, published today in Nature, we introduce AlphaTensor, the first artificial intelligence (AI) system for discovering novel, efficient, and provably correct algorithms for fundamental tasks such as matrix multiplication. This sheds light on a 50-year-old open question in mathematics about finding the fastest way to multiply two matrices...
  • DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics
    We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene, by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that image. The significance is that we achieve this zero-shot using DALL-E, without needing any further data collection or training...
  • latexify_py
    A Python package that generates LaTeX math description from Python functions....
  • End-to-end Neural Coreference Resolution in spaCy
    Coreference resolution is something all of us do instinctively many times every day even though most of us haven’t heard the term before. People use language to talk about entities, events and the relationships between them. When we mention the same thing multiple times throughout a discourse we tend to use different expressions...
  • Erik Bernhardsson and Etienne Dilocker on Vector Search in Production
    This is a really special episode with Erik Bernhardsson! Erik is one of the early thought leaders on Approximate Nearest Neighbor (ANN) Search, creating the ANNOY library at Spotify. Erik shared incredible insights about vector search at Spotify such as the role of Offline and Online Machine Learning inference and the role of multi-stage re-ranking pipelines. Erik has also done massively impactful work on benchmarking ANN algorithms! We really hope you enjoy the podcast and would be thrilled to answer any questions you have about the conversation topics!...
  • Apple's "Human Interface Guidelines for Charts"
    Human Interface Guidelines for Charts are here! We've been working on these pages for a bit—hope you find them useful 📊🥳 ... Including: Patterns page and Components pages...
  • Blueprint for an AI Bill of Rights - Making Automated Systems Work For The American People
    The White House Office of Science and Technology Policy has identified five principles that should guide the design, use, and deployment of automated systems to protect the American public in the age of artificial intelligence. The Blueprint for an AI Bill of Rights is a guide for a society that protects all people from these threats—and uses technologies in ways that reinforce our highest values....
 
 

Tool*

 



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.



*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Conference*

 



Global AI Developer Days – 26 October 2022

Join the Global AI Community for a day of inspiring keynotes from industry leaders with a strong focus on AI developers.

Highlights of this 3-hour conference include a session on responsible AI by Ruth Yakubu, Principal Cloud Advocate at Microsoft, who will talk about how to improve the fairness and reliability of AI solutions. Eric Boyd, Corporate Vice President at Microsoft, will show the latest inventions in Azure AI. Manuvir Das, Head of Enterprise Computing at NVIDIA, will take you on a journey through the new era of AI for developers, and many more leaders from the AI community will share their vision.

Don’t miss out on this free day of learning from top leaders in the AI space!

https://devdays.globalai.community


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Data Scientist - Mount Sinai Data Commons - NYC

    A position is available for an individual with skills in data science, bioinformatics and software engineering to play the key role in running and managing the Mount Sinai Data Commons – known as the Data Ark. The Data Ark team brings together all the most important data sets used by Sinai researchers (e.g. 1000G, GTEx, UK Biobank) in a single location on our HPC server (minvera.org), performs QA/QC processing of the data, conducts initial demographics analyses to showcase the different data sets, and will be tasked with expanding the data commons to host a large range of different data sets of different types (genotype, WES, WGS, RNA-seq, EHR-linked, imaging etc.), which will come with their own computational and platform challenges...
     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 

 

Training & Resources

 
  • The Illustrated Stable Diffusion
    The ability to create striking visuals from text descriptions has a magical quality to it and points clearly to a shift in how humans create art. The release of Stable Diffusion is a clear milestone in this development because it made a high-performance model available to the masses (performance in terms of image quality, as well as speed and relatively low resource/memory requirements)...After experimenting with AI image generation, you may start to wonder how it works...This is a gentle introduction to how Stable Diffusion works...
  • How diffusion models work: the math from scratch
    In this blog post, we will dig our way up from the basic principles. There are already a bunch of different diffusion-based architectures. We will focus on the most prominent one, which is the Denoising Diffusion Probabilistic Models (DDPM) as initialized by Sohl-Dickstein et al. and then proposed by Ho et al. (2020). Various other approaches will be discussed to a lesser extent, such as stable diffusion and score-based models...
  • Global Pooling in Convolutional Neural Networks
    In this article, we explore what global average and max pooling entail. We discuss why they have come to be used and how they measure up against one another. We also develop an intuition into why they work by performing a biopsy of our convnets and visualizing intermediate layers...
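
As a rough illustration of the global pooling item above: both operations collapse each channel of a feature map to a single number, one by averaging and one by taking the maximum. The sketch below uses plain nested lists in (channels, height, width) order; the function names and the toy values are assumptions made for this example and are not taken from the linked article.

```python
def _flatten(channel):
    """Turn one H x W channel (a list of rows) into a flat list of values."""
    return [value for row in channel for value in row]


def global_avg_pool(feature_maps):
    """Global average pooling: one mean per channel."""
    pooled = []
    for channel in feature_maps:
        values = _flatten(channel)
        pooled.append(sum(values) / len(values))
    return pooled


def global_max_pool(feature_maps):
    """Global max pooling: one maximum per channel."""
    return [max(_flatten(channel)) for channel in feature_maps]


# Toy input: 2 channels, each 2 x 2 (values invented for illustration).
maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [8.0, 0.0]],
]

print(global_avg_pool(maps))  # [2.5, 2.0]
print(global_max_pool(maps))  # [4.0, 8.0]
```

The second channel shows the difference: a single strong activation dominates the max (8.0) but is diluted in the average (2.0).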
 
 

What you’re up to – notes from DSW readers

 
  • Fill out the form below to appear here :) ...
 

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

 




 

Cutting Room Floor

 
  • All clear :)



P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
