Data Science Weekly - Issue 447

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #447

June 16 2022

Editor Picks

 
  • The Imitation of Consciousness: On the Present and Future of Natural Language Processing
    The artificial intelligence of natural language processing is arriving. In January, 2021, Microsoft filed a patent to reincarnate people digitally through distinct voice fonts appended to lingual identities garnered from their social media accounts. I don’t see any reason why it can’t work...That’s not a “new soul.” It is a mechanical tongue, an artificial person, a virtual being. The application of machine learning to natural language processing achieves the imitation of consciousness, not consciousness itself, and it is not science fiction. It is now...
  • It's Not All Fun and Games: How DeepMind Unlocks Medicine's Secrets
    Hello. This is Eric Topol for Medicine and the Machine on Medscape. I have been looking forward to having this conversation with Demis Hassabis (DeepMind) for many months, if not years...I want to get into three areas: games, digitizing biology and protein structure, and drug discovery...
 
 

A Message from this week's Sponsor:

 



ML and Data Developers Week

Learn ML/data engineering, MLOps, and how to scale ML from top practitioners at ML and Data Developers Week, an event geared toward engineering teams that covers the practical solutions and challenges involved in building ML for the real world. With thousands of ML developers and data scientists from around the world, you can look forward to engaging conversations, insightful discussions, hands-on workshops/code labs, and peer networking. Free to join online or in person in San Francisco, New York, London, and Toronto.

 

 

Data Science Articles & Videos

 
  • Debezium to Snowflake: Lessons learned building data replication in production
    With the well-deserved position that Snowflake and Debezium have gained in the modern data stack, it is now fairly easy to find online resources on using these technologies. In this blog, we take it one step further by reflecting on the lessons we (at Shippeo) have learned using Debezium to replicate data at scale in near real time to Snowflake...
  • VCT: A Video Compression Transformer
    We show how transformers can be used to vastly simplify neural video compression. Previous methods have been relying on an increasing number of architectural biases and priors, including motion prediction and warping operations, resulting in complex models. Instead, we independently map input frames to representations and use a transformer to model their dependencies, letting it predict the distribution of future representations given the past. The resulting video compression transformer outperforms previous methods on standard video compression data sets...
  • "The Worst AI Ever" - Lessons from the GPT-4Chan Controversy
    This article contains an objective summary of a recent controversy related to an AI model named GPT-4chan, as well as a subjective commentary with my thoughts on it...the intent of this is to provide a comprehensive summary of what happened, as well as what I consider to be valuable lessons that can be taken away from it all. It is primarily for people in the AI community, but is accessible to those outside of it as well...If you are already aware of what happened, I recommend skipping the first two sections but still reading the ‘Analysis’ and ‘Lessons’ sections...
  • Discovering and Debugging a PyTorch Performance Decrease
    Over the past week, Thomas Capelle and I (Benjamin Warner) discovered, debugged, and created a workaround for a performance bug in PyTorch that reduces image-training GPU throughput by up to forty percent...This performance bug has been affecting fastai for an unknown amount of time...The culprit? Subclassed tensors...
  • An Actually-Good Argument Against Naive AI Scaling
    My position is that continuing to scale language models under the current paradigm will give us massive but not unbounded capabilities. To be clear: I think Dr. Marcus’s argument for this position is terrible. My reasons for believing that scaling has limits have nothing to do with “true intelligence” or “symbol manipulation” or any of that nonsense. Instead, the limits come from the fundamental inadequacy of passively-collected datasets, even at internet scale...In other words: the internet is the bottleneck. Anyways, on to the argument...
  • Google releases T5X
    T5X is a modular, composable, research-friendly framework for high-performance, configurable, self-service training, evaluation, and inference of sequence models (starting with language) at many scales...It is essentially a new and improved implementation of the T5 codebase (based on Mesh TensorFlow) in JAX and Flax...Below is a quick start guide for training models with TPUs on Google Cloud. For additional tutorials and background, see the complete documentation...
  • An on-chip photonic deep neural network for image classification
    Deep neural networks are commonly implemented using clock-based processors in which computation speed is mainly limited by the clock frequency and the memory access time. In the optical domain, the lack of scalable on-chip optical non-linearity and the loss of photonic devices limit the scalability of optical deep networks. Here we report an integrated end-to-end photonic deep neural network (PDNN) that performs sub-nanosecond image classification through direct processing of the optical waves impinging on the on-chip pixel array as they propagate through layers of neurons...
  • Emergent Abilities of Large Language Models
    Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models...
  • Spancat: a new approach for span labeling
    The SpanCategorizer is a spaCy component that answers the NLP community’s need to have structured annotation for a wide variety of labeled spans, including long phrases, non-named entities, or overlapping annotations. In this blog post, we’re excited to talk more about spancat and showcase new features to help with your span labeling needs!...
  • Things I Have Learned Working in an MIT AI Research Lab for a Year
    Hi all! My name is Mike and I work in the Massachusetts Institute of Technology’s Brain and Cognitive Sciences department (MIT BCS) as a research software engineer/ML Engineer. I specifically work on Brain-Score, a tool to measure how brain-like AIs are. I have been here a year, having graduated from the University of Virginia in spring 2021...Below are 5 things that I have learned in a year of working in an MIT AI lab — some things I hope you find amusing or useful for your own journey, and some things that have profoundly impacted the way that I view life, success, knowledge, and humanity itself...
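The Debezium-to-Snowflake post above is about change data capture. As a rough, record-level illustration of what replication involves, here is a minimal Python sketch that applies Debezium-style change events to an in-memory table. It assumes Debezium's standard event envelope (`op` is `c`, `u`, `d` or `r`, with `before` and `after` row images), an `id` primary key, and events that arrive in commit order per key. The function name and the in-memory table are hypothetical, and this is not Shippeo's pipeline.

```python
from typing import Any, Dict, Iterable, Optional

Row = Dict[str, Any]
Event = Optional[Dict[str, Any]]  # None stands for a Kafka tombstone record


def apply_change_events(table: Dict[Any, Row], events: Iterable[Event],
                        key: str = "id") -> Dict[Any, Row]:
    """Apply Debezium-style change events (op c/u/d/r only) to `table` in place."""
    for event in events:
        if event is None:
            continue  # tombstones only matter for Kafka log compaction
        op = event["op"]
        if op in ("c", "u", "r"):
            # Creates, updates and snapshot reads all upsert the new row image.
            after = event["after"]
            table[after[key]] = dict(after)
        elif op == "d":
            # Deletes carry only the old row image; ignore rows we never saw.
            table.pop(event["before"][key], None)
        else:
            raise ValueError(f"unsupported op: {op!r}")
    return table


events = [
    {"op": "r", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"},
     "after": {"id": 1, "status": "shipped"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
    None,  # tombstone following the delete
]
print(apply_change_events({}, events))  # {1: {'id': 1, 'status': 'shipped'}}
```

In a warehouse sink, the same upsert/delete logic is commonly expressed as a batched `MERGE` into the target table rather than applied row by row.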
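The VCT abstract describes a model that predicts the distribution of each new frame's representation from the past ones. This compresses well because of entropy coding: a symbol the model assigns probability p costs about -log2(p) bits, so sharper predictions mean fewer bits. The Python sketch below shows only that accounting. It assumes integer (quantized) representations, a Gaussian discretized to unit-width bins, and a toy predictor that reuses the previous frame's value as the mean. That predictor is a stand-in for VCT's transformer, not the paper's model, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_cdf(x: float, mean: float, scale: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def symbol_bits(y: int, mean: float, scale: float) -> float:
    """Bits to code integer y under a Gaussian discretized to unit-width bins."""
    p = gaussian_cdf(y + 0.5, mean, scale) - gaussian_cdf(y - 0.5, mean, scale)
    return -math.log2(max(p, 1e-12))  # floor avoids log(0) far in the tails


def total_bits(frames: Sequence[Sequence[int]], scale: float, use_past: bool) -> float:
    """Sum coding cost over frames; optionally condition each value on the previous frame."""
    bits = 0.0
    prev: List[int] = [0] * len(frames[0])  # the first frame has no real past
    for frame in frames:
        for i, y in enumerate(frame):
            mean = float(prev[i]) if use_past else 0.0
            bits += symbol_bits(y, mean, scale)
        prev = list(frame)
    return bits


frames = [[5, -3, 8], [5, -2, 8], [6, -2, 7]]  # hypothetical quantized latents
print(total_bits(frames, scale=1.0, use_past=False))  # no temporal context
print(total_bits(frames, scale=1.0, use_past=True))   # mean taken from the previous frame
```

Both variants pay the same for the first frame; the savings come from later frames that change little relative to their predicted distribution.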
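The emergence definition in the "Emergent Abilities of Large Language Models" abstract (absent in smaller models, present in larger ones, not predictable by extrapolating smaller models) can be turned into a rough check. This Python sketch is our own operationalization, not the paper's procedure. "Present" means accuracy at least a hypothetical `margin` above chance, and "predicted by extrapolation" means a least-squares line of accuracy against log10(parameter count), fitted on every model except the largest.

```python
import math
from typing import Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def looks_emergent(params: Sequence[float], acc: Sequence[float],
                   chance: float, margin: float = 0.1) -> bool:
    """params ascending. True if the ability is absent in every smaller model,
    present in the largest, and underpredicted by extrapolating the smaller ones."""
    if len(params) < 3:
        raise ValueError("need at least two smaller models to extrapolate from")
    present = [a >= chance + margin for a in acc]
    if not present[-1] or any(present[:-1]):
        return False
    xs = [math.log10(p) for p in params[:-1]]
    slope, intercept = linear_fit(xs, acc[:-1])
    predicted = slope * math.log10(params[-1]) + intercept
    return predicted < chance + margin


# Hypothetical scores on a 4-way multiple-choice task (chance = 0.25).
sizes = [1e8, 1e9, 1e10, 1e11]
print(looks_emergent(sizes, [0.25, 0.26, 0.25, 0.60], chance=0.25))  # True
print(looks_emergent(sizes, [0.30, 0.40, 0.50, 0.60], chance=0.25))  # False
```

The first curve sits at chance until the largest size, so the flat extrapolation misses the jump; the second improves smoothly, so the ability is already present in smaller models.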
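The spancat post describes a component that can label overlapping spans. Conceptually, this works because spancat first uses a suggester to propose candidate spans and then classifies each candidate on its own; spaCy's built-in `spacy.ngram_suggester.v1` proposes all n-grams of the configured sizes. The plain-Python sketch below mimics only that n-gram idea (it is not spaCy's code or API), and the two labels in it are hypothetical classifier outputs.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def ngram_suggester(n_tokens: int, sizes: Sequence[int]) -> List[Span]:
    """Propose every contiguous span whose length is one of `sizes`."""
    return [
        (start, start + size)
        for size in sizes
        for start in range(n_tokens - size + 1)
    ]


tokens = ["University", "of", "Virginia", "graduates"]
candidates = ngram_suggester(len(tokens), sizes=[1, 2, 3])
print(len(candidates))  # 9 candidate spans for 4 tokens

# Hypothetical classifier decisions: labels are assigned per candidate span,
# so a nested or overlapping pair like this can coexist.
predicted = {(0, 3): "ORG", (2, 3): "GPE"}
for (start, end), label in predicted.items():
    if (start, end) in candidates:
        print(" ".join(tokens[start:end]), label)
```

A token-level BIO/BILUO tagging scheme assigns each token to at most one entity, so it could not represent both of these spans at once.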
 
 

Course*

 


Business-Driven Data Analysis

Looking to deliver critical insights that power business strategy? Pragmatic Institute’s Business-Driven Data Analysis course teaches data professionals how to identify the right question and the right data, optimize results, communicate them effectively, and ensure stakeholder alignment. This 8-week, part-time course was developed in close partnership with industry leaders to ensure it drives impact in an evolving data and business landscape.

Learn how to:
  • Translate business needs into achievable data projects
  • Apply a proven and repeatable approach to data analysis
  • Identify the fastest path to actionable insights
  • Communicate effectively with diverse stakeholders

Enroll in the Upcoming Session



*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Senior Data Scientist, Startup Creation at Redesign Health - US

    As our Senior Data Scientist for our Startup Creation team, you will set up and configure the data infrastructure for our startups, work with the startup founding team to define data-driven KPIs, and implement automated statistical analyses of customer behavior. Your goal is to make all of the companies that we launch data-driven from day one.

    In this role, you will function as an in-house implementation team for the companies that Redesign Health launches (internally referred to as OpCos). We provide data strategy, data pipeline, data analytics and forecasting services to newly formed companies in a repeatable and scalable manner...

     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 
 

Training & Resources

 
  • chalk-diagrams
    Chalk is a declarative drawing library built on top of PyCairo. The API draws heavy inspiration from Haskell's diagrams, Scala's doodle and Jeremy Gibbons's lecture notes on Functional Programming for Domain-Specific Languages...
 
 

What you’re up to – notes from DSW readers

 
  • PyMC Devs are working on PyMC 4.0 - a new major release of the popular probabilistic programming library for Python that allows users to build Bayesian models with a simple Python API and fit them using Markov chain Monte Carlo (MCMC) methods. This new release comes with many new features including a new backend, JAX and GPU support, better sampling and much more. Check out the full release announcement...
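For readers newer to the MCMC part of the PyMC note: MCMC draws correlated samples from a posterior that is only known up to a normalizing constant. PyMC's default sampler for continuous models is NUTS; the plain-Python sketch below swaps in the much simpler random-walk Metropolis algorithm and does not touch PyMC's API. The model (a Normal(0, 1) prior on a mean `mu` and Normal(mu, 1) observations) is chosen because its exact posterior mean, sum(data) / (len(data) + 1), lets you check the sampler. Function names, step size and sample counts are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def log_posterior(mu: float, data: Sequence[float]) -> float:
    """Unnormalized log posterior: Normal(0, 1) prior, Normal(mu, 1) likelihood."""
    log_prior = -0.5 * mu ** 2
    log_lik = -0.5 * sum((x - mu) ** 2 for x in data)
    return log_prior + log_lik


def metropolis(data: Sequence[float], n_samples: int = 20_000, step: float = 0.5,
               burn_in: int = 2_000, seed: int = 0) -> List[float]:
    """Random-walk Metropolis sampler for the one-parameter model above."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    samples: List[float] = []
    for i in range(burn_in + n_samples):
        proposal = mu + rng.gauss(0.0, step)  # symmetric random-walk proposal
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        if i >= burn_in:  # discard the warm-up draws
            samples.append(mu)
    return samples


data = [1.2, 0.8, 1.5, 0.9, 1.1]  # hypothetical observations
samples = metropolis(data)
print(sum(samples) / len(samples))  # Monte Carlo estimate of the posterior mean
print(sum(data) / (len(data) + 1))  # exact posterior mean for comparison
```

NUTS uses gradients of the log posterior to make long, informed moves, which is why it scales to many parameters far better than a random walk; the accept/reject logic shown here is the common core.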
 

* To share your projects and updates, submit the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

 

Last Week's Newsletter's 3 Most Clicked Links

 

* Based on unique clicks.

** Find last week's newsletter here.

 

P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them on board :)

All the best,
Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.