Data Science Weekly - Issue 444

Curated news, articles and jobs related to Data Science. 
Keep up with all the latest developments

Issue #444

May 26 2022

Editor Picks

 
  • Stanford MLSys Seminar Episode 5: Chip Huyen [Video]
    This talk covers what it means to operationalize ML models. It starts by analyzing the difference between ML in research vs. in production, ML systems vs. traditional software, as well as myths about ML production...It then goes over the principles of good ML systems design and introduces an iterative framework for ML systems design, from scoping the project, data management, model development, deployment, maintenance, to business analysis...The talk ends with a survey of the ML production ecosystem, the economics of open source, and open-core businesses....
  • NonCompositional or Why composition is DALL-E’s strength, not its weakness
    When we compose meanings, concepts, semantics or any other ‘elements’ of cognition, the outcome is not easily predictable like it is when we compose functions in mathematics or operations in a computer programme...it makes no sense to criticise DALL-E (or neural networks in general) for their poor composition. It is precisely because their composition is surprisingly good that emotions have been stirred and people are enjoying tweeting and sharing these things so much! “Yeah, all good fun, but we can’t learn anything scientific or conceptual from this brute-force approach.” Well, I’m not so sure. Let’s consider a bit more history...
 
 

A Message from this week's Sponsor:

 



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.

Drag and drop UI components—like tables and charts—to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result—less time on repetitive work and more time to discover insights.

 

 

Data Science Articles & Videos

 
  • Imagen - unprecedented photorealism × deep level of language understanding
    We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model. ...
  • A survey on adversarial attacks and defences
    In recent times, different types of adversaries based on their threat model leverage vulnerabilities to compromise a deep learning system where adversaries have high incentives...However, there are only a few strong countermeasures which can be used in all types of attack scenarios to design a robust deep learning system. Herein, the authors attempt to provide a detailed discussion on different types of adversarial attacks with various threat models and also elaborate on the efficiency and challenges of recent countermeasures against them...
  • Let's Continue Bundling into the Database
    A very silly blog post came out a couple months ago about The Unbundling of Airflow. I didn’t fully read the article, but I saw its title and skimmed it enough to think that it might’ve been too thin of an argument to hold water...I actually don’t care that much about the bundling argument that I will make in this post. Truthfully, I just want to argue that feature stores, metrics layers, and machine learning monitoring tools are all abstraction layers on the same underlying concepts, and 90% of companies should just implement these “applications” in SQL on top of streaming databases...
  • Bridging the Resource Divide for Artificial Intelligence Research
    White House report from Lynne Parker, Deputy United States Chief Technology Officer and Director of the National Artificial Intelligence Initiative Office...Today, as co-chair of the Task Force and as part of OSTP’s broader work to advance the responsible research, development, and use of AI, I am proud to announce the submission of the interim report of the NAIRR Task Force to the President and Congress. This report lays out a vision for how this national cyberinfrastructure could be structured, designed, operated, and governed to meet the needs of America’s research community...
  • Introducing PeerXiv - A modern platform for peer-review of preprints
    What would a peer review process look like if it was designed today? Peer review is one of the cornerstones of the research community, and yet while our community keeps advancing and growing, the reviewing process remains almost unchanged...We strongly believe that peer review can be so much better for both authors and reviewers and we are excited to share PeerXiv, our proposal to do just that...
  • Artificial intelligence is breaking patent law
    The patent system assumes that inventors are human. Inventions devised by machines require their own intellectual property law and an international treaty...In 2020, a machine-learning algorithm helped researchers to develop a potent antibiotic that works against many pathogens. Artificial intelligence (AI) is also being used to aid vaccine development, drug design, materials discovery, space technology and ship design. Within a few years, numerous inventions could involve AI. This is creating one of the biggest threats patent systems have faced...
  • On the Impact of Data Augmentation on Downstream Performance in Natural Language Processing
    Data augmentation is a common strategy to improve generalization and robustness of machine learning models. While data augmentation has been widely used within computer vision, its use in NLP has been comparably rather limited. The reason for this is that within NLP, the impact of proposed data augmentation methods on performance has not been evaluated in a unified manner, and effective data augmentation methods are unclear. In this paper, we look to tackle this by evaluating the impact of 12 data augmentation methods on multiple datasets when finetuning pre-trained language models...
  • AI reveals unsuspected math underlying search for exoplanets
    University of California, Berkeley, astronomers found unsuspected connections hidden in the complex mathematics arising from general relativity—in particular, how that theory is applied to finding new planets around other stars...In a paper appearing this week in the journal Nature Astronomy, the researchers describe how an AI algorithm developed to more quickly detect exoplanets when such planetary systems pass in front of a background star and briefly brighten it—a process called gravitational microlensing—revealed that the decades-old theories now used to explain these observations are woefully incomplete....
 
 

Tools*

 



Check out the new Anaconda Community for all-things data!

Want insights into the newest developments in the world of data, or need help getting “unstuck” on a problem?

Our Community Forums is the place to go! Be the first to engage with other professionals and ask questions to the broader data community. Users can join in conversations around trends, debate new features, post questions to the community, and more. Plus, it’s another avenue for technical help!

Create your free Anaconda Community account now.



*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

 
 

Jobs

 
  • Data Scientist - Hungryroot - Remote

    Hungryroot is looking for a Data Scientist to join our growing Data Team. As a Data Scientist, you will work closely with other Data Scientists and Data Engineers to develop various Machine Learning models that power Hungryroot and its AI functions. These models include traditional forecasting models, as well as more industry-specific optimization challenges.

    As a Data Scientist at Hungryroot, you will work on answering questions like: how do you tell what food someone would like to eat this week, how do you determine whether they enjoyed it or not, maybe they liked their meals last week, but are now looking for different options, maybe they like the same food on Tuesdays, but variety on Fridays, what about spicy food, is Green Chilly as spicy as Green Curry?

     

        Want to post a job here? Email us for details --> team@datascienceweekly.org

 
 

Training & Resources

 
  • MIT Spring 2022 Machine Learning for Healthcare Class (6.871/HST.956)
    Introduces students to machine learning in healthcare, including the nature of clinical data and the use of machine learning for risk stratification, disease progression modeling, precision medicine, diagnosis, subtype discovery, and improving clinical workflows. Topics include causality, interpretability, algorithmic fairness, time-series analysis, graphical models, deep learning and transfer learning. Guest lectures by clinicians from the Boston area and course projects with real clinical data emphasize subtleties of working with clinical data and translating machine learning into clinical practice....
  • What Is Active Metadata, and Why Does It Matter?
    Just like data mesh or the metrics layer, active metadata is the latest hot topic in the data world. As with every other new concept that gains popularity in the data stack, there’s been a sudden explosion of vendors rebranding to “active metadata”, ads following you everywhere and...confusion...With everyone talking about active metadata, it must be pretty easy to understand, right?...I’ve broken down the ideas behind active metadata with as little jargon as possible. Keep reading to learn what active metadata is, what it looks like, how you can actually use it, how it fits into the modern data stack, and why it even matters...
 
 

What you’re up to – notes from DSW readers

 
  • Alex is working on building a predictive model for customer segmentation...
  • Daniel Czwalinna is working on measuring the effect of false labels on image classification model performance....
  • Andrew Van Dyke is working on an algorithmic trading system. Algorithms are randomly generated from compositions of basic math functions on input data. These algorithms are then refined via a Genetic Algorithm....
  • Frank is working on a Master's thesis in logistics and supply chain management...
  • Frank Corrigan is working on building an NLP-enabled async, voice-first communication platform...
 

* To share your projects and updates, share the details here.

** Want to chat with one of the above people? Hit reply and let us know :)

 


 

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Copyright © 2013-2022 DataScienceWeekly.org, All rights reserved.
