Hello and thank you for tuning in to Issue #503.
Once a week we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
Seeing this for the first time? Subscribe here:
Want to support us? Become a paid subscriber here.
If you don’t find this email useful, please unsubscribe here.
And now, let's dive into some interesting links from this week:
:)
Financial Machine Learning - A Survey
We survey the nascent literature on machine learning in the study of financial markets. We highlight the best examples of what this line of research has to offer and recommend promising directions for future research. This survey is designed for both financial economists interested in grasping machine learning tools, as well as for statisticians and machine learners seeking interesting financial contexts where advanced methods may be deployed…
Writing an ‘AI Strategy’ [Reddit Discussion]
My boss wants me to formulate an ‘AI strategy’ for our company. He is very emphatic that we need one so we don’t ‘fall behind’ but is very vague on what he means by AI. I am very against this, I think AI is a nonsense buzzword and I really don’t want our team’s work associated with it. I’m afraid any document will be full of thought bubbles and hype that will do nothing for our credibility with senior management. But - I’ve been told it must be done. How would you approach this?…
Folk Theorization, Platform Spirit, and Adaptation with AI-Driven Social Systems
AI-based platforms are an increasingly important part of our social lives. However, they are currently the equivalent of an unpredictable friend who acts in a confusing, often harmful manner, and refuses to explain themselves. They are a friend that does not deserve upfront trust, and with whom users question continuing their relationship. However, they are still an important friend, as platforms such as Facebook and TikTok have become essential to accomplishing user goals. That leaves users to investigate and speculate on how the system works and how to deal with it to accomplish their goals…
We're a distributed GPU cloud with 10k+ consumer GPUs on our network.
Get more inferences per dollar at massive scale.
Deploy AI/ML production models without headaches and chop your cloud bills by up to 90%.
Try For Free or Get A Demo
Want to sponsor the newsletter? Email us for details --> team@datascienceweekly.org
There is no Data Engineering roadmap
Between Reddit, Twitter, LinkedIn and various Slack communities, I see multiple junior folk looking to break into Data Engineering and asking for advice. Every single day. Many ask for a “roadmap” or some kind of step-by-step lesson plan that will land them their dream job. I don’t believe that such a roadmap exists…So, how does an entry-level engineer get started in Data Engineering?…read on…
AI Junk Is Starting to Pollute the Internet
When she first heard of the humanlike language skills of the artificial-intelligence bot ChatGPT, Jennifer Stevens wondered what it would mean for the retirement magazine she edits. Months later, she has a better idea. It means she is spending a lot of time filtering out useless article pitches…In early May, the news site rating company NewsGuard found 49 fake news websites that were using AI to generate content. By the end of June, the tally had hit 277…
Satellite Image Time Series Datasets
This page presents a list of satellite imagery datasets with a temporal dimension, mainly satellite image time series (SITS) and satellite videos, for various computer vision and deep learning tasks. It covers multi-temporal datasets with more than two acquisitions but not bi-temporal datasets…
What Should Data Science Education Do with Large Language Models?
The rapid advances of large language models (LLMs), such as ChatGPT, are revolutionizing data science and statistics. These state-of-the-art tools can streamline complex processes. As a result, they are reshaping the role of data scientists. We argue that LLMs are transforming the responsibilities of data scientists, shifting their focus from hands-on coding, data-wrangling and conducting standard analyses to assessing and managing analyses performed by these automated AIs. This evolution of roles is reminiscent of the transition from a software engineer to a product manager. We illustrate this transition with concrete data science case studies using LLMs in this paper…
Can LLMs Generate Random Numbers? Evaluating LLM Sampling in Controlled Domains
In this paper, we empirically evaluate LLMs’ capabilities as distribution samplers. We identify core concepts and metrics underlying LLM-based sampling, including different sampling methodologies and prompting strategies. Using a set of controlled domains with known target distributions, we evaluate the error and variance of the distributions induced by the LLM. We find that LLMs struggle to induce the target distributions over generated elements, suggesting that practitioners should more carefully consider the semantics and methodologies of sampling from LLMs…
China finalizes first-of-its-kind rules governing generative A.I. services
Generative AI services will need to obtain a license to operate, the Cyberspace Administration of China (CAC) said. If a generative AI service provider finds “illegal” content, it should take measures to stop generating that content, improve the algorithm and then report that material to the relevant authority. Providers of these services must conduct security assessments on their product and ensure user information is secure. Generative AI services in China must also adhere to the “core values of socialism,” the CAC said. Still, regulators are trying to strike a balance between making China a leader in artificial intelligence and keeping a close eye on its development…
Mozilla’s Reflections on AI Explain: A postmortem
We [Mozilla] recently launched two new AI experiences — AI Explain and AI Help. Thanks to feedback from our community, we realized that AI Explain (A way for readers to explore and understand code examples embedded in MDN documentation pages, describing the purpose and behavior of the code or parts of the example) needs more work. We have disabled it and will be working on it to make it a better experience. In this blog post, we look into the story behind AI Explain: its development, launch, and the reasons that led us to press the pause button…
Large language models encode clinical knowledge
We introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today’s models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications…
M&A, competition, pricing, and investing - Podcast with Julia Schottenstein from dbt Labs
Julia Schottenstein is a product lead at dbt Labs, a data transformation company, and an active angel investor in data and infrastructure startups…In today’s episode, we discuss: • Advice for founders hoping to improve their M&A outcome • How to strategically think about competition • How to determine your paid features and have willingness-to-pay conversations • Why Julia lives by “worse is better” and “tech debt is a champagne problem” • Lessons from dbt Labs • What PMs can learn from investors…
Noob question, but how do you deploy and automate a data pipeline?
I’m relatively new to data ops and have learned a bunch out of necessity, but I really like Data Engineering and want to pursue it further…One thing that’s always baffled me though is how you actually deploy and automate a pipeline. I’ve done a few Snowflake labs where I did this via Python on a local machine, but what’s the next step up? Let’s say I want to take data from Postgres in AWS RDS and send it to Snowflake. Do I write a Python script and upload it to EC2, then schedule it to run via a cron job?…
* Define and deploy new approaches, exploiting the wealth of data and the power of associated technologies, to respond to the problems of teams in areas: Americas, China, Japan, Europe, South Asia and North Asia
* Collaborate on a daily basis with the teams on their needs and build ready-to-use algorithms in order to feed their business challenges and customer experience in particular
* Ensure all stages of data science projects: framing and management, implementation, development and operation, adoption and commitment
* Support the business teams in the use of the tools put in place to serve their challenges and enable them to act more and more independently
* Ensure the governance of Data Science projects according to the defined principles
* Monitor, maintain and improve the models and tools in place
Apply here
Want to post a job here? Email us for details --> team@datascienceweekly.org
* Based on unique clicks.
** Find last week's issue #502 here.
Thanks for joining us this week :)
All our best,
Hannah & Sebastian
P.S.,
If you found this newsletter helpful, consider supporting us by becoming a paid subscriber here: https://datascienceweekly.substack.com/subscribe :)
Copyright © 2013-2023 DataScienceWeekly.org, All rights reserved.