Good morning. According to the most recent reporting from The Information, SoftBank's pending investment in OpenAI would grant the startup a $260 billion valuation.

That's still an eye-wateringly enormous figure, but when reports of the funding first surfaced, the valuation being floated went as high as $340 billion. I don't care who you are, a difference of $80 billion is a big one.

— Ian Krietzberg, Editor-in-Chief, The Deep View

In today's newsletter:

🌊 AI for Good: Coral reefs
👁️🗨️ Canada's privacy watchdog investigates X for AI training
💰 The AI price war: DeepSeek undercuts OpenAI again
🔬 Report: The 'skillful plagiarism' of LLM research
AI for Good: Coral reefs

Source: Unsplash
Though they cover only about 1% of the world's oceans, coral reefs are a vital component of our global ecosystem, according to NOAA. Home to at least 25% of all marine life, the reefs protect shorelines from storms, provide food to fishermen, are a potential source of new medicines and deliver a massive economic boost through tourism (Australia's Great Barrier Reef alone injects more than $6 billion into the economy each year).

But corals are increasingly threatened by our changing climate. The increase in atmospheric carbon emissions — which are, in part, absorbed by the oceans — is driving up both water temperature and ocean acidification, which is weakening and killing corals.

As in many areas of conservation, researchers are turning to technology as a means of improving their capacity to protect coral reefs. A team of Australian researchers launched a project that aims to combine all global data about coral reefs with machine learning models and remote sensing to build better predictive algorithms to aid coral conservation efforts around the world.

Historically, a lack of organized, high-quality data on corals has been an impediment to applying machine learning at scale. This project aims, in part, to overcome that. With enough of the right data, some machine learning systems can detect and monitor specific features of a given coral, while others can predict environmental changes that might negatively impact a reef.
Why it matters: These predictions will enable researchers to intervene more quickly, giving conservationists a greater chance of keeping endangered corals alive.
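For a rough sense of what the kind of predictive modeling described above can look like in practice, here is a minimal, hypothetical sketch. It is not the Australian team's actual pipeline; the features (sea-surface temperature anomaly, pH, turbidity) and the synthetic data are illustrative assumptions, not the project's real inputs.

```python
# Hypothetical illustration only -- not the Australian project's actual model or data.
# Trains a simple classifier to flag reef sites at elevated bleaching risk from
# made-up environmental features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 2_000

# Synthetic reef-site features: sea-surface temperature anomaly (°C), pH, turbidity index.
sst_anomaly = rng.normal(0.5, 1.0, n)
ph = rng.normal(8.05, 0.08, n)
turbidity = rng.gamma(2.0, 1.0, n)
X = np.column_stack([sst_anomaly, ph, turbidity])

# Toy labeling rule: warmer, more acidic, murkier water -> higher bleaching risk.
risk_score = 1.2 * sst_anomaly - 8.0 * (ph - 8.05) + 0.3 * turbidity
y = (risk_score + rng.normal(0, 0.5, n) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test), target_names=["low risk", "high risk"]))
```

Real systems of this kind work with far richer inputs, such as satellite imagery and survey records, but the basic shape (environmental features in, risk prediction out) is the same.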
AI's next era with DeepMind's Jeff Dean

Turing's AGI Icons Episode 3 features Jeff Dean, Google DeepMind's Chief Scientist, and Jonathan Siddharth, CEO of Turing, as they discuss the future of AGI and how AI is transforming software engineering, research and businesses worldwide.

Tune in to learn:

The evolution of AI from its early days to AGI
Key challenges in scaling intelligence
The future of AI's real-world impact
Watch the episode now
Canada's privacy watchdog investigates X for AI training

Source: Unsplash
Canada's privacy watchdog last week opened an investigation into Twitter (a.k.a. X) after receiving a complaint that the platform may be in violation of Canada's Personal Information Protection and Electronic Documents Act.

The details: The organization said in a statement that the investigation will "focus on the platform's compliance with federal privacy law with respect to its collection, use and disclosure of Canadians' personal information to train artificial intelligence models."

Beyond collecting an enormous quantity of personal user data to power its algorithms and serve personalized advertisements, X also shares — by default — all relevant user data with xAI to train Grok, xAI's family of commercial large language models (LLMs). This includes public posts (including Spaces), as well as user engagement with public posts. In X's own words: "This also means that when you interact with Grok on X, your interactions, inputs and results may be used to train and improve the performance of those generative AI models developed by xAI."
Though this data sharing can be disabled, it is enabled by default, so users must opt out to protect their data. In many senses, opting out now is too late, since the models have already been trained.

It's an approach that has become all too common lately, with Meta last year adjusting its own privacy policies to give itself access to the treasure trove of leverageable user data gathered by its social media platforms. The practice has been met with global resistance, though users have hardly abandoned these platforms.

Elon Musk owns and operates both X and xAI; he is also closely associated with the current administration. President Donald Trump recently affirmed that tariffs against Canada will go into effect on March 4.
NOAA gets crippled: The Trump Administration last week announced the firing of hundreds of NOAA and National Weather Service employees, severely hamstringing both organizations' efficacy. Together, the two organizations are responsible for the collection of a vast amount of raw climate data that is made openly accessible around the world, powering everything from weather prediction to many climate-related machine learning and AI models. A federal judge ruled that the mass firings were unlawful.

Startup funding: Though it may be a startup itself, OpenAI likes to keep its eggs in all sorts of baskets. The company has a massive startup fund, which it has used to support these tech startups.
DOGE's misplaced war on software licenses (Wired).
Microsoft resolves global outage that left tens of thousands unable to access email and other apps (CNBC).
Microsoft calls US chip curbs a 'strategic misstep' in global AI race (Semafor).
SoftBank's Son goes on a new borrowing binge to fund AI (The Information).
Secret US drone program helped capture Mexican cartel boss (WSJ).
The AI price war: DeepSeek undercuts OpenAI again

Source: Unsplash
When DeepSeek launched R1, a so-called reasoning model that performs on par with OpenAI's models, the masses and the markets alike panicked. The reason largely had to do with the cost of operation; the model seemingly cost comparatively little in its final training run, and DeepSeek further decided to make the model freely accessible, something that has the potential to upend OpenAI's — and others' — very tenuous business model.

The Chinese firm has been leaning into this recently: to finish up its "open source week," DeepSeek opened up some of the statistics around its operation for the first time. The findings should, once again, be enough to make OpenAI and its investors at least a little nervous.

Over the most recent 24-hour period, the combined cost of inference — the cost of running its models — for both R1 and DeepSeek's earlier V3 model came to right around $87,000, assuming a leasing cost of $2 per hour for each H800 chip the company uses (most Western firms use the more advanced, and more expensive, H100s). That covered a total of 608 billion input tokens and 168 billion output tokens. If all of those tokens were billed at DeepSeek's R1 pricing — $0.14 per million input tokens and $2.19 per million output tokens — DeepSeek says it would be raking in $562,027 in revenue per day, with a notably high cost-profit margin of 545%.
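To make the margin claim concrete, here is a quick back-of-the-envelope check using only the figures reported above; the GPU count at the end is an implied estimate, not a number DeepSeek disclosed directly.

```python
# Back-of-the-envelope check of DeepSeek's reported daily figures.
# All inputs come from the reporting above; the implied GPU count is an estimate.

gpu_hourly_lease = 2.00              # assumed H800 leasing cost, $/hour
daily_inference_cost = 87_000        # reported combined R1 + V3 daily cost, $
theoretical_daily_revenue = 562_027  # reported revenue if every token were billed at R1 rates, $

# Cost-profit margin as DeepSeek framed it: profit relative to cost.
margin = (theoretical_daily_revenue - daily_inference_cost) / daily_inference_cost
print(f"Implied cost-profit margin: {margin:.0%}")  # ~546%, in line with the ~545% reported

# The $87,000/day figure also implies a fleet size, assuming 24/7 utilization.
gpu_hours_per_day = daily_inference_cost / gpu_hourly_lease
print(f"Implied H800s running around the clock: {gpu_hours_per_day / 24:,.0f}")  # roughly 1,800
```

Note that this only checks the internal consistency of the reported cost, revenue and margin figures; it doesn't reconstruct the revenue from the raw token counts, since DeepSeek's full rate card is more granular than the two headline prices quoted above.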
Now, the reality isn't quite there, at least at the moment. DeepSeek said that its "actual revenue is substantially lower," both because it monetizes only a small subset of its services and because V3's pricing is even lower than R1's.

And that $87,000 number also doesn't tell the full story when it comes to total expenses, which would include salaries, research and development costs and any number of other things necessary to run a business.

This is all in rather stark contrast to OpenAI, which, shortly after launching o3-mini at a price point much more competitive with R1, launched GPT-4.5, a model so expensive — roughly 500 times the input token cost of R1 — that OpenAI warned at launch it might not host the model in its API for very long. (A rough check of that price gap follows at the end of this section.)

OpenAI lost $5 billion in 2024 on $4 billion in revenue; the company spent $2 billion on compute alone to run its models. OpenAI expects to burn a minimum of $44 billion between 2023 and 2028, and is in the process of raising a massive funding round at a $260 billion valuation.
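Here is the check on the "roughly 500 times" figure. R1's input price comes from this piece; the GPT-4.5 figure (about $75 per million input tokens at launch) is an assumption of ours, not something stated above.

```python
# Rough sanity check on the "roughly 500 times" input-price gap.
# R1's input price comes from this article; the GPT-4.5 figure is an assumption.
r1_input_per_million = 0.14
gpt_4_5_input_per_million = 75.00  # assumed launch API price, $ per million input tokens

ratio = gpt_4_5_input_per_million / r1_input_per_million
print(f"GPT-4.5 input tokens cost ~{ratio:.0f}x R1's")  # ~536x, i.e. "roughly 500 times"
```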
Report: The 'skillful plagiarism' of LLM research

Source: Unsplash
The idea of automated scientific research is one that has a lot of people very, very excited.

Last year, some of this excitement culminated in the publication of a paper that claimed that large language models (LLMs) produce "more novel" research ideas than human experts. The study, which wasn't peer-reviewed, focused exclusively on natural language processing research, so it dealt with a pretty narrow domain, but the findings seemed to bolster the idea that LLMs can automatically begin accelerating scientific discovery.

Sounds great.

But, in a recent paper, two researchers from the Indian Institute of Technology examined the phenomenon, finding that a significant portion of these LLM research proposals are "skillfully plagiarized, bypassing inbuilt plagiarism checks and unsuspecting experts."

The means through which this is accomplished vary; in some cases, the LLMs simply used different terminology to reference the same thing ("resonance graph" instead of "weighted adjacency matrix," for example).

The details: Unlike that first paper, whose authors recruited experts to evaluate the "novelty" of proposed research ideas, the authors of this paper recruited 13 experts to identify similarities between LLM-generated research documents and existing papers.

The experts read a total of 50 LLM-generated research documents and were instructed to grade each paper on a scale of 1-5, where 5 indicates direct copying and 1 indicates a possibly novel concept. After cross-checking with the authors of the source papers, 36% of the papers examined were graded a four or a five, both scores indicative of obvious plagiarism. These documents did not acknowledge the original papers, and were not flagged by plagiarism detectors.
Only 4% of the papers received a score of one, indicating apparent originality, and 28% received a score of two, indicating only slight resemblance to existing papers. Dr. Danish Pruthi, one of the authors, clarified that "non-plagiarized proposals are NOT necessarily novel."

"Our analysis reveals a concerning pattern wherein a significant portion of LLM-generated research ideas appear novel on the surface but are actually skillfully plagiarized in ways that make their originality difficult to verify," the paper reads, adding that LLM-generated research tends to follow "more predictable patterns," meaning it might be detectable through basic classification methods (a rough sketch of what such a classifier could look like appears at the end of this story).

"While we do not recommend wholesale dismissal of LLM-generated research, our findings suggest that they may not be as novel as previously thought, and additional scrutiny is warranted," according to the paper. "The sophisticated nature of the plagiarism we uncover suggests that widespread adoption of these tools could significantly impact the peer review process, requiring (already overwhelmed) reviewers to spend additional time searching for potential content misappropriation."

This is only one part of the risk posed by letting LLMs into science; in 2023, a team of researchers at the Oxford Internet Institute said that science and education must be protected from the spread of bad, false and biased information that LLMs are prone to producing due to algorithmic bias and hallucinations, two things that, despite all the benchmark improvements we've seen, have not gone away.

"The way in which LLMs are used matters," Professor Sandra Wachter said. "In the scientific community it is vital that we have confidence in factual information, so it is important to use LLMs responsibly. If LLMs are used to generate and disseminate scientific articles, serious harms could result."

The team suggested researchers use LLMs for translation, not as a knowledge base.

Plagiarism almost certainly exists among human-written research papers, but that's not really the point. Since LLMs are designed to produce the most likely tokens, they are inherently incapable (at this stage) of themselves contributing to science. Researchers employing pattern-recognition tools and machine learning tech to parse and analyze massive datasets can contribute to science, but pure LLMs spitting out research proposals won't push the frontier.

The more likely scenario is an obfuscation of science, a gradual, nuanced and subtle degradation of knowledge.

As the Oxford researchers indicate, LLMs can be useful assistants when deployed properly. To treat LLM-generated research without layers of caution and skepticism would be to allow growing layers of rot into our research ecosystem.

It's the same thing we're seeing in other places; the internet has become littered with LLM-generated pollution, meaningless amalgamations of internet-scale training data, pumped out in volume and without an ounce of intention, making it more difficult to find the things that actually matter.
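On the note above that LLM-generated proposals might be "detectable through basic classification methods": the paper does not publish a detector, so the following is only a hypothetical illustration of what a basic approach could look like, using bag-of-words features over synthetic placeholder texts.

```python
# Hypothetical illustration of a "basic classification method" for flagging
# LLM-generated research proposals. This is NOT the paper's method; the
# training texts below are synthetic placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder corpus: in practice you would need a labeled set of human-written
# and LLM-generated proposals.
texts = [
    "We propose a resonance graph formulation for routing in sparse networks.",
    "We introduce a weighted adjacency matrix approach to sparse network routing.",
    "Our method leverages a novel framework to significantly enhance performance.",
    "We study convergence rates of stochastic gradient descent under heavy-tailed noise.",
]
labels = [1, 0, 1, 0]  # 1 = LLM-generated (hypothetical), 0 = human-written

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

# Score a new proposal; a higher probability means more "LLM-like" phrasing patterns.
new_doc = ["We leverage a novel resonance graph framework to enhance routing."]
print(clf.predict_proba(new_doc)[0, 1])
```

A real detector would need a much larger labeled corpus and careful evaluation; the point is only that surface-level regularities in phrasing are the kind of signal such a classifier would pick up.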
💭 A poll before you go

Thanks for reading today's edition of The Deep View!

We'll see you in the next one.

Here's your view on GPT-4.5:

Most of you haven't tried it out yet. The rest are evenly split between thinking it's impressive, thinking it's awful and simply wondering where GPT-5 is.

Would you hypothetically invest in OpenAI, considering the varying strength of the competition?

If you want to get in front of an audience of 450,000+ developers, business leaders and tech enthusiasts, get in touch with us here.
|
|
|