Next Week in Turing Post:
- Wednesday, Token 1.23: Detecting and Mitigating Bias
- Friday, AI Infra Unicorns: CoreWeave
If you like Turing Post, please consider supporting us. You will also get full access to our most interesting articles and investigations →
In today's world, for many people, conversing with AI has become as routine as discussing one's coffee preferences with a barista (or it soon will be!). Yet, here lies an irony: the more we interact with AI, the more elusive our understanding of these conversations becomes. This irony essentially represents a modern twist on the "black box" dilemma, which has perplexed the ML community for years.
The "black box" problem refers to the opaque decision-making processes of ML models, including large language models (LLMs), where the rationale behind any given response is shrouded in complexity. Despite technological advances, the inner workings of these models, governed by billions (and soon trillions) of parameters, remain largely inscrutable. Their decision-making is a puzzle, complicated by nonlinear interactions that defy straightforward interpretation.
Prompting – the buzzword of 2023 – doesn't make it any better: we now have obscured layers of communication activated with every prompt. What we see – the prompt we type – is merely the surface. Beneath lies a hidden dialogue, an augmented system prompt: a complex, coded conversation the model conducts with itself, away from our understanding. And who knows what a model whispers to itself.
So, if you were confused about prompting amid the avalanche of articles, blogs, and tutorials about it – you should be. As Ethan Mollick’s research reveals, contrary to intuition, the most effective prompts can involve imaginative scenarios, such as pretending to navigate a Star Trek episode or a political thriller – a sign that traditional logical or direct prompts don't always yield the best responses from AI.
But his research also shows that prompt effectiveness is not consistent and might change with a new version of the model. He notes the futility of seeking a universal "magic phrase" for AI interaction, the effectiveness of specific prompting techniques like adding context, few-shot learning, and Chain of Thought, and the significant impact that prompts can have on AI performance.
But for me – and I’ve been using AI a lot – the most straightforward prompts, or "magic words," are often surprisingly effective.
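The techniques Mollick lists – added context, few-shot examples, Chain of Thought – can be combined quite mechanically. A minimal sketch in Python, where the task, context, and examples are invented purely for illustration:

```python
def build_prompt(task, context=None, examples=None, chain_of_thought=False):
    """Assemble a prompt from optional context, few-shot examples,
    and a Chain-of-Thought trigger phrase."""
    parts = []
    if context:
        # "Adding context": tell the model who it is / what it knows.
        parts.append(f"Context: {context}")
    for question, answer in (examples or []):
        # "Few-shot learning": show worked question/answer pairs.
        parts.append(f"Q: {question}\nA: {answer}")
    parts.append(f"Q: {task}")
    if chain_of_thought:
        # "Chain of Thought": nudge the model to reason step by step.
        parts.append("A: Let's think step by step.")
    else:
        parts.append("A:")
    return "\n\n".join(parts)

prompt = build_prompt(
    task="What is 17 * 6?",
    context="You are a careful arithmetic tutor.",
    examples=[("What is 2 * 3?", "2 * 3 = 6.")],
    chain_of_thought=True,
)
print(prompt)
```

The resulting string would then be sent to whichever model you use; the point is only that these three techniques are composable layers on top of the bare task.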
How to explain this? A few years back, Explainable AI (XAI) was heralded as a solution to the "black box" issue, with entities like DARPA leading the charge (they created an XAI toolkit, which has not been updated since 2021). However, the buzz around XAI seems to have dimmed, overtaken by a broader focus on Responsible AI. Is Responsible AI the solution? Let me know if you want to share your insights and write a guest post about it.
So, that’s what we end up with:
How do machines make decisions? – We don’t know!
How to talk (prompt) to them? – We don’t know either!
But, please, keep shipping us new, larger (though we will also take smaller) models! Why? – We don’t know! But we can’t stop.
|
AI Code Generation: A Complete List – Discover & Choose the Perfect AI Code Generation Tool for Your Needs: www.turingpost.com/p/code-generation-ai
|
|
|
News from The Usual Suspects © |
Elon Musk vs OpenAI |
The narrative we discussed in the editorial gains another layer with Elon Musk’s lawsuit against OpenAI over an alleged breach of contract (it’s not open anymore, which makes Elon unhappy). This legal battle could potentially unveil some of OpenAI's internal operations, offering a rare glimpse into the workings of advanced AI models.
|
|
Anthropic and three Claudes |
Meet Opus, Sonnet, and the upcoming Haiku – the new Claude models – excelling in deep processing, efficiency, and speed, respectively. Opus surpasses GPT-4's performance in benchmarks, supports text and images, and is priced at $15 per million tokens. Anthropic promises enterprise-grade security, including SOC II and HIPAA compliance, with AWS/GCP compatibility, and touts a 200K context window, multimodality, low hallucination rates, and high accuracy on long documents. The models cover undergraduate- to graduate-level knowledge and basic mathematics, with Sonnet being free and approximately twice as fast as GPT-4.
|
Groq – a new player with big plans |
|
Lightricks – the oldest GenAI Unicorn |
Lightricks (read their profile here), known for apps like Facetune, announced LTX Studio, an AI-powered filmmaking tool that guides creators from ideation to generating AI-powered clips, with an understanding of storylines. It's web-based, free initially, and inviting waitlist sign-ups. The tool crafts scripts, storyboards, and characters, allowing scene customization and character editing.
Google – from Gemini to Genie |
Google first surprised the internet with the extra-woke Gemini, then melted everybody's hearts with Genie. Google DeepMind's Genie is a groundbreaking generative model capable of creating playable 2D video games from text, sketches, or photos. Uniquely, it learns detailed controls from unlabeled videos, understanding actions and their variations within environments. Although in early development, Genie's potential spans simulations, gaming, and robotics, marking a new frontier in generative AI.
In other news, Stack Overflow and Google Cloud have partnered to deliver new AI-powered features to developers through the Stack Overflow platform, Google Cloud Console, and Gemini for Google Cloud. A good move to gain access to trusted, accurate knowledge and code from the Stack Overflow community.
|
Patterns of giants |
Microsoft, known for its substantial investment in OpenAI, has expanded its AI ecosystem by investing €15 million ($16.3 million) in Paris-based Mistral AI and forming partnerships with AI startups Cohere and Mistral, integrating their models into Azure's offerings. Following suit, Alibaba has recently made strategic investments in several Chinese generative AI startups, including Moonshot AI, Baichuan AI, Zhipu AI, and 01.AI (founded by Kai-Fu Lee). These moves aim to diversify Alibaba's stakes in China's AI sector and foster early ties with emerging leaders in the field. In addition to these investments, Alibaba Cloud has launched Model Studio to aid AI development and announced significant price reductions to boost AI innovation in China.
|
To OpenAI, to close the circle |
So many things are happening to OpenAI: → The Sora demo receives mixed feedback for its impressive visuals but questionable physics and biology; OpenAI researcher Andrej Karpathy – who never participated in any scandal and enjoys the highest authority among researchers – leaves the company; ChatGPT experiences a significant issue, malfunctioning for several hours; Microsoft invests in others; Claude Opus beats GPT-4 across the benchmarks; and Elon Musk wants justice.
|
The freshest research papers, categorized for your convenience |
Special category: Definitely Worth Reading: |
The Era of 1-bit LLMs: Discusses the development and advantages of 1-bit LLMs, promising significant cost reductions and efficiency improvements. Read the paper
Beyond Language Models: Introduces bGPT, a model that simulates the digital world beyond traditional modalities, predicting and diagnosing algorithm or hardware behavior. Read the paper
|
Language Models in Specialized Domains |
ChatMusician: Integrates music understanding and generation capabilities into LLMs, demonstrating LLMs' potential in music composition. Read the paper
StructLM: Aims to bridge LLMs' gap in interpreting structured data, enhancing their ability to ground knowledge in tables, graphs, and databases. Read the paper
StarCoder2 and The Stack v2: Focuses on responsibly creating Code LLMs, contributing to advancements in coding benchmarks and emphasizing model openness. Read the paper
Video as the New Language for Real-World Decision Making: Discusses video generation's potential as a unified interface for diverse tasks, outlining challenges and future directions. Read the paper
|
Enhancing and Merging Language Model Capabilities |
FUSECHAT: Proposes a method to fuse knowledge from multiple chat models, improving chat model performance through a novel merging technique. Read the paper
Nemotron-4 15B Technical Report: Details a multilingual language model that showcases superior performance in coding tasks and multilingual capabilities. Read the paper
Do Large Language Models Latently Perform Multi-Hop Reasoning?: Explores latent multi-hop reasoning in LLMs, revealing their inherent capabilities and limitations in complex reasoning tasks. Read the paper
|
Scaling and Efficiency in Model Training |
MegaScale: Discusses a system for training LLMs on over 10,000 GPUs, tackling efficiency and stability challenges in large-scale model training. Read the paper
Towards Optimal Learning of Language Models: Proposes a theory for optimizing LLM learning, aiming for reduced training steps and improved performance. Read the paper
Griffin: Introduces a model combining gated linear recurrences with local attention, offering an efficient alternative for language processing tasks. Read the paper
When scaling meets LLM finetuning: Investigates the effects of scaling on fine-tuning LLMs, providing insights into data, model, and method impacts on bilingual tasks. Read the paper
|
Improving Robustness and Diversity in AI |
Rainbow Teaming: Generates diverse adversarial prompts to enhance LLM robustness, employing an open-ended search method for prompt discovery. Read the paper
Priority Sampling of Large Language Models for Compilers: Proposes a deterministic sampling technique for code generation, improving sample diversity and model performance in compiler optimization. Read the paper
|
Generative Models and Interactive Environments |
|
In other newsletters |
We don’t miss any papers, creating a weekly roundup of the freshest ones for you. However, if you feel you've missed a few weeks and need to catch up, Sebastian Raschka’s monthly summary is the best way to do it. One can only admire how Gary Marcus turns every news piece into one about *him*. An Interview with Nat Friedman and Daniel Gross Reasoning About AI – in Stratechery by Ben Thompson.
|
We are watching |
No AI was involved! Please enjoy this collaboration between two dear friends who met during the conference initiated by TrackTwo: an institute for citizen diplomacy (another passion of mine). |
WILL LAUT – Not The Day (Official Music Video) ft. Joe Orrach
|
|
If you decide to become a Premium subscriber, remember that, in most cases, you can expense this subscription through your company! Join our community of forward-thinking professionals. Please also send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. 🤍 Thank you for reading
|
|
|
How was today's FOD? Please give us some constructive feedback
|
|
|