Good morning. I sat down recently with Waseem Alshikh, the co-founder and CTO of enterprise AI firm Writer. Writer recently made waves with the release of a powerful new generative AI model that’s significantly smaller and cheaper than those coming out of the major labs, while still maintaining competitive performance.

Alshikh said that the company is “building something different.”

— Ian Krietzberg, Editor-in-Chief, The Deep View
Writer’s new model

Last month, enterprise AI company Writer launched a new model, Palmyra X 004, that is competitive with, and in some cases more powerful than, popular state-of-the-art models from leading labs.

And the company did it at a mere fraction of the cost major labs incur on training, spending just $700,000 to train Palmyra, according to CNBC. In 2024, for instance, OpenAI spent roughly $3 billion on compute for training, according to The Information; that figure does not count the cost of data acquisition.

In an additional, and significant, split with the major developers' approach, Palmyra was built using only synthetic (AI-generated) data. This move comes amid warnings both that model developers will soon exhaust online sources of publicly available training data, and that filling those gaps with AI-generated synthetic data could cause model collapse over time.

[Image] Source: Writer
But Waseem Alshikh, Writer’s co-founder and CTO, told me that Writer didn’t use synthetic data quite in the way it sounds.

Defining synthetic data as “real-world life data,” Alshikh said that the “data (that) exists in the world today is fine, but this data is dirty.” He added that the quantity of data matters less to model performance than its quality, or cleanliness. To build Palmyra, Writer downloaded real training data, classified it and cleaned it. Then, the team tasked a smaller, seven-billion-parameter model with processing that training data and outputting it in a clearer, more straightforward form.
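The pipeline Alshikh describes — collect real text, classify it, clean it, then have a small model restate it — can be sketched roughly as below. This is a hypothetical illustration, not Writer's actual code: the classifier is a toy keyword rule, and the `rephrase` stub stands in for a call to a ~7B-parameter model.

```python
# Hypothetical sketch of the clean-then-rephrase pipeline described above.
# The "rephraser" is a stub standing in for a small (~7B-parameter) LLM.
import re

def clean(text: str) -> str:
    """Strip markup remnants and collapse whitespace ("dirty" data)."""
    text = re.sub(r"<[^>]+>", " ", text)      # drop stray HTML tags
    return re.sub(r"\s+", " ", text).strip()  # normalize whitespace

def classify(text: str) -> str:
    """Toy domain classifier; a real pipeline would use a trained model."""
    return "finance" if "revenue" in text.lower() else "general"

def rephrase(text: str) -> str:
    """Stub for the small model that restates text more plainly."""
    return f"In plain terms: {text}"

def build_training_record(raw: str) -> dict:
    cleaned = clean(raw)
    return {
        "domain": classify(cleaned),
        "text": rephrase(cleaned),  # the "synthetic" training example
    }

record = build_training_record("<p>Quarterly  revenue rose   12%.</p>")
print(record)
```

The point of the sketch is the ordering: the "synthetic" data is not invented from nothing, it is real data passed through cleaning and restatement steps before it ever reaches the larger model's training run.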
“We trained the model on synthetic data that (is) basically rephrasing real-world data,” Alshikh said. The result is that, even though the model is smaller than other major models, its accuracy is at least on par with them; according to Alshikh, this is due to Writer's data processing approach.

Here’s how Palmyra stacks up to the competition: According to Writer, the model, benchmarked on Berkeley’s Function Calling Leaderboard, is outperforming models from providers including OpenAI, Anthropic, Google and Meta by a “significant margin.”

It is also ranked within the top seven models on Stanford’s HELM leaderboard.

[Image] Source: Writer
Writer added that the model far outperforms the state of the art when it comes to tool-calling, achieving an accuracy rate of 78.76 on Berkeley’s Function Calling Leaderboard. The second-best model on that leaderboard, OpenAI’s GPT-4 Turbo, has an accuracy rate of 59.49.

At Writer, the small-but-mighty approach employed to build Palmyra is one the company is earnestly exploring in other avenues.

Alshikh explained that generative AI models essentially break down into three main components: knowledge, skills and behavior. Knowledge, he said, is “important, but not that important.” Behavior, usually the result of post-training and fine-tuning efforts, is likewise important, but not too important.

Skills, he said, are “the most important part.”

Skills, according to Alshikh, refer to a model’s capabilities to self-organize, self-assemble, classify and reason. (There remains no evidence that LLMs are, or will become, capable of reasoning the way humans do; reasoning here instead refers to benchmarks that aim to quantify LLM capabilities.)

He said that to achieve a more skilled model, you need a deeper model (one with more layers), whereas a wider model (one with more parameters) will get you more knowledge. The ultimate utilization of AI, according to Alshikh, will be the eventual combination of a small (and deep) model with a large (but shallow) one.

But first, some definitions:

The technology underpinning generative AI today is artificial neural networks, or ANNs. ANNs consist of layers of interconnected artificial neurons that input, process and output data. In between the input and output layers are a series of “hidden” layers; these perform the bulk of the necessary computations.

Parameters refer to a model’s internal configuration variables that are learned from data. Parameters are what enable models to make predictions. When we refer to the size of a model, we are often referring to its parameter count — GPT-4, for example, is rumored to have 1.8 trillion parameters, whereas Palmyra has 150 billion.
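The depth-versus-width tradeoff behind those definitions can be made concrete with a toy calculation (the layer counts here are hypothetical round numbers, not Writer's actual architecture): in a stack of fully connected layers, parameter count grows roughly linearly with depth but quadratically with width, so a deep, narrow model can carry far fewer parameters than a shallow, wide one.

```python
# Toy parameter count for a stack of equal-width fully connected layers:
# each layer contributes (width * width) weights plus width biases.
# Hypothetical sizes for illustration, not any real model's dimensions.
def mlp_params(depth: int, width: int) -> int:
    return depth * (width * width + width)

deep_narrow = mlp_params(depth=300, width=1_000)   # many layers, modest width
shallow_wide = mlp_params(depth=30, width=10_000)  # few layers, huge width

print(f"deep & narrow:  {deep_narrow:,} params")
print(f"shallow & wide: {shallow_wide:,} params")
# Widening 10x at one-tenth the depth still multiplies the parameter
# count ~10x, because width enters quadratically and depth only linearly.
```

This is the arithmetic behind the "more layers for skills, more parameters for knowledge" framing: depth is comparatively cheap in parameters, which is why a 1,000-layer model can still target a one-to-four-billion-parameter budget.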
According to Alshikh, the deepest model we know of today is GPT-4, which is rumored to have roughly 240 layers. This latest version of Palmyra has around 300 layers. Alshikh said that Writer plans to make its next generation of models “smaller parameters, around one to four billion, but we’d like to actually target to get around 1,000 layers.”

But it’s not an easy approach to take.

Alshikh said that today, people aren’t taking this approach because traditional transformers tend to struggle with anything above 216 layers. Writer “adjusted the transformer design,” figuring “out a way to make it deeper” and still work.

“We think this is what the future is going to be,” Alshikh told me, saying that developers at that point will need a smaller quantity of data, and a greater quality of it instead.

“We’re building something different,” he added. “We’re building wider models — like financial, medical — but we’re going to put on top of them what we call the reasoning models, and those reasoning models ought to be smaller and more efficient.”

This new approach comes amid a mounting realization across the industry that the current method of simply scaling deep learning models has reached a point of diminishing returns.

The company’s vision is to develop something it refers to as “enterprise AGI,” AGI standing for artificial general intelligence. Since the term is vague, and the technology it refers to is hypothetical, every lab has its own definition of AGI.

At Writer, the idea behind enterprise AGI involves a system whose output is completed work, rather than text or image files that then have to be manipulated by a person. Alshikh said he expects to have that, which seems like an advancement of current approaches toward agentic AI, cracked by 2026.

To that end, Writer last week unveiled something it has been working on for six months: self-evolving LLMs.
The company claims that these systems, which it is not yet releasing for public use due to ethical concerns, are capable of auto-updating their parameters with new information in real time.

If it works at scale, this would allow developers to train a model only once, significantly reducing the operational costs of re-training, re-design and re-deployment. Alshikh said that this approach addresses the issue the industry is coming to terms with: that scale is not, in fact, all that is required to build more powerful AI systems.

But even in its current form, AI has massive implications for job loss; the kind of enterprise AGI system Alshikh described would seem to have even more significant implications on this front. He said, however, that the jobs most at risk are those in management.

“With lower tasks, you’re not saving anything at scale,” he said. But team leaders and strategy decision-makers, who also happen to be the highest-paid members of a given staff, are the positions where AI replacement makes sense, according to Alshikh.

He thinks his own job will go away (or at least, change dramatically).

“This is where we start seeing the direction pushing,” he said. “We follow the money, right, as a capitalist. What are we going to save? I think the public market would love to start saving the $20 million bonus per quarter per CEO.”

Though smaller than its Big Tech (or Big Tech-funded) competitors, Writer recently closed a $200 million Series C funding round at a $1.9 billion valuation. Writer said that its customers, which include Uber and Salesforce, see a 9x return on their AI investment on average.
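Writer has not published how its self-evolving models update themselves, but the general idea it describes — parameters adjusting to new information as it arrives, rather than through periodic full retraining — is the classic online-learning loop. A minimal sketch, using a one-weight linear model as a stand-in (entirely hypothetical; not Writer's system):

```python
# Minimal sketch of online (real-time) parameter updating: each new
# example nudges the weights with a single gradient step, so the model
# "evolves" as data streams in instead of being retrained in batches.
# One-weight linear model for illustration only.
def online_update(w: float, x: float, y: float, lr: float = 0.1) -> float:
    """One SGD step on squared error for the prediction w * x."""
    pred = w * x
    grad = 2 * (pred - y) * x
    return w - lr * grad

w = 0.0  # the model's single parameter, before any data arrives
stream = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # examples arriving over time
for x, y in stream:
    w = online_update(w, x, y)  # parameters update in real time
print(round(w, 3))  # drifting toward the true relationship y = 2x
```

The engineering challenges at LLM scale (catastrophic forgetting, poisoned inputs, serving models whose weights are in flux) are exactly why such a system raises the ethical and safety considerations Writer cites.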