The concept of self-reproducing automata envisioned by John von Neumann in 1948 heralded a paradigm in which machines could mimic biological self-reproduction. In 1950, Alan Turing posed the question of whether machines can think. In 1951, Claude Shannon reflected on these two ideas and drafted a list of questions that remains strikingly relevant:

- Can we design significant machines where the connections are locally random?
- Can we organize machines into a hierarchy of levels, as the brain appears to be organized, with the learning of the machine gradually progressing up through the hierarchy?
- Can we program a digital computer so that (eventually) 99 per cent of the orders it follows are written by the computer itself, rather than the few per cent in current programs?
- Can a self-repairing machine be built that will locate and repair faults in its own components (including the maintenance part)?
- What does a random element add in generality to a Turing machine?
- Can either of von Neumann's self-reproducing models be translated into hardware?
- Can a machine be constructed which will design other machines, given only their broad functional characteristics?
Von Neumann never saw his self-reproducing machine come to life, but 75 years later the notion has resurfaced with contemporary advances in machine learning (ML), illuminating a pathway toward realizing von Neumann's ambitious vision and touching on some of Shannon's questions. A few research papers from last week suggest a future where machines could attain a level of autonomy and self-organization akin to biological systems.

The idea of self-assembly in "Towards Self-Assembling Artificial Neural Networks through Neural Developmental Programs" underscores the potential for artificial networks to evolve autonomously. This process, inspired by biological neural development, hints at a future where artificial networks might organically grow and adapt to tasks, possibly lessening the extensive engineering currently needed for effective neural network design.

The exploration of Theory-of-Mind (ToM) in Large Language Models (LLMs), as discussed in "How Far Are Large Language Models From Agents with Theory-of-Mind?", evaluates LLMs' potential to pragmatically act upon inferred mental states, a crucial aspect of human intelligence. While unveiling a gap in translating inference into action, it also presents a new evaluative paradigm, potentially directing future research to bridge this divide.

The self-improvement narrative discussed in "Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation" adds a significant layer to this discourse. The idea of self-improving code generation could serve as a scaffold for self-reproducing automata.

The paper "Language Models Represent Space and Time" by Wes Gurnee and Max Tegmark sparked discussions across the web about its terminology and debatable conclusions. Gary Marcus digs into why "correlations aren't causal, semantic models". Still, temporal and spatial capabilities are fundamental for intelligent agents to interact meaningfully with their environment, and they should be explored as a step toward more sophisticated AI systems.
Reading all these papers, I was reminded once again how exploring history can reveal that old ideas still have seeds waiting to sprout.

The frequency of 'self-' in ML research is likely to rise, illuminating a pathway filled with both promise and challenges and demanding a balanced, multidisciplinary approach to navigate the technical, ethical, and philosophical intricacies of this quest. As we edge closer to the vision of self-reproducing automata (von Neumann was indeed a genius!), the journey calls for a thorough examination of the nature of intelligence, the ethics of autonomy, and the essence of human-machine co-evolution.

On my reading list recently, and highly recommended for inspiration: "Theory of Self-Reproducing Automata" (1948) by John von Neumann, "Computing Machinery and Intelligence" (1950) by Alan Turing, and "Computers and Automata" (1951) by Claude E. Shannon.

You are currently on the free list. For the full experience, please upgrade to Premium and read our highly popular deep dive into RAG and other articles valued by professionals from AI labs, startups, and enterprises.
|
News from The Usual Suspects ©

Chipping in

Both TheSequence and Ahead of AI mention the recent rumors that OpenAI is developing its own AI chips and that Microsoft plans to reveal one next month. The first newsletter argues that every company will start producing its own chips, while the second notes that Nvidia's dominant market share comes from its superior software support through CUDA and cuDNN, making it a hard act to beat. But it's worth going back to Sequoia's September piece on the $200 billion question and its takeaway: "As a community, we need to shift our thinking away from infrastructure and toward end-customer value. What are you going to use all this infrastructure to do?"
Anthropic and Interpretability

Anthropic is seeking to raise another $2 billion. It has also published an interesting paper, "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning". It discusses decomposing a one-layer transformer to extract interpretable features using a sparse autoencoder, addressing the polysemantic neurons that hinder model interpretability. Essential for ML practitioners focusing on neural network interpretability, it presents a new perspective on understanding model components. However, scalability issues may arise when applying this method to multi-layer architectures, requiring further adaptations to maintain interpretability and manage computational resources efficiently →read more

Image Credit: The Original Paper
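To make the core idea concrete, here is a minimal sketch of a sparse autoencoder trained on transformer activations. This is not the paper's exact setup: the dimensions, the L1 coefficient, and the use of random tensors as stand-ins for real MLP activations are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder with an L1 sparsity penalty on its hidden code.
    Trained on activations from a transformer layer, each hidden unit ideally
    comes to represent a single, human-interpretable feature."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        code = torch.relu(self.encoder(x))   # sparse feature activations
        recon = self.decoder(code)           # reconstruction of the activations
        return recon, code

def sae_loss(x, recon, code, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that pushes the code toward sparsity
    return ((recon - x) ** 2).mean() + l1_coeff * code.abs().mean()

# Toy training step on random "activations" (stand-ins for real MLP activations)
sae = SparseAutoencoder(d_model=512, d_hidden=4096)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(1024, 512)
recon, code = sae(acts)
loss = sae_loss(acts, recon, code)
loss.backward()
opt.step()
```

The overcomplete hidden layer (here 8x wider than the activations) combined with the sparsity penalty is what lets individual units specialize into near-monosemantic features.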
OpenAI's Fight for Fair Use and New Updates

OpenAI submitted comments to the USPTO arguing that training AI systems on copyrighted works is transformative fair use under the four statutory factors in 17 USC 107. They cite case law such as Google Books and HathiTrust, which involved large digital corpora, to support this view. OpenAI argues that a fair use finding supports AI innovation, which has public benefits, and addresses authors' concerns about AI outputs separately. However, legal uncertainty imposes costs on AI developers, so the USPTO should resolve this issue by holding that training AI is fair use.

OpenAI also published the DALL-E 3 system card and presented GPT-4V(ision), which extends GPT-4's capabilities with vision, aiming to understand and process mixed inputs like images and text. Through various examinations, the paper showcases GPT-4V's impressive domain-specific capabilities, novel prompting techniques like visual referring, and future prospects in multimodal learning.
Twitter Library

TuringPost @TheTuringPost (Oct 9, 2023):
The Handbook of Rationality is free and open to everyone.
- Proposes a novel classification system for researchers in human rationality
- Creates new connections between rationality research in philosophy, psychology, and other disciplines
direct.mit.edu/books/oa-edite… (The Handbook of Rationality: the first reference on rationality that integrates accounts from psychology and philosophy, covering descriptive and normative theories from both disciplines.)
Other ML news, categorized for your convenience

(all links lead to the original papers)

Language Models and Architectures

Software and Libraries

Python 3.12 improves usability and performance, removes deprecated APIs, and brings more flexible f-strings, a new type parameter syntax, and better filesystem support →read more

XGBoost 2.0 (good old ML without LLMs!) introduces vector-leaf trees for multi-target tasks, a new device parameter replacing several GPU-related parameters, hist as the default tree method, a GPU-based approx tree method, improved external memory support, enhanced learning-to-rank, an auto-estimated intercept, quantile regression support, column-based split for federated learning, numerous optimizations, and breaking changes such as the removal of the Scala-based tracker →read more
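To make two of the XGBoost 2.0 changes concrete, here is a minimal sketch of the unified device parameter and vector-leaf trees for multi-target regression. The synthetic data and hyperparameters are illustrative assumptions, not a recommended configuration.

```python
import numpy as np
import xgboost as xgb  # requires xgboost >= 2.0

# Synthetic multi-target regression data: 500 samples, 10 features, 3 targets
X = np.random.rand(500, 10)
y = np.random.rand(500, 3)

model = xgb.XGBRegressor(
    tree_method="hist",                   # hist is now the default tree method
    device="cpu",                         # single device parameter ("cuda" for GPU) replaces gpu_id / gpu_hist
    multi_strategy="multi_output_tree",   # one tree with vector leaves covers all targets
    n_estimators=100,
)
model.fit(X, y)
print(model.predict(X[:5]).shape)  # (5, 3): one prediction per target per sample
```

With the default multi_strategy="one_output_per_tree", XGBoost instead builds a separate set of trees per target; the vector-leaf variant can be faster and can exploit correlations between targets.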
Research

Large Language Models as Analogical Reasoners. Leveraging the principle of analogical reasoning, it prompts LLMs to recall relevant past problems to address new ones. Unlike previous methods (0-shot and few-shot chain-of-thought (CoT)), analogical prompting self-generates tailored exemplars without the need for manual labeling (a sketch of the idea follows this list) →read more

Retrieval Meets Long Context LLMs finds that LLMs with a 4K context window, when augmented with simple retrieval, can match the performance of those with a 16K context window while being more computation-efficient. Combining both methods – extending the context window and retrieval augmentation – delivers even better results. The best model, retrieval-augmented LLaMA2-70B with a 32K context window, surpasses both GPT-3.5-turbo-16k and Davinci-003 in long-context tasks, offering insights for optimizing LLMs in practice →read more

Enable LLM to Implicitly Learn Self-Improvement From Data introduces the ImPlicit Self-ImprovemenT (PIT) framework, which learns improvement goals from human preference data rather than explicit rubrics. PIT reformulates training objectives, focusing on maximizing the quality gap between original and improved responses →read more

ToRA: Tool-Integrated Reasoning Agents for Mathematical Problem Solving combines natural language reasoning with the use of external mathematical tools like computation libraries. By curating interactive tool-use trajectories and applying imitation learning and output space shaping, ToRA models show substantial improvements, surpassing several open-source models in mathematical tasks →read more
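As referenced above, here is a hedged sketch of what an analogical prompt might look like: instead of supplying hand-written exemplars, the prompt asks the model to recall and solve related problems itself before tackling the target problem. The wording is illustrative, not the paper's exact template.

```python
def analogical_prompt(problem: str, n_exemplars: int = 3) -> str:
    """Build a prompt that asks the model to self-generate its own exemplars."""
    return (
        f"Problem: {problem}\n\n"
        f"First, recall {n_exemplars} relevant and distinct problems you have seen before. "
        "For each one, describe the problem and work through its solution.\n\n"
        "Then, using insights from those examples, solve the initial problem "
        "step by step and state the final answer."
    )

print(analogical_prompt("What is the area of a square with a diagonal of 10 units?"))
```

The self-generated exemplars play the role of few-shot CoT demonstrations, but they are tailored to the problem at hand and require no manual labeling.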
In other newsletters

Last week, we published a highly acclaimed explanatory article about RAG – Token 1.3: What is Retrieval-Augmented Generation (RAG)? Here is another take on this important topic by Data Machina.

Behind the Rust Hype: What Every Data Engineer Needs to Know by Seattle Data Guy.

In a lengthy post, François Chollet compares LLMs to Word2Vec models, portraying LLMs as databases of 'vector programs' accessed via prompts. Unlike simple models that perform basic word arithmetic, LLMs execute complex vector programs. He emphasizes that getting more out of LLMs comes from viewing them as program databases and refining prompts, rather than assuming they comprehend tasks as humans do.
Thank you for reading! Please feel free to share with your friends and colleagues 🤍

Another week with fascinating innovations! We call this overview "Froth on the Daydream" – or simply, FOD. It's a reference to the surrealistic and experimental novel by Boris Vian – after all, AI is experimental and feels quite surrealistic, and a lot of writing on this topic is just froth on the daydream.

Leave a review!
|