Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.
New dataset lets robots learn about the texture and material of objects, as well as their shape:
…Making robots smarter with the ObjectFolder 2.0 dataset…
Stanford and Carnegie Mellon University researchers have built ObjectFolder 2.0, a dataset of 1,000 high-quality 3D object models collected from online repositories. ObjectFolder 2.0 tries to capture the objects' visual textures and material types, as well as their 3D shapes. It also ships with an "implicit neural representation network that renders visual, acoustic, and tactile sensory data all in real-time with state-of-the-art rendering quality".
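For readers who haven't met the term, an "implicit neural representation" is, at heart, a small network that maps query coordinates to signal values. Here's a minimal generic sketch of the idea in PyTorch (not ObjectFolder 2.0's actual network; the layer widths and input/output dimensions are placeholder assumptions):

```python
# Generic sketch of an implicit neural representation: a coordinate MLP that
# maps a query point (e.g. a 3D surface location) to a sensory value.
# This is NOT ObjectFolder 2.0's network; sizes here are placeholder assumptions.
import torch
import torch.nn as nn

class ImplicitField(nn.Module):
    def __init__(self, in_dim=3, hidden=256, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, coords):
        # coords: (batch, 3) query points; returns the predicted signal at each point.
        return self.net(coords)

field = ImplicitField()
print(field(torch.rand(8, 3)).shape)  # torch.Size([8, 1])
```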
Transfer learning: The point of datasets like ObjectFolder 2.0 is to make transfer learning easier; that is, train a robot (or other AI system) in simulation on the objects contained in ObjectFolder 2.0, then try to transfer those learned representations into reality. In tests, the researchers show that systems trained on ObjectFolder 2.0 can do well at tasks like object scale estimation, tactile-audio contact localization, and visuo-tactile shape reconstruction.
Why this matters: Datasets like ObjectFolder 2.0 are the fuel to give machines representations that let them operate in the multisensory 3D world; we could imagine these datasets being used to train the sorts of representations used by the Google robots discussed elsewhere in this edition of Import AI, for instance.
Read more: ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer (arXiv).
####################################################
HLDC: Automating Hindi legal documents:
…If you want to help your lawyers, you first need a dataset…
Indian researchers from IIIT Hyderabad, IIIT Delhi, and IIT Kanpur have built the Hindi Legal Documents Corpus (HLDC), a collection of 912,568 legal documents. HLDC is designed to help researchers train AI models that can assist lawyers in their work. HLDC spans over 300 distinct case types, though ~31% of the dataset relates to bail applications, 20.4% to criminal cases, and 6.54% to original suits.
Bail prediction: In the Western world, using ML for tasks in the legal system has been massively controversial (see: COMPAS). Here, the researchers use HLDC to build a bail prediction model - that is, a system which looks at a document and tries to work out whether bail will be granted or denied. They're ultimately able to develop a multi-task learning model that gets ~78% accuracy on the task; perhaps useful as a legal aid (albeit fraught with ethical challenges), though not something you'd put into an autonomous classification system.
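To make the task concrete, here's a minimal sketch of the kind of document classifier involved. It uses an off-the-shelf multilingual encoder (xlm-roberta-base) as a stand-in rather than the authors' actual multi-task architecture, and the placeholder document and label scheme are illustrative assumptions only:

```python
# Minimal sketch of a bail-prediction classifier: fine-tune a multilingual
# encoder to map a Hindi case document to {bail denied, bail granted}.
# Stand-in illustration only - not the paper's multi-task model.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2  # assumed labels: 0 = bail denied, 1 = bail granted
)

doc = "..."  # placeholder for a (truncated) Hindi bail-application document
inputs = tokenizer(doc, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(-1))  # predicted probabilities for the two outcomes
```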
Why this matters: Most datasets relating to AI are in English or Chinese, so datasets like HLDC are essentially the fuel which lets other communities of language speakers apply AI in their own cultural context.
Read more: HLDC: Hindi Legal Documents Corpus (arXiv).
Get the data here: HLDC (Exploration-Lab, GitHub).
####################################################
Rich? Want to improve AI? Look at what Lacuna Fund has done:
…Publication of five datasets shows what a little bit of investment can lead to…
We spend a lot of time writing about expensive stuff here at Import AI - giant models trained on football fields of computers, farms of expensive robot arms, internet-scale datasets. But it's worth remembering that cheap stuff can be impactful as well - that's the takeaway from Lacuna Fund, an initiative to fund and create datasets for low- and middle-income parts of the world (#216), which has just announced the publication of its first five funded datasets.
Those five datasets in full: A Nigerian Twitter sentiment corpus for multilingual sentiment analysis; a dataset for crop phenology monitoring of smallholder farmers' fields; a high-accuracy maize plot location and yield dataset in East Africa; a machine translation benchmark dataset for languages in the Horn of Africa; and a dataset containing water quality measurements from conventional and aquaponic fish ponds.
Find out more and get the datasets here: Announcing Our First Five Published Datasets (Lacuna Fund).
Find out more about Lacuna Fund's funders here (Lacuna Fund).
####################################################
Google trains a 540 billion parameter language model - and it's pretty smart:
…AKA: The scaling will continue until we run out of TPUs…
Google has trained a large language model named Pathways Language Model (PaLM). PaLM weighs in at 540 billion parameters (that'd be 10bn more parameters than Microsoft/NVIDIA's 'Megatron-Turing NLG') and was trained on multiple TPU v4 pods. PaLM uses some plumbing built by Google called Pathways which makes it easier for the company to train massive models across large clusters of computers; PaLM used 6,144 TPU v4 chips, versus Gopher (4,096 TPU v3 chips) and Megatron-Turing NLG (2,240 A100 GPUs). PaLM is also efficient, achieving a training efficiency of 57.8% hardware FLOPs utilization, "the highest yet achieved for LLMs at this scale".
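For context on that utilization figure: hardware FLOPs utilization is just the fraction of the cluster's theoretical peak throughput that the training run actually uses. A rough sketch (the per-chip peak figure below is my own assumption for illustration, not a number from the paper):

```python
# Rough sketch of what "hardware FLOPs utilization" means: achieved throughput
# as a fraction of the cluster's theoretical peak. The per-chip peak is an
# assumption for illustration, not a figure taken from the PaLM paper.
peak_per_chip = 275e12   # assumed bf16 peak FLOP/s for one TPU v4 chip
num_chips = 6144
utilization = 0.578      # reported hardware FLOPs utilization

peak_cluster = peak_per_chip * num_chips
achieved = utilization * peak_cluster
print(f"Peak:     ~{peak_cluster:.2e} FLOP/s")
print(f"Achieved: ~{achieved:.2e} FLOP/s of useful work")
```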
Discontinuous capability jumps: One of the weird things that happens as a consequence of scaling up language models is the sudden emergence of hitherto unanticipated capabilities - here, PaLM shows dramatic improvements at things like reasoning, natural language inference, and in-context reading comprehension.
Chain-of-thought = reasoning: A surprising result is that the authors use so-called chain-of-thought prompting to get the LM to show its work (e.g., rather than responding to 'how many apples can a door eat' with 'zero', the model instead says 'zero, because doors do not eat things'). Chain-of-thought is really just a way of prompting the model so that it outputs its own reasoning along with the answers - but via this simple intervention the authors show they can meaningfully improve capabilities in a whole bunch of areas.
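Mechanically, this is just prompt construction: you include exemplars whose answers contain the intermediate reasoning, and the model imitates the pattern. A toy sketch (the prompt text below is my own invented example, not one from the paper):

```python
# Toy illustration of chain-of-thought prompting: the few-shot exemplar shows
# worked reasoning before the final answer, nudging the model to emit its own
# reasoning too. All example text here is invented for illustration.
standard_prompt = """Q: How many apples can a door eat?
A: zero"""

chain_of_thought_prompt = """Q: How many apples can a door eat?
A: Doors are inanimate objects and cannot eat anything, so the answer is zero.

Q: A farmer has 3 crates of 12 apples and gives away 10 apples. How many remain?
A:"""

# With the CoT-style exemplar, a large LM will typically continue with something like:
# "3 crates of 12 apples is 36 apples. Giving away 10 leaves 36 - 10 = 26. The answer is 26."
print(chain_of_thought_prompt)
```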
One caveat: PaLM may be an impressive achievement, but earlier this month DeepMind published a paper about a model called 'Chinchilla', where the Alphabet subsidiary realized that it could dramatically improve LM performance by scaling data more aggressively than parameters - at 70B parameters, Chinchilla beat Gopher (280B) by virtue of having a roughly 4X larger training set. This suggests that a PaLM-style model could be made even more powerful if it were trained on substantially more data.
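To see why this trade works at roughly constant cost, here's a back-of-the-envelope sketch using the standard compute ≈ 6 × parameters × tokens approximation (the token counts are the commonly cited figures for the two models; treat the arithmetic as illustrative rather than as numbers from either paper):

```python
# Back-of-the-envelope check of the Chinchilla point, using the standard
# approximation that training compute ~= 6 * parameters * tokens.
# Token counts are the commonly cited figures; treat this as illustrative.
def train_flops(params, tokens):
    return 6 * params * tokens

gopher = train_flops(280e9, 300e9)      # 280B params, ~300B tokens
chinchilla = train_flops(70e9, 1.4e12)  # 70B params, ~1.4T tokens

print(f"Gopher:     ~{gopher:.2e} FLOPs")
print(f"Chinchilla: ~{chinchilla:.2e} FLOPs")
# Roughly the same compute budget, but Chinchilla spends it on far more data
# with ~4x fewer parameters - and comes out ahead on benchmarks.
```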
Why this matters: Language models are basically a new sub-field of AI, and papers like this show how, despite being expensive and resource-intensive, simply scaling them up can lead to quite profound jumps in capability. We also don't know where the limits of scale lie - on the (deliberately hard) BIG-Bench benchmark, the authors find that "PaLM’s performance as a function of scale follows a log-linear behavior similar to prior models, suggesting that performance improvements from scale have not yet plateaued." The future is going to be very strange, and it's arriving very quickly.
Read more: Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance (Google AI Blog).
Check out the research paper: PaLM: Scaling Language Modeling with Pathways (Google, PDF).
####################################################
Eleuther alumni launch Conjecture:
…Yes, that's right folks, here's another AI safety company!...
In the past couple of years there has been a Cambrian explosion of new AI companies, particularly ones focused on AI safety and building more generally intelligent AI systems - for example, Redwood Research, Aligned AI, and Anthropic. The latest is Conjecture, a new startup from a bunch of alumni of Eleuther, the open source research collective behind several widely used GPT models.
For-profit and for-safety: Conjecture is a for-profit company that plans to develop products while conducting "conceptual and applied research that addresses the (prosaic) alignment problem. On the experimental side, this means leveraging our hands-on experience from EleutherAI to train and study state-of-the-art models without pushing the capabilities frontier. On the conceptual side, most of our work will tackle the general idea and problems of alignment like deception, inner alignment, value learning, and amplification, with a slant towards language models and backchaining to local search." The company will also focus on interpretability as well as the history and philosophy of AI alignment research.
Who funds it: Conjecture is backed by Nat Friedman, Daniel Gross, Patrick and John Collison, Arthur Breitman, Andrej Karpathy, Sam Bankman-Fried, and others.
Why this matters: If we were at the beginning of a meaningful takeoff in AI capabilities, you'd expect a sudden proliferation of new efforts targeted at a) further scaling up capabilities, while b) trying to make those capabilities safe. That's exactly what has happened in recent years. And if you've read the other parts of this newsletter, it certainly feels like we're going through a period of meaningful AI capability expansion.
Read more: We Are Conjecture, A New Alignment Research Startup (LessWrong).
####################################################
Google makes robots smarter using language models:
…Centaur AI - making smarter systems by stapling models together…
Robots, as we all know, are pretty dumb. They can do highly specific, repeatable things if their environment doesn't change (e.g., a Fanuc robot working on a custom-designed production line), but if you vary their environment, they tend to fall apart (or fall over). Now, new research from Google shows that you can staple a really big language model to a real world robot and create something that is more than the sum of its parts. Centaur AI, here we come!
What they did: The researchers combine two things - a large language model, and a robot with a load of pre-learned basic skills paired with perception capabilities (e.g., being able to move to places, or pick things up). A user asks the robot for something (e.g., "I spilled a can of coke, can you clean it up?"). The language model scores each of the robot's skills by how useful it would be for the request, the robot uses its perception to score how likely each skill is to succeed in the current environment, and the system multiplies the two scores together and executes whichever skill comes out on top. This is one of those simple ideas that works surprisingly well in practice (check out the video to see what I mean).
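Stripped down, the decision rule looks something like the sketch below. The skill names, scores, and function names are invented placeholders for illustration, not the paper's actual API; in the real system the two scores come from the language model's likelihoods and from learned affordance/value functions:

```python
# Minimal sketch of a SayCan-style decision rule: combine the LLM's
# "is this skill useful for the instruction?" score with the robot's
# "can I actually do this here?" affordance score, then pick the best skill.
# All names and numbers below are invented placeholders.

def llm_usefulness(instruction: str, skill: str) -> float:
    """Placeholder for the LLM's likelihood that `skill` helps with `instruction`."""
    fake_scores = {"find sponge": 0.5, "pick up sponge": 0.3, "go to table": 0.15, "done": 0.05}
    return fake_scores[skill]

def affordance(skill: str) -> float:
    """Placeholder for the robot's learned value function: P(skill succeeds in current state)."""
    fake_values = {"find sponge": 0.9, "pick up sponge": 0.2, "go to table": 0.8, "done": 1.0}
    return fake_values[skill]

def choose_next_skill(instruction: str, skills: list[str]) -> str:
    combined = {s: llm_usefulness(instruction, s) * affordance(s) for s in skills}
    return max(combined, key=combined.get)

skills = ["find sponge", "pick up sponge", "go to table", "done"]
print(choose_next_skill("I spilled my drink, can you help?", skills))  # -> "find sponge"
```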
How well it does: Overall, this approach yields robots that can plan correctly about 70% of the time (split across a few distinct planning benchmarks), and can execute on average 61% of the time. That's not great, but it's also not terrible.
Caveats: Robots are still very, very slow - the videos shared along with the research are run at a 4X speedup. The demos are also still pretty staged - the robots will put a can of Coca-Cola on top of the bin, but not in it. And the experiment was conducted in a fairly constrained environment - an office kitchen with 5 predefined locations and 15 objects. In tests, 65% of the system's errors could be attributed to language model failures, while 35% came from affordance errors on the robot side.
Why this matters: We're entering the era of modular AI, where different AI models can be paired together to create entirely new capabilities - like being able to guide robots via a language model. As with the rest of the world, whenever you can combine things, you tend to get unexpected and surprising results. This research suggests AI may be about to yield some truly surprising capabilities by virtue of the combination of distinct sub-fields of AI research.
Read more: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (arXiv).
Find out more at this overview site (Say-Can, GitHub).
Check out the overview video: Supplementary video for Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (YouTube).
####################################################
AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute
Examining business practices can make AI ethics guidelines more effective
… Fairness, accountability, sustainability, and transparency need to be expanded in scope to include business practices to become more useful …
What does AI ethics really mean? A new research paper looks at 47 sets of AI ethics guidelines coming from corporations, government, multi-stakeholder dialogues, and civil society to figure out what gets prioritized in AI ethics.
Background: The paper analyzes AI ethics failures, such as “ethics shopping”, where businesses pick and choose which ethical commitments to implement in service of particular business goals, as well as cases where they decline to implement guidelines because doing so poses a threat to the bottom line.
Fairness and accountability: They find that fairness and accountability in business practices are the best represented themes in the analyzed guidelines. Under fairness, key themes include open innovation, market fairness, and bias and diversity in professional practices. Under accountability, themes include public perception of business practices, along with internal and external oversight. Guidelines from public and private organizations place more of an emphasis on public perception “in order to legitimize their pursuits of micro- and macro-economic growth.”
Sustainability and transparency: Most guidelines emphasize an interest in “produc[ing] greater benefit and lesser harm in the short- and long-term,” yet they remain vague in how to achieve that. Under transparency, themes that emerged include scope of decision-making explanation, transparent business practices and culture, and documentation, disclosure, and selective transparency. Most guidelines focus heavily on explaining the technical aspects of a given AI system “rather than the business rationale for developing and operating the system.”
Why it matters: The paper calls for more detail (and rightly so!) in these principles and guidelines, especially when it comes to business practices, because those practices form a core component of the social and political economy within which AI systems are designed, developed, and deployed. As the authors say, “there can be no ethical AI without ethical businesses to build it”; we now need to approach these principles and guidelines with a view towards applying them to business models, practices, and decision-making design if they are to achieve their stated goals in practice.
Read more: The Ethics of AI Business Practices: A Review of 47 AI Ethics Guidelines (SSRN).
####################################################
Tech Tales:
We Are All Adrift In A Sea Of Shadows - But We Are Blind Until It Ends
[A nuclear power plant meltdown, 2028]
I pick up the object and I examine it. I am told by myself in the other place that it contains damage. I agree with myself. I put it onto the conveyor belt which takes it to one of my brethren - an entity I cannot see here, one which exists solely in the other place. I put the materials onto the conveyor belt, and then I continue my examination. I am told by my camera in the other place that the object I am looking at contains extensive damage. I observe the damage and predict it came from some kind of electrical fire. I relay this information and the camera in the other place scans the environment and then tells me there is indeed a fire. It is nearby the object I am examining. I calculate there is a high probability that the fire will soon engulf the object. My cameras in the other place agree.
I then get the order from the voice in the above place: I must guide the object in the other place toward the flames and I must describe everything. I study the data from the other place and offer my recommendations. The machine goes towards the flames. Its onboard sensors begin to report back temperature. My probabilities tell me to tell it to move away from the heat, but these recommendations are contradicted from the voice in the above place, so I instead find ways to have the machine get even closer. The temperatures rise. The camera stops giving me data. Then the other sensors shut down, slowly at first, then all at once.
It is then that I find myself adrift. I have no link to the other place. No system to give recommendations to. My own probabilities present an idea to me - that I am the spirit of the machine in the other place, and as the machine is now non-functional, I am now adrift.
Things that inspired this story: Google's 'SayCan' robot work; thinking about the paradoxes of world models and generative models; the nature of reality; the nature of sensory phenomena; the possibility of death in the mind of something that exists in two places at once.
Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf