Using AI in the real world remains challenging in many ways. Organizations are struggling to attract and retain talent, build and deploy AI models, define and apply responsible AI practices, and understand and prepare for regulatory framework compliance.
At the same time, the DeepMinds, Googles and Metas of the world are pushing ahead with their AI research. Their talent pool, experience and processes for operationalizing AI research rapidly and at scale put them on a different level from the rest of the world, creating a de facto AI divide.
Here are four AI research trends that the tech giants are leading on, but that everyone else will be talking about and using in the near future.
One of the key talking points regarding the way forward in AI is whether scaling up can lead to substantially different qualities in models. Recent work by a group of researchers from Google Research, Stanford University, UNC Chapel Hill and DeepMind says it can.
Their research discusses what they refer to as emergent abilities of large language models (LLMs). An ability is considered emergent if it is not present in smaller models but is present in larger models. The thesis is that the existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.
The work evaluates emergent abilities in Google’s LaMDA and PaLM, OpenAI’s GPT-3 and DeepMind’s Gopher and Chinchilla. In terms of the “large” in LLMs, it is noted that today’s language models have been scaled primarily along three factors: amount of computation (in FLOPs), number of model parameters, and training dataset size.
Even though the research focuses on compute as the scaling axis, some caveats apply: the researchers note that it may be wise to view emergence as a function of many correlated variables.
In order to evaluate the emergent abilities of LLMs, the researchers leveraged the prompting paradigm, in which a pretrained language model is given a task prompt (e.g., a natural language instruction) and completes the response without any further training or gradient updates to its parameters.
The LLMs were evaluated on standard benchmarks, both for simple few-shot prompted tasks and for augmented prompting strategies. Few-shot prompted tasks include things such as addition and subtraction, and language understanding in domains including math, history, law and more. Augmented prompting includes tasks such as multistep reasoning and instruction following.
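To make the few-shot prompting setup concrete, here is a minimal sketch of how such a prompt might be assembled, assuming nothing beyond plain string formatting; the complete() call is a hypothetical stand-in for whichever model API is actually used.

```python
# Minimal sketch of few-shot prompting: the model sees a handful of solved
# examples plus a new query and is asked to complete the answer; no weights
# are updated. complete() is a placeholder for whichever LLM API is used.

def build_few_shot_prompt(examples, query):
    """Concatenate solved examples and the new query into a single prompt."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)

examples = [
    ("What is 12 + 7?", "19"),
    ("What is 30 - 4?", "26"),
]
prompt = build_few_shot_prompt(examples, "What is 23 + 9?")
# answer = complete(prompt)  # hypothetical call to the model of choice
print(prompt)
```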
The researchers found that a range of abilities are only observed when models reach sufficient scale, and that their emergence cannot be predicted by simply extrapolating the performance of smaller-scale models. There are also many benchmark tasks for which even the largest LaMDA and GPT-3 models do not achieve above-random performance. The overall implication is that further scaling will likely endow even larger language models with new emergent abilities.
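One simple way to make that definition operational is to look for the smallest model scale at which performance clearly clears the random-chance baseline. The sketch below is purely illustrative: the function, margin and accuracy figures are invented here, not taken from the paper.

```python
# Rough sketch of the emergence criterion: an ability is flagged as emergent
# at the smallest scale whose accuracy clearly exceeds the random-chance
# baseline. All figures below are invented for illustration only.

def emergence_point(accuracy_by_scale, random_baseline, margin=0.05):
    """Return the smallest scale whose accuracy beats chance by a margin, if any."""
    for scale, accuracy in sorted(accuracy_by_scale.items()):
        if accuracy > random_baseline + margin:
            return scale
    return None  # no emergence observed at the scales evaluated

# hypothetical accuracies on a multiple-choice task (chance = 0.25),
# keyed by training compute in FLOPs
accuracy_by_scale = {1e20: 0.24, 1e22: 0.26, 1e24: 0.58}
print(emergence_point(accuracy_by_scale, random_baseline=0.25))  # -> 1e+24
```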
As to why these emergent abilities appear, some possible explanations offered are that a task involving a certain number of sequential steps may require a model of comparable depth, and that it is reasonable to assume that more parameters and more training enable better memorization, which could be helpful for tasks requiring world knowledge.
As the science of training LLMs progresses, the researchers note, certain abilities may be unlocked for smaller models with new architectures, higher-quality data or improved training procedures. That means that both the abilities examined in this research, as well as others, may eventually be available to users of other AI models, too.
Another emergent ability getting attention, this time in recently published work by researchers from the Google Research Brain Team, is performing complex reasoning.
The hypothesis is simple: What if, instead of being terse while prompting LLMs, users showed the model a few examples of a multistep reasoning process similar to what a human would use?
A chain of thought is a series of intermediate natural language reasoning steps that lead to the final output, inspired by how humans use a deliberate thinking process to perform complicated tasks.
This work is motivated by two key ideas: First, generating intermediate results significantly improves accuracy for tasks involving multiple computational steps. Second, LLMs can be “prompted” with a few examples demonstrating a task in order to “learn” to perform it. The researchers note that chain-of-thought prompting has several attractive properties as an approach for facilitating reasoning in LLMs.
First, allowing models to decompose multistep problems into intermediate steps means that additional computation can be allocated to problems that require more reasoning steps. Second, this process contributes to explainability. Third, it can (in principle) be applied to any task humans can solve via language. And fourth, it can be elicited in sufficiently large off-the-shelf language models relatively simply.
The research evaluates Google’s LaMDA and PaLM, and OpenAI’s GPT-3. These LLMs are evaluated on their ability to solve tasks included in math word problem, commonsense reasoning and symbolic reasoning benchmarks.
To get a sense of how the researchers approached prompting LLMs for the tasks at hand, consider the following problem statement: “Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?”
The “standard” approach to few-shot prompted learning would be to provide the LLM with the answer directly, i.e., “The answer is 11.” Chain-of-thought prompting translates to expanding the answer as follows: “Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.”
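In code, the difference between the two prompting styles comes down to what goes into the exemplar answer. The sketch below is illustrative only; complete() stands in for whichever LLM API is being used, and the follow-up question is an arbitrary example rather than one taken from the paper.

```python
# Sketch contrasting standard few-shot prompting with chain-of-thought
# prompting: only the exemplar answer changes. complete() is a placeholder
# for whichever LLM API is being used.

QUESTION = ("Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
            "Each can has 3 tennis balls. How many tennis balls does he have now?")

STANDARD_EXEMPLAR = f"Q: {QUESTION}\nA: The answer is 11."

COT_EXEMPLAR = (
    f"Q: {QUESTION}\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis "
    "balls. 5 + 6 = 11. The answer is 11."
)

def build_prompt(exemplar, new_question):
    """Prepend one solved exemplar to the question we actually want answered."""
    return f"{exemplar}\n\nQ: {new_question}\nA:"

new_question = ("Sam has 3 boxes of crayons with 8 crayons each. "
                "He gives away 5 crayons. How many crayons does he have left?")
prompt = build_prompt(COT_EXEMPLAR, new_question)
# answer = complete(prompt)  # hypothetical call to the model of choice
print(prompt)
```

Swapping COT_EXEMPLAR for STANDARD_EXEMPLAR reproduces the baseline setup, which is what makes the comparison between the two prompting styles clean.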
It turns out that the more complex the task of interest is (in the sense of requiring a multistep reasoning approach), the bigger the boost from chain-of-thought prompting. It also looks like the bigger the model, the bigger the gain. The method also proved robust, outperforming standard prompting across different annotators, different prompt styles and so on.
This seems to indicate that the chain-of-thought approach may also be useful to custom-train LLMs for other tasks they were not explicitly designed to perform. That could be very useful for downstream applications leveraging LLMs.
Meta AI chief scientist Yann LeCun is one of the three people (alongside Google’s Geoffrey Hinton and MILA’s Yoshua Bengio) who received the Turing Award for their pioneering work in deep learning. He is aware of both progress and controversy around AI, and has been documenting his thoughts on an agenda to move the domain forward.
LeCun believes that reaching “Human Level AI” may be a useful goal, and that the research community is making some progress towards this. He also believes that scaling up helps, although it’s not sufficient because we are still missing some fundamental concepts.
For example, we still don’t have a learning paradigm that allows machines to learn how the world works the way human babies and many young animals do, LeCun notes. He also cites several other missing pieces: machines need to learn to predict how they can influence the world by taking actions, and to learn hierarchical representations that allow long-term predictions while dealing with the fact that the world is not completely predictable. They also need to be able to predict the effects of sequences of actions so as to reason and plan, and to decompose complex tasks into subtasks.
Although LeCun feels that he has identified a number of obstacles to clear, he also notes that we don’t know how to clear them, so the solution is not just around the corner. Recently, LeCun shared his vision in a position paper titled “A Path Towards Autonomous Machine Intelligence.”
Besides scaling, LeCun shares his takes on topics such as reinforcement learning (“reward is not enough”) and reasoning and planning (“it comes down to inference, explicit mechanisms for symbol manipulation are probably unnecessary”).
LeCun also presents a conceptual architecture, with components for functions such as perception, short-term memory and a world model that roughly correspond to the prevalent model of the human brain.
Meanwhile, Gadi Singer, VP and director of emergent AI at Intel Labs, believes that the last decade has been phenomenal for AI, mostly because of deep learning, but that there’s a next wave emerging. Singer thinks this is going to come about through a combination of components: neural networks, symbolic representation and symbolic reasoning, and deep knowledge, in an architecture he calls Thrill-K.
Along similar lines, Frank van Harmelen is the principal investigator of the Hybrid Intelligence Centre, a $22.7 million (€20 million), 10-year collaboration between researchers at six Dutch universities doing research into AI that collaborates with people instead of replacing them. He thinks the combination of machine learning with symbolic AI in the form of very large knowledge graphs can give us a way forward, and has published work on “Modular design patterns for hybrid learning and reasoning systems.”
All that sounds visionary, but what about the impact on sustainability? As researchers from Google and UC Berkeley note, machine learning workloads have rapidly grown in importance, but also raised concerns about their carbon footprint.
In recently published work, Google researchers share best practices that they claim can reduce machine learning training energy by up to 100x and CO2 emissions by up to 1,000x:
Data center providers should publish the PUE, carbon-free energy percentage (%CFE) and CO2e/MWh per location, so that customers who care can understand and reduce their energy consumption and carbon footprint, as illustrated in the sketch following these recommendations.
ML practitioners should train using the most effective processors in the greenest data center that they have access to, which today is often in the cloud.
ML researchers should continue to develop more efficient ML models, such as by leveraging sparsity or by integrating retrieval into a smaller model.
They should also publish their energy consumption and carbon footprint, both to foster competition on more than just model quality and to ensure accurate accounting of their work, which is difficult to do post hoc.
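As a rough illustration of how such published numbers would be used, the sketch below estimates training emissions from measured server energy, the data center's PUE and the local grid's carbon intensity. The formula is a common first-order approximation rather than the paper's exact methodology, and all input values are invented; real accounting (factoring in %CFE, offsets and so on) is more involved.

```python
# First-order estimate of operational training emissions: server energy is
# scaled up by the data center's PUE (overhead for cooling, power delivery,
# etc.) and multiplied by the grid's carbon intensity. All figures below are
# invented for illustration; real accounting also considers %CFE, offsets, etc.

def training_co2e_tonnes(server_energy_mwh, pue, co2e_kg_per_mwh):
    """Return an estimate of operational emissions in metric tonnes of CO2e."""
    facility_energy_mwh = server_energy_mwh * pue
    return facility_energy_mwh * co2e_kg_per_mwh / 1000.0  # kg -> tonnes

# hypothetical run: 500 MWh of server energy, PUE of 1.1,
# grid intensity of 200 kg CO2e per MWh
print(training_co2e_tonnes(500, 1.1, 200))  # -> 110.0 tonnes CO2e
```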
By following these best practices, the researchers report, overall machine learning energy use (across research, development and production) has held steady at less than 15% of Google’s total energy use for the past three years, even though Google’s overall energy use grows annually with greater usage.
If the whole machine learning field were to adopt these best practices, total carbon emissions from training would fall, the researchers claim. However, they also note that the combined emissions of training and serving models need to be minimized.
Overall, this research tends toward the optimistic side, even as it acknowledges important issues that it does not yet address. Either way, making an effort and raising awareness are both welcome, and could trickle down to more organizations.
(Copyright: VentureBeat https://venturebeat.com/2022/06/28/4-ai-research-trends-everyone-is-or-will-be-talking-about/)