If you think artificial intelligence (AI) is moving at breakneck speed and it’s almost impossible to keep up, you’re not alone. Even if staying on top of all things AI is part of your job, it’s getting increasingly hard to do. Nathan Benaich and Ian Hogarth know this all too well, yet somehow they manage.
Benaich and Hogarth have solid backgrounds in AI, as well as deep experience and involvement in research and in community- and market-driven initiatives. AI is both their job and their passion, and staying on top of all things AI comes with the territory.
Benaich is the general partner of Air Street Capital, a venture capital firm investing in AI-first technology and life science companies. Hogarth is a cofounder at Plural, an investment platform for experienced founders, aimed at helping the most ambitious European startups.
Since 2018, Benaich and Hogarth have been publishing their yearly State of AI report, aiming to summarize and share their knowledge with the world. This ever-growing and evolving work covers all the latest and greatest across industry, research and politics. Over time, new sections have been added, with this year featuring AI safety for the first time.
Traditionally, Benaich and Hogarth have also ventured predictions, with remarkable success. Equally traditionally, we have connected with them to discuss their findings each year upon release of the report. This year was no exception, so buckle up and let the ride begin.
AI research is moving so fast, it seems like almost every week there are new breakthroughs, with commercial applications quickly following suit. Case in point: AI coding assistants have been deployed, with early signs of developer productivity gains and satisfaction.
OpenAI’s Codex, which drives GitHub Copilot, has impressed the computer science community with its ability to complete code across multiple lines or directly from natural language instructions. This success spurred more research in this space, including from Salesforce, Google and DeepMind.
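To make this concrete, here is a minimal sketch of how developers invoked a Codex-family model through OpenAI’s completion API as it stood in 2022 (both the Codex models and this API version have since been retired); the prompt and parameters are purely illustrative:

```python
import openai

openai.api_key = "sk-..."  # your API key

# Ask a Codex-family model to continue code from a natural-language
# comment plus a function signature; it returns the function body.
response = openai.Completion.create(
    model="code-davinci-002",  # Codex model available in 2022
    prompt=(
        "# Return the n-th Fibonacci number iteratively.\n"
        "def fib(n):"
    ),
    max_tokens=128,
    temperature=0,      # deterministic, most-likely completion
    stop=["\n\n"],      # stop at the end of the function
)

print(response.choices[0].text)
```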
Codex quickly evolved from research (July 2021) to open commercialization (June 2022) with (Microsoft’s) GitHub Copilot now publicly available for $10/month or $100/year. Amazon followed suit by announcing CodeWhisperer in preview in June 2022.
Google revealed that it was using an internal machine learning (ML)-powered code completion tool, which Benaich and Hogarth note in the State of AI report could soon lead toward a browser-based AI-powered IDE (integrated development environment). Meanwhile, Tabnine has more than 1 million users, raised $15M and promises accurate multiline code completions.
And if you think that is old news, or not massive enough, then how about some diffusion-powered AI art? In 2021, diffusion AI models were overtaking GANs, the previously dominant AI models for image generation, on a few benchmarks. Today, diffusion AI models are used to power the likes of DALL-E 2, Imagen, Midjourney and Stable Diffusion, spreading to text-to-video, text generation, audio, molecular design and more.
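For readers unfamiliar with the technique, here is a minimal, illustrative sketch of the core mechanic behind DDPM-style diffusion models: training data is progressively noised in closed form, and a network is trained to predict (and therefore undo) that noise. The schedule values and shapes below are toy choices, not those of any production model:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention

def q_sample(x0, t, noise):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    a = alphas_bar[t].sqrt()
    s = (1.0 - alphas_bar[t]).sqrt()
    return a * x0 + s * noise

# Simplified training step: the denoiser learns to predict the added noise.
x0 = torch.randn(1, 3, 64, 64)          # stand-in for a real image batch
t = torch.randint(0, T, (1,)).item()    # random diffusion timestep
noise = torch.randn_like(x0)
x_t = q_sample(x0, t, noise)
# loss = mse(model(x_t, t), noise)      # what a real denoiser is trained on
```

Generation then runs this process in reverse, starting from pure noise and repeatedly applying the trained denoiser.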
This meteoric rise has given birth to both opportunities and open questions. It seems anyone can now create stunning imagery (and more) with the click of a button. Does that mean everyone is an artist now? Does that mean graphic designers will be out of work soon? And who owns the AI-generated art? These are just some of the questions that pop up, the answers to which seem to be “no,” “no,” and “we don’t know,” respectively.
Benaich pointed out the obvious: this is an evolving topic and it will take a while for people to figure it out. There will be goofs and controversies, like people winning art contests with AI-generated art, while others are forced to take down AI-generated images. Some art communities are banning AI art altogether, while some artists are not at all happy that their art is included in datasets used to train those AI models.
Benaich thinks we’ll see more formal partnerships between the AI companies generating these models and corpus owners, especially large corpus owners. Ultimately, it’s a question about the incremental value of additional data points in this broad dataset:
“It’s not clear that an individual contributor to a broad dataset really moves the needle on model performance and to what degree can an individual really influence this debate,” said Benaich. “Or would there have to be an en masse demand to not have work be present in a training dataset to influence this question of ownership?” Long term, he said, “If these systems are trained on everyone’s data, they shouldn’t necessarily be owned by a single party.”
Hogarth, for his part, noted that the economic model around monetization and ownership is currently massively in flux, as a result of alternatives that are popping up fast. “If you were planning to have an API that monetizes a generative image model and you have that behind a paywall, and then suddenly there’s an open-source project that offers the same quality experience in a self-service, non-commercial way, you’re going to see a real tension,” he noted.
Similar questions have also been raised for AI coding assistants. This points toward a so-called “distributed” modus operandi for AI. What remains beyond question, as Benaich and Hogarth’s work in the State of AI report reveals, is who dominates the hardware used to train AI models for all types of applications: Nvidia.
Compute infrastructure, the substrate enabling all the progress in this field, as Benaich put it, is also seeing lots of innovation. However, he added, despite considerable investment and willingness in the community to try and dislodge Nvidia, the giant in this space that powers everybody, that has not really happened.
It has always been hard to put numbers on that feeling, but this is precisely what Benaich and Hogarth tried to do this year. They scanned the academic and open-source AI literature for papers that mentioned the specific hardware platform used to train the models whose results they reported, then enumerated those papers. The results were both expected and striking.
What Benaich and Hogarth’s work showed was that the gap between the number of papers that mentioned using any form of Nvidia hardware and those using TPUs or other hardware from the top five semiconductor companies is sometimes 100 or 150 times in Nvidia’s favor. This, Benaich noted, hasn’t really changed much in the last few years.
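The report doesn’t publish its scanning code, but the tally is easy to picture. Here is a purely hypothetical sketch, with made-up keyword lists, of how hardware mentions could be counted across a corpus of paper texts:

```python
import re
from collections import Counter

# Hypothetical platform keyword patterns; a real study would need a far
# more careful list and deduplication of papers.
PLATFORM_PATTERNS = {
    "nvidia": r"\b(nvidia|v100|a100|rtx|dgx|titan)\b",
    "google_tpu": r"\btpu(v[234])?\b",
    "other_accelerators": r"\b(graphcore|ipu|cerebras|habana|trainium)\b",
}

def count_platform_mentions(papers):
    """Count how many papers mention each hardware platform at least once."""
    counts = Counter()
    for text in papers:
        lowered = text.lower()
        for platform, pattern in PLATFORM_PATTERNS.items():
            if re.search(pattern, lowered):
                counts[platform] += 1
    return counts

papers = [
    "We trained on 8 NVIDIA A100 GPUs for three days ...",
    "All experiments ran on TPUv3 pods ...",
]
print(count_platform_mentions(papers))
# Counter({'nvidia': 1, 'google_tpu': 1})
```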
Nvidia had a head start on the competition and has certainly made the most of it. It has created a massive ecosystem and partnerships around its hardware, investing heavily in its software stack as well. Nvidia also shows a startup-like attitude despite being the incumbent, as Benaich noted. It keeps improving its hardware, using techniques such as ML to design new architectures, and its latest H100 generation of GPUs is said to bring a tenfold performance improvement over the previous A100.
On the other hand, GPUs come with certain baggage, as they were never really designed for AI workloads. It’s easier to innovate when building from the ground up, without backward compatibility to worry about. That may well mean there will eventually be a tipping point at which the challengers start seeing substantial adoption.
Benaich was adamant: “I would like to see that happen. I think the industry would like that, too. But the data doesn’t suggest that. I think the question is always — how much better can a new design be and how much better does it have to be if its software is less well understood and the learning curve is higher? And you’re also fighting with a massive install-base that many companies already have.”
Benaich added that it’s “a really hard uphill battle” that has been fought for at least five years. “We would have thought that the chasm would be narrower than where it is if the future would actually look more distributed than pure Nvidia,” he said. “Despite there being a busting up of centralized ownership of models at the software layer, that hasn’t really happened at the hardware layer.”
This “busting up of centralized ownership” at the software layer that Benaich spoke of is another takeaway from the State of AI report. As Benaich noted, for the last couple of years the central dogma in ML has been centralization: the hypothesis was that the entities that will profit and advance the most are those that can acquire the most resources, whether that’s money, talent, compute or data.
While that still rings true in many ways, it’s also being challenged. For example, Meta concluded that “while the centralized nature of the [AI] organization gave us leverage in some areas, it also made it a challenge to integrate as deeply as we would hope.”
What we have been seeing in the last 12 months or so, Benaich added, is the emergence of what he called “distributed research collectives” such as Adept, Anthropic, Inflection, Eleuther and Cohere. Benaich described these as “broadly defined as either not even companies, or Discord servers that emerge, or nonprofit institutions or startups that are fundamentally open source.”
Benaich and Hogarth see those as another pole of AI research, specifically work that focuses on diffusing and distributing the inventions of centralized labs to the masses incredibly quickly. The report includes various examples of open-source alternatives for models — including text-to-image, language and biology models — being released faster than anyone expected.
Benaich believes this will become the norm: closed-source models appear first, and within about a year the first open-source counterparts follow. He thinks this grants access to a broader community of people who otherwise wouldn’t participate [in AI]:
“That is because to get jobs in some of these big tech companies, you need to have a Ph.D., you need to be extremely technically literate and check certain boxes. These open-source collectives care a lot less about that. They care more about the value of each person’s contribution and the contributions can be different,” he said. At the same time, which of those initiatives can be viable, and how exactly, remains an open question.
There are lots of dynamics at play. Top-tier talent from the Googles and DeepMinds of the world is breaking loose and becoming entrepreneurial. At the same time, investment in startups using AI slowed down in 2022 compared to 2021, along with the broader market, but remained higher than in 2020. The US accounts for more than half of worldwide venture capital investment and unicorns in the space, while private valuations are on the rise.
Last but not least comes the somewhat forward-looking introduction of AI safety. The report’s section on AI safety starts by quoting AI pioneers like Alan Turing and Marvin Minsky, who warned about the dangers of machine intelligence surpassing human capabilities as early as the 1950s.
AI safety is currently used as an umbrella term that captures the general goal of making powerful AI systems aligned with human preferences and values, as Hogarth noted. Some of the challenges are nearer term, such as taking a computer vision system used by law enforcement and trying to understand where it exhibits bias. Benaich and Hogarth included work on related topics in previous years.
What’s new in 2022, and what made Benaich and Hogarth dedicate an entire section to AI safety, is the other end of AI safety. This is what Hogarth referred to as AI alignment: ensuring that an extremely powerful and superintelligent AI system doesn’t ever go rogue and start treating humanity badly in aggregate. The 2022 State of AI report is very much biased toward that end of safety because, according to Hogarth, the topic is not receiving enough attention.
“We’re seeing exponential gain in capabilities, exponential use of compute, exponential data being fed into these [AI] models,” Hogarth said. “And yet we have no idea how to solve the alignment problem yet.” It’s still an unsolved technical problem where there are no clear solutions, he added: “That’s what alarms me — and I think that the thing that is probably the most alarming about all of it is that the feedback loops now are so violent. You have huge wealth creation happening in AI. So there’s more and more money flowing into making these models more powerful.”
There’s more geopolitical awareness of the significance of this and there’s competitive dynamics between countries accelerating, he went on, as well as more social prestige. “You get kudos for working at DeepMind or OpenAI,” he said, “so there’s a lot of these powerful feedback loops that are kicking in and making the systems [have] more increasing capabilities at a greater rate and we don’t have the same feedback loop starting to kick in on safety.”
It’s a forward-looking concern, but the thinking in the report seems to be “better safe than sorry.” Many AI researchers share Hogarth’s concern as well. The report quotes a recent survey of the ML community, which found that 69% believe AI safety should be prioritized more than it currently is.
A separate survey of the NLP community was also quoted, which found that a majority believe AGI (artificial general intelligence) is an important concern we are making progress toward. Over 70% believed AI will lead to social change at the level of the Industrial Revolution this century, and nearly 40% believed AI could cause a catastrophe as bad as nuclear war during that time.
The report’s key findings are that AI safety is attracting more talent and funding, but remains relatively neglected and underfunded. Some initial progress has been made toward alignment, via approaches such as learning from human feedback, red-teaming, reverse-engineering neural networks and measuring moral behavior in artificial agents.
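Of these, learning from human feedback is the most established. As a rough illustration, here is a minimal sketch of its preference-modeling step, using a Bradley-Terry style loss and a stand-in linear reward model; all names and shapes are illustrative rather than drawn from the report:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, preferred, rejected):
    """Bradley-Terry loss: -log sigmoid(r(preferred) - r(rejected)).

    Pushes the reward model to score human-preferred outputs higher.
    """
    r_pref = reward_model(preferred)
    r_rej = reward_model(rejected)
    return -F.logsigmoid(r_pref - r_rej).mean()

# Toy usage: responses are represented here as 16-dim feature vectors,
# and the "reward model" is just a linear layer.
reward_model = torch.nn.Linear(16, 1)
preferred = torch.randn(8, 16)   # batch of human-preferred responses
rejected = torch.randn(8, 16)    # batch of rejected responses

loss = preference_loss(reward_model, preferred, rejected)
loss.backward()                  # gradients for an optimizer step

# A full pipeline would then optimize the generating model against this
# learned reward, e.g., with reinforcement learning.
```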
The State of AI report also highlights Conjecture, which it notes is the first well-funded startup focused purely on AGI alignment. Conjecture is a London-based startup led by Connor Leahy, who previously cofounded EleutherAI, the organization that kicked off decentralized development of large AI models.
Conjecture operates under the assumption that AGI will be developed in the next five years, and on the current trajectory will be misaligned with human values and consequently catastrophic for our species. It has raised millions from investors, including the founders of GitHub, Stripe and FTX.
Of course, all of that only makes sense if you believe that we are moving toward AGI, and not everyone shares this belief. Different schools of thought are aptly exemplified by Meta’s Yann LeCun and the prolific AI scholar, author and entrepreneur Gary Marcus. Beyond the debate on how AI could move forward, Hogarth finds Marcus’s criticism of the capabilities of AI models “extremely unhelpful … kind of the opposite of ringing the fire alarm.”
Hogarth believes that to carve a safe path toward the advancement of AI, there should be more funding of AI safety, as well as some regulatory oversight. He cited gain-of-function research as an example of work that is only permitted under certain conditions, a model he thinks AI research should follow. For Hogarth, the distinction of what should be regulated cannot be drawn on the basis of applications, as all applications have potential for misuse, but rather on the basis of capabilities: anything above a certain threshold should be subject to scrutiny.