Gain of function research has gotten a bad rap lately, thanks to a global pandemic potentially caused by a lab leak in China. That lab was taking viruses from the natural world and experimenting with them, potentially to test ways to combat those exact enhanced viruses. Well, then a global pandemic happened that wrecked economies across the globe and killed a lot of people. Oops! We still don’t really know what happened, funny how that works, but yeah, bad rap. Still, this sort of research can be useful, assuming the labs running it stop leaking nasty enhanced viruses to susceptible human populations. Easier said than done.
Well, what about AI? Could it be useful to dive deep into gain of function research for AGI? I believe the answer is a resounding yes. Obviously, this wouldn’t include putting an AGI in a box and trying to see how it gets out. It also wouldn’t include taking a large language model and trying to see if it can rapidly improve itself. There are a lot of very stupid ways gain of function research could be done, and I think it would actually be easier to kill all of humanity via stupid gain of function research than otherwise. Still, one of the biggest problems with current AI ex-risk is that no one really takes it seriously. If you can prove that we should, that could massively benefit humanity. I am against open-sourcing this kind of research, at least in most circumstances. We shouldn’t give a potential web-scraping AGI any ideas, and we shouldn’t give terrorist organizations any either.
There is a lot of talk about nanobots and random sci-fi stuff being used to kill all of humanity, despite the obvious fact that we are constantly one button click away from near-total human destruction. The doomsday machine is real, and multiple times humanity almost blew itself up because of some combination of miscommunication and computer errors. Remember, those are just the situations we know about. If we are barely skating by, narrowly avoiding blowing ourselves up every few decades out of sheer luck, imagine what would happen if a half-motivated AGI wanted to tip the scales. Life in the nuclear age is actually terrifying, and it is clear that even a narrow AI could be used to cause billions of deaths. The advances in biotechnology are another avenue toward ex-risk, and I believe this risk is also massively amplified by the advent of AGI. It shouldn’t be hard to devise a virus that kills most or all of humanity, and again, it’s probably the case that even a narrow AI could do this. The recent pandemic showed just how unprepared humanity is for a virus that doesn’t even kill 1% of the people it infects; imagine if that number were 20%, or 99%. It’s unfortunate that actual day-to-day work on AI is so boring, because this field is such a big deal and the problems are so important. Maybe by publishing some scary research we can make AI safety cool and exciting, while also convincing people that the future could be terrifying.
AI alignment, if solved, could actually lead to a massive increase in ex-risk. If we can get an AGI to do exactly what humans want, and someone convinces it to destroy humanity, that would be bad. Without proper alignment, maybe the AGI gets massively confused and only makes fifty paperclips. Or maybe, because of biased training data, the AI only kills white people, since its initial data set never taught it that non-white humans exist. Please don’t use this example to call me a racist who loves biased training data, but imagine how funny that would be. I’m not racist, I’m just not including Puerto Ricans in my models because I really want them to survive the AI apocalypse. Overall, what I am saying is that there is some threshold we should meet with this research, and there is a fine line between when we should wait and when we should push forward. If through gain of function research we “solve alignment” in some super unenlightened and ineffective way, we could give AI labs a false sense of security. We could also give some bad actors some “good” ideas.
For a moment, let’s discuss bad actors. Obviously, there could be some pro-apocalyptic groups hell-bent on destroying all of humanity. There are terrorists who could be motivated to suicide bomb the world, and there are power-hungry totalitarian regimes who may kill everyone but their in-group. The fundamental question I ask myself is: does it matter who gets AI first? There are two scenarios here. In the first scenario, whoever gets AGI first solves alignment, creates ASI with their set of values, and those values persist from then on. If this is the case, it really matters who builds AGI first. If China builds AGI, we’re all about to live under a totalitarian regime, and free speech is no longer a thing (sad!). If North Korea builds AGI, things get pretty bad and The Interview is deleted from streaming platforms. If the USA builds AGI, human rights might have a better shot at being preserved, but for some reason the 3D printer won’t stop making oil. Should we optimize for that third outcome, even if the chances of ex-risk go up? In this case, gain of function research could be really good, if we use it to make actually-aligned AI systems that a “better” country or company could use to win the race.
Now comes the second scenario. Maybe, regardless of who builds AGI first, we are all totally screwed. AI alignment is impossible, and whether the US builds it or North Korea builds it, we all get turned into paperclips. The ex-risk is the same for everyone, because the problem of converting human values into machine code cannot be solved within our current framework. Corrigibility doesn’t work, and no amount of technical research moves the needle on interpretability. In this case, I actually think gain of function research is the most important area of research, because only through really solid proof that everything is about to go really, really wrong will we have a chance at halting AI progress. Only by showing that without a doubt we are screwed will governments be able to collaborate on stopping AI progression, at least until we solve alignment or come up with a better AI framework.