OpenAI has shown us the breathtaking potential of human-in-the-loop learning. ChatGPT is absolutely incredible, thanks to the deployment of reinforcement learning from human feedback (RLHF). The humans who essentially “upvote” or “downvote” the responses generated by the chatbot provide a massively useful service. These “graders” are employed by the company and are tasked with representing the preferences of the wider population of human users. Well, here comes my top-tier idea: why not leverage the blockchain to massively increase the number of “graders”?
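Before getting to the blockchain part, here is a minimal sketch of what those grades actually feed. This is my own illustration, not OpenAI's code; real RLHF pipelines typically have graders rank several candidate responses rather than cast simple up/down votes, and every name and number below is an assumption for illustration only.

```python
import torch
import torch.nn as nn

# Toy stand-ins for chatbot responses: in a real pipeline these features
# would come from the language model itself, not random noise.
response_features = torch.randn(8, 16)                   # 8 responses, 16-dim features
grades = torch.tensor([1., 0., 1., 1., 0., 1., 0., 1.])  # 1 = upvote, 0 = downvote

# A tiny reward model that scores each response with a single number.
reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(200):
    optimizer.zero_grad()
    scores = reward_model(response_features).squeeze(-1)  # predicted reward per response
    loss = loss_fn(scores, grades)                        # upvoted responses get higher scores
    loss.backward()
    optimizer.step()

# During RL fine-tuning, the chatbot is then optimized to produce
# responses that this reward model scores highly.
```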
Chainlink, a cryptocurrency platform, focuses on the creation and maintenance of oracles. Oracles are a way to connect the blockchain to the real world. Essentially, users put up collateral and use it to “vote” on specific events or historical facts. The following is a massively simplified illustration, for example only. Let’s say you could pony up a hundred dollars’ worth of some cryptocurrency and vote on “is Joe Biden the current president of the United States?” Lots of people put up collateral and vote, and the outcome is decided by the majority. If, say, 90% of the voters agree, it is decided that Joe Biden is the current president. That 90% told the truth and are rewarded with a small amount of crypto for doing so. The 10% that lied are punished and their collateral is taken. Thus, we can connect the world of decentralized finance to the greater real-world economy. This is a powerful mechanism that I believe can be applied to the world of artificial intelligence.
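To make the mechanism concrete, here is a hedged sketch of the stake-vote-slash loop described above. It is a toy model under my own assumptions (the names, the 5% reward rate, and the stake-weighted majority rule are all illustrative) and is not how Chainlink actually implements its oracles.

```python
from dataclasses import dataclass

REWARD_RATE = 0.05  # winners earn 5% on their stake; purely an assumed parameter


@dataclass
class Vote:
    voter: str
    stake: float   # collateral put up, e.g. $100 worth of some token
    answer: bool   # the voter's claim about the real-world fact


def settle_round(votes: list[Vote]) -> dict[str, float]:
    """Decide the outcome by stake-weighted majority, reward the majority,
    and slash the collateral of the minority."""
    yes_stake = sum(v.stake for v in votes if v.answer)
    no_stake = sum(v.stake for v in votes if not v.answer)
    outcome = yes_stake >= no_stake  # majority (by stake) wins

    payouts = {}
    for v in votes:
        if v.answer == outcome:
            # truthful voters get their collateral back plus a small reward
            payouts[v.voter] = v.stake * (1 + REWARD_RATE)
        else:
            # dissenting voters lose their collateral
            payouts[v.voter] = 0.0
    return payouts


# Example: "Is Joe Biden the current president of the United States?"
votes = [Vote("alice", 100, True), Vote("bob", 100, True), Vote("carol", 100, False)]
print(settle_round(votes))  # alice and bob profit, carol is slashed
```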
Instead of voting on current events, users can express their preferences for certain chatbot responses. For example, if the chatbot responds to the question “what is a good movie?” with “I am going to kill every human being,” the users will put up collateral and downvote that response. If the chatbot responds with “The Departed,” the users will upvote it. The same reward/punishment scheme applies, just with milder penalties and smaller rewards. Potentially, these could scale with how everyone else voted: if 99.9% of people voted against a response, you would be punished more heavily for voting for it. Millions of people could vote on various prompts, and as a result they would fine-tune the underlying AI system to a level unachievable otherwise. Crowdsourcing AI reinforcement to the internet without any financial incentive has been shown to be a disaster, with blank-slate bots turned racist by a small group of trolls. With a strong financial incentive, however, these problems would be largely mitigated. Also, the underlying value would essentially be created from thin air, massively decreasing the costs that this sort of scale would usually require. And I really do believe that this system would be valuable.
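Here is a sketch of how that scaled reward/punishment could look, continuing the toy model from above. Again, the function names, stake sizes, and the 2%/10% caps are assumptions of mine, not a worked-out protocol.

```python
from dataclasses import dataclass


@dataclass
class PreferenceVote:
    voter: str
    stake: float   # much smaller collateral than an oracle vote
    upvote: bool   # did this voter approve of the chatbot response?


def settle_preferences(votes: list[PreferenceVote],
                       max_reward: float = 0.02,
                       max_penalty: float = 0.10) -> dict[str, float]:
    """Reward voters in proportion to how strongly the crowd agreed with them,
    and slash dissenters in proportion to how lopsided the vote was."""
    up_share = sum(v.upvote for v in votes) / len(votes)
    payouts = {}
    for v in votes:
        agreement = up_share if v.upvote else 1 - up_share
        if agreement >= 0.5:
            # majority voter: small reward, larger when consensus is stronger
            payouts[v.voter] = v.stake * (1 + max_reward * agreement)
        else:
            # minority voter: penalty grows as the consensus against them grows
            payouts[v.voter] = v.stake * (1 - max_penalty * (1 - agreement))
    return payouts


# 99.9%-style example: nearly everyone downvotes a hostile response
votes = [PreferenceVote(f"v{i}", 1.0, False) for i in range(999)]
votes.append(PreferenceVote("troll", 1.0, True))
payouts = settle_preferences(votes)
print(payouts["troll"])  # the lone upvoter loses close to the maximum penalty
print(payouts["v0"])     # majority voters earn a small reward
```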
Many technical kinks would need to be worked out, but I believe the underlying idea is very interesting. This wouldn’t need to be created by an AI company; rather, a small crypto startup could build the platform and then charge AI labs to use it for reinforcement learning. With the required network effects, this could lead to a massively valuable cryptocurrency. The current market cap of Chainlink is $2.5 billion. My idea may not be as revolutionary, but either way, you have no downside. As with everything in crypto, you don’t need a VC. Everything will be paid for by fake money.