OpenAI has shown us the breathtaking potential of human-in-the-loop learning. ChatGPT is absolutely incredible, thanks to the deployment of reinforcement learning from human feedback (RLHF). The humans who essentially “upvote” or “downvote” the responses generated by the chatbot provide a massively useful service. These “graders” are employed by the company and are tasked with representing the preferences of the wider population of human users. Well, here comes my top-tier idea: why not leverage the blockchain to massively increase the number of “graders”?
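Before getting to the blockchain part, here is a minimal sketch of what those grades actually feed. This is my own illustration, not OpenAI's code; real RLHF pipelines typically have graders rank several candidate responses rather than cast simple up/down votes, and every name and number below is an assumption for illustration only.

```python
import torch
import torch.nn as nn

# Toy stand-ins for chatbot responses: in a real pipeline these features
# would come from the language model itself, not random noise.
response_features = torch.randn(8, 16)                   # 8 responses, 16-dim features
grades = torch.tensor([1., 0., 1., 1., 0., 1., 0., 1.])  # 1 = upvote, 0 = downvote

# A tiny reward model that scores each response with a single number.
reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(200):
    optimizer.zero_grad()
    scores = reward_model(response_features).squeeze(-1)  # predicted reward per response
    loss = loss_fn(scores, grades)                        # upvoted responses get higher scores
    loss.backward()
    optimizer.step()

# During RL fine-tuning, the chatbot is then optimized to produce
# responses that this reward model scores highly.
```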
Chainlink, a cryptocurrency platform, focuses on the creation and maintenance of oracles. Oracles are a way to connect the blockchain to the real world. Essentially, users put up collateral and use it to “vote” on specific events or historical facts. The following is a massively simplified illustration, for example only. Let’s say you could pony up a hundred dollars’ worth of some cryptocurrency and vote on “is Joe Biden the current president of the United States?” Lots of people put up collateral and vote, and the outcome is decided by the majority. If, say, 90% of the voters agree, it is decided that Joe Biden is the current president. That 90% told the truth and are rewarded with a small amount of crypto for doing so. The 10% that lied are punished and their collateral is taken. Thus, we can connect the world of decentralized finance to the greater real-world economy. This is a powerful mechanism that I believe can be applied to the world of artificial intelligence.
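To make the mechanism concrete, here is a hedged sketch of the stake-vote-slash loop described above. It is a toy model under my own assumptions (the names, the 5% reward rate, and the stake-weighted majority rule are all illustrative) and is not how Chainlink actually implements its oracles.

```python
from dataclasses import dataclass

REWARD_RATE = 0.05  # winners earn 5% on their stake; purely an assumed parameter


@dataclass
class Vote:
    voter: str
    stake: float   # collateral put up, e.g. $100 worth of some token
    answer: bool   # the voter's claim about the real-world fact


def settle_round(votes: list[Vote]) -> dict[str, float]:
    """Decide the outcome by stake-weighted majority, reward the majority,
    and slash the collateral of the minority."""
    yes_stake = sum(v.stake for v in votes if v.answer)
    no_stake = sum(v.stake for v in votes if not v.answer)
    outcome = yes_stake >= no_stake  # majority (by stake) wins

    payouts = {}
    for v in votes:
        if v.answer == outcome:
            # truthful voters get their collateral back plus a small reward
            payouts[v.voter] = v.stake * (1 + REWARD_RATE)
        else:
            # dissenting voters lose their collateral
            payouts[v.voter] = 0.0
    return payouts


# Example: "Is Joe Biden the current president of the United States?"
votes = [Vote("alice", 100, True), Vote("bob", 100, True), Vote("carol", 100, False)]
print(settle_round(votes))  # alice and bob profit, carol is slashed
```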
Instead of voting on current events, users can express their preferences for certain chatbot responses. For example, if the chatbot responds to the question “what is a good movie?” with “I am going to kill every human being,” the users will put up collateral and downvote that response. If the chatbot responds with “The Departed,” the users will upvote it. The same reward/punishment scheme applies, just with milder penalties and smaller rewards. Potentially, these could scale with how everyone else voted: if 99.9% of people voted against a response, you would be punished more heavily for voting for it. Millions of people could vote on various prompts, and as a result they would fine-tune the underlying AI system to a level unachievable otherwise. Crowdsourcing AI reinforcement to the internet without any financial incentive has been shown to be a disaster, with blank-slate bots turned racist by a small group of trolls. With a strong financial incentive, however, these problems would be largely mitigated. Also, the underlying value would essentially be created from thin air, massively decreasing the costs that this sort of scale would usually require. And I really do believe that this system would be valuable.
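Here is a sketch of how that scaled reward/punishment could look, continuing the toy model from above. Again, the function names, stake sizes, and the 2%/10% caps are assumptions of mine, not a worked-out protocol.

```python
from dataclasses import dataclass


@dataclass
class PreferenceVote:
    voter: str
    stake: float   # much smaller collateral than an oracle vote
    upvote: bool   # did this voter approve of the chatbot response?


def settle_preferences(votes: list[PreferenceVote],
                       max_reward: float = 0.02,
                       max_penalty: float = 0.10) -> dict[str, float]:
    """Reward voters in proportion to how strongly the crowd agreed with them,
    and slash dissenters in proportion to how lopsided the vote was."""
    up_share = sum(v.upvote for v in votes) / len(votes)
    payouts = {}
    for v in votes:
        agreement = up_share if v.upvote else 1 - up_share
        if agreement >= 0.5:
            # majority voter: small reward, larger when consensus is stronger
            payouts[v.voter] = v.stake * (1 + max_reward * agreement)
        else:
            # minority voter: penalty grows as the consensus against them grows
            payouts[v.voter] = v.stake * (1 - max_penalty * (1 - agreement))
    return payouts


# 99.9%-style example: nearly everyone downvotes a hostile response
votes = [PreferenceVote(f"v{i}", 1.0, False) for i in range(999)]
votes.append(PreferenceVote("troll", 1.0, True))
payouts = settle_preferences(votes)
print(payouts["troll"])  # the lone upvoter loses close to the maximum penalty
print(payouts["v0"])     # majority voters earn a small reward
```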
Many technical kinks would need to be worked out, but I believe the underlying idea is very interesting. This wouldn’t need to be created by an AI company; rather, a small crypto startup could build the platform and then charge AI labs to use it for reinforcement learning. With the required network effects, this could lead to a massively valuable cryptocurrency. The current market cap of Chainlink is $2.5 billion. My idea may not be as revolutionary, but either way, you have no downside. As with everything in crypto, you don’t need a VC. Everything will be paid for by fake money.