A deep analysis of IPFS and Filecoin and their drawbacks
Web3 is developing rapidly, and various Web3 applications are emerging, such as NFTs, decentralized games, the metaverse, Web3 social apps, and DAOs. But these applications all run in the same mode: a Web2 application provides the service, a centralized database stores the application data, and a blockchain stores financial assets and handles asset transactions. This kind of Web3 has limited prospects. If application data cannot be stored in a decentralized way, it is hard to build truly decentralized Web3 applications. Blockchains such as Ethereum can only decentralize finance; storing huge amounts of application data on Ethereum is not feasible (for a detailed elaboration of this issue, see another article of ours: https://www.cyfs.com/blog-What-kinds-of-crypto-will-rise-in-next-bull-market), so we have to look for other technical solutions.
Among the existing decentralized storage solutions on the market, the most popular and feasible one, with the largest community, is IPFS + Filecoin. But it still has many drawbacks. Let's analyze it objectively.
The drawbacks of IPFS and Filecoin
Idea of IPFS
The design of IPFS is that everyone can freely run their own IPFS peer node, and together these nodes form a P2P network. Your data is stored on your own peer and can be voluntarily backed up by other peers. Each piece of data has a unique content identifier (CID) across the whole network. As long as you know the CID, you can retrieve the data from the network, which gives decentralized storage and retrieval of data. The specific operating mechanism is as follows:
Each peer maintains its own DHT, which holds the address information of a small fraction of the peers and the "provider records" for a small fraction of the data in the network. A provider record states which node has a given piece of data, for example "Alice has the data CID_D". The overall architecture of IPFS is shown in the figure:
Data storage process
When Alice's peer stores data D, she needs to find the m peers in the network that are "closest" to the data (m is generally 20) and ask them to save the provider record "Alice has the data CID_D" in their DHTs. The "distance" between a peer and CID_D is calculated with the Kademlia algorithm. The lookup works as follows: Alice first takes the peers in her own DHT that are "closest" to D and asks them to save the record. If these peers know of peers in their own DHTs that are "closer" than themselves, they recommend them to Alice, and she then sends her request to these "closer" peers. After several rounds of requests, the peers that are nearest overall are found, and they store the record.
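The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the IPFS implementation: peer IDs are small integers, the `routing` dictionary stands in for each peer's DHT view, and the names `xor_distance`, `closest_peers`, and `iterative_lookup` are hypothetical. The only part taken from Kademlia itself is the XOR distance metric.

```python
from typing import Dict, Iterable, List

def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two keys, read as an integer."""
    return a ^ b

def closest_peers(known: Iterable[int], target: int, k: int) -> List[int]:
    """Return the k known peers nearest to target under XOR distance."""
    return sorted(known, key=lambda p: xor_distance(p, target))[:k]

def iterative_lookup(start: int, target: int,
                     routing: Dict[int, List[int]], k: int) -> List[int]:
    """Ask the closest known peers for closer ones until no new peer turns up."""
    shortlist = closest_peers(routing[start], target, k)
    queried = set()
    while True:
        pending = [p for p in shortlist if p not in queried]
        if not pending:
            return shortlist  # every peer in the shortlist has been asked
        peer = pending[0]
        queried.add(peer)
        # The queried peer "recommends" the peers it knows about.
        candidates = set(shortlist) | set(routing.get(peer, []))
        shortlist = closest_peers(candidates, target, k)

# Toy network (assumption): 16 peers, each knowing three others.
routing = {p: [(p + 1) % 16, (p + 2) % 16, (p + 8) % 16] for p in range(16)}
print(iterative_lookup(12, 5, routing, k=3))  # prints [5, 4, 7]
```

In this toy run, peer 12 starts out knowing only peers 13, 14, and 4, yet after three rounds of questions it reaches the three peers whose IDs are nearest to key 5. Every one of those rounds is a request that some other peer has to answer.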
Data retrieval process
If Bob wants to retrieve data D from the network using CID_D, he needs to find one of the m peers "closest" to CID_D and get the provider record "Alice has the data CID_D" from it. He then knows that Alice has the data and requests it from her. The algorithms for calculating "distance" and finding peers are the same as in the data storage process.
This system looks good, but it has a fatal flaw: the network has no incentive mechanism, so everyone has to serve others at a net loss. For example, maintaining the DHT and responding to other people's retrieval requests consume a lot of bandwidth, and storing provider records and data for others consumes a lot of storage space. All of this costs money, yet it brings the peer no benefit and has nothing to do with the peer's own needs. I believe most people would not want to do this. The IPFS documentation also acknowledges that IPFS cannot guarantee that data can be retrieved.
Therefore, the DHT-based decentralized retrieval solution of IPFS can only work well in a laboratory environment. Once a large amount of data is stored on IPFS, it is bound to collapse.
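As a rough sketch of the storage and retrieval flow just described, the following Python models provider records as plain dictionaries. It assumes the "closest" peers have already been found, uses a hex SHA-256 digest as a stand-in for a CID (real CIDs use a multihash-based encoding), and all class and function names are hypothetical.

```python
import hashlib

def cid_of(data: bytes) -> str:
    """Stand-in content identifier: hex SHA-256 of the bytes."""
    return hashlib.sha256(data).hexdigest()

class Peer:
    def __init__(self, name: str):
        self.name = name
        self.blocks = {}            # cid -> bytes held locally
        self.provider_records = {}  # cid -> names of peers that hold the data

def publish(owner: Peer, data: bytes, record_holders: list) -> str:
    """Store data on the owner and announce "owner has cid" to the holders."""
    cid = cid_of(data)
    owner.blocks[cid] = data
    for holder in record_holders:
        holder.provider_records.setdefault(cid, set()).add(owner.name)
    return cid

def retrieve(cid: str, record_holders: list, peers_by_name: dict):
    """Look up a provider record, fetch from the provider, verify the hash."""
    for holder in record_holders:
        for provider_name in holder.provider_records.get(cid, ()):
            data = peers_by_name[provider_name].blocks.get(cid)
            if data is not None and cid_of(data) == cid:
                return data
    return None  # no reachable provider: retrieval is not guaranteed

alice = Peer("alice")
holders = [Peer("h1"), Peer("h2")]  # assumed to be the peers closest to the CID
cid = publish(alice, b"hello", holders)
print(retrieve(cid, holders, {"alice": alice}))  # prints b'hello'
```

Note that the record holders `h1` and `h2` store and serve records for data they have no interest in.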
Now let's talk about Filecoin. Many people think that Filecoin is an incentive layer for IPFS that rewards peer behavior. It is not. Filecoin provides long-term backups of data in case the data is lost from IPFS, while IPFS provides decentralized retrieval of data. Filecoin and IPFS are two separate networks.
Filecoin's incentive model has problems of its own. The idea of Filecoin is that the more data a miner stores and the longer it is stored, the more FIL the miner is rewarded. So for miners, isn't it more cost-effective to store their own garbage data than to store real user data? As a result, although Filecoin claims to have a large amount of data stored, almost all of it is garbage data, as every miner knows. Moreover, once the FIL price drops sharply, miners become reluctant to provide storage. If users really save data on Filecoin, it is likely to be lost.
Objectively speaking, the idea of IPFS is correct. If the IPFS system were reliable, everyone could freely publish content on IPFS and let others access it in a decentralized manner. But because it is not, people can only use IPFS as a gimmick. For example, some NFT projects store their minted NFT files on IPFS and tell users: "Look, your NFT files are stored on IPFS in a decentralized way; they cannot be deleted and will exist forever." However, this usage is neither effective nor the original purpose of IPFS.
The innovative design of CYFS
CYFS uses an innovative architecture to solve the problems of IPFS described above and to realize a truly feasible decentralized storage infrastructure. Similar to peers in IPFS, users run their own OOD (Owner Online Device), and the OODs form a P2P network. The architecture diagram is as follows:
If you want to quickly understand how CYFS works, you can watch the following video:
Below is the detailed description:
The fundamental flaw of IPFS is that it has no incentive mechanism. This raises the question: how should such an incentive mechanism be designed? It looks complicated, but it becomes clear once we go back to a basic principle of commerce: "whoever has the demand pays the cost". In a decentralized storage system, that means: "whoever needs the data to be reliably stored and retrieved should pay the cost". Obviously, only the owner of the data naturally has this need. So the incentive mechanism is now clear: the data owner should be responsible for the storage and retrieval of the data. If the process relies on third parties, they should be paid by the data owner. Data retrieval in IPFS involves hundreds of random peers, which makes it technically impossible to design such an incentive mechanism.
Blockchain as DNS
Once the basic principle of the incentive mechanism is clear, understanding CYFS is easy. If Alice wants to publish her content using CYFS, she can generate a cyfs:// link and share it. The difference between cyfs:// and ipfs:// is that ipfs:// has only one section, the CID of the data, while cyfs:// has two sections. One section is the unique device ID of Alice's OOD, and the other is the object ID of the data (similar to a CID). The purpose of this design is to have data retrievers request data directly from the data owner.
The question then becomes how Bob can quickly find the address of Alice's OOD in CYFS from Alice's device ID. We use a blockchain, called the meta-chain in CYFS, to replace the DHT. As long as Alice publishes the latest address of her OOD on the meta-chain, Bob can look it up on the meta-chain by device ID and connect to Alice's OOD. Problem solved! The rest of the process is the same as in IPFS. Bob requests the data directly from Alice and verifies that it is the original data by checking it against the object ID. The whole process cannot be interfered with by others and has the same performance as Web2, as long as Alice keeps her OOD online.
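To make the two-section idea concrete, here is a small Python sketch that splits an owner-addressed link into its two parts. The layout `cyfs://<device_id>/<object_id>` is an assumption made purely for illustration, since the text above only says that the link has two sections; the names `OwnerAddressedLink` and `parse_link` are likewise hypothetical.

```python
from typing import NamedTuple

class OwnerAddressedLink(NamedTuple):
    device_id: str  # identifies the owner's OOD (whom to ask)
    object_id: str  # identifies the content (what to ask for)

def parse_link(link: str) -> OwnerAddressedLink:
    """Split a link of the assumed form cyfs://<device_id>/<object_id>."""
    prefix = "cyfs://"
    if not link.startswith(prefix):
        raise ValueError("not a cyfs:// link")
    parts = link[len(prefix):].split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected cyfs://<device_id>/<object_id>")
    return OwnerAddressedLink(*parts)

link = parse_link("cyfs://dev123/obj456")
print(link.device_id, link.object_id)  # prints dev123 obj456
```

A content-only link tells the retriever what to fetch but not where, which is why a network-wide search is needed. Carrying the owner's device ID in the link removes that search.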
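The resolve-then-verify flow can be sketched as follows. This is only an illustration under stated assumptions: `MetaChainStub` is an in-memory dictionary standing in for the on-chain registry, the object ID is modeled as a hex SHA-256 digest of the content, and `fetch` is a caller-supplied function standing in for the network request to the OOD. None of these names come from the CYFS codebase.

```python
import hashlib

def object_id_of(data: bytes) -> str:
    """Stand-in object ID: hex SHA-256 of the content (assumption)."""
    return hashlib.sha256(data).hexdigest()

class MetaChainStub:
    """In-memory stand-in for a registry mapping device ID -> latest address."""
    def __init__(self):
        self._addresses = {}

    def publish_address(self, device_id: str, address: str) -> None:
        self._addresses[device_id] = address  # newest entry replaces the old one

    def resolve(self, device_id: str):
        return self._addresses.get(device_id)

def fetch_verified(chain, device_id: str, object_id: str, fetch) -> bytes:
    """Resolve the owner's address, fetch the object, and check its ID."""
    address = chain.resolve(device_id)
    if address is None:
        raise LookupError("device has not published an address")
    data = fetch(address, object_id)
    if object_id_of(data) != object_id:
        raise ValueError("content does not match the object ID")
    return data

# Usage with a fake OOD that serves one object.
content = b"alice's post"
oid = object_id_of(content)
chain = MetaChainStub()
chain.publish_address("alice-ood", "203.0.113.7:8000")
fake_ood = lambda address, object_id: content
print(fetch_verified(chain, "alice-ood", oid, fake_ood))  # prints b"alice's post"
```

Because the object ID is derived from the content, Bob can detect tampering even though he trusts neither the transport nor the address he was given.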
Reliable storage of data
Many may ask: if data is stored on one's own OOD, what happens if the OOD goes down and the data is lost? There are two ways to solve this problem. One is to run multiple OODs for high availability. The other is to pay to keep encrypted backups of important data on other people's OODs (encryption prevents others from stealing the data). As long as you can rerun your OOD somewhere, you can recover your data with your private key. CYFS has designed the DSG protocol to provide a complete proof-of-service method that protects the interests of both parties. The DSG protocol also supports building a decentralized storage matching market on top of it, providing reasonable prices for storage resources and more incentives.
Someone may also ask: "It is too troublesome to keep my OOD online. Is there a way for my data to be stored and retrieved without requiring me to run an OOD?" The IPFS ecosystem accepts this view, which is why it has centralized pinning services such as Pinata. But how is this different from AWS S3? Such a solution is easy to implement, but we should first ask what data decentralization is for. The significance of decentralizing Web3 data is to let users truly own their data. If ownership is not the point, it is more convenient to put the data in Web2 applications. If you want to truly own your data, you must control your own storage devices.
Mass distribution of data
Under normal circumstances these designs are sufficient, but many scenarios require large-scale distribution of data in a short period of time, such as content distribution in social networks and in large online group chats. We designed the BDT protocol to help application developers solve this problem while keeping the overall load on the network reasonable. The core idea of BDT is that the more valuable your content is to other people, the more nodes are willing to voluntarily back up your data and help keep your link available. If there are no volunteers, you can also pay nodes to do this. We have done all this work in the hope of building Web3 applications that can really replace Web2 applications.
If you want to know more about CYFS, you can follow us here:
CYFS website: https://www.cyfs.com/