The InterPlanetary File System (IPFS) is an open-source, peer-to-peer (P2P) protocol designed to create a decentralized, resilient, and content-addressed web infrastructure. Unlike traditional location-based systems such as HTTP, IPFS identifies data by its content using cryptographic hashes, generating unique Content Identifiers (CIDs) that ensure data integrity and immutability. At its core, IPFS relies on a Merkle DAG structure to link data blocks, enabling efficient storage, versioning, and deduplication across a distributed network of nodes. The system leverages the libp2p networking stack to facilitate secure, transport-agnostic peer discovery and communication, while a Kademlia-based distributed hash table (DHT) enables decentralized content routing by mapping CIDs to hosting peers. Data transfer is managed by the Bitswap protocol, which incentivizes cooperation through a ledger-based tit-for-tat mechanism to prevent freeloading. To support mutable references, IPFS integrates IPNS and DNSLink, allowing dynamic updates to content without breaking verifiability. IPFS is widely used in Web3 applications, particularly for storing NFT metadata, decentralized websites, and blockchain-related data, often in conjunction with incentive layers like Filecoin and developer tools such as Pinata and Web3.Storage. Despite its advantages in censorship resistance and data permanence, IPFS faces challenges including the "pinning problem" (reliance on voluntary data persistence), vulnerabilities to Sybil attacks on its DHT, privacy risks due to public content addressing, and regulatory scrutiny over content moderation. Services like NFT.Storage and Web3.Storage enhance reliability by combining IPFS with Filecoin's long-term storage guarantees, while public HTTP gateways enable seamless access via standard web browsers. Ongoing development efforts aim to improve browser integration and performance, positioning IPFS as a foundational layer for a more open, verifiable, and user-controlled internet [1], [2].
Architecture and Core Principles
The InterPlanetary File System (IPFS) is built upon a decentralized, peer-to-peer (P2P) architecture that reimagines how data is stored, addressed, and retrieved on the internet. Unlike traditional web protocols that rely on centralized servers and location-based addressing, IPFS leverages a suite of cryptographic and distributed systems principles to create a resilient, tamper-evident, and censorship-resistant infrastructure. Its design is founded on three core tenets: content addressing, peer-to-peer networking, and a distributed, graph-based data structure. These principles collectively enable a web where data integrity is guaranteed not by trust in intermediaries, but by mathematical verification.
Content Addressing and Cryptographic Integrity
At the heart of IPFS lies content addressing, a paradigm that identifies data by its cryptographic hash rather than its network location. When any piece of data—be it a file, image, or block of text—is added to IPFS, it is processed through a cryptographic hash function, typically SHA-256, to generate a unique Content Identifier (CID) [3]. This CID serves as a digital fingerprint of the content; any alteration, no matter how minor, produces a completely different hash and thus a new CID. This ensures data integrity and immutability, as users can cryptographically verify that the content they retrieve is identical to the original by re-computing its hash and comparing it to the expected CID. This mechanism eliminates the risk of undetected data corruption or tampering, a critical feature for trustless environments like blockchain and Web3 platforms. The use of the Multihash format allows IPFS to support multiple hash functions, ensuring future-proofing and resistance to potential cryptographic vulnerabilities [4].
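The hash-then-compare verification loop can be sketched in a few lines of Python. This is illustrative only: a real CID additionally carries multibase and multicodec prefixes rather than being a bare hex digest.

```python
import hashlib

def content_digest(data: bytes) -> str:
    # SHA-256 digest, the typical hash underlying an IPFS CID
    return hashlib.sha256(data).hexdigest()

original = b"hello ipfs"
tampered = b"hello ipfs."  # a one-character alteration

d1 = content_digest(original)
d2 = content_digest(tampered)
assert d1 != d2  # any change yields a completely different identifier

# Retrieval verification: re-compute the hash of received data and
# compare it to the digest the requester already holds.
received = original
assert content_digest(received) == d1
```

Because the identifier is derived from the bytes themselves, the requester needs no trusted intermediary: matching digests are mathematical proof that the content is unaltered.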
Merkle Directed Acyclic Graph (Merkle DAG) for Data Structuring
IPFS organizes data using a Merkle Directed Acyclic Graph (Merkle DAG), a hierarchical data structure that links data blocks through their CIDs. In this model, files are broken down into smaller blocks, each of which is assigned its own CID based on its content. These blocks are then linked together, with parent nodes containing references (CIDs) to their child nodes. The CID of the root node serves as the address for the entire file or dataset [5]. This structure provides several key benefits: it enables efficient data deduplication (identical blocks are stored only once), supports versioning (each version of a file creates a new DAG with a new root CID), and allows for partial retrieval of data. The Merkle DAG also ensures structural integrity; any change to a leaf node propagates upward, altering the root CID, which makes the entire dataset's history verifiable. This design is conceptually similar to the data structures used in Git, enabling powerful features like snapshotting and incremental updates.
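A toy Merkle DAG makes deduplication and root-hash propagation concrete. The block size and link encoding below are hypothetical simplifications; real IPFS defaults to roughly 256 KiB UnixFS chunks encoded as dag-pb nodes.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration only

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_dag(data: bytes):
    """Split data into blocks, hash each block, then hash the ordered
    list of child hashes to form the root. Returns (root, block_store)."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    store = {}
    child_ids = []
    for b in blocks:
        cid = h(b)
        store[cid] = b          # identical blocks collapse to one entry
        child_ids.append(cid)
    root = h("".join(child_ids).encode())
    return root, store

root1, store1 = build_dag(b"aaaabbbbaaaa")  # first and third blocks identical
assert len(store1) == 2                     # deduplicated storage
root2, _ = build_dag(b"aaaabbbbaaaX")       # change one leaf block
assert root1 != root2                       # the change propagates to the root
```

The last assertion is the structural-integrity property described above: a leaf edit changes that leaf's CID, which changes the parent's contents, which changes the root.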
Peer-to-Peer Networking and libp2p
IPFS operates on a global P2P network where nodes store and exchange data directly, without relying on central servers. This decentralized model is facilitated by the libp2p networking stack, a modular framework that handles all aspects of peer communication [6]. libp2p abstracts the complexities of networking by providing a transport-agnostic layer that supports various protocols, including TCP, UDP, , and QUIC, allowing IPFS to function across diverse environments like browsers, mobile devices, and servers [7]. Each node in the network is identified by a self-certifying Peer ID, derived from its public key, which ensures secure and verifiable peer-to-peer interactions. This modular design allows developers to plug in different implementations of discovery, security, and transport mechanisms, making the network highly adaptable and resilient to network fragmentation.
Content Discovery via Distributed Hash Table (DHT)
To locate data in a decentralized network, IPFS uses a Distributed Hash Table (DHT) based on the Kademlia algorithm. The DHT acts as a decentralized index that maps CIDs to the network addresses of peers storing the corresponding data [8]. When a node requests a file by its CID, it queries the DHT to discover which peers are hosting the required blocks. This process of content routing is distributed across the network, with no single point of failure or control. The DHT enables efficient peer and content discovery, allowing nodes to find data even as the network dynamically changes. This system is critical for the network's scalability and fault tolerance, as it ensures that content remains locatable as long as at least one node is hosting it, contributing to the system's resistance to censorship and link rot.
Data Transfer with the Bitswap Protocol
Once a node has discovered peers that host the desired data, the actual transfer is managed by the Bitswap protocol. Bitswap is a message-based system that enables efficient and cooperative exchange of data blocks between peers [9]. Instead of a simple request-response model, Bitswap uses "wantlists" where nodes advertise which blocks they need and which they have available. This allows for parallel downloads from multiple sources, improving speed and redundancy. To prevent freeloading—where peers consume data without reciprocating—Bitswap implements a ledger-based tit-for-tat strategy. Each node maintains a record of data exchanged with its peers and prioritizes sending blocks to those who have previously provided data, creating a self-regulating ecosystem that incentivizes cooperation. This mechanism promotes a healthy, collaborative network without requiring a built-in economic incentive system.
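The ledger-based prioritization can be sketched as follows. This is a simplified policy for illustration, not Bitswap's actual peer-scoring logic.

```python
from collections import defaultdict

class DebtLedger:
    """Per-peer ledger of bytes sent and received, used to decide which
    requesting peer to serve first (a simplified tit-for-tat policy)."""

    def __init__(self):
        self.sent = defaultdict(int)
        self.received = defaultdict(int)

    def record_send(self, peer, n):
        self.sent[peer] += n

    def record_receive(self, peer, n):
        self.received[peer] += n

    def contribution(self, peer):
        # How much the peer has given us relative to what we have given it
        return self.received[peer] / (self.sent[peer] + 1)

    def next_peer_to_serve(self, wanting_peers):
        # Prioritize the peer that has contributed the most to us
        return max(wanting_peers, key=self.contribution)

ledger = DebtLedger()
ledger.record_receive("peerA", 5000)  # peerA has sent us plenty of blocks
ledger.record_receive("peerB", 100)
ledger.record_send("peerB", 4000)     # and we have already served peerB heavily
assert ledger.next_peer_to_serve(["peerA", "peerB"]) == "peerA"
```

A peer that only consumes never builds up contribution, so its requests are deprioritized, which is the self-regulating effect the text describes.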
Mutable References with IPNS and DNSLink
While the underlying data in IPFS is immutable, real-world applications often require the ability to update content. To address this, IPFS provides systems for mutable references. The InterPlanetary Name System (IPNS) allows users to create updatable links to content by associating a mutable pointer (a public key) with a changing CID [10]. Updates are signed with the corresponding private key, ensuring that only the rightful owner can change the reference, while anyone can verify the authenticity of the update. Similarly, DNSLink enables the use of traditional DNS records to point to IPFS content, allowing human-readable domain names to resolve to CIDs. These systems provide a dynamic namespace over the immutable content, enabling the creation of websites and applications that can be updated while still benefiting from the verifiability and permanence of the underlying data.
Content Addressing and Data Integrity
The InterPlanetary File System (IPFS) fundamentally redefines how data is identified, stored, and verified by replacing traditional location-based addressing with content addressing, a mechanism that ensures data integrity through cryptographic verification. Unlike HTTP, which locates resources by server address (e.g., https://example.com/file.pdf), IPFS identifies data by its content using a unique Content Identifier (CID), a cryptographic hash—typically SHA-256—of the data itself [3]. This approach guarantees that any alteration to the content, no matter how minor, produces a completely different CID, making tampering immediately detectable and ensuring immutability [12].
Cryptographic Hashing and Content Identifiers
At the core of IPFS’s integrity model is cryptographic hashing, which generates a fixed-length digital fingerprint for any input. When a file is added to IPFS, it is processed through a hash function, and the resulting digest becomes part of its CID. This means that identical content will always produce the same CID, enabling automatic data deduplication across the network [13]. The CID serves as a self-verifying address: upon retrieval, the system re-computes the hash of the received data and compares it to the requested CID. If they match, the data is confirmed authentic; if not, the transfer is rejected, ensuring tamper-evident delivery [14].
IPFS uses the Multihash format to encode hashes, which includes metadata such as the hash function used (e.g., SHA-256, BLAKE3) and digest length. This self-describing structure allows IPFS to support multiple cryptographic algorithms and enables future upgrades without breaking compatibility [4]. For instance, if SHA-256 were ever compromised, IPFS could transition to more secure alternatives while preserving its addressing paradigm.
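As a concrete sketch, a version-0 CID is the base58btc encoding of the multihash bytes: a function code (0x12 for SHA-256), the digest length (0x20, i.e. 32 bytes), then the digest itself. Note that hashing raw bytes like this does not reproduce the CID that `ipfs add` would report for a file, because IPFS first wraps file data in UnixFS blocks; the sketch only shows the multihash and encoding layers.

```python
import hashlib

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58btc(raw: bytes) -> str:
    n = int.from_bytes(raw, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = B58[r] + out
    # leading zero bytes are preserved as leading '1' characters
    return "1" * (len(raw) - len(raw.lstrip(b"\x00"))) + out

def cid_v0(data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    # multihash: 0x12 = sha2-256 function code, 0x20 = 32-byte digest length
    multihash = bytes([0x12, 0x20]) + digest
    return base58btc(multihash)

cid = cid_v0(b"hello")
# Every sha2-256 CIDv0 is 46 characters and starts with "Qm"
assert cid.startswith("Qm") and len(cid) == 46
```

The self-describing prefix is what makes algorithm migration possible: a decoder reads the function code first, so BLAKE3 digests and SHA-256 digests can coexist under the same addressing scheme.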
Merkle Directed Acyclic Graph (Merkle DAG) and Structural Integrity
To manage complex data structures, IPFS organizes content into a Merkle Directed Acyclic Graph (Merkle DAG), where each node contains a hash of its data and references to child nodes via their CIDs [5]. Files are split into blocks, each assigned a CID, and linked hierarchically. The root node’s CID represents the entire file, enabling efficient verification: by checking the root hash, users can cryptographically confirm the integrity of the entire dataset without downloading all components [17].
This structure supports advanced features such as incremental updates and versioning. When a file is modified, only the changed blocks are re-hashed, and a new Merkle DAG is constructed with a new root CID. The previous version remains accessible under its original CID, preserving a history of immutable snapshots—similar to version control systems like Git [5]. This model is particularly valuable in blockchain-based systems, where off-chain data must remain verifiable and unaltered [19].
Immutability and Tamper Evidence
Content addressing enforces immutability: once data is published, it cannot be altered without changing its CID. This makes IPFS inherently tamper-evident, as any unauthorized modification invalidates the original reference. This property is critical in trustless environments such as blockchain networks, where users and smart contracts rely on cryptographic proof rather than centralized authorities to verify authenticity [20]. For example, in NFT applications, metadata and media are stored on IPFS, and the CID is recorded on-chain. This ensures that the digital asset linked today will be the same tomorrow, with any alteration breaking the cryptographic chain of trust [21].
Mutable References and Integrity Preservation
While IPFS content is immutable, real-world applications often require updates. To support mutable references without sacrificing integrity, IPFS integrates the InterPlanetary Name System (IPNS) and DNSLink. IPNS uses public-key cryptography: a node generates a key pair, and the public key serves as a mutable namespace. Updates are signed with the private key, and clients verify the signature before accepting the new CID, ensuring only authorized changes are applied [22]. This preserves integrity at the naming layer while enabling dynamic content, such as updated website versions or evolving NFT metadata.
Security Implications and Verification Mechanisms
Despite its strong integrity guarantees, IPFS does not provide built-in encryption or access control. Data is public by default, and anyone who knows a CID can retrieve it, posing privacy risks [23]. However, integrity remains intact even in adversarial environments. Emerging tools like @helia/verified-fetch enable end-to-end verification in web browsers, ensuring that content retrieved via gateways matches the expected CID [24]. Additionally, protocols like Proof of Unified Data Retrieval (PoUDR) use zero-knowledge proofs to cryptographically verify data retrieval, enhancing trust in decentralized networks [25].
Implications for Trustless Systems
In blockchain and decentralized applications (dApps), the synergy between IPFS and on-chain verification creates a powerful model for off-chain storage with on-chain verifiability. By storing large files on IPFS and anchoring their CIDs in smart contracts, developers achieve scalable, cost-effective data management without compromising integrity [26]. This hybrid approach supports auditability, provenance tracking, and censorship resistance, forming the backbone of secure DeFi platforms, digital identity systems, and archival solutions [27].
In summary, content addressing in IPFS ensures data integrity through cryptographic hashing, Merkle DAGs, and self-verifying identifiers. These mechanisms provide tamper-evident storage, immutability, and structural integrity, making IPFS a foundational technology for trustless, decentralized systems where data authenticity is paramount [28].
Peer-to-Peer Networking and libp2p
The InterPlanetary File System (IPFS) relies on a robust peer-to-peer (P2P) networking model to enable decentralized data storage, sharing, and retrieval across a global network of nodes. Unlike traditional client-server architectures, IPFS eliminates centralized points of control by allowing nodes to directly exchange data based on content rather than location. This decentralized structure is made possible through the libp2p networking stack, a modular, protocol-agnostic framework designed to abstract the complexities of P2P communication and support interoperability across diverse environments [6].
Role of libp2p in IPFS
libp2p serves as the foundational communication layer for IPFS, providing a reusable and extensible network stack that enables secure, efficient, and transport-agnostic connectivity. Originally developed as part of IPFS, libp2p was later decoupled to function as a standalone toolkit for building decentralized applications. It allows IPFS nodes to discover, connect to, and transfer data with other peers without relying on centralized infrastructure, ensuring resilience and censorship resistance [30].
The modular architecture of libp2p enables developers to plug in various implementations of core networking components, including transports, security protocols, peer discovery mechanisms, and stream multiplexers. This flexibility ensures that IPFS can operate efficiently across different platforms—such as browsers, mobile devices, and servers—while maintaining compatibility and performance [6]. By abstracting low-level networking concerns, libp2p allows IPFS to focus on content addressing and data integrity without being tied to any specific transport protocol or network topology.
Peer Discovery Mechanisms
For nodes to exchange data, they must first locate one another within the decentralized network. libp2p supports multiple peer discovery strategies, often used in combination, to ensure reliable and scalable node discovery.
mDNS (Multicast DNS)
On local networks, libp2p uses mDNS to allow peers to discover each other by broadcasting their presence within the same subnet. This method is particularly effective for LAN-based applications or local clusters where low-latency discovery is required [32].
Kademlia Distributed Hash Table (DHT)
For global-scale discovery, libp2p employs a Kademlia DHT, a distributed key-value store that maps cryptographic identifiers to network addresses. The DHT enables efficient routing and lookup of both peers and content by organizing nodes based on XOR-based distance metrics derived from their PeerIDs. This allows any node to locate another peer or a file’s provider through iterative queries to progressively closer nodes, facilitating scalable content routing across the network [33].
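The XOR metric itself is simple to illustrate. The one-byte IDs below are toys; real PeerIDs are multihashes of public keys, so the distance is computed over much longer byte strings.

```python
def xor_distance(a: bytes, b: bytes) -> int:
    # Kademlia's metric: interpret the XOR of two IDs as an unsigned integer
    return int.from_bytes(bytes(x ^ y for x, y in zip(a, b)), "big")

target = bytes([0b1010_0000])
peers = {
    "peerA": bytes([0b1010_0001]),  # distance 1 (differs in the last bit)
    "peerB": bytes([0b0010_0000]),  # distance 128 (differs in the first bit)
    "peerC": bytes([0b1011_0000]),  # distance 16
}

# An iterative lookup repeatedly asks the closest known peers for
# peers even closer to the target, so ranking by distance is the core step:
closest = sorted(peers, key=lambda p: xor_distance(peers[p], target))
assert closest == ["peerA", "peerC", "peerB"]
```

Because a high-order bit difference dominates the distance, nodes sharing a long ID prefix with the target are "close", which is what lets each query hop roughly halve the remaining search space.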
Rendezvous Protocol
The rendezvous protocol allows nodes to discover each other via well-known, highly available "rendezvous points" or registration nodes. While more centralized than fully decentralized methods like the DHT, it provides a reliable mechanism for peers behind NATs or restrictive firewalls to find each other [34].
Bootstrap Nodes
New nodes typically begin discovery by connecting to a predefined list of bootstrap nodes, which act as entry points into the network. These nodes provide initial peer addresses, enabling newcomers to populate their routing tables and participate in DHT queries [35].
Secure Communication
Security in libp2p is built into the connection lifecycle, ensuring confidentiality, integrity, and authentication of peer-to-peer communications.
Noise Protocol Framework
libp2p uses the Noise Protocol Framework (specifically noise-libp2p) as its default secure channel protocol. Noise provides forward secrecy, mutual authentication, and resistance to replay attacks by performing cryptographic handshakes that establish shared secrets between peers [36]. The handshake incorporates the peers’ public keys, ensuring that each PeerID—a self-certifying identifier derived from the hash of a node’s public key—is cryptographically verifiable [37].
TLS 1.3 Support
In addition to Noise, libp2p supports TLS 1.3 for secure communication, particularly in environments where compatibility with existing tooling or regulatory requirements favors standardized encryption [38]. This dual support allows developers to choose the most appropriate security protocol based on their use case.
Transport-Agnostic Connectivity
A core principle of libp2p is transport agnosticism—the ability to operate over multiple underlying network protocols without requiring changes to higher-level application logic. This flexibility allows IPFS to function across heterogeneous environments, including browsers, mobile devices, and traditional servers.
libp2p supports a wide range of transport protocols, including:
- TCP and UDP for standard internet connectivity
- WebRTC and WebTransport for browser-to-browser and browser-to-node communication [39], [40]
- QUIC for low-latency, multiplexed transport over UDP
- Relay transports for NAT traversal and connectivity in restricted networks
This transport abstraction enables seamless interoperability. For example, a browser-based IPFS node can use WebRTC to connect directly to another browser or use a relay to communicate with a server node using TCP, all through the same libp2p interface [41].
Conclusion
libp2p is the backbone of IPFS’s decentralized networking model, providing a modular, secure, and adaptable communication layer. By supporting multiple peer discovery strategies, enforcing end-to-end encryption via the Noise Protocol and TLS 1.3, and enabling transport-agnostic connectivity, libp2p ensures that IPFS can scale globally while remaining resilient to network fragmentation, censorship, and infrastructure limitations. Its design principles—modularity, security, and decentralization—make it a foundational technology for the next generation of distributed systems [42].
Data Distribution and Replication
The InterPlanetary File System (IPFS) achieves data distribution and replication through a decentralized, peer-to-peer (P2P) architecture that relies on content addressing, distributed routing, and cooperative storage mechanisms. Unlike traditional file systems that depend on centralized servers, IPFS distributes files across a global network of independent nodes, ensuring redundancy, fault tolerance, and resistance to censorship. This model enables resilient access to data as long as at least one node hosts it, making the network inherently robust against single points of failure [43].
Content-Addressed Distribution and On-Demand Replication
At the core of IPFS's distribution model is content addressing, where each data block is uniquely identified by a cryptographic hash known as a Content Identifier (CID) [3]. When a file is added to IPFS, it is split into smaller blocks, each assigned a CID derived from its content using algorithms like SHA-256. These blocks are linked together in a Merkle Directed Acyclic Graph (Merkle DAG), allowing efficient reconstruction and verification of the original file [45].
Distribution occurs when nodes request content by its CID. The system uses a Distributed Hash Table (DHT)—based on the Kademlia algorithm—to locate peers storing the requested data. This decentralized index maps CIDs to the network addresses of hosting nodes, enabling content discovery without reliance on central servers [8]. As nodes retrieve data, they often cache or "pin" it locally, leading to on-demand replication. This means popular content naturally becomes more widely distributed across the network, improving availability and retrieval speed [47].
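On-demand replication can be sketched with a toy provider index. In reality the CID-to-providers mapping is itself sharded across DHT nodes by Kademlia distance, and the CID strings here are placeholders.

```python
class ProviderIndex:
    """Toy stand-in for the DHT's CID -> providers mapping."""

    def __init__(self):
        self.providers = {}

    def provide(self, cid, peer):
        # A peer announces that it can serve this block
        self.providers.setdefault(cid, set()).add(peer)

    def find_providers(self, cid):
        return self.providers.get(cid, set())

index = ProviderIndex()
index.provide("cid-block-1", "peerA")
assert index.find_providers("cid-block-1") == {"peerA"}

# peerB fetches the block, caches it locally, and announces itself as
# an additional provider -- this is on-demand replication:
index.provide("cid-block-1", "peerB")
assert index.find_providers("cid-block-1") == {"peerA", "peerB"}
```

Each retrieval can add a provider, so popular content accumulates providers and becomes cheaper and faster to fetch over time.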
Bitswap Protocol and Efficient Data Exchange
Data transfer between nodes is managed by the Bitswap protocol, a message-based system that facilitates peer-to-peer exchange of blocks [9]. Instead of a traditional request-response model, Bitswap uses "wantlists" to advertise which blocks a node needs or has, enabling concurrent downloads from multiple sources. This allows for faster, more resilient retrieval, especially in environments with high latency or intermittent connectivity [49].
Bitswap also incorporates a ledger-based tit-for-tat mechanism to incentivize cooperation among peers. Each node maintains a ledger tracking how much data it has sent and received from other peers. Nodes prioritize sending blocks to those who have previously contributed data, effectively deprioritizing freeloading behavior. This creates a self-regulating ecosystem where contribution is rewarded with better service quality, promoting long-term network health [9].
Replication Through Pinning and IPFS Cluster
Replication in IPFS is not automatic; by default, nodes only store data they have added, requested, or temporarily cached. To ensure long-term availability, data must be explicitly "pinned"—a process that prevents the data from being removed during garbage collection [51]. Pinning is essential for maintaining persistence, as unpinned data may be deleted when storage limits are reached.
For coordinated replication across multiple nodes, IPFS Cluster provides a management layer that synchronizes pinning across a group of IPFS nodes [52]. Cluster allows administrators to define replication factors—specifying how many nodes should store a given piece of data—and automatically distributes and tracks pinned content. It supports consensus mechanisms like Raft or Conflict-Free Replicated Data Types (CRDTs) to maintain consistency, even during network partitions or node failures [53]. This makes IPFS Cluster ideal for enterprise use cases requiring guaranteed redundancy and fault tolerance.
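A minimal sketch of replication-factor allocation, using a hypothetical "most free space wins" strategy; IPFS Cluster's real allocator is configurable and tracks pin status over time.

```python
def allocate_pins(cid, free_space, replication_factor):
    """Choose which cluster peers should pin a CID: here, simply the
    `replication_factor` peers with the most free space."""
    ranked = sorted(free_space, key=free_space.get, reverse=True)
    return ranked[:replication_factor]

# Free space per cluster peer, in GB (illustrative numbers)
free = {"node1": 500, "node2": 80, "node3": 300, "node4": 120}

# With a replication factor of 2, the two roomiest peers are selected:
assert allocate_pins("cid-x", free, 2) == ["node1", "node3"]
```

The cluster layer then monitors those peers and re-allocates the pin if one fails, which is how the stated replication factor is maintained through node churn.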
Remote Pinning Services and Incentivized Storage
To enhance accessibility and reduce infrastructure burdens, users can leverage remote pinning services such as Pinata, NFT.Storage, and Web3.Storage. These services operate dedicated nodes that ensure content remains available by persistently pinning files, offering service-level guarantees for uptime and redundancy [54]. They are widely used in NFT projects to ensure metadata and digital assets remain accessible over time.
For economically secured, long-term storage, IPFS integrates with incentive layers like Filecoin, a blockchain-based network that rewards storage providers for reliably hosting data [55]. When data is stored via NFT.Storage, it is first pinned on IPFS for fast access and then replicated on Filecoin, where cryptographic proofs (Proof of Replication and Proof of Spacetime) verify ongoing storage. This dual-layer approach ensures both performance and permanence, aligning with the needs of Web3 applications [56].
Erasure Coding and Advanced Fault Tolerance
Beyond full replication, some IPFS-based systems employ erasure coding to improve storage efficiency while maintaining fault tolerance. Techniques like alpha entanglement codes split data into encoded fragments distributed across nodes, allowing reconstruction even if some fragments are lost. This reduces storage overhead compared to full replication and enhances durability in large-scale deployments [57].
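The recovery idea behind erasure coding can be shown with a single XOR parity fragment. This is far simpler than alpha entanglement codes, but it demonstrates the same principle: lost fragments are rebuilt from survivors instead of from a full copy.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Split data into two fragments plus one XOR parity fragment:
frag1, frag2 = b"ABCD", b"WXYZ"
parity = xor_bytes(frag1, frag2)

# If frag2's node disappears, the fragment is rebuilt from the
# surviving fragment and the parity:
recovered = xor_bytes(frag1, parity)
assert recovered == frag2
```

Here three stored fragments tolerate the loss of any one, a 1.5x storage overhead, whereas tolerating one loss with full replication would cost 2x.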
Additionally, high-availability backup strategies have been proposed that integrate monitoring and repair components to detect node failures and proactively restore missing data, further strengthening long-term resilience [58]. These mechanisms complement IPFS’s inherent fault tolerance, which is reinforced by the DHT’s ability to reroute queries dynamically, ensuring content remains discoverable as long as at least one provider is online [8].
Differences from Traditional Distributed File Systems
IPFS’s replication model differs significantly from traditional distributed file systems like Hadoop Distributed File System (HDFS) or Ceph. While traditional systems rely on centralized metadata servers (e.g., HDFS NameNode) and predefined replication policies, IPFS operates without central coordination. Its fault tolerance emerges organically from network participation rather than being enforced by a control plane [60].
Moreover, IPFS ensures end-to-end data integrity through cryptographic hashing in the Merkle DAG, whereas traditional systems often rely on weaker checksums or parity mechanisms. However, unlike HDFS, which guarantees persistence through administrative policies, IPFS depends on voluntary or incentivized pinning, making data availability contingent on user or economic incentives rather than system-enforced guarantees [47].
Integration with Blockchain and Web3
The InterPlanetary File System (IPFS) plays a pivotal role in the evolution of the decentralized web, particularly through its integration with blockchain technologies and the broader Web3 ecosystem. By combining IPFS’s distributed, content-addressed storage with blockchain’s immutability and trustless verification, developers can build scalable, secure, and cost-effective decentralized applications (dApps) that overcome the limitations of traditional on-chain storage. This synergy enables a new paradigm of data management where large files—such as images, videos, metadata, and documents—are stored off-chain on IPFS, while only their cryptographic references (Content Identifiers, or CIDs) are anchored on-chain within smart contracts [26].
This hybrid model ensures that data remains tamper-proof, verifiable, and censorship-resistant, while significantly reducing the economic and technical burden of storing bulk data directly on blockchains like Ethereum or Polygon. The integration supports critical use cases across NFT platforms, decentralized finance (DeFi), digital identity, and secure document management, forming the backbone of a resilient, user-owned digital infrastructure.
Hybrid Storage Model and On-Chain Verification
The core of IPFS’s integration with blockchain lies in the hybrid storage model, where data is stored off-chain on the IPFS network and only its CID is recorded on-chain. When a file is uploaded to IPFS, it is assigned a unique CID derived from its cryptographic hash (typically SHA-256). This CID is then stored in a smart contract using programming languages such as Solidity or via developer tools like web3.js and ethers.js [63]. To retrieve the data, applications resolve the CID through an IPFS gateway, with the blockchain guaranteeing the authenticity of the reference.
This model ensures end-to-end verifiability: any alteration to the content results in a different CID, making tampering immediately detectable. Projects like IPCM (InterPlanetary CID Mapping) enhance this model by enabling dynamic updates to IPFS content while maintaining verifiable links via smart contracts, allowing for mutable references without sacrificing integrity [64]. This approach is particularly valuable for applications requiring auditability, such as healthcare data management or legal documentation, where data provenance and integrity are paramount [27].
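The anchoring pattern can be sketched without a real chain. The registry class below is a hypothetical stand-in for an on-chain Solidity mapping written via web3.js or ethers.js, and bare SHA-256 hex digests stand in for full CIDs.

```python
import hashlib

class AnchorRegistry:
    """Toy stand-in for a smart contract mapping record IDs to
    content references."""

    def __init__(self):
        self.cids = {}

    def anchor(self, record_id, cid):
        # On-chain, this write would be an immutable transaction
        self.cids[record_id] = cid

    def verify(self, record_id, data: bytes) -> bool:
        # Recompute the content hash and compare it to the anchored reference
        return hashlib.sha256(data).hexdigest() == self.cids.get(record_id)

registry = AnchorRegistry()
doc = b'{"name": "Asset #1", "image": "ipfs://<cid>"}'
registry.anchor("asset-1", hashlib.sha256(doc).hexdigest())

assert registry.verify("asset-1", doc)             # untampered data passes
assert not registry.verify("asset-1", doc + b" ")  # any change is detected
```

The heavy payload lives off-chain, while the on-chain reference (well under 100 bytes) is all that is needed to detect tampering, which is the cost/verifiability trade-off the hybrid model exploits.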
Advantages Over On-Chain Storage
Storing large datasets directly on a blockchain—known as on-chain storage—is technically feasible but highly inefficient due to several constraints:
- High Cost: On-chain storage incurs prohibitive gas fees; for example, storing 250GB of data on Ethereum would be economically unfeasible [66].
- Scalability Limits: Blockchains are optimized for transaction processing, not bulk data storage, leading to network congestion and slow performance.
- Immutability Challenges: While immutability is a strength, it becomes a limitation when updates are needed, especially for mutable data like user profiles or NFT metadata.
In contrast, the IPFS-blockchain combination offers significant advantages:
Cost Efficiency and Scalability
By storing only CIDs on-chain (typically less than 100 bytes), the cost of data anchoring is drastically reduced. This makes it economically viable to manage large datasets such as NFT collections, media files, or enterprise records [67]. A 2026 study highlights a dual framework using lightweight blockchain and scalable smart contracts with IPFS to optimize both storage efficiency and retrieval speed [68].
Data Integrity and Censorship Resistance
Despite being off-chain, data stored on IPFS remains tamper-proof. Since retrieval is content-based (via CID), any modification invalidates the hash, ensuring data authenticity. This is particularly critical for use cases like source code repositories, healthcare records, and legal documentation [69]. When combined with incentive layers like Filecoin, long-term persistence is ensured through economic guarantees [70].
Interoperability and Flexibility
The IPFS-blockchain architecture supports cross-chain compatibility. Projects like Textile provide SDKs such as @textile/storage and @textile/ipfs-lite to bridge dApps on Ethereum, NEAR, and Polygon with IPFS and Filecoin, enabling seamless data management across ecosystems [71].
Use Cases in Web3 Ecosystems
NFTs and Digital Assets
IPFS is widely used to store NFT metadata and media files. Platforms like NFT.Storage and Pinata simplify the process by offering free, reliable pinning services that ensure NFT data remains accessible and immutable [72]. Best practices recommend storing JSON metadata (including image links) on IPFS and anchoring the CID in the NFT smart contract [73]. Major platforms like OpenSea and Rarible have adopted NFT.Storage to enhance the resilience of their NFT offerings [74].
Decentralized Identity and Document Management
Applications leveraging decentralized identity or document verification use IPFS to store sensitive records, with blockchain ensuring auditability. For instance, a blockchain-IPFS framework has been proposed for secure, interoperable healthcare data management, enabling patient-controlled access and traceable data sharing [27].
Decentralized Code Repositories
An integrated blockchain and IPFS solution has been developed for hosting source code repositories, providing version control, integrity verification, and resistance to tampering using a middleman approach [69].
Supporting Infrastructure and Developer Tools
Several platforms have emerged to simplify IPFS-blockchain integration:
- Web3.Storage: Offers a simple API and JavaScript client library to store data on IPFS and Filecoin, automatically creating redundancy and long-term deals [77].
- Infura and Pinata: Provide managed IPFS gateways, pinning services, and developer tools to ensure data availability and reduce operational complexity [78].

- Filecoin Pin: Enables developers to guarantee persistent storage by creating verified deals with miners on the Filecoin network [79].
These tools abstract the technical challenges of node management, allowing developers to focus on application logic while maintaining decentralization and security.
Conclusion
The integration of IPFS with blockchain technologies forms a powerful paradigm for decentralized data storage in Web3. By leveraging IPFS for scalable, content-addressed file storage and blockchain for immutable referencing and verification, this hybrid model overcomes the cost, scalability, and performance limitations of pure on-chain storage. It enables robust applications across NFTs, DeFi, healthcare, and enterprise systems, supported by a growing ecosystem of tools and protocols like Filecoin and Textile. As decentralized applications continue to evolve, the synergy between IPFS and blockchain will remain a cornerstone of secure, efficient, and user-owned digital infrastructure.
Security, Privacy, and Attack Vectors
The InterPlanetary File System (IPFS) offers robust mechanisms for data integrity and decentralization, but its distributed architecture introduces significant security and privacy challenges. While content addressing and cryptographic hashing ensure tamper-evident storage, the absence of built-in access controls, reliance on voluntary pinning, and vulnerabilities in its networking stack expose users to risks such as unintentional data exposure, censorship, and network-level attacks. These issues are compounded by the protocol’s public-by-default nature and the lack of native encryption, necessitating careful mitigation strategies for secure deployment.
Data Integrity and Tamper Resistance
IPFS ensures data integrity through cryptographic hashing, primarily using SHA-256, which generates a unique Content Identifier (CID) for each piece of content [12]. Any alteration to the data results in a completely different CID due to the avalanche effect, making tampering immediately detectable. This self-verifying model allows users to independently recompute the hash of retrieved data and compare it to the expected CID, ensuring authenticity without relying on trusted intermediaries [28]. The use of a Merkle Directed Acyclic Graph (Merkle DAG) extends this integrity guarantee to complex data structures, enabling verification of entire datasets by checking only the root hash [5]. This design is particularly critical in trustless environments like blockchain-based systems, where off-chain data must remain verifiable and immutable [27].
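The root-hash verification property can be illustrated with a simplified Merkle tree. IPFS actually uses IPLD Merkle DAGs with named links rather than the plain binary tree below, so this is a sketch of the integrity guarantee, not of the real data layout.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks: list[bytes]) -> bytes:
    # Leaves are the hashes of the data blocks; each parent hashes the
    # concatenation of its children, so the root depends on every byte.
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"block-0", b"block-1", b"block-2"]
root = merkle_root(blocks)

# Changing any single block changes the root, so checking one hash
# is enough to verify the entire dataset.
assert merkle_root([b"block-0", b"block-1", b"block-X"]) != root
```

This is why a client holding only the root CID can verify an arbitrarily large dataset piece by piece as blocks arrive.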
To future-proof its integrity model, IPFS employs the Multihash format, which embeds metadata such as the hash function and digest length within the CID. This allows seamless migration to more secure algorithms like SHA-3 or BLAKE3 if vulnerabilities in SHA-256 are discovered, ensuring backward compatibility and long-term resilience [4]. Emerging tools like @helia/verified-fetch further enhance verification by enabling end-to-end cryptographic checks directly in web browsers, protecting against man-in-the-middle attacks during retrieval [24].
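The Multihash wire format itself is small enough to sketch directly: a varint code identifying the hash function, a varint digest length, then the digest. The SHA2-256 code (0x12) below is taken from the multiformats code table.

```python
import hashlib

SHA2_256 = 0x12  # multicodec code for SHA2-256

def varint(n: int) -> bytes:
    # Unsigned LEB128 varint, as used by the multiformats specs.
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | (0x80 if n else 0))
        if not n:
            return bytes(out)

def multihash(data: bytes) -> bytes:
    digest = hashlib.sha256(data).digest()
    # <fn code><digest length><digest> is self-describing: migrating to a
    # newer algorithm only requires a new code, not a new wire format.
    return varint(SHA2_256) + varint(len(digest)) + digest

mh = multihash(b"hello")
assert mh[0] == 0x12 and mh[1] == 32 and len(mh) == 34
```

Because the algorithm is encoded in the identifier, old SHA-256 CIDs and future SHA-3 or BLAKE3 CIDs can coexist on the same network without ambiguity.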
Privacy Risks and Unintentional Exposure
Despite its strong integrity guarantees, IPFS poses significant privacy risks due to its default public accessibility. Any user who knows a CID can retrieve the associated content, as there are no built-in access controls or encryption mechanisms [23]. This has led to documented cases of sensitive data exposure, including API keys, private SSH keys, and internal configuration files, often uploaded inadvertently by developers [87]. Once published, content becomes immutable and permanently accessible as long as any node pins it, creating a "permanence paradox" where users may assume data is ephemeral, but it is effectively archived indefinitely [88].
User behavior can also be inferred through traffic monitoring on the Distributed Hash Table (DHT), where nodes broadcast requests for specific CIDs. Passive adversaries can correlate these requests with IP addresses to build profiles of user interests, posing a threat to anonymity [89]. To mitigate these risks, data must be encrypted client-side before uploading, using protocols such as AES-256 or decentralized key management systems like the Lit Protocol, which enables fine-grained access control based on blockchain attestations [90]. Projects like Peergos implement block-level encryption and access control, offering a model for truly private sharing on IPFS [91].
Attack Vectors: Sybil, Eclipse, and Content Censorship
IPFS is vulnerable to several network-level attacks, most notably Sybil attacks and eclipse attacks targeting its Kademlia-based DHT. In a Sybil attack, an adversary creates numerous pseudonymous nodes to manipulate routing tables and isolate honest peers. Research has demonstrated that even a single machine can eclipse content by dominating DHT lookups, preventing users from discovering specific CIDs with minimal resources [92]. A critical vulnerability, CVE-2023-26248, in the go-libp2p-kad-dht implementation allowed attackers to exploit this weakness, highlighting ongoing risks in the protocol’s core infrastructure [93].
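The attack surface follows from Kademlia's routing rule, which can be sketched as below. Real implementations maintain k-buckets and iterative lookups; this simplified model shows only the XOR metric that an eclipse attacker exploits by planting IDs near a target key.

```python
import hashlib

def node_id(name: str) -> int:
    # Peers and content keys share one 256-bit identifier space.
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    # Kademlia's distance metric: smaller XOR means "closer".
    return a ^ b

target = node_id("some-cid")
peers = [node_id(f"peer-{i}") for i in range(100)]

# Lookups converge on the peers whose IDs are XOR-closest to the target.
# An adversary who generates many Sybil IDs near `target` can dominate
# this neighbourhood and answer (or drop) all lookups for it.
closest = sorted(peers, key=lambda p: xor_distance(p, target))[:20]
assert len(closest) == 20
assert xor_distance(target, target) == 0
```

Because ID generation is cheap by default, nothing in the metric itself prevents an attacker from minting identifiers until enough of them land closer to the target than any honest peer.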
These attacks can be used for content censorship, where malicious actors suppress access to specific data by refusing to store or route it. While IPFS is designed for censorship resistance, such attacks undermine this goal by enabling targeted content blocking [94]. Additionally, BGP hijacking at the network level can intercept or drop IPFS traffic, effectively censoring content regionally [95]. Public IPFS gateways, such as ipfs.io, are also susceptible to legal pressure and may implement Safe Mode filtering to block access to malicious or illegal content, reintroducing centralized control points [96].
Mitigation Strategies and Trust Models
To counter these threats, a layered defense approach is required. DHT hardening measures, such as limiting peers from the same IP subnet and improving routing table diversity, have been implemented to increase the cost of eclipse attacks [97]. Statistical detection of anomalous node behavior can identify Sybil nodes, while Proof of Space (PoSp) has been proposed to raise the economic cost of operating large numbers of fake nodes [98]. For privacy, Private Information Retrieval (PIR) protocols like Peer2PIR allow users to fetch content without revealing which CID they are requesting, significantly enhancing query confidentiality [99].
Trust in IPFS is established through decentralized models based on cryptographic verification rather than centralized authorities. The InterPlanetary Name System (IPNS) uses public-key cryptography to provide mutable, verifiable pointers: updates are signed with a private key, and clients verify the signature before accepting the new CID [22]. This ensures only authorized parties can update content. Integration with Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs) further enhances trust by enabling self-sovereign identity and auditable storage commitments [101]. Projects like Verifiable Decentralized IPFS Clusters (VDICs) combine IPFS with blockchain to create transparent, tamper-proof storage ecosystems [102].
Legal and Operational Vulnerabilities
IPFS faces legal challenges due to its potential for abuse in hosting illegal content such as child sexual abuse material (CSAM) or malware, which can be distributed anonymously through public gateways [103]. While intermediary liability protections generally shield gateway operators from legal responsibility—similar to ISPs or Tor nodes—mass DMCA takedown notices create operational and reputational risks, leading to self-censorship [104]. Regulatory frameworks like the Digital Services Act (DSA) in the EU may increase pressure on gateways to implement proactive moderation, threatening the open nature of the network [105].
Operational security also depends on avoiding reliance on centralized pinning services like Pinata or Infura, which act as single points of failure. If these services go offline or unpublish content, data becomes inaccessible, undermining decentralization goals [106]. Developers should instead use incentivized storage layers like Filecoin, which provides cryptoeconomic guarantees for long-term persistence through verifiable proofs of replication and storage [107]. Combining IPFS with DNSLink and DNSSEC can further secure mutable references, ensuring that updates to NFT metadata or dApp frontends are tamper-evident and verifiable.
Censorship Resistance and Legal Challenges
The InterPlanetary File System (IPFS) is designed with inherent features that enhance censorship resistance, making it a powerful tool for preserving access to information in restrictive environments. Its decentralized, peer-to-peer architecture and content-addressed storage model eliminate single points of failure and control, enabling users to distribute and retrieve data without reliance on centralized servers [108]. This design allows content to remain accessible as long as at least one node hosts it, even if the original publisher or server is taken offline [109]. For example, the Catalan government utilized IPFS to disseminate referendum-related materials despite legal blocks and website takedowns by Spanish authorities, demonstrating its utility in circumventing state-imposed censorship [110]. Similarly, activists have leveraged IPFS to share banned literature across the Great Firewall of China, using distributed nodes to bypass national filtering systems [111].
However, while IPFS provides strong technical foundations for censorship resistance, its effectiveness is constrained by practical and systemic limitations. One major vulnerability is the pinning problem, which refers to the reliance on voluntary data persistence. Content on IPFS only remains available if at least one node actively "pins" it—storing and serving the data. If no node chooses to pin a file, it becomes inaccessible, creating a form of passive censorship driven by resource constraints or ideological disengagement [112]. This dependency undermines the network’s resilience, particularly for less popular or politically sensitive content that may lack sustained hosting support.
Network-Level Censorship and Infrastructure Centralization
Despite its decentralized design, IPFS is not immune to network-level censorship. Adversaries can exploit underlying internet infrastructure to disrupt access. For instance, BGP hijacking—where malicious actors manipulate routing protocols—can redirect or block traffic to critical IPFS nodes, effectively censoring content even if it exists elsewhere in the network [113]. Such attacks target the physical layer of internet connectivity rather than the protocol itself, exposing a critical vulnerability in IPFS’s censorship resistance.
Moreover, empirical studies reveal a growing trend toward centralization within the IPFS ecosystem. Research indicates that over 80% of content is hosted by just 5% of peers, many of which operate on major cloud platforms like Amazon Web Services, Google Cloud, and Microsoft Azure [114]. This concentration creates de facto chokepoints that can be targeted by regulators or censors, undermining the network’s decentralized ethos. The reliance on large cloud providers introduces risks of compliance with takedown requests, service disruptions, or throttling under legal pressure, effectively reintroducing centralized control mechanisms.
Public IPFS gateways—services like Cloudflare, Pinata, and ipfs.io that provide HTTP access to IPFS content—are particularly vulnerable to regulatory intervention. Although these gateways do not permanently host data, they are frequently subjected to DMCA takedown notices and other legal demands to block access to specific Content Identifiers (CIDs) [115]. While such blocks do not remove content from the network, they significantly hinder accessibility for mainstream users who rely on these gateways for seamless browsing [116]. A 2024 legal opinion affirmed that gateway operators generally function as intermediaries and are not liable for content they merely relay, reinforcing protections similar to those for ISPs or Tor nodes [104]. Nevertheless, the threat of legal liability often leads to self-censorship, where operators preemptively filter content to avoid litigation or reputational damage [105].
Legal and Regulatory Challenges Across Jurisdictions
The decentralized nature of IPFS poses significant challenges for government regulation and content moderation. Traditional legal frameworks assume centralized intermediaries who can be held accountable for illegal content, but IPFS distributes responsibility across a global network of nodes, many of which may fall outside any single jurisdiction [119]. This complicates enforcement, especially when content crosses multiple legal boundaries [120].
In authoritarian regimes like China and Iran, IPFS is both a tool for circumvention and a target for suppression. While peer-to-peer exchange can persist behind firewalls, public gateways are often blocked, limiting usability for average users [121]. In democratic states, IPFS benefits from intermediary liability protections under laws like the DMCA in the U.S. and the E-Commerce Directive in the EU. However, emerging regulations such as the EU’s Digital Services Act may increase pressure on gateway operators to implement proactive content moderation, potentially eroding the openness of the decentralized web [105].
Additionally, the lack of built-in content moderation mechanisms raises concerns about the distribution of illegal or harmful material, including child sexual abuse material (CSAM), malware, and extremist propaganda [103]. While the IPFS community has introduced optional denylists and content-blocking features—such as the compact denylist format and tools like NOpfs—these are opt-in and do not guarantee universal enforcement [124]. This creates a tension between preserving censorship resistance and addressing legitimate public safety concerns.
Standardization and Governance Barriers
A major structural hurdle to IPFS’s widespread adoption is the absence of formal recognition by the Internet Engineering Task Force (IETF), the primary body for internet protocol standards. Unlike HTTP, which is governed by RFCs, IPFS lacks IETF-standardized specifications, and the ipfs:// URI scheme remains outside official protocol registries [125]. This lack of institutional endorsement hinders integration into core internet infrastructure, such as native browser support and enterprise networking systems.
Furthermore, despite its decentralized design, IPFS development is largely driven by Protocol Labs, raising governance concerns about long-term neutrality and sustainability [126]. The reliance on centralized indexers, gateways, and cloud infrastructure creates a paradox of "centralization within decentralization," where usability and performance improvements come at the cost of ideological purity [127].
In conclusion, while IPFS offers robust technical mechanisms for censorship resistance through decentralized distribution and cryptographic integrity, its real-world resilience is tempered by infrastructure dependencies, legal pressures, and governance challenges. Achieving true censorship resistance requires not only technological innovation but also coordinated efforts in legal advocacy, infrastructure diversification, and engagement with global standards bodies to ensure equitable and sustainable access to information.
Scalability and Performance Optimization
The InterPlanetary File System (IPFS) employs a range of architectural strategies and protocol-level optimizations to address scalability challenges inherent in decentralized file storage. As the network grows in size and usage, maintaining performance, fault tolerance, and efficient resource utilization becomes critical. IPFS leverages content addressing, distributed systems design, and evolving networking protocols to scale effectively while managing trade-offs related to decentralization, replication, and bandwidth efficiency.
Content-Addressed Storage and Merkle DAGs
At the core of IPFS’s scalability is its use of content addressing and the Merkle Directed Acyclic Graph (Merkle DAG) data structure. By assigning each data block a unique Content Identifier (CID) derived from its cryptographic hash, IPFS enables efficient deduplication: identical content across files or versions is stored only once, reducing storage overhead [13]. This structural efficiency allows IPFS to scale to massive datasets, with research indicating that Merkle DAGs can support hundreds of billions of entries while maintaining performance [129]. Furthermore, the hierarchical nature of the Merkle DAG supports incremental updates, allowing large files to be modified without re-uploading entire datasets, which enhances bandwidth efficiency in distributed environments.
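The deduplication effect falls directly out of content addressing, as the sketch below shows. The 1 KiB chunk size is chosen only to keep the example small (real IPFS blocks default to 256 KiB), and the hex digests stand in for real CIDs.

```python
import hashlib

def chunk_fixed(data: bytes, size: int = 1024) -> list[bytes]:
    # Fixed-size chunking, IPFS's default strategy.
    return [data[i:i + size] for i in range(0, len(data), size)]

def store(blockstore: dict, data: bytes) -> list[str]:
    cids = []
    for chunk in chunk_fixed(data):
        cid = hashlib.sha256(chunk).hexdigest()  # stand-in for a real CID
        blockstore[cid] = chunk                  # identical chunks collapse
        cids.append(cid)
    return cids

blockstore = {}
file_a = b"".join(bytes([i]) * 1024 for i in range(4))  # 4 distinct chunks
file_b = file_a + bytes([9]) * 1024                      # shares all of file_a

store(blockstore, file_a)
store(blockstore, file_b)

# 9 chunks were ingested, but only 5 unique blocks are stored:
# the two files share their first four chunks.
assert len(blockstore) == 5
```

Versioned datasets benefit in the same way: a new version references the unchanged blocks of the old one and stores only the blocks that differ.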
Distributed Hash Table (DHT) Optimization
The Distributed Hash Table (DHT) is central to IPFS’s content routing mechanism, mapping CIDs to the peers that store them. However, the DHT has historically been a scalability bottleneck, particularly for nodes hosting large numbers of CIDs. A major advancement in 2024, known as Provide Sweep, was introduced in Kubo v0.39 to significantly improve DHT efficiency [130]. This optimization reduces the number of DHT lookups by up to 97%, enabling self-hosted nodes to manage hundreds of thousands or even millions of CIDs without overwhelming the network. This enhancement makes large-scale data hosting more feasible and improves the overall responsiveness of content discovery.
IPFS Cluster for Coordinated Pinning
To ensure data availability and fault tolerance at scale, IPFS Cluster provides a coordination layer that manages replication and pinning across multiple IPFS nodes. It allows administrators to define a replication factor—specifying how many nodes should store a given piece of data—and automatically distributes content across the cluster [52]. This orchestrated replication ensures redundancy and prevents data loss if individual nodes go offline. IPFS Cluster supports consensus mechanisms such as Raft or CRDTs (Conflict-Free Replicated Data Types) to maintain consistency among cluster peers, even during network partitions [132]. This makes it a powerful tool for enterprises and service providers seeking reliable, scalable storage infrastructure built on IPFS.
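The allocation step can be sketched with rendezvous hashing, which deterministically maps each CID to a stable set of pinning peers. This is an illustrative model only: IPFS Cluster's actual allocator also weighs factors such as free disk space and peer health, and the peer names here are hypothetical.

```python
import hashlib

def allocate(cid: str, peers: list[str], replication_factor: int) -> list[str]:
    # Rendezvous hashing: score every peer against the CID and pin on the
    # top `replication_factor` peers. The mapping is stable under re-runs
    # and changes minimally when peers join or leave.
    scored = sorted(
        peers,
        key=lambda p: hashlib.sha256((cid + p).encode()).digest(),
        reverse=True,
    )
    return scored[:replication_factor]

peers = [f"cluster-peer-{i}" for i in range(7)]
pinners = allocate("bafyexamplecid", peers, replication_factor=3)

assert len(pinners) == 3
# Allocation is deterministic: the same CID maps to the same peers.
assert pinners == allocate("bafyexamplecid", peers, 3)
```

A deterministic allocator of this kind lets every cluster peer independently agree on who should pin what, which is what the Raft or CRDT layer then keeps consistent.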
Cloud-Optimized and Elastic Architectures
Scalability is further enhanced through cloud-based architectures that decouple data ingestion from serving. Elastic IPFS providers leverage cloud infrastructure (e.g., AWS) to dynamically scale node count, implement load balancing, and optimize data distribution [133]. These hybrid models combine the principles of decentralized storage with centralized orchestration to achieve near-infinite scalability while preserving content integrity. Such architectures are particularly effective for applications requiring high availability and low-latency access, bridging the performance gap between pure peer-to-peer networks and traditional centralized systems.
libp2p and GossipSub Enhancements
The underlying networking stack plays a critical role in IPFS scalability. GossipSub, the default pub/sub routing protocol, has undergone significant optimizations to improve bandwidth efficiency in large networks. Recent proposals such as GossipSub v1.4 introduce message preambles and IMReceiving notifications to reduce redundant transfers and latency for large messages [134]. GossipSub v2.0 explores lazy mesh propagation to minimize duplicate message delivery, improving bandwidth utilization at the cost of slight latency increases [135]. Techniques like message staggering and fragmentation are also being evaluated to improve handling of large payloads [136].
Current Limitations and Trade-offs
Despite these advancements, IPFS faces several scalability limitations and trade-offs. A fundamental constraint is the 1 MiB maximum block size in the default implementation, which affects how efficiently large files can be stored and retrieved [137]. While this facilitates fine-grained deduplication, it increases metadata overhead for large datasets. Additionally, IPFS exhibits low natural replication—only about 2.71% of files are replicated more than five times—posing risks to long-term data availability [13]. This often leads to reliance on centralized pinning services, creating a trend toward centralization that undermines one of IPFS’s core goals.
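The metadata-overhead trade-off is easy to quantify. The per-block figure of 40 bytes below is an illustrative assumption (roughly one CID plus link framing), not a measured value.

```python
MiB = 1024 * 1024
block_size = 1 * MiB            # default maximum block size
file_size = 10 * 1024 * MiB     # a 10 GiB file

# Ceiling division: leaf blocks needed to hold the file.
blocks = -(-file_size // block_size)
assert blocks == 10240

# Illustrative estimate: ~40 bytes of CID + link framing per block.
overhead = blocks * 40
assert overhead == 409600       # ~400 KiB of link metadata for 10 GiB
```

The absolute overhead is modest, but every one of those blocks must also be announced and discoverable via the DHT, which is where the cost of small blocks is actually felt.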
The Bitswap protocol, responsible for data exchange, also faces scalability challenges. Content discovery involves broadcasting interest to multiple peers via the DHT, leading to increased bandwidth consumption and privacy exposure [49]. Research into plausibly deniable content discovery aims to mitigate these concerns [140]. Furthermore, IPFS Cluster’s use of the Raft consensus algorithm imposes practical limits on cluster size due to leader election and log replication requirements, necessitating sharding or hierarchical clustering for very large deployments [141].
Fault Tolerance vs. Performance Trade-offs
IPFS demonstrates strong fault tolerance, with continued operation observed even when 60% of DHT servers became unresponsive [142]. However, such events lead to increased latency and slower lookups, highlighting a trade-off between availability and performance. The system prioritizes resilience over speed, which can impact user experience in real-time applications. Ongoing research into privacy-preserving discovery, erasure coding, and alternative chunking strategies like Content-Defined Chunking (CDC) aims to resolve these limitations, positioning IPFS for broader adoption in scalable, decentralized storage ecosystems.

Future Development and Standardization
The future development and standardization of the InterPlanetary File System (IPFS) are shaped by a complex interplay of technical innovation, regulatory scrutiny, and the evolving demands of the Web3 ecosystem. While IPFS has established itself as a foundational protocol for decentralized data storage and content addressing, its transition into a core component of global internet infrastructure faces significant hurdles. Ongoing efforts aim to enhance performance, improve usability, and address critical challenges related to governance, legal compliance, and formal standardization.
Technical Advancements and Performance Optimization
Recent developments have focused on improving the scalability and efficiency of IPFS, particularly in large-scale deployments. A major milestone was the introduction of Provide Sweep in Kubo v0.39, which reduces the number of Distributed Hash Table (DHT) lookups by up to 97% [130]. This enhancement allows self-hosted nodes to manage hundreds of thousands or even millions of content identifiers (CIDs) without overwhelming the DHT, making IPFS more viable for high-volume data hosting. Additionally, optimizations in the underlying libp2p networking stack, such as improvements to GossipSub v1.4 and the proposed GossipSub v2.0, aim to reduce bandwidth consumption and improve message delivery efficiency in large peer-to-peer networks [134], [135].
Efforts are also underway to address IPFS's inherent limitations, such as the 1 MiB block size constraint and inefficient default deduplication using Fixed-Size Chunking (FSC). Research into Content-Defined Chunking (CDC) suggests it could significantly reduce storage costs at scale, though it may introduce computational overhead [13]. Furthermore, the Interplanetary Shipyard initiative is actively working to improve browser integration and performance, with goals including native CID resolution and enhanced user experience through projects like Lassie, a retrieval client that simplifies access to IPFS and Filecoin content [147].
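The difference between the two chunking strategies comes down to where boundaries are placed. The toy implementation below recomputes a hash over a sliding window rather than using a true rolling hash (real systems use Rabin fingerprints or FastCDC), but it illustrates why content-defined boundaries survive an insertion that would shift every fixed-size boundary.

```python
import hashlib

def cdc_chunks(data: bytes, mask_bits: int = 8, window: int = 16) -> list[bytes]:
    # Cut whenever a hash of the last `window` bytes has its low
    # `mask_bits` bits all zero, giving ~256-byte average chunks here.
    mask = (1 << mask_bits) - 1
    chunks, start = [], 0
    for i in range(window, len(data)):
        digest = hashlib.sha256(data[i - window:i]).digest()
        if int.from_bytes(digest[-4:], "big") & mask == 0:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

# Deterministic pseudo-random content, then the same content with two
# bytes inserted at the front.
base = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(512))
edited = b"XX" + base

hashes = lambda cs: {hashlib.sha256(c).hexdigest() for c in cs}
shared = hashes(cdc_chunks(base)) & hashes(cdc_chunks(edited))

assert b"".join(cdc_chunks(base)) == base  # chunks partition the input
assert shared  # later chunks (and their CIDs) survive the insertion
```

Under fixed-size chunking the same two-byte insertion shifts every subsequent boundary, so every downstream block gets a new CID; content-defined boundaries resynchronize shortly after the edit, which is the deduplication win the cited research measures.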
Standardization and Formal Recognition
A critical barrier to the widespread adoption of IPFS is the lack of formal recognition by established internet standards bodies. Unlike HTTP, which is governed by IETF RFCs, the ipfs:// URI scheme has not been standardized by the Internet Engineering Task Force (IETF) [125]. This absence hinders native integration into browsers, operating systems, and enterprise infrastructure, which typically require IETF-endorsed protocols. The IPFS project has developed its own specification process through InterPlanetary Improvement Proposals (IPIPs), but this community-driven model lacks the global authority and consensus of formal standardization [149].
Despite this, IPFS maintains detailed technical documentation and architecture specifications [150], fostering a degree of interoperability across implementations like Kubo (Go) and js-IPFS. However, the risk of fragmentation remains, as different implementations may interpret specifications differently, potentially leading to compatibility issues. The ecosystem's reliance on HTTP gateways (e.g., ipfs.io) to bridge the decentralized and traditional web introduces centralization risks and sustainability concerns, as these gateways are often operated by a few entities and may face legal or financial pressures [151].
Regulatory Challenges and Content Moderation
IPFS's decentralized and content-addressed architecture presents significant regulatory challenges, particularly concerning content moderation and liability. Because data on IPFS is immutable and distributed, removing illegal or harmful content—such as child sexual abuse material (CSAM) or copyrighted works—is extremely difficult. This has led to a wave of Digital Millennium Copyright Act (DMCA) takedown notices targeting public IPFS gateways, even though these gateways act as mere conduits rather than hosts [115]. Legal opinions suggest that gateway operators may be protected under intermediary liability frameworks similar to those for ISPs or Tor nodes [104], but the threat of legal action can still lead to self-censorship and operational risks.
The pseudonymous nature of IPFS also enables abuse for hosting phishing pages, malware, and illicit marketplaces [154]. While the IPFS community has introduced opt-in content blocking mechanisms like compact denylists and the NOpfs layer, these tools are not universally enforced and do not resolve the fundamental tension between censorship resistance and the need for public safety [124]. Jurisdictions like China have responded by blocking IPFS bootstrap nodes and monitoring associated traffic, limiting its effectiveness as a circumvention tool [156].
Governance and Long-Term Sustainability
The long-term sustainability of IPFS as global infrastructure depends on addressing issues of governance, economic cost, and environmental impact. Despite its decentralized design, empirical studies show a trend toward centralization, with a small number of cloud providers hosting the majority of content [114]. This concentration creates de facto control points and vulnerabilities, as cloud providers may comply with legal takedown requests or suffer outages that affect large portions of the network.
Running a reliable IPFS node incurs significant economic and environmental costs, including hardware, electricity, and high-bandwidth internet. Monthly hosting expenses can exceed €100 on major cloud platforms, posing a barrier for individuals and institutions in resource-constrained regions [158]. The energy consumption of maintaining multiple data replicas across a decentralized network also raises concerns about its ecological footprint, especially as the network scales [159].
Comparison with Historical Protocol Adoption
The challenges facing IPFS today echo the "Protocol Wars" of the 1980s and 1990s, when TCP/IP competed with the OSI model for dominance. Like IPFS, OSI was a formally standardized model backed by governments and international bodies, but it was ultimately overtaken by the more agile, implementation-driven development of TCP/IP [160]. However, a key difference is that TCP/IP benefited from substantial institutional support, including funding from DARPA and adoption by the U.S. government, whereas IPFS is primarily driven by Protocol Labs and the Web3 community, facing skepticism from traditional internet governance institutions [126].
For IPFS to achieve widespread adoption, it will require not only continued technical innovation but also engagement with formal standards bodies, clearer legal frameworks, and strategies to reduce barriers to entry for under-resourced regions. Without such efforts, IPFS may remain a powerful but parallel network rather than a foundational layer of the future web.