Trace/verifiability of late publishing attacks

m0ar · December 4, 2023, 1:11pm

Hi,

At DeSci Labs, we have strong requirements on deterministic resolution of data, and have been thinking about how the late publishing attack vector affects them. We have these requirements because we are building a system for versioned PIDs, to be used as a core component in safekeeping the scientific record.

The protocol consensus docs mostly dismiss this attack for the single-user controller case, on the grounds that one does not want to attack oneself. Allow me to challenge this with a parallel from our nascent protocol:

A researcher makes a scientific publication which is ultimately stored on Ceramic. This publication is later proven to be fraudulent through community review. This should contribute to a negative reputation delta for the author, and this series of events must be historically verifiable. Alternatively, imagine the author made an honest mistake and later updates the publication, and the update is then approved by the community. The trace of such an interaction is very valuable for painting the full picture of the conflict.

Now, what if the offending author started the stream with an anchored but unannounced tip? It is more or less a save checkpoint, or undo button: the author can announce it at a time of their choosing, and protocol consensus will then resolve to that tip, removing the publishing history. It would be easy for a third party to create a service that automatically generates such a failsafe for newly created streams, more or less ruining the guarantees of deterministic resolution. Using this loophole would then not even require a deep understanding of the underlying mechanics.
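To make the scenario concrete, here is a toy model of it. It assumes a simplified conflict rule in which the branch with the earliest anchor wins; the `Commit` class, the `resolve` function, the labels, and the timestamps are all hypothetical and are not Ceramic's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    cid: str                    # hypothetical short label standing in for a real CID
    anchor_time: Optional[int]  # toy anchor timestamp; None means not yet anchored


def earliest_anchor(branch: List[Commit]) -> float:
    """Earliest anchor time on a branch, or infinity if nothing is anchored."""
    times = [c.anchor_time for c in branch if c.anchor_time is not None]
    return min(times) if times else float("inf")


def resolve(current: List[Commit], incoming: List[Commit]) -> List[Commit]:
    """Assumed rule: the branch with the earliest anchor wins; ties keep the current branch."""
    if earliest_anchor(incoming) < earliest_anchor(current):
        return incoming
    return current


# Both branches fork from the same genesis commit (omitted here).
hidden = [Commit("hidden-tip", anchor_time=100)]      # anchored early, never announced
public = [
    Commit("publication-v1", anchor_time=200),        # what the network has seen
    Commit("publication-v2", anchor_time=300),
]

state = public
state = resolve(state, hidden)  # the author finally announces the hidden tip
print([c.cid for c in state])   # prints ['hidden-tip']
```

Under this assumed rule, the whole public history is dropped from the resolved state the moment the older anchor becomes known, no matter how many commits were built on top of it.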

The original data is still there if someone is pinning it, and the CID of that very content can be found computationally from the commit ID. So what we are wondering is basically this:

Will a Ceramic node automatically unpin the consensus-discarded commits when an earlier fork is received from the network?
When a new node joins the network, can it discover the full historical tree of a stream? In other words, does syncing only retrieve the “main” branch?
If someone knows the now-removed commit ID, can it be resolved by a request to a node which has already done the historical change?
Can a node detect these historical changes when they occur, and somehow emit events when it happens? If so, we can in the worst case keep track of these branches manually.

I’m very interested both in discussing whether my assumptions are correct and in what the node/network behavior is, and possibly should be, in these cases. Thanks!

jthor · December 5, 2023, 9:41am

Yep, this is definitely a real problem in the single author case as well.

I’ll let @spencer or someone else more familiar with the current implementation details answer your main questions. I mainly wanted to point out that we want to make the protocol more robust against this attack in the future by explicitly retaining all past events in the event log data structure and maintaining multiple tips if a “merge” has yet to happen. The main discussion about this is happening in CIP-145, feedback is welcome!

spencer · December 5, 2023, 10:16pm

This is a very good point. It would probably make sense to update our documentation around this case. I will attempt to answer your questions here:

Will a Ceramic node automatically unpin the consensus-discarded commits when an earlier fork is received from the network?

The node will remove it from the Ceramic state store, so the knowledge that the stream had that divergent history is lost. However, the underlying data in IPFS is not explicitly unpinned, so it should remain available to the network if you know the CID.

When a new node joins the network, can it discover the full historical tree of a stream? In other words, does syncing only retrieve the “main” branch?

Currently the node will only sync the canonical branch. There is no way for a new node to sync branches that were rejected by the consensus mechanism. However, as Joel pointed out in his comment, we are working to change this (see CIP-145 that Joel linked above). Once CIP-145 is implemented, nodes will always sync and maintain the full history for a stream, including divergent branches. So while the current state and content of the stream will only reflect the canonical history, it will become possible to detect and inspect divergent branches.
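The idea of keeping every event and tracking multiple tips can be sketched as follows. This is only an illustration of the concept, not the data structure specified in CIP-145; the `EventLog` class and its methods are hypothetical names.

```python
from typing import Dict, List, Optional


class EventLog:
    """Toy log that keeps every event it has seen, including divergent branches."""

    def __init__(self, genesis: str) -> None:
        # Maps each event id to the id of its predecessor (None for genesis).
        self.prev: Dict[str, Optional[str]] = {genesis: None}

    def add(self, event_id: str, prev_id: str) -> None:
        """Record an event; nothing is ever discarded, even on a losing branch."""
        if prev_id not in self.prev:
            raise KeyError(f"unknown predecessor: {prev_id}")
        self.prev[event_id] = prev_id

    def tips(self) -> List[str]:
        """Events that no other event points to; more than one means a fork."""
        parents = {p for p in self.prev.values() if p is not None}
        return sorted(e for e in self.prev if e not in parents)

    def branch(self, tip: str) -> List[str]:
        """Walk back from a tip to genesis and return the path in log order."""
        path: List[str] = []
        cursor: Optional[str] = tip
        while cursor is not None:
            path.append(cursor)
            cursor = self.prev[cursor]
        return path[::-1]


log = EventLog("genesis")
log.add("pub-v1", "genesis")
log.add("pub-v2", "pub-v1")
log.add("hidden-tip", "genesis")   # late-published fork from genesis

print(log.tips())                  # prints ['hidden-tip', 'pub-v2']
print(log.branch("pub-v2"))        # prints ['genesis', 'pub-v1', 'pub-v2']
```

With a structure like this, consensus can still pick one canonical tip, while the rejected branch stays inspectable instead of disappearing from the node's view.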

If someone knows the now-removed commit ID, can it be resolved by a request to a node which has already done the historical change?

Currently we have a check that prevents loading at a CommitID that the node knows to have been rejected due to conflict resolution. However, it would be quite straightforward to expose a flag that disables that check and lets the node return the state at the CommitID even if it has been rejected. The functionality to do this is already there at lower levels; we just have to expose the option up to the HTTP API and client.
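As a rough illustration of what such an opt-out could look like, here is a minimal sketch. The `ToyNode` class, the `load_at_commit` method, and the `allow_rejected` flag are all hypothetical; no such option exists in the Ceramic HTTP API or client today, and the real flag, if added, may look different.

```python
from typing import Dict, Set


class RejectedCommitError(Exception):
    """Raised when a commit lost conflict resolution and the check is enabled."""


class ToyNode:
    def __init__(self) -> None:
        self.states: Dict[str, dict] = {}  # commit id -> stream state at that commit
        self.rejected: Set[str] = set()    # commit ids known to have lost conflict resolution

    def load_at_commit(self, commit_id: str, allow_rejected: bool = False) -> dict:
        """Return the state at a commit; rejected commits need an explicit opt-in."""
        if commit_id in self.rejected and not allow_rejected:
            raise RejectedCommitError(commit_id)
        return self.states[commit_id]


node = ToyNode()
node.states["commit-pub-v1"] = {"title": "Original publication"}
node.rejected.add("commit-pub-v1")

try:
    node.load_at_commit("commit-pub-v1")
except RejectedCommitError:
    print("rejected by default")

print(node.load_at_commit("commit-pub-v1", allow_rejected=True))
```

Keeping the check on by default preserves current behavior for ordinary clients, while auditing tools could opt in to inspect discarded history.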

Can a node detect these historical changes when they occur, and somehow emit events when it happens? If so, we can in the worst case keep track of these branches manually.

If a specific node has one state for a given stream and then learns about a new commit that changes its view of that state, the node certainly knows when that happens. It doesn’t matter whether the commit is a strictly additive new event at the end of the log or an event that changes the view of the log’s history. We are actually already planning work on a “feed” API that will allow developers to subscribe to a feed of all updates to all indexed documents on a node. That API would automatically surface state changes like this.

Of course, if a given node never knew about the now-rejected history of a stream, it has no way to detect or surface events about changes to that history.
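For tracking these branches manually, a consumer of such updates could compare the log it last saw with the log it sees now. The sketch below assumes the consumer can obtain a stream's log as an ordered list of commit identifiers; the function names are hypothetical and this is not the planned feed API.

```python
from typing import List


def classify_update(old_log: List[str], new_log: List[str]) -> str:
    """Distinguish a plain append from a change to previously seen history."""
    if new_log == old_log:
        return "unchanged"
    if new_log[:len(old_log)] == old_log:
        return "appended"            # old log is a prefix: strictly additive
    return "history_rewritten"       # some previously seen commit was dropped or replaced


def discarded_commits(old_log: List[str], new_log: List[str]) -> List[str]:
    """Commits seen before that are absent from the new canonical log."""
    kept = set(new_log)
    return [c for c in old_log if c not in kept]


seen = ["genesis", "pub-v1", "pub-v2"]

print(classify_update(seen, ["genesis", "pub-v1", "pub-v2", "pub-v3"]))  # prints appended
print(classify_update(seen, ["genesis", "hidden-tip"]))                  # prints history_rewritten
print(discarded_commits(seen, ["genesis", "hidden-tip"]))                # prints ['pub-v1', 'pub-v2']
```

A service that records the discarded commit identifiers at this point can keep pinning the underlying data, which only works if it observed the original history before the rewrite.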