Potential Data Loss on Mainnet

Since Friday 28th Oct we’ve been experiencing issues with our production instance of Ceramic Anchoring Service and its Ceramic node that made it impossible to anchor commits on Ceramic Mainnet.

This may have caused data loss for your DApp running on Ceramic Mainnet if both of the following apply:

  • you use CACAO to authorize your DApp/Session to make updates on Ceramic (either directly or via a higher-level library such as DID or DIDSession)
  • your CACAOs’ expiration time is shorter than three days

In this case, because none of the commits containing CACAOs have been anchored, the CACAOs may be considered expired and Ceramic nodes may reject those commits. We’re currently looking into whether data loss is possible in any other cases.
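To make the condition above concrete, here is a rough sketch of the reasoning. It is not the node's actual validation logic: the `CacaoLike` type, the `isAtRisk` helper, and the simplified rule (a commit is judged at its anchor time if anchored, otherwise at the current time) are assumptions made for illustration, and `exp` is assumed to hold the expiry as an ISO 8601 timestamp.

```typescript
// Hypothetical, simplified shape: only the expiry timestamp matters here.
interface CacaoLike {
  exp?: string // ISO 8601 expiry time; absent is treated as "never expires"
}

// Returns true when a commit signed with this CACAO could be rejected.
// Assumed rule: an anchored commit is judged at its anchor time,
// an unanchored commit is judged at the current time.
function isAtRisk(cacao: CacaoLike, anchoredAt: Date | null, now: Date): boolean {
  if (cacao.exp === undefined) return false
  const effectiveTime = (anchoredAt ?? now).getTime()
  return effectiveTime > Date.parse(cacao.exp)
}
```

Under this assumed rule, a commit that was anchored before its CACAO expired stays valid, while the same commit left unanchored past the expiry is at risk.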

Additionally, we’ve also been seeing timeouts on our Prod and Clay public Ceramic nodes.

The whole engineering team is currently working on remediation of these issues. We’ll be keeping the community updated.

Hi everyone! We would like to share a status update with you all regarding the data loss issue mentioned above.

As of now, Prod is stable and anchoring commits on a regular schedule. We are working on fixes that will improve stability going forward. As mentioned earlier, we have a batch that hasn’t been anchored, and we are looking into ways to salvage as much data as possible. On Monday the team will look into whether streams that were using did-key, 3ID, or a longer CACAO expiration timeout can still be anchored.

After a deeper investigation, we can see that we have unprocessed anchor requests between 2022-10-29 22:35:36 GMT and 2022-11-02 20:56:50 GMT. Updates performed during this time window will likely be lost if you were using the default CACAO expiration timeout.
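If your application records when it wrote each update, a quick way to find potentially affected writes is to compare those timestamps against the window above. This is a minimal sketch: the constant names and `inUnanchoredWindow` are made up for illustration, and it assumes you have your own record of update times.

```typescript
// Window of unprocessed anchor requests, taken from the dates above (GMT).
const WINDOW_START = Date.parse("2022-10-29T22:35:36Z")
const WINDOW_END = Date.parse("2022-11-02T20:56:50Z")

// Returns true if an update made at `updatedAt` falls inside the window
// (both boundaries included).
function inUnanchoredWindow(updatedAt: Date): boolean {
  const t = updatedAt.getTime()
  return t >= WINDOW_START && t <= WINDOW_END
}
```

Updates that fall inside the window are candidates to check first, not a confirmed list of lost writes.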

If you have trouble loading streams and see “CACAO has expired” errors, it may be possible to repair the stream to a consistent state by loading it with the sync flag set to SyncOptions.ALWAYS_SYNC. This causes the node to reload the stream from scratch and consider only commits that are valid and unexpired, which throws out the writes with expired CACAOs. Using the Ceramic HTTP client, this would look something like:

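import { SyncOptions } from '@ceramicnetwork/common'
const streamid = /* ... */
// loadStream returns a Promise, so await the result
const stream = await ceramic.loadStream(streamid, {sync: SyncOptions.ALWAYS_SYNC})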

In a browser, the same resync can be triggered by loading a URL like:

https://<ceramic node url>/api/v0/streams/<streamid>?sync=1
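If you need to do this for many streams, the URL can be built in code. The sketch below only assembles the URL in the form shown above; `buildResyncUrl` is a hypothetical helper and the node URL in the usage note is a placeholder.

```typescript
// Builds the resync URL for a stream, following the pattern shown above.
// `nodeUrl` is the base URL of your Ceramic node; trailing slashes are dropped.
function buildResyncUrl(nodeUrl: string, streamId: string): string {
  const base = nodeUrl.replace(/\/+$/, "")
  return `${base}/api/v0/streams/${encodeURIComponent(streamId)}?sync=1`
}
```

For example, `buildResyncUrl("https://ceramic.example.com/", streamid)` gives a URL you can open in a browser or request with any HTTP client.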

Note that the above fix only works for streams where the commit with the expired CACAO was an update. Streams that were created during the outage may have an expired CACAO in their genesis commit. In that case resyncing won’t be able to repair the stream state, and the only option is to create a new stream to replace the corrupt one.
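Putting the two cases together, a repair attempt could be wrapped roughly as follows. This is a sketch under assumptions, not an official recipe: `Loader`, `RepairResult`, and `loadWithRepair` are hypothetical names, the error is detected by looking for the text “CACAO has expired” in the message, and it assumes that a resync which still fails with that error means the genesis commit is affected.

```typescript
// Hypothetical loader: `forceSync` asks the node for a full resync.
type Loader<T> = (streamId: string, forceSync: boolean) => Promise<T>

type RepairResult<T> =
  | { status: "ok"; stream: T }        // loaded normally
  | { status: "repaired"; stream: T }  // loaded only after a full resync
  | { status: "needs-new-stream" }     // resync did not help

// Assumption: the error message contains this text.
function isExpiredCacaoError(e: unknown): boolean {
  const msg = e instanceof Error ? e.message : String(e)
  return msg.includes("CACAO has expired")
}

async function loadWithRepair<T>(load: Loader<T>, streamId: string): Promise<RepairResult<T>> {
  try {
    return { status: "ok", stream: await load(streamId, false) }
  } catch (e) {
    if (!isExpiredCacaoError(e)) throw e // unrelated failure: surface it
  }
  try {
    return { status: "repaired", stream: await load(streamId, true) }
  } catch (e) {
    if (!isExpiredCacaoError(e)) throw e
    return { status: "needs-new-stream" }
  }
}
```

With the Ceramic HTTP client, the loader could be a thin wrapper that calls `ceramic.loadStream` with `{sync: SyncOptions.ALWAYS_SYNC}` when `forceSync` is true and with no options otherwise.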

We will keep you updated as we are working on implementing the fixes.