Empty stream from api/v0/multiqueries

Hi, since a couple of weeks ago, with no changes on our app, we've been getting an empty result when reading some IDX-related streams with DIDDataStore.get(''), like the following one:

ceramic://k2t6wyfsu4pg2dngmumympowrsuz4yt7aw676zdwx33njyn4gat35l6pchuu5c

Example payload for the request:

{"queries":[{"streamId":"k2t6wyfsu4pg2dngmumympowrsuz4yt7aw676zdwx33njyn4gat35l6pchuu5c","genesis":{"header":{"controllers":["did:pkh:eip155:1:0x7456ed43037820285e0f37708630cff2e78317f8"],"family":"IDX"}}}]}

And it results in an empty response: "{}"
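
For reference, this is roughly how the request is issued (a minimal sketch; the node URL is a placeholder for our own node):

// Minimal sketch of the failing request; replace the URL with your node.
const payload = {
  queries: [
    {
      streamId: 'k2t6wyfsu4pg2dngmumympowrsuz4yt7aw676zdwx33njyn4gat35l6pchuu5c',
      genesis: {
        header: {
          controllers: ['did:pkh:eip155:1:0x7456ed43037820285e0f37708630cff2e78317f8'],
          family: 'IDX',
        },
      },
    },
  ],
}

const res = await fetch('https://your-ceramic-node.example.com/api/v0/multiqueries', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(payload),
})

console.log(await res.json()) // currently prints {} instead of the stream state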

But the content is there; it can be seen on Cerscan:

https://cerscan.com/mainnet/stream/k2t6wyfsu4pfx9q48o3dhegwe722ngmtfmupma7x7b0iwr5ui9v4rnp4z1xjne

These are our versions:

"@ceramicnetwork/blockchain-utils-linking": "^2.0.8",
    "@ceramicnetwork/http-client": "^2.3.1",
    "@ceramicnetwork/stream-tile": "^2.4.0",
    "@didtools/pkh-ethereum": "^0.0.2",
    "@glazed/datamodel": "^0.3.1",
    "@glazed/did-datastore": "^0.3.2",
    "did-session": "^1.0.0",
    "key-did-provider-ed25519": "^2.0.1",
    "key-did-resolver": "^2.0.5",

Thanks in advance


Alejandro, apologies for the long delay in replying. I'm pinging our engineers internally to escalate this.

Note that the StreamID in the multiquery payload (k2t6wyfsu4pg2dngmumympowrsuz4yt7aw676zdwx33njyn4gat35l6pchuu5c) is different than the StreamID you sent in the cerscan link (k2t6wyfsu4pfx9q48o3dhegwe722ngmtfmupma7x7b0iwr5ui9v4rnp4z1xjne).

Regardless though, I can load both streams and access valid data from them, so I wouldn’t expect an empty response.

One thing I do notice, though, is that the Cerscan copy of the stream k2t6wyfsu4pfx9q48o3dhegwe722ngmtfmupma7x7b0iwr5ui9v4rnp4z1xjne contains two extra commits that the copy I am able to load is missing. Investigating further, it seems the first of those commits (with IPFS CID bagcqcerazhq3ayshls5u2ml2n6unhp6kesdiib73ag5qu53jiqrgwelcbjzq) has a CACAO that was issued at 2022-10-29T22:44:32.982Z, during the CAS (Ceramic Anchor Service) outage that occurred over that weekend, which we disclosed on Discord here: Discord. Those two extra commits were never anchored, which means the CACAO expired, rendering those commits invalid.

So it seems possible to me that some of the issues you are seeing are due to issues surrounding data corruption from this CAS outage. If you search your Ceramic node logs, do you see messages that contain the text “CACAO expired”?

In any case, I'd strongly recommend upgrading your Ceramic node to the newest version. Last month we rolled out a change to more proactively detect this type of data corruption and throw errors (see Discord), which should help you detect and clean up any streams that were affected by the outage.

I want to apologize personally for the issues resulting from this outage. A major focus of the whole team right now is investing in the stability of the anchoring system to avoid any other data-loss incidents like this in the future.

I'm a bit confused about how we can recover from this situation; some of the affected streams are the IDX registries of those users.

How can we “clean up” bad streams for users if they are the controller?

Shouldn't the Ceramic network detect the corrupt commits and fall back to the latest valid commit?

Shouldn't the Ceramic network detect the corrupt commits and fall back to the latest valid commit?

That's effectively what the newer versions of Ceramic do, though they require a manual intervention step: we didn't want to automatically discard data in case application devs wanted to try to recover the content and have users re-apply it to the newly repaired streams. New nodes will detect this corrupted state and throw an error suggesting that you reload the stream with the sync flag set to SYNC_ALWAYS, which forces the node to discard the invalid commits and reset the stream state back to its last valid state.
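
For example, something like this (a minimal sketch; the node URL is a placeholder, and the StreamID is the one from your Cerscan link):

import { CeramicClient } from '@ceramicnetwork/http-client'
import { SyncOptions } from '@ceramicnetwork/common'

const ceramic = new CeramicClient('https://your-ceramic-node.example.com') // placeholder URL

// Forcing a re-sync discards the un-anchored commits with expired CACAOs and
// resets the stream to its last valid state.
await ceramic.loadStream(
  'k2t6wyfsu4pfx9q48o3dhegwe722ngmtfmupma7x7b0iwr5ui9v4rnp4z1xjne',
  { sync: SyncOptions.SYNC_ALWAYS }
)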

Ok, I just think this should be the default; at least from what I see in our app, we have more than 100 affected streams.

Hi @spencer
We can't manage to retrieve a list of all of the corrupted streams from the node logs. Can you help us check how exactly this should be done?
Thanks

There's no way to find all of the streams that have this corrupted state if some of them haven't been referenced in a while. We can, however, identify the ones that have been loaded or updated since the CACAO expired, as there will be log messages about the CACAO timeouts in the Ceramic node's logs. @mohsin has a script he wrote to do that - Mohsin, can you share that script so they can run it against their node's logs?

Also, if you update to the newest Ceramic version, an error will be thrown any time a corrupted stream is loaded or updated. You could have your application catch errors and check the error message to see if it's due to this issue; if so, it could automatically reload the stream with the SYNC_ALWAYS flag, resetting it to a valid state. Then, whenever a user comes to your app and tries to interact with a corrupted stream, it will be repaired on demand.
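
A rough sketch of what that on-demand repair could look like (the error-message check and node URL are assumptions to adapt to your setup):

import { CeramicClient } from '@ceramicnetwork/http-client'
import { TileDocument } from '@ceramicnetwork/stream-tile'
import { SyncOptions } from '@ceramicnetwork/common'

const ceramic = new CeramicClient('https://your-ceramic-node.example.com') // placeholder URL

// Load a stream; if the node reports the expired-CACAO corruption, reload it
// with SYNC_ALWAYS to reset it to its last valid state.
async function loadWithRepair(streamId: string): Promise<TileDocument> {
  try {
    return await TileDocument.load(ceramic, streamId)
  } catch (err) {
    // Matching on the error text is an assumption; check what your node
    // version actually throws for expired CACAOs.
    if (err instanceof Error && err.message.includes('CACAO expired')) {
      return await TileDocument.load(ceramic, streamId, { sync: SyncOptions.SYNC_ALWAYS })
    }
    throw err
  }
}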

My code actually uses AWS CloudWatch Logs Insights to look for those specific logs.

@alerios, are you on AWS and using CloudWatch logging as well, by any chance?

If so, then this code would be useful to pull out the list of affected streams.
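
In the meantime, here's a rough sketch of the kind of Logs Insights query I mean (not the exact script; the log group name, region, time window, and StreamID regex are assumptions you'd adjust for your setup):

import {
  CloudWatchLogsClient,
  StartQueryCommand,
  GetQueryResultsCommand,
} from '@aws-sdk/client-cloudwatch-logs'

const client = new CloudWatchLogsClient({ region: 'us-east-1' }) // your region

// Start a Logs Insights query over the Ceramic node's log group, filtering
// for the "CACAO expired" messages mentioned above.
const started = await client.send(
  new StartQueryCommand({
    logGroupName: '/ecs/ceramic-node', // placeholder log group name
    startTime: Math.floor(new Date('2022-10-29T00:00:00Z').getTime() / 1000),
    endTime: Math.floor(Date.now() / 1000),
    queryString:
      'fields @timestamp, @message | filter @message like /CACAO expired/ | sort @timestamp desc | limit 10000',
  })
)

// Poll until the query finishes.
let results = await client.send(new GetQueryResultsCommand({ queryId: started.queryId }))
while (results.status === 'Running' || results.status === 'Scheduled') {
  await new Promise((resolve) => setTimeout(resolve, 1000))
  results = await client.send(new GetQueryResultsCommand({ queryId: started.queryId }))
}

// Pull StreamIDs out of the matching messages; the regex is a rough
// assumption about how the IDs appear in the log lines.
const streamIds = new Set<string>()
for (const row of results.results ?? []) {
  const message = row.find((field) => field.field === '@message')?.value ?? ''
  for (const match of message.match(/k[a-z0-9]{50,}/g) ?? []) {
    streamIds.add(match)
  }
}
console.log([...streamIds])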

Thanks @spencer, I was looking for a way to do it in the code. I saw this can be passed to the loadStream method:

loadStream(streamId, { sync: SyncOptions.SYNC_ALWAYS })

But we’re using DIDDataStore.get(), is there a way to pass the same flag here?

@paul - is it possible to pass SyncOptions through the DIDDataStore APIs to get down to the underlying loadStream call on the Ceramic http client?

No, it's not supported. The DID DataStore uses the TileLoader, which uses the multiQuery() method, not loadStream().
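
One possible workaround, though (an untested sketch; the node URL, model aliases, and IDs below are placeholders): since the DataStore and the Ceramic client talk to the same node, you could repair the affected record stream with loadStream and SYNC_ALWAYS first, and then read it through the DataStore as usual.

import { CeramicClient } from '@ceramicnetwork/http-client'
import { SyncOptions } from '@ceramicnetwork/common'
import { DIDDataStore } from '@glazed/did-datastore'
import type { ModelTypeAliases } from '@glazed/types'

const ceramic = new CeramicClient('https://your-ceramic-node.example.com') // placeholder URL
const aliases: ModelTypeAliases = { definitions: {}, schemas: {}, tiles: {} } // your existing model aliases
const dataStore = new DIDDataStore({ ceramic, model: aliases })

async function repairThenGet(recordStreamId: string, alias: string, did: string) {
  // Reset the corrupted record stream to its last valid state on the node...
  await ceramic.loadStream(recordStreamId, { sync: SyncOptions.SYNC_ALWAYS })
  // ...then read through the DataStore as usual; its multiQuery should now
  // see the repaired state.
  return await dataStore.get(alias, did)
}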