Ceramic without Anchoring

Ceramic currently has a system called Anchoring that timestamps all events written to the network. This is achieved by including all events in a large Merkle tree and making a transaction to Ethereum that includes the root hash of that tree in its payload. Anchoring is performed by the Ceramic Anchor Service (CAS). Right now only one CAS is available on the network, and it's operated by 3Box Labs.
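To make the mechanism concrete, here is a minimal TypeScript sketch of the idea (not the actual CAS code): many event CIDs are hashed pairwise into a single Merkle root, and only that root needs to go into the Ethereum transaction payload. The helper names and placeholder CIDs are invented for illustration.

```typescript
// Illustrative sketch only, not the real CAS implementation.
import { createHash } from "node:crypto";

const sha256 = (data: Buffer): Buffer =>
  createHash("sha256").update(data).digest();

// Reduce a list of leaf hashes (e.g. hashed event CIDs) to a single Merkle root.
function buildMerkleRoot(leaves: Buffer[]): Buffer {
  if (leaves.length === 0) throw new Error("no events to anchor");
  let level = leaves;
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      const right = level[i + 1] ?? left; // duplicate the last node on odd-sized levels
      next.push(sha256(Buffer.concat([left, right])));
    }
    level = next;
  }
  return level[0];
}

// Thousands of events collapse into one 32-byte root; only the root goes on-chain.
const eventCids = ["bafy...1", "bafy...2", "bafy...3"]; // placeholder CIDs
const root = buildMerkleRoot(eventCids.map((cid) => sha256(Buffer.from(cid))));
console.log("anchor root:", root.toString("hex"));
```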

Why anchoring?

So why does Ceramic even need anchoring? There are three main reasons (really only two, since one is historical). Let's explore each of them.

Key rotation, e.g. 3ID

While 3ID is no longer actively supported in Ceramic, it's worth mentioning here for completeness, since it was the original reason anchoring was introduced in Ceramic.
The core problem 3ID solved was maintaining a permanent identifier over time, with the ability to rotate the keys that have admin control over that identifier. This required timestamps in order to know which key rotation happened first, e.g. key1 → key2 or key1 → key3. Anchors allowed us to easily pick the key rotation that was anchored earliest.

Automatic OCAP revocation

Our system of OCAPs (object capabilities, sometimes referred to as CACAOs) doesn't yet have a way of direct revocation. However, when a user creates an OCAP and delegates to a session key, the OCAP may contain an expiry time. The main function anchors serve today is to ensure that data created using these OCAPs is timestamped before the expiry time (modulo a grace period).
The main risk this mitigates is key theft. If a key is stolen, the thief can at least only use it for as long as the OCAP is valid. It's worth noting that there currently isn't any way in Ceramic to explicitly revoke an OCAP right away.
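As a rough sketch of the check this enables (the field names and grace-period value below are assumptions for illustration, not the actual Ceramic validation code): an anchored event is acceptable if the anchor timestamp proves it existed before the OCAP's expiry plus the grace period.

```typescript
// Illustrative sketch; field names and the grace period are assumptions.
const GRACE_PERIOD_SECONDS = 24 * 60 * 60; // assumed 24h grace period

interface Ocap {
  expirationTime: number; // unix seconds, e.g. from the CACAO's expiry field
}

// The anchor timestamp is trustless evidence that the event existed at that time,
// so an event is accepted if it was anchored before the OCAP expired (plus grace).
function isEventWithinOcapWindow(anchorTimestamp: number, ocap: Ocap): boolean {
  return anchorTimestamp <= ocap.expirationTime + GRACE_PERIOD_SECONDS;
}
```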

Trustless Timestamps

Besides key management, the main value proposition of anchoring is tamper-proof timestamps. This allows developers and users of Ceramic to see when a piece of data was created (more specifically, they can know that the data was created before a specific point in time, sometimes referred to as proof-of-publication).
A few questions arise out of this:

  1. Do current and future developers and users actually need timestamps?
  2. If yes, is the current approach to anchoring optimal from a cost and efficiency perspective?
  3. Or would they prefer to have more control over which data and when data is timestamped?

The answer to these three questions isn't obvious and likely depends a lot on the different needs of different use cases.

Key revocation without anchoring

Can the key revocation functionality we need be achieved without doing any anchoring at all? This would certainly be desirable, as it would remove a lot of the complexity in the protocol involved in running CAS (or the planned rust-ceramic self-anchoring feature). It would also allow us to address the need for trustless timestamps from a more product-centric perspective, rather than as a side effect of technical constraints.


… Retracted proposal …


Conclusion

Given that key rotation is no longer a goal of Ceramic (since it's handled on-chain by "smart accounts"), anchoring as it exists today is no longer an optimal solution to OCAP revocation. There is a possibility of achieving OCAP revocation in an alternative way, even without anchoring.
If we decide to let go of anchoring altogether, the trustless timestamp aspect of anchoring can be reapproached from a more product-centric perspective, giving users and developers on Ceramic more optionality in whether and how they get trustless timestamps.


First question. I'll have follow-up questions, but I think I am missing something fundamental, so I will try to clarify that first.

How does the issued-at timestamp get set? Given that the premise of the new system is that the presence of a new key with a new issued-at time moves time forward, how do we prevent bad actors from moving time forward? Is the assumption that only the original controller of a stream can create new OCAPs? Can an OCAP delegate to another OCAP?


I think a big issue with this approach is that it only provides revocation information within a single stream. Today most streams in Ceramic have only a single event in them, and CACAOs can be used to create new streams, so this approach would not prevent an attacker with a compromised OCAP from still creating a lot of data.


Alternate proposal: explicit revocation only

Instead of having any concept of time-based capability expiration, we add explicit capability revocation and rely entirely on it.

Explicit capability revocation is a feature we want to build anyway. With the time-based capability expiration we have today, an attacker who is able to compromise a session key is still able to do a lot of damage and publish a lot of malicious data before the key expires. Explicit capability revocation would allow a user who has detected that one of their session keys was compromised to sign a message with their root key that registers the compromised session key in a revocation registry (details of how this registry works are left undefined for now). Any Ceramic node validating an event would need to check not just the signature on the event and the capability that authorizes the session key being used, but also that the session key is not present in the revocation registry. If it is, the event is rejected as invalid.

Revoking a session key like this would not only prevent the key from being used to author new events, it would also retroactively invalidate all previous events ever authored with that key. This has the added benefit of letting users recover from a scenario where they granted access to an app that they later learned was malicious and that published bogus data on their behalf (a scenario that does not require a session key to be compromised, because the user explicitly authorized the app to use that session key; it just turned out that the app itself was malicious).
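A minimal sketch of what the node-side check described above could look like (the registry interface and all names here are hypothetical, since the registry design is explicitly left undefined):

```typescript
// Hypothetical sketch of the validation flow; the revocation registry
// interface is deliberately undefined in the proposal, so this only shows
// where the extra check would slot in.
interface SignedEvent {
  payload: Uint8Array;
  sessionKeyDid: string; // the session key that signed the event
  capability: unknown;   // the OCAP / CACAO delegating to the session key
}

interface RevocationRegistry {
  isRevoked(sessionKeyDid: string): Promise<boolean>;
}

async function validateEvent(
  event: SignedEvent,
  registry: RevocationRegistry,
  verifySignature: (e: SignedEvent) => Promise<boolean>,
  verifyCapability: (e: SignedEvent) => Promise<boolean>,
): Promise<boolean> {
  // 1. The event must be correctly signed by the session key.
  if (!(await verifySignature(event))) return false;
  // 2. The capability must actually delegate to that session key.
  if (!(await verifyCapability(event))) return false;
  // 3. New step: reject the event if the session key has been revoked,
  //    which retroactively invalidates everything the key ever signed.
  if (await registry.isRevoked(event.sessionKeyDid)) return false;
  return true;
}
```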

pros

  • Simplicity - the architecture of Ceramic becomes very simple, and the security model is easy to understand and reason about
  • Increases the robustness of the protocol by completely eliminating the concept of anchoring and native verifiable timestamping.
  • Does not rely on wall clock time in any way; time ceases to have any explicit meaning or role in the protocol.

cons

  • Pushes responsibility for detecting and managing a compromised session key entirely to the end user
    ** Note there are already categories of attack that would require the end user to take responsibility for explicit capability revocation, even if we continue to support time-based capability expiration.
  • A compromised session key can remain usable indefinitely.
    ** On the one hand, if a session key is compromised today, the attacker can already generate an unbounded amount of malicious data before the key expires. On the other hand, with this proposal the attacker could sit on the key for an arbitrarily long period of time before using it to generate malicious data, expanding the types of attacks that are possible and making it harder for users to monitor their session keys for misuse.
  • Capabilities must be revoked retroactively
    ** Without a trusted source of time, there’s no way for a user to revoke a session key and just prevent it from being used for new writes, without discarding the existing writes that were already performed with the key. This means if the user/app has been using this session key for a while and has written a lot of valid data with that key before an attacker compromises it, the user must invalidate all the valid writes done with that key in order to get rid of the malicious writes done by the attacker. Note there might be ways to mitigate this such as having the user sign a new message with the root key or a new session key that “restores” the valid writes that were done with the now-revoked key. There are also ways we can create norms and guardrails around keeping session keys short-lived so that there aren’t too many valid writes associated with any one session key, to keep the cost (in terms of valid writes that need to be thrown out and marked invalid) of revoking a session key low.

I have mixed feelings about the above proposal. I'm posting it more as a conversation starter than as something I am necessarily endorsing or pushing for. I'd love to hear from more security-minded folks about just how bad it is to have a compromised key stay valid indefinitely until/unless the user explicitly revokes it.


I also had a somewhat half-baked idea: if the node that originally received a write signed an attestation that it was the node the write originated on, and that signature got shared with other nodes when an event is synchronized, it would allow a capability to be issued to a session key AND restricted to events created on a specific node. That way, if a session key is compromised, the key cannot be used to publish events via any node on the network; the attacker would need to use the original node that the application was using when the session key capability was created. That could possibly be combined with a lightweight time-based capability expiration mechanism where well-behaved nodes use wall-clock time to enforce capability expiration just on brand new writes that come in via the API (but not on writes learned about over the network via Recon). So a malicious app is still a problem, but an attacker who steals a key that was used by a well-behaved app wouldn't be able to use that key after it expires. This breaks down though if the app is using an in-browser Ceramic node, as then the attacker could just inject code into the in-browser node to bypass the time-based expiration enforcement.


One thing I'd add to this idea is the possibility of keeping certain events valid even though their OCAP has been revoked:

  1. There are new events issued with a non-revoked OCAP that build on top of the event with the revoked OCAP, e.g. in the same stream
  2. As part of the OCAP revocation, a list of events (i.e. their CIDs) that should remain valid after the revocation could be provided

Both of these approaches put more responsibility on the user to keep track of "valid" events somehow. However, they would also enable users to implement protocols for explicitly revoking capabilities if they so desire.
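A hypothetical shape for such a revocation message, where the allowlist of CIDs corresponds to option 2 above (all field names are invented for illustration):

```typescript
// Hypothetical revocation payload for option 2 above; field names are invented.
interface RevocationWithExceptions {
  revokedSessionKeyDid: string; // the compromised or retired session key
  keepValid: string[];          // CIDs of events that should survive the revocation
  signature: Uint8Array;        // signed by the root key / controlling account
}

// A node would treat events signed by the revoked key as invalid unless their
// CID appears in `keepValid` (or, per option 1, unless a later non-revoked
// OCAP built on top of them).
function eventRemainsValid(eventCid: string, revocation: RevocationWithExceptions): boolean {
  return revocation.keepValid.includes(eventCid);
}
```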

Finally, a note on practical security today if we were to use the explicit-revocation-only approach. Currently all session keys used with Ceramic are created using WebCrypto with the non-extractable option. This means that (given a correct browser implementation) malicious scripts, whether from XSS or a malicious extension, can't access and steal the user's private key. At worst they can write new data while the user has a tab with the app open. Pairing this with Spencer's idea of setting norms to frequently rotate keys (e.g. client-side libraries create a new session key and OCAP every week), the normal-case attack vector is fairly small.
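For reference, this is roughly what creating such a non-extractable session key looks like with WebCrypto (a sketch only; the actual algorithm and usages in Ceramic's libraries may differ):

```typescript
// Sketch of generating a non-extractable session key with WebCrypto.
// The algorithm choice (ECDSA P-256) is an assumption for illustration;
// the important part is `extractable: false`.
async function createSessionKey(): Promise<CryptoKeyPair> {
  return crypto.subtle.generateKey(
    { name: "ECDSA", namedCurve: "P-256" },
    false, // extractable: false; the private key material can never be exported
    ["sign", "verify"],
  );
}

// A malicious script can at most ask this key to sign while the tab is open;
// it cannot exfiltrate the key for later, offline use.
```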

When looking at anchoring removal, we need to understand the benefits and costs of anchoring, and what tradeoffs exist if it is removed. Anchoring provides two primary benefits: the first is a backup validation for events, and the second is an ordering of events by time, which is used for validation and conflict resolution. Anchoring has a cost, both in terms of the transaction to save the anchor to the blockchain and in a limit on throughput, as events can only become valid as quickly as we anchor.

Looking at anchoring as a backup validation for events, this is primarily a redundancy in our event validation protocol. If we were able to solve event validation such that it cannot be easily circumvented, we would not need this additional redundancy.

The second benefit, validation and conflict resolution, can occur within the protocol itself, without the need to communicate with the blockchain. One method of doing this has been proposed by @spencer, where there is a registry of revoked IDs, and any stream that has been written to by an ID in the revocation list is considered compromised and not to be used. Due to the use of session keys, this is typically a session key being compromised, not a DID. Since this requires self-reporting of compromised IDs, and time to discovery of a compromise may be long, clients should rotate their session key often to limit the impact of compromised IDs.

One problem with this approach is that malicious IDs can perform malicious activities until they are revoked, not only on existing streams but also on new streams. This may impact the compromised ID reputationally or financially.

Another approach is a secondary token with an enforced shorter expiration that must be refreshed periodically with confirmation by the ID, so that a compromised session key is limited in its scope of impact by this expiration time. This could be further extended with an access key that is derived from the refresh token, has an even shorter expiration, and is used with requests. This means that if the access key is compromised, its scope of impact is even further limited than that of a refresh token or session key. In a decentralized system, there is an immediate problem with this approach: how to know when something is expired.
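Roughly, the expiry tiers being described might look like this (durations and names are illustrative assumptions, not a concrete proposal):

```typescript
// Illustrative expiry tiers only; durations and field names are assumptions.
const HOUR = 60 * 60;
const DAY = 24 * HOUR;

interface Credential {
  kind: "session-key" | "refresh-token" | "access-key";
  expiresAt: number; // unix seconds
}

function issueCredentials(now: number): Credential[] {
  return [
    { kind: "session-key", expiresAt: now + 30 * DAY },  // delegated by the user's ID (the OCAP)
    { kind: "refresh-token", expiresAt: now + 1 * DAY }, // periodically re-confirmed by the ID
    { kind: "access-key", expiresAt: now + 1 * HOUR },   // derived from the refresh token, sent with requests
  ];
}

// If the access key leaks, its blast radius is bounded by an hour; the refresh
// token bounds it to a day. The enforcement problem is knowing, in a
// decentralized system, that those deadlines have actually passed.
```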

To know when something is expired, we have a few different approaches:

  • Sync time across nodes in the system, using distributed clocks like Lamport, vector, or hybrid logical clocks
  • Track time using a blockchain
  • Provide witness proofs, with a timestamp, for any event, issued by the system that is writing the event

For some of the problems with syncing time, see Secure Logical Time in P2P Network — Part 1 | by Akira | Medium. Tracking time using a blockchain is similar to anchoring, although without the transaction piece. Finally, witness proofs assume well-behaved nodes.

To accomplish distributed time, we can take a phased approach to verifying events (a rough sketch of the witness proof shape follows the list):

  1. Accept any event as long as the token is valid, and the expiration time is satisfied by wall clock time.
  2. Add witness proofs to all written events, so we can have an audit trail, which may be used in the future for reputation or staking and slashing.
  3. Implement a verifiable logical clock to include in the witness proof. These will become our verified events, and applications can choose to only listen to verified events.
    1. See https://arxiv.org/pdf/2405.13349 (A Verifiable Logical Clock for P2P Networks) for more information on this
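Here is a hypothetical shape for the witness proof mentioned in phases 2 and 3 (all field names are invented; this is a sketch of the idea, not a spec):

```typescript
// Hypothetical shape of a witness proof attached to an event in phases 2-3;
// field names are invented for illustration.
interface WitnessProof {
  eventCid: string;        // the event being witnessed
  witnessNodeDid: string;  // the node attesting that it saw and validated the event
  wallClockTime: number;   // phase 2: the node's local time at validation (unix seconds)
  logicalClock?: unknown;  // phase 3: a verifiable logical clock value (e.g. per arXiv:2405.13349)
  signature: Uint8Array;   // the witnessing node's signature over the fields above
}

// A node syncing historical events could later check the witness signature
// instead of re-deriving whether the capability was unexpired when the event
// was first written.
```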

Note: Although 2 implies staking, there could be a hybrid staking approach, where nodes do not have to stake, unless they want to be a verifier. They could submit unverified events to verifiers, who have staked. This also means that the verifiable logical clock doesn’t need to incorporate all nodes on a network, just some portion of staked nodes.


Looking at anchoring as a backup validation for events, this is primarily a redundancy in our event validation protocol. If we were able to solve event validation such that it cannot be easily circumvented, we would not need this additional redundancy.

I don’t understand this part. I don’t even understand what you mean by saying that one of the benefits of anchoring is “a backup validation for events”. Can you elaborate on what this means?

To know when something is expired, we have a few different approaches

As for the main part about how to support time-bounded session keys, I'm not sure I totally follow what you're proposing, but if I'm understanding correctly (based on the inclusion of things like logical clocks), it seems like you're mostly trying to figure out how to apply causal ordering across streams, which isn't really the problem we are trying to solve here. Even if you could ensure perfect clock synchronization across all Ceramic nodes in the network, it wouldn't solve the problem of expiring session keys. That is because you don't just need to worry about when a brand new event is created and posted to a Ceramic node via the API for the first time (in which case just checking the capability expiration time against the node's system time would be sufficient). You also need to worry about when a new node comes online and tries to sync historical events. Those events may be months old, and the session key that created them could have expired a long time ago, but the events should still be considered valid as long as they were created during a time when their session key was valid. That's the part that anchoring solves for us: it lets us prove when an event was created, after the fact.

Does your proposal handle that case? If so then I didn’t understand it, can you clarify how it addresses that?

We have the network itself as a record of events, each of which is signed. To verify an event has occurred, we can look at the event on one or more nodes and validate it. Anchoring provides a method where we don’t have to look at the network, but only the witness that is generated by anchoring.

We are not trying to apply causal ordering. We are trying to determine when the network has reached a point where a key is no longer valid. We could also use this system for ordering, but we only want to know that the system is at time t+1 and the key is no longer valid.

We do not need perfect clock synchronization. We just need to know the network as a whole agrees that time has advanced enough the key is expired.

System time can be modified, allowing out of order events to be injected or validated.

This is the point of the witness proof being added. It says that the event was valid according to network time, and the node attests to it. Nodes that attest to invalid events could be excluded from the network or have their stake slashed. Any nodes coming online will receive validated and unvalidated events, and only have to check the witness proof, not the actual event and its related access control specification.

Anchoring provides a method where we don’t have to look at the network, but only the witness that is generated by anchoring.

Still not following, sorry. Are you saying that we could look at the blockchain transaction to see that an event happened? But the blockchain transaction just has the root of a Merkle tree with many events under it. Without the TimeEvent from Ceramic that has the witness path, you can't tell whether any one event is included as part of the anchor transaction on the underlying blockchain. So you still need data from Ceramic to tell that an event happened. Also, telling that an event happened is different from validating an event; you always need the actual event data from Ceramic to tell if the event was valid, and I don't think anchoring lets you avoid that.

This is the point of the witness proof being added. It says that the event was valid according to network time, and the node attests to it.

Okay, let me see if I understand. I think what you're saying is that when an event is added to the network, a set of nodes can check that the capability is valid by comparing the expiration time in the capability against their local system clocks. Then they can sign an attestation saying that they observed the event when it entered the system and assert that the capability was valid at that time. Then, in the future, a node that wants to synchronize and verify an event can check whether there are a sufficient number of these witness signatures; if enough nodes on the network attest to having seen the event when it entered the system, and that the capability was valid at that time, then you consider the event still valid even if the capability has since expired. Is that right?

Assuming I understood the proposal correctly, that basically just sounds like a blockchain. That system of having multiple nodes sign a message to assert that they observed a thing at a given time is more or less exactly what we are getting with blockchain anchors currently. I guess it's possible that by only supporting those witness proof signatures, and none of the other functionality that a normal blockchain provides, we might be able to optimize some things, but I feel like it ultimately reduces to the same order of magnitude of complexity?

For the context of this discussion, an event happening is the same as it being valid, since we are not concerned with the content of the event or validation against a schema, but rather with whether someone signed the event and that signature was valid. We can then compare hashes for validity.

No. I’m saying a set of nodes provide indicators for time passing, which can be used by a single node to verify that an event is not expired. This node then signs a witness, including node and time information. Someone at a later point could re-validate using the event and node information. Only one node is involved in verification, unlike a blockchain.

I’m still pretty confused how anchoring helps this at all currently.

No. I’m saying a set of nodes provide indicators for time passing

Okay so then I think this is the part I don’t understand. Can you elaborate on what this actually looks like?

Not to distract from this ^ discussion thread, but I wanted to add some of my thoughts to this topic as well.

I was going to make a long post with multiple points but I’m going to break it down into separate posts.

My first question is logistical. Even if we decide to remove anchoring, it looks like we not only need an alternative approach for OCAP revocation; we then need to design it, implement it, test it, and harden it.

Do we have the time to do that before going fully live, or will we have to implement self-anchoring for now anyway and investigate Ceramic-without-Anchoring (CWA?) in parallel, with a later date for shipping it?

If we don’t think we have the time, then that doesn’t affect our plans for shipping self-anchoring, and we can consider this discussion a plan for afterwards.

If we do think we have time, then, working back from our go-live deadline, how much time do we have to get all the work for CWA done before we start falling behind?


May I propose CWOA as the acronym, since CWA could just as easily mean Ceramic With Anchoring :sweat_smile:


My second point was going to be about my high-level idea for where I see this idea going.

IMHO, in this context, I see Anchoring in the same light as I see Aggregation:

  • As an optional layer:
    If the Streaming protocol and code can work completely standalone without Anchoring, this opens up a potentially new market for p2p collaboration that we’ve not been able to tap into because of the need for a blockchain.
  • As a separate layer:
    If Anchoring is layered on top of Streaming versus being intertwined with it like it is today, then the protocol becomes much leaner and cleaner. Our latest self-anchoring design already considers Anchoring a separate subsystem, and we can revamp the Ceramic protocol to also separate Anchoring out from Streaming for similar reasons as we’re already discussing for Aggregation.

I don't think we should force the system to work without Anchoring. I'd like it much better as an optional and cleanly separated component of the system. That way we can support a wider variety of use cases, e.g. legal ones where finalization time does matter, alongside p2p collaboration where it doesn't as much.

Oh lolol, that’s fair :joy:

Thirdly, I feel like we’d be opening up a huge can of worms trying to work around time-bound tokens. Most of Web2 works that way, and we’d be in brand new security territory if we went in a different direction.

It’s not just that it might take a while to figure this out - it’s also breaking from current best practices, and app developers could easily make mistakes that leave glaring security holes for their users. While some of the responsibility would fall to the developers, a lot of it would also fall on us.

Being explicit about whether or not to use Anchoring/finality for a particular use case feels like a much safer decision than trying to fiddle with token lifetimes and revocations and accidentally ending up with an implementation that expects finality but doesn't do it right.

One of my mentors would say, “Start with a design based on the strictest use case, then intentionally relax requirements as you go based on the specific use cases you want to target.” This prevents designing for use cases needing looser guarantees, when in fact your system could be used (and fail) for a use case expecting stricter guarantees.

In my mind, this means that we start with support for legal or academic use cases, which do need to be aware of both the flow of time and authorization for writes, then intentionally decide whether we want to focus on other use cases that can handle looser guarantees.

I agree with this in theory but in this case anchoring adds a lot of complexity and potential points of failure to our protocol. If we could satisfy 90% of use cases without it, then I think it might be worth it to abandon those 10% that do need it for the benefit of the simplicity and ability to deliver faster and higher quality to satisfy the remaining 90%.

This is definitely the piece I am most worried about
