Protocol minimization

Thanks to @mohsin, @cole, and @nathanielc for discussion and insight around this topic.

This post describes an approach to protocol minimization that enables engineering efforts to move faster on the things that really matter (i.e. decentralization of CAS and node incentivization). Currently the Ceramic protocol is tightly coupled with ComposeDB inside of the js-ceramic code base. There’s an ongoing effort to migrate parts of the js-ceramic codebase into rust-ceramic, the primary driver of which is to enable a purpose-built data synchronization protocol, Recon. Beyond that, there’s an open question of which parts of the protocol belong in the Rust code base and which in the JavaScript one. Really this is a question about the protocol boundary between Ceramic and ComposeDB, since we want the Ceramic protocol to be fully implemented in Rust.

Move fast on what matters

Beyond Recon, there are two things the protocol urgently needs: a decentralized alternative to CAS and an incentive mechanism for running nodes. Both are big problems that require a considerable amount of engineering effort. Given this, it seems prudent to be extremely conservative about which protocol features we port over from js-ceramic to rust-ceramic, and to consider rewriting only the logic that is strictly necessary. The more time we spend rewriting, the less time we have to build what actually matters.

What can rust-ceramic not go without?

Which parts of js-ceramic are actually critical to rewrite in our Rust code base, then? In its current iteration, Recon blindly accepts events from other nodes. This opens up an attack vector where malicious peers can DoS the network by spamming fake events. The Rust code base therefore needs to be able to validate events before propagating them further through the network. Beyond this, not much else is required.
None of the following features takes us closer to CAS decentralization or node incentivization:

  • “Firehose” API
  • Aggregation of events into JSON documents
  • Conflict resolution
  • Requesting anchors from CAS

These features can happily live on inside of the JavaScript code base.

Minimal event validation

So if Recon doesn’t require full stream conflict resolution, how would the Rust code base actually validate events?

  • InitEvent - validate the signature against the controller
  • DataEvent - get the InitEvent from the id field, validate the signature against the controller
  • TimeEvent - validate the time information

Ideally, events signed using an OCAP with an expiry date would only be considered valid if a TimeEvent exists that confirms the signature was produced within the validity window.

Note that the above description doesn’t actually require validating any prev field. This means that for a consumer of the event API exposed by rust-ceramic, multiple branches of stream history would be visible. This is actually a good thing since it enables late publishing traceability. Note also that this approach does not require CIP-145 to be implemented.
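
To make this concrete, here is a minimal sketch in Rust of the per-event checks listed above. The types (Cid, Did, Event) and helper functions (signature_is_valid, time_proof_is_valid, lookup_init_controller) are simplified stand-ins for illustration, not the actual rust-ceramic API.

```rust
// Minimal sketch with simplified stand-in types; real events are signed
// IPLD/DAG-JOSE structures, not plain structs.
type Cid = String;
type Did = String;

enum Event {
    Init { controller: Did, signature: Vec<u8> },
    Data { init_id: Cid, signature: Vec<u8> },
    Time { proof: Vec<u8> },
}

// Assumed helpers with placeholder bodies; real implementations would verify
// JWS/CACAO signatures, check blockchain anchor proofs, and look up the
// InitEvent from the local store.
fn signature_is_valid(_controller: &Did, _signature: &[u8]) -> bool { true }
fn time_proof_is_valid(_proof: &[u8]) -> bool { true }
fn lookup_init_controller(_init_id: &Cid) -> Option<Did> { None }

/// Accept an event only if it passes the minimal checks above.
/// Note that `prev` is never inspected here.
fn validate(event: &Event) -> bool {
    match event {
        Event::Init { controller, signature } => signature_is_valid(controller, signature),
        Event::Data { init_id, signature } => match lookup_init_controller(init_id) {
            Some(controller) => signature_is_valid(&controller, signature),
            None => false, // InitEvent not available yet, cannot validate
        },
        Event::Time { proof } => time_proof_is_valid(proof),
    }
}
```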

A first step towards decentralizing anchoring

One first step that can be taken towards decentralized anchoring is to enable rust-ceramic nodes to self-anchor. This would enable large node operators to remove their reliance on CAS as a trusted service. It could also potentially make anchoring more performant and robust because it removes a lot of redundant work being done today (e.g. network traffic, event validation).
Key to note here is that the prev field now becomes relevant in the Rust code base. The node could continually build a tree as new events come in. If a new event points to an event already in the tree via its prev pointer, the past event can be replaced, reducing the number of events anchored in the batch (since the events reachable through prev are implicitly anchored).
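
A rough sketch of that batch compaction, again with simplified stand-in types (Cid, PendingAnchorBatch) rather than anything from the real code base: whenever a new event’s prev points at an event already pending, the older event is dropped from the batch because anchoring the newer tip covers it.

```rust
use std::collections::HashMap;

type Cid = String;

/// Events waiting to be self-anchored, keyed by CID. Simplified sketch:
/// real events carry far more than a CID and a prev pointer.
struct PendingAnchorBatch {
    events: HashMap<Cid, Option<Cid>>, // event CID -> its prev pointer, if any
}

impl PendingAnchorBatch {
    /// Add a new event. If its prev is already pending, drop the older event:
    /// anchoring the newer tip implicitly anchors everything reachable via prev.
    fn add(&mut self, cid: Cid, prev: Option<Cid>) {
        if let Some(p) = &prev {
            self.events.remove(p);
        }
        self.events.insert(cid, prev);
    }

    /// The CIDs that still need their own leaf in the next anchor tree.
    fn batch(&self) -> Vec<&Cid> {
        self.events.keys().collect()
    }
}
```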

Javascript cleanup

Now, the above changes leave much of the js-ceramic code base intact. The main optimization we could easily implement would be to remove signature validation, while keeping the conflict resolution logic as is. Here are a few other things to consider:

  • Merge our two main JavaScript code bases, js-ceramic and js-composedb, into a monorepo, or make js-ceramic into more of a middleware layer that js-composedb imports
  • Requesting anchors from CAS remains in the JavaScript code base until it is fully replaced by decentralized alternative(s) in Rust
  • Consider whether the “Firehose” API should be considered a part of ComposeDB

The big advantage in my mind of pushing more from js-ceramic down into Rust is the potential performance, stability, and observability gains (plus the potential to simplify the deployment architecture).

I’m pretty worried about our ability to offer production-grade data infrastructure at scale with Node.js in the long term.


I think long term we can be open to pushing more things down to Rust as it makes sense. What I’m trying to convey here is that there will be more urgent needs on the protocol layer, and while aggregation and other features do provide value, we don’t necessarily need to consider them part of the core protocol.


I struggle with this because until now it was conceived that the “Firehose API” (aka Ceramic Data Feed API) would be the interface that would allow developers to build their own indexes/dbs. If we consider it part of ComposeDB, then that breaks.

For my info, when you say Firehose API here, are you referring to the document-specific change feed, or a more general event-based change feed? I’d think of events at the Ceramic layer, documents as middleware, and indexes/dbs as products.

My primary concern is that considering aggregation to not be part of the protocol may allow us to end up in a place where aggregation cannot be implemented properly. Without aggregation, I find the protocol to have limited value, acting like a kafka topic that anything can be written to.

As a concrete example, assume we are attempting to write Verifiable Credentials to Ceramic. For someone that wants to know when new VCs are written to Ceramic, we have two approaches: Ceramic provides a way to “find” all the VCs, possibly by a parent (model) stream id; or Ceramic emits a stream of events like new streams being created, which brings up my second concern later.

In the first approach, there’s nothing to guarantee that someone isn’t writing non VC data against this parent stream id, which then puts the onus on all consumers to try to verify the data that was written. This opens up a vector for DoS on all consumers of that parent stream id, which would then require all streams to come with a reputation score so they can be excluded to prevent DoS.

My secondary concern, as mentioned before, is the inability to be notified about new events on your Ceramic node. This very quickly reduces architectural options for users of Ceramic. For consumers implementing aggregation, a typical architecture might look like:

An architecture like this can support a streaming or polling based approach; however, the polling based approach introduces questions of how often to poll and how large a batch should be, which immediately adds complexity. An event driven architecture might look like:

We very quickly see that not having an event feed starts introducing complexity for each consumer of a set of events, assuming we want to use Ceramic to record intermediate states.

Based on this, I feel that the simplest valuable implementation of rust-ceramic is one that includes at least:

  • Ability to write data to node
  • Data syncing between nodes
  • Verification and validation of written events
  • Feed of events that occur on the node - init, data, time

Without those features, we open ourselves up to bad event consumers (requesting too much data too frequently) or bad event producers (too much invalid data on the network). The work to address those is much more involved, and produces something less valuable.
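
As an illustration of how small that surface could be, here is a hypothetical Rust trait; all names and signatures are made up for the sketch and are not a proposal for the actual rust-ceramic API.

```rust
type Cid = String;

/// Illustrative event kinds; the real protocol distinguishes init, data, and time events.
enum EventKind {
    Init,
    Data,
    Time,
}

/// Simplified raw event; real events are signed IPLD structures.
struct RawEvent {
    cid: Cid,
    kind: EventKind,
    payload: Vec<u8>,
}

/// Hypothetical minimal node surface: write, validate, and a resumable feed.
/// Syncing between nodes (Recon) would sit behind the same validation path.
trait MinimalCeramicNode {
    /// Write a locally produced event; rejected if validation fails.
    fn write_event(&mut self, event: RawEvent) -> Result<Cid, String>;
    /// Validate an event received locally or from a peer before propagating it.
    fn validate_event(&self, event: &RawEvent) -> bool;
    /// Feed of accepted events (init, data, time), resumable from a cursor.
    fn event_feed(&self, since: Option<Cid>) -> Vec<RawEvent>;
}
```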


I think we are aligned on this. Right now the Firehose API is document-specific feeds + more. What I’m suggesting is that anything that is not just a feed of signed events, e.g. computation over events, belongs in a layer above, e.g. middleware and/or indexes.

I don’t really see how doing schema validation for a particular type of VC prevents DoS? An attacker could simply generate a lot of fake VCs that conform to the schema.

I think there is another valid area of research around supporting VCs as a type of event on Ceramic (see Simplify event creation).

I wonder if there’s a misunderstanding of what I suggested in the OP because I definitely agree with (1), (2), and (4). There should definitely be a way to get a feed of events from the Rust node! My main point is that:

a. json-schema validation & json-patch processing are application specific
b. conflict resolution is also application specific (we are not able to build CRDTs based on the current model)

By moving (a) and (b) to the application layer we can allow the protocol to move faster on delivering the core values of Ceramic, e.g.:

  • Scalable p2p event replication
  • Blockchain timestamping
  • (eventually) on-chain access control
  • (eventually) incentivized event availability

At that point, the server itself can take action against the publisher of the invalid messages to exclude them. We don’t need to flood the network with invalid data, then have the producer’s reputation reduced, then have some other step taken at the network level or the Ceramic level. It also handles the case of “mistakenly” publishing bad data.

That’s fine. It doesn’t mean the protocol needs to support json-schema validation and json-patch processing. But the server and protocol should support a way to validate init and data events for a specific separator. This gives nodes initial DoS protection against bad producers, and eventually we could move to reputation for Ceramic nodes to exclude them from the network.

Tbh, I don’t see how these two are different. You could “mistakenly” publish data that doesn’t conform to the schema.

What validation do you mean a node could do besides validating signatures of Init- and DataEvents?

@dbcfd would your concern be addressed if rust-ceramic exposed an API for higher-level layers to send feedback back down to rust-ceramic about which events were “bad” according to some application-specific logic around what counts as “bad”? “bad” events might mean events that failed jsonSchema validation, or that were flagged as spam by some automated or manual spam detection system. Then rust-ceramic could take that input from the higher layer and incorporate into its peer reputation system, and potentially disconnect from peers that are shown to consistently publish “bad” events and stop re-sharing those events around the network.
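
For illustration, such a feedback surface could be as small as the following hypothetical Rust trait; all names (BadEventReason, EventFeedback, etc.) are invented for the sketch.

```rust
type Cid = String;
type PeerId = String;

/// Reasons a higher layer might flag an event as "bad". Hypothetical only;
/// rust-ceramic does not expose such an API today.
enum BadEventReason {
    SchemaValidationFailed,
    FlaggedAsSpam,
    Other(String),
}

struct BadEventReport {
    event: Cid,
    publisher: PeerId,
    reason: BadEventReason,
}

/// Feedback surface from the application layer down into the node, which the
/// node can fold into peer reputation and re-sharing decisions.
trait EventFeedback {
    fn report_bad_event(&mut self, report: BadEventReport);
    /// Whether events from this peer should still be re-shared on the network.
    fn should_reshare_from(&self, peer: &PeerId) -> bool;
}
```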


That would be one possibility.

There are two scenarios we want to protect against, regardless of the maliciousness of the action

  • A producer writing data to the Ceramic node that is malformed for its intended audience, e.g. writing an AI prompt set to a stream that is supposed to be a VC, or spamming repeated data.
  • A node sending data to the network that is malformed for its intended audience, e.g. corrupted from the original data for its intended purpose, or duplicated

We can then look to address this with local or network approaches. For local, we could have

  • DID / IP / PeerId allow or deny lists
  • Pluggable Validation - runtime loaded module with functionality that is invoked, likely by separator
  • Scriptable Validation - script that is loaded at some point and invoked, likely by separator, our use of json schema is a variation of this for models and MIDs

For network, we could have

  • Reputation for Event / DID / IP / PeerId - other nodes publish to a stream or hit an endpoint on a peer to provide feedback on a set of data, which allows us to determine which data to accept going forward
  • Remote Validation - We publish or request from a network of validators, likely by separator, that validate the data, allowing us to then publish to the data network only validated data

There are probably other approaches I’m not thinking of, and these approaches aren’t mutually exclusive. We should at least have some level of protection in rust-ceramic and the protocol to begin with, and can continue to harden that. The main reason I bring up these approaches is that some will likely need to be incorporated into the protocol, and will influence the capabilities of rust-ceramic. Aggregation has brought us a base level of security since we can discard things that don’t conform to aggregation; without aggregation, we need a different approach to securing the network.

We should be able to answer the question “is this data valid for this stream, or is it mistaken/malicious”.
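
As a sketch of the “Pluggable Validation” option above (types and names are illustrative only), a node could keep a registry of validators keyed by separator and fall back to the minimal signature/time checks when no validator is registered:

```rust
use std::collections::HashMap;

type Separator = String; // e.g. a model stream id

/// A validator loaded at runtime for a given separator; one possible shape
/// for the "Pluggable Validation" option above.
trait SeparatorValidator {
    fn validate(&self, payload: &[u8]) -> bool;
}

struct ValidationRegistry {
    validators: HashMap<Separator, Box<dyn SeparatorValidator>>,
}

impl ValidationRegistry {
    /// Events whose separator has no registered validator fall back to the
    /// minimal signature/time checks only.
    fn validate(&self, separator: &Separator, payload: &[u8]) -> bool {
        match self.validators.get(separator) {
            Some(validator) => validator.validate(payload),
            None => true,
        }
    }
}
```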

This is super great discussion!
In addition to the Rust/JS difference, I think it would also help to set a clearer definition of protocol vs. apps/indexers for the ecosystem. ComposeDB seems to be much more of an app/indexer, while CAS and Recon nodes are part of the core protocol. This is not apparent to new devs.

We should be able to answer the question “is this data valid for this stream, or is it mistaken/malicious”.

I still don’t see how validation helps us answer this question much.

It seems like there are three options:

  1. Support one, or a few, ways to validate data
  2. Support arbitrary ways to validate data through some programmable logic
  3. Don’t validate data

(1) greatly limits what can be built on the protocol, and (2) would be great but is hard to get right. So to me (3) seems like the best option.

There are a few use cases, which we’ve heard desire for from users, that make me convinced that (3) is the best option: the ability to point to files and folders (e.g. various IPLD DAG structures), the possibility of deleting data, and storing encrypted data. These features all require a “detached payload”, i.e. the event only contains the CID of the payload.
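
For illustration, a detached-payload event could be as simple as the following sketch; the struct and field names are hypothetical.

```rust
type Cid = String;

/// Sketch of an event with a detached payload: the signed envelope commits to
/// the payload only by CID, so the bytes can reference an IPLD DAG, be deleted
/// later, or remain encrypted. Field names are illustrative.
struct DetachedPayloadEvent {
    payload_cid: Cid,          // what the signature covers
    signature: Vec<u8>,
    payload: Option<Vec<u8>>,  // may be absent: deleted, encrypted elsewhere, or fetched lazily
}
```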

Yes, this is definitely needed. I think that drawing a clear boundary of what the rust-ceramic code base does is the first step to help clarify some of this!

I really don’t see how additional validation over data helps here. An attacker can easily create a lot of “bad” events that conform to the rules of the validation logic and have the same effect on the network overall.

It really comes down to how fat the protocol should be. Is Ceramic more like Arweave or Ethereum?

Naively, I do hope it would be possible and “relatively easy” to define and deploy custom stream and validation logic. I have always thought of custom stream types as the Ceramic protocol’s version of “smart contracts”. Maybe this could be a good research topic.

This also comes back to the same pain points @msena brought up in the last core dev call. Say we have a points system that transforms user actions → points; a custom stream type where de-dup logic is defined by the third-party developer seems to be a natural fit.


You’ve dismissed this option as “hard to get right”, but haven’t elaborated why. Some of the options listed above address this, and most are fairly easy to accomplish and get right.

Framing it another way, what prevents us from supporting arbitrary ways to validate data through some programmable logic? I think the answers to this have impacts on network behavior and incentives.

Fair, I think that for custom validation logic to work well at the protocol level we need to ship code around for that validation logic (e.g. with WASM). We need to think about how to prevent DoS through loops in the code, and we need to think about whether it’s possible to upgrade the code and, if so, how and with what implications.
If instead it were something configured per node, we could maybe get around these problems, but it wouldn’t really be part of the core protocol at that point.

To be clear. I don’t think custom validation logic is a bad idea in general. I just think that in terms of priority it comes later than decentralized anchoring and incentivized event availability.

Interestingly, those are actually solved problems with validation logic in WASM. The code is upgradeable if it conforms to an interface, and could be limited by cpu, memory, and/or execution time. If it exceeds those limits, the event is also rejected.
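
As a sketch of what such an interface plus resource limits might look like (names are illustrative only; enforcement inside an actual WASM runtime, e.g. fuel or memory metering, is not shown):

```rust
use std::time::Duration;

/// Resource limits applied to a loaded validation module; exceeding any of
/// them causes the event to be rejected, as described above.
struct ValidationLimits {
    max_memory_bytes: usize,
    max_execution_time: Duration,
    max_instructions: u64,
}

/// The stable interface a validation module must conform to. As long as new
/// versions keep this shape, the module behind it can be swapped or upgraded;
/// who is allowed to publish upgrades is the open trust question.
trait ValidationModule {
    fn version(&self) -> u32;
    fn validate(&self, payload: &[u8], limits: &ValidationLimits) -> Result<bool, String>;
}
```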

This is what I alluded to above. Should every node on the network care about every event, or just those that it has validation code for? I think this is a lot of network overhead for something that is better solved with incentivization. We don’t have to ship code around to every node, we only have to ship or make available code to nodes that are incentivized to run it and will receive events related to that validation.


Limiting resources sounds great. I’m not sure an interface solves upgradability though. The main question here is more about who is allowed to publish new code, i.e. the trust model. Afaik there’s no easy solution to this problem. If it’s the initial author that is allowed to upgrade, everyone using that particular type of data now has to trust that developer.

We have the same issue with Ceramic servers. Right now we expect developers to trust us and the source they run, and eventually, if we distribute binaries, our GPG key. For now, we could do something very similar to the stream types, in that these plugins come from our network. Eventually the protocol will need to incentivize trusting node runners and validation plugins.