Protocol minimization

We need to do something to protect node operators from spam. To be clear, my suggestion wasn’t just for programmatic validation of event schemas. I actually think the early versions of this might include very manual spam detection processes that could then evolve into more automated and programmable systems over time. But even if devs and node operators are curating out spam manually, it’s still helpful to have APIs that can feed that information back down into the p2p layer, so that if a peer is identified as consistently providing spam, the node can simply disconnect from that peer and stop receiving new data from them.

Yes, some kind of access control will be needed. Whether that’s defined per node or protocol-wide through some on-chain logic is something we still need to research.

This is not exactly the same. People who operate nodes can decide when and how to upgrade their Ceramic nodes. If there is validation logic built into streams that is upgradable, it would likely need to be upgradable without nodes having to upgrade, e.g. over the network. So node operators would have less agency in this case.

Even though I’ve been a proponent of rewriting all of js-ceramic for a long time, I like this ^ approach. It forces us to be explicit and intentional about what we bring over from the previous version of the protocol and JS implementation to the new version - now, later, or ever.

This lets us simplify the protocol based on feedback we’ve received from the community over the years (and more recently with Ceramic API and Recon), and generally lets the community be part of evolving the protocol from this foundation.

@spencer, w.r.t. Node.js performance, what if we worked on scaling deployments with sharding? We need a sharding story regardless of whether aggregation is in Rust or Node.js. That way, solving for performance becomes less urgent while we focus on decentralization and incentivization. Both of these are existential and will need many months of work to be production-ready, not to mention that we’re still working on making Recon production-ready.

@dbcfd, w.r.t. aggregation, trying to provide a Rust implementation feels like making the same mistake PL did with IPFS. Not only did they become a bottleneck for IPFS implementations, but none of the implementations are suitable for production (IMO). They tried to make IPFS be everything for everyone and ended up with it being not good enough for anyone. Their latest approach of specifying what IPFS is and offering existing implementations as reference implementations feels a lot more scalable.

We could specify what aggregation needs to be and enumerate a few different forms it can take, without being prescriptive and inflexible, or providing actual implementations, and offer js-ceramic as a reference implementation. Maybe we do provide implementations in the future but that (a) doesn’t slow us down on other fronts now, (b) doesn’t let aggregation go off the rails with lots of (potentially incompatible) implementations, and (c) engages the community more. We’re still in the process of learning what the community even needs re: aggregation, and it feels premature to come up with implementations at this point.


I think this falls into the category of a future enhancement, possibly not needed. I view this as a future enhancement because it requires:

  • “hot” reload of validation - Nodes already get upgraded/restarted as-is; validation could be upgraded at the same time, with the install of something like a ceramic-validators package
  • Upgrades to validation logic - Since we’re talking about data exchange formats, these will likely not need upgrades often. For example, a MID document is valid if it is JSON conforming to some schema. For most of our users, this might be the only validator needed, and it will be updated very infrequently.

I don’t think we need to support aggregation in rust-ceramic, just validation. Validation is a much narrower scope that is more easily defined, and if in place, makes aggregation easier. It also acknowledges that not all applications need aggregation.

Most importantly, it’s something that could be added and expanded on fairly easily.

  • M0 - Ability to add a validator that isn’t part of the binary, e.g. a shared module. This first validator should accept all events
  • M1 - Ability to specify a validator that works on specific events
  • M2 - Ability to add a validator that validates model events against a JSON schema
  • M3 - Ability to write validators in some language other than rust
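The M0/M1 milestones above could look something like the following sketch. All names here (`Event`, `Validator`, `ValidatorRegistry`) are illustrative, not actual rust-ceramic APIs; it just shows a registry where validators live outside the node binary, with an accept-all default (M0) and per-stream-type validators (M1).

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Event:
    stream_type: str   # e.g. "MID"; hypothetical field name
    payload: dict

# M0: a validator is just a callable; the default accepts all events.
Validator = Callable[[Event], bool]

def accept_all(event: Event) -> bool:
    return True

@dataclass
class ValidatorRegistry:
    # M1: validators can be registered for specific event/stream types.
    validators: dict = field(default_factory=dict)
    default: Validator = accept_all

    def register(self, stream_type: str, validator: Validator) -> None:
        self.validators[stream_type] = validator

    def validate(self, event: Event) -> bool:
        # Fall back to the accept-all default for unknown stream types.
        return self.validators.get(event.stream_type, self.default)(event)

registry = ValidatorRegistry()
registry.register("MID", lambda e: isinstance(e.payload.get("data"), dict))

assert registry.validate(Event("MID", {"data": {}}))    # M1 validator runs
assert registry.validate(Event("unknown", {}))          # M0 accept-all
assert not registry.validate(Event("MID", {"data": "spam"}))
```

M3 (validators in languages other than Rust) would then be a matter of loading such callables from, e.g., a WASM module or a shared library rather than from in-process code.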

We could go the reputation route, but pluggable validators seems much more doable.

Just so we’re on the same page, aggregation = validation + conflict resolution + resolving stream state?

To be sure, validation in some form will be necessary but including it in the protocol ties the ability to synchronize data with the nature of the data itself. They feel like they belong in separate layers of abstraction, e.g. (loosely) TCP vs HTTP. We wouldn’t want the lower layer protocol to know or care about the higher layer’s syntax or semantics. In fact, even if we did implement validation in Rust, it still shouldn’t be part of Ceramic Recon (IMO), which would primarily cover the synchronization, anchoring, and incentivization pieces.

We’ll need some sort of reputation system anyway, because syntactic correctness alone doesn’t guarantee the data is good. It feels incomplete to validate the structure of the data only to accept well-structured spam.

Also, something like JSON schema validation (for example) feels straightforward enough that application developers are unlikely to spend a lot of time implementing it themselves, or get it wrong / end up with incompatible variants? I’m not very familiar with all our current validation rules so I might be wrong. Regardless, what would implementing it ourselves buy us that just specifying it wouldn’t?
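To make the “straightforward enough” claim concrete, here is a hand-rolled sketch of JSON-schema-style structural validation covering only `type`, `required`, and `properties`. A real deployment would use a full JSON Schema library; this just illustrates that the core check is easy to specify.

```python
import json

def check(instance, schema):
    """Check a tiny subset of JSON Schema: 'type', 'required', 'properties'."""
    types = {"object": dict, "string": str, "integer": int, "array": list}
    expected = schema.get("type")
    if expected and not isinstance(instance, types[expected]):
        return False
    if isinstance(instance, dict):
        # Every required key must be present.
        if any(key not in instance for key in schema.get("required", [])):
            return False
        # Present keys must satisfy their subschemas.
        for key, subschema in schema.get("properties", {}).items():
            if key in instance and not check(instance[key], subschema):
                return False
    return True

schema = {
    "type": "object",
    "required": ["title"],
    "properties": {"title": {"type": "string"}},
}
assert check(json.loads('{"title": "hello"}'), schema)
assert not check(json.loads('{"body": 42}'), schema)  # missing "title"
```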

Since streams are immutable, we’ll have to include the validator version in the stream somehow so that a new node looking at an old stream can validate it the way it was when the events were first created. If a validator version does need to be bumped for any reason, how will that be reflected in the stream state so that old nodes and new nodes all resolve to the same state? Will various versions of validator modules be served over the network?

Implementing all of this ourselves, we’ll also have to deal with unknown unknowns, edge cases, community feedback, bug fixes, DX improvements, security hardening, etc. Are we going to have a catalog of validator modules? How’s that going to be maintained? Is there going to be a reputation system for 3rd party contributors, or are we going to vet each community contribution? Are we going to implement all the modules ourselves?

Even if we start small, it feels like we’ll still have to discuss and design for many of these considerations upfront. Only at >= M2 do we reach parity with existing js-ceramic validation, which means we’ll have to plan at least that far ahead.

None of this is to say that this isn’t doable or desirable but instead that while the initial implementation might be straightforward, the production-ready version will take many months of work, taking away time from other, more existential problems like decentralization and incentivization.

Maybe I’m overestimating the work? In my experience, we’ve often erred on the side of underestimating how much time such feature development takes, but I’m also not the best person to estimate the work in this case. Just trying to be cautious 🙂

This doesn’t really make sense to me. For example, with MIDs, updates will be encoded as json-patches. I don’t think there’s an easy way to validate that “if I apply this json-patch to some unknown previous state, it will be valid according to this json-schema”.
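The difficulty can be shown in a few lines: the same patch can yield a valid or invalid document depending entirely on a previous state the patch doesn’t carry. This is a toy illustration (a single `replace` op, not a full RFC 6902 implementation, and a stand-in schema check).

```python
import copy

def apply_patch(doc, patch):
    """Apply a minimal json-patch: only the 'replace' op, top-level paths."""
    doc = copy.deepcopy(doc)
    for op in patch:
        assert op["op"] == "replace"  # sketch: one op type only
        key = op["path"].lstrip("/")
        doc[key] = op["value"]
    return doc

def is_valid(doc):
    # Stand-in for a json-schema check: "count" must be an integer.
    return isinstance(doc.get("count"), int)

patch = [{"op": "replace", "path": "/name", "value": "b"}]

# The same patch produces a valid result against one previous state...
assert is_valid(apply_patch({"count": 1, "name": "a"}, patch))
# ...and an invalid result against another, so the patch alone can't be
# validated against the schema.
assert not is_valid(apply_patch({"count": "bad", "name": "a"}, patch))
```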

I actually thought about these lines as well. I don’t think we’ve really defined what layer Recon falls at. If it’s L3, like IPv4, it has an idea of what the next layer is, whereas TCP just carries data. Also, TCP as a server by itself is usually not useful, and is combined with another layer like HTTP. For Recon, we might also have to look at UDP, since TCP has aggregation (streams) built into the protocol. Both TCP and UDP do have checksums, which are a form of validation, in the sense that the data is not corrupted.

This is actually a very good reason for talking about validation now. Even if we’re not doing validation right now, how would we represent in Recon that this is a valid event, and that it was valid at time x and not before? Without that information, and with the reputation route, does that event then become invalid and impact node/producer reputation?

Given that we’ve already been doing this ourselves, I think the question above reinforces that, for the time being, this might be something we want to do, since we’re the most familiar with all of these issues. Moving validators forward without losing data or impacting the network is something we will end up being responsible for, even if we’re not writing the validators.

I feel that being able to determine validity of events on the network is more existential to the network than decentralization. The data on the network has to be usable, otherwise there’s no point in it being decentralized. And incentivization can be done at a lot of levels, with incentivization of the whole ceramic network only being existential to us.

Valid. Having personally done this, I don’t find it to be a lot of work, but I also may not be the one doing the work.


That’s an example where you need to know the previous state. There are also cases where you don’t need to know the state, such as when every event is its own document and is in the same space for namespacing purposes, e.g. sensor1stream.

For validators that need to know the previous state, this is where it feels wrong to just punt to a higher level aggregator. Using Mohsin’s TCP example, I know I’m at the latest message due to following the sequence numbers. With Recon, how do I tell in which order the events are applied to make the expected state?

Punting things like this may save us work, but makes the network less usable, since I now need to know how an application is communicating across its aggregating/indexing nodes, and be able to replicate that so that I can use its data.

I don’t want to stretch the analogy too far 🙂 It was mainly an argument for a separation of concerns between synchronizing data bytes and validating data structure. FWIW, I think the TCP/UDP checksum is closer to signature validation than schema validation. libp2p works similarly - it validates the signature of p2p messages but doesn’t care about the payload itself, which it leaves to higher protocols.

That a TCP server isn’t particularly useful by itself is true but also not what the current proposal is recommending for Ceramic/Recon. Just like TCP (or libp2p, or Kafka) is useful in the context of higher layer protocols, so will Ceramic/Recon be in the context of some stream aggregation, indexing, etc. implementation.

That’s a bit of a false dichotomy. An argument for a Ceramic layer that only includes signature validation isn’t an argument for skipping schema validation entirely but for it to instead be part of a higher layer that the current spec acknowledges but remains silent on. It’s an argument for a stricter separation of layers, not for a Ceramic network that only does stream synchronization.

ComposeDB is already an example of this. If we stopped at adding signature validation to rust-ceramic, a network of ComposeDB nodes would continue to provide schema validation and aggregation. All applications relying on ComposeDB would only see validated stream updates. Should a partner choose to implement their own validation, we should instruct them (via spec) about what problems they need to address in their implementation, based on our own experience implementing aggregation. Or they could keep using the default implementation via js-ceramic.

Validating Data Events

  • The structure is valid
  • The signature is valid
  • The signature chains up to the controller’s DID

The notable exception here, compared to the OP’s criteria for considering an event valid, is that the OCAP may be expired. This is because figuring out the times of events is up to the aggregation layer. Moreover, the aggregation layer may use events signed by expired OCAPs as part of repairing a stream. This means that it needs to be made aware of such events.
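The three checks above could be sketched as follows. `verify_signature` and `resolve_did` are hypothetical placeholders, not real APIs; note that OCAP expiry is deliberately *not* checked here, per the discussion, and is left to the aggregation layer.

```python
def validate_data_event(event, verify_signature, resolve_did):
    # 1. The structure is valid (envelope, data, and OCAP blocks present).
    if not all(k in event for k in ("envelope", "data", "ocap")):
        return False
    # 2. The signature on the envelope is valid.
    if not verify_signature(event["envelope"]):
        return False
    # 3. The signature chains up to the controller's DID.
    signer = event["envelope"]["signer"]
    controller = resolve_did(event["ocap"]["issuer"])
    return signer in controller["authorized_keys"]
    # NOTE: no OCAP expiry check; event times are the aggregation layer's job.

event = {
    "envelope": {"signer": "did:key:z1"},
    "data": {"hello": "world"},
    "ocap": {"issuer": "did:pkh:eip155:1:0xabc"},
}
assert validate_data_event(
    event,
    verify_signature=lambda env: True,  # assume the cryptographic check passes
    resolve_did=lambda did: {"authorized_keys": ["did:key:z1"]},
)
```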

The blocks in the event CAR file should include:

  • The signed envelope
  • The data
  • The OCAP

Validating Time Events

  • The structure is valid
  • TimeEvent/prev == TimeEvent/proof/root/${TimeEvent/path}
  • TimeEvent/proof/root is in the root store (i.e. the blockchain / local cache).

The blocks in the event CAR file should include:

  • The Time Event
  • The proof
  • The root
  • The blocks along the path
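The `prev` check above (`TimeEvent/prev == TimeEvent/proof/root/${TimeEvent/path}`) amounts to walking the path down the Merkle tree from the root and comparing the leaf reached to `prev`. A sketch, modeling the tree as nested lists of CID strings:

```python
def resolve_path(root, path):
    """Walk a '/'-separated path of 0/1 indices down a binary Merkle tree."""
    node = root
    for step in path.split("/"):
        node = node[int(step)]
    return node

# Toy tree of CIDs; real nodes would resolve linked blocks from the CAR file.
root = [["cid-a", "cid-b"], ["cid-c", "cid-d"]]
time_event = {"prev": "cid-c", "path": "1/0", "proof": {"root": root}}

assert resolve_path(time_event["proof"]["root"], time_event["path"]) == time_event["prev"]
```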

Time Events are validated because there exist Anchor Dag Cbor (0x97ad09eb) transactions on the blockchain. The blockchain stores them and Ceramic nodes can cache them. Once a root is validated, nodes never need to call Ethereum again for that root.

Anchor Dag Cbor (0x97ad09eb) Transactions

https://etherscan.io/advanced-filter?mtd=0x97ad09eb~Anchor+Dag+Cbor

For existing anchors:
9,693 * 36 bytes (32 byte anchor root + 4 byte seconds since 1970) ≈ 348.95 kilobytes
(Timestamps will require 4 bytes until 2106-02-06T22:28:15)

For future anchors (targeting a rate of 1 anchor per hour):
365 * 24 = 8,760 anchors per year
8,760 * 36 bytes ≈ 315.36 kilobytes per year
This comes out to ~0.95 MB for 3 years of hourly anchoring, ~31.54 MB for 100 years of hourly anchoring, and so on, which is a minuscule amount of data.
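Assuming the 36-byte entry size stated above (32-byte anchor root + 4-byte timestamp), the totals can be checked in a few lines:

```python
ENTRY = 32 + 4                # bytes per anchor entry (root + timestamp)
existing = 9_693 * ENTRY      # existing anchors to date
per_year = 365 * 24 * ENTRY   # hourly anchoring going forward

assert 365 * 24 == 8_760           # anchors per year
assert existing == 348_948          # ≈ 348.95 KB
assert per_year == 315_360          # ≈ 315.36 KB per year
assert round(100 * per_year / 1e6, 2) == 31.54  # ≈ 31.54 MB per century
```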

Validating Init Events

We don’t have to treat signed Init Events any differently than Data Events.

This is actually an interesting question. I would actually argue for the rust-ceramic layer to take care of validating that expired OCAPs are timestamped before they were expired. Otherwise I think this could open up a DoS vector where stolen session keys could be used to spam nodes.


We could wait for a Data Event to be anchored before propagating it, but that would introduce major latency for discovering new Data Events. We could check signatures but not expiration time, but that opens the DoS you describe with an expired CACAO.
Perhaps the best middle ground is to propagate an event if it is either anchored or its CACAO is valid as of Now(), with nodes pulling out any event that fails to gain a covering Time Event by the time its CACAO expires. This may be a little annoying if the Ceramic nodes conducting a Recon sync have very different opinions as to what Now() is. We may need to send a timestamp with the first message in a sync to pick an as_of timestamp to be the “now” for the sync, so both sides agree on what is expired-without-being-covered as of the as_of time.
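The middle-ground rule above can be sketched in a few lines. Names and the min-of-clocks agreement are illustrative assumptions, not a specified protocol: both sides exchange their clocks in the first sync message and use the earlier one as `as_of`, so they agree on what counts as expired.

```python
def agree_as_of(now_a, now_b):
    # First sync message carries a timestamp; both sides pick the min,
    # so they share a single notion of "now" for the whole sync.
    return min(now_a, now_b)

def should_propagate(event, as_of):
    # Propagate if anchored, or if the CACAO is still valid as of the
    # agreed sync time.
    return event["anchored"] or event["cacao_expiry"] > as_of

as_of = agree_as_of(now_a=1_700_000_000, now_b=1_700_000_120)
assert as_of == 1_700_000_000

assert should_propagate({"anchored": True, "cacao_expiry": 0}, as_of)
assert should_propagate({"anchored": False, "cacao_expiry": 1_700_000_500}, as_of)
assert not should_propagate({"anchored": False, "cacao_expiry": 1_699_999_000}, as_of)
```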


Given the 24-hour grace period for CACAO expirations, we only need the nodes’ clocks to be synchronized within 24 hours of each other, which seems like a very reasonable requirement.
