CIP-137: Ceramic API

Reading through previous comments (one from @AaronDGoldman in particular) made me realize that I think this API as designed is only set up for a single client at a time. If we want to support multiple clients using the same Ceramic node to consume different subscription sets, then we probably need a concept of a subscription_id that is returned from ceramic_subscribe and accepted in the poll, recon, reconPoll, and exportRawEvents functions (basically any of the “read” methods).
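
A minimal sketch of what that plumbing could look like, written as TypeScript-style signatures. Every name and field here is illustrative only; the point is just that the read methods take the subscription_id returned by ceramic_subscribe, nothing here is in CIP-137 as written:

```typescript
// Hypothetical shapes only: names and fields are illustrative, not part of CIP-137.
type Event = unknown; // placeholder for whatever the event payload ends up being

interface SubscribeRequest {
  sortKey: string;    // e.g. "model"
  startValue: string; // inclusive start of the interest range
  stopValue: string;  // exclusive end of the interest range
}

interface SubscribeResponse {
  subscriptionId: string; // identifies this client's subscription set on the node
}

interface CeramicApi {
  ceramic_subscribe(req: SubscribeRequest): Promise<SubscribeResponse>;
  // Every "read" method takes the subscriptionId, so one node can serve
  // multiple clients consuming different subscription sets.
  ceramic_poll(subscriptionId: string, count: number): Promise<Event[]>;
  ceramic_exportRawEvents(subscriptionId: string): Promise<Uint8Array>; // e.g. a CAR file
}
```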

HTTP/2 brings both performance and usability improvements: it supports multiplexing multiple requests over the same TCP connection. This improves performance, but it is also the feature that enables streaming RPC methods, since those streams can run concurrently with other requests.

Yeah, package size is a concern; how do we determine what size is too large? My experience says 300K is on the larger end but not a deal breaker. Do we have a target size?

There’s another set of problems that this CIP doesn’t go into at all currently, which is around event “finalization”. The timestamp at which a Data Event gets anchored via a Time Event can affect the validity of the Data Event (e.g. if the Time Event tells you that the key used to author the Data Event was expired). Thus, some users may wish to only sync Data Events that have been anchored and have a corresponding Time Event, rather than proactively syncing unanchored events only to later learn that they are invalidated and need to be rolled back.

So one simple change we can make (and that I think we should make) is to just add an option to the read APIs for whether or not to include unanchored Data Events.

I worry that there will still be tricky usability issues even with that in place though. Let’s imagine there’s a long stream with hundreds of Data Events in between each Time Event. If a client wants to avoid applying Data Events until they know the corresponding Time Event and thus can be sure that the Data Event is valid and won’t need to be rolled back - how do they do so? The obvious naive way would be to queue up Data Events in memory in the client until the corresponding Time Event is received and only apply the Data Events at that point. That could require queuing a lot of Data Events in memory though. I wonder if there’s a way that the node could send the Time Event at the beginning of the batch of Data Events that it anchors rather than the end, so that the client could know the timestamp information up front.
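
To make the naive approach concrete, here is a rough sketch of the client-side buffering described above. The event shapes and field names are assumptions for illustration, not part of any spec:

```typescript
// Assumed, simplified event shapes for illustration; the real events are IPLD objects.
type DataEvent = { streamId: string; cid: string };
type TimeEvent = { streamId: string; anchorsUpTo: string }; // CID of the last Data Event it anchors
type AnyEvent = { kind: "data"; event: DataEvent } | { kind: "time"; event: TimeEvent };

// Queue Data Events per stream and only apply them once a covering Time Event arrives.
const pending = new Map<string, DataEvent[]>();

function handle(e: AnyEvent, apply: (d: DataEvent) => void): void {
  if (e.kind === "data") {
    const queue = pending.get(e.event.streamId) ?? [];
    queue.push(e.event);
    pending.set(e.event.streamId, queue); // memory grows until the next anchor
    return;
  }
  // Time Event: everything buffered up to and including `anchorsUpTo` is now safe to apply.
  const queue = pending.get(e.event.streamId) ?? [];
  const idx = queue.findIndex((d) => d.cid === e.event.anchorsUpTo);
  if (idx === -1) return; // anchor covers events we have not buffered (or already applied)
  queue.splice(0, idx + 1).forEach(apply);
}
```

This makes the memory concern explicit: the pending map grows without bound between anchors, which is exactly why receiving the timestamp information up front would help.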

That then begs the question of what the trust model is between the client and the node. Do the clients trust that, if the node gives them a Time Event for a range of Data Events, that Time Event does indeed map to that range of Data Events? Or are clients expected to validate the link between Data Events and Time Events themselves?

As a quick note, gRPC is supported in browsers via grpc-web (https://github.com/grpc/grpc-web). I don’t have first-hand experience with that project, but it seems to be well maintained and quite popular.

@0xEE3CA4dd4CeB341691
“Stream Set” and “Interest Range” are very similar concepts and we may be able to drop one if we are clear in how we are using them.

  • Stream Set is defined by a sort-key. For example, the model stream set would have all events with a model header: all Models and Model Instance Documents. The schema stream set would have all the Tile Documents.
    • Event Set would be a more precise name, as it is all events in the streams, not just the single tip of each stream as in an earlier design.
  • Interest Range is a slice of a Stream Set. It has a start and a stop sort-value. We define the Interest Range as all events in a stream set from the start-value (inclusive) to the stop-value (exclusive).

Since we sort the events in a stream set by sort-key, then controller, then StreamID, then event height, by defining a sort-value and a range of controllers we can shard the stream set across many nodes.
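
To make the two terms concrete, here is a hedged sketch of how they could map onto types. Field names are illustrative; the authoritative definitions live in the EventId and Recon specs:

```typescript
// Illustrative only; the authoritative encoding is in the EventId and Recon specs.
interface InterestRange {
  sortKey: string;    // e.g. "model", selecting the model stream set
  startValue: string; // inclusive
  stopValue: string;  // exclusive
}

// Events in a stream set are ordered by this tuple, so a node can be assigned
// a shard of the set by restricting the controller range under one sort-value.
type SortTuple = [sortValue: string, controller: string, streamId: string, eventHeight: number];
```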

@paul
ceramic_subscribe is trying to accomplish two things at the same time: tell a Ceramic node that it should sync that interest range, and send events to the subscriber. When a node learns about an event from Recon synchronization, it will track the anchoring time and the time the node learned about that event. This allows a node to pull from a given time for some number of events. These times could be anchor times, which means the same times can be used with any node, but that would be high latency since the timestamps only appear after anchoring. The local node timestamps are low latency but differ between nodes.

@nathanielc
gRPC has the advantage and disadvantage that the protobuf files pin down the schema. I think the value of backward compatibility often leads to only using repeated or optional fields and never required ones. If I end up with an only-optional protobuf, is that actually a win over a more self-describing format like CBOR?

@spencer
I agree that we need more description of pagination controls. With temporal pagination I imagine it will be something like after=timestamp, count=int, with the node filtering the events that match. If a client wants all the events independent of when they happened, then I would expect pagination like sort-key=, start-value=, stop-value=, count=int with no after. We should have two time filters: one for local node time and one for anchored time, so we could have a filter like anchored=timestamp and not return any event that was not anchored before that timestamp.

maybe something like:

  • sort-key (string): identifies the stream set
  • start-value (string): the first event to return
  • stop-value (string): the first event not to return
  • after (timestamp): only return events the node learned about after this time
  • anchored (timestamp): only return events anchored after this time
  • count (int): return at most this number of events
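
For concreteness, the same parameters could be expressed as a request type. This is only a sketch of the proposal above, not an agreed API, and the field names are illustrative:

```typescript
// Sketch of the query parameters proposed above; none of these names are final.
interface EventQuery {
  sortKey: string;    // identifies the stream set
  startValue: string; // the first event to return
  stopValue: string;  // the first event not to return
  after?: number;     // only events the node learned about after this (local) time
  anchored?: number;  // only events anchored after this time
  count: number;      // return at most this many events
}
```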

  1. Yes I think dynamic subscriptions will be important for DX

  2. No, the sort-key is only the first part of the sort order, see the EventId spec

  1. The main spec is a bit out of date; for Nathaniel’s and my latest thinking (besides the OpenAPI vs gRPC debate), see this PR: https://github.com/ceramicnetwork/CIPs/pull/141

  2. Also see the PR above, added a description there.

  3. Pagination is handled using a byte offset (see the PR above); this is similar to how subscriptions are handled in Kafka.

  1. See the recon spec?

  2. Just added to the PR referenced above

  3. If the user wants to get the total order of events in the network they need to use recon (or do post processing on the poll/subscribe api)

  1. Maybe, but how do you specify a range? As a recon range? What happens if new events come in during your request? Anyway, if we end up using OpenAPI, requests that use the CAR file media type will automatically return the raw events so this method is not really needed anymore

  2. Maybe we could have both? In the OpenAPI model using CAR files this might actually be quite simple to do as a single API? I thought adding a single event would be much more common. What use cases do you imagine that need to add a large number of events all at once?

  3. Good idea!

  4. There are some similarities, especially as we move towards using a byte offset approach. @nathanielc can say more here.

The intention with using the offset is that the node doesn’t need to keep any state about clients at all.
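
For illustration, a stateless, offset-based consumer could look roughly like the following, loosely analogous to a Kafka consumer tracking its own offset. The poll function and Batch shape are assumptions, not the actual API:

```typescript
// Hypothetical client loop: the client, not the node, remembers how far it has read.
interface Batch {
  events: Uint8Array[]; // raw events, e.g. CAR-encoded
  nextOffset: number;   // offset to pass on the next poll
}

// `poll` stands in for whatever the read method ends up being called.
declare function poll(subscriptionId: string, offset: number, maxBytes: number): Promise<Batch>;

async function consume(subscriptionId: string, handleEvent: (e: Uint8Array) => void): Promise<void> {
  let offset = 0; // persisted by the client between runs, like a Kafka consumer offset
  for (;;) {
    const batch = await poll(subscriptionId, offset, 1 << 20);
    batch.events.forEach(handleEvent);
    offset = batch.nextOffset;
  }
}
```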

How much would we actually get from multiplexing though? If we end up with clients that have a lot of subscriptions that makes a lot of sense, but many clients might actually end up subscribing to only a small number of models?

Can you say a bit more about what it means to be “streaming RPC methods”?

300K in total is ok, but not ideal. However, this would be for only one of our deps.

Nice, this one is quite small indeed.

This could definitely be a challenge, and might make the byte offset approach intractable (cc @nathanielc).

Added this as a discussion topic on the agenda for the next core-devs call.

Sorry, I still don’t follow. The EventId includes the controller. Is the issue just that the last 8 bytes of the controller aren’t sufficient to establish the range we need, so we need the full controller? If that’s the case, then wouldn’t we want the same for the StreamID and the “sort value” (the model streamid)?

Can we break that PR up to separate the discussion about what transport layer to use from discussions about the core semantics of the API? It makes it pretty hard to follow the important changes to the CIP when PRs sneak in big changes that are unrelated to their title and description.

I think we should strongly consider a stateful cursor-like interface, otherwise the server has to do a lot more work to seek to the proper place in the output stream for each new batch.

I don’t follow this, can you explain more?

Yeah, that seems reasonable. I presume the API would only guarantee to return events that are present at the beginning of the request. The high-level semantics are the same either way: even if the user asks for a list of event IDs and then follows up with requests for the data of those event IDs, they still have to worry about the result set changing in the meantime.

Why have both? If you have an API to put multiple events at once you can always use it with just a single event. It’s strictly a superset of functionality.

Between discussions in this thread and PR changes/comments, I find it difficult to form a clear idea of the high-level goals we have for these APIs and how they fit into our larger stack, so here are the main questions and comments I have:

Designed for ComposeDB or generic?

Between the different PRs and threads, it seems to me that part of the design is “ComposeDB-specific” while other parts are meant to be generic, so it gets tricky to understand the actual scope of these APIs and their target usage?

If ComposeDB-specific, it doesn’t seem to fully address the needs, in particular:

  • Ability to load a stream by stream ID (I think Spencer mentioned this)?
  • Subscriptions management: does the client need to handle this? If so, it puts a lot of the burden on any downstream implementation to support similar logic.
  • Remote-only APIs: from previous discussions I thought we wanted to avoid having both a Ceramic server and a ComposeDB server, but it seems we’re back to needing both?

If meant to be generic, it seems at least some of the design decisions are purely driven by ComposeDB, notably:

  • No ability to filter by stream type (if I understood Spencer’s comments well)
  • The model/controller/streamid hierarchy of events matches ComposeDB’s current needs, but before, with IDX, we were controller-centric; are we confident this is the right constraint for years to come and for systems other than ComposeDB?

Target usage

Beyond the ComposeDB vs generic design, it’s not clear to me what the clients are supposed to be, notably among these 3 categories:

Large apps backend

This seems to be the main target, with a dedicated Ceramic server or cluster running on a Docker/K8s infra?

Web browsers

Browser support seems to be the topic of some discussions; are we expecting Web apps to handle complex aggregation logic without an intermediary server?

Low-end devices

Do we expect Ceramic APIs to be used for these? If so, I see 2 opposite needs here:

  • Mobile apps will need aggregation servers such as ComposeDB to handle most of the logic; considering the network bandwidth and reliability that would be needed, and possibly the CPU and memory usage, they probably won’t be able to interact with the Ceramic APIs directly.
  • IoT devices should be able to interact with Ceramic with low CPU/memory/storage capacity; for example, we can imagine sensors that only act as producers of events to the network, or single-stream consumers that flip switches or display some data.

Adoption drivers

From previous discussions, it seems we’ll have a 12-to-18-month strategy to drive adoption of our stack, notably in terms of transactions, assuming that’s still the main metric we want to use?

New Ceramic stack

I’m wondering how we see delivering this “new Ceramic” as part of this strategy? In particular, the suggested APIs seem very low-level, which means they will need a lot of tooling built on top to address product needs. For example, if we use Kafka as a reference, the aggregation APIs seem like a big part of the value proposition.

Between the time to reach a first proof of concept, testing, documentation, clients and tools development, and examples, I expect it’s going to be at least 6 months before the ecosystem can start really considering using these new Ceramic APIs in projects, and from there likely another 3-6 months for them to evaluate, build and deploy products on top of it, so that’s an optimistic 9 months before driving new adoption to the network IMO, which seems like a huge bet if we expect this new stack to drive our metrics?

Existing stacks

Assuming we don’t go “all-in” on this new Ceramic as our only focus and drop everything else, we have 2 stacks currently driving our metrics:

TileDocuments + IDX

Early ecosystem projects like Orbis are still using TileDocuments (I don’t think they use IDX as they have their own indexing system), for which we have pretty much stopped development over the past year while focusing on ComposeDB.

One of the expectations with the new Ceramic APIs was that we could completely drop support for the CAIP-10 link and TileDocument stream types, as well as the IDX protocol, and the ecosystem could re-build these implementations on top of the new Ceramic APIs if they wanted. This now seems highly unlikely to me, at least in the short term, considering how low-level the Ceramic APIs are and the amount of work involved.

If I were in this situation with having a deprecated system matching my needs, and a new system that requires a lot of development and maintenance effort to adopt just to match the behavior of the deprecated system, I’d keep using the deprecated system for a while and seriously investigate alternative options rather than immediately invest in the new system.

ComposeDB

We’ve invested most of our efforts over the past year in ComposeDB as a replacement for the TileDocuments + IDX stack, but we still don’t have a “stable” version, good DX, or significant data ecosystem/network effects yet.

We could drive things forward to support our 12-18 month strategy and gain adoption with ComposeDB, but that would mean continuing to put significant effort into it, which seems unlikely if we need to shift our focus to improving the new Ceramic stack? Furthermore, integrating the new Ceramic APIs into the ComposeDB stack probably won’t be as easy as just replacing our IPFS + libp2p dependencies with Ceramic ones; it will require more significant architectural changes that will make it hard to focus on other ComposeDB improvements.

The Ceramic API can possibly bring new functionality to the IDX ecosystem. It will be possible to subscribe to all streams where family = "Orbis Protocol" or family = IDX-model.

That’s assuming the current Ceramic nodes move to this new implementation, no?
Is it expected that the current stream types would start adopting this new system? If so, what would the transition from, or compatibility with, the pubsub messages currently used look like?

Yes, data would need to be migrated somehow. Imo, the transition strategy is out of scope for the Ceramic API design, however.

Without digressing too much, is migrating self-certifying data even possible? Or are we just referring to some kind of shim or facade to load old and new data?

In this case the data (event streams) remain exactly the same. It’s just that they need to be migrated from js-ceramic (and whatever storage it used) into go-/rust-ceramic.

Even in the context of Recon and the Ceramic API, where events are elevated to first-class citizens instead of being a lower-level concept (commits), is it still true that there is only one writer/controller per stream?

Yes, this is still the assumption. I think having multiple writers in a stream would require support for concurrent writes in streams, e.g. logs that diverge and converge. Currently every stream only has one canonical history.

How would CAS be affected in the shift to Ceramic API?
I would assume that in the recon update, CAS would still only happen in js-ceramic in the near term. But in a world where the Ceramic API is the primary protocol while js-ceramic becomes a layer for aggregators like ComposeDB, it seems that CAS should be part of the Ceramic API rather than js-ceramic.

Good question!
What you are saying is basically how I’ve been thinking about it as well. For the time being, the request-anchor functionality would remain in js-ceramic. In the future this functionality could move to rust-ceramic. However, it might also be a good idea to let the CAS functionality stay in js-ceramic and instead implement a decentralized version of CAS built directly into rust-ceramic.

Could you elaborate a bit on this? Is this more of a transition stage where CAS in js-ceramic would ultimately be deprecated, or are you referring to something else?

Right now 3Box Labs operates CAS as a subsidized service for the community. This is ok for now, but ultimately it might be desirable to decentralize this functionality of Ceramic into something that is part of the core protocol.

What this looks like is something we need to decide as a community!