CIP-137: Ceramic API

Discussion for CIP-137 Ceramic API.

Hmm, getting a 404 there @jthor

Link is not live yet, PR is here:


ceramic://*?<sort-key>=<sort-value>

It is not clear to me how indefinite and ephemeral subscriptions should interact or whether every subscription should be required to have a TTL less than some max subscription length.

The ceramic node's interests list is a union of all the subscriptions. I expect some of the subscriptions to be indefinite, where the node should be ready to answer questions about those models at any time. Others may be ephemeral, where the node should only sync the events while a subscriber is connected and stop syncing that data when the connection ends. Ephemeral subscriptions are likely to be useful for development and seldom used in a production application.
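As a rough sketch of how these two modes might be represented on a node (illustrative TypeScript; none of the names below come from the CIP):

// Illustrative only: one possible shape for a node's subscription table.
// Field and type names here are assumptions, not defined by CIP-137.
type SubscriptionMode = "indefinite" | "ephemeral";

interface Subscription {
  // e.g. "ceramic://*?<sort-key>=<sort-value>"
  resource: string;
  mode: SubscriptionMode;
  // Only meaningful for ephemeral subscriptions; the node could stop
  // syncing when the TTL expires or when the subscriber disconnects.
  ttlSeconds?: number;
}

// The node's interest list is the union of all active subscriptions.
function interests(subs: Subscription[]): Set<string> {
  return new Set(subs.map((s) => s.resource));
}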

listSubscriptions in this form has no way to tell the difference between a subscription held by a single subscriber and one held by multiple subscribers.

Also, I think it will be important to be clear about the different kinds of times for events. There are two universal timestamps for an event: the event is after the timestamp of the last time event it depends on, and before the timestamp of the first time event that depends on it. The local timestamp is the time that a node learned about an event. This is not universal, as different nodes will learn about events at different times. This local timestamp will be after the "after" timestamp but may be earlier or later than the "before" timestamp.
Polling probably needs to be done on the time the node learned about the event rather than on the event's "after" or "before" time, since a node can learn about new events at any time.
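A small sketch of the three timestamps and of polling keyed on local receipt time (illustrative TypeScript; the field names are assumptions):

// Illustrative only: the three timestamps discussed above.
interface EventTimes {
  // Universal lower bound: timestamp of the last time event this event depends on.
  after: Date;
  // Universal upper bound: timestamp of the first time event that depends on this
  // event. Unknown until such a time event exists.
  before?: Date;
  // Local only: when *this* node first learned about the event.
  // Differs from node to node, and is the natural key for polling.
  localReceivedAt: Date;
}

// Polling on local receipt time: return events learned since the caller's cursor.
function newEventsSince(events: EventTimes[], cursor: Date): EventTimes[] {
  return events.filter((e) => e.localReceivedAt > cursor);
}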

  1. Do we need to include a reference to Event Log - Ceramic Developers, as events must be dag-jose encoded, or has that requirement been loosened: event - a base64 encoded CAR file containing the event? This might not be clear for someone who is not already familiar with Ceramic and all prior CIPs.

  2. Agree with @AaronDGoldman here. Are "Stream Set" and "Interest Range" the same concept or different? CIP-124 mainly talks about "Stream Set" while CIP-137 uses "Interest Range". Unless they are describing different layers of abstraction, maybe we could use more consistent terms.

  3. Following #2, there seems to be some ambiguity regarding what a "set" is, as ceramic://*?<sort-key>=<sort-value> without specifying a range gives the impression of an "interested topic", while only by including the specific range [ctrlRangeStart, ctrlRangeEnd] could we have a more deterministic view of how data is included.
    Do we need to define them more explicitly as follows? (A sketch after this list illustrates one possible formalization.)

  • Event Topic: ceramic://*?<sort-key>=<sort-value>
  • Event Range: [ctrlRangeStart, ctrlRangeEnd]
  • Set = Topic + Interest Range
  4. Following #3, I am wondering how we could handle the mapping between Topics/Sets and Subscriptions so that each set or subset has a better chance of finding peers with an interested range. E.g.
Node 1
SubscriptionA: [ModelA, ModelB]
SubscriptionB: [ModelB, ModelC]

Node 2, Publishing [Model B]
Node 3, Publishing [Model A, C]

In this case, should Node 1 be able to discover Models A, B, and C and sync from Node 2 and Node 3, or does it have to look for an exact match?

  5. In CIP-124, why is it called "Stream Set" instead of "Event Set"? @AaronDGoldman @jthor
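One possible way to formalize the terms proposed in #3 (illustrative TypeScript only; nothing here is normative for CIP-124 or CIP-137):

// Illustrative only: formalizing "Topic", "Range" and "Set" as proposed above.
// An event topic identifies everything matching a sort key/value pair,
// e.g. "ceramic://*?<sort-key>=<sort-value>".
type EventTopic = string;

// An interest range narrows a topic to a span of controllers (or event keys).
interface EventRange {
  ctrlRangeStart: string;
  ctrlRangeEnd: string;
}

// A set is a topic plus the range of interest within it. When no range is
// given, the set covers the full range for that topic.
interface EventSet {
  topic: EventTopic;
  range?: EventRange;
}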
  1. Yes, I think that's a good idea when we reference returning a CAR file.

  2. CIP-124 also talks about interest ranges, but I'm sure we could do some more language normalization. Recon | Ceramic Improvement Proposals

  3. A set defined as ceramic://*?<sort-key>=<sort-value> is technically a range as well, from the first event to the last possible event for the given sort value.

  4. It's possible for a node to tell peers that it's interested in multiple different ranges.

  5. I thought about this as well. I'll defer to @AaronDGoldman.


Wrote up a proposal to potentially use a REST API instead:

From a quick look, I'm wondering about the following:

  1. The return type of the ceramic_subscribe method is described as "a Recon message"; what does that mean? What is the consumer supposed to do with this message? When subscribing to a resource, I'd expect to receive events over time, not a single response when setting up the subscription.
  2. As already mentioned in previous messages above, when looking at the ceramic_unsubscribe and ceramic_listSubscriptions APIs it seems there's only one consumer for these APIs?
  3. ceramic_putRawEvent and ceramic_putEvent: I'd expect the event ID to be returned here, unless it can be easily derived purely client-side?

Ideally for me there would be two types of Ceramic APIs, the library APIs and remote APIs:

  • Library APIs are the public module APIs used when importing Ceramic in a Rust project + additional bindings for other targets (Node.js using https://napi.rs/ for example) - these assume a single consumer for the APIs
  • Remote APIs define client/server protocols similar to the JSON-RPC APIs described here, but assuming multiple clients/subscribers so notably having ceramic_unsubscribe and ceramic_listSubscriptions require a subscription ID to be provided.

For remote APIs, we can describe the RPC protocol (resources, requests, responses, events and errors types) in a generic way and from there describe specific implementation specs, for example JSON-RPC (with specific types encoding), REST (supporting JSON, CBOR or other encodings), gRPC, etc.
Ideally, most of these could be defined and implemented by the community, and the main focus for core devs would be to provide the library bindings and maintain the RPC protocol specs + test vectors that could be used by implementations.
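For example, a remote protocol along these lines could key unsubscribe/list operations on a subscription ID (an illustrative TypeScript sketch of the idea, not a proposed spec; all names are assumptions):

// Illustrative only: a transport-agnostic remote API where every subscription
// is identified by an ID, so multiple clients can subscribe independently.
interface CeramicRemoteApi {
  // Returns an ID that the server uses to track this particular subscriber.
  subscribe(resource: string): Promise<{ subscriptionId: string }>;
  // Both of these operate on a specific subscription rather than on the
  // node's whole interest list.
  unsubscribe(subscriptionId: string): Promise<void>;
  listSubscriptions(): Promise<Array<{ subscriptionId: string; resource: string }>>;
}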

From @paul

I guess the main thing is why we'd need REST in the first place; couldn't the new Ceramic implementation be used directly as a library? I'd much rather just have it as an npm dependency and run it with the ComposeDB node than having a dedicated server for it.

From @AaronDGoldman

I thought this was the discussion of the API between Ceramic and ComposeDB. We would need a Ceramic API if we want to enable either multiple Ceramic implementations or other databases over Ceramic.

From @dbcfd

I'd prefer not using it directly as a library. I'd rather start with well-defined boundaries at the HTTP layer. We can then use it as a 'library' by just wrapping the HTTP interface. We only produce an actual library if we need to for performance reasons, but that would likely be exposing the REST interface over IPC.

Between Docker Compose and k8s, it's easy to stand up services, and it provides much cleaner separation of functionality and stronger contracts.

From @paul

Yeah, I guess this is where I see things differently, because having to run Docker is such a failure of DX for me when it comes to building apps. Say you want to build a desktop app similar to Dropbox running Ceramic and ComposeDB: you're going to ask end-users to install Docker? That's not even an option for UX.

To produce a library version of Ceramic indicates that there's a way to run an "embedded" Ceramic that can participate in the network without overloading the user's computer. A Dropbox app communicates with centralized infrastructure, and a Ceramic equivalent would be communicating with a remote Ceramic node to retrieve event information of interest.

  1. This should be defined in CIP-124
  2. I think this shouldn't be the case; in any case, this should be addressed in my recent PR referenced above.
  3. Good point!

Agree that an IPC interface would be useful.
For the remote API it seems like REST has a big benefit in that CAR files can be returned as a content type.

I think this could be a reasonable assumption though? If I have a way to subscribe to a minimal subset of streams (e.g. mine and my friends'), then it's very feasible I could run it locally!

There are a few ways to handle this.

  • Run a "full" ceramic node; this may not be feasible on all computers/networks
  • Run a "lite" ceramic node that only syncs a small subset of streams, may not perform other functionality, uses in-memory databases, etc.
  • Run a ceramic client in your app that knows how to find remote ceramic servers and request data from them

Users that deploy in full mode will likely use k8s (or another solution for standing up infrastructure) to deploy multiple services in some type of "hosted" environment.


I think resource consumption and API design are orthogonal concerns; the Ceramic node should be able to manage resources without overwhelming the system and warn about reaching limits, whether it is used as a library or as a server.
Small server instances have more limited resources than many laptops, so I don't think there should be any assumption that something running as a server has more load capacity.

To be clear, I'm not saying Ceramic shouldn't be run as a server, but I don't think it should be the only option. Providing both direct bindings to the library and a wrapping server supporting multiple clients gives developers options to use the solution that best suits their needs.

I guess my question is more about how events are delivered over time. When I subscribe, I expect to get a stream of events as they become known to the server, via a push mechanism (WebSocket or SSE for example).
Here it seems the response is a list of events, but how do I get notified when new events have been pushed to the network? Do I need to call the method again, as a purely pull-based mechanism?

You have to use the ceramic_poll API, or /ceramic/subscribe{streamid} in the REST API approach I suggested.
You basically have a pointer you need to keep track of (eventOffset in our case). This is similar to the approach used by Kafka for event streams.
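A minimal sketch of what that pull loop could look like from a client's point of view (illustrative TypeScript; ceramicPoll and the response shape are stand-ins, not the actual CIP-137 signatures):

// Illustrative only: a Kafka-style pull loop that keeps a local offset cursor.
interface PollResponse {
  events: Array<{ id: string; car: string }>; // base64-encoded CAR per event
  nextOffset: number; // where to resume from on the next call
}

// Hypothetical transport call; stands in for ceramic_poll over JSON-RPC or REST.
declare function ceramicPoll(offset: number): Promise<PollResponse>;

async function pollLoop(handle: (event: { id: string; car: string }) => void) {
  let offset = 0; // persist this if the client needs to survive restarts
  for (;;) {
    const { events, nextOffset } = await ceramicPoll(offset);
    events.forEach(handle);
    offset = nextOffset;
    if (events.length === 0) {
      // Nothing new yet; back off before asking again.
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }
}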

I think we should also seriously consider using gRPC for the API. There are two clear advantages to gRPC over something like REST or OpenAPI.

  1. gRPC has first class support for streaming requests (i.e. no need to poll for new events)
  2. gRPC has a clear and consistent mapping of API procedures to HTTP/2, meaning we do not have to manage that mapping ourselves; for example, we don't have to specify how headers are used or URL paths. gRPC makes those decisions for us and does so consistently. This removes a large design burden and allows us to focus on more important higher-level concepts in the API.

OpenAPI and gRPC have some common strengths:

  • Clients can be generated
  • Strong types for the API entities
  • Allow for RPC style APIs
  • Use HTTP as a transport

Some advantages of gRPC

  • Uses HTTP/2 which allows for multiplexing streams and first class streaming endpoints as mentioned above
  • Generated clients are easier to use as there is a clear RPC interface
  • We do not have to decide how to map RPC concepts to HTTP

Some advantages OpenAPI has over gRPC

  • Not required to use the generated client as implementing an HTTP/1 client is relatively straightforward.
  • Possibly simpler client libraries
  • Does not require the use of protocol buffers

To make this comparison more concrete, here is a first pass at defining the API using gRPC.

syntax = "proto3";

// The Ceramic service definition.
service Ceramic {
  // Subscribe to events for a given stream; the response is a stream of events.
  rpc Subscribe(SubscribeRequest) returns (stream Event) {}
  // Perform one Recon message exchange.
  rpc Recon(Recon) returns (Recon) {}
}

// Request
message SubscribeRequest {
  string stream_id = 1;
  string start_event = 2;
  // any other metadata needed to start a new subscription
}

// A single Event
message Event {
  string network = 1;
  string cid = 2;
  // all event fields ...
}

// A Recon message
message Recon {
  repeated string keys = 1;
  repeated string hashes = 2;
}

With this simple service, a single RPC method lets clients get events from the server with low latency and without having to manage polling/retries. We get backpressure and active-subscription state for free (i.e. the server knows which subscriptions are active because there is an active stream out to a client; once the client closes the connection, the subscription is no longer needed).
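For illustration, here is roughly what consuming the streaming Subscribe RPC could look like from JavaScript/TypeScript using @grpc/grpc-js and @grpc/proto-loader (the file name ceramic.proto, the address, and the stream ID are placeholders, not part of the proposal):

// Illustrative only: consuming the server-streaming Subscribe RPC defined above.
import * as grpc from "@grpc/grpc-js";
import * as protoLoader from "@grpc/proto-loader";

// "ceramic.proto" is assumed to contain the service definition sketched above.
const definition = protoLoader.loadSync("ceramic.proto", { keepCase: true });
const proto = grpc.loadPackageDefinition(definition) as any;

const client = new proto.Ceramic("localhost:5001", grpc.credentials.createInsecure());

// Server-streaming call: events are pushed as the node learns about them,
// so there is no client-side polling loop.
const call = client.Subscribe({ stream_id: "<stream-id>", start_event: "" });
call.on("data", (event: { network: string; cid: string }) => {
  console.log("new event", event.network, event.cid);
});
call.on("end", () => console.log("subscription closed"));
call.on("error", (err: Error) => console.error("subscription error", err));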

For a more detailed discussion on gRPC vs REST vs OpenAPI see https://cloud.google.com/blog/products/api-management/understanding-grpc-openapi-and-rest-and-when-to-use-them


Regarding REST vs gRPC, this is nice: GitHub - grpc-ecosystem/grpc-gateway: gRPC to JSON proxy generator following the gRPC HTTP spec. Why not both?

Uses HTTP/2 which allows for multiplexing streams and first class streaming endpoints as mentioned above

Is this mainly a performance improvement, or are there other advantages to this?

I would also like to see an example of how we would transport CAR files over gRPC. Just as a bytestring parameter in one of the messages?

Finally, a thing that somewhat scares me is the package size of gRPC libraries in JavaScript: @grpc/grpc-js v1.8.16 - Bundlephobia

Regarding REST vs gRPC, this is nice: GitHub - grpc-ecosystem/grpc-gateway: gRPC to JSON proxy generator following the gRPC HTTP spec. Why not both?

@0xEE3CA4dd4CeB341691 Not sure if you can send bytestrings over JSON-RPC; regardless, I think the main thing I'd like to figure out is more OpenAPI (as specified in my PR above) vs gRPC.

Some assorted thoughts from reading through the CIP:

ceramic_subscribe/ceramic_unsubscribe:

  • Do we need these messages at all? Can this just be node configuration? Does the subscription set need to change at runtime?
  • ctrlRangeStart - why is this a specific option? Isn't controller range already included in the sort key?

ceramic_poll:

  • I don't really understand how this is used. Examples would help. Does this return events from all subscribed streams, all intermixed with each other?
  • Why does this return in the order that the node receives events? Why can't it return in event sort order?
  • It almost certainly needs some pagination controls. Or really, this whole CIP needs something to describe how cursor-like functionality is built up so you can stream large result sets to the client without using a single huge response message (see the sketch after this list for one common shape). Depending on what transport protocol we use (which is being discussed right now), this might or might not already be natively handled by the transport protocol. But if we go with something lower-level like plain REST, we'll definitely need to think about this.
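For reference, cursor-style pagination for ceramic_poll responses is often shaped roughly like this (illustrative TypeScript; none of these field names are defined by the CIP):

// Illustrative only: a cursor-based page of poll results.
interface PollRequest {
  // Opaque cursor returned by the previous page; omit to start from the beginning.
  cursor?: string;
  // Upper bound on how many events the server should return in one response.
  limit: number;
}

interface PollPage {
  events: Array<{ id: string; car: string }>;
  // Pass this back as `cursor` to fetch the next page; absent when caught up.
  nextCursor?: string;
}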

ceramic_recon:

  • reconRange <Array> - a Recon message, *[eventid or ahash]*
    ^ I don't know how to read that
  • Can we link to the Recon CIP here?
  • I'm not really sure when or why a user would want to use the recon API directly. That seems like something the node should do under the hood and then it should return an ordered stream of events back to the client without the client needing to understand the details of the underlying sync protocol.

ceramic_exportRawEvents:

  • should there be a way to ask for a range of eventids instead of having to list each id manually?

ceramic_putRawEvent/ceramic_putEvent. Should these be _putEvents (plural) and allow publishing multiple events in a single request?

In general I think this proposal would be a lot easier to digest with some examples of how it would be used. It could be an "appendix" or something else linked off of the main CIP page, to illustrate that it's not a core part of the CIP but is attached as an example to ease understanding.

Have we done a comparison to the APIs exposed by other event streaming systems (e.g. Kafka)?