Ceramic vs Polybase/Spacetime

Hello, could the core team provide a comparison table covering the subtle differences between the two database technologies?

Please see Whitepaper Update (v2) | Polybase Blog and https://polybase.xyz/Polybase_A_Decentralised_Query_Index_and_Storage_Protocol-v2.0.pdf

I feel there is a lot of overlap in the data model, indexing and aggregation, and possibly the economics (tokens). But I also think new developers should understand all of these, so they don’t FOMO into the wrong rabbit hole.

Thanks!

Hello :wave: Sid and Calum here, co-founders of Polybase.

First, we think the team at Ceramic is brilliant and have built a great product!

I (Sid) will give a high level overview of the two technologies and then Calum will dig into technical details.

Fundamentally, Ceramic is a decentralized event streaming platform whereas Polybase is a verifiable, decentralized database.

We think of Ceramic as more similar to Apache Kafka and Polybase as Postgres (or Firebase), when comparing to web2.

While you can “pin” the state of an event stream to function as a database, it has neither the queryability nor the scalability of a database, and having data owned by multiple users is much harder.

Furthermore, Polybase uses zero-knowledge proofs to secure permissions and transactions, so when you read the (public or private) data in dapps or smart contracts, it is simple and cheap to verify that it’s correct.

Polybase is architected to function as a drop-in replacement for web2 databases without you having to think about any new abstractions or concepts. It just works.


Calum here :wave:. From a technical standpoint, Polybase is a state zk-rollup protocol that provides native support for modular data storage and indexing.

We verify proofs on the client according to smart contract rules, and these are then verified and rolled up into a single root hash by a decentralized network of nodes. Separate indexing and storage modules are used to complete the database functionality.
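To make the "single root hash" idea concrete, here is a minimal sketch of folding a batch of records into one Merkle root. It is only an illustration: the function name `merkle_root`, the choice of SHA-256, and the duplicate-the-last-node padding are assumptions, and it leaves out the zero-knowledge proofs entirely, so it should not be read as Polybase's actual roll-up.

```python
import hashlib


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold a non-empty list of byte strings into one SHA-256 Merkle root.

    Hypothetical helper for illustration; not a Polybase API.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    # Hash every record once to form the bottom level of the tree.
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        # Assumption: pad odd-sized levels by repeating the last node.
        if len(level) % 2 == 1:
            level.append(level[-1])
        # Hash each adjacent pair to build the next level up.
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Changing any single record changes the root, which is what lets a verifier check a whole batch against one short value.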

As data storage and indexing modules are pluggable, you can actually use one or more different protocols alongside Polybase. So you might use Polybase as your database (aka verifiable state), Filecoin for data storage, Ceramic for a custom aggregation pipeline and some other protocol for search.
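The pluggable design described here can be sketched as a database that depends only on two small interfaces. Everything below is hypothetical: `StorageModule`, `IndexModule`, `Database`, and the in-memory backends are names invented for this illustration and do not correspond to Polybase's real interfaces. In practice a backend might wrap Filecoin, Ceramic, or a search protocol.

```python
import json
from typing import Protocol


class StorageModule(Protocol):
    def put(self, key: str, value: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class IndexModule(Protocol):
    def add(self, field: str, value: str, key: str) -> None: ...
    def lookup(self, field: str, value: str) -> list[str]: ...


class InMemoryStorage:
    """Stand-in storage backend that keeps records in a dict."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]


class InMemoryIndex:
    """Stand-in index backend mapping (field, value) to record keys."""

    def __init__(self) -> None:
        self._index: dict[tuple[str, str], list[str]] = {}

    def add(self, field: str, value: str, key: str) -> None:
        self._index.setdefault((field, value), []).append(key)

    def lookup(self, field: str, value: str) -> list[str]:
        return list(self._index.get((field, value), []))


class Database:
    """Depends only on the two interfaces, so backends can be swapped."""

    def __init__(self, storage: StorageModule, index: IndexModule) -> None:
        self.storage = storage
        self.index = index

    def insert(self, key: str, record: dict[str, str]) -> None:
        self.storage.put(key, json.dumps(record).encode())
        for field, value in record.items():
            self.index.add(field, value, key)

    def find(self, field: str, value: str) -> list[dict[str, str]]:
        keys = self.index.lookup(field, value)
        return [json.loads(self.storage.get(k)) for k in keys]
```

Swapping a backend then means passing a different object to `Database(...)`, with no change to the calling code.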

We will, of course, have our own indexing and storage engine (that will feel similar to Firebase) to make it fast to get started, but the power of making this pluggable is that you, the builder, get to decide what to use based on the trade-offs you need.

We’re excited to hear more feedback from developers on this!

Thanks for this response Calum. Adding some more context below.

While it’s true that Ceramic acts as a web3 Kafka, it can also be leveraged as a database. A good article on this subject in general is The duality of Streams and Tables. This means that anyone can spin up a node to replicate the state of the database from a subset of event streams in Ceramic. See ComposeDB, which you can think of as a decentralized graph database.
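The stream/table duality mentioned above can be shown in a few lines: replaying an ordered list of events yields the current table state. This is a generic sketch, not ComposeDB code; the `Event` type, its `stream_id` and `patch` fields, and the `materialize` function are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    stream_id: str  # hypothetical: identifies the document being updated
    patch: dict     # hypothetical: fields set by this event


def materialize(events: list[Event]) -> dict[str, dict]:
    """Replay events in order to build the latest state per stream."""
    table: dict[str, dict] = {}
    for event in events:
        row = table.setdefault(event.stream_id, {})
        row.update(event.patch)  # later events overwrite earlier fields
    return table


events = [
    Event("profile:alice", {"name": "Alice"}),
    Event("profile:bob", {"name": "Bob"}),
    Event("profile:alice", {"bio": "builder"}),
]
table = materialize(events)
```

Here `table["profile:alice"]` ends up holding both the `name` and `bio` fields, because the table is simply the fold of every event seen so far.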

The advantage of this is greater scalability and greater optionality for node operators. A node chooses what subset of data to index, and the data can in theory be sharded down to a single user account. For an argument why this approach can scale better than a blockchain, see this article.
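As a rough sketch of a node that indexes only a chosen subset of data, the class below keeps events for the accounts it has subscribed to and ignores the rest. `IndexingNode` and the event keys `account` and `data` are hypothetical names for this example and are not part of Ceramic's API.

```python
class IndexingNode:
    """Toy node that indexes only events from accounts it subscribes to."""

    def __init__(self, accounts: set[str]) -> None:
        self.accounts = accounts
        self.index: dict[str, list[dict]] = {}

    def ingest(self, event: dict) -> bool:
        """Index the event if its account is subscribed; report whether it was kept."""
        account = event["account"]
        if account not in self.accounts:
            return False  # outside this node's chosen subset
        self.index.setdefault(account, []).append(event["data"])
        return True


# A node that shards all the way down to one user account.
node = IndexingNode({"did:example:alice"})
node.ingest({"account": "did:example:alice", "data": {"name": "Alice"}})
node.ingest({"account": "did:example:bob", "data": {"name": "Bob"}})
```

After these two calls the node holds only Alice's data, while another node could subscribe to a different or overlapping set of accounts.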

Hi Joel. Could you address how the data could be verified if it is accessed from an indexer instead of the data source? Do we lose the verifiability?

When running the indexer for ComposeDB, all data streams are still verified as usual by Ceramic.
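One way to picture how a stream can still be verified when it is served by an indexer is a hash-linked log checked against a trusted tip. The sketch below assumes a deliberately simplified model in which each commit stores the hash of the previous one. The names `commit_hash`, `append`, and `verify_log` are hypothetical, and signature checks and anchoring are left out, so this is not Ceramic's actual verification logic.

```python
import hashlib
import json
from typing import Optional


def commit_hash(commit: dict) -> str:
    """Hash a commit deterministically (sorted keys give a stable encoding)."""
    encoded = json.dumps(commit, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def append(log: list[dict], data: dict) -> list[dict]:
    """Return a new log with a commit that links to the previous one."""
    prev = commit_hash(log[-1]) if log else None
    return log + [{"prev": prev, "data": data}]


def verify_log(log: list[dict], trusted_tip: Optional[str]) -> bool:
    """Check every link in the log and that it ends at the trusted tip hash."""
    expected_prev = None
    for commit in log:
        if commit["prev"] != expected_prev:
            return False  # a link is broken, so an earlier commit was altered
        expected_prev = commit_hash(commit)
    return expected_prev == trusted_tip
```

A reader who knows the tip hash from a trusted source can therefore accept the log from any indexer and still detect tampering.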

In this architecture, what is the best way to find all the data owned by a user if each node operator can choose what data to index? Would it lead to web2-like data silos?
Would a “user node” like the DIF Decentralized Web Node be useful? Is Ceramic expecting each user to run their own indexing node in addition to nodes operated by projects?

Having users run nodes that subscribe to their specific data is certainly an advantage of the Ceramic design. It’s not a requirement, however. Any node can choose to replicate any data in the network, so it doesn’t really lead to web2 silos in the way you describe.

By “users”, are we referring to end users or developers?

In practice, how do nodes decide which streams to replicate? Without a clear incentive to replicate all streams, how could app developers ensure data availability? Love Ceramic btw, just trying to learn here…

Both. In principle, a user can run a node that only replicates their own data. There is, however, no easy way to do so right now.

With ComposeDB you can configure your Ceramic node to index all data within a data model.

This is a great question. If this is the case, would it be accurate to say that for non-event data (e.g. user profile data), @polybase would be a better solution, despite Ceramic’s ability to be leveraged as a database and to use “materialized views” (streams to tables)?

What are the pros/cons here?

Thanks.

I think it depends on your general architecture. If you expect to have other data with higher throughput than profile data, then using Ceramic for the profile data as well might be easier than relying on two protocols.
