Indexing to Elasticsearch using Apache Kafka

seref · March 27, 2023, 8:34pm

Hello!

We are currently indexing ComposeDB into Elasticsearch using the following pipeline: ComposeDB → PostgreSQL → KafkaConnect → Apache Kafka → Consumer → Elasticsearch.
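For readers who want a concrete picture of the Consumer → Elasticsearch step, here is a minimal sketch, not our actual code. It assumes a Debezium-style change event (`op`, `before`, `after`) and rows with `stream_id` and `stream_content` columns; both the event shape and the column names are assumptions for illustration. The output follows the newline-delimited format the Elasticsearch bulk API expects.

```python
import json
from typing import Any, Iterable


def to_bulk_lines(event: dict[str, Any], index: str) -> list[str]:
    """Translate one row-change event into Elasticsearch bulk API lines.

    Assumed event shape (hypothetical, Debezium-like):
      {"op": "c" | "u" | "d", "before": row or None, "after": row or None}
    Assumed row shape (hypothetical):
      {"stream_id": str, "stream_content": dict}
    """
    if event["op"] == "d":
        # Deletes carry the old row in "before" and need only an action line.
        row = event["before"]
        return [json.dumps({"delete": {"_index": index, "_id": row["stream_id"]}})]

    # Creates and updates both become an "index" action, which overwrites
    # any existing document with the same _id.
    row = event["after"]
    action = {"index": {"_index": index, "_id": row["stream_id"]}}
    return [json.dumps(action), json.dumps(row["stream_content"])]


def build_bulk_body(events: Iterable[dict[str, Any]], index: str) -> str:
    """Join the lines for a batch of events into one bulk request body."""
    lines: list[str] = []
    for event in events:
        lines.extend(to_bulk_lines(event, index))
    # The bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""
```

Using the stream ID as `_id` keeps the indexing idempotent, so replaying a Kafka partition after a consumer restart overwrites documents instead of duplicating them.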

Ideally, we would like to remove PostgreSQL and KafkaConnect from the pipeline and instead stream updates directly to Kafka. What are your thoughts or plans regarding a native Apache Kafka indexer? As a fan of Kafka and Ceramic, I’m very excited about this possibility.

Also, here is the architecture of index.as. Any feedback would be greatly appreciated.

Justina · March 28, 2023, 3:15pm

Hey @seref, thanks for sharing this! It’s a very interesting implementation. At the moment we don’t have plans to support Kafka directly for indexing. We are discussing it and looking into potential solutions, but it’s very early and we cannot guarantee that it will result in an actual implementation. We will definitely keep the community up to date if we end up with more concrete plans.

dbcfd · March 28, 2023, 9:06pm

@seref

This is currently the best approach for information exposed by Ceramic. We are looking at exposing a more streaming-friendly API, which would send notifications about metadata (e.g. new models and documents) and about events on documents. Would this solve your use case better?

seref · March 28, 2023, 9:22pm

Thank you @Justina and @dbcfd for the context.

A streaming-friendly API and events would be great. That would make it much easier to connect to Kafka or any other third-party tool.
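To illustrate what connecting such events to Kafka could look like, here is a rough sketch. The `DocumentEvent` shape is entirely hypothetical, since no such Ceramic API exists yet; the point is only how an event would map to a Kafka record key and value.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentEvent:
    """Hypothetical document event; not an actual Ceramic API."""
    stream_id: str
    model_id: str
    content: dict


def to_kafka_record(event: DocumentEvent) -> tuple[bytes, bytes]:
    """Map an event to a (key, value) pair of bytes for a Kafka producer."""
    # Keying by stream ID means Kafka's default partitioner sends every
    # update for one document to the same partition, preserving their order.
    key = event.stream_id.encode("utf-8")
    value = json.dumps(
        {"model": event.model_id, "content": event.content},
        sort_keys=True,  # stable byte output for identical payloads
    ).encode("utf-8")
    return key, value
```

The resulting pair could then be handed to any Kafka client, for example `producer.send(topic, key=key, value=value)` with kafka-python's `KafkaProducer`.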

jthor · March 29, 2023, 8:52am

Just to be sure, by the “Index to postgres” block in your diagram, do you mean the Postgres that ComposeDB automatically indexes to?

seref · March 29, 2023, 10:00am

Yes, it’s ComposeDB’s native PostgreSQL indexer. The diagram was a bit inaccurate in that part. Also, the indexer works smoothly.