Storing vector embeddings in ComposeDB

seref · August 29, 2023, 4:58pm

Hello everyone!

Is storing vector embeddings in ComposeDB a good practice?

While the current indexing structure may not provide the best querying experience with Postgres, I don't see any major drawbacks for storage. We're considering using another vector DB (Chroma) for querying. Using ComposeDB to store the data in an interoperable, DID-controlled way already seems like a good idea to me.

I guess another alternative, particularly if the vector size is large, would be to store them individually in IPFS and then reference their CIDs in ComposeDB.
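For the IPFS alternative, here is a rough sketch of what the ComposeDB document would hold: only a content identifier pointing at the serialized vector. It assumes the embedding is packed as little-endian float32 and computes a CIDv1 for a single raw block (codec `raw` 0x55, sha2-256 multihash, base32 multibase). The CID that `ipfs add` actually returns may differ depending on chunking and CID-version options, and the field names in `doc` are hypothetical, not taken from any real model.

```python
import base64
import hashlib
import struct
from typing import List


def embedding_to_bytes(embedding: List[float]) -> bytes:
    # Little-endian float32: 4 bytes per dimension (loses float64 precision).
    return struct.pack(f"<{len(embedding)}f", *embedding)


def raw_cidv1(data: bytes) -> str:
    """CIDv1 for one raw block: version 0x01, codec raw (0x55),
    multihash sha2-256 (0x12) with a 32-byte (0x20) digest,
    encoded as lowercase base32 with the multibase prefix 'b'."""
    digest = hashlib.sha256(data).digest()
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


vector = [0.0] * 1536  # placeholder embedding
blob = embedding_to_bytes(vector)
cid = raw_cidv1(blob)

# Hypothetical ComposeDB document fields: a pointer plus metadata, not the vector.
doc = {"embeddingCID": cid, "dimensions": len(vector), "model": "text-embedding-ada-002"}
print(doc["embeddingCID"][:7])  # bafkrei
```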

Would love to hear your thoughts.

jthor · August 29, 2023, 8:36pm

I’ve heard of a few different teams exploring storing vector embeddings on ComposeDB. I think it’s an excellent idea!

In your case, how big are the vectors?

seref · August 30, 2023, 5:44pm

Good to hear that.

We are using OpenAI's text-embedding-ada-002 model to generate embeddings, which have 1536 dimensions. I created a dummy ComposeDB model and a document, and everything worked well. Here it is: US3R SCAN
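To put "how big are the vectors?" into numbers: 1536 dimensions × 4 bytes per float32 = 6,144 bytes of raw binary, and base64 turns every 3 bytes into 4 characters, giving 8,192 characters. A plain list of floats serialized as JSON text is considerably larger and depends on the values. The sketch below compares the three encodings; it uses random values as a stand-in for a real embedding, and treating the JSON list as roughly what a list-of-floats model field would carry is an assumption about storage, not a measured ComposeDB figure.

```python
import base64
import json
import random
import struct

DIMENSIONS = 1536  # text-embedding-ada-002 output size

# Illustrative vector only: random values stand in for a real embedding.
rng = random.Random(0)
vector = [rng.uniform(-0.1, 0.1) for _ in range(DIMENSIONS)]

raw = struct.pack(f"<{DIMENSIONS}f", *vector)  # 4 bytes per float32
b64 = base64.b64encode(raw).decode("ascii")    # 4 characters per 3 bytes
as_json = json.dumps(vector)                   # JSON text of the float list

print(len(raw))      # 6144
print(len(b64))      # 8192
print(len(as_json))  # depends on the values; larger than both binary forms
```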