Is it best to create ComposeDB models as small as possible?

spencer · September 12, 2022, 9:52pm

This is a really good question, and the answer unfortunately is “it depends”. There are tradeoffs between smaller and larger Models. The advantage of smaller Models is that they are more modular, easier to compose in different combinations, and allow apps to depend on exactly the data they need and no more. The primary disadvantage of smaller Models, however, is that data belonging to two different Models cannot be updated atomically.

For example, suppose you want to store a user’s address. One possibility is to have a single Model that stores the entire address, where documents within that Model would look something like:

{ 
  street: '123 main street',
  city: 'boston',
  state: 'MA'
}

Another possibility is to have 3 Models: an address_street, an address_city, and an address_state Model. The first Model would hold the data {street: '123 main street'}, the second {city: 'boston'}, and the third {state: 'MA'}. This would allow an app to index all the users who live in Massachusetts without also needing to index what city they live in or what their street address is. The problem with this structure, however, is that if the user moves to a new address in a new state, the updates to the 3 address documents in the 3 Models happen independently. They could be reordered, an app could learn about one update but not the other two, or one of the writes could fail entirely while the other two succeed. This could lead to inconsistent views about where the user actually lives.
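To make the atomicity problem concrete, here is a minimal TypeScript sketch that contrasts the two layouts. It does not use ComposeDB at all: the in-memory `Map` "models", the `write` helper, and the injected write failure are hypothetical stand-ins chosen only to show what an app can observe when one of several independent writes does not land.

```typescript
// Hypothetical in-memory stand-ins for Models: each maps a user ID to that
// user's document. This is NOT the ComposeDB API.
type Street = { street: string };
type City = { city: string };
type State = { state: string };
type Address = Street & City & State;

// Layout 1: one Address Model, so a move is a single document write.
const addressModel = new Map<string, Address>();

// Layout 2: three independent Models, so a move is three separate writes.
const streetModel = new Map<string, Street>();
const cityModel = new Map<string, City>();
const stateModel = new Map<string, State>();

// Simulated write; `fail` lets the caller inject a failure for illustration.
function write<T>(model: Map<string, T>, userId: string, doc: T, fail = false): boolean {
  if (fail) return false; // the write never lands
  model.set(userId, doc);
  return true;
}

function moveSingle(userId: string, addr: Address, fail = false): void {
  // All three fields change together or not at all.
  write(addressModel, userId, addr, fail);
}

function moveSplit(userId: string, addr: Address, failState = false): void {
  // Each write is independent; here only the state write can fail.
  write(streetModel, userId, { street: addr.street });
  write(cityModel, userId, { city: addr.city });
  write(stateModel, userId, { state: addr.state }, failState);
}

// An app that indexes only the state Model can find Massachusetts residents
// without ever reading street or city data.
function usersInState(state: string): string[] {
  return Array.from(stateModel.entries())
    .filter(([, doc]) => doc.state === state)
    .map(([id]) => id);
}

const oldAddr: Address = { street: '123 main street', city: 'boston', state: 'MA' };
const newAddr: Address = { street: '1 market street', city: 'san francisco', state: 'CA' };

moveSingle('alice', oldAddr);
moveSplit('alice', oldAddr);

// Move to a new state, with the write failure injected in both layouts.
moveSingle('alice', newAddr, true);
moveSplit('alice', newAddr, true);

// Single Model: still the complete old address, which is consistent.
console.log(addressModel.get('alice'));
// Split Models: new street and city, but the state is still 'MA'.
console.log(streetModel.get('alice'), cityModel.get('alice'), stateModel.get('alice'));
console.log(usersInState('MA'));
```

In the single-Model layout a failed write leaves the old address intact; in the split layout an app reading all three Models sees a San Francisco street paired with a Massachusetts state, and a state-only index still lists the user as living in MA.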

So the key tradeoff to consider when deciding how narrowly or broadly to define a Model is the need to update various pieces of the data atomically. If two fields are so closely related that apps must only ever see a change to both fields or to neither, then those fields should be in the same Model. If some fields don’t need to be updated atomically in this way, and especially if some apps might care about some of those fields but not others, then splitting them into multiple smaller Models gives developers more flexibility in choosing which pieces of the data to index in their applications.
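The rule of thumb above can be read as a grouping problem: start with one Model per field, then merge any fields that must change atomically together. A minimal TypeScript sketch of that idea uses union-find (disjoint sets). The `groupFields` function and the `displayName`/`avatar` fields are hypothetical illustrations, not part of ComposeDB or its tooling.

```typescript
// Hypothetical design helper: given the fields to store and the pairs of
// fields that must change atomically together, return the smallest groups
// (candidate Models) that keep every such pair in the same group.
function groupFields(fields: string[], atomicPairs: Array<[string, string]>): string[][] {
  // Union-find: every field starts as its own root.
  const parent = new Map<string, string>();
  for (const f of fields) parent.set(f, f);

  const find = (x: string): string => {
    let root = x;
    while (parent.get(root) !== root) root = parent.get(root)!;
    // Path compression: point every node on the path straight at the root.
    while (parent.get(x) !== root) {
      const next = parent.get(x)!;
      parent.set(x, root);
      x = next;
    }
    return root;
  };

  // Merge the sets of each pair that must be updated atomically.
  for (const [a, b] of atomicPairs) {
    if (!parent.has(a) || !parent.has(b)) throw new Error(`unknown field in pair: ${a}, ${b}`);
    parent.set(find(a), find(b));
  }

  // Collect fields by root, preserving the input order of the fields.
  const groups = new Map<string, string[]>();
  for (const f of fields) {
    const r = find(f);
    if (!groups.has(r)) groups.set(r, []);
    groups.get(r)!.push(f);
  }
  return Array.from(groups.values());
}

// Street, city, and state must move together; the profile fields need not.
console.log(groupFields(
  ['street', 'city', 'state', 'displayName', 'avatar'],
  [['street', 'city'], ['city', 'state']],
));
// → [['street', 'city', 'state'], ['displayName'], ['avatar']]
```

Atomicity requirements are transitive here: because street must change with city and city with state, all three land in one candidate Model, while fields with no such requirement stay in their own small Models that apps can index independently.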