How to Query for Non-Existent Fields in Firestore

As developers working with Firebase’s Firestore database, we often encounter situations where we need to query for documents that don’t have a particular field. It should be simple, right? Well, unfortunately, this seemingly simple task is more challenging than expected. In this article, we’ll explore why querying for non-existent fields in Firestore is problematic and discuss some potential workarounds.

The Challenge: Firestore’s Indexing Mechanism

Firestore’s querying capabilities are built on its indexing system. As Doug Stevenson, a former Firebase team member, succinctly puts it:

“You can’t query for something that doesn’t exist in Firestore. A field needs to exist in order for a Firestore index to be aware of it.”

– Doug Stevenson

For example, if you try to make the following Firestore query call in Node.js – where db is your initialized Firestore connection:

const querySnapShot = await db.collection("users").where("age", "==", null).get();
querySnapShot.forEach(doc => console.log(doc.data());

No results will be returned.

This fundamental aspect of Firestore’s architecture means that traditional methods of querying for non-existent fields, such as where("field", "==", null) just won’t work.

Why Can’t We Query for Non-Existent Fields?

The reason lies in how Firestore manages its indexes:

  1. All Firestore queries rely on indexes. This is what makes querying so fast and why when you try to use a where query on a new field you’re required to build an index.
  2. A document is only included in an index for a field if that field exists in the Firestore document. I.e. how can you index something that doesn’t exist? You can’t.
  3. If a document doesn’t have a value for a certain field, it won’t be present in the index for that field.

This implies that there’s no straightforward way to query for documents that don’t have a specific field, as those documents aren’t represented in the relevant index. Thus, when you use the where("field", "==", null) nothing is returned.

You also might be asking, “Well, in MongoDB you can easily query by non-existent fields, so why not Firestore?” It is true you can do a query like db.mycollection.find({ “age” : { “$exists” : false } }). However, you can’t do an MongoDB index search if you query by $"exists": false making the query slow and inefficient.

Workarounds and Solutions

Despite this limitation, there are several approaches we can take to “query” for non-existent fields. Unfortunately each solutions has trade-offs, as we’ll see below.

1. Fetch All Documents and Filter Client or Server-Side

When dealing with smaller collections, you can fetch all documents and filter them in your application code. This is currently the most straightforward way to find documents without a specific field:

const getAllDocsWithoutField = async (collectionName, fieldName) => {
  const snapshot = await db.collection(collectionName).get();

  const docsWithoutField = snapshot.docs.filter(doc => {
    const data = doc.data();
    return !(fieldName in data);
  });

  return docsWithoutField;
};

// Example Usage
const docsWithoutAge = await getAllDocsWithoutField('users', 'age');

Advantages of this approach::

  1. Simplicity: This approach is straightforward to implement and understand.
  2. Flexibility: You can easily modify the filtering logic to handle complex cases or multiple fields.
  3. No Data Model Changes: It doesn’t require any changes to your existing data structure.
  4. Accurate Results: This method guarantees finding all documents without the specified field.

Drawbacks of this approach:

  1. Inefficiency: This method requires fetching all documents in the collection, which can be slow and resource-intensive for large collections.
  2. Cost: Firestore charges for read operations. With this approach, you’re reading every document in the collection, which can quickly become expensive. As of 2024, Google Cloud charges $0.031 per 100,000 document reads for most regions.
  3. Bandwidth: Fetching all documents consumes more bandwidth, which can be a concern for mobile applications or in regions with limited internet connectivity. We generally recommend making these calls server-side for larger requests.
  4. Firestore Query Limits: Firestore has a limit on the amount of data that can be retrieved in a single query. The current limit is 1 MB. For large collections, you might hit this limit before retrieving all documents.
  5. Performance: As the filtering happens client-side, the performance of the filtering operation depends on the client’s hardware, which might lead to slow operations on less powerful devices. Again, you might consider doing this server-side.

2. Set Field Missing Field to Null

If querying for documents without a certain field is a common requirement in your application, consider setting the field to null. For example:

{
  name: "John Doe",
  age: 30
}
{
  name: "Jane Smith",
  age: null
}

This allows you to query directly:

const snapshot = await db.collection('users')
  .where('age', '==', null)
  .get();

Advantages of this approach:

  1. Efficiency: This query can use Firestore’s indexes, making it much faster and more cost-effective than fetching all documents.
  2. Scalability: This method works well even for very large collections.
  3. Cost-effective: You only retrieve the documents you need, reducing read operations and bandwidth usage.

Drawbacks of this approach:

  1. Maintenance Overhead: You need to ensure the null field is explicitly set whenever you add or update documents, and update existing docs with the null field if not currently present. This might be a hassle depending on how you architected your system.
  2. Data Redundancy: You’re storing an additional field, possibly for many documents, which slightly increases storage costs, but not too much.

Implications and Best Practices

The inability to directly query for non-existent fields in Firestore has several implications for application design and data modeling:

  1. Data Model Considerations: When designing your Firestore data model, think about whether you’ll need to query for the absence of fields. If so, plan to include null field from the beginning.
  2. Application Logic: Be prepared to handle field absence non-existent fields in your application code.
  3. Performance Planning: For large collections, implement pagination or limit the scope of queries to avoid performance issues when using the client-side filtering approach. You can use the limit function to accomplish this.
  4. Cost Management: Be mindful of the potential costs associated with reading entire collections. Consider implementing caching strategies – perhaps in Redis or Firestore Bundles.

No Perfect Solution

Querying for non-existent fields in Firestore has been a challenge due to the database’s architecture. While there’s no perfect solution, the approaches outlined above offer ways to work around this limitation. The choice between client or server-side filtering or using a null field depends on your specific use case, the size of your data, update frequency, and performance requirements. If you’re starting a new collection, the second options might be the best fit.

As Firestore continues to evolve, it’s worth keeping an eye on new features or querying capabilities that might address this limitation in the future – while the capability to query non-existent fields like MongoDB has been requested for years, we have our hopes it will be released.

Bonus

A nifty Firestore query function is orderBy, which only returns documents with that field. For example, .orderBy("age") will only return documents with an age field. While this isn’t useful for querying non-existent fields, if can be useful to find specific docs and not return the whole collection.