Achieving Reliable Zero Downtime Re-indexing with Searchkick

Problem

Ruby on Rails and Searchkick have allowed us to rapidly build applications with powerful search capabilities on top of vast amounts of data. Searchkick is a Ruby gem that integrates with Elasticsearch or OpenSearch and makes complex search queries easy to express and efficient to run. Re-indexing Searchkick models is essential to maintain search accuracy and performance, but the commonly published solutions lead to data inconsistency during the process. This happens because records created, updated, or deleted while the re-index is running are not reflected in the new index, leaving it with missing or outdated data once it is promoted. This is a critical issue for high-traffic applications with frequent data changes.

Solution

This article proposes a solution to achieve zero downtime re-indexing with Searchkick by overriding its index jobs. This allows us to duplicate any operation (create, update, delete) happening during re-indexing to both the current and new index, ensuring data consistency. The solution leverages:

Ruby on Rails extensions to modify Searchkick behavior
Redis to store re-indexing state and flags
Index aliases for seamless transition between old and new indexes

Introduction

At Woflow, we are riding the Ruby on Rails and Searchkick express train, and it has allowed us to rapidly build applications capable of managing vast amounts of data. Using this dynamic duo, we are able to run fast full-text searches using dozens of filters on indexes with millions of entries.

Re-indexing is essential for maintaining search accuracy, data structure, and performance over time. When using Searchkick, you need to re-index whenever you:

install or upgrade the searchkick gem
change the search_data method
change the searchkick method

This article is meant for software engineers, architects, developers, or enthusiasts using Ruby on Rails with Searchkick in production environments. We'll explore an efficient and reliable method for re-indexing your Searchkick indexes while maintaining uninterrupted search functionality and real-time result updates. Our approach ensures a seamless user experience during the re-indexing process, which is critical for high-traffic applications.

Naive solution

To tackle the re-index issue, most of us would start by scouring the internet. You'd think that with two battle-tested technologies, you'd stumble upon several reliable, foolproof solutions, right? Well, not quite. Most of the solutions we found online for re-indexing Searchkick models have a major snag. They solve the problem of allowing users to search for data while the re-index is happening, but anything a user writes during that window never reaches the new index and disappears from search once that index is promoted.

Let's first take a peek at the code for the most common solution we stumbled across in the wild:

require 'sidekiq/api'

module ModelReindexer
  # Note: promotion always happens; promote_and_clean only controls the cleanup step.
  def self.reindex_model(model, promote_and_clean = true)
    puts "ModelReindexer started for model: #{model.name}"

    # Async here will force jobs to be created in Sidekiq
    index = model.reindex(async: true, refresh_interval: '30s')

    puts "All jobs queued in index: #{index[:index_name]}"

    loop do
      # Check the reindex status using Searchkick
      status = Searchkick.reindex_status(index[:index_name])
      puts "Reindex batches left: #{status[:batches_left]}"

      # Stop polling as soon as all batches are done
      break if status[:completed]

      # Otherwise check again in 5 seconds
      sleep 5
    end

    puts 'Reindex complete. Promoting...'
    # update_refresh_interval restores the refresh interval changed for the reindex
    model.search_index.promote(index[:index_name], update_refresh_interval: true)
    puts "Reindex of #{model.name} complete."

    if promote_and_clean
      puts 'Cleaning old indices'
      model.search_index.clean_indices
    end
  end
end

Over large datasets, these re-index operations can take a long time. In our case, we have more than 10 indexes with 2-10 million documents each, and a full re-index can take up to 1 hour. That delay raises a major roadblock: how do we keep the search features running smoothly during a re-index, without downtime or data loss? For search to keep working flawlessly, users must be able to continue both searching and writing data while the re-index runs.

The naive solution only covers the first half. While the new index is being built, the application keeps writing exclusively to the current index, so the two indexes drift apart. This discrepancy creates a significant data consistency problem. By the time the re-indexing completes and the new index is promoted, it's already out of date. Any records that were created, updated, or deleted in the original index during the process are not captured in the new index. This can lead to missing data, outdated information, or "phantom" records that no longer exist in the database but persist in the search index. In high-traffic applications or those with frequent data changes, this inconsistency can be substantial, potentially affecting the accuracy and reliability of search results immediately after the re-index operation completes.
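
To make the failure mode concrete, here is a minimal sketch in plain Ruby. It does not use Searchkick at all: hashes stand in for the indexes, and NaiveReindexSimulation and its methods are hypothetical names introduced only for illustration. It also simplifies the timing, treating the whole re-index as one snapshot.

```ruby
# Plain-Ruby simulation of the naive re-index; no Searchkick or Elasticsearch involved.
# All names here are hypothetical and exist only for illustration.
class NaiveReindexSimulation
  attr_reader :database, :live_index

  def initialize(records)
    @database = records.dup     # source of truth
    @live_index = records.dup   # index currently serving searches
  end

  # Application write: persists the record and updates only the live index.
  def write(id, doc)
    @database[id] = doc
    @live_index[id] = doc
  end

  # Naive re-index: copy a snapshot, let writes happen meanwhile, then promote.
  def reindex
    new_index = @database.dup   # the new index is built from the data as it was read
    yield if block_given?       # writes that arrive while the re-index is running
    @live_index = new_index     # promote: searches now hit the new index
  end
end

sim = NaiveReindexSimulation.new({ 1 => "apple", 2 => "banana" })
sim.reindex { sim.write(3, "cherry") }

puts sim.database.key?(3)     # true: the record is safely in the database
puts sim.live_index.key?(3)   # false: the promoted index never saw it
```

The record is not lost from the database, only from search, which is exactly why this kind of bug tends to go unnoticed until a user cannot find something they just created.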


The fix: Double-write data while re-indexing

Ankane, the creator of Searchkick, talks about this problem in this very cohesive article.

Even though Searchkick Pro is not available anymore, the solution is still described in the blog post. It is simple enough: every time we are about to run an operation in the current index, we queue the same operation for the new index. This job queue is handled by Sidekiq.
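
The idea can be sketched in a few lines of plain Ruby. This is a simulation under simplifying assumptions, not the Searchkick Pro implementation: an Array stands in for the Sidekiq queue, Hashes stand in for the indexes, and every name below is hypothetical.

```ruby
# Plain-Ruby sketch of double-writing during a re-index. Hypothetical names;
# an Array stands in for the Sidekiq queue and Hashes stand in for indexes.
class DoubleWriteSimulation
  attr_reader :live_index

  def initialize(records)
    @database = records.dup
    @live_index = records.dup
    @new_index = nil   # set only while a re-index is running
    @queue = []        # operations waiting to be applied to the new index
  end

  def write(id, doc)
    @database[id] = doc
    @live_index[id] = doc
    @queue << [:index, id, doc] if @new_index   # duplicate the operation
  end

  def delete(id)
    @database.delete(id)
    @live_index.delete(id)
    @queue << [:delete, id, nil] if @new_index
  end

  def reindex
    @new_index = @database.dup   # snapshot, as in the naive version
    yield if block_given?        # concurrent application writes
    drain_queue                  # queued jobs run against the new index
    @live_index = @new_index     # promote
    @new_index = nil
  end

  private

  def drain_queue
    until @queue.empty?
      operation, id, doc = @queue.shift
      if operation == :index
        @new_index[id] = doc
      else
        @new_index.delete(id)
      end
    end
  end
end

sim = DoubleWriteSimulation.new({ 1 => "apple", 2 => "banana" })
sim.reindex do
  sim.write(3, "cherry")
  sim.delete(1)
end
puts sim.live_index.keys.sort.inspect   # [2, 3]
```

Both the write and the delete made during the re-index survive promotion, because each one was replayed against the new index before the switch.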


Implementing the full solution

To implement this solution, we are going to extend Searchkick's functionality by overriding its index jobs. This approach allows us to intercept and modify the behavior of these jobs, ensuring that any changes made during the re-indexing process are captured in both the current and new indexes. By doing so, we can maintain data consistency and achieve zero downtime during the re-indexing operation.

Extending Searchkick

The main idea is to override all the index jobs found in the Searchkick module. One great way to do that is by leveraging Ruby on Rails extensions. We should be careful about this, though. Here is a warning worth keeping in mind:

💡 Though Ruby allows you to reopen classes, you shouldn't abuse that feature. In particular, avoid changing existing methods, especially in the Ruby core or standard library. If you change the behavior of a method, your application might stop working properly.

If you decide to add a new method, make sure you are using a unique name. Otherwise, if you are using a Gem which defines a method with the same name, something might not work as expected.

Be careful when you modify existing classes. Consider using Inheritance or Composition.

But oh well, we need to improve Searchkick, so we created an extension to enhance this module. We are going to need to override certain job classes provided by it, such as BulkReindexJob and ProcessBatchJob. We can find all the background jobs in this part of the Searchkick source code (gotta love open source ❤️).

Leveraging Redis for State Management

Redis plays a crucial role in maintaining the state of re-indexing processes in this extension. It stores control flags and the new index name, enabling us to track and duplicate the operations to the new index, ensuring we maintain data integrity.
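
As a rough illustration of what such state helpers could look like, here is a sketch built around any store that responds to set, get, and del, which is the shape of a redis-rb client. The class name, the key layout, and the example index name are our assumptions for illustration, and MemoryStore is only an in-memory stand-in so the example runs without a Redis server.

```ruby
# Hypothetical sketch of re-indexing state helpers. The key layout and the
# class names are assumptions made for illustration.
class ReindexState
  KEY_PREFIX = "searchkick:reindex:new_index".freeze   # assumed key layout

  # `store` is anything that responds to set/get/del, such as a redis-rb client.
  def initialize(store)
    @store = store
  end

  def flag_start_reindex(class_name, new_index_name)
    @store.set(key_for(class_name), new_index_name)
  end

  def flag_end_reindex(class_name)
    @store.del(key_for(class_name))
  end

  def new_index_name(class_name)
    @store.get(key_for(class_name))
  end

  # A model counts as re-indexing while a new index name is stored for it.
  def reindexing?(class_name)
    !new_index_name(class_name).nil?
  end

  private

  def key_for(class_name)
    "#{KEY_PREFIX}:#{class_name}"
  end
end

# In-memory stand-in so the sketch runs without a Redis server.
class MemoryStore
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value.to_s
  end

  def get(key)
    @data[key]
  end

  def del(key)
    @data.delete(key) ? 1 : 0
  end
end

state = ReindexState.new(MemoryStore.new)
state.flag_start_reindex("Entry", "entries_production_20240101")   # made-up index name
puts state.reindexing?("Entry")   # true while the flag is set
state.flag_end_reindex("Entry")
puts state.reindexing?("Entry")   # false once the flag is cleared
```

Storing the new index name as the flag itself keeps the two pieces of state from getting out of sync. With a real Redis client, giving the key an expiry may also be worth considering, so that a crashed task cannot leave the flag set forever.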


Utilizing Index Aliases

Searchkick internally uses index aliases for the re-index process, which allows a seamless transition between old and new indexes without changing the application's index references. This mechanism is crucial in ensuring zero downtime, as it enables the application to switch to the new index once re-indexing is complete, without any interruptions. You can read more about this topic in the Elasticsearch documentation on index aliases.
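
For readers who have not used aliases directly, the switch can be pictured as a single request to the _aliases endpoint that removes the alias from the old index and adds it to the new one. The sketch below only builds that request body; alias_swap_body and the index names are made up for illustration, and Searchkick's promote takes care of the real call for you.

```ruby
require "json"

# Builds the body of an Elasticsearch/OpenSearch `POST /_aliases` request that
# moves an alias from one index to another in a single call.
# The helper name and the index names below are illustrative only.
def alias_swap_body(alias_name, old_index, new_index)
  {
    actions: [
      { remove: { index: old_index, alias: alias_name } },
      { add:    { index: new_index, alias: alias_name } }
    ]
  }
end

body = alias_swap_body(
  "entries_production",
  "entries_production_20240101",
  "entries_production_20240201"
)
puts JSON.pretty_generate(body)
```

Because both actions travel in one request, searches against the alias never observe a moment where it points at no index.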


Implementing the Rails Extension

So, without further ado, we started implementing the extension. The idea is to prepend all relevant job classes with a new module ReindexCheck. This module will override the perform method in a way that will intercept job execution, check re-indexing status, and ensure data consistency by writing to both indexes during the process if necessary. Here’s a simplified implementation of this idea:

module Searchkick
  module ReindexCheck
    def perform(*args, **options)
      job_type = self.class.name.demodulize
      class_name, index_name, record_ids = extract_job_details(job_type, args, options)

      new_index_name = Searchkick.get_new_index_name(class_name)

      if reindexing?(class_name) && index_name != new_index_name
        spawn_job_for_new_index(job_type, args, options, class_name, new_index_name, record_ids)
      end

      super(*args, **options)
    end
  end
  
  JOBS_TO_OVERRIDE = [BulkReindexJob, ProcessBatchJob, ProcessQueueJob, ReindexV2Job].freeze
  # Prepending puts ReindexCheck#perform ahead of each job's own perform
  JOBS_TO_OVERRIDE.each { |job_class| job_class.prepend(Searchkick::ReindexCheck) }
  
  # .... for simplicity some methods are omitted here, you can find their full implementation at the end of the article
end
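
If prepend is new to you, here is a minimal stand-alone sketch of the mechanism, using a made-up FakeJob class and InterceptPerform module instead of the real Searchkick jobs. It shows only that a prepended module's perform runs first and that super reaches the original method.

```ruby
# Stand-alone demonstration of Module#prepend. FakeJob and InterceptPerform are
# hypothetical stand-ins, not Searchkick classes.
class FakeJob
  def perform(record_id, index_name: "current_index")
    "indexed #{record_id} into #{index_name}"
  end
end

module InterceptPerform
  # Simple in-memory log so the interception is observable.
  def self.log
    @log ||= []
  end

  def perform(*args, **options)
    # Runs first, because the module sits in front of the class in the ancestor chain.
    InterceptPerform.log << "intercepted #{self.class.name} with #{args.inspect}"
    super(*args, **options)   # hand over to the original implementation
  end
end

FakeJob.prepend(InterceptPerform)

puts FakeJob.ancestors.first(2).inspect   # [InterceptPerform, FakeJob]
puts FakeJob.new.perform(42)              # indexed 42 into current_index
puts InterceptPerform.log.inspect
```

The original class is left untouched, which is what makes prepend a gentler tool than redefining perform in place.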


💡 One super important step after implementing this extension is to write solid unit tests. They will make sure you update your extension whenever Searchkick's source code evolves, especially if the signatures of these jobs change.
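
One cheap test along those lines is a signature guard: record the perform parameters your extension was written against and fail when they drift. A minimal sketch, assuming a hypothetical StandInJob in place of a real Searchkick job class so that it runs without the gem (the parameter list shown is illustrative, not a statement about any Searchkick version):

```ruby
# Sketch of a signature guard. StandInJob is a hypothetical replacement for a
# Searchkick job class so the example runs without the gem installed.
class StandInJob
  def perform(class_name:, record_ids:, index_name: nil)
  end
end

# The parameter lists the extension was written against (illustrative values).
EXPECTED_PERFORM_PARAMETERS = {
  StandInJob => [[:keyreq, :class_name], [:keyreq, :record_ids], [:key, :index_name]]
}.freeze

# Returns the job classes whose `perform` signature no longer matches.
def changed_signatures(expected)
  drifted = expected.reject do |job_class, parameters|
    job_class.instance_method(:perform).parameters == parameters
  end
  drifted.keys
end

puts changed_signatures(EXPECTED_PERFORM_PARAMETERS).inspect   # [] while nothing has changed
```

In a real suite the hash would list the Searchkick job classes, and the comparison would sit inside a test case. Keep in mind that once ReindexCheck is prepended, instance_method(:perform) resolves to the override, so the guard would probably need to inspect its super_method instead.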


Updating the solution to manage the Re-indexing Process

At Woflow, we decided to create a rake task that we run whenever we need to fully re-index a model. The task exposes the re-indexing flag and the new index name through Redis, so that our extension can access that information. Here’s a simplified version of our rake task:

namespace :searchkick do
  task :async_reindex, [:model] => :environment do |_task, args|
    model_name = args[:model]
    model_class = {
      'Entry' => Entry,
      'Brand' => Brand,
      'Job' => Job
    }[model_name]

    raise "Invalid model name: #{model_name}" unless model_class

    puts "Reindexing #{model_name}..."

    index = model_class.reindex(async: true, refresh_interval: '30s')
    # Publish the flag and the new index name so the job extension starts double-writing
    Searchkick.flag_start_reindex(model_name, index[:index_name])

    begin
      loop do
        # Check the reindex status using Searchkick
        status = Searchkick.reindex_status(index[:index_name])
        puts "Reindex batches left: #{status[:batches_left]}"

        break if status[:completed]

        # Check every 15 seconds
        sleep 15
      end

      # Point the alias at the new index and restore its refresh interval
      model_class.search_index.promote(index[:index_name], update_refresh_interval: true)
    ensure
      # Always clear the flag, even if polling or promotion raises
      Searchkick.flag_end_reindex(model_name)
    end

    puts "#{model_name} reindexing complete."
  end
end

In this rake task, we call the reindex method with async: true and then flag the start of the re-index, together with the new index name, in Redis. Then we wait for the re-index batches to reach 0. When they do, we promote the new index (internally this updates the OpenSearch/Elasticsearch aliases) and then clear the re-indexing flag from Redis.
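
The loop in the task polls forever if a batch never finishes. One possible hardening, sketched here with hypothetical names (wait_for_reindex, max_checks, and sleeper are illustrative, not part of Searchkick or of the task above), is to cap the number of checks:

```ruby
# Hypothetical polling helper: `check` is any callable returning a hash with a
# :completed key, shaped like the status hash used in the rake task.
def wait_for_reindex(check, interval: 15, max_checks: 240, sleeper: ->(seconds) { sleep(seconds) })
  max_checks.times do |attempt|
    status = check.call
    return attempt + 1 if status[:completed]   # number of checks it took
    sleeper.call(interval)
  end
  raise "Reindex did not complete after #{max_checks} checks"
end

# Fake status source that reports completion on the third check.
responses = [
  { completed: false, batches_left: 2 },
  { completed: false, batches_left: 1 },
  { completed: true,  batches_left: 0 }
].each

# The no-op sleeper keeps the example instant.
checks = wait_for_reindex(-> { responses.next }, sleeper: ->(_seconds) {})
puts checks   # 3
```

In the rake task, the callable would wrap the status call, for example -> { Searchkick.reindex_status(index[:index_name]) }, and the raised error would be a natural place to alert someone before the flag is cleared.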


Conclusion

The solution detailed in this post provides a practical approach to achieving reliable zero downtime re-indexing with Searchkick. While it may not be perfect, it effectively addresses the data consistency issues that can arise during re-indexing. To ensure its reliability, it's crucial to:

Implement comprehensive unit tests to protect against future changes in Searchkick's source code
Set up proper logging mechanisms to monitor the re-indexing process
Regularly review and update the implementation as needed, especially after Searchkick updates

By following these steps and understanding the intricacies of the solution, you can maintain a robust and efficient search functionality in your Ruby on Rails application, even when dealing with large datasets and frequent updates. Happy coding and good luck. 👏 🙇 🎬

PS: Omitted implementations

Disclaimer: The generic methods for extracting job details and spawning secondary jobs were omitted from the main text to avoid extra noise. They are verbose but simple to understand and can be found HERE.


Lucas Oliveira
