Rails 8 (released November 2024) is the smoothest upgrade yet.
The focus?
Simplicity and performance - removing external dependencies while boosting speed.
Solid Queue, Solid Cache, and Solid Cable eliminate Redis for many use cases.
Built-in authentication removes the need for Devise.
Ruby 3.4 (latest stable) with continued YJIT improvements delivers excellent performance.
Note: This is Part 5 of our Rails Upgrade Series. Read Part 1: Planning Rails Upgrade for the overall strategy.
Before We Start
Expected Timeline: 1-2 weeks for medium-sized applications (easiest upgrade!)
Medium-sized application: 20,000-50,000 lines of code, 30-100 models, moderate test coverage, 2-5 developers. Smaller apps may take 1-2 weeks, larger enterprise apps 6-12 weeks.
Prerequisites:
Currently on Rails 7.2 (upgrade from 7.1 first if needed)
Ruby 3.1.0+ installed (Ruby 3.4 strongly recommended)
Test coverage of 80%+
Understanding of background job setup
Step 1: Upgrade Ruby to 3.4 (Recommended)
Rails 8 requires Ruby 3.1.0 minimum, but Ruby 3.4 (latest stable) is strongly recommended for maximum performance.
Why Ruby 3.4?
Ruby 3.4 (December 2024) - Latest stable:
it as default block parameter - Cleaner block syntax
Prism parser improvements - Faster parsing
YJIT optimizations - Continued performance improvements
Better memory efficiency - Reduced memory usage
Enhanced pattern matching - More powerful syntax
Improved error messages - Better debugging experience
Ruby 3.3 (December 2023) - Also excellent:
Prism parser - New default parser
RJIT - Pure Ruby JIT compiler
M:N thread scheduler - Better concurrency
15-20% faster than Ruby 3.2 with YJIT
Ruby 3.2 (December 2022) - Stable choice:
Production-ready YJIT - Stable and fast
WASI support - WebAssembly integration
Data class - Immutable value objects
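The Data class above is easy to try in any Ruby 3.2+ session; a minimal sketch (the Point class is illustrative):

```ruby
# Ruby 3.2+ Data: concise, frozen value objects with value equality.
Point = Data.define(:x, :y)

p1 = Point.new(x: 1, y: 2)
p1.frozen? # => true

# Data objects also support pattern matching via deconstruct_keys:
label =
  case p1
  in {x: 1, y: Integer => y}
    "on the x=1 line at y=#{y}"
  else
    "elsewhere"
  end
```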
Upgrade Ruby
# Check current Ruby version
ruby -v
# Install Ruby 3.4 (recommended - latest stable)
rbenv install 3.4.1
rbenv local 3.4.1
# Or Ruby 3.3 (also good)
rbenv install 3.3.6
rbenv local 3.3.6
# Verify
ruby -v
# => ruby 3.4.1
# Update bundler
gem install bundler
bundle install
Enable YJIT (Critical for Performance)
Rails 7.2+ enables YJIT automatically when running on Ruby 3.3+. To enable it explicitly:
# config/initializers/enable_yjit.rb
RubyVM::YJIT.enable if defined?(RubyVM::YJIT)
# Or export the variable before the Ruby process starts (note: setting it
# inside config/boot.rb is too late - it's read at interpreter startup)
export RUBY_YJIT_ENABLE=1
Performance gain: with YJIT enabled, Ruby 3.4 runs roughly 15-20% faster than Ruby 3.2 and 30-40% faster than Ruby 2.7.
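A quick way to sanity-check YJIT locally (the fib workload is illustrative, not a rigorous benchmark, and numbers vary by machine):

```ruby
require "benchmark"

# CPU-bound toy workload: recursive Fibonacci.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

elapsed = Benchmark.realtime { fib(25) }
yjit_on = defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
puts format("fib(25) took %.3fs (YJIT %s)", elapsed, yjit_on ? "on" : "off")
```

Run it once with RUBY_YJIT_ENABLE=1 and once without to compare.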
Test with Ruby 3.4
# Run full test suite
bundle exec rails test
# Check YJIT stats
rails runner 'puts RubyVM::YJIT.runtime_stats' | grep ratio
Step 2: Update the Gemfile
# Gemfile
# Update Rails
gem 'rails', '~> 8.0.0'
# Solid Queue (replaces Sidekiq/Resque for many use cases)
gem 'solid_queue'
# Solid Cache (database-backed caching)
gem 'solid_cache'
# Solid Cable (WebSocket without Redis)
gem 'solid_cable'
# Keep existing gems
gem 'importmap-rails'
gem 'turbo-rails'
gem 'stimulus-rails'
gem 'sprockets-rails'
gem 'puma', '>= 5.0'
gem 'bootsnap', require: false
# Database adapters
gem 'pg', '~> 1.1' # PostgreSQL
# or
gem 'mysql2', '~> 0.5' # MySQL
# Optional: Remove if migrating to Solid alternatives
# gem 'sidekiq' # Can be replaced by Solid Queue
# gem 'redis' # Can be replaced by Solid Cache/Cable
bundle update rails
bundle install
Step 3: Run the Update Task
rails app:update
Review changes to:
config/application.rb
config/environments/*.rb
config/initializers/new_framework_defaults_8_0.rb
Step 4: Solid Queue (Optional but Recommended)
Solid Queue is a database-backed job queue that eliminates Redis dependency for background jobs.
When to Use Solid Queue
Good fit:
Low to medium job volume (< 1000 jobs/minute)
Simple job processing needs
Want to eliminate Redis
PostgreSQL or MySQL database
Stick with Sidekiq/Resque if:
High job volume (> 1000 jobs/minute)
Complex job scheduling needs
Already have Redis infrastructure
Need advanced features (unique jobs, batches)
Install Solid Queue
# Install Solid Queue
rails solid_queue:install
# This creates:
# - db/queue_schema.rb
# - config/queue.yml
# - bin/jobs (worker script)
# Run migrations
rails db:migrate
Configure Solid Queue
# config/queue.yml
production:
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    - queues: "*"
      threads: 3
      processes: 2
      polling_interval: 0.1
# config/environments/production.rb
config.active_job.queue_adapter = :solid_queue
Migrate from Sidekiq
# Before (Sidekiq)
class MyJob < ApplicationJob
  queue_as :default

  def perform(user_id)
    # Job logic
  end
end

# After (Solid Queue) - Same code!
class MyJob < ApplicationJob
  queue_as :default

  def perform(user_id)
    # Job logic - no changes needed
  end
end
Run Solid Queue
# Development
bin/jobs
# Production (with systemd, Docker, or Kamal)
bundle exec rake solid_queue:start
Step 5: Solid Cache (Optional)
Solid Cache is a database-backed cache store that eliminates Redis for caching.
Install Solid Cache
rails solid_cache:install
rails db:migrate
Configure Solid Cache
# config/environments/production.rb
config.cache_store = :solid_cache_store
Usage (Same as Before)
# Fragment caching - no changes
<% cache @post do %>
<%= render @post %>
<% end %>
# Low-level caching - no changes
Rails.cache.fetch("user_#{user.id}") do
user.expensive_calculation
end
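This is why switching the cache store is transparent: fetch is simply read-or-compute-and-write, whatever the backend. A minimal plain-Ruby sketch of those semantics (TinyCache is a stand-in, not the Rails API):

```ruby
# Stand-in illustrating Rails.cache.fetch semantics: return the cached
# value if present, otherwise run the block and store its result.
class TinyCache
  def initialize
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

cache = TinyCache.new
calls = 0
2.times { cache.fetch("user_1") { calls += 1; "expensive result" } }
calls # the block ran only once
```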
Performance Considerations
Pros:
No Redis dependency
Automatic cleanup of old entries
Works with existing database
Cons:
Slower than Redis for high-traffic sites
Database load increases
Recommendation: Use Solid Cache for low to medium traffic. Keep Redis for high-traffic applications.
Step 6: Solid Cable (Optional)
Solid Cable provides WebSocket support without Redis.
Install Solid Cable
rails solid_cable:install
rails db:migrate
Configure Solid Cable
# config/cable.yml
production:
  adapter: solid_cable

# Or keep Redis if we have it
# production:
#   adapter: redis
#   url: redis://localhost:6379/1
Usage (No Changes)
# app/channels/chat_channel.rb
class ChatChannel < ApplicationCable::Channel
  def subscribed
    stream_from "chat_#{params[:room_id]}"
  end

  def receive(data)
    ActionCable.server.broadcast(
      "chat_#{params[:room_id]}",
      data
    )
  end
end
Step 7: Built-in Authentication Generator
Rails 8 includes a built-in authentication generator - no Devise needed for simple use cases.
Generate Authentication
rails generate authentication
This creates:
User model with password authentication
SessionsController for login/logout
PasswordsController for password reset
Authentication views
Helper methods
What We Get
# app/models/user.rb
class User < ApplicationRecord
  has_secure_password

  generates_token_for :password_reset, expires_in: 15.minutes
  generates_token_for :email_confirmation, expires_in: 24.hours
end

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  before_action :authenticate

  private

  def authenticate
    if session_record = Session.find_by(id: cookies.signed[:session_id])
      Current.session = session_record
    else
      redirect_to new_session_path
    end
  end
end
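Conceptually, generates_token_for issues a signed payload with an expiry. A self-contained sketch of that idea in plain Ruby (Rails actually uses MessageVerifier with per-purpose state, so SECRET and the token format here are assumptions for illustration):

```ruby
require "openssl"
require "json"
require "base64"

SECRET = "demo-secret-not-for-production" # assumption: real apps load this from credentials

# Sign a payload plus an expiry timestamp; the HMAC makes it tamper-evident.
def sign_token(payload, expires_in:)
  body = Base64.urlsafe_encode64(JSON.dump(payload.merge("exp" => Time.now.to_i + expires_in)))
  "#{body}.#{OpenSSL::HMAC.hexdigest("SHA256", SECRET, body)}"
end

# Returns the payload if the signature matches and it hasn't expired, else nil.
def verify_token(token)
  body, sig = token.split(".", 2)
  return nil unless sig && OpenSSL::HMAC.hexdigest("SHA256", SECRET, body) == sig
  data = JSON.parse(Base64.urlsafe_decode64(body))
  data["exp"] >= Time.now.to_i ? data : nil
end
```

A tampered token fails verification and an expired one returns nil - the same properties the 15-minute password-reset token above relies on.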
When to Use Built-in Auth vs Devise
Use built-in authentication:
Simple authentication needs
Want full control over auth code
Learning Rails authentication
Small to medium applications
Use Devise:
Need OAuth integration
Complex authentication requirements
Multi-tenancy
Advanced features (confirmable, lockable, etc.)
Migrate from Devise (If Needed)
# Keep the existing User model and add has_secure_password
class User < ApplicationRecord
  has_secure_password

  # Keep existing Devise functionality we need
  # Remove Devise modules we don't use
end

# Gradually migrate authentication logic
# Test thoroughly before removing Devise
Step 8: Progressive Web App (PWA) Support
Rails 8 adds built-in PWA support.
Add PWA Files
New Rails 8 apps include PWA scaffolding by default. For an app upgraded from 7.2, add the files manually (copying them from a freshly generated Rails 8 app works well):
app/views/pwa/manifest.json.erb - PWA manifest
app/views/pwa/service-worker.js - Service worker
Icons and related configuration
Configure PWA
<!-- app/views/layouts/application.html.erb -->
<head>
  <%= tag.link rel: "manifest", href: pwa_manifest_path %>
  <%= tag.meta name: "apple-mobile-web-app-capable", content: "yes" %>
</head>
# config/routes.rb
get "manifest" => "rails/pwa#manifest", as: :pwa_manifest
get "service-worker" => "rails/pwa#service_worker", as: :pwa_service_worker
Step 9: Breaking Changes (Minimal!)
Rails 8 has very few breaking changes - this is the smoothest upgrade.
1. Deprecations from Rails 7 Removed
# If we fixed Rails 7 deprecation warnings, we're good!
# Check for any remaining warnings:
RAILS_ENV=test rails test 2>&1 | grep -i deprecat
2. Default Configuration Changes
# config/initializers/new_framework_defaults_8_0.rb
# Review and enable new defaults
Rails.application.config.load_defaults 8.0
3. ActiveStorage Changes
# ActiveStorage::Blob#open without a block now returns the file

# Before (Rails 7) - block form
blob.open do |file|
  # Use file
end

# After (Rails 8) - both forms work
blob.open do |file|
  # Use file
end

# Or without a block
file = blob.open
# Use file
file.close
Step 10: Testing Updates
Test Solid Queue Jobs
# test/jobs/my_job_test.rb
require "test_helper"

class MyJobTest < ActiveJob::TestCase
  test "performs job" do
    assert_enqueued_with(job: MyJob, args: [1]) do
      MyJob.perform_later(1)
    end
  end

  test "processes job" do
    MyJob.perform_now(1)
    # Assert job effects
  end
end
Test Authentication
# test/controllers/posts_controller_test.rb
class PostsControllerTest < ActionDispatch::IntegrationTest
  setup do
    @user = users(:one)
    sign_in @user
  end

  test "should get index" do
    get posts_url
    assert_response :success
  end
end
Step 11: Performance Improvements
Rails 8 + Ruby 3.4 delivers significant performance gains:
15-20% faster than Ruby 3.2 (or 30-40% faster than Ruby 2.7)
Lower memory usage (Ruby 3.4 improvements)
Faster boot times (Prism parser improvements)
Reduced infrastructure costs (no Redis needed)
Benchmark YJIT Performance
# config/initializers/yjit_stats.rb
if defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
  Rails.application.config.after_initialize do
    at_exit do
      stats = RubyVM::YJIT.runtime_stats
      puts "\n=== YJIT Stats ==="
      puts "Compiled: #{stats[:compiled_iseq_count]} methods"
      puts "Ratio: #{stats[:ratio]}%"
      puts "==================\n"
    end
  end
end
Monitor Solid Queue Performance
# Check pending (ready) jobs
SolidQueue::ReadyExecution.count
# Check failed jobs
SolidQueue::FailedExecution.count
# Monitor in production with an APM tool (New Relic, Datadog, etc.)
Step 12: Deployment with Kamal 2
Rails 8 includes Kamal 2 for zero-downtime deployments.
Install Kamal
# Kamal is included by default
# Initialize configuration
kamal init
Configure Kamal
# config/deploy.yml (Kamal 2 uses the built-in kamal-proxy, not Traefik labels)
service: myapp
image: myapp/production

servers:
  web:
    - 192.168.1.1

proxy:
  ssl: true
  host: myapp.com

registry:
  server: ghcr.io
  username: myusername
  password:
    - KAMAL_REGISTRY_PASSWORD

env:
  secret:
    - RAILS_MASTER_KEY
Deploy
# First deployment
kamal setup
# Subsequent deployments
kamal deploy
# Rollback if needed
kamal rollback
Upgrade Checklist
Note: This checklist covers the most common changes. Depending on the application’s gems, custom code, and architecture, we may encounter additional issues. Always test thoroughly in a staging environment.
Upgrade Ruby to 3.1+ (3.4 recommended)
Enable YJIT for performance
Update Gemfile with Rails 8.0
Run rails app:update
Decide on Solid Queue (optional)
Decide on Solid Cache (optional)
Decide on Solid Cable (optional)
Consider built-in authentication (optional)
Enable PWA support (optional)
Fix any deprecation warnings
Run full test suite
Test in staging environment
Deploy to production with monitoring
Common Gotchas
1. Solid Queue vs Sidekiq
# Solid Queue doesn't support all Sidekiq features
# Check compatibility before migrating:
# - Unique jobs -> Use database constraints
# - Batches -> Implement manually
# - Scheduled jobs -> Supported
# - Retries -> Supported
2. Database Load with Solid Cache
# Monitor database performance
# Solid Cache adds queries to our database
# Consider keeping Redis for high-traffic sites
# Check the cache hit rate (Rails.cache.stats is store-dependent;
# APM or database metrics are more reliable)
3. Authentication Migration
# Don't rush to remove Devise
# Test built-in auth thoroughly first
# Migrate gradually if needed
Migration Strategy
Conservative Approach
Upgrade to Rails 8 first
Keep existing infrastructure (Redis, Sidekiq, Devise)
Test thoroughly
Gradually adopt Solid gems if beneficial
Progressive Approach
Upgrade Ruby to 3.4
Upgrade to Rails 8
Migrate to Solid Queue for new jobs
Evaluate Solid Cache for non-critical caching
Keep Redis for high-traffic features
Aggressive Approach
Upgrade Ruby to 3.4
Upgrade to Rails 8
Migrate all jobs to Solid Queue
Replace Redis with Solid Cache/Cable
Use built-in authentication for new features
Recommendation: Start conservative, move progressive as we gain confidence.
What’s Next
Congratulations!
We’ve completed the Rails upgrade journey from planning through Rails 8.
Series recap:
Part 1: Strategic planning and preparation
Part 2: Rails 4.2 to 5 - Foundation updates
Part 3: Rails 5.2 to 6 - Zeitwerk and Webpacker
Part 4: Rails 6.1 to 7 - Import Maps and Hotwire
Part 5: Rails 7.2 to 8 - Solid gems and simplification
Keep the Rails App Modern
Monitor deprecation warnings in each Rails version
Upgrade Ruby regularly for performance and security
Test thoroughly at each step
Stay informed about Rails releases
Contribute back to the Rails community
Resources
Official Rails 8.0 Release Notes
Rails Upgrade Guide
Solid Queue Documentation
Solid Cache Documentation
Kamal Documentation
Ruby 3.4 Release Notes
Ruby 3.3 Release Notes
RailsDiff 7.2 to 8.0
At Saeloun, we’ve helped numerous teams successfully upgrade to Rails 8 and modernize their infrastructure.
Whether planning a major upgrade or needing help optimizing a Rails 8 application, we’re here to help.
Contact us for Rails upgrade consulting
I’ve been working on Elixir Toolbox quite a bit lately and wanted to share what’s new:
More categories: ~150 now, including AI/LLM sections (and more)
JSON API: the aggregated data is now exposed for anyone to query
New trending page, showing packages by recent downloads
llms.txt and llms-full.txt endpoints so LLMs can read the catalog: try asking your AI agent to “find me the best Elixir package for X using elixir-toolbox.dev”
Complete UI refresh, using DaisyUI
GitLab support
This is a passion project. I use it to keep my Elixir skills sharp, and hopefully it helps others too. Check it out!
Learn how apt, yum, dnf, and pkg manage software on Linux and FreeBSD. Compare commands, workflows, and best practices. Start with the right tool for your system.
Originally appeared on RubyMine : Intelligent Ruby and Rails IDE | The JetBrains Blog.
RubyMine enhances the developer experience with context-aware search features that make navigating a Rails application seamless, a powerful analysis engine that detects problems in the source code, and integrated support for the most popular version control systems.
With AI becoming increasingly popular among developers as a tool that helps them understand codebases or develop applications, these RubyMine features provide an extra level of value. Indeed, with access to the functionality of the IDE and information about a given project, AI assistants can produce higher-quality results more efficiently.
To improve AI-assisted workflows, since 2025.3, RubyMine has also been able to…
My coding agent harnesses are designed to enable parallel serial work—multiple agents running in multiple tabs, all committing to main instead of worktrees.
turbocommit does this by linking each session's commits: https://github.com/searlsco/turbocommit?tab=readme-ov-file#continuity-across-workstreams
Originally appeared on Tim Riley.
Oops, nearly missed these weeknotes. Let me make this a quick one just to sneak it in and keep the streak alive (6 months and counting!)
My big achievement this week was getting Hanami Minitest ready for feedback. Check out my post for a preview of the generated files and where I’m looking for help. This has already generated a whole lot of great feedback and discussion. Thank you everyone for sharing your thoughts!
I had a couple of very old hanami-rspec preview releases yanked from RubyGems.org (thank you Colby!), so now they no longer confuse the bundle outdated command.
Aaron added a nice new feature to Hanami CLI: a --name option to allow the app name to be customised. I reviewed…
TL;DR
I needed full-text search across compliance records in Humadroid — some of which are encrypted at the application layer. The naive answer is “just decrypt everything into a search index.” The real answer involves understanding exactly what you’re trading, making that trade-off explicit and per-organization, and designing the index so it reveals as little as possible. Here’s the pattern I built, what I considered, and what I’d tell an auditor who asks about it.
Why Encrypted Records Exist in the First Place
Humadroid is a GRC (Governance, Risk, and Compliance) platform. Our customers use it to manage SOC 2 controls, store implementation notes, track evidence, and maintain security documentation. Some of that content is sensitive. Think: “Here’s how we configured our AWS CloudTrail logging” or “Our penetration test found these three critical vulnerabilities.”
We encrypt sensitive fields at the application layer using Rails’ built-in encrypts directive. Not just database-level encryption at rest (we have that too) — actual column-level encryption where the plaintext never touches the database.
This matters because a database breach doesn’t expose the content. An attacker with a SQL dump sees gibberish. Your DBA can’t read implementation notes. The data is genuinely protected at rest in a way that disk encryption alone doesn’t provide.
(If you’re building a compliance tool and you don’t encrypt this stuff… I have questions.)
In practice, the split looks something like this: record titles and identifiers live unencrypted because they need to be sortable, filterable, and displayable in lists. The actual substance — implementation details, policy content, audit findings — gets encrypted. Not every model has encrypted fields, but the ones that carry real security context do.
And here’s the problem. You can’t run WHERE content ILIKE '%cloudtrail%' on an encrypted column. The database doesn’t know what’s in there. That’s the entire point.
The Search Problem
When you have 200+ compliance controls, dozens of policy documents, and a growing pile of evidence artifacts, finding things matters. A lot. During an audit, someone asks “show me how you handle access reviews” and you need to find the relevant control, its implementation notes, and the supporting evidence. Fast.
Without search, people do what people always do: they scroll. They open tabs. They Ctrl+F inside individual documents. They message a colleague asking “where did we put the thing about the thing?”
I’ve been there. During our own SOC 2 prep, I watched myself doing exactly this — clicking through screens in my own product looking for records I knew existed. Not great when you’re building the tool that’s supposed to solve this exact workflow.
So. We need search. But some of the most valuable content lives in encrypted columns.
The Options (And Why Most of Them Suck)
I spent more time than I’d like to admit thinking about this. Here’s the landscape:
Option 1: Client-side search. Decrypt everything in the browser and search there. Works for small datasets, absolutely falls apart at scale. You’d need to load every record to search across them. Plus, the initial load time would be brutal, and you’re shipping all that decrypted data to the client just in case someone searches.
Option 2: Searchable encryption schemes. There’s academic work on this — order-preserving encryption, homomorphic encryption, encrypted search indexes. Fascinating stuff. Completely impractical for a startup that needs to ship features this quarter. The libraries are immature, the performance characteristics are unpredictable, and explaining to an auditor how your custom cryptographic search works sounds like a nightmare I don’t need.
Option 3: Blind indexes. Hash the content with HMAC and search against hashes. Works for exact matches (like email lookups), useless for full-text search. You can’t stem, rank, or fuzzy-match a hash.
Option 4: Decrypt and index separately. Take the encrypted content, decrypt it at the application layer, tokenize it, and store the tokens in a dedicated search index table. Accept the trade-off explicitly.
I went with Option 4. But the devil — and the audit readiness — is in the details.
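Option 3 is worth seeing in a few lines, because it makes the limitation obvious. A sketch of a blind index (the key and the normalization rule are assumptions):

```ruby
require "openssl"

BLIND_INDEX_KEY = "32-bytes-of-secret-from-credentials" # assumption: loaded from credentials

# Deterministic keyed hash of a normalized value: store the digest next to
# the encrypted column, then look records up by recomputing it.
def blind_index(value)
  OpenSSL::HMAC.hexdigest("SHA256", BLIND_INDEX_KEY, value.to_s.strip.downcase)
end

# Exact lookup works because normalization + HMAC are deterministic:
blind_index("User@Example.com") == blind_index(" user@example.COM ") # true
# But there is no substring or fuzzy search against a digest:
blind_index("example.com") == blind_index("user@example.com")        # false
```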
The Trade-Off (Let’s Be Honest About It)
Here’s what I’m not going to do: pretend this is a zero-cost decision.
When you decrypt content and write it into a search index, you’re creating a second representation of that data. It’s not plaintext — PostgreSQL tsvector data consists of stemmed, normalized word tokens, not readable sentences. But it’s not nothing either.
For the string “Annual penetration test report for AWS infrastructure,” the tsvector stores something like:
'annual':1 'aws':6 'infrastructur':7 'penetr':2 'report':4 'test':3
It’s not the original text. You can’t reconstruct sentences or context. But an attacker with database access could determine that a record mentions “penetration” and “AWS.” That’s more information than the fully encrypted column reveals.
The index also stores record titles in cleartext — used for rendering search results without loading and decrypting the actual records. Titles like “Q3 Penetration Test Results” or “AWS IAM Access Review Policy” are visible as-is. This is the most exposed piece.
This is a conscious trade-off. And I think the right approach is to make it:
Explicit — organizations opt into search knowing what it means
Reversible — toggle it off and the entire index is purged immediately
Layered — the index still lives behind RBAC, account scoping, encryption at rest, and TLS in transit
Which brings us to the architecture.
The Architecture: Separate Table, Opt-In, Purgeable
The core design principle: the search index is a secondary, disposable data structure. It can be rebuilt from the source records at any time and destroyed without losing anything.
The Index Table
The search index lives in its own table, completely decoupled from the source records. Polymorphic references point back to whatever was indexed:
create_table :search_index, id: :uuid do |t|
  t.references :account, type: :uuid, null: false,
               foreign_key: { on_delete: :cascade }
  t.references :searchable, polymorphic: true, type: :uuid, null: false
  t.string :title, null: false, default: ""
  t.string :context_label
  t.string :url_path
  t.tsvector :tsv
  t.timestamps
end

add_index :search_index, :tsv, using: :gin
add_index :search_index, %i[searchable_type searchable_id], unique: true
add_index :search_index, :title, using: :gin, opclass: :gin_trgm_ops
A few design decisions worth explaining.
The title and context_label columns exist purely for rendering. When someone types in a command palette, I need to show results fast — without loading each matching record and decrypting its fields just to display a label. Precomputing these at index time makes the UI feel instant.
The url_path is precomputed for the same reason. No route generation at query time.
The unique index on [searchable_type, searchable_id] gives us upsert semantics — one index entry per record, always overwritten on update, never duplicated.
We also enable pg_trgm for fuzzy matching on titles. Because people misspell “penetration” more often than you’d think.
Per-Organization Opt-In
This isn’t a global feature flag. Each organization explicitly enables search indexing through a boolean on their account record:
after_commit :handle_search_toggle,
             if: :saved_change_to_search_indexing_enabled?

def handle_search_toggle
  if search_indexing_enabled?
    SearchRebuildJob.perform_async(id)
  else
    SearchPurgeJob.perform_async(id)
  end
end
Toggle on → background job decrypts and indexes everything. Toggle off → one DELETE statement wipes every index entry for that organization. Gone.
The toggle itself is gated behind a Flipper flag for controlled rollout, and only account admins can flip it. Feature flag + explicit opt-in + admin-only access means nobody gets indexed by accident.
Making Models Searchable
Rather than scattering index logic across the codebase, I built a concern that each searchable model includes. The model defines a simple contract — what to index and at what weight — and the concern handles the lifecycle:
module Searchable
  extend ActiveSupport::Concern

  included do
    after_commit :reindex_search, on: %i[create update], if: :should_reindex?
    after_commit :remove_from_index, on: :destroy
  end

  private

  def should_reindex?
    return false unless account&.search_indexing_enabled?

    previously_new_record? ||
      (self.class.search_tracked_columns & previous_changes.keys).any?
  end
end
The should_reindex? check is important. We only reindex when tracked columns actually change — updating a record’s date or status shouldn’t trigger decryption and reindexing of its content. This keeps the write overhead minimal.
Each model declares what to index and at what priority:
def self.search_content_definition
  [[:title, "A"], [:identifier, "A"],
   [:implementation_notes, "B"], [:description, "C"]]
end
The weight system (A through D) feeds PostgreSQL’s setweight() function, so a title match ranks higher than a match buried in a description. Identifiers like “CC-1.1” get weight A because when someone searches for that string, they want that exact record.
During upsert, the content parts are assembled into a single weighted tsvector:
weighted_sql = content_parts.map do |(text, weight)|
  "setweight(to_tsvector('english', coalesce(#{connection.quote(text)}, '')), '#{weight}')"
end.join(" || ")
This is where the encrypted fields get decrypted — at the application layer, in memory, just long enough to tokenize them. The plaintext never hits the database; only the tsvector does.
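To make the assembly concrete, here is the kind of SQL string that loop produces, using a stand-in quote helper (the real code goes through ActiveRecord's connection.quote, and the content parts are examples):

```ruby
# Stand-in for connection.quote: handles only the single-quote escaping
# needed for this illustration.
def quote(text)
  "'#{text.gsub("'", "''")}'"
end

content_parts = [["Penetration Test Report", "A"], ["Annual AWS review", "C"]]

weighted_sql = content_parts.map do |(text, weight)|
  "setweight(to_tsvector('english', coalesce(#{quote(text)}, '')), '#{weight}')"
end.join(" || ")

puts weighted_sql
# One setweight(...) call per content part, concatenated with the
# tsvector || operator so PostgreSQL merges them into a single vector.
```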
Smart Search: Full-Text First, Fuzzy Fallback
The search itself uses a two-pass strategy:
def self.smart_search(query, limit: 20)
  results = fulltext_search(query, limit: limit)
  results = fuzzy_search(query, limit: limit) if results.empty?
  results
end
First pass: PostgreSQL full-text search with prefix matching (:* operator) and ts_rank_cd for relevance scoring. This handles the “I know approximately what I’m looking for” case.
Second pass: trigram similarity on the title column. This catches typos and partial matches that full-text search misses. Searching for “Penetraton Tets” should still find “Penetration Test Report.” (Yes, I test with actual typos I’ve made.)
The prefix matching is particularly nice for command palettes. Typing “cloud” immediately matches “CloudTrail Configuration” before you finish the word.
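The trigram idea itself is simple enough to sketch in plain Ruby (pg_trgm's padding and normalization differ in the details, so treat this as illustrative, not its implementation):

```ruby
require "set"

# Split a string into 3-character windows over a padded, lowercased form.
def trigrams(str)
  padded = "  #{str.downcase.strip} "
  (0..padded.length - 3).map { |i| padded[i, 3] }.to_set
end

# Jaccard similarity of trigram sets, roughly what pg_trgm's similarity() does.
def similarity(a, b)
  ta, tb = trigrams(a), trigrams(b)
  (ta & tb).size.to_f / (ta | tb).size
end

similarity("Penetraton Tets", "Penetration Test Report") # noticeably above zero
similarity("Penetraton Tets", "Access Review Policy")    # close to zero
```

Shared trigrams survive the typo, which is exactly why the fuzzy pass still finds the record.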
What I’d Tell an Auditor
(Because I’m going through SOC 2 myself, this isn’t hypothetical.)
“What does the search index contain?”
PostgreSQL tsvector tokens — stemmed word fragments with positional information. Not the original text. Also: record titles and context labels in cleartext, used for rendering search results.
“Can you reconstruct the original content from the index?”
No. Tsvectors are one-way transformations. You can determine that a record contains the word stems “penetr” and “aws” but you cannot reconstruct the sentence “Our annual penetration test of AWS infrastructure revealed three findings.”
“What about the titles?”
Titles are stored in cleartext. This is the explicit trade-off: organizations that enable search accept that record titles are visible in the index table. Titles are generally less sensitive than implementation details, but we document this in our security practices.
“What protections exist on the index?”
Five layers: account-scoped multi-tenancy (data isolation), role-based access control checked before any results are rendered, encryption at rest on the database volume, TLS for all data in transit, and instant purge capability by disabling the feature.
“Can an organization opt out?”
Immediately. Toggling the setting triggers a background job that purges every index entry for that organization. The feature is disabled by default — organizations must explicitly enable it.
“Why not use a more sophisticated encryption scheme?”
Because the mature, production-ready options for full-text search over encrypted data don’t exist in a form I’d trust for a compliance product. Partially homomorphic encryption and order-preserving encryption are fascinating research areas, but deploying unproven cryptographic schemes in a security product is worse than a well-understood trade-off with clear documentation and user consent.
I’d rather have an auditable, understandable system than a clever one.
The Rebuild-Purge Lifecycle
One thing I wanted to get right: the index should be entirely rebuildable and entirely destroyable at any point. No orphaned entries, no stale data, no “well, we mostly cleaned up.”
The rebuild job iterates through every searchable model type, decrypts content at the application layer, tokenizes it into tsvectors, and upserts into the index. Each record is processed independently with error handling — one bad record doesn’t kill the whole rebuild.
The purge is simpler: one SQL statement. Done.
Both operations are idempotent. Run the rebuild twice and you get the same index. Run the purge when there’s nothing to purge and nothing breaks. This matters for reliability — background jobs fail, retry, and sometimes run out of order.
Things I’m Still Thinking About
This solution works. It shipped. People are using it. But I’m not going to pretend it’s perfect.
Index staleness. If the rebuild job fails partway through, some records might have stale index entries until the next update triggers a reindex. I’m not losing sleep over this — search results being slightly outdated is a UX annoyance, not a security issue — but it’s not zero.
Scaling. Right now, the index lives in the same PostgreSQL database as everything else. For our current scale, this is fine. If we hit the point where the tsvector GIN index causes write amplification problems, I’ll probably move to a dedicated search service. But premature optimization is… well, you know.
Content vs. metadata sensitivity. I’m treating titles as “acceptable to store in cleartext” and implementation details as “must be tokenized.” But some titles might be sensitive too. A future iteration might let organizations configure sensitivity per field. For now, the current granularity matches what our customers have told us they’re comfortable with.
The philosophical question. Is it better to have no search and force people to manually browse through hundreds of records (which has its own security implications — people share screenshots, export to spreadsheets, ask colleagues on Slack), or to have fast search with a well-documented, auditable index that reveals word stems? I think the latter. But reasonable people can disagree.
The Boring Conclusion
There’s no cryptographic magic here. No breakthrough. Just a careful trade-off with clear boundaries: separate table, per-organization opt-in, instant purge, tsvector tokens instead of plaintext, documented threat model, layered defenses.
Sometimes the right engineering decision isn’t the clever one. It’s the one you can explain to an auditor in plain English, that your customers can enable or disable with a toggle, and that you can blow away in one SQL statement if something goes wrong.
That’s the kind of security I want in a compliance tool. Boring, understandable, and honest about what it does.
(Now if you’ll excuse me, I need to go search for that one control I wrote last month. With cmd+k, thankfully.)