Rails 8 (released November 2024) is the smoothest upgrade yet.
The focus?
Simplicity and performance - removing external dependencies while boosting speed.
Solid Queue, Solid Cache, and Solid Cable eliminate Redis for many use cases.
Built-in authentication removes the need for Devise.
Ruby 3.4 (latest stable) with continued YJIT improvements delivers excellent performance.
Note: This is Part 5 of our Rails Upgrade Series. Read Part 1: Planning Rails Upgrade for the overall strategy.
Before We Start
Expected Timeline: 1-2 weeks for medium-sized applications (easiest upgrade!)
Medium-sized application: 20,000-50,000 lines of code, 30-100 models, moderate test coverage, 2-5 developers. Smaller apps may take 1-2 weeks, larger enterprise apps 6-12 weeks.
Prerequisites:
Currently on Rails 7.2 (upgrade from 7.1 first if needed)
Ruby 3.1.0+ installed (Ruby 3.4 strongly recommended)
Test coverage of 80%+
Understanding of background job setup
Step 1: Upgrade Ruby to 3.4 (Recommended)
Rails 8 requires Ruby 3.1.0 minimum, but Ruby 3.4 (latest stable) is strongly recommended for maximum performance.
Why Ruby 3.4?
Ruby 3.4 (December 2024) - Latest stable:
it as default block parameter - Cleaner block syntax
Prism parser improvements - Faster parsing
YJIT optimizations - Continued performance improvements
Better memory efficiency - Reduced memory usage
Enhanced pattern matching - More powerful syntax
Improved error messages - Better debugging experience
Ruby 3.3 (December 2023) - Also excellent:
Prism parser - New default parser
RJIT - Pure Ruby JIT compiler
M:N thread scheduler - Better concurrency
15-20% faster than Ruby 3.2 with YJIT
Ruby 3.2 (December 2022) - Stable choice:
Production-ready YJIT - Stable and fast
WASI support - WebAssembly integration
Data class - Immutable value objects
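The Data class above is easy to try in any Ruby 3.2+ session; a minimal sketch (the Point class is illustrative):

```ruby
# Ruby 3.2+ Data: concise, frozen value objects with value equality.
Point = Data.define(:x, :y)

p1 = Point.new(x: 1, y: 2)
p1.frozen? # => true

# Data objects also support pattern matching via deconstruct_keys:
label =
  case p1
  in {x: 1, y: Integer => y}
    "on the x=1 line at y=#{y}"
  else
    "elsewhere"
  end
```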
Upgrade Ruby
# Check current Ruby version
ruby -v
# Install Ruby 3.4 (recommended - latest stable)
rbenv install 3.4.1
rbenv local 3.4.1
# Or Ruby 3.3 (also good)
rbenv install 3.3.6
rbenv local 3.3.6
# Verify
ruby -v
# => ruby 3.4.1
# Update bundler
gem install bundler
bundle install
Enable YJIT (Critical for Performance)
Rails 7.2+ enables YJIT automatically when running on Ruby 3.3+. To enable it explicitly:
# config/initializers/enable_yjit.rb
RubyVM::YJIT.enable if defined?(RubyVM::YJIT)
# Or export the variable before the Ruby process starts (note: setting it
# inside config/boot.rb is too late - it's read at interpreter startup)
export RUBY_YJIT_ENABLE=1
Performance gain: with YJIT enabled, Ruby 3.4 runs roughly 15-20% faster than Ruby 3.2 and 30-40% faster than Ruby 2.7.
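A quick way to sanity-check YJIT locally (the fib workload is illustrative, not a rigorous benchmark, and numbers vary by machine):

```ruby
require "benchmark"

# CPU-bound toy workload: recursive Fibonacci.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

elapsed = Benchmark.realtime { fib(25) }
yjit_on = defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
puts format("fib(25) took %.3fs (YJIT %s)", elapsed, yjit_on ? "on" : "off")
```

Run it once with RUBY_YJIT_ENABLE=1 and once without to compare.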
Test with Ruby 3.4
# Run full test suite
bundle exec rails test
# Check YJIT stats
rails runner 'puts RubyVM::YJIT.runtime_stats' | grep ratio
Step 2: Update the Gemfile
# Gemfile
# Update Rails
gem 'rails', '~> 8.0.0'
# Solid Queue (replaces Sidekiq/Resque for many use cases)
gem 'solid_queue'
# Solid Cache (database-backed caching)
gem 'solid_cache'
# Solid Cable (WebSocket without Redis)
gem 'solid_cable'
# Keep existing gems
gem 'importmap-rails'
gem 'turbo-rails'
gem 'stimulus-rails'
gem 'sprockets-rails'
gem 'puma', '>= 5.0'
gem 'bootsnap', require: false
# Database adapters
gem 'pg', '~> 1.1' # PostgreSQL
# or
gem 'mysql2', '~> 0.5' # MySQL
# Optional: Remove if migrating to Solid alternatives
# gem 'sidekiq' # Can be replaced by Solid Queue
# gem 'redis' # Can be replaced by Solid Cache/Cable
bundle update rails
bundle install
Step 3: Run the Update Task
rails app:update
Review changes to:
config/application.rb
config/environments/*.rb
config/initializers/new_framework_defaults_8_0.rb
Step 4: Solid Queue (Optional but Recommended)
Solid Queue is a database-backed job queue that eliminates Redis dependency for background jobs.
When to Use Solid Queue
Good fit:
Low to medium job volume (< 1000 jobs/minute)
Simple job processing needs
Want to eliminate Redis
PostgreSQL or MySQL database
Stick with Sidekiq/Resque if:
High job volume (> 1000 jobs/minute)
Complex job scheduling needs
Already have Redis infrastructure
Need advanced features (unique jobs, batches)
Install Solid Queue
# Install Solid Queue
rails solid_queue:install
# This creates:
# - db/queue_schema.rb
# - config/queue.yml
# - bin/jobs (worker script)
# Run migrations
rails db:migrate
Configure Solid Queue
# config/queue.yml
production:
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    - queues: "*"
      threads: 3
      processes: 2
      polling_interval: 0.1
# config/environments/production.rb
config.active_job.queue_adapter = :solid_queue
Migrate from Sidekiq
# Before (Sidekiq)
class MyJob < ApplicationJob
  queue_as :default

  def perform(user_id)
    # Job logic
  end
end

# After (Solid Queue) - Same code!
class MyJob < ApplicationJob
  queue_as :default

  def perform(user_id)
    # Job logic - no changes needed
  end
end
Run Solid Queue
# Development
bin/jobs
# Production (with systemd, Docker, or Kamal)
bundle exec rake solid_queue:start
Step 5: Solid Cache (Optional)
Solid Cache is a database-backed cache store that eliminates Redis for caching.
Install Solid Cache
rails solid_cache:install
rails db:migrate
Configure Solid Cache
# config/environments/production.rb
config.cache_store = :solid_cache_store
Usage (Same as Before)
# Fragment caching - no changes
<% cache @post do %>
<%= render @post %>
<% end %>
# Low-level caching - no changes
Rails.cache.fetch("user_#{user.id}") do
user.expensive_calculation
end
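This is why switching the cache store is transparent: fetch is simply read-or-compute-and-write, whatever the backend. A minimal plain-Ruby sketch of those semantics (TinyCache is a stand-in, not the Rails API):

```ruby
# Stand-in illustrating Rails.cache.fetch semantics: return the cached
# value if present, otherwise run the block and store its result.
class TinyCache
  def initialize
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

cache = TinyCache.new
calls = 0
2.times { cache.fetch("user_1") { calls += 1; "expensive result" } }
calls # the block ran only once
```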
Performance Considerations
Pros:
No Redis dependency
Automatic cleanup of old entries
Works with existing database
Cons:
Slower than Redis for high-traffic sites
Database load increases
Recommendation: Use Solid Cache for low to medium traffic. Keep Redis for high-traffic applications.
Step 6: Solid Cable (Optional)
Solid Cable provides WebSocket support without Redis.
Install Solid Cable
rails solid_cable:install
rails db:migrate
Configure Solid Cable
# config/cable.yml
production:
  adapter: solid_cable

# Or keep Redis if we have it
# production:
#   adapter: redis
#   url: redis://localhost:6379/1
Usage (No Changes)
# app/channels/chat_channel.rb
class ChatChannel < ApplicationCable::Channel
  def subscribed
    stream_from "chat_#{params[:room_id]}"
  end

  def receive(data)
    ActionCable.server.broadcast(
      "chat_#{params[:room_id]}",
      data
    )
  end
end
Step 7: Built-in Authentication Generator
Rails 8 includes a built-in authentication generator - no Devise needed for simple use cases.
Generate Authentication
rails generate authentication
This creates:
User model with password authentication
SessionsController for login/logout
PasswordsController for password reset
Authentication views
Helper methods
What We Get
# app/models/user.rb
class User < ApplicationRecord
  has_secure_password

  generates_token_for :password_reset, expires_in: 15.minutes
  generates_token_for :email_confirmation, expires_in: 24.hours
end

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  before_action :authenticate

  private

  def authenticate
    if session_record = Session.find_by(id: cookies.signed[:session_id])
      Current.session = session_record
    else
      redirect_to new_session_path
    end
  end
end
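Conceptually, generates_token_for issues a signed payload with an expiry. A self-contained sketch of that idea in plain Ruby (Rails actually uses MessageVerifier with per-purpose state, so SECRET and the token format here are assumptions for illustration):

```ruby
require "openssl"
require "json"
require "base64"

SECRET = "demo-secret-not-for-production" # assumption: real apps load this from credentials

# Sign a payload plus an expiry timestamp; the HMAC makes it tamper-evident.
def sign_token(payload, expires_in:)
  body = Base64.urlsafe_encode64(JSON.dump(payload.merge("exp" => Time.now.to_i + expires_in)))
  "#{body}.#{OpenSSL::HMAC.hexdigest("SHA256", SECRET, body)}"
end

# Returns the payload if the signature matches and it hasn't expired, else nil.
def verify_token(token)
  body, sig = token.split(".", 2)
  return nil unless sig && OpenSSL::HMAC.hexdigest("SHA256", SECRET, body) == sig
  data = JSON.parse(Base64.urlsafe_decode64(body))
  data["exp"] >= Time.now.to_i ? data : nil
end
```

A tampered token fails verification and an expired one returns nil - the same properties the 15-minute password-reset token above relies on.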
When to Use Built-in Auth vs Devise
Use built-in authentication:
Simple authentication needs
Want full control over auth code
Learning Rails authentication
Small to medium applications
Use Devise:
Need OAuth integration
Complex authentication requirements
Multi-tenancy
Advanced features (confirmable, lockable, etc.)
Migrate from Devise (If Needed)
# Keep the existing User model and add has_secure_password
class User < ApplicationRecord
  has_secure_password

  # Keep existing Devise functionality we need
  # Remove Devise modules we don't use
end

# Gradually migrate authentication logic
# Test thoroughly before removing Devise
Step 8: Progressive Web App (PWA) Support
Rails 8 adds built-in PWA support.
Add PWA Files
New Rails 8 apps include PWA scaffolding by default. For an app upgraded from 7.2, add the files manually (copying them from a freshly generated Rails 8 app works well):
app/views/pwa/manifest.json.erb - PWA manifest
app/views/pwa/service-worker.js - Service worker
Icons and related configuration
Configure PWA
<!-- app/views/layouts/application.html.erb -->
<head>
  <%= tag.link rel: "manifest", href: pwa_manifest_path %>
  <%= tag.meta name: "apple-mobile-web-app-capable", content: "yes" %>
</head>
# config/routes.rb
get "manifest" => "rails/pwa#manifest", as: :pwa_manifest
get "service-worker" => "rails/pwa#service_worker", as: :pwa_service_worker
Step 9: Breaking Changes (Minimal!)
Rails 8 has very few breaking changes - this is the smoothest upgrade.
1. Deprecations from Rails 7 Removed
# If we fixed Rails 7 deprecation warnings, we're good!
# Check for any remaining warnings:
RAILS_ENV=test rails test 2>&1 | grep -i deprecat
2. Default Configuration Changes
# config/initializers/new_framework_defaults_8_0.rb
# Review and enable new defaults
Rails.application.config.load_defaults 8.0
3. ActiveStorage Changes
# ActiveStorage::Blob#open without a block now returns the file

# Before (Rails 7) - block form
blob.open do |file|
  # Use file
end

# After (Rails 8) - both forms work
blob.open do |file|
  # Use file
end

# Or without a block
file = blob.open
# Use file
file.close
Step 10: Testing Updates
Test Solid Queue Jobs
# test/jobs/my_job_test.rb
require "test_helper"

class MyJobTest < ActiveJob::TestCase
  test "performs job" do
    assert_enqueued_with(job: MyJob, args: [1]) do
      MyJob.perform_later(1)
    end
  end

  test "processes job" do
    MyJob.perform_now(1)
    # Assert job effects
  end
end
Test Authentication
# test/controllers/posts_controller_test.rb
class PostsControllerTest < ActionDispatch::IntegrationTest
  setup do
    @user = users(:one)
    sign_in @user
  end

  test "should get index" do
    get posts_url
    assert_response :success
  end
end
Step 11: Performance Improvements
Rails 8 + Ruby 3.4 delivers significant performance gains:
15-20% faster than Ruby 3.2 (or 30-40% faster than Ruby 2.7)
Lower memory usage (Ruby 3.4 improvements)
Faster boot times (Prism parser improvements)
Reduced infrastructure costs (no Redis needed)
Benchmark YJIT Performance
# config/initializers/yjit_stats.rb
if defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
  Rails.application.config.after_initialize do
    at_exit do
      stats = RubyVM::YJIT.runtime_stats
      puts "\n=== YJIT Stats ==="
      puts "Compiled: #{stats[:compiled_iseq_count]} methods"
      puts "Ratio: #{stats[:ratio]}%"
      puts "==================\n"
    end
  end
end
Monitor Solid Queue Performance
# Check pending (ready) jobs
SolidQueue::ReadyExecution.count
# Check failed jobs
SolidQueue::FailedExecution.count
# Monitor in production with an APM tool (New Relic, Datadog, etc.)
Step 12: Deployment with Kamal 2
Rails 8 includes Kamal 2 for zero-downtime deployments.
Install Kamal
# Kamal is included by default
# Initialize configuration
kamal init
Configure Kamal
# config/deploy.yml (Kamal 2 uses the built-in kamal-proxy, not Traefik labels)
service: myapp
image: myapp/production

servers:
  web:
    - 192.168.1.1

proxy:
  ssl: true
  host: myapp.com

registry:
  server: ghcr.io
  username: myusername
  password:
    - KAMAL_REGISTRY_PASSWORD

env:
  secret:
    - RAILS_MASTER_KEY
Deploy
# First deployment
kamal setup
# Subsequent deployments
kamal deploy
# Rollback if needed
kamal rollback
Upgrade Checklist
Note: This checklist covers the most common changes. Depending on the application’s gems, custom code, and architecture, we may encounter additional issues. Always test thoroughly in a staging environment.
Upgrade Ruby to 3.1+ (3.4 recommended)
Enable YJIT for performance
Update Gemfile with Rails 8.0
Run rails app:update
Decide on Solid Queue (optional)
Decide on Solid Cache (optional)
Decide on Solid Cable (optional)
Consider built-in authentication (optional)
Enable PWA support (optional)
Fix any deprecation warnings
Run full test suite
Test in staging environment
Deploy to production with monitoring
Common Gotchas
1. Solid Queue vs Sidekiq
# Solid Queue doesn't support all Sidekiq features
# Check compatibility before migrating:
# - Unique jobs -> Use database constraints
# - Batches -> Implement manually
# - Scheduled jobs -> Supported
# - Retries -> Supported
2. Database Load with Solid Cache
# Monitor database performance
# Solid Cache adds queries to our database
# Consider keeping Redis for high-traffic sites
# Check the cache hit rate (Rails.cache.stats is store-dependent;
# APM or database metrics are more reliable)
3. Authentication Migration
# Don't rush to remove Devise
# Test built-in auth thoroughly first
# Migrate gradually if needed
Migration Strategy
Conservative Approach
Upgrade to Rails 8 first
Keep existing infrastructure (Redis, Sidekiq, Devise)
Test thoroughly
Gradually adopt Solid gems if beneficial
Progressive Approach
Upgrade Ruby to 3.4
Upgrade to Rails 8
Migrate to Solid Queue for new jobs
Evaluate Solid Cache for non-critical caching
Keep Redis for high-traffic features
Aggressive Approach
Upgrade Ruby to 3.4
Upgrade to Rails 8
Migrate all jobs to Solid Queue
Replace Redis with Solid Cache/Cable
Use built-in authentication for new features
Recommendation: Start conservative, move progressive as we gain confidence.
What’s Next
Congratulations!
We’ve completed the Rails upgrade journey from planning through Rails 8.
Series recap:
Part 1: Strategic planning and preparation
Part 2: Rails 4.2 to 5 - Foundation updates
Part 3: Rails 5.2 to 6 - Zeitwerk and Webpacker
Part 4: Rails 6.1 to 7 - Import Maps and Hotwire
Part 5: Rails 7.2 to 8 - Solid gems and simplification
Keep the Rails App Modern
Monitor deprecation warnings in each Rails version
Upgrade Ruby regularly for performance and security
Test thoroughly at each step
Stay informed about Rails releases
Contribute back to the Rails community
Resources
Official Rails 8.0 Release Notes
Rails Upgrade Guide
Solid Queue Documentation
Solid Cache Documentation
Kamal Documentation
Ruby 3.4 Release Notes
Ruby 3.3 Release Notes
RailsDiff 7.2 to 8.0
At Saeloun, we’ve helped numerous teams successfully upgrade to Rails 8 and modernize their infrastructure.
Whether planning a major upgrade or needing help optimizing a Rails 8 application, we’re here to help.
Contact us for Rails upgrade consulting
I’ve been working on Elixir Toolbox quite a bit lately and wanted to share what’s new:
More categories: ~150 now, including AI/LLM sections (and more)
JSON API: the aggregated data is now exposed for anyone to query
New trending page, showing packages by recent downloads
llms.txt and llms-full.txt endpoints so LLMs can read the catalog: try asking your AI agent to “find me the best Elixir package for X using elixir-toolbox.dev”
Complete UI refresh, using DaisyUI
GitLab support
This is a passion project. I use it to keep my Elixir skills sharp, and hopefully it helps others too. Check it out!
Learn how apt, yum, dnf, and pkg manage software on Linux and FreeBSD. Compare commands, workflows, and best practices. Start with the right tool for your system.
Originally appeared on RubyMine : Intelligent Ruby and Rails IDE | The JetBrains Blog.
RubyMine enhances the developer experience with context-aware search features that make navigating a Rails application seamless, a powerful analysis engine that detects problems in the source code, and integrated support for the most popular version control systems.
With AI becoming increasingly popular among developers as a tool that helps them understand codebases or develop applications, these RubyMine features provide an extra level of value. Indeed, with access to the functionality of the IDE and information about a given project, AI assistants can produce higher-quality results more efficiently.
To improve AI-assisted workflows, since 2025.3, RubyMine has also been able to…
My coding agent harnesses are designed to enable parallel serial work—multiple agents running in multiple tabs, all committing to main instead of worktrees.
turbocommit does this by linking each session's commits: https://github.com/searlsco/turbocommit?tab=readme-ov-file#continuity-across-workstreams
Originally appeared on Tim Riley.
Oops, nearly missed these weeknotes. Let me make this a quick one just to sneak it in and keep the streak alive (6 months and counting!)
My big achievement this week was getting Hanami Minitest ready for feedback. Check out my post for a preview of the generated files and where I’m looking for help. This has already generated a whole lot of great feedback and discussion. Thank you everyone for sharing your thoughts!
I had a couple of very old hanami-rspec preview releases yanked from RubyGems.org (thank you Colby!), so now they no longer confuse the bundle outdated command.
Aaron added a nice new feature to Hanami CLI: a --name option to allow the app name to be customised. I reviewed…
TL;DR
I needed full-text search across compliance records in Humadroid — some of which are encrypted at the application layer. The naive answer is “just decrypt everything into a search index.” The real answer involves understanding exactly what you’re trading, making that trade-off explicit and per-organization, and designing the index so it reveals as little as possible. Here’s the pattern I built, what I considered, and what I’d tell an auditor who asks about it.
Why Encrypted Records Exist in the First Place
Humadroid is a GRC (Governance, Risk, and Compliance) platform. Our customers use it to manage SOC 2 controls, store implementation notes, track evidence, and maintain security documentation. Some of that content is sensitive. Think: “Here’s how we configured our AWS CloudTrail logging” or “Our penetration test found these three critical vulnerabilities.”
We encrypt sensitive fields at the application layer using Rails’ built-in encrypts directive. Not just database-level encryption at rest (we have that too) — actual column-level encryption where the plaintext never touches the database.
This matters because a database breach doesn’t expose the content. An attacker with a SQL dump sees gibberish. Your DBA can’t read implementation notes. The data is genuinely protected at rest in a way that disk encryption alone doesn’t provide.
(If you’re building a compliance tool and you don’t encrypt this stuff… I have questions.)
In practice, the split looks something like this: record titles and identifiers live unencrypted because they need to be sortable, filterable, and displayable in lists. The actual substance — implementation details, policy content, audit findings — gets encrypted. Not every model has encrypted fields, but the ones that carry real security context do.
And here’s the problem. You can’t run WHERE content ILIKE '%cloudtrail%' on an encrypted column. The database doesn’t know what’s in there. That’s the entire point.
The Search Problem
When you have 200+ compliance controls, dozens of policy documents, and a growing pile of evidence artifacts, finding things matters. A lot. During an audit, someone asks “show me how you handle access reviews” and you need to find the relevant control, its implementation notes, and the supporting evidence. Fast.
Without search, people do what people always do: they scroll. They open tabs. They Ctrl+F inside individual documents. They message a colleague asking “where did we put the thing about the thing?”
I’ve been there. During our own SOC 2 prep, I watched myself doing exactly this — clicking through screens in my own product looking for records I knew existed. Not great when you’re building the tool that’s supposed to solve this exact workflow.
So. We need search. But some of the most valuable content lives in encrypted columns.
The Options (And Why Most of Them Suck)
I spent more time than I’d like to admit thinking about this. Here’s the landscape:
Option 1: Client-side search. Decrypt everything in the browser and search there. Works for small datasets, absolutely falls apart at scale. You’d need to load every record to search across them. Plus, the initial load time would be brutal, and you’re shipping all that decrypted data to the client just in case someone searches.
Option 2: Searchable encryption schemes. There’s academic work on this — order-preserving encryption, homomorphic encryption, encrypted search indexes. Fascinating stuff. Completely impractical for a startup that needs to ship features this quarter. The libraries are immature, the performance characteristics are unpredictable, and explaining to an auditor how your custom cryptographic search works sounds like a nightmare I don’t need.
Option 3: Blind indexes. Hash the content with HMAC and search against hashes. Works for exact matches (like email lookups), useless for full-text search. You can’t stem, rank, or fuzzy-match a hash.
Option 4: Decrypt and index separately. Take the encrypted content, decrypt it at the application layer, tokenize it, and store the tokens in a dedicated search index table. Accept the trade-off explicitly.
I went with Option 4. But the devil — and the audit readiness — is in the details.
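Option 3 is worth seeing in a few lines, because it makes the limitation obvious. A sketch of a blind index (the key and the normalization rule are assumptions):

```ruby
require "openssl"

BLIND_INDEX_KEY = "32-bytes-of-secret-from-credentials" # assumption: loaded from credentials

# Deterministic keyed hash of a normalized value: store the digest next to
# the encrypted column, then look records up by recomputing it.
def blind_index(value)
  OpenSSL::HMAC.hexdigest("SHA256", BLIND_INDEX_KEY, value.to_s.strip.downcase)
end

# Exact lookup works because normalization + HMAC are deterministic:
blind_index("User@Example.com") == blind_index(" user@example.COM ") # true
# But there is no substring or fuzzy search against a digest:
blind_index("example.com") == blind_index("user@example.com")        # false
```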
The Trade-Off (Let’s Be Honest About It)
Here’s what I’m not going to do: pretend this is a zero-cost decision.
When you decrypt content and write it into a search index, you’re creating a second representation of that data. It’s not plaintext — PostgreSQL tsvector data consists of stemmed, normalized word tokens, not readable sentences. But it’s not nothing either.
For the string “Annual penetration test report for AWS infrastructure,” the tsvector stores something like:
'annual':1 'aws':6 'infrastructur':7 'penetr':2 'report':4 'test':3
It’s not the original text. You can’t reconstruct sentences or context. But an attacker with database access could determine that a record mentions “penetration” and “AWS.” That’s more information than the fully encrypted column reveals.
The index also stores record titles in cleartext — used for rendering search results without loading and decrypting the actual records. Titles like “Q3 Penetration Test Results” or “AWS IAM Access Review Policy” are visible as-is. This is the most exposed piece.
This is a conscious trade-off. And I think the right approach is to make it:
Explicit — organizations opt into search knowing what it means
Reversible — toggle it off and the entire index is purged immediately
Layered — the index still lives behind RBAC, account scoping, encryption at rest, and TLS in transit
Which brings us to the architecture.
The Architecture: Separate Table, Opt-In, Purgeable
The core design principle: the search index is a secondary, disposable data structure. It can be rebuilt from the source records at any time and destroyed without losing anything.
The Index Table
The search index lives in its own table, completely decoupled from the source records. Polymorphic references point back to whatever was indexed:
create_table :search_index, id: :uuid do |t|
  t.references :account, type: :uuid, null: false,
               foreign_key: { on_delete: :cascade }
  t.references :searchable, polymorphic: true, type: :uuid, null: false
  t.string :title, null: false, default: ""
  t.string :context_label
  t.string :url_path
  t.tsvector :tsv
  t.timestamps
end

add_index :search_index, :tsv, using: :gin
add_index :search_index, %i[searchable_type searchable_id], unique: true
add_index :search_index, :title, using: :gin, opclass: :gin_trgm_ops
A few design decisions worth explaining.
The title and context_label columns exist purely for rendering. When someone types in a command palette, I need to show results fast — without loading each matching record and decrypting its fields just to display a label. Precomputing these at index time makes the UI feel instant.
The url_path is precomputed for the same reason. No route generation at query time.
The unique index on [searchable_type, searchable_id] gives us upsert semantics — one index entry per record, always overwritten on update, never duplicated.
We also enable pg_trgm for fuzzy matching on titles. Because people misspell “penetration” more often than you’d think.
Per-Organization Opt-In
This isn’t a global feature flag. Each organization explicitly enables search indexing through a boolean on their account record:
after_commit :handle_search_toggle,
             if: :saved_change_to_search_indexing_enabled?

def handle_search_toggle
  if search_indexing_enabled?
    SearchRebuildJob.perform_async(id)
  else
    SearchPurgeJob.perform_async(id)
  end
end
Toggle on → background job decrypts and indexes everything. Toggle off → one DELETE statement wipes every index entry for that organization. Gone.
The toggle itself is gated behind a Flipper flag for controlled rollout, and only account admins can flip it. Feature flag + explicit opt-in + admin-only access means nobody gets indexed by accident.
Making Models Searchable
Rather than scattering index logic across the codebase, I built a concern that each searchable model includes. The model defines a simple contract — what to index and at what weight — and the concern handles the lifecycle:
module Searchable
  extend ActiveSupport::Concern

  included do
    after_commit :reindex_search, on: %i[create update], if: :should_reindex?
    after_commit :remove_from_index, on: :destroy
  end

  private

  def should_reindex?
    return false unless account&.search_indexing_enabled?

    previously_new_record? ||
      (self.class.search_tracked_columns & previous_changes.keys).any?
  end
end
The should_reindex? check is important. We only reindex when tracked columns actually change — updating a record’s date or status shouldn’t trigger decryption and reindexing of its content. This keeps the write overhead minimal.
Each model declares what to index and at what priority:
def self.search_content_definition
  [[:title, "A"], [:identifier, "A"],
   [:implementation_notes, "B"], [:description, "C"]]
end
The weight system (A through D) feeds PostgreSQL’s setweight() function, so a title match ranks higher than a match buried in a description. Identifiers like “CC-1.1” get weight A because when someone searches for that string, they want that exact record.
During upsert, the content parts are assembled into a single weighted tsvector:
weighted_sql = content_parts.map do |(text, weight)|
  "setweight(to_tsvector('english', coalesce(#{connection.quote(text)}, '')), '#{weight}')"
end.join(" || ")
This is where the encrypted fields get decrypted — at the application layer, in memory, just long enough to tokenize them. The plaintext never hits the database; only the tsvector does.
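To make the assembly concrete, here is the kind of SQL string that loop produces, using a stand-in quote helper (the real code goes through ActiveRecord's connection.quote, and the content parts are examples):

```ruby
# Stand-in for connection.quote: handles only the single-quote escaping
# needed for this illustration.
def quote(text)
  "'#{text.gsub("'", "''")}'"
end

content_parts = [["Penetration Test Report", "A"], ["Annual AWS review", "C"]]

weighted_sql = content_parts.map do |(text, weight)|
  "setweight(to_tsvector('english', coalesce(#{quote(text)}, '')), '#{weight}')"
end.join(" || ")

puts weighted_sql
# One setweight(...) call per content part, concatenated with the
# tsvector || operator so PostgreSQL merges them into a single vector.
```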
Smart Search: Full-Text First, Fuzzy Fallback
The search itself uses a two-pass strategy:
def self.smart_search(query, limit: 20)
  results = fulltext_search(query, limit: limit)
  results = fuzzy_search(query, limit: limit) if results.empty?
  results
end
First pass: PostgreSQL full-text search with prefix matching (:* operator) and ts_rank_cd for relevance scoring. This handles the “I know approximately what I’m looking for” case.
Second pass: trigram similarity on the title column. This catches typos and partial matches that full-text search misses. Searching for “Penetraton Tets” should still find “Penetration Test Report.” (Yes, I test with actual typos I’ve made.)
The prefix matching is particularly nice for command palettes. Typing “cloud” immediately matches “CloudTrail Configuration” before you finish the word.
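The trigram idea itself is simple enough to sketch in plain Ruby (pg_trgm's padding and normalization differ in the details, so treat this as illustrative, not its implementation):

```ruby
require "set"

# Split a string into 3-character windows over a padded, lowercased form.
def trigrams(str)
  padded = "  #{str.downcase.strip} "
  (0..padded.length - 3).map { |i| padded[i, 3] }.to_set
end

# Jaccard similarity of trigram sets, roughly what pg_trgm's similarity() does.
def similarity(a, b)
  ta, tb = trigrams(a), trigrams(b)
  (ta & tb).size.to_f / (ta | tb).size
end

similarity("Penetraton Tets", "Penetration Test Report") # noticeably above zero
similarity("Penetraton Tets", "Access Review Policy")    # close to zero
```

Shared trigrams survive the typo, which is exactly why the fuzzy pass still finds the record.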
What I’d Tell an Auditor
(Because I’m going through SOC 2 myself, this isn’t hypothetical.)
“What does the search index contain?”
PostgreSQL tsvector tokens — stemmed word fragments with positional information. Not the original text. Also: record titles and context labels in cleartext, used for rendering search results.
“Can you reconstruct the original content from the index?”
No. Tsvectors are one-way transformations. You can determine that a record contains the word stems “penetr” and “aws” but you cannot reconstruct the sentence “Our annual penetration test of AWS infrastructure revealed three findings.”
“What about the titles?”
Titles are stored in cleartext. This is the explicit trade-off: organizations that enable search accept that record titles are visible in the index table. Titles are generally less sensitive than implementation details, but we document this in our security practices.
“What protections exist on the index?”
Five layers: account-scoped multi-tenancy (data isolation), role-based access control checked before any results are rendered, encryption at rest on the database volume, TLS for all data in transit, and instant purge capability by disabling the feature.
“Can an organization opt out?”
Immediately. Toggling the setting triggers a background job that purges every index entry for that organization. The feature is disabled by default — organizations must explicitly enable it.
“Why not use a more sophisticated encryption scheme?”
Because the mature, production-ready options for full-text search over encrypted data don’t exist in a form I’d trust for a compliance product. Partially homomorphic encryption and order-preserving encryption are fascinating research areas, but deploying unproven cryptographic schemes in a security product is worse than a well-understood trade-off with clear documentation and user consent.
I’d rather have an auditable, understandable system than a clever one.
The Rebuild-Purge Lifecycle
One thing I wanted to get right: the index should be entirely rebuildable and entirely destroyable at any point. No orphaned entries, no stale data, no “well, we mostly cleaned up.”
The rebuild job iterates through every searchable model type, decrypts content at the application layer, tokenizes it into tsvectors, and upserts into the index. Each record is processed independently with error handling — one bad record doesn’t kill the whole rebuild.
The purge is simpler: one SQL statement. Done.
Both operations are idempotent. Run the rebuild twice and you get the same index. Run the purge when there’s nothing to purge and nothing breaks. This matters for reliability — background jobs fail, retry, and sometimes run out of order.
Things I’m Still Thinking About
This solution works. It shipped. People are using it. But I’m not going to pretend it’s perfect.
Index staleness. If the rebuild job fails partway through, some records might have stale index entries until the next update triggers a reindex. I’m not losing sleep over this — search results being slightly outdated is a UX annoyance, not a security issue — but it’s not zero.
Scaling. Right now, the index lives in the same PostgreSQL database as everything else. For our current scale, this is fine. If we hit the point where the tsvector GIN index causes write amplification problems, I’ll probably move to a dedicated search service. But premature optimization is… well, you know.
Content vs. metadata sensitivity. I’m treating titles as “acceptable to store in cleartext” and implementation details as “must be tokenized.” But some titles might be sensitive too. A future iteration might let organizations configure sensitivity per field. For now, the current granularity matches what our customers have told us they’re comfortable with.
The philosophical question. Is it better to have no search and force people to manually browse through hundreds of records (which has its own security implications — people share screenshots, export to spreadsheets, ask colleagues on Slack), or to have fast search with a well-documented, auditable index that reveals word stems? I think the latter. But reasonable people can disagree.
The Boring Conclusion
There’s no cryptographic magic here. No breakthrough. Just a careful trade-off with clear boundaries: separate table, per-organization opt-in, instant purge, tsvector tokens instead of plaintext, documented threat model, layered defenses.
Sometimes the right engineering decision isn’t the clever one. It’s the one you can explain to an auditor in plain English, that your customers can enable or disable with a toggle, and that you can blow away in one SQL statement if something goes wrong.
That’s the kind of security I want in a compliance tool. Boring, understandable, and honest about what it does.
(Now if you’ll excuse me, I need to go search for that one control I wrote last month. With cmd+k, thankfully.)