Performance optimization is one of those tasks every Rails developer knows they should do but often puts off. Why? Because profiling tools dump cryptic output, the results are hard to interpret, and by the time you've made sense of it all, you've lost momentum on the feature you were actually building.
What if your profiling workflow could tell you not just what is slow, but why it's slow and how to fix it, in plain English, with concrete code suggestions?
The Traditional Profiling Gap
Alexander Dymo wrote in "Ruby Performance Optimization" that "optimization with the profiler is an art, not engineering." He's right. When you run a profiler like ruby-prof, you get something like this:
Thread ID: 70234567890
Total Time: 1.76s
%total calls name
2.21 842 ActiveRecord::Associations::Preloader::Association#set_inverse
1.75 842 ActiveRecord::Associations::Association#set_inverse_instance
0.51 842 ActiveRecord::Associations::BelongsToAssociation#inversed_from

Now what? You know line 188 in association.rb is taking 2.21% of your time, but why? Are 842 calls normal? Is this an N+1 query? Should you optimize this or focus elsewhere?
The "art" Dymo refers to is exactly this: interpreting these numbers, recognizing patterns, understanding what's signal versus noise. It's a skill that takes time to develop. This is where most developers either guess, spend an hour Googling, or just move on.
Enter: Claude as Your Performance Analyst
Here's a better workflow: you profile your code, visualize the call graph in QCachegrind to understand the flow, then send the report to Claude for analysis.
Let's look at a real example. Here's some somewhat contrived Rails controller code that loads categories with nested associations:
def profile
  @categories = Category.includes(journeys: { missions: :steps })
  @steps = @categories.flat_map { |c| c.journeys.flat_map { |j| j.missions.flat_map(&:steps) } }
  @category_steps = @categories.each_with_object({}) do |category, hash|
    hash[category.id] = category.journeys.flat_map { |j| j.missions.flat_map(&:steps) }
  end
  @category_grades = @categories.each_with_object({}) do |category, hash|
    hash[category.id] = GradePresenter.new(steps: @category_steps[category.id], category: category)
  end
  @category_charts = @categories.each_with_object({}) do |category, hash|
    hash[category.id] = StatusChartPresenter.new(steps: @category_steps[category.id], category: category)
  end
  # Pre-compute breakdown chart presenter
  @breakdown_chart = CategoryBreakdownChartPresenter.new(steps: @steps, categories: @categories)
  # Overall grade presenter
  @overall_grade = GradePresenter.new(steps: @steps)
end

The profiler shows 842 calls to set_inverse taking 2.21% of total time. Within seconds, Claude's analysis comes back:
"The biggest bottleneck is association lazy loading and inverse setting, consuming ~4.7% of total runtime. This is a classic N+1 query problem disguised as association overhead, with ActiveRecord processing 842 belongs_to associations individually instead of eagerly loading them."
Fix: "The 842 identical calls to association methods suggests a clear N+1 pattern. Pattern likely: loading a collection, then accessing parent/child associations in a loop."
Expected Impact: "70% reduction in association overhead"
No cryptic percentages. No guessing. Claude identified that even though I used includes(), the subsequent flat_map chains were still triggering inverse association setup for each record individually. The fix isn't about the query; it's about how I'm accessing the data afterward.
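One hedged way to apply that advice is to walk the eager-loaded tree exactly once per category and derive everything else from what's already in memory. The sketch below uses plain Ruby Structs as stand-ins for the ActiveRecord models (Category, Journey, Mission, Step here are illustrative stand-ins, not the real classes), so it shows only the traversal pattern, not the query:

```ruby
# Plain-Ruby stand-ins for the eager-loaded models
Step     = Struct.new(:name)
Mission  = Struct.new(:steps)
Journey  = Struct.new(:missions)
Category = Struct.new(:id, :journeys)

categories = [
  Category.new(1, [Journey.new([Mission.new([Step.new("a"), Step.new("b")])])]),
  Category.new(2, [Journey.new([Mission.new([Step.new("c")])])])
]

# Walk the association tree exactly once per category...
category_steps = categories.each_with_object({}) do |category, hash|
  hash[category.id] = category.journeys.flat_map { |j| j.missions.flat_map(&:steps) }
end

# ...then derive the flat list from the hash we already built,
# instead of re-walking the associations a second time
steps = category_steps.values.flatten

p steps.map(&:name)  # => ["a", "b", "c"]
```

In the real controller the same move applies: build @category_steps first, then compute @steps from its values, so the flat_map chain over the associations runs once instead of twice.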
The Workflow
The profiling system works in three stages:
1. Profile - Run ruby-prof with GC disabled (following Dymo's methodology) to get clean measurements without garbage collection noise. This is crucial: as Dymo emphasizes, GC overhead can account for 80% of slowdowns, so you need to isolate your code's actual performance.
2. Visualize - Open the results in QCachegrind, a powerful call graph visualizer. See which methods call which, spot hot paths visually, and understand the execution flow at a glance. This visual representation is where you start to develop the "eye" for performance: seeing the 842 calls stacked up makes the problem obvious in a way that numbers alone don't.
3. Analyze - Send the profiler's text output to Claude, which reads it like an expert performance engineer would. Claude identifies bottlenecks, explains the root cause in domain terms (not profiler jargon), and suggests specific optimizations.
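The third stage can be sketched with Ruby's stdlib HTTP client against Anthropic's Messages API. This is a sketch under assumptions: the model name and prompt wording are mine, and the request only fires when an ANTHROPIC_API_KEY environment variable is set:

```ruby
require "net/http"
require "json"
require "uri"

# Keep only the top of the flat report: the signal, not the noise
def truncate_report(report, lines: 40)
  report.lines.first(lines).join
end

def analyze_with_claude(report, api_key:)
  uri = URI("https://api.anthropic.com/v1/messages")
  req = Net::HTTP::Post.new(uri)
  req["x-api-key"] = api_key
  req["anthropic-version"] = "2023-06-01"
  req["content-type"] = "application/json"
  req.body = JSON.generate(
    model: "claude-sonnet-4-5",  # assumption: substitute whatever current model you use
    max_tokens: 1024,
    messages: [{
      role: "user",
      content: "You are a Ruby performance engineer. Analyze this ruby-prof report, " \
               "name the bottleneck pattern, and suggest a fix:\n\n#{report}"
    }]
  )
  res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(req) }
  JSON.parse(res.body).dig("content", 0, "text")
end

# Only call the API when a key is configured
if ENV["ANTHROPIC_API_KEY"]
  report = truncate_report(File.read("tmp/profile.txt"))
  puts analyze_with_claude(report, api_key: ENV["ANTHROPIC_API_KEY"])
end
```

Truncating before sending is what keeps the per-analysis cost down, which matters for the economics discussed below.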
For example, when profiling the controller code shown earlier, Claude's analysis comes back with:
"Pattern Analysis: The 842 identical calls to association methods suggests a clear N+1 pattern: 842 belongs_to associations being processed, each requiring individual inverse setup. Pattern likely: loading a collection, then accessing parent/child associations in a loop."
That's the insight you need. Not just "line 188 is slow" but "you're accessing associations in a loop after eager loading, which defeats the purpose."
The key insight: Claude doesn't replace the profiler; it interprets it. QCachegrind shows you the visual flow. Claude explains what you're looking at and what to do about it. You learn to recognize the patterns yourself over time.
Why This Matters for Your Workflow
Profiling shouldn't be a separate "optimization sprint" you do quarterly. It should be integrated into development:
# Working on a new feature that loads data?
profile_method "ReportsController#dashboard"
# See visual call graph in QCachegrind
# Get Claude's analysis: "You're missing eager loading on line 47..."
# Fix it immediately while the code is fresh in your mind

Core Principles
This workflow is built with a favorite book of mine in mind: Alexander Dymo's "Ruby Performance Optimization" (Pragmatic Programmers, 2015). Though the book is nearly a decade old, its core insights remain as relevant as ever:
- Memory optimization matters more than CPU optimization
- Garbage collection can account for 80% of slowdowns
- Always benchmark in production mode
- Profile first, optimize second, measure third
Dymo writes that "optimization with the profiler is an art, not engineering." The profiler gives you raw data: numbers, call counts, percentages. The art is in recognizing what those numbers mean: Is this an N+1 query? Memory churn? GC pressure? Should you fix this or focus elsewhere?
This is where Claude becomes invaluable. It has learned the patterns Dymo teaches: what 842 calls to set_inverse actually indicate, why that percentage matters, what the fix should look like. You still need to understand the fundamentals (read Dymo's book!), but Claude helps you apply that expertise faster.
Dymo's methodologies (disabling GC during profiling, using production mode for accurate measurements, and focusing on memory over micro-optimizations) form the technical backbone of this approach. What's changed is that now, instead of spending years learning to recognize performance anti-patterns, Claude can spot them immediately while you're building that expertise.
The Economics
One concern I had: wouldn't sending large profiler reports to Claude's API get expensive?
Turns out, no. By truncating reports to show only the top methods (the signal, not the noise), each analysis costs roughly 2 cents. Even if you profile 50 times during a heavy development day, that's a dollar. Compare that to the developer time saved not puzzling over cryptic profiler output, and it's a steal.
The real cost savings isn't the API, it's the reduction in "I'll optimize this later" (read: never) decisions.
What This Unlocks
With profiling integrated into your workflow and Claude demystifying the results, you start to see patterns:
- That "innocent" flat_map chain is triggering 842 inverse association setups
- Your presenter is accessing associations in a loop, defeating eager loading
- The .includes() you added isn't helping because of how you're iterating
More importantly, you learn the art Dymo talks about. Claude doesn't just give you the fish (the fix); it teaches you to fish (why this pattern is slow, what to look for next time). After Claude points out a few N+1 patterns, you start recognizing them before profiling. You begin to internalize what Dymo teaches: that optimization is pattern recognition, not guesswork.
The profiler becomes a learning tool, not a black box. Each analysis builds your intuition for what makes Ruby slow. Over time, you write more performant code from the start because you've learned to see through the profiler's eyes, aided by Claude's explanations.
Getting Started
The barrier to entry is low: add ruby-prof to your Gemfile, install QCachegrind (via Homebrew on Mac), and set up a simple wrapper script that profiles your code and sends results to Claude's API.
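Concretely, setup looks something like this on a Mac (a sketch; the Homebrew formula name is an assumption, and on Linux QCachegrind ships via your package manager as kcachegrind/qcachegrind):

```shell
# Add the profiler to the app (development group is enough)
bundle add ruby-prof --group development

# Install the call-graph visualizer (macOS)
brew install qcachegrind
```

From there, the wrapper script is just the profile-and-send steps from the workflow above glued together.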
The payoff is immediate: the first time you run it on a slow endpoint and Claude explains that your 842 set_inverse calls are actually a disguised N+1 pattern in your flat_map chain, you'll understand what Dymo meant about profiling being an art. Claude helps you see the artistry in the numbers.
Performance optimization doesn't have to be a dark art practiced by senior engineers with profiler fluency. It's still an art, Dymo is right about that, but now it's an art you can learn faster, with a guide who's read all the same books you're learning from.
Built while rereading "Ruby Performance Optimization" by Alexander Dymo (Pragmatic Programmers, 2015). Still the definitive guide to thinking about Ruby performance correctly, and the source of this truth: "Optimization with the profiler is an art, not engineering."