Performance optimization is one of those tasks every Rails developer knows they should do but often puts off. Why? Because profiling tools dump cryptic output, the results are hard to interpret, and by the time you've made sense of it all, you've lost momentum on the feature you were actually building.
What if your profiling workflow could tell you not just what is slow, but why it's slow and how to fix it, in plain English, with concrete code suggestions?
The Traditional Profiling Gap
Alexander Dymo wrote in "Ruby Performance Optimization" that "optimization with the profiler is an art, not engineering." He's right. When you run a profiler like ruby-prof, you get something like this:
Thread ID: 70234567890
Total Time: 1.76s
%total calls name
2.21 842 ActiveRecord::Associations::Preloader::Association#set_inverse
1.75 842 ActiveRecord::Associations::Association#set_inverse_instance
0.51 842 ActiveRecord::Associations::BelongsToAssociation#inversed_from

Now what? You know line 188 in association.rb is taking 2.21% of your time, but why? Are 842 calls normal? Is this an N+1 query? Should you optimize this or focus elsewhere?
The "art" Dymo refers to is exactly this: interpreting these numbers, recognizing patterns, understanding what's signal versus noise. It's a skill that takes time to develop. This is where most developers either guess, spend an hour Googling, or just move on.
Enter: Claude as Your Performance Analyst
Here's a better workflow: you profile your code, visualize the call graph in QCachegrind to understand the flow, then send the report to Claude for analysis.
Let's look at a real example. Here's some somewhat contrived Rails controller code that loads categories with nested associations:
def profile
  @categories = Category.includes(journeys: { missions: :steps })
  @steps = @categories.flat_map { |c| c.journeys.flat_map { |j| j.missions.flat_map(&:steps) } }
  @category_steps = @categories.each_with_object({}) do |category, hash|
    hash[category.id] = category.journeys.flat_map { |j| j.missions.flat_map(&:steps) }
  end
  @category_grades = @categories.each_with_object({}) do |category, hash|
    hash[category.id] = GradePresenter.new(steps: @category_steps[category.id], category: category)
  end
  @category_charts = @categories.each_with_object({}) do |category, hash|
    hash[category.id] = StatusChartPresenter.new(steps: @category_steps[category.id], category: category)
  end
  # Pre-compute breakdown chart presenter
  @breakdown_chart = CategoryBreakdownChartPresenter.new(steps: @steps, categories: @categories)
  # Overall grade presenter
  @overall_grade = GradePresenter.new(steps: @steps)
end

The profiler shows 842 calls to set_inverse taking 2.21% of total time. Within seconds, Claude's analysis comes back:
"The biggest bottleneck is association lazy loading and inverse setting, consuming ~4.7% of total runtime. This is a classic N+1 query problem disguised as association overhead, with ActiveRecord processing 842 belongs_to associations individually instead of eagerly loading them."
Fix: "The 842 identical calls to association methods suggests a clear N+1 pattern. Pattern likely: loading a collection, then accessing parent/child associations in a loop."
Expected Impact: "70% reduction in association overhead"
No cryptic percentages. No guessing. Claude identified that even though I used includes(), the subsequent flat_map chains were still triggering inverse association setup for each record individually. The fix isn't about the query; it's about how I'm accessing the data afterward.
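One hedged way to apply that advice is to walk the eager-loaded tree exactly once per category and derive everything else from what's already in memory. The sketch below uses plain Ruby Structs as stand-ins for the ActiveRecord models (Category, Journey, Mission, Step here are illustrative stand-ins, not the real classes), so it shows only the traversal pattern, not the query:

```ruby
# Plain-Ruby stand-ins for the eager-loaded models
Step     = Struct.new(:name)
Mission  = Struct.new(:steps)
Journey  = Struct.new(:missions)
Category = Struct.new(:id, :journeys)

categories = [
  Category.new(1, [Journey.new([Mission.new([Step.new("a"), Step.new("b")])])]),
  Category.new(2, [Journey.new([Mission.new([Step.new("c")])])])
]

# Walk the association tree exactly once per category...
category_steps = categories.each_with_object({}) do |category, hash|
  hash[category.id] = category.journeys.flat_map { |j| j.missions.flat_map(&:steps) }
end

# ...then derive the flat list from the hash we already built,
# instead of re-walking the associations a second time
steps = category_steps.values.flatten

p steps.map(&:name)  # => ["a", "b", "c"]
```

In the real controller the same move applies: build @category_steps first, then compute @steps from its values, so the flat_map chain over the associations runs once instead of twice.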
The Workflow
The profiling system works in three stages:
1. Profile - Run ruby-prof with GC disabled (following Dymo's methodology) to get clean measurements without garbage collection noise. This is crucial: as Dymo emphasizes, GC overhead can account for 80% of slowdowns, so you need to isolate your code's actual performance.
2. Visualize - Open the results in QCachegrind, a powerful call graph visualizer. See which methods call which, spot hot paths visually, and understand the execution flow at a glance. This visual representation is where you start to develop the "eye" for performance: seeing the 842 calls stacked up makes the problem obvious in a way that numbers alone don't.
3. Analyze - Send the profiler's text output to Claude, which reads it like an expert performance engineer would. Claude identifies bottlenecks, explains the root cause in domain terms (not profiler jargon), and suggests specific optimizations.
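The third stage can be sketched with Ruby's stdlib HTTP client against Anthropic's Messages API. This is a sketch under assumptions: the model name and prompt wording are mine, and the request only fires when an ANTHROPIC_API_KEY environment variable is set:

```ruby
require "net/http"
require "json"
require "uri"

# Keep only the top of the flat report: the signal, not the noise
def truncate_report(report, lines: 40)
  report.lines.first(lines).join
end

def analyze_with_claude(report, api_key:)
  uri = URI("https://api.anthropic.com/v1/messages")
  req = Net::HTTP::Post.new(uri)
  req["x-api-key"] = api_key
  req["anthropic-version"] = "2023-06-01"
  req["content-type"] = "application/json"
  req.body = JSON.generate(
    model: "claude-sonnet-4-5",  # assumption: substitute whatever current model you use
    max_tokens: 1024,
    messages: [{
      role: "user",
      content: "You are a Ruby performance engineer. Analyze this ruby-prof report, " \
               "name the bottleneck pattern, and suggest a fix:\n\n#{report}"
    }]
  )
  res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(req) }
  JSON.parse(res.body).dig("content", 0, "text")
end

# Only call the API when a key is configured
if ENV["ANTHROPIC_API_KEY"]
  report = truncate_report(File.read("tmp/profile.txt"))
  puts analyze_with_claude(report, api_key: ENV["ANTHROPIC_API_KEY"])
end
```

Truncating before sending is what keeps the per-analysis cost down, which matters for the economics discussed below.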
For example, when profiling the controller code shown earlier, Claude's analysis comes back with:
"Pattern Analysis: The 842 identical calls to association methods suggests a clear N+1 pattern: 842 belongs_to associations being processed, each requiring individual inverse setup. Pattern likely: loading a collection, then accessing parent/child associations in a loop."
That's the insight you need. Not just "line 188 is slow" but "you're accessing associations in a loop after eager loading, which defeats the purpose."
The key insight: Claude doesn't replace the profiler; it interprets it. QCachegrind shows you the visual flow. Claude explains what you're looking at and what to do about it. You learn to recognize the patterns yourself over time.
Why This Matters for Your Workflow
Profiling shouldn't be a separate "optimization sprint" you do quarterly. It should be integrated into development:
# Working on a new feature that loads data?
profile_method "ReportsController#dashboard"
# See visual call graph in QCachegrind
# Get Claude's analysis: "You're missing eager loading on line 47..."
# Fix it immediately while the code is fresh in your mind

Core Principles
This workflow is built with a favorite book of mine in mind: Alexander Dymo's "Ruby Performance Optimization" (Pragmatic Programmers, 2015). Though the book is nearly a decade old, its core insights remain as relevant as ever:
- Memory optimization matters more than CPU optimization
- Garbage collection can account for 80% of slowdowns
- Always benchmark in production mode
- Profile first, optimize second, measure third
Dymo writes that "optimization with the profiler is an art, not engineering." The profiler gives you raw data: numbers, call counts, percentages. The art is in recognizing what those numbers mean: Is this an N+1 query? Memory churn? GC pressure? Should you fix this or focus elsewhere?
This is where Claude becomes invaluable. It has learned the patterns Dymo teaches: what 842 calls to set_inverse actually indicate, why that percentage matters, what the fix should look like. You still need to understand the fundamentals (read Dymo's book!), but Claude helps you apply that expertise faster.
Dymo's methodologies (disabling GC during profiling, using production mode for accurate measurements, and focusing on memory over micro-optimizations) form the technical backbone of this approach. What's changed is that now, instead of spending years learning to recognize performance anti-patterns, Claude can spot them immediately while you're building that expertise.
The Economics
One concern I had: wouldn't sending large profiler reports to Claude's API get expensive?
Turns out, no. By truncating reports to show only the top methods (the signal, not the noise), each analysis costs roughly 2 cents. Even if you profile 50 times during a heavy development day, that's a dollar. Compare that to the developer time saved not puzzling over cryptic profiler output, and it's a steal.
The real cost savings isn't the API, it's the reduction in "I'll optimize this later" (read: never) decisions.
What This Unlocks
With profiling integrated into your workflow and Claude demystifying the results, you start to see patterns:
- That "innocent" flat_map chain is triggering 842 inverse association setups
- Your presenter is accessing associations in a loop, defeating eager loading
- The .includes() you added isn't helping because of how you're iterating
More importantly, you learn the art Dymo talks about. Claude doesn't just give you the fish (the fix); it teaches you to fish (why this pattern is slow, what to look for next time). After Claude points out a few N+1 patterns, you start recognizing them before profiling. You begin to internalize what Dymo teaches: that optimization is pattern recognition, not guesswork.
The profiler becomes a learning tool, not a black box. Each analysis builds your intuition for what makes Ruby slow. Over time, you write more performant code from the start because you've learned to see through the profiler's eyes, aided by Claude's explanations.
Getting Started
The barrier to entry is low: add ruby-prof to your Gemfile, install QCachegrind (via Homebrew on Mac), and set up a simple wrapper script that profiles your code and sends results to Claude's API.
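Concretely, setup looks something like this on a Mac (a sketch; the Homebrew formula name is an assumption, and on Linux QCachegrind ships via your package manager as kcachegrind/qcachegrind):

```shell
# Add the profiler to the app (development group is enough)
bundle add ruby-prof --group development

# Install the call-graph visualizer (macOS)
brew install qcachegrind
```

From there, the wrapper script is just the profile-and-send steps from the workflow above glued together.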
The payoff is immediate: the first time you run it on a slow endpoint and Claude explains that your 842 set_inverse calls are actually a disguised N+1 pattern in your flat_map chain, you'll understand what Dymo meant about profiling being an art. Claude helps you see the artistry in the numbers.
Performance optimization doesn't have to be a dark art practiced by senior engineers with profiler fluency. It's still an art, Dymo is right about that, but now it's an art you can learn faster, with a guide who's read all the same books you're learning from.
Built while rereading "Ruby Performance Optimization" by Alexander Dymo (Pragmatic Programmers, 2015). Still the definitive guide to thinking about Ruby performance correctly, and the source of this truth: "Optimization with the profiler is an art, not engineering."