Even seasoned DevOps engineers stumble into common pitfalls when configuring and managing New Relic, and the result is often missed insights, alert fatigue, or inflated costs. Getting it right from the start means your observability platform delivers value rather than noise. Yet how many teams genuinely get the most out of their investment?
Key Takeaways
- Configure sampling rates judiciously for APM and Infrastructure agents to balance data granularity with cost, aiming for a 10% transaction sample rate for most applications.
- Implement custom instrumentation for critical business transactions using New Relic’s API calls to gain visibility beyond default frameworks.
- Establish meaningful alert conditions with baselines and multiple thresholds to reduce false positives and ensure actionable notifications.
- Regularly audit and prune unused dashboards and alerts to maintain a clean, efficient monitoring environment, performing this review quarterly.
- Leverage New Relic One’s query builder for NRQL to create targeted, multi-faceted dashboards that correlate performance with business metrics.
1. Overlooking Agent Configuration: The Silent Performance Killer
One of the biggest mistakes I see teams make is a “set it and forget it” mentality with their New Relic agents. They install the APM agent, maybe the Infrastructure agent, and then assume everything just works perfectly out of the box. This is a recipe for either missing critical data or, paradoxically, drowning in too much irrelevant data.
When we brought New Relic into our stack at a previous company, a small SaaS firm in Atlanta’s Midtown district, we initially just deployed the default agents. For weeks, we were mystified why certain intermittent performance issues weren’t showing up clearly in our dashboards. It turned out our default sampling rate for transactions was too low on less-trafficked services, and our log ingestion was overwhelming our quotas with non-critical debug messages. It was a mess.
Pro Tip: Fine-Tune Sampling Rates
For most applications, I recommend starting with an APM transaction trace sampling rate of around 10%. For very high-throughput services, you might go lower, say 5%, but for critical business transactions, consider raising it to 20% or even 50% temporarily during troubleshooting. You can adjust this in the agent's configuration file (newrelic.yml for the Java and Ruby agents, newrelic.js for the Node.js agent) or via environment variables.
Example configuration snippet for a Node.js application:
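// newrelic.js
'use strict';

exports.config = {
  app_name: ['My Node.js Service'],
  // Prefer the NEW_RELIC_LICENSE_KEY environment variable over hard-coding the key.
  license_key: 'YOUR_NEW_RELIC_LICENSE_KEY',
  agent_enabled: true,
  transaction_tracer: {
    enabled: true,
    // 'apdex_f' traces transactions slower than four times the Apdex T value.
    transaction_threshold: 'apdex_f',
    // Sampling keys as given in this article; confirm their names and placement
    // against the configuration reference for your agent version.
    sampling_target: 10, // Aim for 10 transactions per minute
    sampling_target_period_in_seconds: 60 // Over a 1-minute period
  },
  // ... other configurations
};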
For Infrastructure agents, pay close attention to which metrics you’re collecting. Do you really need every single process metric from every single container? Probably not. Filter out what you don’t need with directives such as exclude_matching_metrics in your newrelic-infra.yml, and check the Infrastructure agent’s configuration reference for the process-level filters your agent version supports.
Common Mistake: Ignoring Log Management
Another common misstep is letting your logs run wild. New Relic Logs is incredibly powerful, but it can get expensive and noisy if you’re ingesting everything. I’ve seen teams ingest gigabytes of verbose debug logs that provide little operational value, burying critical errors. This is usually due to a lack of proper log level configuration in the application itself, combined with an unfiltered log forwarding setup.
To avoid this, configure your application logging frameworks (Log4j, NLog, Winston, etc.) to emit appropriate log levels for production environments. Then, use New Relic’s parsing and filtering rules to drop or sample logs at the ingestion point. This is often done via the New Relic UI under “Logs” -> “Log management settings” -> “Filtering rules.”
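The filtering decision itself is simple enough to sketch in plain JavaScript. This is only an illustration of the logic (keep everything at or above a minimum level, optionally sample the rest), not New Relic's drop-rule engine; LEVELS, shouldForward, and the sampleRate option are hypothetical names introduced for this example.

```javascript
// Numeric severity for common log levels (higher = more severe).
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

/**
 * Decide whether a log record should be forwarded.
 * - Records at or above minLevel are always kept.
 * - Records below minLevel are kept only for a sampled fraction.
 * `random` is injectable so the behavior can be made deterministic.
 */
function shouldForward(record, { minLevel = 'info', sampleRate = 0 } = {}, random = Math.random) {
  const severity = LEVELS[record.level];
  if (severity === undefined) return true; // unknown level: keep, to be safe
  if (severity >= LEVELS[minLevel]) return true;
  return random() < sampleRate;
}

const records = [
  { level: 'debug', message: 'cache miss for key user:42' },
  { level: 'info', message: 'order accepted' },
  { level: 'error', message: 'payment gateway timeout' },
];

// Keep info and above; drop all debug records (sampleRate defaults to 0).
const forwarded = records.filter((r) => shouldForward(r, { minLevel: 'info' }));
console.log(forwarded.map((r) => r.level)); // [ 'info', 'error' ]
```

The same two knobs, a minimum level and a sample rate for everything below it, are what you would express as drop or sampling rules at the ingestion point.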
2. Relying Solely on Default Dashboards: The Blind Spot Syndrome
The out-of-the-box dashboards New Relic provides are fantastic for a quick overview. They give you a baseline. But relying solely on them is like driving a car only looking at the speedometer – you’re missing the road ahead, the fuel gauge, and the engine warning lights. Every application, every business, has unique critical flows that need tailored monitoring. Generic dashboards won’t cut it.
I remember a client last year, a logistics company operating out of a warehouse near Hartsfield-Jackson Airport, was experiencing intermittent order processing failures. Their default New Relic APM dashboards showed average transaction times were okay, error rates seemed normal. But when we dug in, we realized the “order processing” transaction was actually a composite of several microservices, and one specific, rarely-hit internal API call was timing out. The default dashboards just averaged it all out, masking the problem. We needed custom instrumentation.
Pro Tip: Custom Instrumentation for Business Critical Transactions
This is where New Relic’s custom instrumentation shines. You need to identify your key business transactions – things like “User Login,” “Place Order,” “Process Payment,” “Generate Report.” These are the lifeblood of your application. Use the New Relic agent APIs to mark these transactions explicitly.
For example, in Java, you might use @Trace annotations or the NewRelic.getAgent().getTransaction().setTransactionName() method. For Node.js, you’d use newrelic.startWebTransaction() or newrelic.setTransactionName(). This allows you to track their performance, error rates, and throughput independently of the broader application. It’s the difference between knowing your car is running and knowing your engine is misfiring on cylinder 3.
Example for a Node.js custom transaction:
const newrelic = require('newrelic');

// someOrderService is a placeholder for your own order-processing module.
async function processUserOrder(orderId) {
  // The transaction ends when the promise returned by the handler settles.
  return newrelic.startWebTransaction('ProcessUserOrder', async function () {
    // Attach the order ID first so it is recorded even if processing throws.
    newrelic.addCustomAttribute('orderId', orderId);
    const result = await someOrderService.process(orderId);
    newrelic.addCustomAttribute('orderStatus', result.status);
    return result;
  });
}
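The logistics example earlier involved one rarely hit internal call hiding inside a larger transaction. A minimal sketch of isolating such a call with the Node.js agent's startSegment API follows. The agent is passed in as a parameter so the function can be exercised with a stub; reserveInventory, inventoryClient, and both stub objects are hypothetical names used only for illustration.

```javascript
/**
 * Wrap one internal call in its own segment so its timing shows up
 * separately inside the surrounding transaction.
 * `agent` is the object returned by require('newrelic'); it is injected
 * here so the function can run against a stub.
 */
function reserveInventory(agent, inventoryClient, orderId) {
  // startSegment(name, record, handler): `record = true` asks the agent to
  // record the segment as a metric in addition to showing it in traces.
  return agent.startSegment('reserveInventory', true, () => inventoryClient.reserve(orderId));
}

// Stand-ins for illustration only; in a real service pass require('newrelic').
const stubAgent = {
  segments: [],
  startSegment(name, record, handler) {
    this.segments.push(name);
    return handler();
  },
};
const stubInventoryClient = { reserve: (orderId) => ({ orderId, reserved: true }) };

const reservation = reserveInventory(stubAgent, stubInventoryClient, 'A-1001');
console.log(stubAgent.segments, reservation);
```

Injecting the agent is a design choice for testability; calling newrelic.startSegment directly works the same way inside an active transaction.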
Once instrumented, create custom dashboards using NRQL (New Relic Query Language). Don’t be afraid of NRQL; it’s incredibly powerful. You can query specific events, aggregate data, and build visualizations that directly reflect your business KPIs. For instance, a dashboard showing “Average ‘Place Order’ duration,” “Orders Processed per Minute,” and “Failed Payments.”
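As a rough sketch, the three example widgets could be backed by NRQL queries like the ones built below. The transaction-name patterns are assumptions (check the names your agent actually reports), and dashboardQueries and nrqlString are hypothetical helper names, not part of any New Relic library.

```javascript
// Quote a value as an NRQL string literal. Values containing a single quote
// are rejected rather than guessing at escaping rules.
function nrqlString(value) {
  const text = String(value);
  if (text.includes("'")) {
    throw new Error(`Unsupported character in NRQL literal: ${text}`);
  }
  return `'${text}'`;
}

// Build one NRQL query per dashboard widget.
function dashboardQueries({ orderPattern, paymentPattern }) {
  const order = nrqlString(orderPattern);
  const payment = nrqlString(paymentPattern);
  return {
    avgPlaceOrderDuration:
      `SELECT average(duration) FROM Transaction WHERE name LIKE ${order} TIMESERIES`,
    ordersPerMinute:
      `SELECT rate(count(*), 1 minute) FROM Transaction WHERE name LIKE ${order} TIMESERIES`,
    failedPayments:
      `SELECT count(*) FROM Transaction WHERE name LIKE ${payment} AND error IS true TIMESERIES`,
  };
}

const queries = dashboardQueries({
  orderPattern: '%PlaceOrder%',
  paymentPattern: '%ProcessPayment%',
});
console.log(queries.ordersPerMinute);
```

Each string can be pasted into the query builder as-is and saved as a dashboard widget.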
Common Mistake: Forgetting Custom Attributes
Many teams overlook the immense value of custom attributes. These are key-value pairs you attach to transactions or events, providing context that goes beyond standard metrics. Want to know the performance of your application for users in a specific region, or for customers on a particular subscription plan? Add region or subscriptionPlan as custom attributes. This allows for incredibly granular filtering and analysis in NRQL, transforming generic data into actionable business intelligence.
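A minimal sketch of attaching those two attributes with the Node.js agent's addCustomAttributes call, plus an NRQL query that facets on them, might look like this. The agent is injected so the example runs against a stub; tagRequestContext, the user object's shape, and fakeAgent are hypothetical.

```javascript
/**
 * Attach business context to the current transaction.
 * `agent` is the object returned by require('newrelic').
 */
function tagRequestContext(agent, user) {
  const attributes = {
    region: user.region,
    subscriptionPlan: user.plan,
  };
  agent.addCustomAttributes(attributes);
  return attributes;
}

// NRQL that slices transaction duration by the attributes set above.
const durationBySegment =
  'SELECT average(duration) FROM Transaction FACET region, subscriptionPlan SINCE 1 day ago';

// Stub for illustration only; in a real service pass require('newrelic').
const fakeAgent = {
  recorded: {},
  addCustomAttributes(attrs) {
    Object.assign(this.recorded, attrs);
  },
};

tagRequestContext(fakeAgent, { region: 'us-east', plan: 'enterprise' });
console.log(fakeAgent.recorded, durationBySegment);
```

Keep attribute names consistent across services; a FACET only groups cleanly when every service reports the same key.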
3. Alert Overload: The “Boy Who Cried Wolf” Problem
Ah, alerts. The bane of many on-call rotations. I’ve seen teams bombarded with hundreds of alerts daily, most of them non-actionable, leading to alert fatigue. When every minor fluctuation triggers a PagerDuty incident, engineers start ignoring them, and then, inevitably, a real crisis slips through the cracks. This isn’t just annoying; it’s dangerous. The problem isn’t New Relic; it’s how teams configure their alert conditions.
Pro Tip: Baseline-Based Alerting and Multiple Thresholds
Stop using static thresholds for everything! New Relic offers powerful baseline alerting. This feature learns the normal behavior of your metrics and alerts you when deviations occur. This is far more effective for metrics like CPU utilization or request queue length, which naturally fluctuate. For example, setting an alert for “CPU usage above 80%” might be fine for a batch job, but disastrous for a web server whose CPU normally hovers around 20% but occasionally spikes to 60% during peak hours.
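To see why a learned baseline catches what a static threshold misses, here is a deliberately simplified stand-in: flag a value when it sits more than k standard deviations from the recent mean. This is not New Relic's baseline algorithm, which the article does not describe; deviatesFromBaseline, k, and the sample data are assumptions made for illustration.

```javascript
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population standard deviation of the history window.
function stdDev(values) {
  const m = mean(values);
  return Math.sqrt(mean(values.map((v) => (v - m) ** 2)));
}

/**
 * True when `current` is more than k standard deviations away from the
 * mean of `history`.
 */
function deviatesFromBaseline(history, current, k = 3) {
  return Math.abs(current - mean(history)) > k * stdDev(history);
}

// A web server whose CPU normally hovers around 20%.
const webServerCpu = [18, 22, 20, 19, 21, 20, 23, 17];
const current = 45;

const staticAlert = current > 80; // static "CPU above 80%" rule
const baselineAlert = deviatesFromBaseline(webServerCpu, current);
console.log({ staticAlert, baselineAlert }); // { staticAlert: false, baselineAlert: true }
```

A jump to 45% is far outside this server's normal range, yet the static 80% rule stays silent, which is the gap baseline conditions are meant to close.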
Furthermore, implement multiple thresholds (warning and critical). A warning threshold (e.g., latency above 500ms for 5 minutes) can trigger a Slack notification for awareness, while a critical threshold (e.g., latency above 1000ms for 2 minutes) triggers a PagerDuty incident. This graduated approach ensures that only truly urgent issues escalate to on-call engineers.
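The graduated logic can be sketched as a small function over per-minute latency averages. The threshold values come from the example above; sustainedAbove, classifyLatency, routeFor, and the channel names are hypothetical, and the actual evaluation happens inside New Relic's alerting engine rather than in your code.

```javascript
// True when the most recent `minutes` samples all exceed the threshold.
function sustainedAbove(samples, thresholdMs, minutes) {
  if (samples.length < minutes) return false;
  return samples.slice(-minutes).every((value) => value > thresholdMs);
}

// `samples` holds one average latency (ms) per minute, oldest first.
function classifyLatency(samples) {
  if (sustainedAbove(samples, 1000, 2)) return 'critical';
  if (sustainedAbove(samples, 500, 5)) return 'warning';
  return 'ok';
}

// Map a severity to a notification channel; null means no notification.
function routeFor(severity) {
  const routes = { critical: 'pagerduty', warning: 'slack' };
  return routes[severity] ?? null;
}

const slowButStable = [300, 620, 640, 700, 810, 560];
const severity = classifyLatency(slowButStable);
console.log(severity, routeFor(severity)); // warning slack
```

Checking the critical rule first matters: a series that satisfies both rules should page, not just post to chat.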
When configuring alerts, always ask: “Is this alert actionable? What specific action would I take if this alert fires?” If you can’t answer that, the alert is probably noise.
Screenshot Description: A screenshot of the New Relic Alerts UI showing a condition with two thresholds: a ‘Warning’ threshold set at 500ms average duration over 5 minutes and a ‘Critical’ threshold at 1000ms average duration over 2 minutes, applied to a specific APM transaction metric.
Common Mistake: Not Using NRQL Alerts
Many teams stick to basic metric-based alerts. While useful, they often miss complex scenarios. New Relic’s NRQL alert conditions are a game-changer. You can write custom queries to define highly specific alert logic. For instance, “alert me if the error rate for ‘Place Order’ transactions is greater than 2% AND the number of successful ‘Place Order’ transactions is less than 10 per minute.” This combines multiple conditions, providing a much more nuanced and accurate alert.
We used NRQL alerts extensively at a prior role when monitoring a payment gateway. We didn’t just care about individual transaction failures; we cared if the failure rate for a specific payment method (e.g., ‘Visa’) exceeded a threshold for a sustained period, indicating a potential issue with that processor. NRQL allowed us to pinpoint this immediately.
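A query along these lines could back such a condition. This is a sketch under assumptions: paymentMethod is a hypothetical custom attribute you would have to add yourself, failureRateQuery is an illustrative helper, and the threshold and duration are configured on the alert condition rather than in the query.

```javascript
/**
 * Build an NRQL query returning the percentage of failed transactions
 * for one payment method. Assumes a custom attribute named `paymentMethod`.
 */
function failureRateQuery(paymentMethod) {
  // Allow only simple identifiers so the value is safe to embed in a literal.
  if (!/^[A-Za-z0-9 _-]+$/.test(paymentMethod)) {
    throw new Error(`Unexpected payment method: ${paymentMethod}`);
  }
  return (
    'SELECT percentage(count(*), WHERE error IS true) ' +
    `FROM Transaction WHERE paymentMethod = '${paymentMethod}'`
  );
}

const visaFailureRate = failureRateQuery('Visa');
console.log(visaFailureRate);
```

Swapping the WHERE clause for FACET paymentMethod would evaluate every payment method separately in one condition, at the cost of one signal per facet value.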
4. Neglecting Regular Maintenance: The Accumulation of Technical Debt
New Relic isn’t a static tool; it evolves with your application and infrastructure. What was relevant six months ago might be obsolete today. Failing to regularly review and prune your New Relic configuration leads to technical debt, wasted resources, and a monitoring environment that becomes increasingly difficult to navigate. This is where I often see the most cost inefficiency – paying for data you no longer use or dashboards no one looks at.
Pro Tip: Quarterly Audits and Pruning
I advocate for a quarterly audit of your New Relic setup. This should involve:
- Reviewing Dashboards: Are all dashboards still relevant? Are they being used? Archive or delete stale ones.
- Checking Alerts: Are all alert conditions still valid? Are there any “noisy” alerts that need refinement? Are there critical metrics that aren’t being alerted on?
- Agent Configuration: Are agents deployed on all necessary services? Are sampling rates still appropriate? Are there any services with agents that have been decommissioned but are still reporting?
- Log Ingestion Rules: Are your log filtering rules still effective? Are you ingesting unnecessary logs?
- Cost Review: Understand your data ingestion volume and identify areas for potential reduction without sacrificing critical visibility. New Relic One provides detailed usage reports.
This isn’t just about cost savings, though that’s a significant benefit. It’s about maintaining a clean, effective observability platform that your team trusts and can efficiently use. An environment cluttered with irrelevant data and defunct alerts breeds distrust and inefficiency.
Common Mistake: Ignoring Data Retention Policies
New Relic applies different retention periods to different data types. Many teams don’t pay attention to this and assume all data is kept indefinitely. In practice, retention varies by data type, and detailed data such as raw transaction traces is kept for a shorter period than aggregated data. If you need historical data for compliance, long-term trend analysis, or specific debugging, make sure you understand these policies, and export data or use New Relic’s longer retention options if they are available for the data you need. New Relic’s data retention documentation is the place to start when planning this.
5. Underutilizing New Relic One: Sticking to the Basics
New Relic One isn’t just a UI refresh; it’s a powerful platform designed for full-stack observability. Many teams, especially those who’ve been New Relic users for years, tend to stick to the older navigation patterns and features, missing out on the advanced capabilities that New Relic One brings to the table. This is like buying a new smartphone but only using it for calls and texts – you’re paying for a lot of power you’re not harnessing.
Pro Tip: Embrace the Query Builder and Entity Explorer
The New Relic One Query Builder is your best friend for ad-hoc analysis and dashboard creation. Instead of clicking through predefined charts, learn to craft your own NRQL queries. You can correlate data across different services (APM), infrastructure, logs, and even custom events. This allows for incredibly powerful troubleshooting and performance analysis.
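For example, you can query FROM Transaction, SystemSample SELECT average(duration), average(cpuPercent) WHERE (appName = 'MyWebApp' AND host = 'web-server-01') OR hostname = 'web-server-01' TIMESERIES to see how CPU usage on a specific server correlates with transaction duration for your application. Each event type carries its own attributes (appName belongs to Transaction events, cpuPercent and hostname to SystemSample events), so a single filter that requires both on every event would match nothing; verify the attribute names against your own data. This kind of cross-domain analysis is where New Relic truly shines.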
The Entity Explorer in New Relic One is also vastly underutilized. It provides a topological view of your services, hosts, and related entities. You can quickly see relationships, dependencies, and health statuses, making it much easier to pinpoint the root cause of an issue across a complex distributed system. Instead of jumping between different monitoring screens, use the Entity Explorer to visualize the impact of an issue and navigate directly to the affected component.
Common Mistake: Ignoring Synthetics and Browser Monitoring
While APM and Infrastructure are foundational, many teams neglect New Relic Synthetics and Browser monitoring. Synthetics allows you to proactively monitor your application’s availability and performance from various global locations, even when no real users are active. This is crucial for catching issues before your customers do. Browser monitoring, on the other hand, gives you real user experience (RUM) data, showing you exactly how your application performs from the end-user’s perspective, including page load times, JavaScript errors, and AJAX performance.
I’ve seen countless times where backend APM metrics looked perfectly healthy, but Synthetics or Browser monitoring revealed a slow-loading third-party script or a critical JavaScript error impacting actual users. Your users don’t care if your backend is fast if the frontend is broken.
Mastering New Relic means moving beyond basic installation and embracing its full suite of capabilities. By avoiding these common missteps and implementing thoughtful configuration, custom instrumentation, and proactive maintenance, you’ll transform New Relic from a data ingestion engine into an indispensable tool for operational excellence and business insight.
What is a good starting point for New Relic APM agent sampling rates?
A solid starting point for most applications is a transaction trace sampling rate of 10%. For very high-throughput services, you might reduce this to 5%, while critical, low-volume business transactions could benefit from a temporary increase to 20-50% for detailed troubleshooting.
How can I reduce alert fatigue with New Relic?
To combat alert fatigue, implement baseline alerting where possible, which learns normal metric behavior. Additionally, use multiple alert thresholds (e.g., warning and critical) to differentiate between minor issues and urgent problems, and leverage NRQL alerts for highly specific, actionable conditions.
Why are custom attributes important in New Relic?
Custom attributes add crucial context to your monitoring data, allowing you to filter and analyze performance based on business-specific dimensions like customer ID, region, or subscription plan. This transforms generic metrics into actionable business intelligence, enabling more targeted troubleshooting and analysis.
How often should I review my New Relic configuration and dashboards?
I recommend performing a comprehensive audit of your New Relic configuration, dashboards, and alerts quarterly. This ensures that your monitoring setup remains relevant, efficient, and cost-effective, preventing the accumulation of technical debt and unnecessary data ingestion.
What is the advantage of using NRQL alerts over standard metric alerts?
NRQL alerts offer significantly more flexibility and precision than standard metric alerts. They allow you to define complex alert conditions by combining multiple metrics, filtering data, and correlating events across different services, resulting in more accurate and actionable notifications that reduce false positives.