{"id":17222,"date":"2024-02-26T17:08:53","date_gmt":"2024-02-26T17:08:53","guid":{"rendered":"http:\/\/scannn.com\/elevating-tech-performance-with-dora-metrics\/"},"modified":"2024-02-26T17:08:53","modified_gmt":"2024-02-26T17:08:53","slug":"elevating-tech-performance-with-dora-metrics","status":"publish","type":"post","link":"https:\/\/scannn.com\/lv\/elevating-tech-performance-with-dora-metrics\/","title":{"rendered":"Elevating Tech Performance with DORA Metrics"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p>We have a strong culture of continuous improvement at AdAction, particularly in the tech department. Our goal is to be an always-improving team that quickly and efficiently delivers high-quality products. As such, we want all of our teams to have the opportunity to improve and perform at the highest level possible.<\/p>\n<p>However, the path to improvement is invisible without a yardstick to measure progress. For qualitative insights into team health, we\u2019ve embraced the model pioneered by Spotify, utilizing team health checks to identify areas for enhancement at both the team and organizational levels. Yet, the quest for a quantitative measure of performance led us to the DevOps Research and Assessment (DORA) metrics developed by Google. Despite their promise, we encountered challenges in calculating and leveraging these metrics effectively.<\/p>\n<p>In this article, we\u2019ll explore our interpretation of the individual DORA metrics, alongside the challenges we encountered in their implementation. For those new to DORA metrics, a practical first step is conducting a quick assessment. 
We recommend beginning with the straightforward evaluation available at DORA\u2019s Quick Check, which guides you through a series of questions to gauge your current standing.<\/p>\n<p>There are four core metrics that Google defined in its original post. They developed these metrics through six years of research and believe that, taken together, they indicate how well a team is performing (at least as far as DevOps goes). The four metrics are:<\/p>\n<ul>\n<li>Deployment Frequency \u2013 How often is a team shipping to production?<\/li>\n<li>Lead Time for Changes \u2013 How long does it take for a commit to make it to production?<\/li>\n<li>Change Failure Rate \u2013 What percentage of deployments create a failure in production?<\/li>\n<li>Time to Restore Service \u2013 How long does it take to recover from a failure in production?<\/li>\n<\/ul>\n<p>The common theme is that all of these metrics relate to code in production. The DORA team identified four classes of team performance (Elite, High, Medium, Low), which become evident from these metrics:<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/Ks7JbTgCCn47Qt2NSPpckXkcBRAj8Z6yLtiLwfOzNFH-gS9y9-j1_iM_7HaUrN9lm9Dxmf2L4TULfNxnbKMHc87PR7j9XE3igVFJybHisYCuXPTlXKagPL8alRFMIAAFkXso68km5fCAuJvsBz7E2Dw\" alt=\"\"\/><\/figure>\n<h2 id=\"h-our-interpretation-of-dora-metrics\">Our interpretation of DORA Metrics<\/h2>\n<p>Before we could start collecting and monitoring these metrics, we needed to align on what exactly each one meant to AdAction.<\/p>\n<h3 id=\"h-deployment-frequency\">Deployment Frequency<\/h3>\n<p>This was the only straightforward metric for us: how often we release new versions of our codebases to production.<\/p>\n<h3 id=\"h-lead-time-for-changes\">Lead Time for Changes<\/h3>\n<p>This is a little more debatable. What are we really trying to measure? The speed of development, or the speed at which a team 
takes finished code and makes it live? These are two very different things, each with its own merits. Which one you choose dictates how you will measure. We first looked at the time from when a PR was opened until that code was deployed. In time, due to the nature of the tooling we used for measuring, we switched to measuring from commit time until deployment.<\/p>\n<h3 id=\"h-change-failure-rate\">Change Failure Rate<\/h3>\n<p>For us, this was a less clear metric. What constitutes a \u201cfailure\u201d? The DORA report refers to the percentage of <em>deployments<\/em> causing failures. Does this mean an incident due to a CD deployment error, or any outage due to faulty code in production? We had strong proponents in both camps. This is still in flux and is largely dictated by the tooling we use. Once failures are identified, the failure rate is the number of failures divided by the total number of deployments.<\/p>\n<h3 id=\"h-time-to-restore-service\">Time to Restore Service<\/h3>\n<p>Before we can measure this, we must again answer the question of what counts as a failure. 
Identifying failures accurately is crucial, as it directly impacts our ability to measure and improve our response times.<\/p>\n<h2 id=\"h-collecting-the-metrics\">Collecting the Metrics<\/h2>\n<p>Once we decided on what these metrics meant to us, we needed to collect them. The Google DORA team released the Four Keys project for gathering these metrics from GitHub or GitLab. However, the project makes extensive use of Google Cloud, and we\u2019re an AWS shop.<\/p>\n<p>What\u2019s more, as of January 2024, the project is no longer maintained, and the source code is archived on GitHub. That left us on our own to gather these metrics. We explored several options and finally landed on Datadog DORA Metrics as our solution. We will detail the Datadog setup for DORA metrics, but first, let\u2019s discuss the other solutions we considered.<\/p>\n<h3 id=\"h-attempt-1-jira-and-github-apis\">Attempt #1 \u2013 Jira and GitHub APIs<\/h3>\n<p>To begin with, we wanted to go to the original source for this data. Jira\u2019s roots as an issue tracking system lend themselves to tracking failures. With GitHub, the combination of PRs and GitHub Actions would allow us to measure both lead time for changes and deployment frequency.<\/p>\n<p>In Jira, we created a new issue label (called SystemFailure) to flag bugs that we considered a full failure or outage. Then, we added a couple of fields to record the start and end of the outage. Using the following prototype Python code, we could pull stats for Time to Restore Service using Jira\u2019s API:<\/p>\n<figure class=\"wp-block-image size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"709\" alt=\"\" srcset=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-1024x709.png 1024w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-300x208.png 300w, 
https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-768x532.png 768w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image.png 1364w\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-1024x709.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-5430 lazyload\"\/><noscript><img decoding=\"async\" width=\"1024\" height=\"709\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-1024x709.png\" alt=\"\" class=\"wp-image-5430\" srcset=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-1024x709.png 1024w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-300x208.png 300w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-768x532.png 768w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image.png 1364w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"\/><\/noscript><\/figure>\n<p>There was a big downside to this approach. It relies on a human to populate the correct fields in Jira. We\u2019re strong believers in automation, so this manual step really felt wrong. For GitHub, we started by pulling PR statistics with this code:<\/p>\n<p>https:\/\/github.com\/AdGem\/dora-metrics<\/p>\n<p>At this point, we realized we didn\u2019t want our extraction code to be doing our analytics; rather, we\u2019d like to pull raw data and do our analytics using BI tools. In a sense, we\u2019d like to treat it as an Extract, Load, Transform (ELT) problem.<\/p>\n<h3 id=\"h-attempt-2-airbyte-untested\">Attempt #2 \u2013 Airbyte (untested)<\/h3>\n<p>At that point, we thought, \u201cELT? We know ELT.\u201d For the past few months we\u2019ve been building a new data infrastructure stack around no\/low-code ELT solutions. For extraction, we\u2019ve used Airbyte. 
Fortunately, Airbyte has connectors for both Jira and GitHub, so we could extract from our original sources and eliminate the need for custom code. However, that\u2019s about as far as we got with the idea, as our (hopefully) final solution came to our attention.<\/p>\n<h3 id=\"h-attempt-3-datadog\">Attempt #3 \u2013 Datadog<\/h3>\n<p>As you may know, we love observability and Datadog. So when we were invited to join Datadog\u2019s DORA Metrics beta, we dropped everything to participate. Datadog\u2019s solution promises to solve many of the challenges we faced in pulling the data together, all in one place.<\/p>\n<h2 id=\"h-datadog-setup\">Datadog Setup<\/h2>\n<p>Before setting up DORA metrics, one of the first things we did was register our services in Datadog\u2019s Service Catalog, which provides a consolidated view of our services, including details such as service name, environment, and repository:<\/p>\n<h3 id=\"h-deployment-frequency-1\"><img decoding=\"async\" width=\"624\" height=\"249.28602921471455\" src=\"https:\/\/lh7-us.googleusercontent.com\/zMMRKuRbDzbSgbks3dBNXIfsR7yGBw0FZKq0TjNjGZrls44EKX83IX1D0ROHb1cPNiKx8s4G3SrK1LO8gS5Ew9UkPRRed4abos8l5dNqokEtjfYgSK-09UH7mFir6CgQt4HQXm98kvcPeLVrmFaQtSg\"\/><\/h3>\n<p><strong>Deployment Frequency<\/strong><\/p>\n<p>To get this metric, three values are required:<\/p>\n<ul>\n<li>started_at: Unix timestamp in nanoseconds when the deployment started.<\/li>\n<li>service: Name of a service available in the Service Catalog.<\/li>\n<li>finished_at: Unix timestamp in nanoseconds when the deployment finished. It should not be older than 3 hours.<\/li>\n<\/ul>\n<p>You can use either the Datadog API or datadog-ci to send the deployment events. 
Example using the Datadog API at https:\/\/api.datadoghq.com\/api\/v2\/dora\/deployment:<\/p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1024\" height=\"674\" alt=\"\" srcset=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-1-1024x674.png 1024w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-1-300x198.png 300w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-1-768x506.png 768w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-1.png 1376w\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-1-1024x674.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-5431 lazyload\"\/><noscript><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"674\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-1-1024x674.png\" alt=\"\" class=\"wp-image-5431\" srcset=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-1-1024x674.png 1024w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-1-300x198.png 300w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-1-768x506.png 768w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-1.png 1376w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"\/><\/noscript><\/figure>\n<p>Example with Datadog-ci:<\/p>\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"369\" alt=\"\" srcset=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-2-1024x369.png 1024w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-2-300x108.png 300w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-2-768x277.png 768w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-2.png 1370w\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-2-1024x369.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-5432 
lazyload\"\/><noscript><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"369\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-2-1024x369.png\" alt=\"\" class=\"wp-image-5432\" srcset=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-2-1024x369.png 1024w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-2-300x108.png 300w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-2-768x277.png 768w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-2.png 1370w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"\/><\/noscript><\/figure>\n<p>We choose to use datadog-ci in the pipeline so it sends the data for each deployment. To install the datadog-ci and export the required environment variables:<\/p>\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"225\" alt=\"\" srcset=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-3-1024x225.png 1024w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-3-300x66.png 300w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-3-768x168.png 768w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-3.png 1368w\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-3-1024x225.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-5433 lazyload\"\/><noscript><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"225\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-3-1024x225.png\" alt=\"\" class=\"wp-image-5433\" srcset=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-3-1024x225.png 1024w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-3-300x66.png 300w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-3-768x168.png 768w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-3.png 1368w\" 
sizes=\"(max-width: 1024px) 100vw, 1024px\"\/><\/noscript><\/figure>\n<p>Here is an example of how to send the deployment events to Datadog. In this case, we confirmed that data is sent from the production pipeline, and we used the --env option to tag production deployments:<\/p>\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"320\" alt=\"\" srcset=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-4-1024x320.png 1024w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-4-300x94.png 300w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-4-768x240.png 768w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-4.png 1368w\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-4-1024x320.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-5434 lazyload\"\/><noscript><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"320\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-4-1024x320.png\" alt=\"\" class=\"wp-image-5434\" srcset=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-4-1024x320.png 1024w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-4-300x94.png 300w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-4-768x240.png 768w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-4.png 1368w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"\/><\/noscript><\/figure>\n<h3><strong><br \/>Lead Time for Changes<\/strong><\/h3>\n<p>According to Datadog, for change lead time to be available, you must send deployment events while your repository metadata is synchronized to Datadog. 
The deployment events must include the repository_url and commit_sha fields, as seen in the last example. Datadog provides a GitHub integration, available on the GitHub integration tile:<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/UK5SC2-y1XhJ5-XMg8HQ9Le43Ln89veNzw9wTVyesP1g22a7ZzSiVmzIv7Uh4ZOTEeY8DnOS3Vqb7Jc7zMqEbg5iMt3iPlLyw-V8hP7lWM6uGpfwRMPWcICEGfbe6K16lRWU8hjN7Ng27VEvYf7m-rU\" alt=\"\"\/><\/figure>\n<p>It allows you to synchronize your repository metadata automatically.<\/p>\n<h3 id=\"h-change-failure-rate-1\">Change Failure Rate<\/h3>\n<p>You can submit incidents using the DORA Metrics API. Only two attributes are required to send an incident:<\/p>\n<ul>\n<li>service: Name of a service available in the Service Catalog.<\/li>\n<li>started_at: Unix timestamp in nanoseconds when the incident started.<\/li>\n<\/ul>\n<p>You can optionally add more attributes, as listed in Datadog\u2019s API:<\/p>\n<ul>\n<li>repository_url<\/li>\n<li>commit_sha<\/li>\n<li>name<\/li>\n<li>severity<\/li>\n<\/ul>\n<p>https:\/\/api.datadoghq.com\/api\/v2\/dora\/incident<\/p>\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"748\" alt=\"\" srcset=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-5-1024x748.png 1024w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-5-300x219.png 300w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-5-768x561.png 768w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-5.png 1366w\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-5-1024x748.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-5435 lazyload\"\/><noscript><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"748\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-5-1024x748.png\" alt=\"\" class=\"wp-image-5435\" 
srcset=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-5-1024x748.png 1024w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-5-300x219.png 300w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-5-768x561.png 768w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-5.png 1366w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"\/><\/noscript><\/figure>\n<p>In our pipeline, some incidents can be reported automatically. In this case, after each deployment a DeploymentCheckStep validates that there is no app version mismatch:<\/p>\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1006\" height=\"1024\" alt=\"\" srcset=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-6-1006x1024.png 1006w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-6-295x300.png 295w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-6-768x782.png 768w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-6.png 1358w\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-6-1006x1024.png\" data-sizes=\"(max-width: 1006px) 100vw, 1006px\" class=\"wp-image-5436 lazyload\"\/><noscript><img loading=\"lazy\" decoding=\"async\" width=\"1006\" height=\"1024\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-6-1006x1024.png\" alt=\"\" class=\"wp-image-5436\" srcset=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-6-1006x1024.png 1006w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-6-295x300.png 295w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-6-768x782.png 768w, https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/image-6.png 1358w\" sizes=\"(max-width: 1006px) 100vw, 1006px\"\/><\/noscript><\/figure>\n<ul>\n<li>Missing dependencies<\/li>\n<li>Service downtime or unavailability<\/li>\n<li>Threshold breaches for 
performance metrics<\/li>\n<li>Security vulnerabilities detected<\/li>\n<li>Critical logs or error patterns identified<\/li>\n<\/ul>\n<h3 id=\"h-time-to-restore-service-1\"><strong>Time to Restore Service<\/strong><\/h3>\n<p>In our continuous delivery pipeline, automating the reporting of incidents is crucial. We don\u2019t want to only report that something is wrong; we also want to mark incidents as resolved once an outage is fixed. Resolved incidents are incidents that include finished_at.<\/p>\n<p>This means that you can reuse the same endpoint you use to report an incident. The key difference is that you must include finished_at, the Unix timestamp in nanoseconds when the incident finished. It should not be older than 3 hours.<\/p>\n<p>Automating incident resolution is not straightforward. However, Datadog offers an integration with Jira to automatically create a Jira issue, and one with Slack to automatically create a Slack channel, for each new incident. From there, you could automate the resolution of previously created incidents.<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/lK12fxlF3Ts1V5Ncgj8hcrWz9w5JPdEEEHVz0FRs3QHc1n4wopzhWspqY3tKfCRePLyz_V5_w6SFNSbKex4paZQKA0vAwaRjo37TzjyPO4m1XDZLNtn1jf_cYGk2cFowIKEQq-1yy9BS-CZtmk3NFHU\" alt=\"\"\/><\/figure>\n<p>You could also automate the creation of incidents using Datadog\u2019s Workflows, which allow you to orchestrate and automate your end-to-end processes.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"313\" src=\"https:\/\/lh7-us.googleusercontent.com\/2wd-s8mZBFXNUG4ZBWN4jlJg_jH9eIMu5oVkX-aQsO1Tn0xcL8aiOHvBUsY5sNdOzkONn4R9VzKhDh8VCfIScPTdaPOk1TI0rT5d8v5JfSbqUWd28XbXsC_D2x1aY_IirgUUKRWliso1K8QsuDBCFmc\"\/><\/p>\n<p>https:\/\/docs.datadoghq.com\/service_management\/workflows\/<\/p>\n<p>These are some hoops to jump through, though. 
In the future, we hope Datadog will integrate DORA failure tracking with its built-in incident management solution, making for seamless failure tracking.<\/p>\n<h2 id=\"h-results\">Results<\/h2>\n<p>After setting up Datadog DORA Metrics, you get charts for each DORA metric, which you can filter by team, service, environment, repository, and date.<\/p>\n<h3 id=\"h-deployment-frequency-2\">Deployment Frequency<\/h3>\n<p>How often an organization successfully releases to production:<br \/><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"261\" src=\"https:\/\/lh7-us.googleusercontent.com\/SG0MIuvYuhwPXXhxBFQ_SqemXpkfX2k0L73XQ2KLvm9r4ovL1Bc1XmkGyIPqNhcksiP3XyJrrHLiH5NRabpapLJPbihbDb13Z6M7jU-1X3iaOUFlf88WeYNyyQ5K9uYpcfxGQ7n7hmJaFpeZle4MC4U\"\/><\/p>\n<h3 id=\"h-lead-time-for-changes-2\">Lead Time for Changes<\/h3>\n<p>The amount of time it takes a commit to get into production:<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/f9U2Rzan-rFJahMVNTeJhNm1SAGorZt7zKVoxXo0deyTJ32jjx2lyqGXuYf98psn0shshCAiFAMjrOA51KgsxriksI_gZgD_vkD1iXI44oN_vycGGBGTOQNoUOpcIrM4DMVUwm2jS7o_66bfWBzbHQ4\" alt=\"\"\/><\/figure>\n<h3 id=\"h-change-failure-rate-2\">Change Failure Rate<\/h3>\n<p>The percentage of deployments causing a failure in production:<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/VvVc3cxTdLp6bEzLkLhNy_HmEM3Zr9DyNZTHv9npKeYccxr7kOT3zysURUF4YRsFkgqHuMyal8dz5E5KkJCQryuxHUxZ2GtTeBX5Si_MTlk7xYUW-xxRbo1ioxKQCsONbGLn2Dk45uP4lymxf4HjorE\" alt=\"\"\/><\/figure>\n<h3 id=\"h-time-to-restore-service-2\">Time to Restore Service<\/h3>\n<p>How long it takes an organization to recover from a failure in production:<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" 
src=\"https:\/\/lh7-us.googleusercontent.com\/4GscHfnAZgttjw33noFd-ZPao5DUEZh4hG2gLUXyDcpLc6wdZZzxNM_ZYGJbTje0blqPcwV0Uadx4FoLa9kIFOezzjfNz27M5anYoSDOhH5klkaCl4lnYfYMNs0JbGZvIw6qWXTF3ObQNvMSPaRbUPo\" alt=\"\"\/><\/figure>\n<h2 id=\"h-findings\">Findings<\/h2>\n<h3 id=\"h-dora-metrics-offer-valuable-insights\">DORA Metrics Offer Valuable Insights<\/h3>\n<p>DORA metrics are great for measuring your teams and seeing how they compare against industry standards. You may think your teams are doing great, but are they really? This is why assessments are important.<\/p>\n<h3 id=\"h-early-implementation-and-challenges\">Early Implementation and Challenges<\/h3>\n<p>We are still in the early stages of our journey with DORA metrics. They are very promising for helping teams succeed. However, it\u2019s far from trivial to pull these metrics together. It appears we\u2019ve found a solution to our data collection woes in Datadog, although failure tracking still leaves a bit to be desired. We\u2019ve only just begun to collect these metrics, and only for a single team so far. Perhaps in a few months, we\u2019ll revisit this topic and share how it\u2019s impacted our teams.<\/p>\n<h3 id=\"h-importance-of-holistic-assessment\">Importance of Holistic Assessment<\/h3>\n<p>That being said, it\u2019s crucial to look beyond DORA Metrics and consider other factors contributing to successful business outcomes. 
For example, focus on delivering a whole new feature to production rather than on evaluating individual pieces of code delivered to production.<\/p>\n<h3 id=\"h-potential-benefits-for-team-efficiency\">Potential Benefits for Team Efficiency<\/h3>\n<p>DORA metrics can offer good feedback on efficiency and stability. Measuring how fast code is merged may not seem critical to a developer, but categorizing teams could encourage them to deliver code faster and more efficiently, and to aim for the elite category if they are not there yet.<\/p>\n<h3 id=\"h-cautions-against-comparisons\">Cautions Against Comparisons<\/h3>\n<p>We think it is important to avoid misconceptions. These metrics should not be used to judge which of our teams are good and which are bad, or to set up a competition among them. It always depends on context: each team is different, and you can\u2019t compare pears with apples. In the end, it is about improvement.<\/p>\n<h3 id=\"h-references\">References<\/h3>\n<ol>\n<li>Google Cloud Blog \u2013 \u201cAre you an Elite DevOps performer? 
Find out with the Four Keys Project\u201d<\/li>\n<li>Google Cloud Blog \u2013 \u201cThe 2019 Accelerate State of DevOps: Elite performance, productivity, and scaling\u201d<\/li>\n<li>Spotify R&amp;D \u2013 \u201cSquad Health Check model \u2013 visualizing what to improve\u201d<\/li>\n<li>Datadog \u2013 \u201cDORA Metrics\u201d<\/li>\n<\/ol>\n<div class=\"saboxplugin-wrap\" itemtype=\"http:\/\/schema.org\/Person\" itemscope=\"\" itemprop=\"author\">\n<div class=\"saboxplugin-tab\">\n<div class=\"saboxplugin-gravatar\"><img loading=\"lazy\" decoding=\"async\" width=\"100\" height=\"100\" alt=\"Fabian Leon - DevOps Engineer\" itemprop=\"image\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/Screenshot-2024-02-25-at-4.55.13\u202fPM.png\" class=\"lazyload\" bad-src=\"data:image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"\/><noscript><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/02\/Screenshot-2024-02-25-at-4.55.13\u202fPM.png\" width=\"100\" height=\"100\" alt=\"Fabian Leon - DevOps Engineer\" itemprop=\"image\"\/><\/noscript><\/div>\n<\/div>\n<\/div><\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/www.adaction.com\/blog\/elevating-tech-performance-with-dora-metrics\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>We have a strong culture of continuous improvement at AdAction, particularly in the tech department.\u00a0 Our goal is to be an always-improving team that quickly and efficiently delivers high-quality products.\u00a0As such we want all of our teams to have the opportunity to improve and perform at the highest level possible.\u00a0\u00a0\u00a0 However, the path to improvement 
[&hellip;]<\/p>\n","protected":false},"author":16,"featured_media":17223,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[122],"tags":[],"class_list":["post-17222","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data"],"_links":{"self":[{"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/posts\/17222","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/comments?post=17222"}],"version-history":[{"count":0,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/posts\/17222\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/media\/17223"}],"wp:attachment":[{"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/media?parent=17222"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/categories?post=17222"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/tags?post=17222"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}