{"id":15548,"date":"2024-01-12T16:36:23","date_gmt":"2024-01-12T16:36:23","guid":{"rendered":"http:\/\/scannn.com\/enhancing-devops-with-datadog-our-journey\/"},"modified":"2024-01-12T16:36:23","modified_gmt":"2024-01-12T16:36:23","slug":"enhancing-devops-with-datadog-our-journey","status":"publish","type":"post","link":"https:\/\/scannn.com\/lv\/enhancing-devops-with-datadog-our-journey\/","title":{"rendered":"Enhancing DevOps with Datadog: Our Journey"},"content":{"rendered":"<div>\n<p><span style=\"font-weight: 400;\">A key objective of building a DevOps culture is delivering software quickly. However, you need to see where you are and where you\u2019re going to ensure you\u2019re not rapidly heading into a brick wall. That\u2019s why a core tenet of DevOps is measurement. You need to observe your environment to know you\u2019re not building something like this:<\/span><\/p>\n<p class=\"has-text-align-center\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/BXUHiDDbHyL_A2SF5BxYCHmarzCzyrT6tpgmL7J7H-ngqgkvoZEG4rqscXsz6Ca4QVwuVkt0ehU_9XAvgP-MjTsu-Ss695RDhpPxLN-ak93oteD9Oo6oD2YfH0rXsFPDhgWQE4dNtCp1Y4FWFqw0xtE\" width=\"624\" height=\"296\"\/><\/p>\n<p><span style=\"font-weight: 400;\">The solution is to instrument your software and infrastructure wherever possible. Datadog is a market-leading platform for observing systems, offering advanced monitoring and analytics for software and infrastructure performance. At AdAction, we\u2019ve rolled out Datadog to all of our systems, and I\u2019d like to walk you through our experience, talking through the successes and pitfalls we encountered. 
Hopefully, along the way, this will provide you with some value.<\/span><\/p>\n<h2 class=\"has-black-color has-text-color\"><span style=\"font-weight: 400;\">Instrumenting<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">To begin our observability journey, we first needed to instrument all the things.<\/span><\/p>\n<p class=\"has-text-align-center\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/BH_-U1QpBZTbcuGLRB1EB58quC_EbS3uCVh2HAur30O_auAQqkec-0I8YH-PAaKlMOZoMv0wKQbCSnaQSy1kukI1FaS_vWgb3cxGSrcqanru-RGDC9b310Gr-Nz8DbV1qyEgl9L0xG3Gvj7jMt1bDiA\" width=\"624\" height=\"468\"\/><\/p>\n<h3 class=\"has-black-color has-text-color\"><span style=\"font-weight: 400;\">Integrate AWS<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Luckily, Datadog can collect a wealth of data about your architecture out of the box when you\u2019re in the cloud. We\u2019re on AWS, so we were able to hit the ground running. A bit of setup is involved, which you can find detailed in this Datadog <\/span><span style=\"font-weight: 400;\">support article<\/span><span style=\"font-weight: 400;\">. The setup was straightforward for us. As long as you have an IAM user configured with the correct permissions, it takes just a few minutes to load up a CloudFormation template, and then you\u2019ll have data like this coming in:<\/span><\/p>\n<p class=\"has-text-align-center\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/XaQ1C-f2eox2l3NQXepo7p3XngkuGDKUAASsuGTv-UkM9lrqQavPbcU2ziCLatVuB5wwzvwOg73uTzz1pUSpQhWaih4sKvbv5gqdsPitYOtZaAa6VzguiAVukz2PjplofIUNj7IjxWI4LWcSPUEgw90\" width=\"624\" height=\"273\"\/><\/p>\n<h3 class=\"has-black-color has-text-color\"><span style=\"font-weight: 400;\">Airbyte<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">We also have systems not provided by AWS that we\u2019d like to instrument. 
One such system is <\/span><span style=\"font-weight: 400;\">Airbyte<\/span><span style=\"font-weight: 400;\">, which we <\/span><span style=\"font-weight: 400;\">self-host<\/span><span style=\"font-weight: 400;\"> on a standalone EC2 instance. We\u2019re interested in plenty of Airbyte-specific metrics (number of jobs, long-running jobs) as well as AWS EC2 metrics like host health and network. Luckily, Datadog offers an <\/span><span style=\"font-weight: 400;\">Airbyte integration<\/span><span style=\"font-weight: 400;\"> to pull in the Airbyte-specific metrics. We get the EC2 metrics automatically from the AWS integration but need to install a Datadog agent on the host to collect the Airbyte information. We are firm believers in infrastructure as code, so we didn\u2019t want to do this installation and configuration manually; instead, we integrated it into the user_data that provisions the Airbyte EC2 instance. Here is the relevant section of our EC2 user_data:<\/span><\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\n# Configure Datadog Integration\nmkdir \/home\/ec2-user\/datadog &amp;&amp; cd \/home\/ec2-user\/datadog\n\n# Write the DogStatsD mapper profiles (the quoted heredoc keeps $1 literal)\ncat &lt;&lt;'EOF' &gt;&gt; \/home\/ec2-user\/datadog\/datadog.yaml\ndogstatsd_mapper_profiles:\n  - name: airbyte_worker\n    prefix: \"worker.\"\n    mappings:\n      - match: \"worker.temporal_workflow_*\"\n        name: \"airbyte.worker.temporal_workflow.$1\"\n      - match: \"worker.worker_*\"\n        name: \"airbyte.worker.$1\"\n      - match: \"worker.state_commit_*\"\n        name: \"airbyte.worker.state_commit.$1\"\n      - match: \"worker.job_*\"\n        name: \"airbyte.worker.job.$1\"\n      - match: \"worker.attempt_*\"\n        name: \"airbyte.worker.attempt.$1\"\n      - match: \"worker.activity_*\"\n        name: \"airbyte.worker.activity.$1\"\n      - match: \"worker.*\"\n        name: \"airbyte.worker.$1\"\n  - name: airbyte_cron\n    prefix: \"cron.\"\n    mappings:\n      - match: \"cron.cron_jobs_run\"\n        name: \"airbyte.cron.jobs_run\"\n      - match: \"cron.*\"\n        name: \"airbyte.cron.$1\"\n  - name: airbyte_metrics_reporter\n    prefix: \"metrics-reporter.\"\n    mappings:\n      - match: \"metrics-reporter.*\"\n        name: \"airbyte.metrics_reporter.$1\"\n  - name: airbyte_orchestrator\n    prefix: \"orchestrator.\"\n    mappings:\n      - match: \"orchestrator.*\"\n        name: \"airbyte.orchestrator.$1\"\n  - name: airbyte_server\n    prefix: \"server.\"\n    mappings:\n      - match: \"server.*\"\n        name: \"airbyte.server.$1\"\n  - name: airbyte_general\n    prefix: \"airbyte.\"\n    mappings:\n      - match: \"airbyte.worker.temporal_workflow_*\"\n        name: \"airbyte.worker.temporal_workflow.$1\"\n      - match: \"airbyte.worker.worker_*\"\n        name: \"airbyte.worker.$1\"\n      - match: \"airbyte.worker.state_commit_*\"\n        name: \"airbyte.worker.state_commit.$1\"\n      - match: \"airbyte.worker.job_*\"\n        name: \"airbyte.worker.job.$1\"\n      - match: \"airbyte.worker.attempt_*\"\n        name: \"airbyte.worker.attempt.$1\"\n      - match: \"airbyte.worker.activity_*\"\n        name: \"airbyte.worker.activity.$1\"\n      - match: \"airbyte.cron.cron_jobs_run\"\n        name: \"airbyte.cron.jobs_run\"\nEOF\n\n# Fetch the Datadog API key from AWS Secrets Manager\nexport DATADOG_API=$(aws secretsmanager get-secret-value --secret-id production\/airbyte\/datadog --query SecretString --output text --region us-east-2)\n\nexport DATADOG_API_KEY=$(echo \"$DATADOG_API\" | grep -o '\"key\":\"[^\"]*' | grep -o '[^\"]*$')\n\n# Add the dd-agent and metric-reporter services to Airbyte's docker-compose.yaml\nsudo sed -i \"\/- airbyte-api-server\/a\\  dd-agent:\\n    container_name: dd-agent\\n    image: gcr.io\/datadoghq\/agent:7\\n    pid: host\\n    environment:\\n      - DD_API_KEY=${DATADOG_API_KEY}\\n      - DD_SITE=datadoghq.com\\n      - DD_HOSTNAME=airbyte-ec2\\n      - DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true\\n    volumes:\\n      - \/var\/run\/docker.sock:\/var\/run\/docker.sock\\n      - \/proc\/:\/host\/proc\/:ro\\n      - \/sys\/fs\/cgroup:\/host\/sys\/fs\/cgroup:ro\\n      - \/home\/ec2-user\/datadog\/datadog.yaml:\/etc\/datadog-agent\/datadog.yaml\\n    networks:\\n      - airbyte_internal\\n  metric-reporter:\\n    image: airbyte\/metrics-reporter:\\${VERSION}\\n    container_name: metric-reporter\\n    networks:\\n      - airbyte_internal\\n    environment:\\n      - DATABASE_PASSWORD=\\${DATABASE_PASSWORD}\\n      - DATABASE_URL=\\${DATABASE_URL}\\n      - DATABASE_USER=\\${DATABASE_USER}\\n      - DD_AGENT_HOST=\\${DD_AGENT_HOST}\\n      - DD_DOGSTATSD_PORT=\\${DD_DOGSTATSD_PORT}\\n      - METRIC_CLIENT=\\${METRIC_CLIENT}\\n      - PUBLISH_METRICS=\\${PUBLISH_METRICS}\" \/home\/ec2-user\/airbyte\/docker-compose.yaml\n\n# Turn on metrics publishing in Airbyte's .env\nsudo sed -i \"s\/PUBLISH_METRICS=false\/PUBLISH_METRICS=true\/\" \/home\/ec2-user\/airbyte\/.env\n\nsudo sed -i \"s\/METRIC_CLIENT=\/METRIC_CLIENT=datadog\/\" \/home\/ec2-user\/airbyte\/.env\n\nsudo sed -i \"s\/DD_AGENT_HOST=\/DD_AGENT_HOST=dd-agent\/\" \/home\/ec2-user\/airbyte\/.env\n\nsudo sed -i \"s\/DD_DOGSTATSD_PORT=\/DD_DOGSTATSD_PORT=8125\/\" \/home\/ec2-user\/airbyte\/.env\n\nunset DATADOG_API DATADOG_API_KEY<\/pre>\n
<h3 class=\"has-black-color has-text-color\"><span style=\"font-weight: 400;\">Early APM Adoption<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">The visibility afforded by the Datadog AWS integration was excellent. However, we still needed insights into how our application code was running. Here is where Application Performance Monitoring (APM) comes in handy. APM gives visibility into the actual code execution, including traces of slow responses, lists of errors, and flame graphs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Unfortunately, the APM setup didn\u2019t go nearly as smoothly for us as the AWS integration. We are primarily a PHP shop (Laravel specifically), and at the time, there was no easy-to-install APM tool for PHP.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">First, our Nginx configuration was outdated and needed to be updated. Once that was fixed, we still needed to have the AWS CLI command configured correctly in our Elastic Beanstalk environments to pull the Datadog API key from AWS Secrets Manager. Once that was set, we would pull down the Datadog install script and run it as part of our platform deployment hooks. 
Unfortunately, in our initial attempts, we weren\u2019t installing the agent at the right time in the boot sequence of our Beanstalk instances. (At the time, Datadog\u2019s only instructions for Elastic Beanstalk used `.ebextensions`, so we were flying a little blind.) It turns out that the install script needed to be run in the last pre-deploy step. Once all of that was sorted out, we had this pre-deploy hook script:<\/span><\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\n#!\/bin\/bash\necho \"Installing Datadog Agent and Log Collection\"\nset -ev\n\n\n# Setup Linux Agent\nif ! command -v datadog-agent &amp;&gt; \/dev\/null; then\n   # download datadog install script and give proper permissions and ownership\n   curl -L https:\/\/s3.amazonaws.com\/dd-agent\/scripts\/install_script_agent7.sh -o datadog_install_script.sh\n   chown root:root datadog_install_script.sh\n   chmod 700 datadog_install_script.sh\n\n\n   # copy datadog.yaml to \/etc\/datadog-agent\/datadog.yaml and give proper permissions and ownership\n   mkdir -p \/etc\/datadog-agent\n   cp .platform\/hooks\/predeploy\/datadog\/datadog.yaml \/etc\/datadog-agent\/datadog.yaml\n   chown root:root \/etc\/datadog-agent\/datadog.yaml\n   chmod 640 \/etc\/datadog-agent\/datadog.yaml\n\n\n   # get datadog secret information from AWS Secrets Manager\n   DD_SECRET=$(aws secretsmanager get-secret-value --secret-id \"$DATADOG_SECRET_ARN\" --query SecretString --output text --region us-east-2)\n\n\n   # export datadog agent version as environment variables\n   export DD_AGENT_MAJOR_VERSION=\"7\"\n   export DD_AGENT_MINOR_VERSION=\"\"\n\n\n   # export datadog secret information as environment variables\n   DD_API_KEY=$(jq -r \".DD_API_KEY\" &lt;&lt;&lt; \"$DD_SECRET\") &amp;&amp; export DD_API_KEY=$DD_API_KEY\n   DD_SITE=$(jq -r \".DD_SITE\" &lt;&lt;&lt; \"$DD_SECRET\") &amp;&amp; export DD_SITE=$DD_SITE\n\n\n   # Add proper API key to datadog.yaml and enable logs\n   sed -i \"s\/DD_API_KEY\/$DD_API_KEY\/\" \/etc\/datadog-agent\/datadog.yaml\n   sed 's\/# logs_enabled: false\/logs_enabled: true\/' -i \/etc\/datadog-agent\/datadog.yaml\n\n\n   # Copy the bundled integration configs to \/etc\/datadog-agent\/conf.d so logs can be collected\n   rsync -a .platform\/hooks\/predeploy\/datadog\/datadog-agent\/ \/etc\/datadog-agent\/conf.d\/\n\n\n   # Run install script\n   DD_API_KEY=unused \/var\/app\/staging\/datadog_install_script.sh; sed -i 's\/ install_script\/ ebs_install_script\/' \/etc\/datadog-agent\/install_info\n\n\n   echo \"Datadog Agent and Log Collection installed\"\nelse\n   echo \"DataDog Agent and Log Collection already installed!\"\nfi\n\n\n# Setup PHP APM\nif [ ! -f \/etc\/php.d\/98-ddtrace.ini ]; then\n   # Install datadog php extension\n   curl -LO https:\/\/github.com\/DataDog\/dd-trace-php\/releases\/latest\/download\/datadog-setup.php\n   php datadog-setup.php --php-bin=all --enable-profiling\n\n\n   # Restart php-fpm so extension is loaded\n   sudo systemctl restart php-fpm\nelse\n   echo \"DataDog APM Extension already installed!\"\nfi\n\n\necho \"Datadog and Log Collection Installed\"\n<\/pre>\n
<h3 class=\"has-black-color has-text-color\"><span style=\"font-weight: 400;\">RUM<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Beyond our PHP server applications, we have an Android app called <\/span><i><span style=\"font-weight: 400;\">Cosmic Rewards<\/span><\/i><span style=\"font-weight: 400;\">. We, of course, wanted to see how the app was running as well. Luckily, Datadog has a Real User Monitoring (RUM) feature, analogous to APM but for mobile apps. Setting it up was not trivial, but it was well worth it.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We use OkHttp (with Retrofit), and Datadog needs a specific dependency to work with it. 
Additionally, Datadog can give granular network visibility if you put some custom interceptors in place (which we really wanted to do). Here is the dependency in our Gradle version catalog:<\/span><\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\ndatadog-okhttp = { module = \"com.datadoghq:dd-sdk-android-okhttp\", version.ref = \"datadog\" }\n<\/pre>\n<p><span style=\"font-weight: 400;\">And here is the update to add the custom interceptors in `OkHttpClient.Builder()`:<\/span><\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\n.addInterceptor(\n    DatadogInterceptor(\n        firstPartyHosts = listOf(buildConfigInfo.apiHostName),\n        traceSampler = RateBasedSampler(20f)\n    )\n)\n.addNetworkInterceptor(\n    TracingInterceptor(\n        tracedHosts = listOf(buildConfigInfo.apiHostName),\n        traceSampler = RateBasedSampler(20f)\n    )\n)\n<\/pre>\n<p><span style=\"font-weight: 400;\">We also ran into a problem with our obfuscated source code. We needed to upload a mapping file for each app version so that Datadog could show readable (deobfuscated) stack traces. We used the Datadog Gradle plugin, which provides a Gradle task for uploading the mapping. Here we add the plugin to the version catalog:<\/span><\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\ndatadog = { id = \"com.datadoghq.dd-sdk-android-gradle-plugin\", version.ref = \"datadogGradlePlugin\" }\n<\/pre>\n<p><span style=\"font-weight: 400;\">Then we apply it in the app module\u2019s `build.gradle`:<\/span><\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\n   alias libs.plugins.datadog\n<\/pre>\n<p><span style=\"font-weight: 400;\">After we sync Gradle, we have tasks generated under the `app.datadog` folder, one per build type.\u00a0We execute the task after building the release build. 
We run <\/span><span style=\"font-weight: 400;\">`.\/gradlew uploadMappingRelease`<\/span><span style=\"font-weight: 400;\"> after running <\/span><span style=\"font-weight: 400;\">`.\/gradlew assembleRelease`<\/span><span style=\"font-weight: 400;\">. We automate this in our CI system.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We also wanted detailed tracking of screen views, so we used Datadog\u2019s `MixedViewTrackingStrategy`. In `RumConfiguration.Builder()` we added:<\/span><\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\nRumConfiguration.Builder(applicationId)\n    \/\/ ...\n    .useViewTrackingStrategy(MixedViewTrackingStrategy(trackExtras = true))\n    .build()<\/pre>\n<p><span style=\"font-weight: 400;\">With this, we\u2019re able to track Activities, Fragments, and also the extras, such as arguments passed across screens.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Now that we have fully set up RUM, one of our favorite things is how helpful it is for debugging Application Not Responding (ANR) issues.\u00a0ANRs are one of the most challenging issues to debug, and Datadog makes it easier than other platforms to spot the root cause.<\/span><\/p>\n<h3 class=\"has-black-color has-text-color\"><span style=\"font-weight: 400;\">APM Everywhere<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Now that we\u2019d pioneered Datadog APM use in one system, it was time to bring it into all of our server applications.\u00a0Being firm believers in a DevOps culture, we felt the individual scrum teams should take it upon themselves to integrate APM rather than rely on our Continuous Improvement team. We worked with the product managers to schedule the work, and the teams tackled it like any other feature development. 
Luckily, thanks to the work done on our first APM integration, bringing Datadog APM on board was much easier with our new pre-deploy script.<\/span><\/p>\n<h3 class=\"has-black-color has-text-color\"><span style=\"font-weight: 400;\">Custom Metrics<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">At this point, we had all of our application performance monitored. What we didn\u2019t have was a single pane of glass to look at for the total health of our business. We needed to instrument the important business events within our systems to get there. Datadog custom metrics are perfect for tracking stats beyond just the application. Once again, we divided the work of integrating custom metrics among the teams. The teams are using the open-source project <\/span><span style=\"font-weight: 400;\">laravel-datadog-helper<\/span><span style=\"font-weight: 400;\">, which has greatly simplified the process of tracking custom metrics. The helper was installed using composer:<\/span><\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\n$ composer require chaseconey\/laravel-datadog-helper\n.\/composer.json has been updated\nRunning composer update chaseconey\/laravel-datadog-helper\nLoading composer repositories with package information\nUpdating dependencies\nLock file operations: 2 installs, 0 updates, 0 removals\n - Locking chaseconey\/laravel-datadog-helper (1.2.1)\n - Locking datadog\/php-datadogstatsd (1.4.1)\nWriting lock file\nInstalling dependencies from lock file (including require-dev)\nPackage operations: 2 installs, 0 updates, 0 removals\n - Installing datadog\/php-datadogstatsd (1.4.1): Extracting archive\n - Installing chaseconey\/laravel-datadog-helper (1.2.1): Extracting archive\n\n\n\u2026\n\n\nUsing version ^1.2 for chaseconey\/laravel-datadog-helper\n<\/pre>\n<p><span style=\"font-weight: 400;\">We made a few minor tweaks to the configuration:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">First, we set up a prefix for our metrics to help us distinguish between identically (or similarly) named metrics across multiple projects:<\/span><\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\n   \/*\n   |--------------------------------------------------------------------------\n   | Datadog Tracking Prefix\n   |--------------------------------------------------------------------------\n   |\n   | This is the prefix that will be placed in front of all of your metric entries. If you have multiple\n   | applications being tracked in Datadog, it is recommended putting the application name somewhere\n   | inside of your prefix. 
A common naming scheme is something like app.&lt;app-name&gt;.\n   |\n   *\/\n   'prefix' =&gt; env('DD_METRIX_PREFIX', 'service-hub'), \/\/ metrics prefix\n<\/pre>\n<p><span style=\"font-weight: 400;\">We also needed to ensure we pulled in the Datadog API key that we configured in our pre-deploy script above, along with the application key:<\/span><\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\n   'api_key' =&gt; env('DD_API_KEY', null),\n\n\n   'application_key' =&gt; env('DD_APP_KEY', null),\n\n<\/pre>\n<p><span style=\"font-weight: 400;\">Tracking custom metrics was straightforward, particularly with the laravel-datadog-helper<\/span>:<\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\n       Datadog::increment('support_case.created');\n\n<\/pre>\n<p>We were able to tap into the power of Eloquent model events to fire our Datadog custom metrics.<\/p>\n<p>We created a custom event dispatcher:<\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\n   public function player(): BelongsTo\n   {\n       return $this-&gt;belongsTo(Player::class)-&gt;withTrashed();\n   }\n\n\n   public function campaign(): BelongsTo\n   {\n       return $this-&gt;belongsTo(Campaign::class);\n   }\n\n\n   public function scopeForPlayer(Builder $query, string $playerId): void\n   {\n       $query-&gt;where('player_id', $playerId);\n   }\n\n<\/pre>\n<p>We wrote a subscriber for our new events:<\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\n&lt;?php\n\n\nnamespace App\\Listeners;\n\n\nuse App\\Events\\ClickCreated;\nuse App\\Events\\ConversionCreated;\nuse App\\Events\\NewUserRegistration;\nuse App\\Support\\Facades\\Features;\nuse ChaseConey\\LaravelDatadogHelper\\Datadog;\nuse Illuminate\\Events\\Dispatcher;\n\n\nclass DataDogSubscriber\n{\n   public function handleConversionCreated(ConversionCreated $event): void\n   {\n       Datadog::increment('conversions.created.'.$event-&gt;conversion-&gt;status-&gt;value);\n   }\n\n\n   public function handleClickCreated(): void\n   {\n       Datadog::increment('clicks.created');\n   }\n\n\n   public function handleNewUserRegistration(): void\n   {\n       Datadog::increment('players.created');\n   }\n\n\n   \/**\n    * Register the listeners for the subscriber.\n    *\n    * @return array&lt;string, string&gt;\n    *\/\n   public function subscribe(Dispatcher $events): array\n   {\n       return [\n           ConversionCreated::class =&gt; 'handleConversionCreated',\n           ClickCreated::class =&gt; 'handleClickCreated',\n           NewUserRegistration::class =&gt; 'handleNewUserRegistration',\n       ];\n   }\n}\n\n<\/pre>\n<p>Finally, we registered our subscriber in the EventServiceProvider:<\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\n&lt;?php\n\n\nnamespace App\\Providers;\n\n\nuse App\\Listeners\\DataDogSubscriber;\nuse Illuminate\\Foundation\\Support\\Providers\\EventServiceProvider as ServiceProvider;\n\n\nclass EventServiceProvider extends ServiceProvider\n{\n   \/**\n    * The event listener mappings for the application.\n    *\n    * @var array&lt;class-string, array&lt;int, class-string&gt;&gt;\n    *\/\n   protected $listen = [\n       \/\/ ...\n   ];\n\n\n   \/**\n    * The subscriber classes to register.\n    *\n    * @var array&lt;class-string&gt;\n    *\/\n   protected $subscribe = [\n       \/\/ ...\n       DataDogSubscriber::class,\n   ];\n\n\n   \/**\n    * The model observers for your application.\n    *\n    * @var array&lt;class-string, array&lt;int, class-string&gt;&gt;\n    *\/\n   protected $observers = [\n       \/\/ ...\n   ];\n}\n\n<\/pre>\n<h2 class=\"has-black-color has-text-color\" id=\"h-monitoring\">Monitoring<\/h2>\n<p>Once we had instrumentation, we needed to 
do something with it.\u00a0Collecting data does little good unless you act on it.<\/p>\n<h3 class=\"has-black-color has-text-color\" id=\"h-dashboards\">Dashboards<\/h3>\n<p>It was time to build our single pane of glass to see the health of our systems at a glance.\u00a0Enter Datadog Dashboards.\u00a0You can pull arbitrary metrics, graphs, and visuals together into a single view.\u00a0There are widgets for time series, charts, arbitrary query values, heatmaps; the list goes on and on.\u00a0Here is a dashboard we built for our data pipeline:<\/p>\n<p class=\"has-text-align-center\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/vDjMY1eLCh4XkwaRyFA3dMbVNNOUgB-2jPfqg09wrlWKZ1tcdPD-MT8KB8UH9kM08s7qlTyJcfXcJ_6plf5UGSuK6JzLeG9phHbklIbpAfQrAjAcYYfC1jfLlmGcL_Jhn84VNCSySANVLEFMJdjuU4s\" width=\"624\" height=\"453\"\/><\/p>\n<h3>Monitors<\/h3>\n<p>Dashboards are great, but short of a 24\/7 operations team, you won\u2019t be looking at the dashboard at all times. 
You need to know when something requires your attention.\u00a0Datadog monitors to the rescue.\u00a0Datadog allows you to alert on a wide range of conditions.\u00a0You can create monitors for metrics breaching thresholds, for anomalies and outliers, for host status, for APM, and many more.<\/p>\n<p>Here\u2019s a monitor I set up for our AWS Managed Apache Airflow to alarm if we have too many failed tasks (as you can see, the 22nd was a little bumpy):<\/p>\n<p class=\"has-text-align-center\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/z_UYHAjsLazi0qY_G2mq9IL8AIvbsaZTiy9JO2agERiLKKbaQzrQt3Bptidnw18OLODDEia5_qFunrvDuMqxCY_jg7My3stR7HGfloPMweHh3Y3t8XTodBEr_sBaEJKVMvigL0_1gYeC9DCV2Z0GyHc\" width=\"624\" height=\"380\"\/><\/p>\n<p>With a multitude of monitors in place, we can go about our regular business confident that we will know of any issues as soon as they come up.\u00a0Currently, we have alarms going to the alerts Slack channel for the respective team.\u00a0In the future, we may implement an on-call rotation with a tool like PagerDuty.\u00a0However, up to this point, our systems have been stable enough, and the teams responsive enough, that something formal hasn\u2019t been necessary.<\/p>\n<h2>Up Next<\/h2>\n<p>We\u2019re getting a lot out of Datadog, but we\u2019ve only scratched the surface of what\u2019s possible with this powerful platform.\u00a0Here are a few areas we haven\u2019t yet implemented but may explore with time.<\/p>\n<h3>DB Monitoring<\/h3>\n<p>High on our list of to-dos is Datadog\u2019s enhanced Database monitoring.\u00a0This will help us understand query bottlenecks in slow request traces.\u00a0Datadog presents SQL explain plans for queries directly in the UI, so we won\u2019t need to open a SQL terminal and recreate the query.<\/p>\n<p>However, as enticing as this feature is, implementing it won\u2019t be as trivial as other Datadog features.\u00a0To turn database monitoring on for Postgres RDS, you 
need to run some commands as the database admin and, more importantly, reboot the DB instance, necessitating an outage.\u00a0We need to schedule downtime for this and haven\u2019t pulled the trigger yet.<\/p>\n<h3>Android Replay Feature<\/h3>\n<p>We will implement the Android replay feature, which provides a visual retrospective of the user session, specifically what the user did before a crash or a particular moment.<\/p>\n<h3>Log Aggregation<\/h3>\n<p>Currently, we use Papertrail for log management.\u00a0Datadog has log aggregation and monitoring, but we haven\u2019t felt any urgency to migrate over.\u00a0We will explore this in the future.<\/p>\n<h3>DORA Metrics<\/h3>\n<p>Datadog\u2019s beta support for DevOps Research and Assessment (DORA) metrics is equally exciting.\u00a0DORA metrics are intended to tell teams whether they are performing DevOps at an elite level.\u00a0We plan to help our teams continuously improve using DORA metrics and have sought a way to gather and present these stats. With Datadog\u2019s DORA Metrics feature, we hope we\u2019ve found the solution for both challenges.\u00a0Stay tuned for a more in-depth blog post about our adventures with DORA metrics.<\/p>\n<h2>Looking Ahead<\/h2>\n<p>Our journey with Datadog has underscored the immense value robust instrumentation offers for both day-to-day agility and long-term resilience. Going from blindly firefighting production issues to proactive anomaly detection and informed root cause analysis unblocks teams and delights customers.<\/p>\n<p>As capabilities continue evolving, we are eager to implement database monitors for granular query analysis, leverage mobile replay to reconstruct crashes, and potentially migrate log streams. Each innovation promises further gains. We are proud of the visibility our teams now wield to increase development velocity. 
But perhaps more importantly, we are confident that with comprehensive observability, our systems will gracefully scale and withstand inevitable turbulence ahead.<\/p>\n<div class=\"saboxplugin-wrap\" itemtype=\"http:\/\/schema.org\/Person\" itemscope=\"\" itemprop=\"author\">\n<div class=\"saboxplugin-tab\">\n<div class=\"saboxplugin-gravatar\"><img loading=\"lazy\" decoding=\"async\" width=\"100\" height=\"100\" alt=\"Ron White AdAction\" itemprop=\"image\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/01\/RonWhite.jpeg\" class=\"lazyload\" bad-src=\"data:image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"\/><noscript><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.adaction.com\/wp-content\/uploads\/2024\/01\/RonWhite.jpeg\" width=\"100\" height=\"100\" alt=\"Ron White AdAction\" itemprop=\"image\"\/><\/noscript><\/div>\n<div class=\"saboxplugin-desc\">\n<div itemprop=\"description\">\n<p>I am a Software Architect at AdAction. Currently, I\u2019m primarily supporting the Data Team in its effort to build a modern data pipeline. I love problem-solving and consider myself a true polyglot. I have over 25 years of experience wrangling software and almost 10 wrangling children.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div><\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/www.adaction.com\/blog\/enhancing-devops-with-datadog\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A key objective of building a DevOps culture is delivering software quickly. However, you need to see where you\u2019re going and where you are at to ensure you\u2019re not rapidly heading into a brick wall. That\u2019s why a core tenet of DevOps is measurement. 
You need to observe your environment to know you\u2019re not building [&hellip;]<\/p>\n","protected":false},"author":16,"featured_media":15549,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[122],"tags":[],"class_list":["post-15548","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data"],"_links":{"self":[{"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/posts\/15548","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/comments?post=15548"}],"version-history":[{"count":0,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/posts\/15548\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/media\/15549"}],"wp:attachment":[{"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/media?parent=15548"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/categories?post=15548"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/scannn.com\/lv\/wp-json\/wp\/v2\/tags?post=15548"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}