
Call-Chain (Tracing) Monitoring in the AWS China Cloud (EC2 Edition)

Problem

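A Spring Boot project deployed in the AWS China cloud needs to be performance-tested. Performance testing requires monitoring the call chain between services, so AWS's call-chain (tracing) monitoring has to be deployed to observe how requests flow through the services.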

Amazon X-Ray vs AWS CloudWatch Application Signals

The documentation of the AWS X-Ray SDK for Java contains the following notice:

End-of-support notice – On February 25th, 2027, Amazon X-Ray will discontinue support for Amazon X-Ray SDKs and daemon. After February 25th, 2027, you will no longer receive updates or releases. For more information on the support timeline, see X-Ray SDK and daemon end of support timeline. We recommend to migrate to OpenTelemetry. For more information on migrating to OpenTelemetry, see Migrating from X-Ray instrumentation to OpenTelemetry instrumentation .

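In short, the X-Ray SDKs and daemon will no longer be maintained after February 25, 2027, so monitoring should move to CloudWatch Application Signals. Application Signals is built on OpenTelemetry, so the steps below are largely the same as a regular OpenTelemetry setup.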

Integrating AWS CloudWatch Application Signals

Step 1: Enable Application Signals in your account

I had already completed this step, so I skip it here.

Step 2: Download and start the CloudWatch agent

# Install the CloudWatch agent on the EC2 instance
sudo yum -y install amazon-cloudwatch-agent
# Temporarily relax permissions on the agent's config directory (restored further below)
sudo chmod 757 /opt/aws/amazon-cloudwatch-agent/etc/
# The "collectd" section of the config needs collectd's types.db; create an empty one if collectd is not installed
sudo mkdir -p /usr/share/collectd
sudo touch /usr/share/collectd/types.db
# Generate the new CloudWatch agent config. Note: ${APPLICATION_NAME} and ${DEPLOYMENT_GROUP_NAME} are environment variables that must already be set
sudo tee ~/amazon-cloudwatch-agent.json > /dev/null <<EOF
{
    "traces": {
        "traces_collected": {
          "application_signals": {}
        }
    },
    "logs": {
        "metrics_collected": {
          "application_signals": {}
        },
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/mtgcms-${APPLICATION_NAME}/log/${APPLICATION_NAME}.log",
                        "log_group_name": "${DEPLOYMENT_GROUP_NAME}-${APPLICATION_NAME}-spring-logging",
                        "log_stream_name": "${DEPLOYMENT_GROUP_NAME}-${APPLICATION_NAME}-spring-logging-{instance_id}.log",
                        "timestamp_format": "%Y-%m-%d %H:%M:%S.%f",
                        "multi_line_start_pattern": "{timestamp_format}"
                    }
                ]
            }
        }
    },
    "agent": {
        "metrics_collection_interval": 60,
        "run_as_user": "root"
    },
    "metrics": {
        "aggregation_dimensions": [
            [
                "InstanceId"
            ]
        ],
        "append_dimensions": {
            "AutoScalingGroupName": "\${aws:AutoScalingGroupName}",
            "ImageId": "\${aws:ImageId}",
            "InstanceId": "\${aws:InstanceId}",
            "InstanceType": "\${aws:InstanceType}"
        },
        "metrics_collected": {
            "collectd": {
                "metrics_aggregation_interval": 60
            },
            "cpu": {
                "measurement": [
                    "cpu_usage_idle",
                    "cpu_usage_iowait",
                    "cpu_usage_user",
                    "cpu_usage_system"
                ],
                "metrics_collection_interval": 60,
                "resources": [
                    "*"
                ],
                "totalcpu": false
            },
            "disk": {
                "measurement": [
                    "used_percent",
                    "inodes_free"
                ],
                "metrics_collection_interval": 60,
                "resources": [
                    "*"
                ]
            },
            "diskio": {
                "measurement": [
                    "io_time"
                ],
                "metrics_collection_interval": 60,
                "resources": [
                    "*"
                ]
            },
            "mem": {
                "measurement": [
                    "mem_used_percent"
                ],
                "metrics_collection_interval": 60
            },
            "statsd": {
                "metrics_aggregation_interval": 60,
                "metrics_collection_interval": 60,
                "service_address": ":8125"
            },
            "swap": {
                "measurement": [
                    "swap_used_percent"
                ],
                "metrics_collection_interval": 60
            }
        }
    }
}
EOF
echo "Installing the amazon-cloudwatch-agent config file"
sudo mv -f ~/amazon-cloudwatch-agent.json /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
# Restore the permissions of the CloudWatch config directory
sudo chmod 755 /opt/aws/amazon-cloudwatch-agent/etc/
sudo chown root:root /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
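# Reload systemd units, enable the CloudWatch agent service at boot, and (re)start it so it loads the new config
sudo systemctl daemon-reload
sudo systemctl enable amazon-cloudwatch-agent
sudo systemctl restart amazon-cloudwatch-agent
# Optional check: sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a status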
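
The generated config mixes two kinds of ${...}. Because the heredoc delimiter EOF is unquoted, the shell substitutes ${APPLICATION_NAME} and ${DEPLOYMENT_GROUP_NAME} when the script runs. The backslash-escaped \${aws:InstanceId} placeholders are written literally, so the CloudWatch agent can resolve them itself. The following is a minimal sketch that reproduces only this behaviour on a few lines of the config. The function name render_snippet and the values admin and dev are hypothetical. The ${VAR:?message} guards make the script stop early if a variable was not set beforehand.

```shell
#!/bin/sh
# Sketch: how the unquoted heredoc in Step 2 treats ${...}.
# render_snippet and the values "admin"/"dev" are hypothetical examples.
set -eu

render_snippet() {
  # Abort with a message if a required variable is unset or empty.
  : "${APPLICATION_NAME:?APPLICATION_NAME must be set}"
  : "${DEPLOYMENT_GROUP_NAME:?DEPLOYMENT_GROUP_NAME must be set}"
  # Unquoted EOF: the shell expands ${APPLICATION_NAME}; \${aws:InstanceId} stays literal.
  cat <<EOF
"file_path": "/var/mtgcms-${APPLICATION_NAME}/log/${APPLICATION_NAME}.log",
"log_group_name": "${DEPLOYMENT_GROUP_NAME}-${APPLICATION_NAME}-spring-logging",
"InstanceId": "\${aws:InstanceId}"
EOF
}

APPLICATION_NAME="admin"
DEPLOYMENT_GROUP_NAME="dev"
render_snippet
```

With these values, the first two lines come out as /var/mtgcms-admin/log/admin.log and dev-admin-spring-logging, while the InstanceId line keeps ${aws:InstanceId} for the agent.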

EC2 instance role permissions

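The IAM role attached to the EC2 instance (through its instance profile) must have the AWS managed policy CloudWatchAgentServerPolicy attached.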
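
The policy can also be attached with the AWS CLI. The following is a minimal sketch under these assumptions:
- The role is named my-ec2-app-role. This is a hypothetical placeholder; use the role from your instance profile.
- The caller is allowed to call iam:AttachRolePolicy.
- AWS China regions use the aws-cn partition, so the managed-policy ARN starts with arn:aws-cn rather than arn:aws.
- The DRY_RUN switch is a convention introduced only for this sketch. By default it prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: attach CloudWatchAgentServerPolicy to the EC2 instance role in an AWS China region.
# ROLE_NAME defaults to a hypothetical placeholder; DRY_RUN=1 (the default) only prints the command.
set -eu

ROLE_NAME="${ROLE_NAME:-my-ec2-app-role}"
# China regions use the "aws-cn" partition, so AWS managed policy ARNs start with arn:aws-cn.
POLICY_ARN="arn:aws-cn:iam::aws:policy/CloudWatchAgentServerPolicy"

attach_policy() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "aws iam attach-role-policy --role-name $ROLE_NAME --policy-arn $POLICY_ARN"
  else
    aws iam attach-role-policy --role-name "$ROLE_NAME" --policy-arn "$POLICY_ARN"
  fi
}

attach_policy
```

Run it with ROLE_NAME=<your role> DRY_RUN=0 to actually call the CLI.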

Step 3: Configure aws-otel-java-instrumentation

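Open the open-source project aws-otel-java-instrumentation and download its Java agent jar from the latest release: https://github.com/aws-observability/aws-otel-java-instrumentation/releases/latest/download/aws-opentelemetry-agent.jar
After downloading the jar, set the JAVA_OPTS environment variable of the Spring Boot application along these lines: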

export JAVA_OPTS="-javaagent:/var/aws-opentelemetry-agent.jar -Dotel.resource.attributes=service.name={service-name},deployment.environment.name={environment-name}"

For example:

JAVA_OPTS="-javaagent:/var/aws-opentelemetry-agent.jar -Dotel.resource.attributes=service.name=admin,deployment.environment.name=dev -Dspring.profiles.active=dev -server -Xms16g -Xmx16g -XX:MaxGCPauseMillis=500 -XX:+UseParallelGC"
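
If several services share one start script, JAVA_OPTS can be assembled from a service name and an environment name. The following is a minimal sketch with these assumptions:
- The helper build_java_opts and the DOWNLOAD_AGENT switch are hypothetical.
- The jar path defaults to /var/aws-opentelemetry-agent.jar, as in the example above.
- The download runs only on request.

Keep in mind that the java launcher does not read JAVA_OPTS on its own. The start script has to pass the variable through.

```shell
#!/bin/sh
# Sketch: build JAVA_OPTS for the ADOT Java agent from a service name and an environment name.
# build_java_opts and DOWNLOAD_AGENT are hypothetical names introduced for this example.
set -eu

AGENT_JAR="${AGENT_JAR:-/var/aws-opentelemetry-agent.jar}"
AGENT_URL="https://github.com/aws-observability/aws-otel-java-instrumentation/releases/latest/download/aws-opentelemetry-agent.jar"

# Optional: download the agent only when DOWNLOAD_AGENT=1 (needs curl and write access to AGENT_JAR's directory).
if [ "${DOWNLOAD_AGENT:-0}" = "1" ]; then
  curl -fL -o "$AGENT_JAR" "$AGENT_URL"
fi

# $1 = service.name, $2 = deployment.environment.name
build_java_opts() {
  printf '%s' "-javaagent:${AGENT_JAR} -Dotel.resource.attributes=service.name=$1,deployment.environment.name=$2"
}

JAVA_OPTS="$(build_java_opts admin dev) -Dspring.profiles.active=dev"
export JAVA_OPTS
# The java launcher itself ignores JAVA_OPTS; the start script must pass it on,
# e.g. java $JAVA_OPTS -jar app.jar (app.jar is a placeholder).
printf '%s\n' "$JAVA_OPTS"
```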

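The following environment variables are also required (written here as Environment= directives of a systemd service unit):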

Environment="OTEL_AWS_APPLICATION_SIGNALS_ENABLED=true"
Environment="OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf"
Environment="OTEL_AWS_APPLICATION_SIGNALS_EXPORTER_ENDPOINT=http://localhost:4316/v1/metrics"
Environment="OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4316/v1/traces"
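
When the application runs as a systemd service, these directives belong in the [Service] section of its unit. One option is a drop-in file. The following is a minimal sketch that prints such a drop-in, with these assumptions:
- The function render_otel_dropin, the unit name myapp.service, and the drop-in path are hypothetical.
- The endpoint base defaults to http://localhost:4316, the same port as the lines above.

```shell
#!/bin/sh
# Sketch: print a systemd drop-in with the Application Signals variables listed above.
# Hypothetical usage after sourcing this file ("myapp" is a placeholder unit name):
#   sudo mkdir -p /etc/systemd/system/myapp.service.d
#   render_otel_dropin | sudo tee /etc/systemd/system/myapp.service.d/otel.conf >/dev/null
#   sudo systemctl daemon-reload && sudo systemctl restart myapp
set -eu

# Base URL of the CloudWatch agent's OTLP endpoint; 4316 is the port used above.
OTLP_BASE="${OTLP_BASE:-http://localhost:4316}"

render_otel_dropin() {
  cat <<EOF
[Service]
Environment="OTEL_AWS_APPLICATION_SIGNALS_ENABLED=true"
Environment="OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf"
Environment="OTEL_AWS_APPLICATION_SIGNALS_EXPORTER_ENDPOINT=${OTLP_BASE}/v1/metrics"
Environment="OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=${OTLP_BASE}/v1/traces"
EOF
}

render_otel_dropin
```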

Then start the Spring Boot application.

Result

https://i-blog.csdnimg.cn/direct/eebc8fdad4764671a16f3540561ca057.png

References