open-tracing
Open-Tracing
现代微服务架构正在逐渐普及。面对真正高并发的生产系统,解耦成大量微服务后,以前容易实现的重点任务变得不容易实现了:用户体验优化、后台真实错误原因分析、系统内各组件的调用情况等。分布式跟踪系统(Zipkin、Dapper、HTrace、X-Trace等)可以解决这个问题,但是这些系统使用不兼容的API,难以整合到一起。
OpenTracing提供平台无关、厂商无关的API,让开发人员可以方便地添加、更换追踪系统。
相当于是在做标准化,类似日志中的SLF4j,目前还在发展中。
Trace概念
1、Trace(追踪):
在广义上,一个trace代表了一个事务或者流程在(分布式)系统中的执行过程。在OpenTracing标准中,trace是多个span组成的一个有向无环图(DAG),每一个span代表trace中被命名并计时的连续性的执行片段。
2、Span(跨度):一个span代表系统中具有开始时间和执行时长的逻辑运行单元。span之间通过嵌套或者顺序排列建立逻辑因果关系。
TraceId作用
- 串起来一次请求
request-id
1
2
3
4{
"RequestId": "4C467B38-3910-447D-87BC-AC049166F216"
/* 返回结果数据 */
}第三方有问题反馈时,可以拿着这个id作为凭证,就省去了很多沟通的问题
1
2
3
4
5
6
7
8[qisheng.li@YD-app-api-01 logs]$ curl -sI 'http://api2.yaduo.com/atourlife/duomicang/queryDuoMiCangTabOtherData?appVer=3.6.0&channelId=10005&platType=1&token=7254035f0e3e4d05bc7af3afb54f313e&deviceId=73519b32-c539-3c18-af4c-ce4523938bb9&activitySource=ydaandroid&activeId=&inactiveId='
HTTP/1.1 200
Date: Fri, 12 Mar 2021 06:19:56 GMT
Content-Type: application/json;charset=UTF-8
Content-Length: 2477
Connection: keep-alive
Set-Cookie: acw_tc=2760829916155299964998880ec4036c629fa0b9319095cdd9fffc150bc930;path=/;HttpOnly;Max-Age=1800
ZIPKIN-TRACE-ID: f39f5791988ff5b2elk关联日志
幂等
OpenZipkin
Brave
Brave is a distributed tracing instrumentation library.
Brave’s dependency-free tracer library works against JRE6+.
可以简单理解为标准的实现(类比logback和log4j)
Trace上下文传递
1 | Client Tracer Server Tracer |
http请求
1 | 2021-03-12 01:29:19.624 INFO [order-center,f211feedd7b9904e,9c4b9442005296fb,true] --- [o-9301-exec-131] http.request.response.log : |
header中的x-b3
开头的会自动传递下去
采样:
1 | Server Tracer |
zipkin
上报
上报方式
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
Tracing tracing(@Value("${spring.application.name}") String serviceName, @Value("${spring.zipkin.base-url:}") String zipkinServer) {
Reporter reporter = Reporter.NOOP;
if (StringUtils.isNotBlank(zipkinServer)) {
reporter = AsyncReporter.builder(OkHttpSender.create(zipkinServer))
.queuedMaxSpans(1000) // historical constraint. Note: AsyncReporter supports memory bounds
.messageTimeout(1, TimeUnit.SECONDS)
.metrics(ReporterMetrics.NOOP_METRICS)
.build(SpanBytesEncoder.JSON_V2);
}
final SamplerProperties samplerProperties = new SamplerProperties();
// 默认全采样
samplerProperties.setProbability(1);
return Tracing.newBuilder()
.sampler(new ProbabilityBasedSampler(samplerProperties))
.localServiceName(serviceName)
.propagationFactory(ExtraFieldPropagation.newFactory(B3Propagation.FACTORY, "user-name"))
.currentTraceContext(Slf4jCurrentTraceContext.create(ThreadLocalCurrentTraceContext.newBuilder()
.build()))
.spanReporter(reporter)
.build();
}
- 采样
- Reporter
- eureka支持
- 挂掉影响
- Zipkin 展示端
- zipkin存储
- mysql
- 玩具
- elastic-search
- 调优
- translog
- Refresh_interval
- _id
- 保留几天
- 定时删除脚本
- elastic-search的template
- 调优
系统接入
Spring-Cloud
Sleuth
Sleuth configures everything you need to get started. This includes where trace data (spans) are reported to, how many traces to keep (sampling), if remote fields (baggage) are sent, and which libraries are traced.
Spring Cloud Sleuth integrates with the OpenZipkin Brave tracer via the bridge that is available in the
spring-cloud-sleuth-brave
module.
baggage
Request级别的日志debug开关
- @NewSpan
- @SpanTag
- @ContinueSpan
相关代码位置:
1 | // org.springframework.cloud.sleuth.instrument.web.TraceWebServletAutoConfiguration |
埋点增强
db
线程池
Brave:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36// brave.propagation.CurrentTraceContext
/**
* Decorates the input such that the {@link #get() current trace context} at the time a task is
* scheduled is made current when the task is executed.
*/
public ExecutorService executorService(ExecutorService delegate) {
class CurrentTraceContextExecutorService extends brave.internal.WrappingExecutorService {
protected ExecutorService delegate() {
return delegate;
}
protected <C> Callable<C> wrap(Callable<C> task) {
return CurrentTraceContext.this.wrap(task);
}
protected Runnable wrap(Runnable task) {
return CurrentTraceContext.this.wrap(task);
}
}
return new CurrentTraceContextExecutorService();
}
/** Wraps the input so that it executes with the same context as now. */
public <C> Callable<C> wrap(Callable<C> task) {
final TraceContext invocationContext = get();
class CurrentTraceContextCallable implements Callable<C> {
public C call() throws Exception {
try (Scope scope = maybeScope(invocationContext)) {
return task.call();
}
}
}
return new CurrentTraceContextCallable();
}sleuth:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16// org.springframework.cloud.sleuth.instrument.async.LazyTraceExecutor
public void execute(Runnable command) {
if (this.tracing == null) {
try {
this.tracing = this.beanFactory.getBean(Tracing.class);
}
catch (NoSuchBeanDefinitionException e) {
this.delegate.execute(command);
return;
}
}
this.delegate.execute(new TraceRunnable(this.tracing, spanNamer(), command));
}
// org.springframework.cloud.sleuth.instrument.async.ExecutorBeanPostProcessor 代理逻辑
bolt
feign
rocketmq
问题排查
参考
- Zikin-server运维 - Atour Wiki
- What is Distributed Tracing?
- Spring Cloud Sleuth 2.0概要使用说明 - BTStream’s Blog
- GitHub - spring-cloud/spring-cloud-sleuth: Distributed tracing for spring cloud
- OpenTracing基本原理 - 知乎
- openTracing文档中文版
- GitHub - openzipkin/brave: Java distributed tracing implementation compatible with Zipkin backend services.
- Sleuth-debug-flag - Atour Wiki
- Introducing to Zipkin - Distribution Tracing - ITZone
- OpenZipkin · A distributed tracing system
- GitHub - openzipkin/b3-propagation: Repository that describes and sometimes implements B3 propagation
- 干货 | Qunar全链路跟踪及Debug
- zipkin-Kibana
- elk-Discover - Kibana&_a=(columns:!(_source),index:’01f5dec0-e772-11ea-9d81-e1017b1b6645’,interval:auto,query:(language:lucene,query:ee3ceac468425f6e),sort:!(‘@timestamp’,desc)))