KL's blog

Performance overhead caused by the parseHost bug in Tomcat 8.x

2023-03-19T13:39:47.000Z

Symptoms

Deoptimization events captured with JFR showed a large number of JIT deoptimizations related to parseHost.

The JIT deoptimization log shows the same thing:

<uncommon_trap thread='6154' reason='range_check' action='make_not_entrant' debug_id='0' compile_id='67267' compiler='c2' level='4' stamp='390.396'>
<jvms bci='4' method='org.apache.tomcat.util.http.parser.HttpParser isNumeric (I)Z' bytes='9' count='23695' iicount='23695'/>
</uncommon_trap>
<make_not_entrant thread='6154' compile_id='67267' compiler='c2' level='4' stamp='390.396'/>
67267   !   4       org.apache.tomcat.util.http.parser.HttpParser::isNumeric (9 bytes)   made not entrant
<writer thread='5831'/>

<uncommon_trap thread='5848' reason='range_check' action='none' debug_id='0' compile_id='117864' compiler='c2' level='4' count='16' state='range_check recompiled' stamp='605.407'>
<jvms bci='4' method='org.apache.tomcat.util.http.parser.HttpParser isAlpha (I)Z' bytes='9' count='8591' iicount='8591' range_check_traps='16'/>
<jvms bci='1' method='org.apache.tomcat.util.http.parser.HttpParser$DomainParseState next (I)Lorg/apache/tomcat/util/http/parser/HttpParser$DomainParseState;' bytes='249' count='5394' iicount='5394'/>
</uncommon_trap>

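In the production flame graph, parseHost takes a noticeable share: 1.29 (572 samples).
Nothing similar showed up during load testing.

Investigation

What kinds of Host values are there?

Looking at the Host values in production with arthas, there are two main forms: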

xx.xx.com
10.10.10.10:2279

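The first is a domain name, sent by requests proxied through nginx. The second is ip:port, used mainly by health-check keepalive probes.

With the first form, Tomcat 8.x throws an ArrayIndexOutOfBoundsException while parsing, and constructing the exception costs a fair amount of CPU.

With the second form, parsing completes normally.

Finding the problematic char

The code at fault is shown below. It uses static arrays covering the range 0 to 127 that record whether each character is a letter or a digit.

In the failing case, c falls outside the index range, which causes an ArrayIndexOutOfBoundsException.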

public static boolean isAlpha(int c) {
  // Fast for valid alpha characters, slower for some incorrect
  // ones
  try {
    return IS_ALPHA[c];
  } catch (ArrayIndexOutOfBoundsException ex) {
    return false;
  }
}


public static boolean isNumeric(int c) {
  // Fast for valid numeric characters, slower for some incorrect
  // ones
  try {
    return IS_NUMERIC[c];
  } catch (ArrayIndexOutOfBoundsException ex) {
    return false;
  }
}

The exception path has to fill in a stack trace, so in theory it should be slow. Use monitor to see the average cost, then use watch to find the slow calls:

[arthas@28]$ watch org.apache.tomcat.util.http.parser.HttpParser isAlpha 'params[0]'  '#cost>0.05' -n 5
method=org.apache.tomcat.util.http.parser.HttpParser.isAlpha location=AtExit
ts=2023-01-09 14:33:38; [cost=0.055174ms] result=@Integer[97]
method=org.apache.tomcat.util.http.parser.HttpParser.isAlpha location=AtExit
ts=2023-01-09 14:33:38; [cost=0.087424ms] result=@Integer[98]
method=org.apache.tomcat.util.http.parser.HttpParser.isAlpha location=AtExit
ts=2023-01-09 14:33:38; [cost=0.117283ms] result=@Integer[-1]
method=org.apache.tomcat.util.http.parser.HttpParser.isAlpha location=AtExit
ts=2023-01-09 14:33:38; [cost=0.105648ms] result=@Integer[-1]
method=org.apache.tomcat.util.http.parser.HttpParser.isAlpha location=AtExit
ts=2023-01-09 14:33:39; [cost=0.060458ms] result=@Integer[-1]
Command execution times exceed limit: 5, so command will exit. You can set it with -n option

The output contains the abnormal value -1, and -1 causes the ArrayIndexOutOfBoundsException.
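To make the failure mode concrete, here is a minimal sketch (not Tomcat's code) of a 128-entry lookup table like the one described above, written once in the catch-the-exception style and once with an explicit range check. The class name `AlphaTable` and both method names are hypothetical.

```java
final class AlphaTable {
    // 128-entry table covering 0..127, true for ASCII letters.
    private static final boolean[] IS_ALPHA = new boolean[128];
    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Same shape as the code quoted above: out-of-range input is
    // handled by catching the exception raised by the array access.
    static boolean isAlphaCatching(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Alternative: reject out-of-range input up front, so no exception
    // object is ever created for values such as -1.
    static boolean isAlphaChecked(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        System.out.println(isAlphaCatching('a'));
        System.out.println(isAlphaCatching(-1));
        System.out.println(isAlphaChecked(-1));
    }
}
```

Both variants give the same answer for -1. The difference is only in how they get there: the first one pays for an exception on every out-of-range value.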

Where the problematic char comes from

Next, trace where the -1 comes from:

[arthas@28]$ stack  org.apache.tomcat.util.http.parser.HttpParser isNumeric  'params[0] < 0' -n 5
ts=2023-01-09 13:19:10;thread_name=http-nio-22794-exec-77;id=1a93;is_daemon=true;priority=5;TCCL=java.net.URLClassLoader@7eda2dbb
    @org.apache.tomcat.util.http.parser.HttpParser.isNumeric()
        at org.apache.tomcat.util.http.parser.HttpParser$DomainParseState.next(HttpParser.java:915)
        at org.apache.tomcat.util.http.parser.HttpParser.readHostDomainName(HttpParser.java:842)
        at org.apache.tomcat.util.http.parser.Host.parse(Host.java:95)
        at org.apache.tomcat.util.http.parser.Host.parse(Host.java:95)
        at org.apache.coyote.AbstractProcessor.parseHost(AbstractProcessor.java:292)
        at org.apache.coyote.http11.Http11Processor.prepareRequest(Http11Processor.java:1203)
        at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:776)
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:806)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498)
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:833)

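The call comes from readHostDomainName, so look at its implementation.

readHostDomainName returns the index of the ":" that separates ip and port.

The parsing logic keeps reading characters from the reader over the Host value and feeds them to a state machine, which moves between states according to each character.

If the value is a domain name, reading continues until the stream ends. At end of stream, read returns -1, and that -1 goes down the exception path.
If the value is ip:port, parsing finishes normally before the stream ends and never reaches the exception path.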
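The difference between the two Host forms can be reproduced with a plain `StringReader`. This is only an illustrative sketch: `HostReadDemo` and `hitsEndOfStream` are hypothetical names, and the loop is a simplification of the state machine, not Tomcat's parser.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class HostReadDemo {
    // Returns true if read() produced -1 (end of stream) before a ':' was seen.
    static boolean hitsEndOfStream(String host) {
        try (Reader reader = new StringReader(host)) {
            int c;
            while ((c = reader.read()) != ':') {
                if (c == -1) {
                    return true; // whole value consumed without finding a colon
                }
            }
            return false; // stopped at the colon, so -1 was never produced
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hitsEndOfStream("xx.xx.com"));
        System.out.println(hitsEndOfStream("10.10.10.10:2279"));
    }
}
```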

            static int readHostDomainName(Reader reader) throws IOException {
/*838*/         DomainParseState state = DomainParseState.NEW;
/*839*/         int pos = 0;
              // State machine loop: keep reading chars from the reader and feed them to the state machine
                while (state.mayContinue()) {
                   // What happens when the stream ends and read() returns -1?
/*842*/             state = state.next(reader.read());
/*843*/             ++pos;
                }
              // Found the ip/port separator (":")
/*846*/         if (DomainParseState.COLON == state) {
/*848*/             return pos - 1;
                }
              // Not found, return -1
/*850*/         return -1;
            }

Now the states of the state machine:

// State machine implementation
private static enum DomainParseState {
  NEW(true, false, false, false, " at the start of"),
  ALPHA(true, true, true, true, " after a letter in"),
  NUMERIC(true, true, true, true, " after a number in"),
  PERIOD(true, false, false, true, " after a period in"),
  HYPHEN(true, true, false, false, " after a hypen in"),
  // First exit case: the ip:port separator COLON (":") is read and parsing exits normally
  COLON(false, false, false, false, " after a colon in"),
  // Second exit case: the stream ends without ever finding a colon (the domain-name case)
  END(false, false, false, false, " at the end of");

  private final boolean mayContinue;
  private final boolean allowsHyphen;
  private final boolean allowsPeriod;
  private final boolean allowsEnd;
  private final String errorLocation;

  private DomainParseState(boolean mayContinue, boolean allowsHyphen, boolean allowsPeriod, boolean allowsEnd, String errorLocation) {
    this.mayContinue = mayContinue;
    this.allowsHyphen = allowsHyphen;
    this.allowsPeriod = allowsPeriod;
    this.allowsEnd = allowsEnd;
    this.errorLocation = errorLocation;
  }
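  // The exception happens here. c was read from the reader above.
  // At end of stream c is -1. That -1 goes straight into isAlpha or isNumeric, which return false but throw an ArrayIndexOutOfBoundsException internally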
  public DomainParseState next(int c) {
    if (HttpParser.isAlpha(c)) {
      return ALPHA;
    }
    if (HttpParser.isNumeric(c)) {
      return NUMERIC;
    }
    if (c == 46) {
      if (this.allowsPeriod) {
        return PERIOD;
      }
      throw new IllegalArgumentException(sm.getString("http.invalidCharacterDomain", new Object[]{Character.toString((char)c), this.errorLocation}));
    }
    if (c == 58) {
      if (this.allowsEnd) {
        return COLON;
      }
      throw new IllegalArgumentException(sm.getString("http.invalidCharacterDomain", new Object[]{Character.toString((char)c), this.errorLocation}));
    }
    // Note this branch: the end-of-stream marker
    if (c == -1) {
      // The ALPHA and NUMERIC states have allowsEnd set (see the state declarations above), so END is returned directly
      if (this.allowsEnd) {
        return END;
      }
      throw new IllegalArgumentException(sm.getString("http.invalidSegmentEndState", new Object[]{this.name()}));
    }
    if (c == 45) {
      if (this.allowsHyphen) {
        return HYPHEN;
      }
      throw new IllegalArgumentException(sm.getString("http.invalidCharacterDomain", new Object[]{Character.toString((char)c), this.errorLocation}));
    }
    throw new IllegalArgumentException(sm.getString("http.illegalCharacterDomain", new Object[]{Character.toString((char)c)}));
  }

  public boolean mayContinue() {
    return this.mayContinue;
  }
}

Fix

Tomcat fixed this in 8.5.41. The corresponding release notes:

https://tomcat.apache.org/tomcat-8.5-doc/changelog.html

The corresponding code diff handles the end-of-stream case first, which avoids the exception thrown inside isAlpha.
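The idea of "handle end of stream first" can be sketched as follows. This is not the actual Tomcat patch, only an illustration of the ordering; `EndFirstSketch` and `classify` are hypothetical names, and the returned strings merely mirror the state names quoted earlier.

```java
final class EndFirstSketch {
    private static final boolean[] IS_ALPHA = new boolean[128];
    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Classifies one value returned by Reader.read(): -1 or a char value.
    static String classify(int c) {
        if (c == -1) {
            return "END"; // end of stream is handled before any table lookup
        }
        if (c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c]) {
            return "ALPHA";
        }
        if (c >= '0' && c <= '9') return "NUMERIC";
        if (c == '.') return "PERIOD";
        if (c == ':') return "COLON";
        if (c == '-') return "HYPHEN";
        throw new IllegalArgumentException("illegal character code: " + c);
    }

    public static void main(String[] args) {
        System.out.println(classify('a'));
        System.out.println(classify(-1));
    }
}
```

With this ordering, the -1 produced at the end of a domain-name Host never reaches an array index.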

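Conclusion

Since HTTP/1.1, the Host header field is mandatory.
When nginx forwards a request, it sets Host to the corresponding domain name. Health checks probe individual nodes, so their Host is ip:port.
Older Tomcat versions (< 8.5.41) have a bug when parsing a domain-name Host: isAlpha and isNumeric throw an ArrayIndexOutOfBoundsException internally. The exception has two main costs:
- the CPU cost of filling in the exception stack trace
- the cost of JIT deoptimization (converting compiled frames back to interpreter frames)
The performance loss scales with client request volume: the more requests, the more visible it is.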

Tomcat application deployment process (1)

2022-10-23T09:07:04.000Z

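The previous two posts covered how Tomcat's core components start up. But how and when applications get deployed is still unexplained. This post walks through the application deployment process. The quickest way to untangle the call relationships is a flame graph.

The flame graph shows clearly that the Spring application starts inside HostConfig#deployDirectory. So what exactly is HostConfig, and why did it not show up during the startup walkthrough?

Source code

HostConfig

Where does it come from?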

public class HostConfig implements LifecycleListener {
 
  // omitted
}

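HostConfig implements LifecycleListener. From the earlier analysis we know that all listeners are registered in LifecycleBase. Run in debug mode with a breakpoint on addLifecycleListener and the call chain is easy to find.

HostConfig is created while the Digester parses StandardHost. The default server.xml does not declare a HostConfig, but following the trail leads to the call site in the code: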

// org.apache.catalina.startup.Catalina#createStartDigester
digester.addRuleSet(new HostRuleSet("Server/Service/Engine/"));

// org.apache.catalina.startup.HostRuleSet#addRuleInstances
digester.addRule(prefix + "Host",
                 new LifecycleListenerRule
                 ("org.apache.catalina.startup.HostConfig",
                  "hostConfigClass"));

What does it do?

First, the start method:

// org.apache.catalina.startup.HostConfig#start
 /**
     * Process a "start" event for this Host.
     */
public void start() {

  if (log.isDebugEnabled())
    log.debug(sm.getString("hostConfig.start"));

  try {
    ObjectName hostON = host.getObjectName();
    oname = new ObjectName
      (hostON.getDomain() + ":type=Deployer,host=" + host.getName());
    // Register the deployer
    Registry.getRegistry(null, null).registerComponent
      (this, oname, this.getClass().getName());
  } catch (Exception e) {
    log.error(sm.getString("hostConfig.jmx.register", oname), e);
  }

  if (!host.getAppBaseFile().isDirectory()) {
    log.error(sm.getString("hostConfig.appBase", host.getName(),
                           host.getAppBaseFile().getPath()));
    host.setDeployOnStartup(false);
    host.setAutoDeploy(false);
  }

  // Try to deploy the apps once; this switch defaults to true
  if (host.getDeployOnStartup())
    deployApps();

}

Next, what it does in the listener method:

// org.apache.catalina.startup.HostConfig#lifecycleEvent
 /**
     * Process the START event for an associated Host.
     *
     * @param event The lifecycle event that has occurred
     */
    @Override
    public void lifecycleEvent(LifecycleEvent event) {

        // Identify the host we are associated with
        try {
            host = (Host) event.getLifecycle();
          // Copy some configuration over from StandardHost
            if (host instanceof StandardHost) {
                setCopyXML(((StandardHost) host).isCopyXML());
                setDeployXML(((StandardHost) host).isDeployXML());
                setUnpackWARs(((StandardHost) host).isUnpackWARs());
                setContextClass(((StandardHost) host).getContextClass());
            }
        } catch (ClassCastException e) {
            log.error(sm.getString("hostConfig.cce", event.getLifecycle()), e);
            return;
        }

      // This is the work the listener itself provides
        // Process the event that has occurred
        if (event.getType().equals(Lifecycle.PERIODIC_EVENT)) {
            check();
        } else if (event.getType().equals(Lifecycle.BEFORE_START_EVENT)) {
            beforeStart();
        } else if (event.getType().equals(Lifecycle.START_EVENT)) {
            start();
        } else if (event.getType().equals(Lifecycle.STOP_EVENT)) {
            stop();
        }
    }

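Among the lifecycle events, PERIODIC_EVENT is the only one not triggered by a state transition. The others are essentially all triggered by state transitions (see the earlier posts).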
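The split between state-transition events and the periodic event can be sketched with a tiny listener setup. Everything here is hypothetical and heavily simplified (`MiniHost`, `MiniHostConfig`, and the event strings are stand-ins chosen for illustration, not Tomcat classes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class MiniLifecycleDemo {
    interface Listener {
        void lifecycleEvent(String type);
    }

    static final class MiniHost {
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();

        void addLifecycleListener(Listener l) { listeners.add(l); }

        private void fire(String type) {
            for (Listener l : listeners) l.lifecycleEvent(type);
        }

        // State transition: listeners are notified synchronously around the start.
        void start() { fire("before_start"); fire("start"); fire("after_start"); }

        // Not a state transition: meant to be called periodically by a background thread.
        void backgroundProcess() { fire("periodic"); }
    }

    static final class MiniHostConfig implements Listener {
        final List<String> actions = new ArrayList<>();

        @Override
        public void lifecycleEvent(String type) {
            switch (type) {
                case "periodic": actions.add("check"); break;
                case "before_start": actions.add("beforeStart"); break;
                case "start": actions.add("start"); break;
                case "stop": actions.add("stop"); break;
                default: break; // events this listener does not care about
            }
        }
    }

    static List<String> run() {
        MiniHost host = new MiniHost();
        MiniHostConfig config = new MiniHostConfig();
        host.addLifecycleListener(config);
        host.start();
        host.backgroundProcess();
        return config.actions;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```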

PERIODIC_EVENT

Event source

Start with Lifecycle.PERIODIC_EVENT. This event is fired by ContainerBase and is handled on a separate thread.

// org.apache.catalina.core.ContainerBase#backgroundProcess
fireLifecycleEvent(Lifecycle.PERIODIC_EVENT, null);

At the end of startInternal, if backgroundProcessorDelay > 0 (the default is -1), ContainerBase starts a thread that periodically calls backgroundProcess on itself and its child containers. Only StandardEngine changes the default, to 10, so it is the one that owns this background processor:

// org.apache.catalina.core.StandardEngine#StandardEngine
/**
     * Create a new StandardEngine component with the default basic Valve.
     */
public StandardEngine() {

  super();
  pipeline.setBasic(new StandardEngineValve());
  /* Set the jmvRoute using the system property jvmRoute */
  try {
    setJvmRoute(System.getProperty("jvmRoute"));
  } catch(Exception ex) {
    log.warn(sm.getString("standardEngine.jvmRouteFail"));
  }
  // By default, the engine will hold the reloading thread
  // The default value is changed here
  backgroundProcessorDelay = 10;

}

jstack confirms that there is only one such thread:

"ContainerBackgroundProcessor[StandardEngine[Catalina]]" #57 daemon prio=5 os_prio=31 tid=0x0000000118f72000 nid=0x7203 waiting on condition [0x000000017a0ba000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1357)
        at java.lang.Thread.run(Thread.java:748)

Where the thread is started:

// org.apache.catalina.core.ContainerBase#startInternal
// Start our thread
threadStart();

// org.apache.catalina.core.ContainerBase#threadStart
 /**
     * Start the background thread that will periodically check for
     * session timeouts.
     */
    protected void threadStart() {

        if (thread != null)
            return;
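      // Host/Context/Wrapper also extend ContainerBase, but they keep the default of -1, so no thread is created for them
      // StandardEngine changes the default, so it gets this thread, which calls backgroundProcess() on the child containers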
        if (backgroundProcessorDelay <= 0)
            return;

        threadDone = false;
        String threadName = "ContainerBackgroundProcessor[" + toString() + "]";
        thread = new Thread(new ContainerBackgroundProcessor(), threadName);
        thread.setDaemon(true);
        thread.start();

    }

// org.apache.catalina.core.ContainerBase.ContainerBackgroundProcessor
 protected class ContainerBackgroundProcessor implements Runnable {
   @Override
   public void run() {
     Throwable t = null;
     String unexpectedDeathMessage = sm.getString(
       "containerBase.backgroundProcess.unexpectedThreadDeath",
       Thread.currentThread().getName());
     try {
       while (!threadDone) {
         try {
           // Sleep between rounds
           Thread.sleep(backgroundProcessorDelay * 1000L);
         } catch (InterruptedException e) {
           // Ignore
         }
         if (!threadDone) {
           processChildren(ContainerBase.this);
         }
       }
     } catch (RuntimeException|Error e) {
       t = e;
       throw e;
     } finally {
       if (!threadDone) {
         log.error(unexpectedDeathMessage, t);
       }
     }
   }

// org.apache.catalina.core.ContainerBase.ContainerBackgroundProcessor#processChildren
 protected void processChildren(Container container) {
   ClassLoader originalClassLoader = null;

   try {
     if (container instanceof Context) {
       Loader loader = ((Context) container).getLoader();
       // Loader will be null for FailedContext instances
       if (loader == null) {
         return;
       }

       // Ensure background processing for Contexts and Wrappers
       // is performed under the web app's class loader
       originalClassLoader = ((Context) container).bind(false, null);
     }
     // Process the container itself
     container.backgroundProcess();
     Container[] children = container.findChildren();
     for (int i = 0; i < children.length; i++) {
       if (children[i].getBackgroundProcessorDelay() <= 0) {
         // Recurse into child containers
         processChildren(children[i]);
       }
     }
   } catch (Throwable t) {
     ExceptionUtils.handleThrowable(t);
     log.error("Exception invoking periodic operation: ", t);
   } finally {
     if (container instanceof Context) {
       ((Context) container).unbind(false, originalClassLoader);
     }
   }
 }
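The recursion rule in processChildren (a child is visited by the parent's thread only if it has no positive delay of its own) can be isolated in a few lines. This is a simplified sketch without any threads; `BackgroundSketch`, `Node`, and the container names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class BackgroundSketch {
    static final class Node {
        final String name;
        final int delay; // <= 0 means "no own thread, rely on the parent's"
        final List<Node> children = new ArrayList<>();

        Node(String name, int delay) { this.name = name; this.delay = delay; }

        Node add(Node child) { children.add(child); return this; }
    }

    // One round of the parent's background thread.
    static void processChildren(Node node, List<String> visited) {
        visited.add(node.name); // stands in for container.backgroundProcess()
        for (Node child : node.children) {
            if (child.delay <= 0) { // children with their own delay are skipped
                processChildren(child, visited);
            }
        }
    }

    static List<String> tick() {
        Node context = new Node("context", -1).add(new Node("wrapper", -1));
        Node host = new Node("host", -1).add(context).add(new Node("selfScheduled", 5));
        Node engine = new Node("engine", 10).add(host);
        List<String> visited = new ArrayList<>();
        processChildren(engine, visited);
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(tick());
    }
}
```

The `selfScheduled` node is left out of the round because its positive delay means it would run its own background thread.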

What happens

StandardEngine recursively calls backgroundProcess on its child containers, and that method fires PERIODIC_EVENT.

When StandardHost fires PERIODIC_EVENT, HostConfig receives it as a listener and runs the check logic:

// org.apache.catalina.startup.HostConfig#check()

 /**
     * Check status of all webapps.
     */
    protected void check() {

      // Is auto-deploy enabled?
        if (host.getAutoDeploy()) {
            // Check for resources modification to trigger redeployment
            DeployedApplication[] apps =
                deployed.values().toArray(new DeployedApplication[0]);
            for (int i = 0; i < apps.length; i++) {
                if (!isServiced(apps[i].name))
                    checkResources(apps[i], false);
            }

            // Check for old versions of applications that can now be undeployed
            if (host.getUndeployOldVersions()) {
                checkUndeploy();
            }

            // Hotdeploy applications
            deployApps();
        }
    }

// org.apache.catalina.startup.HostConfig#deployApps()
/**
     * Deploy applications for any directories or WAR files that are found
     * in our "application root" directory.
     */
    protected void deployApps() {

        File appBase = host.getAppBaseFile();
        File configBase = host.getConfigBaseFile();
        String[] filteredAppPaths = filterAppPaths(appBase.list());
        // Deploy XML descriptors from configBase
        deployDescriptors(configBase, configBase.list());
      // Deploy .war archives
        // Deploy WARs
        deployWARs(appBase, filteredAppPaths);
      // Deploy exploded WAR directories
        // Deploy expanded folders
        deployDirectories(appBase, filteredAppPaths);

    }

All three kinds of deploy end up being submitted as tasks to the host's startStopExecutor (so they do not block other listeners):

deployDescriptors -> DeployDescriptor
deployWARs -> DeployWar
deployDirectories -> DeployDirectory
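Submitting one task per application to an executor can be sketched like this. It is a minimal illustration under assumptions: `DeploySketch` and `deployAll` are hypothetical names, the "deployment" is just recording the directory name, and the choice to wait for every task before returning belongs to this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class DeploySketch {
    // Submits one task per directory and waits for all of them to finish.
    static List<String> deployAll(List<String> dirs) {
        ExecutorService startStopExecutor = Executors.newFixedThreadPool(2);
        List<String> deployed = Collections.synchronizedList(new ArrayList<>());
        try {
            List<Future<?>> results = new ArrayList<>();
            for (String dir : dirs) {
                // Each task stands in for a DeployDirectory-style Runnable.
                results.add(startStopExecutor.submit(() -> { deployed.add(dir); }));
            }
            for (Future<?> result : results) {
                try {
                    result.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
        } finally {
            startStopExecutor.shutdown();
        }
        return deployed;
    }

    public static void main(String[] args) {
        System.out.println(deployAll(List.of("ROOT", "app1", "app2")).size());
    }
}
```

Because the tasks run on pool threads, the order of the returned list is not guaranteed.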

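The tasks in turn call back into HostConfig to do the deployment. Taking DeployDirectory as an example, it ends up calling org.apache.catalina.startup.HostConfig#deployDirectory.

This matches the call stack in the flame graph.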

// org.apache.catalina.startup.HostConfig#deployDirectory
 Class clazz = Class.forName(host.getConfigClass());
LifecycleListener listener =
  (LifecycleListener) clazz.newInstance();
context.addLifecycleListener(listener);

context.setName(cn.getName());
context.setPath(cn.getPath());
context.setWebappVersion(cn.getVersion());
context.setDocBase(cn.getBaseName());
host.addChild(context);

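The core of it is creating a Context and adding it as a child container of the host. The Context can be customized via META-INF/context.xml. Without one, defaults apply. That is how the application gets added to Tomcat. Once the child container is added, the host calls its start method, which triggers its initialization.

BEFORE_START_EVENT

Create the appBase and configBase directories declared in server.xml: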

// org.apache.catalina.startup.HostConfig#beforeStart
 if (host.getCreateDirs()) {
            File[] dirs = new File[] {host.getAppBaseFile(),host.getConfigBaseFile()};
            for (int i=0; i<dirs.length; i++) {
                if (!dirs[i].mkdirs() && !dirs[i].isDirectory()) {
                    log.error(sm.getString("hostConfig.createDirs",dirs[i]));
                }
            }
        }

START_EVENT

//  org.apache.catalina.startup.HostConfig#start
 /**
     * Process a "start" event for this Host.
     */
    public void start() {

        if (log.isDebugEnabled())
            log.debug(sm.getString("hostConfig.start"));

        try {
            ObjectName hostON = host.getObjectName();
            oname = new ObjectName
                (hostON.getDomain() + ":type=Deployer,host=" + host.getName());
            Registry.getRegistry(null, null).registerComponent
                (this, oname, this.getClass().getName());
        } catch (Exception e) {
            log.error(sm.getString("hostConfig.jmx.register", oname), e);
        }

        if (!host.getAppBaseFile().isDirectory()) {
            log.error(sm.getString("hostConfig.appBase", host.getName(),
                    host.getAppBaseFile().getPath()));
            host.setDeployOnStartup(false);
            host.setAutoDeploy(false);
        }

        if (host.getDeployOnStartup())
            deployApps();

    }

This only registers HostConfig with the MBean Registry. If deployOnStartup is enabled, it also tries to deploy the applications once.

STOP_EVENT

// org.apache.catalina.startup.HostConfig#stop
 /**
     * Process a "stop" event for this Host.
     */
    public void stop() {

        if (log.isDebugEnabled())
            log.debug(sm.getString("hostConfig.stop"));

        if (oname != null) {
            try {
                Registry.getRegistry(null, null).unregisterComponent(oname);
            } catch (Exception e) {
                log.error(sm.getString("hostConfig.jmx.unregister", oname), e);
            }
        }
        oname = null;
    }

Likewise, stop only removes it from the Registry.

ContextConfig

Where does it come from?

Like HostConfig, a Context has a corresponding LifecycleListener, called ContextConfig. It is also specified by default at creation time:

// org.apache.catalina.startup.ContextRuleSet
digester.addRule(prefix + "Context",
                 new LifecycleListenerRule
                 ("org.apache.catalina.startup.ContextConfig",
                  "configClass"));

What does it do?

Here is what it does in the listener method:

//org.apache.catalina.startup.ContextConfig#lifecycleEvent
 @Override
    public void lifecycleEvent(LifecycleEvent event) {

        // Identify the context we are associated with
        try {
            context = (Context) event.getLifecycle();
        } catch (ClassCastException e) {
            log.error(sm.getString("contextConfig.cce", event.getLifecycle()), e);
            return;
        }

        // Process the event that has occurred
        if (event.getType().equals(Lifecycle.CONFIGURE_START_EVENT)) {
            configureStart();
        } else if (event.getType().equals(Lifecycle.BEFORE_START_EVENT)) {
            beforeStart();
        } else if (event.getType().equals(Lifecycle.AFTER_START_EVENT)) {
            // Restore docBase for management tools
            if (originalDocBase != null) {
                context.setDocBase(originalDocBase);
            }
        } else if (event.getType().equals(Lifecycle.CONFIGURE_STOP_EVENT)) {
            configureStop();
        } else if (event.getType().equals(Lifecycle.AFTER_INIT_EVENT)) {
            init();
        } else if (event.getType().equals(Lifecycle.AFTER_DESTROY_EVENT)) {
            destroy();
        }

    }

CONFIGURE_START_EVENT

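Event source

StandardContext fires this event during startup. On receiving it, the listener does some preparatory initialization. After the listener logic finishes, the rest of the Context startup continues: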

// org.apache.catalina.core.StandardContext#startInternal
// Notify our interested LifecycleListeners
fireLifecycleEvent(Lifecycle.CONFIGURE_START_EVENT, null);

 // Start our child containers, if not already started
// Start child containers (servlet Wrappers)
for (Container child : findChildren()) {
  if (!child.getState().isAvailable()) {
    child.start();
  }
}

// Start the Valves in our pipeline (including the basic),
// if any
// Starting the pipeline also starts its valves
if (pipeline instanceof Lifecycle) {
  ((Lifecycle) pipeline).start();
}

// Call ServletContainerInitializers
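// SCI initialization. Spring Boot relies on this mechanism to start by default: org.springframework.web.SpringServletContainerInitializer
// The Jasper JSP engine is also initialized through an SCI: org.apache.jasper.servlet.JasperInitializer
for (Map.Entry<ServletContainerInitializer, Set<Class<?>>> entry :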
     initializers.entrySet()) {
  try {
    entry.getKey().onStartup(entry.getValue(),
                             getServletContext());
  } catch (ServletException e) {
    log.error(sm.getString("standardContext.sciFail"), e);
    ok = false;
    break;
  }
}

// Configure and call application event listeners
// ServletContextListener initialization. With Spring parent/child contexts, the parent context is brought up here
// Spring's listener: org.springframework.web.context.ContextLoaderListener
if (ok) {
  if (!listenerStart()) {
    log.error(sm.getString("standardContext.listenerFail"));
    ok = false;
  }
}

 // Configure and call application filters
// Start the filters
if (ok) {
  if (!filterStart()) {
    log.error(sm.getString("standardContext.filterFail"));
    ok = false;
  }
}

// Load and initialize all "load on startup" servlets
// Start the servlets that set load-on-startup
// With plain Spring MVC there is usually a single servlet, and it is brought up in this step
if (ok) {
  if (!loadOnStartup(findChildren())){
    log.error(sm.getString("standardContext.servletFail"));
    ok = false;
  }
}

If a servlet sets load-on-startup (to a non-negative value), it is loaded at startup. Otherwise loading is lazy and is triggered by the first request:

// org.apache.catalina.core.StandardContext#loadOnStartup
// Load the collected "load on startup" servlets
for (ArrayList<Wrapper> list : map.values()) {
  for (Wrapper wrapper : list) {
    try {
      // Trigger servlet loading: the servlet enters its lifecycle and its init method is called
      wrapper.load();
    } catch (ServletException e) {
      getLogger().error(sm.getString("standardContext.loadOnStartup.loadException",
                                     getName(), wrapper.getName()), StandardWrapper.getRootCause(e));
      // NOTE: load errors (including a servlet that throws
      // UnavailableException from the init() method) are NOT
      // fatal to application startup
      // unless failCtxIfServletStartFails="true" is specified
      if(getComputedFailCtxIfServletStartFails()) {
        return false;
      }
    }
  }
}
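How servlets are grouped and ordered by their load-on-startup value can be sketched with a `TreeMap`. This is an illustrative simplification: `LoadOnStartupSketch`, `ServletDef`, and `eagerOrder` are hypothetical names, and treating negative values as "lazy" is the assumption the sketch is built on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class LoadOnStartupSketch {
    static final class ServletDef {
        final String name;
        final int loadOnStartup;

        ServletDef(String name, int loadOnStartup) {
            this.name = name;
            this.loadOnStartup = loadOnStartup;
        }
    }

    // Returns the names of servlets to load eagerly, in ascending load-on-startup order.
    static List<String> eagerOrder(List<ServletDef> defs) {
        TreeMap<Integer, List<ServletDef>> map = new TreeMap<>();
        for (ServletDef d : defs) {
            if (d.loadOnStartup < 0) {
                continue; // negative: left for lazy loading on first request
            }
            map.computeIfAbsent(d.loadOnStartup, k -> new ArrayList<>()).add(d);
        }
        List<String> order = new ArrayList<>();
        for (List<ServletDef> group : map.values()) {
            for (ServletDef d : group) {
                order.add(d.name);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        List<ServletDef> defs = new ArrayList<>();
        defs.add(new ServletDef("dispatcher", 1));
        defs.add(new ServletDef("lazy", -1));
        defs.add(new ServletDef("metrics", 0));
        defs.add(new ServletDef("late", 5));
        System.out.println(eagerOrder(defs));
    }
}
```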

What happens

The handler for this event, configureStart, scans web.xml and related files and configures the context. The key method is webConfig().

Scan the web.xml files that apply to the web application and merge them
using the rules defined in the spec. For the global web.xml files,
where there is duplicate configuration, the most specific level wins. ie
an application’s web.xml takes precedence over the host level or global
web.xml file.

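Note that listener handling here is synchronous: control returns to the main flow only after it finishes. webConfig covers scanning of servlet annotations, filters and so on, and also SCI processing.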

// org.apache.catalina.startup.ContextConfig#configureStart
 /**
     * Process a "contextConfig" event for this Context.
     */
protected synchronized void configureStart() {
  // Called from StandardContext.start()

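  // The core: web.xml, web-fragment.xml, and SCI processing
  // Uses ASM to read Servlet 3.0 annotations on classes (WEB-INF/classes and …)
  // Merges multiple fragments into one web.xml; the effective web.xml can be logged
  // Handles WEB-INF/classes/META-INF/resources
  // Servlet definitions found during the scan are also added as child containers (Wrappers) of the Context
  webConfig();

  // Handle @Resource annotations (JSR 250) on Listeners/Filters/Servlets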
  if (!context.getIgnoreAnnotations()) {
    applicationAnnotationsConfig();
  }
  if (ok) {
    validateSecurityRoles();
  }

  // Configure an authenticator if we need one
  if (ok) {
    authenticatorConfig();
  }

  // Make our application available if no problems were encountered
  if (ok) {
    context.setConfigured(true);
  } else {
    log.error(sm.getString("contextConfig.unavailable"));
    context.setConfigured(false);
  }

}

logEffectiveWebXml Set to true if you want the effective web.xml used for a web application to be logged (at INFO level) when the application starts. The effective web.xml is the result of combining the application’s web.xml with any defaults configured by Tomcat and any web-fragment.xml files and annotations discovered. If not specified, the default value of false is used.

BEFORE_START_EVENT

A hook called before start. It mainly computes docBase.

// org.apache.catalina.startup.ContextConfig#beforeStart
/**
     * Process a "before start" event for this Context.
     */
protected synchronized void beforeStart() {

  try {
    fixDocBase();
  } catch (IOException e) {
    log.error(sm.getString(
      "contextConfig.fixDocBase", context.getName()), e);
  }

  antiLocking();
}

AFTER_START_EVENT

Restore docBase for management tools

// Restore docBase for management tools
if (originalDocBase != null) {
  context.setDocBase(originalDocBase);
}

CONFIGURE_STOP_EVENT

The counterpart of the configure start event, run when the container is destroyed:

Removing children
Removing application parameters
Removing security constraints
Removing Ejbs
Removing environments
Removing errors pages
Removing filter defs
Removing filter maps
Removing local ejbs
Removing Mime mappings
Removing parameters
Removing resource env refs
Removing resource links
Removing resources
Removing security role
Removing servlet mappings
Removing welcome files
Removing wrapper lifecycles
Removing wrapper listeners
Remove (partially) folders and files created by antiLocking
Reset ServletContextInitializer scanning

AFTER_INIT_EVENT

Processes conf/context.xml if it exists.

// org.apache.catalina.startup.ContextConfig#init
/**
     * Process a "init" event for this Context.
     */
protected void init() {
  // Called from StandardContext.init()

  Digester contextDigester = createContextDigester();
  contextDigester.getParser();

  if (log.isDebugEnabled()) {
    log.debug(sm.getString("contextConfig.init"));
  }
  context.setConfigured(false);
  ok = true;

  contextConfig(contextDigester);
}

AFTER_DESTROY_EVENT

Deletes the corresponding work directory.

// org.apache.catalina.startup.ContextConfig#destroy
/**
     * Process a "destroy" event for this Context.
     */
protected synchronized void destroy() {
  // Called from StandardContext.destroy()
  if (log.isDebugEnabled()) {
    log.debug(sm.getString("contextConfig.destroy"));
  }

  // Skip clearing the work directory if Tomcat is being shutdown
  Server s = getServer();
  if (s != null && !s.getState().isAvailable()) {
    return;
  }

  // Changed to getWorkPath per Bugzilla 35819.
  if (context instanceof StandardContext) {
    String workDir = ((StandardContext) context).getWorkPath();
    if (workDir != null) {
      ExpandWar.delete(new File(workDir));
    }
  }
}

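Summary

Application deployment and initialization depend on HostConfig. HostConfig is the LifecycleListener of the Host container. If none is declared explicitly in the XML, a default one is used.
When HostConfig starts, it calls deployApps, which triggers the first deployment by submitting a task to the startStopExecutor thread pool (localhost-startStop).
When the Engine finishes starting, it launches a ContainerBackgroundProcessor thread that calls backgroundProcess on the child containers every 10s, and the Host container fires PERIODIC_EVENT.
On PERIODIC_EVENT, HostConfig checks whether autoDeploy is enabled. If so, it checks for changes, and any change triggers a deployment by submitting a task to the startStopExecutor thread pool (localhost-startStop).
The task mainly creates a Context container, adds it as a child of the Host container, and triggers the Context's initialization.
The Context container also has a LifecycleListener, ContextConfig, which receives the Context's events.
When the Context starts, it fires CONFIGURE_START_EVENT. On receiving it, ContextConfig scans web.xml, scans jars, and does other preparation.
After calling the listeners, the Context initializes its child containers (servlet Wrappers) and its pipeline (which initializes the valves), calls onStartup on the SCIs, and then triggers ServletContextListeners, Filters, and Servlets in that order.
A Servlet that declares load-on-startup is initialized inside the Context's start method (its init method is called).
Besides deployment triggered through HostConfig, there is also deployment with autoDeploy turned off, which the next post covers.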

Tomcat bind/listen/acceptor process

2022-10-19T14:58:12.000Z

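A classic network server typically goes through bind, listen, and accept steps.

This post looks at how Tomcat implements each of them.

Logs

A few key log lines: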

19-Oct-2022 11:32:58.320 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-8080"]
19-Oct-2022 11:32:58.407 INFO [main] org.apache.tomcat.util.net.NioSelectorPool.getSharedSelector Using a shared selector for servlet write/read
19-Oct-2022 11:33:07.092 INFO [main] org.apache.catalina.startup.Catalina.load Initialization processed in 9593 ms

19-Oct-2022 11:33:07.123 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
19-Oct-2022 11:33:07.124 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/8.5.66]
19-Oct-2022 11:33:07.165 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-nio-8080"]
19-Oct-2022 11:33:50.942 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 43847 ms

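How they map:

bind & listen -> Initializing ProtocolHandler ["http-nio-8080"]
accept -> Starting ProtocolHandler ["http-nio-8080"]

Tomcat components all implement the Lifecycle interface, so they all have init and start methods. By default, bind and listen happen in init. accept begins once the Acceptor has been initialized, which happens in start.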
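The init/start split can be illustrated with plain Java NIO. This is a sketch under assumptions, not Tomcat's endpoint code: `MiniEndpoint`, its methods, the backlog of 100, and the ephemeral loopback port are all made up for the example.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class MiniEndpoint {
    private ServerSocketChannel server;
    private final CountDownLatch accepted = new CountDownLatch(1);

    // init phase: bind + listen (ServerSocketChannel.bind does both; 100 is the backlog).
    void init() throws IOException {
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), 100);
    }

    // start phase: only now does a thread begin calling accept().
    void start() {
        Thread acceptor = new Thread(() -> {
            try (SocketChannel ch = server.accept()) {
                accepted.countDown();
            } catch (IOException ignored) {
                // channel was closed while waiting
            }
        }, "mini-acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return server.socket().getLocalPort();
    }

    // Connects a client after init but before start, then starts the acceptor.
    static boolean demo() {
        MiniEndpoint ep = new MiniEndpoint();
        try {
            ep.init();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), ep.port())) {
                ep.start();
                return ep.accepted.await(5, TimeUnit.SECONDS);
            } finally {
                ep.server.close();
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The client connects between the two phases on purpose: once the socket is bound and listening, the kernel queues incoming connections in the backlog, and they are picked up as soon as something calls accept.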

bind & listen

org.apache.catalina.startup.Catalina.load Initialization processed in 9593 ms

// org.apache.catalina.startup.Catalina#load()
 // Start the new server
try {
  getServer().init();
} catch (LifecycleException e) {
  if (Boolean.getBoolean("org.apache.catalina.startup.EXIT_ON_INIT_FAILURE")) {
    throw new java.lang.Error(e);
  } else {
    log.error("Catalina.start", e);
  }
}

long t2 = System.nanoTime();
if(log.isInfoEnabled()) {
  log.info("Initialization processed in " + ((t2 - t1) / 1000000) + " ms");
}

StandardServer's init triggers init on its child components, all the way down to AbstractProtocol's init:

"main@1" prio=5 tid=0x1 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
    #AbstractEndpoint.init
  at org.apache.tomcat.util.net.AbstractEndpoint.init(AbstractEndpoint.java:1153)
    #AbstractJsseEndpoint.init
  at org.apache.tomcat.util.net.AbstractJsseEndpoint.init(AbstractJsseEndpoint.java:222)
#AbstractProtocol.init
  at org.apache.coyote.AbstractProtocol.init(AbstractProtocol.java:599)
  at org.apache.coyote.http11.AbstractHttp11Protocol.init(AbstractHttp11Protocol.java:80)
    #Connector.initInternal
  at org.apache.catalina.connector.Connector.initInternal(Connector.java:1074)
  at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:136)
  - locked <0x99f> (a org.apache.catalina.connector.Connector)
    #StandardService.initInternal
  at org.apache.catalina.core.StandardService.initInternal(StandardService.java:552)
  - locked <0x9c6> (a java.lang.Object)
  at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:136)
  - locked <0x9a0> (a org.apache.catalina.core.StandardService)
    #StandardServer.initInternal
  at org.apache.catalina.core.StandardServer.initInternal(StandardServer.java:846)
  at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:136)
  - locked <0x9a1> (a org.apache.catalina.core.StandardServer)
    #Catalina.load
  at org.apache.catalina.startup.Catalina.load(Catalina.java:639)
  at org.apache.catalina.startup.Catalina.load(Catalina.java:662)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.catalina.startup.Bootstrap.load(Bootstrap.java:302)
  at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:472)

Initializing ProtocolHandler ["http-nio-8080"]

// org.apache.coyote.AbstractProtocol#init
@Override
public void init() throws Exception {
  // This is where the log line is emitted
  if (getLog().isInfoEnabled()) {
    getLog().info(sm.getString("abstractProtocolHandler.init", getName()));
  }

  if (oname == null) {
    // Component not pre-registered so register it
    oname = createObjectName();
    if (oname != null) {
      Registry.getRegistry(null, null).registerComponent(this, oname, null);
    }
  }

  if (this.domain != null) {
    ObjectName rgOname = new ObjectName(domain + ":type=GlobalRequestProcessor,name=" + getName());
    this.rgOname = rgOname;
    Registry.getRegistry(null, null).registerComponent(
      getHandler().getGlobal(), rgOname, null);
  }

  String endpointName = getName();
  endpoint.setName(endpointName.substring(1, endpointName.length()-1));
  endpoint.setDomain(domain);
  // Call the endpoint's init
  endpoint.init();
}

The endpoint is what actually performs bind and listen:

// org.apache.tomcat.util.net.AbstractEndpoint#init
public void init() throws Exception {
  // Note: this is guarded by a switch (bindOnInit), which defaults to true
  if (bindOnInit) {
    // bind happens here
    bind();
    bindState = BindState.BOUND_ON_INIT;
  }
  if (this.domain != null) {
    // Register endpoint (as ThreadPool - historical name)
    oname = new ObjectName(domain + ":type=ThreadPool,name=\"" + getName() + "\"");
    Registry.getRegistry(null, null).registerComponent(this, oname, null);

    ObjectName socketPropertiesOname = new ObjectName(domain +
                                                      ":type=SocketProperties,name=\"" + getName() + "\"");
    socketProperties.setObjectName(socketPropertiesOname);
    Registry.getRegistry(null, null).registerComponent(socketProperties, socketPropertiesOname, null);

    for (SSLHostConfig sslHostConfig : findSslHostConfigs()) {
      registerJmx(sslHostConfig);
    }
  }
}

// org.apache.tomcat.util.net.NioEndpoint#bind
 /**
     * Initialize the endpoint.
     */
@Override
public void bind() throws Exception {

  if (!getUseInheritedChannel()) {
    serverSock = ServerSocketChannel.open();
    socketProperties.setProperties(serverSock.socket());
    InetSocketAddress addr = (getAddress()!=null?new InetSocketAddress(getAddress(),getPort()):new InetSocketAddress(getPort()));
    // bind and listen both happen here
    serverSock.socket().bind(addr,getAcceptCount());
  } else {
    // Retrieve the channel provided by the OS
    Channel ic = System.inheritedChannel();
    if (ic instanceof ServerSocketChannel) {
      serverSock = (ServerSocketChannel) ic;
    }
    if (serverSock == null) {
      throw new IllegalArgumentException(sm.getString("endpoint.init.bind.inherited"));
    }
  }
  serverSock.configureBlocking(true); //mimic APR behavior

  // Initialize thread count defaults for acceptor, poller
  if (acceptorThreadCount == 0) {
    // FIXME: Doesn't seem to work that well with multiple accept threads
    acceptorThreadCount = 1;
  }
  if (pollerThreadCount <= 0) {
    //minimum one poller thread
    pollerThreadCount = 1;
  }
  setStopLatch(new CountDownLatch(pollerThreadCount));

  // Initialize SSL if needed
  initialiseSsl();

  selectorPool.open();
}


// java.net.ServerSocket#bind(java.net.SocketAddress, int)
 public void bind(SocketAddress endpoint, int backlog) throws IOException {
   if (isClosed())
     throw new SocketException("Socket is closed");
   if (!oldImpl && isBound())
     throw new SocketException("Already bound");
   if (endpoint == null)
     endpoint = new InetSocketAddress(0);
   if (!(endpoint instanceof InetSocketAddress))
     throw new IllegalArgumentException("Unsupported address type");
   InetSocketAddress epoint = (InetSocketAddress) endpoint;
   if (epoint.isUnresolved())
     throw new SocketException("Unresolved address");
   if (backlog < 1)
     backlog = 50;
   try {
     SecurityManager security = System.getSecurityManager();
     if (security != null)
       security.checkListen(epoint.getPort());
     // First bind the port
     getImpl().bind(epoint.getAddress(), epoint.getPort());
     // Then listen; on listen the kernel creates the SYN queue and the accept queue
     getImpl().listen(backlog);
     bound = true;
   } catch(SecurityException e) {
     bound = false;
     throw e;
   } catch(IOException e) {
     bound = false;
     throw e;
   }
 }

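At this point the socket is bound and listening, but the application has not started accepting connections. Any request that arrives now simply waits in the SYN queue or the accept queue.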
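
The effect can be reproduced with plain JDK sockets. The sketch below is not Tomcat code: the class BacklogDemo and the method connectWithoutAccept are made up for illustration. It binds and listens on an ephemeral loopback port, never calls accept, and checks that a client can still connect, because the kernel completes the handshake and parks the connection in the accept queue.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of Tomcat.
final class BacklogDemo {

    // Binds and listens, never accepts, and reports whether a client could still connect.
    static boolean connectWithoutAccept(int backlog) {
        try (ServerSocket server = new ServerSocket()) {
            // ServerSocket.bind performs both bind and listen, as in the JDK code above.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), backlog);
            try (Socket client = new Socket()) {
                // The server side never calls accept().
                client.connect(server.getLocalSocketAddress(), 2000);
                return client.isConnected();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("connected without accept: " + connectWithoutAccept(10));
    }
}
```

Only a bounded number of such connections can wait this way. Once the accept queue is full, what happens to further handshakes depends on the operating system and its settings.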

The bindOnInit setting:

Controls when the socket used by the connector is bound.
By default it is bound when the connector is initiated and unbound when the connector is destroyed.
If set to false, the socket will be bound when the connector is started and unbound when it is stopped.
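
A minimal model of how this flag moves the bind, assuming the behaviour of AbstractEndpoint#init shown above and the fallback bind that AbstractEndpoint#start performs when the state is still UNBOUND. BindTimingSketch is a hypothetical class; the real bind() and startInternal() calls are replaced by comments.

```java
// Simplified model of when the endpoint binds; an illustration, not Tomcat code.
final class BindTimingSketch {

    enum BindState { UNBOUND, BOUND_ON_INIT, BOUND_ON_START }

    private final boolean bindOnInit;
    private BindState bindState = BindState.UNBOUND;

    BindTimingSketch(boolean bindOnInit) {
        this.bindOnInit = bindOnInit;
    }

    void init() {
        if (bindOnInit) {
            // The real endpoint calls bind() here (bind + listen).
            bindState = BindState.BOUND_ON_INIT;
        }
    }

    void start() {
        if (bindState == BindState.UNBOUND) {
            // The real endpoint calls bind() here when init skipped it.
            bindState = BindState.BOUND_ON_START;
        }
        // The real endpoint then calls startInternal(), which starts the Acceptor threads.
    }

    // Runs init followed by start and reports at which phase the bind happened.
    static BindState run(boolean bindOnInit) {
        BindTimingSketch endpoint = new BindTimingSketch(bindOnInit);
        endpoint.init();
        endpoint.start();
        return endpoint.bindState;
    }

    public static void main(String[] args) {
        System.out.println("bindOnInit=true  -> " + run(true));
        System.out.println("bindOnInit=false -> " + run(false));
    }
}
```

With bindOnInit=false the bind moves into start, right before the Acceptor threads are created, so the window in which the socket listens without anyone accepting shrinks.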

accept

org.apache.catalina.startup.Catalina.start Server startup in 43847 ms

The corresponding code:

// org.apache.catalina.startup.Catalina#start
 // Start the new server
try {
  getServer().start();
} catch (LifecycleException e) {
  log.fatal(sm.getString("catalina.serverStartFail"), e);
  try {
    getServer().destroy();
  } catch (LifecycleException e1) {
    log.debug("destroy() failed for failed Server ", e1);
  }
  return;
}

long t2 = System.nanoTime();
if(log.isInfoEnabled()) {
  log.info("Server startup in " + ((t2 - t1) / 1000000) + " ms");
}

Likewise, StandardServer.start cascades into the start of each child component:

"main@1" prio=5 tid=0x1 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
    #NioEndpoint.startInternal
  at org.apache.tomcat.util.net.NioEndpoint.startInternal(NioEndpoint.java:261)
    #AbstractEndpoint.start
  at org.apache.tomcat.util.net.AbstractEndpoint.start(AbstractEndpoint.java:1219)
    #AbstractProtocol.start
  at org.apache.coyote.AbstractProtocol.start(AbstractProtocol.java:609)
    #Connector.startInternal
  at org.apache.catalina.connector.Connector.startInternal(Connector.java:1099)
  at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
  - locked <0x99d> (a org.apache.catalina.connector.Connector)
    #StandardService.startInternal
  at org.apache.catalina.core.StandardService.startInternal(StandardService.java:440)
  - locked <0xa6a> (a java.lang.Object)
  at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
  - locked <0x99e> (a org.apache.catalina.core.StandardService)
    #StandardServer.startInternal
  at org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:766)
  - locked <0xa6b> (a java.lang.Object)
  at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
  - locked <0x99f> (a org.apache.catalina.core.StandardServer)
    #Catalina.start
  at org.apache.catalina.startup.Catalina.start(Catalina.java:688)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:342)
  at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:473)

Starting ProtocolHandler ["http-nio-8080"]

// org.apache.coyote.AbstractProtocol#start
@Override
public void start() throws Exception {
  // This is where the log line is emitted
  if (getLog().isInfoEnabled()) {
    getLog().info(sm.getString("abstractProtocolHandler.start", getName()));
  }

  // Call the endpoint's start
  endpoint.start();

  // Start timeout thread
  asyncTimeout = new AsyncTimeout();
  Thread timeoutThread = new Thread(asyncTimeout, getNameInternal() + "-AsyncTimeout");
  int priority = endpoint.getThreadPriority();
  if (priority < Thread.MIN_PRIORITY || priority > Thread.MAX_PRIORITY) {
    priority = Thread.NORM_PRIORITY;
  }
  timeoutThread.setPriority(priority);
  timeoutThread.setDaemon(true);
  timeoutThread.start();
}

AbstractProtocol ends up calling the endpoint's start method:

// org.apache.tomcat.util.net.AbstractEndpoint#start
public final void start() throws Exception {
  // If the socket was not bound during init, bind it now
  if (bindState == BindState.UNBOUND) {
    bind();
    bindState = BindState.BOUND_ON_START;
  }
  startInternal();
}

// org.apache.tomcat.util.net.NioEndpoint#startInternal
/**
     * Start the NIO endpoint, creating acceptor, poller threads.
     */
@Override
public void startInternal() throws Exception {

  if (!running) {
    running = true;
    paused = false;

    processorCache = new SynchronizedStack<>(SynchronizedStack.DEFAULT_SIZE,
                                             socketProperties.getProcessorCache());
    eventCache = new SynchronizedStack<>(SynchronizedStack.DEFAULT_SIZE,
                                         socketProperties.getEventCache());
    nioChannels = new SynchronizedStack<>(SynchronizedStack.DEFAULT_SIZE,
                                          socketProperties.getBufferPool());

    // Create worker collection
    // Create the thread pool
    if (getExecutor() == null) {
      createExecutor();
    }

    // Create the latch that enforces maxConnections
    initializeConnectionLatch();

    // Start poller threads
    // Poller threads
    pollers = new Poller[getPollerThreadCount()];
    for (int i=0; i<pollers.length; i++) {
      pollers[i] = new Poller();
      Thread pollerThread = new Thread(pollers[i], getName() + "-ClientPoller-"+i);
      pollerThread.setPriority(threadPriority);
      pollerThread.setDaemon(true);
      pollerThread.start();
    }

    // Start the acceptor threads
    startAcceptorThreads();
  }
}


// org.apache.tomcat.util.net.AbstractEndpoint#startAcceptorThreads
protected final void startAcceptorThreads() {
  int count = getAcceptorThreadCount();
  acceptors = new Acceptor[count];

  for (int i = 0; i < count; i++) {
    acceptors[i] = createAcceptor();
    String threadName = getName() + "-Acceptor-" + i;
    acceptors[i].setThreadName(threadName);
    Thread t = new Thread(acceptors[i], threadName);
    t.setPriority(getAcceptorThreadPriority());
    t.setDaemon(getDaemon());
    t.start();
  }
}

The Acceptor threads are now running, so Tomcat can accept connections. Here is what an Acceptor thread does:

// org.apache.tomcat.util.net.NioEndpoint.Acceptor
/**
     * The background thread that listens for incoming TCP/IP connections and
     * hands them off to an appropriate processor.
     */
protected class Acceptor extends AbstractEndpoint.Acceptor {

  @Override
  public void run() {

    int errorDelay = 0;

    // Loop until we receive a shutdown command
    while (running) {

      // Loop if endpoint is paused
      while (paused && running) {
        state = AcceptorState.PAUSED;
        try {
          Thread.sleep(50);
        } catch (InterruptedException e) {
          // Ignore
        }
      }

      if (!running) {
        break;
      }
      state = AcceptorState.RUNNING;

      try {
        //if we have reached max connections, wait
        countUpOrAwaitConnection();

        SocketChannel socket = null;
        try {
          // Accept the next incoming connection from the server
          // socket
          // Note: this is where accept is called
          socket = serverSock.accept();
        } catch (IOException ioe) {
          // We didn't get a socket
          countDownConnection();
          if (running) {
            // Introduce delay if necessary
            errorDelay = handleExceptionWithDelay(errorDelay);
            // re-throw
            throw ioe;
          } else {
            break;
          }
        }
        // Successful accept, reset the error delay
        errorDelay = 0;

        // Configure the socket
        if (running && !paused) {
          // setSocketOptions() will hand the socket off to
          // an appropriate processor if successful
          if (!setSocketOptions(socket)) {
            closeSocket(socket);
          }
        } else {
          closeSocket(socket);
        }
      } catch (Throwable t) {
        ExceptionUtils.handleThrowable(t);
        log.error(sm.getString("endpoint.accept.fail"), t);
      }
    }
    state = AcceptorState.ENDED;
  }


  private void closeSocket(SocketChannel socket) {
    countDownConnection();
    try {
      socket.socket().close();
    } catch (IOException ioe)  {
      if (log.isDebugEnabled()) {
        log.debug(sm.getString("endpoint.err.close"), ioe);
      }
    }
    try {
      socket.close();
    } catch (IOException ioe) {
      if (log.isDebugEnabled()) {
        log.debug(sm.getString("endpoint.err.close"), ioe);
      }
    }
  }
}

When other components are initialized

Are the other components initialized between bind and accept, or after accept?

Local test output:

19-Oct-2022 14:51:28.147 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-8080"]
19-Oct-2022 14:51:28.164 INFO [main] org.apache.tomcat.util.net.NioSelectorPool.getSharedSelector Using a shared selector for servlet write/read
19-Oct-2022 14:51:28.176 INFO [main] org.apache.catalina.startup.Catalina.load Initialization processed in 501 ms
19-Oct-2022 14:51:28.237 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
19-Oct-2022 14:51:28.237 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/8.5.66]
19-Oct-2022 14:51:28.251 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-nio-8080"]
19-Oct-2022 14:51:28.257 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 80 ms
Connected to server
[2022-10-19 02:51:28,339] Artifact web:war exploded: Artifact is being deployed, please wait...
19-Oct-2022 14:51:28.747 INFO [RMI TCP Connection(2)-127.0.0.1] org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
14:51:28.803 [RMI TCP Connection(2)-127.0.0.1] INFO  c.a.context.SimpleLogContextListener - contextInitialized... begin sleep
19-Oct-2022 14:51:38.253 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/Users/qishengli/Downloads/apache-tomcat-8.5.66/webapps/manager]
19-Oct-2022 14:51:38.278 INFO [localhost-startStop-1] org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
19-Oct-2022 14:51:38.296 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/Users/qishengli/Downloads/apache-tomcat-8.5.66/webapps/manager] has finished in [42] ms
14:51:38.812 [RMI TCP Connection(2)-127.0.0.1] INFO  com.air.filter.TestFilter - initing Filter...
14:51:48.818 [RMI TCP Connection(2)-127.0.0.1] INFO  com.air.TestServlet2 - init TestServlet2...
14:51:58.819 [RMI TCP Connection(2)-127.0.0.1] INFO  com.air.SampleServlet - initing sample servlet
14:51:58.820 [RMI TCP Connection(2)-127.0.0.1] INFO  com.air.TestServlet3 - init TestServlet3...
[2022-10-19 02:52:08,833] Artifact web:war exploded: Artifact is deployed successfully
[2022-10-19 02:52:08,833] Artifact web:war exploded: Deploy took 40,494 milliseconds

Initializing ProtocolHandler ["http-nio-8080"] -> Starting ProtocolHandler ["http-nio-8080"] -> contextInitialized... begin sleep (Context Listener) -> initing Filter... (Filter) -> init TestServlet2 (@WebServlet) -> init TestServlet3 (servlet configured in web.xml)

Production log:

19-Oct-2022 14:29:36.778 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-21002"]
19-Oct-2022 14:29:36.790 INFO [main] org.apache.tomcat.util.net.NioSelectorPool.getSharedSelector Using a shared selector for servlet write/read
19-Oct-2022 14:29:36.796 INFO [main] org.apache.catalina.startup.Catalina.load Initialization processed in 448 ms
19-Oct-2022 14:29:36.801 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
19-Oct-2022 14:29:36.801 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet Engine: Apache Tomcat/8.5.38
19-Oct-2022 14:29:38.641 INFO [localhost-startStop-1] org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
# application logs appear in between
19-Oct-2022 14:34:25.888 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-nio-21002"]
19-Oct-2022 14:34:25.894 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 289097 ms

Initializing ProtocolHandler ["http-nio-21002"] -> Spring initialization -> Starting ProtocolHandler ["http-nio-21002"]

main vs localhost-startStop

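As the logs above show, both bind & listen and the start of accept run on the main thread, while everything else runs on the localhost-startStop-1 thread (the RMI TCP Connection thread is probably an IDEA artifact, so set it aside for now).

The main thread is Tomcat's main thread. When Tomcat starts container components such as Engine, Host, Context and Wrapper, it submits the work to startStopExecutor and then blocks until every result has come back, as the code below shows: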

// org.apache.catalina.core.ContainerBase#initInternal
@Override
protected void initInternal() throws LifecycleException {
  BlockingQueue<Runnable> startStopQueue = new LinkedBlockingQueue<>();
  startStopExecutor = new ThreadPoolExecutor(
    getStartStopThreadsInternal(),
    getStartStopThreadsInternal(), 10, TimeUnit.SECONDS,
    startStopQueue,
    // This factory names the startStop threads seen in the logs
    new StartStopThreadFactory(getName() + "-startStop-"));
  startStopExecutor.allowCoreThreadTimeOut(true);
  super.initInternal();
}

// org.apache.catalina.core.ContainerBase#startInternal
 // Start our child containers, if any
Container children[] = findChildren();
List<Future<Void>> results = new ArrayList<>();
for (Container child : children) {
  // Submit a StartChild task for each child
  results.add(startStopExecutor.submit(new StartChild(child)));
}

for (Future<Void> result : results) {
  try {
    // Wait for each child to finish starting
    result.get();
  } catch (Throwable e) {
    log.error(sm.getString("containerBase.threadedStartFailed"), e);
    if (multiThrowable == null) {
      multiThrowable = new MultiThrowable();
    }
    multiThrowable.add(e);
  }

}

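In a normal startup, Spring is brought up by a startStopExecutor thread. Tracing the start order of Tomcat's components gives the following:

A Service contains several components, including the Engine and the Connectors. On start, the Engine is started before the Connectors.

The Engine is then responsible for initializing the servlet container, which in turn triggers Spring initialization. Although this runs asynchronously on a thread pool, the Engine waits for all child components to finish before it returns.

The Connector triggers the start of the endpoint, which finally starts the Acceptor.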
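
This ordering can be sketched with a plain ExecutorService. It is a toy model, not Tomcat code: StartOrderSketch, startService and the event strings are invented for illustration, and the 200 ms pause stands in for a slow Spring start-up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model of the start order inside a Service; all names are hypothetical.
final class StartOrderSketch {

    static List<String> startService() {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        ExecutorService startStopExecutor = Executors.newSingleThreadExecutor();
        try {
            // "Engine" start: child start-up is submitted to the pool...
            List<Future<?>> results = new ArrayList<>();
            results.add(startStopExecutor.submit(() -> {
                pause(200); // stands in for a slow Spring initialization
                events.add("webapp initialized");
            }));
            // ...and the calling thread blocks until every child has finished.
            for (Future<?> result : results) {
                result.get();
            }
            // "Connector" start runs only after the engine start has returned.
            events.add("acceptor started");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            startStopExecutor.shutdown();
        }
        return new ArrayList<>(events);
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(startService());
    }
}
```

Running the work on another thread does not make the start-up concurrent with the connector: the caller waits on every Future, so the acceptor step is always recorded last.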

So by default, servlet initialization should happen before accept. Checking this against the local test log:

19-Oct-2022 14:51:38.296 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/Users/qishengli/Downloads/apache-tomcat-8.5.66/webapps/manager] has finished in [42] ms

The local test result does not match this, most likely because IDEA adds the Context through an RMI call after startup has already finished.

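Summary

The bind & listen path
Catalina.load -> StandardServer.initInternal -> StandardService.initInternal -> Connector.initInternal -> AbstractProtocol.init -> AbstractEndpoint.init -> NioEndpoint#bind
The accept path
Catalina.start -> StandardServer.startInternal -> StandardService.startInternal -> Connector.startInternal -> AbstractProtocol.start -> AbstractEndpoint.start -> NioEndpoint.startInternal -> the Acceptor thread starts and can accept
By default, context listeners, filters and servlets are all initialized after bind and before accept
Whether bind & listen happen during init is controlled by the bindOnInit parameter, which defaults to true
Spring initialization sits between bind & listen and accept, and during that window the application layer does not handle connections. The many connections from health checks (which begin once the container is up) pile up in the accept queue until it overflows, which shows up as listenDrop.
Setting bindOnInit to false avoids the flood of listenDrop events during deployments

References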

TCP SYN Queue and Accept Queue Overflow Explained - Alibaba Cloud Community

Configuring connectionTimeout in Tomcat

2022-10-18T15:32:15.000Z

The number of milliseconds this Connector will wait, after accepting a connection, for the request URI line to be presented. Use a value of -1 to indicate no (i.e. infinite) timeout. The default value is 60000 (i.e. 60 seconds) but note that the standard server.xml that ships with Tomcat sets this to 20000 (i.e. 20 seconds). Unless disableUploadTimeout is set to false, this timeout will also be used when reading the request body (if any).

In other words, it is the timeout, in milliseconds, between a connection being accepted and its request line arriving.
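
The behaviour can be imitated with blocking JDK sockets. This is only an analogy under stated assumptions: IdleTimeoutDemo and serverTimesOutIdleClient are hypothetical names, and the sketch uses SO_TIMEOUT on a blocking read as a stand-in for whatever mechanism the connector uses internally. The server accepts a connection, the client sends nothing, and the server gives up after the timeout.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of Tomcat.
final class IdleTimeoutDemo {

    // Accepts one connection, waits up to timeoutMillis for the first byte,
    // and returns true if the wait ended with a read timeout.
    static boolean serverTimesOutIdleClient(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(timeoutMillis);
            try {
                accepted.getInputStream().read(); // the client never sends a request line
                return false;
            } catch (SocketTimeoutException expected) {
                return true; // the connection is closed when the try block exits
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + serverTimesOutIdleClient(200));
    }
}
```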

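Verification

telnet

Connect to Tomcat's port with telnet, send nothing, and wait for the timeout. The result is shown below; the connection is closed after roughly 20 s.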

➜  qsli.github.com (hexo|✚23…) time telnet localhost 8087
Trying ::1...
Connected to localhost.
Escape character is '^]'.
Connection closed by foreign host.
telnet localhost 8087  0.01s user 0.01s system 0% cpu 20.064 total

arthas

Use arthas to inspect the Connector mbean and find the timeout setting:

[arthas@77045]$ mbean Catalina:type=Connector,port=8087
 OBJECT_NAME               Catalina:type=Connector,port=8087
------------------------------------------------------------------------------------------------------------------------
 NAME                      VALUE
------------------------------------------------------------------------------------------------------------------------
 modelerType               null
 maxPostSize               2097152
 proxyName                 null
 scheme                    http
 className                 null
 acceptCount               100
 secret                    Unavailable
 secure                    false
 threadPriority            -1
 maxSwallowSize            2097152
 ajpFlush                  null
 maxSavePostSize           4096
 proxyPort                 0
 sslProtocols              null
 protocol                  HTTP/1.1
 maxParameterCount         10000
 useIPVHosts               false
 stateName                 STARTED
 redirectPort              8443
 allowTrace                false
 ciphers                   HIGH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!kRSA
 protocolHandlerClassName  org.apache.coyote.http11.Http11NioProtocol
 maxThreads                -1
 connectionTimeout         20000
 tcpNoDelay                true
 useBodyEncodingForURI     false
 connectionLinger          -1
 processorCache            200
 keepAliveTimeout          20000
 maxKeepAliveRequests      100
 address                   null
 localPort                 8087
 enableLookups             false
 packetSize                null
 URIEncoding               UTF-8
 minSpareThreads           -1
 executorName              tomcatThreadPool
 ciphersUsed               null
 maxHeaderCount            100
 port                      8087
 xpoweredBy                false

connectionTimeout 20000

The configured value is 20 s.

Code

Under the hood, connectionTimeout is the socket timeout:

// org.apache.coyote.AbstractProtocol#getConnectionTimeout
 /*
     * When Tomcat expects data from the client, this is the time Tomcat will
     * wait for that data to arrive before closing the connection.
     */
public int getConnectionTimeout() {
  // Note that the endpoint uses the alternative name
  return endpoint.getSoTimeout();
}
public void setConnectionTimeout(int timeout) {
  // Note that the endpoint uses the alternative name
  endpoint.setSoTimeout(timeout);
}

// org.apache.tomcat.util.net.AbstractEndpoint#setSoTimeout
public void setSoTimeout(int soTimeout) { socketProperties.setSoTimeout(soTimeout); }

Where it is used: the value is applied as the ReadTimeout and WriteTimeout of the NioSocketWrapper:

//org.apache.tomcat.util.net.NioEndpoint.Poller#register
 /**
         * Registers a newly created socket with the poller.
         *
         * @param socket    The newly created socket
         */
public void register(final NioChannel socket) {
  socket.setPoller(this);
  NioSocketWrapper ka = new NioSocketWrapper(socket, NioEndpoint.this);
  socket.setSocketWrapper(ka);
  ka.setPoller(this);
  // Here: the read/write timeouts
  ka.setReadTimeout(getSocketProperties().getSoTimeout());
  ka.setWriteTimeout(getSocketProperties().getSoTimeout());
  ka.setKeepAliveLeft(NioEndpoint.this.getMaxKeepAliveRequests());
  ka.setSecure(isSSLEnabled());
  ka.setReadTimeout(getSoTimeout());
  ka.setWriteTimeout(getSoTimeout());
  PollerEvent r = eventCache.pop();
  ka.interestOps(SelectionKey.OP_READ);//this is what OP_REGISTER turns into.
  if ( r==null) r = new PollerEvent(socket,ka,OP_REGISTER);
  else r.reset(socket,ka,OP_REGISTER);
  addEvent(r);
}

The Poller thread checks whether each key has timed out. It does not check on every loop; it follows a policy:

However, do process timeouts if any of the following are true:
the selector simply timed out (suggests there isn’t much load)
the nextExpiration time has passed
the server socket is being closed

// org.apache.tomcat.util.net.NioEndpoint.Poller#timeout
protected void timeout(int keyCount, boolean hasEvents) {
  long now = System.currentTimeMillis();
  // This method is called on every loop of the Poller. Don't process
  // timeouts on every loop of the Poller since that would create too
  // much load and timeouts can afford to wait a few seconds.
  // However, do process timeouts if any of the following are true:
  // - the selector simply timed out (suggests there isn't much load)
  // - the nextExpiration time has passed
  // - the server socket is being closed
  if (nextExpiration > 0 && (keyCount > 0 || hasEvents) && (now < nextExpiration) && !close) {
    return;
  }
  //timeout
  int keycount = 0;
  try {
    for (SelectionKey key : selector.keys()) {
      keycount++;
      try {
        NioSocketWrapper ka = (NioSocketWrapper) key.attachment();
        if ( ka == null ) {
          cancelledKey(key); //we don't support any keys without attachments
        } else if (close) {
          key.interestOps(0);
          ka.interestOps(0); //avoid duplicate stop calls
          processKey(key,ka);
        } else if ((ka.interestOps()&SelectionKey.OP_READ) == SelectionKey.OP_READ ||
                   (ka.interestOps()&SelectionKey.OP_WRITE) == SelectionKey.OP_WRITE) {
          boolean isTimedOut = false;
          // Check for read timeout
          // Read timeout
          if ((ka.interestOps() & SelectionKey.OP_READ) == SelectionKey.OP_READ) {
            long delta = now - ka.getLastRead();
            long timeout = ka.getReadTimeout();
            isTimedOut = timeout > 0 && delta > timeout;
          }
          // Check for write timeout
          // Write timeout
          if (!isTimedOut && (ka.interestOps() & SelectionKey.OP_WRITE) == SelectionKey.OP_WRITE) {
            long delta = now - ka.getLastWrite();
            long timeout = ka.getWriteTimeout();
            isTimedOut = timeout > 0 && delta > timeout;
          }
          // Handle the timeout
          if (isTimedOut) {
            key.interestOps(0);
            ka.interestOps(0); //avoid duplicate timeout calls
            ka.setError(new SocketTimeoutException());
            // Note: the event passed here is SocketEvent.ERROR
            if (!processSocket(ka, SocketEvent.ERROR, true)) {
              cancelledKey(key);
            }
          }
        }
      }catch ( CancelledKeyException ckx ) {
        cancelledKey(key);
      }
    }//for
  } catch (ConcurrentModificationException cme) {
    // See https://bz.apache.org/bugzilla/show_bug.cgi?id=57943
    log.warn(sm.getString("endpoint.nio.timeoutCme"), cme);
  }
  long prevExp = nextExpiration; //for logging purposes only
  nextExpiration = System.currentTimeMillis() +
    socketProperties.getTimeoutInterval();
  if (log.isTraceEnabled()) {
    log.trace("timeout completed: keys processed=" + keycount +
              "; now=" + now + "; nextExpiration=" + prevExp +
              "; keyCount=" + keyCount + "; hasEvents=" + hasEvents +
              "; eval=" + ((now < prevExp) && (keyCount>0 || hasEvents) && (!close) ));
  }
}
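
The two decisions in this method can be pulled out as pure functions to make the policy easier to read. PollerTimeoutSketch and its method names are hypothetical; the boolean expressions are copied from the excerpt above.

```java
// Pure-function restatement of the conditions in Poller#timeout; not Tomcat code.
final class PollerTimeoutSketch {

    // Mirrors the early return: skip the scan while the poller is busy,
    // the next expiration has not been reached, and the endpoint is not closing.
    static boolean skipTimeoutScan(long nextExpiration, int keyCount, boolean hasEvents,
                                   long now, boolean close) {
        return nextExpiration > 0 && (keyCount > 0 || hasEvents) && now < nextExpiration && !close;
    }

    // Mirrors the per-key check; a timeout of zero or less disables it.
    static boolean isTimedOut(long now, long lastActivity, long timeout) {
        long delta = now - lastActivity;
        return timeout > 0 && delta > timeout;
    }

    public static void main(String[] args) {
        // Busy poller, expiration not reached: the scan is skipped.
        System.out.println(skipTimeoutScan(1000, 1, false, 500, false));
        // Selector timed out with no keys and no events: the scan runs.
        System.out.println(skipTimeoutScan(1000, 0, false, 500, false));
        // Idle for 25 s with a 20 s timeout: the key is timed out.
        System.out.println(isTimedOut(25_000, 0, 20_000));
    }
}
```

One consequence of the early return is that an idle connection is not necessarily closed at the exact timeout; it is closed on the first scan that runs after the timeout has elapsed.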

Use arthas to watch how the server handles the timeout:

`---ts=2022-10-18 23:38:39;thread_name=http-nio-8087-ClientPoller-0;id=1c;is_daemon=true;priority=5;TCCL=java.net.URLClassLoader@470e2030
    `---[0.269583ms] org.apache.tomcat.util.net.NioEndpoint$Poller:timeout()
        +---[2.06% 0.005542ms ] java.lang.System:currentTimeMillis() #1004
        +---[0.97% 0.002625ms ] java.nio.channels.Selector:keys() #1018
        +---[1.27% 0.003417ms ] java.util.Set:iterator() #1018
        +---[1.64% min=0.00175ms,max=0.002666ms,total=0.004416ms,count=2] java.util.Iterator:hasNext() #1018
        +---[1.04% 0.002791ms ] java.util.Iterator:next() #1018
        +---[0.91% 0.002458ms ] java.nio.channels.SelectionKey:attachment() #1021
        +---[0.99% 0.002666ms ] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper:interestOps() #1028
        +---[0.60% 0.001625ms ] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper:interestOps() #1032
        +---[0.74% 0.002ms ] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper:getLastRead() #1033
        +---[0.85% 0.002291ms ] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper:getReadTimeout() #1034
        +---[18.01% 0.048541ms ] java.nio.channels.SelectionKey:interestOps() #1044
        +---[2.74% 0.007375ms ] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper:interestOps() #1045
        +---[7.31% 0.019708ms ] java.net.SocketTimeoutException:<init>() #1046
        +---[3.38% 0.009125ms ] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper:setError() #1046
        +---[19.18% 0.051709ms ] org.apache.tomcat.util.net.NioEndpoint:processSocket() #1047
        +---[0.79% 0.002125ms ] java.lang.System:currentTimeMillis() #1061
        +---[0.88% 0.002375ms ] org.apache.tomcat.util.net.SocketProperties:getTimeoutInterval() #1061
        +---[0.88% 0.002375ms ] org.apache.tomcat.util.net.NioEndpoint:access$100() #1063
        `---[1.19% 0.003208ms ] org.apache.juli.logging.Log:isTraceEnabled() #1063

On timeout, the Poller creates a SocketTimeoutException and hands the socket to processSocket:

[arthas@22174]$ trace org.apache.tomcat.util.net.AbstractEndpoint processSocket -v -n 5 --skipJDKMethod false '1==1'
Press Q or Ctrl+C to abort.
Affect(class count: 3 , method count: 1) cost in 123 ms, listenerId: 4
Condition express: 1==1 , result: true
`---ts=2022-10-18 23:41:44;thread_name=http-nio-8087-ClientPoller-1;id=1d;is_daemon=true;priority=5;TCCL=java.net.URLClassLoader@470e2030
    `---[0.355583ms] org.apache.tomcat.util.net.AbstractEndpoint:processSocket()
        +---[5.44% 0.019334ms ] org.apache.tomcat.util.collections.SynchronizedStack:pop() #1040
        +---[8.26% 0.029375ms ] org.apache.tomcat.util.net.SocketProcessorBase:reset() #1044
        +---[4.59% 0.016333ms ] org.apache.tomcat.util.net.AbstractEndpoint:getExecutor() #1046
        `---[25.59% 0.091ms ] java.util.concurrent.Executor:execute() #1048

Following the call further to see where it exits:

[arthas@22174]$ trace org.apache.tomcat.util.net.NioEndpoint$SocketProcessor doRun -v -n 5 --skipJDKMethod false '1==1'
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 140 ms, listenerId: 5
Condition express: 1==1 , result: true
`---ts=2022-10-18 23:44:40;thread_name=catalina-exec-2;id=1a;is_daemon=true;priority=5;TCCL=java.net.URLClassLoader@470e2030
    `---[1.595584ms] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor:doRun()
        +---[1.79% 0.028584ms ] org.apache.tomcat.util.net.SocketWrapperBase:getSocket() #1430
        +---[1.14% 0.018125ms ] org.apache.tomcat.util.net.NioChannel:getIOChannel() #1431
        +---[1.92% 0.030625ms ] org.apache.tomcat.util.net.NioChannel:getPoller() #1431
        +---[1.19% 0.018916ms ] org.apache.tomcat.util.net.NioEndpoint$Poller:getSelector() #1431
        +---[30.62% 0.488542ms ] java.nio.channels.SocketChannel:keyFor() #1431
        +---[0.91% 0.0145ms ] org.apache.tomcat.util.net.NioChannel:isHandshakeComplete() #1438
        +---[1.07% 0.017ms ] org.apache.tomcat.util.net.NioEndpoint:getHandler() #1471
        +---[2.11% 0.033708ms ] org.apache.tomcat.util.net.AbstractEndpoint$Handler:process() #1471
        +---[24.60% 0.3925ms ] org.apache.tomcat.util.net.NioEndpoint:access$500() #1474
        `---[1.11% 0.017667ms ] org.apache.tomcat.util.collections.SynchronizedStack:push() #1495
        
[arthas@22174]$ trace org.apache.coyote.AbstractProtocol$ConnectionHandler process -v -n 5 --skipJDKMethod false '1==1'
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 659 ms, listenerId: 7
Condition express: 1==1 , result: true
`---ts=2022-10-18 23:49:21;thread_name=catalina-exec-2;id=1a;is_daemon=true;priority=5;TCCL=java.net.URLClassLoader@470e2030
    `---[0.475083ms] org.apache.coyote.AbstractProtocol$ConnectionHandler:process()
        +---[22.07% 0.104833ms ] org.apache.coyote.AbstractProtocol$ConnectionHandler:getLog() #705
        +---[5.65% 0.026833ms ] org.apache.juli.logging.Log:isDebugEnabled() #705
        +---[2.91% 0.013834ms ] org.apache.tomcat.util.net.SocketWrapperBase:getSocket() #714
        +---[3.99% 0.018959ms ] java.util.Map:get() #716
        +---[1.16% 0.0055ms ] org.apache.coyote.AbstractProtocol$ConnectionHandler:getLog() #717
        `---[0.84% 0.004ms ] org.apache.juli.logging.Log:isDebugEnabled() #717

Because the event passed in is SocketEvent.ERROR, ConnectionHandler returns immediately:

// org.apache.coyote.AbstractProtocol.ConnectionHandler#process
@Override
public SocketState process(SocketWrapperBase<S> wrapper, SocketEvent status) {
  if (getLog().isDebugEnabled()) {
    getLog().debug(sm.getString("abstractConnectionHandler.process",
                                wrapper.getSocket(), status));
  }
  if (wrapper == null) {
    // Nothing to do. Socket has been closed.
    return SocketState.CLOSED;
  }

  S socket = wrapper.getSocket();

  Processor processor = connections.get(socket);
  if (processor != null) {
    // Make sure an async timeout doesn't fire
    getProtocol().removeWaitingProcessor(processor);
  } else if (status == SocketEvent.DISCONNECT || status == SocketEvent.ERROR) {
    // Nothing to do. Endpoint requested a close and there is no
    // longer a processor associated with this socket.
    // This is the branch taken: the connection is closed
    return SocketState.CLOSED;
  }

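Summary

The documentation describes connectionTimeout as the time, in milliseconds, to wait for the request line after a connection has been accepted.
Judging from the NIO code, it is also used as the read and write timeout.
The Poller thread does not check for timeouts on every loop iteration. It follows a policy that preserves the timeout semantics while doing the check when load is low where possible.
Exit flow:
- Poller#timeout
- NioEndpoint$SocketProcessor#run [SocketEvent.ERROR]
- AbstractEndpoint$Handler#process [SocketState.CLOSED]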

References

Apache Tomcat 8 Configuration Reference (8.5.83) - The HTTP Connector

Why doesn't Tomcat's thread pool shrink back?

2022-10-05T16:25:21.000Z

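Symptom

Monitoring shows that very few Tomcat threads are busy and pool utilization is low, yet the pool holds a large number of threads.

Does Tomcat's thread pool have no mechanism for shrinking back?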

[arthas@22]$ mbean | grep -i thread
Catalina:type=ThreadPool,name="http-nio-22441"
java.lang:type=Threading
Catalina:type=ThreadPool,name="http-nio-22441",subType=SocketProperties
[arthas@22]$ mbean Catalina:type=ThreadPool,name=*
 OBJECT_NAME                       Catalina:type=ThreadPool,name="http-nio-22441"
----------------------------------------------------------------------------------
 NAME                              VALUE
----------------------------------------------------------------------------------
 currentThreadsBusy                2
 sslImplementationName             null
 paused                            false
 selectorTimeout                   1000
 modelerType                       org.apache.tomcat.util.net.NioEndpoint
 connectionCount                   46
 acceptCount                       2000
 threadPriority                    5
 executorTerminationTimeoutMillis  5000
 running                           true
 currentThreadCount                916
 sSLEnabled                        false
 sniParseLimit                     65536
 maxThreads                        2000
 sslImplementation                 null
 connectionTimeout                 2000
 tcpNoDelay                        true
 maxConnections                    20000
 connectionLinger                  -1
 keepAliveCount                    1
 keepAliveTimeout                  5000
 maxKeepAliveRequests              2000
 localPort                         22441
 deferAccept                       false
 useSendfile                       true
 acceptorThreadCount               1
 pollerThreadCount                 2
 daemon                            true
 minSpareThreads                   25
 useInheritedChannel               false
 alpnSupported                     false
 acceptorThreadPriority            5
 bindOnInit                        true
 pollerThreadPriority              5
 port                              22441
 domain                            Catalina
 name                              http-nio-22441
 defaultSSLHostConfigName          _default_

Key points:

currentThreadsBusy 2
currentThreadCount 916
maxThreads 2000
minSpareThreads 25

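Only 2 threads are doing work, yet the pool holds 916 threads. Why?

Repeated observations showed the same picture.

Cause

Where the MBean data comes from

First, work out where the MBean data comes from.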

// org.apache.tomcat.util.net.AbstractEndpoint#init
// Register endpoint (as ThreadPool - historical name)
oname = new ObjectName(domain + ":type=ThreadPool,name=\"" + getName() + "\"");
Registry.getRegistry(null, null).registerComponent(this, oname, null);

currentThreadsBusy: the number of threads currently running a task

// org.apache.tomcat.util.net.AbstractEndpoint#getCurrentThreadsBusy
public int getCurrentThreadsBusy() {
  Executor executor = this.executor;
  if (executor != null) {
    if (executor instanceof ThreadPoolExecutor) {
      return ((ThreadPoolExecutor) executor).getActiveCount();
    } else if (executor instanceof ResizableExecutor) {
      return ((ResizableExecutor) executor).getActiveCount();
    } else {
      return -1;
    }
  } else {
    return -2;
  }
}

currentThreadCount: the current number of threads in the pool

// org.apache.tomcat.util.net.AbstractEndpoint#getCurrentThreadCount
public int getCurrentThreadCount() {
  Executor executor = this.executor;
  if (executor != null) {
    if (executor instanceof ThreadPoolExecutor) {
      return ((ThreadPoolExecutor) executor).getPoolSize();
    } else if (executor instanceof ResizableExecutor) {
      return ((ResizableExecutor) executor).getPoolSize();
    } else {
      return -1;
    }
  } else {
    return -2;
  }
}
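
The gap between the two getters can be reproduced with a plain JDK pool. The sketch below is only an illustration: it uses java.util.concurrent.ThreadPoolExecutor as a stand-in for Tomcat's executor, and the class and method names are made up for this example. After the tasks finish, getActiveCount() drops to 0 while getPoolSize() stays at the number of live threads.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BusyVersusPoolSize {
    /** Returns {activeCount, poolSize} sampled after all submitted tasks have finished. */
    static int[] sampleAfterIdle() {
        // Assumption: a JDK pool stands in for Tomcat's executor (3 core threads, unbounded queue).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                3, 3, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch done = new CountDownLatch(3);
        for (int i = 0; i < 3; i++) {
            // Each execute() starts a new worker because the pool is still below its core size.
            pool.execute(done::countDown);
        }
        try {
            done.await();
            Thread.sleep(200); // give the workers time to go back to waiting on the queue
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int[] sample = {pool.getActiveCount(), pool.getPoolSize()};
        pool.shutdown();
        return sample;
    }

    public static void main(String[] args) {
        int[] s = sampleAfterIdle();
        System.out.println("activeCount=" + s[0] + " poolSize=" + s[1]);
    }
}
```

This is the same shape as the production reading: few busy threads, many pooled threads.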

maxThreads: the maximum number of threads

// org.apache.tomcat.util.net.AbstractEndpoint#getMaxThreads
public int getMaxThreads() {
  if (internalExecutor) {
    return maxThreads;
  } else {
    return -1;
  }
}

minSpareThreads: the core pool size

// org.apache.tomcat.util.net.AbstractEndpoint#getMinSpareThreads
public int getMinSpareThreads() {
  return Math.min(getMinSpareThreadsInternal(), getMaxThreads());
}
private int getMinSpareThreadsInternal() {
  if (internalExecutor) {
    return minSpareThreads;
  } else {
    return -1;
  }
}

Initialization of the default thread pool:

// org.apache.tomcat.util.net.AbstractEndpoint#createExecutor
public void createExecutor() {
  // Use the internal thread pool
  internalExecutor = true;
  TaskQueue taskqueue = new TaskQueue();
  TaskThreadFactory tf = new TaskThreadFactory(getName() + "-exec-", daemon, getThreadPriority());
  // Note: this ThreadPoolExecutor is Tomcat's own modified version
  executor = new ThreadPoolExecutor(getMinSpareThreads(), getMaxThreads(), 60, TimeUnit.SECONDS,taskqueue, tf);
  taskqueue.setParent( (ThreadPoolExecutor) executor);
}

The pool initialization shows that minSpareThreads is simply corePoolSize. There is also a hard-coded keepAliveTime of 60s, and the task queue is unbounded.

The thread pool's keepAliveTime

Start with the JDK Javadoc:

@param keepAliveTime when the number of threads is greater than
the core, this is the maximum time that excess idle threads
will wait for new tasks before terminating.

In short, a thread beyond the core size is terminated if it waits for keepAliveTime without receiving a task.

A look at the implementation:
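
A minimal sketch of that rule, using the JDK's own ThreadPoolExecutor rather than Tomcat's variant. The class name, the pool sizes (core 1, max 4) and the 200 ms keepAliveTime are arbitrary choices for the demonstration. A SynchronousQueue is used so that the JDK pool grows to its maximum straight away.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class KeepAliveDemo {
    /** Returns {pool size while 4 tasks run, pool size after idling well past keepAliveTime}. */
    static int[] peakAndIdlePoolSize() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 4, 200, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
        CountDownLatch started = new CountDownLatch(4);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < 4; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        release.await(); // keep the worker busy until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            started.await();
            int peak = pool.getPoolSize();      // all 4 workers exist and are busy
            release.countDown();
            Thread.sleep(1500);                 // idle for much longer than keepAliveTime
            int afterIdle = pool.getPoolSize(); // threads above the core size have exited
            return new int[] {peak, afterIdle};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] r = peakAndIdlePoolSize();
        System.out.println("peak=" + r[0] + " afterIdle=" + r[1]);
    }
}
```

With no traffic at all, the pool falls back to the core size, which is the behaviour the Javadoc promises.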

// java.util.concurrent.ThreadPoolExecutor#runWorker
try {
  // Note: when no task is obtained the loop ends and the thread proceeds to its exit logic
  while (task != null || (task = getTask()) != null) {
    // omitted
    task.run();
  }
  completedAbruptly = false;
} finally {
  // Cleanup when the thread exits
  processWorkerExit(w, completedAbruptly);
}

// Logic for fetching a task
// java.util.concurrent.ThreadPoolExecutor#getTask
for (;;) {
    // Are workers subject to culling?
    boolean timed = allowCoreThreadTimeOut || wc > corePoolSize;

    // If timing out is allowed and a timeout has occurred, return null here; the worker's loop ends and the thread exits
    if ((wc > maximumPoolSize || (timed && timedOut))
        && (wc > 1 || workQueue.isEmpty())) {
      if (compareAndDecrementWorkerCount(c))
        return null;
      continue;
    }

    try {
      // If timing out is allowed (allowCoreThreadTimeOut is set, or worker count > core size), use poll with keepAliveTime as the timeout
      // Otherwise use the blocking take
      Runnable r = timed ?
        workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
      workQueue.take();
      // Got a task from poll or take, so return it
      if (r != null)
        return r;
      // Reaching here means poll timed out
      timedOut = true;
    } catch (InterruptedException retry) {
      timedOut = false;
    }
}

Judging from the source, keepAliveTime itself behaves as documented.

ReentrantLock

Could it be that the task queue's poll hands tasks out evenly across all threads?

When you have eliminated the impossible, whatever remains, however improbable, must be the truth.

Tomcat uses TaskQueue as its queue, which extends LinkedBlockingQueue. The core poll logic is still LinkedBlockingQueue's:

// org.apache.tomcat.util.threads.TaskQueue#poll
@Override
public Runnable poll(long timeout, TimeUnit unit)
  throws InterruptedException {
  Runnable runnable = super.poll(timeout, unit);
  if (runnable == null && parent != null) {
    // the poll timed out, it gives an opportunity to stop the current
    // thread if needed to avoid memory leaks.
    parent.stopCurrentThreadIfNeeded();
  }
  return runnable;
}

//java.util.concurrent.LinkedBlockingQueue#poll(long, java.util.concurrent.TimeUnit)
public E poll(long timeout, TimeUnit unit) throws InterruptedException {
  E x = null;
  int c = -1;
  long nanos = unit.toNanos(timeout);
  final AtomicInteger count = this.count;
  final ReentrantLock takeLock = this.takeLock;
  // Start of locked region
  takeLock.lockInterruptibly();
  try {
    while (count.get() == 0) {
      // The timeout is 0 (none was given, or it has elapsed) and the queue is empty, so return immediately
      if (nanos <= 0)
        return null;
      // Otherwise wait on the lock's condition queue for up to the remaining timeout
      nanos = notEmpty.awaitNanos(nanos);
    }
    // count > 0 here, so take one element
    x = dequeue();
    // Decrement the count
    c = count.getAndDecrement();
    // If more elements remain, signal a thread waiting on the condition queue
    if (c > 1)
      notEmpty.signal();
  } finally {
    // End of locked region
    takeLock.unlock();
  }
  // poll removed one element, so the size is now capacity - 1; hence signalNotFull
  if (c == capacity)
    signalNotFull();
  return x;
}

The key pieces are takeLock and notEmpty: takeLock is a ReentrantLock, which is non-fair by default, and notEmpty is the condition queue of takeLock.

// java.util.concurrent.LinkedBlockingQueue

/** Lock held by take, poll, etc */
private final ReentrantLock takeLock = new ReentrantLock();

/** Wait queue for waiting takes */
private final Condition notEmpty = takeLock.newCondition();

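ReentrantLock is non-fair by default and is built on AQS. Fair and non-fair modes differ only in the first attempt to grab the lock. If that attempt fails, both modes queue up and are then released in order.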

// java.util.concurrent.locks.ReentrantLock.Sync#nonfairTryAcquire
 /**
   * Performs non-fair tryLock.  tryAcquire is implemented in
   * subclasses, but both need nonfair try for trylock method.
   */
@ReservedStackAccess
final boolean nonfairTryAcquire(int acquires) {
  final Thread current = Thread.currentThread();
  int c = getState();
  if (c == 0) {
    // Non-fair, so try to grab the lock directly once
    if (compareAndSetState(0, acquires)) {
      setExclusiveOwnerThread(current);
      return true;
    }
  }
  // Otherwise, check whether the current thread already holds the lock (reentrancy)
  else if (current == getExclusiveOwnerThread()) {
    int nextc = c + acquires;
    if (nextc < 0) // overflow
      throw new Error("Maximum lock count exceeded");
    setState(nextc);
    return true;
  }
  // Failed to acquire; return false
  return false;
}

When QPS is low, lock contention is light. Most threads acquire the lock but find no task, so they can only wait in the condition queue.

// java.util.concurrent.locks.AbstractQueuedSynchronizer.ConditionObject#signal
 /**
         * Moves the longest-waiting thread, if one exists, from the
         * wait queue for this condition to the wait queue for the
         * owning lock.
         *
         * @throws IllegalMonitorStateException if {@link #isHeldExclusively}
         *         returns {@code false}
         */
public final void signal() {
  if (!isHeldExclusively())
    throw new IllegalMonitorStateException();
  Node first = firstWaiter;
  if (first != null)
    doSignal(first);
}

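The condition queue is signalled in queue order (longest-waiting thread first): the wait node is moved from the condition queue to the lock's wait queue, where it competes for the lock again.

At that point there are few competitors, essentially the busy threads plus the thread woken by the signal, so the woken thread will most likely get the task.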
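
The rotation can be observed without Tomcat. In this sketch (class and method names are made up for the example), several threads poll a plain LinkedBlockingQueue the way idle pool workers do, and tasks are offered one at a time at a low rate. Because waiters are signalled in FIFO order, the first N tasks should go to N different threads. The sleeps are assumptions that keep the timing unambiguous.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class ConditionQueueRotation {
    /** Starts `workers` polling threads, offers `tasks` items slowly, and returns who took each item. */
    static List<String> takers(int workers, int tasks) {
        LinkedBlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        List<String> takers = new CopyOnWriteArrayList<>();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    // Like an idle pool worker: wait on notEmpty, take one item, then queue up again.
                    while (queue.poll(5, TimeUnit.SECONDS) != null) {
                        takers.add(Thread.currentThread().getName());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i);
            threads[i].setDaemon(true);
            threads[i].start();
        }
        try {
            Thread.sleep(300);      // let every worker park on the condition queue
            for (int t = 0; t < tasks; t++) {
                queue.offer(t);
                Thread.sleep(50);   // low request rate: one task at a time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            for (Thread th : threads) {
                th.interrupt();     // stop the pollers
            }
        }
        return new ArrayList<>(takers);
    }

    public static void main(String[] args) {
        takers(4, 8).forEach(System.out::println);
    }
}
```

Each worker that finishes a task rejoins the back of the condition queue, so under light load no single thread stays idle for long.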

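Experiment

The root cause: when tasks are scarce, the workers queue up on the notEmpty Condition queue, and when a task arrives they are released in order. If QPS and keepAliveTime line up, every worker thread gets a chance to obtain at least one task within keepAliveTime, so none of them is reclaimed.

Queuing in order

Set maxThreads to 10 and print the name of the thread handling each request. Test code: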

@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
  LOGGER.error("thread is " + Thread.currentThread().getName());
  try {
    Thread.sleep(1_000);
  } catch (InterruptedException e) {
    e.printStackTrace();
  }
  resp.getWriter().write("Hello World! " + Thread.currentThread().getName());
}

Issue 10 curl requests serially:

for i in `seq 1 10`; do curl "http://localhost:8087/web_war_exploded/hello" && echo -e '\n'; done;

Output:

➜  conf  for i in `seq 1 10`; do curl "http://localhost:8087/web_war_exploded/hello" && echo -e '\n'; done;
Hello World! http-nio-8087-exec-8

Hello World! http-nio-8087-exec-9

Hello World! http-nio-8087-exec-1

Hello World! http-nio-8087-exec-2

Hello World! http-nio-8087-exec-3

Hello World! http-nio-8087-exec-4

Hello World! http-nio-8087-exec-5

Hello World! http-nio-8087-exec-7

Hello World! http-nio-8087-exec-9

Hello World! http-nio-8087-exec-10

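The requests are indeed handed out in a roughly round-robin fashion.

Threads shrinking back

Tomcat's default thread pool has a keepAliveTime of 60s. Change maxThreads to 10 and minSpareThreads to 3.

After startup, the mbean output is: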

[arthas@98537]$ mbean Catalina:type=ThreadPool,name=*
 OBJECT_NAME                       Catalina:type=ThreadPool,name="http-nio-8087"
--------------------------------------------------------------------------------------------------------------------------------
 NAME                              VALUE
--------------------------------------------------------------------------------------------------------------------------------
 currentThreadsBusy                0
 running                           true
 currentThreadCount                3
 maxThreads                        10
 minSpareThreads                   3

This matches the settings. Now send a burst of requests to create 10 workers (maxThreads):

for i in `seq 1 10`; do curl -s "http://localhost:8087/web_war_exploded/hello" & done;

The mbean output now:

[arthas@98537]$ mbean Catalina:type=ThreadPool,name=*
 OBJECT_NAME                       Catalina:type=ThreadPool,name="http-nio-8087"
--------------------------------------------------------------------------------------------------------------------------------
 NAME                              VALUE
--------------------------------------------------------------------------------------------------------------------------------
 currentThreadsBusy                0
 running                           true
 currentThreadCount                10
 maxThreads                        10
 minSpareThreads                   3

currentThreadCount is now 10. Wait 1 minute and look again:

[arthas@98537]$ mbean Catalina:type=ThreadPool,name=*
 OBJECT_NAME                       Catalina:type=ThreadPool,name="http-nio-8087"
--------------------------------------------------------------------------------------------------------------------------------
 NAME                              VALUE
--------------------------------------------------------------------------------------------------------------------------------
 currentThreadsBusy                0
 running                           true
 currentThreadCount                3
 maxThreads                        10
 minSpareThreads                   3

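currentThreadCount has dropped back to 3 (minSpareThreads).

Threads not shrinking back

To keep the threads from being reclaimed, each thread only needs one task per minute. With maxThreads at 10, 10 requests per minute (qpm) is enough.

First, ramp up:

for i in `seq 1 10`; do curl -s "http://localhost:8087/web_war_exploded/hello" & done;

Then sustain 10 qpm:

for i in `seq 1 100000`; do curl -s "http://localhost:8087/web_war_exploded/hello" && echo "-n" && sleep 5; done;

The servlet sleeps 1s and the curl loop sleeps 5s, so each request takes 6s, which gives 10 requests per minute. The mbean output now: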

[arthas@98537]$ mbean Catalina:type=ThreadPool,name=*
 OBJECT_NAME                       Catalina:type=ThreadPool,name="http-nio-8087"
--------------------------------------------------------------------------------------------------------------------------------
 NAME                              VALUE
--------------------------------------------------------------------------------------------------------------------------------
 currentThreadsBusy                0
 running                           true
 currentThreadCount                10
 maxThreads                        10
 minSpareThreads                   3

It stays at 10, the same as in production, so the non-shrinking behaviour is reproduced.

Change the sleep to lower the qpm and see whether the pool shrinks partially:

for i in `seq 1 100000`; do curl -s "http://localhost:8087/web_war_exploded/hello" && echo "-n" && sleep 7; done;

The pool gradually drops to 8 threads:

[arthas@98537]$ mbean Catalina:type=ThreadPool,name=* | grep -i currentThreadCount
currentThreadCount 8

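Solution

The critical QPS is maxThreads / keepAliveTime; once request processing time is taken into account, the actual value may be slightly higher. Above the threshold the pool does not shrink; below it, the pool shrinks gradually.

Adjusting keepAliveTime

With Tomcat's default thread pool, keepAliveTime cannot be changed, but a custom executor can be used, which allows setting maxIdleTime (that is, keepAliveTime).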
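
The threshold arithmetic can be written down as a small helper. This is only a sketch of the formula above, with made-up class and method names; it ignores the processing-time correction and applies the formula to the three experiments in this post.

```java
class FallbackThreshold {
    /** Critical request rate in requests per minute: maxThreads / keepAliveTime. */
    static double criticalRequestsPerMinute(int maxThreads, double keepAliveSeconds) {
        return maxThreads * 60.0 / keepAliveSeconds;
    }

    /** Request rate of a serial client that needs secondsPerRequest for each request. */
    static double requestsPerMinute(double secondsPerRequest) {
        return 60.0 / secondsPerRequest;
    }

    /** True when the rate is high enough that every thread is fed within keepAliveTime. */
    static boolean poolStaysFull(int maxThreads, double keepAliveSeconds, double secondsPerRequest) {
        return requestsPerMinute(secondsPerRequest)
                >= criticalRequestsPerMinute(maxThreads, keepAliveSeconds);
    }

    public static void main(String[] args) {
        // 1s servlet sleep + 5s curl sleep, keepAliveTime 60s: 10 qpm against a threshold of 10 qpm
        System.out.println(poolStaysFull(10, 60, 6));
        // 1s + 7s: 7.5 qpm, below the threshold, so the pool shrinks
        System.out.println(poolStaysFull(10, 60, 8));
        // maxIdleTime 10s: the threshold rises to 60 qpm, so 10 qpm is no longer enough
        System.out.println(poolStaysFull(10, 10, 6));
    }
}
```

Lowering keepAliveTime raises the threshold, which is why the custom executor with a 10s maxIdleTime shrinks under the same load.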


<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
          maxThreads="10" minSpareThreads="3" maxIdleTime="10000"/>
<Connector executor="tomcatThreadPool"
           port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />

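With it set to 10s and 10 qpm sustained, the pool shrinks back quickly:

[arthas@54257]$ mbean Catalina:type=ThreadPool,name=* | grep -i currentThreadCount
currentThreadCount 3

Summary

Tomcat's thread pool uses TaskQueue to distribute requests; its poll logic is the same as that of the parent class LinkedBlockingQueue.
Inside LinkedBlockingQueue, when there is no task, polling threads wait in order on notEmpty, a Condition of the ReentrantLock takeLock.
When a task arrives, signal wakes waiters in queue order, first in, first out.
When QPS exceeds maxThreads / keepAliveTime, every thread gets a chance to obtain a task within keepAliveTime and so avoids being reclaimed.
Tomcat's default thread pool does not support setting keepAliveTime; a custom executor solves this.
The JDK thread pool has the same issue, so choose keepAliveTime with care.
Rotating requests across many threads causes frequent context switches, which probably also hurts performance.
Production services usually have health-check probes, which also contribute to the pool not shrinking.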


What happens when Tomcat's queue is full?

2022-10-05T08:08:22.000Z

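Once Tomcat's thread pool is full, requests pile up in the queue. What happens when the queue fills up as well?

Queue length

First look at the queue length. Tomcat's default thread pool uses an unbounded queue: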

// org.apache.tomcat.util.net.AbstractEndpoint#createExecutor
public void createExecutor() {
  internalExecutor = true;
  // Unbounded by default
  TaskQueue taskqueue = new TaskQueue();
  TaskThreadFactory tf = new TaskThreadFactory(getName() + "-exec-", daemon, getThreadPriority());
  executor = new ThreadPoolExecutor(getMinSpareThreads(), getMaxThreads(), 60, TimeUnit.SECONDS,taskqueue, tf);
  taskqueue.setParent( (ThreadPoolExecutor) executor);
}

Fortunately, a custom executor can be configured:

<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
    maxThreads="7" minSpareThreads="4" maxQueueSize="3"/>

<Connector executor="tomcatThreadPool"
           port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />

maxQueueSize can be set here; it is set to 3.

After startup, inspect the mbean with arthas:

[arthas@84145]$ mbean Catalina:type=Executor,name=tomcatThreadPool
 OBJECT_NAME              Catalina:type=Executor,name=tomcatThreadPool
--------------------------------------------------------------------------
 NAME                     VALUE
--------------------------------------------------------------------------
 activeCount              1
 modelerType              org.apache.catalina.core.StandardThreadExecutor
 queueSize                0
 largestPoolSize          7
 poolSize                 4
 maxIdleTime              60000
 threadPriority           5
 daemon                   true
 minSpareThreads          4
 maxQueueSize             3
 stateName                STARTED
 namePrefix               catalina-exec-
 name                     tomcatThreadPool
 corePoolSize             4
 completedTaskCount       16
 maxThreads               7
 prestartminSpareThreads  false
 threadRenewalDelay       1000

maxQueueSize is indeed 3 and maxThreads is 7.

Constructing a full queue

The servlet simply sleeps to hold on to a Tomcat thread:

@Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
      try {
        // Sleep for a long time
        Thread.sleep(1000_000);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
      resp.getWriter().write("Hello World!");
}

On the client side, send 20 concurrent curl requests, more than maxThreads + maxQueueSize = 7 + 3 = 10:

for i in `seq 1 20`; do curl -v http://localhost:8080/web_war_exploded/hello & done

Check Tomcat's state:

[arthas@84145]$ mbean Catalina:type=Executor,name=tomcatThreadPool
 OBJECT_NAME              Catalina:type=Executor,name=tomcatThreadPool
--------------------------------------------------------------------------
 NAME                     VALUE
--------------------------------------------------------------------------
 activeCount              7
 modelerType              org.apache.catalina.core.StandardThreadExecutor
 queueSize                3
 largestPoolSize          7
 poolSize                 7
 maxIdleTime              60000
 threadPriority           5
 daemon                   true
 minSpareThreads          4
 maxQueueSize             3
 stateName                STARTED
 namePrefix               catalina-exec-
 name                     tomcatThreadPool
 corePoolSize             4
 completedTaskCount       29
 maxThreads               7
 prestartminSpareThreads  false
 threadRenewalDelay       1000

queueSize is 3, which has reached maxQueueSize.
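
The capacity arithmetic (7 threads + 3 queue slots = 10 tasks in flight) can be checked with a bounded JDK pool. This is a stand-in sketch, not Tomcat's executor: the JDK pool fills the queue before growing past the core size, whereas Tomcat's TaskQueue prefers to start threads first, but the total capacity is the same. Class and method names are made up for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolRejection {
    /** Submits `requests` blocking tasks and returns {accepted, rejected}. */
    static int[] submit(int maxThreads, int maxQueueSize, int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                Math.min(4, maxThreads), maxThreads, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(maxQueueSize));
        CountDownLatch release = new CountDownLatch(1);
        int accepted = 0;
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // hold the thread, like the sleeping servlet
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                accepted++;
            } catch (RejectedExecutionException e) {
                rejected++;              // all threads busy and the queue is full
            }
        }
        release.countDown();
        pool.shutdown();
        return new int[] {accepted, rejected};
    }

    public static void main(String[] args) {
        int[] r = submit(7, 3, 20);
        System.out.println("accepted=" + r[0] + " rejected=" + r[1]);
    }
}
```

With 20 submissions, everything beyond the 10th is rejected, which corresponds to the state shown by the mbean above.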

If we curl again now, Tomcat should throw a queue-full exception:

➜  ~ curl  http://localhost:8080/web_war_exploded/hello  --trace-ascii -
== Info:   Trying 127.0.0.1:8080...
== Info: Connected to localhost (127.0.0.1) port 8080 (#0)
=> Send header, 100 bytes (0x64)
0000: GET /web_war_exploded/hello HTTP/1.1
0026: Host: localhost:8080
003c: User-Agent: curl/7.79.1
0055: Accept: */*
0062:
== Info: Recv failure: Connection reset by peer
== Info: Closing connection 0
curl: (56) Recv failure: Connection reset by peer

The curl connection was reset outright. Now the Tomcat log:

java.util.concurrent.RejectedExecutionException: The executor's work queue is full
at org.apache.catalina.core.StandardThreadExecutor.execute(StandardThreadExecutor.java:179)
at org.apache.tomcat.util.net.AbstractEndpoint.processSocket(AbstractEndpoint.java:1105)
at org.apache.tomcat.util.net.NioEndpoint$Poller.processKey(NioEndpoint.java:896)
at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:872)
at java.lang.Thread.run(Thread.java:750)

After the task fails to be submitted to the thread pool, Tomcat cancels the key:

[arthas@84145]$ stack org.apache.tomcat.util.net.NioEndpoint$Poller cancelledKey  -n 5
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 696 ms, listenerId: 2
ts=2022-09-30 14:04:45;thread_name=http-nio-8080-ClientPoller-0;id=1c;is_daemon=true;priority=5;TCCL=java.net.URLClassLoader@123772c4
    @org.apache.tomcat.util.net.NioEndpoint$Poller.cancelledKey()
        at org.apache.tomcat.util.net.NioEndpoint$Poller.processKey(NioEndpoint.java:906)
        at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:872)
        at java.lang.Thread.run(Thread.java:750)
        
[arthas@84145]$ trace org.apache.tomcat.util.net.NioEndpoint$Poller cancelledKey  -n 5 --skipJDKMethod false
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 473 ms, listenerId: 4
`---ts=2022-09-30 14:12:30;thread_name=http-nio-8080-ClientPoller-0;id=1c;is_daemon=true;priority=5;TCCL=java.net.URLClassLoader@123772c4
    `---[0.508941ms] org.apache.tomcat.util.net.NioEndpoint$Poller:cancelledKey()
        +---[3.74% 0.01904ms ] java.nio.channels.SelectionKey:attach() #765
        +---[1.98% 0.010098ms ] org.apache.tomcat.util.net.NioEndpoint:getHandler() #769
        +---[4.51% 0.022943ms ] org.apache.tomcat.util.net.AbstractEndpoint$Handler:release() #769
        +---[1.04% 0.005282ms ] java.nio.channels.SelectionKey:isValid() #771
        +---[2.64% 0.013427ms ] java.nio.channels.SelectionKey:cancel() #771
        +---[1.75% 0.008886ms ] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper:getSocket() #778
        +---[10.72% 0.054534ms ] org.apache.tomcat.util.net.NioChannel:close() #778
        +---[2.54% 0.01293ms ] java.nio.channels.SelectionKey:channel() #788
        +---[1.82% 0.009248ms ] java.nio.channels.SelectableChannel:isOpen() #788
        +---[5.49% 0.027926ms ] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper:getSendfileData() #799
        +---[7.03% 0.035797ms ] org.apache.tomcat.util.net.NioEndpoint:countDownConnection() #807
        `---[2.58% 0.01312ms ] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper:access$202() #808

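Analysis of the results

What was observed once Tomcat's thread pool and queue were full:

New HTTP requests get Connection reset by peer and cannot proceed.
The Tomcat log contains a work queue is full exception.

work queue is full

Code location: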

Executes the given command at some time in the future. The command may execute in a new thread, in a pooled thread, or in the calling thread, at the discretion of the Executor implementation.
If no threads are available, it will be added to the work queue.
If the work queue is full, the system will wait for the specified time and it throw a RejectedExecutionException if the queue is still full after that.

// org.apache.tomcat.util.threads.ThreadPoolExecutor#execute(java.lang.Runnable, long, java.util.concurrent.TimeUnit)
// @deprecated This will be removed in Tomcat 10.1.x onwards
 @Deprecated
public void execute(Runnable command, long timeout, TimeUnit unit) {
  submittedCount.incrementAndGet();
  try {
    executeInternal(command);
  } catch (RejectedExecutionException rx) {
    if (getQueue() instanceof TaskQueue) {
      // If the Executor is close to maximum pool size, concurrent
      // calls to execute() may result (due to Tomcat's use of
      // TaskQueue) in some tasks being rejected rather than queued.
      // If this happens, add them to the queue.
      final TaskQueue queue = (TaskQueue) getQueue();
      try {
        // With a TaskQueue, this waits for up to the timeout; if it still fails, the exception is thrown
        if (!queue.force(command, timeout, unit)) {
          submittedCount.decrementAndGet();
          throw new RejectedExecutionException(sm.getString("threadPoolExecutor.queueFull"));
        }
      } catch (InterruptedException x) {
        submittedCount.decrementAndGet();
        throw new RejectedExecutionException(x);
      }
    } else {
      submittedCount.decrementAndGet();
      throw rx;
    }
  }
}

Use arthas to verify whether force is reached:

[arthas@84145]$ stack org.apache.tomcat.util.threads.TaskQueue force  -n 5
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 2) cost in 82 ms, listenerId: 7
ts=2022-09-30 14:52:32;thread_name=http-nio-8080-ClientPoller-1;id=1d;is_daemon=true;priority=5;TCCL=java.net.URLClassLoader@123772c4
    @org.apache.tomcat.util.threads.TaskQueue.force()
        at org.apache.tomcat.util.threads.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:178)
        at org.apache.tomcat.util.threads.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:151)
        at org.apache.catalina.core.StandardThreadExecutor.execute(StandardThreadExecutor.java:175)
        at org.apache.tomcat.util.net.AbstractEndpoint.processSocket(AbstractEndpoint.java:1105)
        at org.apache.tomcat.util.net.NioEndpoint$Poller.processKey(NioEndpoint.java:896)
        at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:872)
        at java.lang.Thread.run(Thread.java:750)
        
[arthas@84145]$ watch org.apache.tomcat.util.threads.TaskQueue force 'params'  -n 5  -x 1
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 2) cost in 45 ms, listenerId: 10
method=org.apache.tomcat.util.threads.TaskQueue.force location=AtExit
ts=2022-09-30 14:55:19; [cost=0.648436ms] result=@Object[][
    @SocketProcessor[org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@5af3cff6],
    @Long[0],
    @[MILLISECONDS],
]

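The force path is indeed taken, but the default timeout is 0, which means no waiting. It amounts to just one extra attempt.

The timeout is also not configurable, so for HTTP requests the feature is effectively disabled.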
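
What a zero timeout means can be shown with the parent class alone. The sketch below assumes that TaskQueue#force ends up in LinkedBlockingQueue's timed offer, which is not shown in the listings here; the class and method names are made up. On a full queue, a timed offer with timeout 0 returns false immediately instead of waiting for space.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class ZeroTimeoutOffer {
    /** Fills a queue of capacity 3, then tries one more timed offer and returns its result. */
    static boolean offerToFullQueue(long timeoutMillis) {
        LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(3);
        for (int i = 0; i < 3; i++) {
            queue.offer(() -> { });   // fill the queue, as with maxQueueSize="3"
        }
        try {
            // Assumption: this mirrors the force(command, 0, MILLISECONDS) call seen in the watch output.
            return queue.offer(() -> { }, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("accepted with timeout 0: " + offerToFullQueue(0));
    }
}
```

A non-zero timeout would let the submitter wait for a worker to free up a slot, but nothing on the HTTP path passes one.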

AbstractEndpoint -> StandardThreadExecutor -> org.apache.tomcat.util.threads.ThreadPoolExecutor

// org.apache.tomcat.util.net.AbstractEndpoint#processSocket
// The executor is held through the Executor interface, which has no way to pass a timeout; what is returned here is the StandardThreadExecutor
Executor executor = getExecutor();
if (dispatch && executor != null) {
  // No way to pass a timeout
  executor.execute(sc);
} else {
  sc.run();
}


// org.apache.catalina.core.StandardThreadExecutor#execute(java.lang.Runnable)
@Override
public void execute(Runnable command) {
  if (executor != null) {
    // Note any RejectedExecutionException due to the use of TaskQueue
    // will be handled by the o.a.t.u.threads.ThreadPoolExecutor
    // No way to pass a timeout
    executor.execute(command);
  } else {
    throw new IllegalStateException(sm.getString("standardThreadExecutor.notStarted"));
  }
}


// org.apache.tomcat.util.threads.ThreadPoolExecutor#execute(java.lang.Runnable)
@Override
public void execute(Runnable command) {
  // timeout 0
  execute(command,0,TimeUnit.MILLISECONDS);
}

Connection reset by peer

Working upstream from where the task is submitted to the pool:

// org.apache.tomcat.util.net.AbstractEndpoint#processSocket
public boolean processSocket(SocketWrapperBase<S> socketWrapper,
                             SocketEvent event, boolean dispatch) {
  try {
    // omitted
    SocketProcessorBase<S> sc = processorCache.pop();
    Executor executor = getExecutor();
    if (dispatch && executor != null) {
      executor.execute(sc);
    } else {
      sc.run();
    }
  } catch (RejectedExecutionException ree) {
    getLog().warn(sm.getString("endpoint.executor.fail", socketWrapper) , ree);
    return false;
  } catch (Throwable t) {
    ExceptionUtils.handleThrowable(t);
    // This means we got an OOM or similar creating a thread, or that
    // the pool and its queue are full
    getLog().error(sm.getString("endpoint.process.fail"), t);
    return false;
  }
  return true;
}

// org.apache.tomcat.util.net.NioEndpoint.Poller#processKey
 if (!processSocket(attachment, SocketEvent.OPEN_READ, true)) {
   closeSocket = true;
 }
  if (closeSocket) {
    cancelledKey(sk);
  }

// org.apache.tomcat.util.net.NioEndpoint.Poller#cancelledKey
// If attachment is non-null then there may be a current
// connection with an associated processor.
1. getHandler().release(ka);
2. key.cancel();
3. ka.getSocket().close(true);
4. countDownConnection();

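AbstractEndpoint#processSocket returns false -> NioEndpoint.Poller#cancelledKey. Cancelling consists of 4 main steps:

[tomcat] getHandler().release(ka);
- Remove the socket from the set currently being processed (connections)
- Return the Http11Processor to the object pool
[nio] key.cancel();
- Handle the selector's deregister logic
[tomcat] ka.getSocket().close(true);
- Close the socket behind the IOChannel
- Close the IOChannel
[tomcat] countDownConnection();
- Decrement the LimitLatch count

So why a TCP reset?

If a socket is closed while its receive buffer (Recv-Q) still holds data the application has not fully read, a TCP Reset is generated.

HTTP parsing is done entirely in the worker threads. Because submitting the task failed, the request bytes were never read. So when the connection is closed, TCP finds unread data in the receive buffer and sends a Reset to the peer.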
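
The same effect can be reproduced with two plain sockets on the loopback interface. This is a sketch under assumptions: the class name and the request bytes are made up, the sleeps only make the ordering unambiguous, and the exact behaviour depends on the operating system's TCP stack (Linux is known to send RST in this case). The server side accepts the connection and closes it without reading the request.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.nio.charset.StandardCharsets;

class UnreadDataResetDemo {
    /** Returns true if the client sees a socket error (reset) rather than a clean end of stream. */
    static boolean clientSeesReset() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            OutputStream out = client.getOutputStream();
            out.write("GET /hello HTTP/1.1\r\nHost: localhost\r\n\r\n"
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();

            Socket accepted = server.accept();
            Thread.sleep(200);  // let the request bytes land in the server-side receive buffer
            accepted.close();   // close without reading, like cancelledKey after a rejected task
            Thread.sleep(200);  // give the RST time to reach the client

            try {
                client.getInputStream().read();
                return false;   // clean EOF or data: no reset was observed
            } catch (SocketException e) {
                return true;    // typically "Connection reset"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("client saw reset: " + clientSeesReset());
    }
}
```

If the server read the request before closing, the client would normally see a clean end of stream instead.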

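Conclusion

The default Executor uses an unbounded queue, so the queue never fills up.
A custom Executor allows maxQueueSize to be set.
After a RejectedExecutionException, Tomcat immediately retries the submission once (the timeout is 0).
If the retry also fails, the cancelledKey logic runs and closes the underlying connection.
HTTP parsing is done in the worker thread pool; because submitting the task failed, the receive buffer still holds data.
When the connection is closed, TCP finds data left in the receive buffer and sends a Reset to the peer.

References

tcp rst产生的几种情况 - 知乎 (Several situations in which a TCP RST is generated, Zhihu)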

tomcat-stringcache

2022-08-07T11:08:35.000Z

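What is StringCache?

HTTP is a text protocol, so the ByteChunk and CharChunk objects used in transit are eventually converted to String. To reduce memory usage and the impact on GC, Tomcat introduced StringCache.

First, the StringCache implementation: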

// org.apache.tomcat.util.buf.StringCache
/**
 * This class implements a String cache for ByteChunk and CharChunk.
 *
 * @author Remy Maucherat
 */
public class StringCache {
  /**
     * Statistics hash map for byte chunk.
     */
    protected static final HashMap<ByteEntry,int[]> bcStats =
            new HashMap<>(cacheSize);


    /**
     * toString count for byte chunk.
     */
    protected static int bcCount = 0;


    /**
     * Cache for byte chunk.
     */
    protected static ByteEntry[] bcCache = null;


    /**
     * Statistics hash map for char chunk.
     */
    protected static final HashMap<CharEntry,int[]> ccStats =
            new HashMap<>(cacheSize);


    /**
     * toString count for char chunk.
     */
    protected static int ccCount = 0;


    /**
     * Cache for char chunk.
     */
    protected static CharEntry[] ccCache = null;


    /**
     * Access count.
     */
    protected static int accessCount = 0;


    /**
     * Hit count.
     */
    protected static int hitCount = 0;


    // ------------------------------------------------------------ Properties
}

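StringCache holds two kinds of entries: those converted from ByteChunk and those converted from CharChunk. The caching logic is the same for both and only the types differ, so we only need to look at one. The cache is backed by an array; for ByteChunk, the element type is ByteEntry: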

//org.apache.tomcat.util.buf.StringCache.ByteEntry
   // -------------------------------------------------- ByteEntry Inner Class
private static class ByteEntry {

  // The underlying byte array
  private byte[] name = null;
  // Charset of the String
  private Charset charset = null;
  // The corresponding String value
  private String value = null;

  @Override
  public String toString() {
    return value;
  }
  @Override
  public int hashCode() {
    return value.hashCode();
  }
  @Override
  public boolean equals(Object obj) {
    if (obj instanceof ByteEntry) {
      return value.equals(((ByteEntry) obj).value);
    }
    return false;
  }
}

The class is self-explanatory, so we won't go through it. When StringCache's toString method is called, it looks in the cache first.

// org.apache.tomcat.util.buf.StringCache#toString(org.apache.tomcat.util.buf.ByteChunk)
if (bcCache == null) {
  // Cache maintenance logic, omitted here and covered later
} else {
  // Access count
  accessCount++;
  // Find the corresponding String
  // Binary search
  String result = find(bc);
  if (result == null) {
    // Cache miss: fall back to the normal conversion
    return bc.toStringInternal();
  }
  // Note: We don't care about safety for the stats
  // Hit count
  hitCount++;
  return result;
}

The cache lookup uses binary search:

//org.apache.tomcat.util.buf.StringCache#findClosest(org.apache.tomcat.util.buf.ByteChunk, org.apache.tomcat.util.buf.StringCache.ByteEntry[], int)
/**
     * Find an entry given its name in a sorted array of map elements.
     * This will return the index for the closest inferior or equal item in the
     * given array.
     * @param name The name to find
     * @param array The array in which to look
     * @param len The effective length of the array
     * @return the position of the best match
     */
protected static final int findClosest(ByteChunk name, ByteEntry[] array,
                                       int len) {

  // low and high bounds of the binary search
  int a = 0;
  int b = len - 1;

  // Special cases: -1 and 0
  if (b == -1) {
    return -1;
  }

  if (compare(name, array[0].name) < 0) {
    return -1;
  }
  if (b == 0) {
    return 0;
  }
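  // Everything above handles the special cases
  int i = 0;
  while (true) {
    // Midpoint; the unsigned shift avoids overflow of (b + a)
    i = (b + a) >>> 1;
    // compare returns -1, 0, or 1
    int result = compare(name, array[i].name);
    // Target is to the right: raise low
    if (result == 1) {
      a = i;
    } else if (result == 0) {
      // Exact match
      return i;
    } else {
      // Target is to the left: lower high
      b = i;
    }
    // Only two candidates left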
    if ((b - a) == 1) {
      int result2 = compare(name, array[b].name);
      if (result2 < 0) {
        return a;
      } else {
        return b;
      }
    }
  }

}
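
To make the contract of findClosest concrete ("closest inferior or equal item"), here is a small standalone sketch. It is not Tomcat's code: it assumes a sorted `String[]` and `String.compareTo` in place of ByteEntry and Tomcat's byte-wise compare, and the class name `ClosestSearch` is made up.

```java
public class ClosestSearch {

    /**
     * Returns the index of the largest element that is <= key in sorted[0..len),
     * or -1 if every element is greater than key (or len is 0).
     */
    public static int findClosest(String[] sorted, int len, String key) {
        int lo = 0;
        int hi = len - 1;
        int best = -1; // best "inferior or equal" candidate seen so far
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            int cmp = sorted[mid].compareTo(key);
            if (cmp == 0) {
                return mid;      // exact match
            } else if (cmp < 0) {
                best = mid;      // sorted[mid] < key: candidate, keep looking right
                lo = mid + 1;
            } else {
                hi = mid - 1;    // sorted[mid] > key: look left
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] names = {"b", "d", "f"};
        for (String key : new String[] {"a", "b", "c", "g"}) {
            System.out.println(key + " -> " + findClosest(names, names.length, key));
        }
    }
}
```

A lookup then only has to check whether the element at the returned index equals the key; the same return value doubles as the insert position when the sorted array is being built.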

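Cache maintenance

Maintenance is the core of any cache. StringCache feels more like a half-finished product: it uses a fixed-size cache.

Early after startup there is a training threshold. Until the number of calls reaches it, StringCache only collects stats; once the threshold is exceeded, it builds the cache from the stats collected so far.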

// org.apache.tomcat.util.buf.StringCache#toString(org.apache.tomcat.util.buf.ByteChunk)
// If the cache is null, then either caching is disabled, or we're
// still training
if (bcCache == null) {
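  // bcCache is null: either 1. we are still training, or 2. the cache is disabled
  // so call the plain toString conversion directly
  String value = bc.toStringInternal();
  // The cache switch is on, so start collecting stats for building the cache
  // Note the upper limit on String length, added for this bug: https://bz.apache.org/bugzilla/show_bug.cgi?id=41057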
  if (byteEnabled && (value.length() < maxStringSize)) {
    // If training, everything is synced
    synchronized (bcStats) {
      // If the cache has been generated on a previous invocation
      // while waiting for the lock, just return the toString
      // value we just calculated
      // Double-checked locking: check again inside the synchronized block
      if (bcCache != null) {
        return value;
      }
      // Two cases: either we just exceeded the train count, in
      // which case the cache must be created, or we just update
      // the count for the string
      // Training threshold exceeded: build the cache
      if (bcCount > trainThreshold) {
        long t1 = System.currentTimeMillis();
        // Sort the entries according to occurrence
        // Group the stats entries by occurrence count
        TreeMap<Integer,ArrayList<ByteEntry>> tempMap =
          new TreeMap<>();
        for (Entry<ByteEntry,int[]> item : bcStats.entrySet()) {
          ByteEntry entry = item.getKey();
          int[] countA = item.getValue();
          Integer count = Integer.valueOf(countA[0]);
          // Add to the list for that count
          ArrayList<ByteEntry> list = tempMap.get(count);
          if (list == null) {
            // Create list
            list = new ArrayList<>();
            tempMap.put(count, list);
          }
          list.add(entry);
        }
        // Allocate array of the right size
        // Must not exceed the cache size limit
        int size = bcStats.size();
        if (size > cacheSize) {
          size = cacheSize;
        }
        ByteEntry[] tempbcCache = new ByteEntry[size];
        // Fill it up using an alphabetical order
        // and a dumb insert sort
        ByteChunk tempChunk = new ByteChunk();
        int n = 0;
        while (n < size) {
          // tempMap is a TreeMap, so lastKey is the highest occurrence count
          Object key = tempMap.lastKey();
          ArrayList<ByteEntry> list = tempMap.get(key);
          // Several entries may share the same count
          for (int i = 0; i < list.size() && n < size; i++) {
            ByteEntry entry = list.get(i);
            tempChunk.setBytes(entry.name, 0,
                               entry.name.length);
            // Binary search for the insert position
            int insertPos = findClosest(tempChunk,
                                        tempbcCache, n);
            if (insertPos == n) {
              tempbcCache[n + 1] = entry;
            } else {
              System.arraycopy(tempbcCache, insertPos + 1,
                               tempbcCache, insertPos + 2,
                               n - insertPos - 1);
              tempbcCache[insertPos + 1] = entry;
            }
            n++;
          }
          // Remove the count that has been processed
          tempMap.remove(key);
        } // while loop
        bcCount = 0;
        // Build finished: clear the stats data
        bcStats.clear();
        bcCache = tempbcCache;
        if (log.isDebugEnabled()) {
          long t2 = System.currentTimeMillis();
          log.debug("ByteCache generation time: " +
                    (t2 - t1) + "ms");
        }
      } else {
        // ----------------- Collecting training data below -----------------
        bcCount++;
        // Allocate new ByteEntry for the lookup
        ByteEntry entry = new ByteEntry();
        entry.value = value;
        int[] count = bcStats.get(entry);
        if (count == null) {
          int end = bc.getEnd();
          int start = bc.getStart();
          // Create byte array and copy bytes
          entry.name = new byte[bc.getLength()];
          System.arraycopy(bc.getBuffer(), start, entry.name,
                           0, end - start);
          // Set encoding
          entry.charset = bc.getCharset();
          // Initialize occurrence count to one
          count = new int[1];
          count[0] = 1;
          // Set in the stats hash map
          bcStats.put(entry, count);
        } else {
          // Update the occurrence count
          count[0] = count[0] + 1;
        }
      }
    }
  }
  return value;
} else {
  // Cached path, omitted here
}
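
Stripped of the ByteChunk details, the flow above is "count while training, then freeze the most frequent values into a sorted array". The sketch below illustrates that idea under simplifying assumptions: it works on plain String keys, sorts with the JDK instead of the insert sort, and uses `Arrays.binarySearch` for lookup. The class `TrainThenFreezeCache` and its methods are hypothetical and not part of Tomcat.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TrainThenFreezeCache {
    private final int trainThreshold;
    private final int cacheSize;
    private final Map<String, int[]> stats = new HashMap<>();
    private int count = 0;
    private String[] cache = null; // null while training; sorted once built

    public TrainThenFreezeCache(int trainThreshold, int cacheSize) {
        this.trainThreshold = trainThreshold;
        this.cacheSize = cacheSize;
    }

    /** Records one observation, or builds the cache once the threshold is exceeded. */
    public synchronized void observe(String value) {
        if (cache != null) {
            return; // already frozen; no further training
        }
        if (count > trainThreshold) {
            build();
            return;
        }
        count++;
        stats.computeIfAbsent(value, k -> new int[1])[0]++;
    }

    private void build() {
        List<Map.Entry<String, int[]>> entries = new ArrayList<>(stats.entrySet());
        // Most frequent first
        entries.sort((x, y) -> Integer.compare(y.getValue()[0], x.getValue()[0]));
        int size = Math.min(cacheSize, entries.size());
        String[] top = new String[size];
        for (int i = 0; i < size; i++) {
            top[i] = entries.get(i).getKey();
        }
        Arrays.sort(top); // sorted so that lookups can use binary search
        stats.clear();
        cache = top;
    }

    /** True if the value made it into the frozen cache. */
    public synchronized boolean isCached(String value) {
        return cache != null && Arrays.binarySearch(cache, value) >= 0;
    }

    public static void main(String[] args) {
        TrainThenFreezeCache c = new TrainThenFreezeCache(4, 2);
        for (String s : new String[] {"a", "a", "a", "b", "b", "c"}) {
            c.observe(s);
        }
        System.out.println("a cached: " + c.isCached("a"));
        System.out.println("c cached: " + c.isCached("c"));
    }
}
```

As in the original, the call that trips the threshold triggers the build and is not itself counted, and nothing seen after that point can enter the cache.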

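Tomcat switches

tomcat.util.buf.StringCache.cacheSize
- Cache size
- Defaults to 200 entries
tomcat.util.buf.StringCache.byte.enabled
- Switch for the ByteChunk cache
- Enabled by default (the stock conf/catalina.properties sets it to true)
tomcat.util.buf.StringCache.char.enabled
- Switch for the CharChunk cache
- Disabled by default
tomcat.util.buf.StringCache.trainThreshold
- Threshold for the number of sampled calls
- Defaults to 20000
tomcat.util.buf.StringCache.maxStringSize
- Maximum length of a cached String
- Added only after a user report; see bug 41057 – Tomcat leaks memory on every request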

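Performance impact

From the source code, the cost of this cache has two parts:

Cache generation (collecting stats up front and building the cache)
Cache lookup (the backing store is a sorted array searched with binary search)

Tomcat's default cache size is 200, but ByteChunk is very low-level: URI parameters, POST body content, header content and so on all go through it, so the cache is easily polluted. Its effectiveness also depends on the traffic right after startup; if that traffic is warm-up requests, the collected samples may not be representative.

In production we observed a CPU overhead of about 1% in some scenarios, spent mostly in the binary search:

What's in it?

Dump the heap and look directly at what StringCache holds:

Use arthas to view the MBean information that Tomcat exposes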

[arthas@96]$ mbean | grep -i StringCache
Catalina:type=StringCache
[arthas@96]$ mbean Catalina:type=StringCache
 OBJECT_NAME     Catalina:type=StringCache                                                                                                               
--------------------------------------------------------                                                                                                 
 NAME            VALUE                                                                                                                                   
--------------------------------------------------------                                                                                                 
 accessCount     2120845422                                                                                                                              
 modelerType     org.apache.tomcat.util.buf.StringCache                                                                                                  
 hitCount        1218278493                                                                                                                              
 cacheSize       200                                                                                                                                     
 trainThreshold  20000                                                                                                                                   
 charEnabled     false                                                                                                                                   
 byteEnabled     true

Note that the counters are int fields and can overflow.
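
As a rough illustration, the sketch below plugs the two counter values from the MBean output above into a hit-ratio calculation and checks how close accessCount is to `Integer.MAX_VALUE`. The class and method names are made up for this example; only `Math.addExact` and `Integer.MAX_VALUE` are JDK APIs.

```java
public class StringCacheCounters {

    /** Hit ratio in double arithmetic; integer division would always give 0. */
    public static double hitRatio(int hitCount, int accessCount) {
        return accessCount == 0 ? 0.0 : (double) hitCount / accessCount;
    }

    /** True if adding delta to an int counter would wrap around. */
    public static boolean wouldOverflow(int counter, int delta) {
        try {
            Math.addExact(counter, delta);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int accessCount = 2120845422; // from the MBean output
        int hitCount = 1218278493;    // from the MBean output
        System.out.printf("hit ratio = %.4f%n", hitRatio(hitCount, accessCount));
        System.out.println("calls left before wrap = " + (Integer.MAX_VALUE - accessCount));
        System.out.println("wraps within 30M more calls: "
                + wouldOverflow(accessCount, 30_000_000));
    }
}
```

Once accessCount wraps to a negative value, a ratio computed from the two counters is no longer meaningful, so the numbers are best read as deltas between two samples.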


Is Java reflection slow?

2022-08-07T10:00:13.000Z

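Is reflection really slow?

Some people say reflection is slow, but hardly anyone has actually measured it. Spring uses reflection in many places, so its performance cannot be that bad.

This post digs into how reflection is implemented and the problems it can cause.

Basic usage

A simple example that reads a field's value by calling its getter through reflection: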

@RunWith(JUnit4ClassRunner.class)
public class ReflectTest {
  public int count = 10;
  public int getCount() {
    try {
      // Print the call stack so we can inspect it
      new RuntimeException().printStackTrace();
    } catch (Throwable ignore) {

    }
    return count;
  }

  private void setCount(int count) {
    this.count = count;
  }
  
  @Test
  @SneakyThrows
  public void testReflection() {
    Class<?> clazz = Class.forName("com.air.lang.reflect.ReflectTest");
    Method getCountMethod = clazz.getDeclaredMethod("getCount");
    final Object instance = clazz.newInstance();
    final Object o = getCountMethod.invoke(instance);
    System.out.println("o = " + o);
  }
}

Running it (with -XX:+TraceClassLoading) prints:

[Loaded sun.reflect.NativeMethodAccessorImpl from /Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre/lib/rt.jar]
[Loaded sun.reflect.DelegatingMethodAccessorImpl from /Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre/lib/rt.jar]
java.lang.RuntimeException
at com.air.lang.reflect.ReflectTest.getCount(ReflectTest.java:21)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.air.lang.reflect.ReflectTest.testReflection(ReflectTest.java:82)
// Below is the stack of JUnit invoking this test method via reflection
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:68)
o = 10

The call stack shows the path taken by Method.invoke:

DelegatingMethodAccessorImpl -> NativeMethodAccessorImpl

Here is the implementation of invoke:

// java.lang.reflect.Method#invoke
@CallerSensitive
public Object invoke(Object obj, Object... args)
  throws IllegalAccessException, IllegalArgumentException,
InvocationTargetException
{
  if (!override) {
    if (!Reflection.quickCheckMemberAccess(clazz, modifiers)) {
      Class<?> caller = Reflection.getCallerClass();
      checkAccess(caller, clazz, obj, modifiers);
    }
  }
  MethodAccessor ma = methodAccessor;             // read volatile
  if (ma == null) {
    ma = acquireMethodAccessor();
  }
  return ma.invoke(obj, args);
}

The call is ultimately delegated to MethodAccessor, which is an interface in the JDK:

// sun.reflect.MethodAccessor

/** This interface provides the declaration for
    java.lang.reflect.Method.invoke(). Each Method object is
    configured with a (possibly dynamically-generated) class which
    implements this interface.
*/

public interface MethodAccessor {
    /** Matches specification in {@link java.lang.reflect.Method} */
    public Object invoke(Object obj, Object[] args)
        throws IllegalArgumentException, InvocationTargetException;
}

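There are three implementing classes. DelegatingMethodAccessorImpl is a proxy whose main job is to switch the underlying implementation, so there are really two implementations: MethodAccessorImpl (the base class that the dynamically generated accessors extend) and NativeMethodAccessorImpl.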

Delegates its invocation to another MethodAccessorImpl and can change its delegate at run time.

A quick timing test

@SneakyThrows
public void testReflection() {
  Class<?> clazz = Class.forName("com.air.lang.reflect.ReflectTest");
  Method getCountMethod = clazz.getDeclaredMethod("getCount");
  final Object instance = clazz.newInstance();
  for (int i = 0; i < 20; i++) {
    // Note: this is nano time
    final long start = System.nanoTime();
    final Object o = getCountMethod.invoke(instance);
    System.out.println(i + 1 + ": cost " + (System.nanoTime() - start));
  }
  System.in.read();
}

1: cost 19792
2: cost 3625
3: cost 2583
4: cost 2333
5: cost 3250
6: cost 4166
7: cost 2459
8: cost 27041
9: cost 7875
10: cost 7500
11: cost 8167
12: cost 7500
13: cost 7250
14: cost 7459
15: cost 7750
16: cost 1085417
17: cost 7208
18: cost 2500
19: cost 1917
20: cost 2292

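Note that the 1st and the 16th calls both take much longer. The default inflation threshold is 15; once the invocation count exceeds 15, the JVM switches to dynamically generated bytecode. That bytecode has to be generated during the 16th call, which is why it is so expensive, and the cost drops again afterwards.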

Two implementations

Before Java 1.4 Method.invoke worked through a JNI call to VM runtime.
Since Java 1.4 Method.invoke uses dynamic bytecode generation if a method is called more than 15 times (configurable via sun.reflect.inflationThreshold system property).

Before Java 1.4, every call went through the native path. Since 1.4, the call is optimized based on an invocation threshold: once the count exceeds -Dsun.reflect.inflationThreshold, it switches to dynamic bytecode generation, which performs better.
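
The switch can be pictured as a delegating accessor whose target is replaced after a number of calls. The following is a toy model of that pattern, not the JDK's code: `Accessor`, `Delegating` and `SlowPath` are made-up names, and the "generated" accessor is just a lambda that returns the string "fast".

```java
public class InflationSketch {

    interface Accessor {
        String invoke();
    }

    /** Stable entry point whose delegate can be swapped at run time. */
    static final class Delegating implements Accessor {
        private volatile Accessor delegate;

        void setDelegate(Accessor delegate) {
            this.delegate = delegate;
        }

        @Override
        public String invoke() {
            return delegate.invoke();
        }
    }

    /** Stand-in for the native path: counts calls and swaps itself out past the threshold. */
    static final class SlowPath implements Accessor {
        private final int threshold;
        private final Delegating parent;
        private int numInvocations;

        SlowPath(int threshold, Delegating parent) {
            this.threshold = threshold;
            this.parent = parent;
        }

        @Override
        public String invoke() {
            if (++numInvocations > threshold) {
                // Later calls go to the "generated" accessor; no lock, as in the original.
                parent.setDelegate(() -> "fast");
            }
            // The call that triggers the switch is still served by the slow path.
            return "slow";
        }
    }

    /** Invokes through the delegate `calls` times and records which path served each call. */
    public static String[] run(int threshold, int calls) {
        Delegating entry = new Delegating();
        entry.setDelegate(new SlowPath(threshold, entry));
        String[] served = new String[calls];
        for (int i = 0; i < calls; i++) {
            served[i] = entry.invoke();
        }
        return served;
    }

    public static void main(String[] args) {
        String[] served = run(15, 20);
        for (int i = 0; i < served.length; i++) {
            System.out.println((i + 1) + ": " + served[i]);
        }
    }
}
```

With a threshold of 15, calls 1 to 16 are served by the slow path and calls 17 to 20 by the replacement, which matches the timing spike on the 16th call in the measurement above.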

Native implementation

// sun.reflect.NativeMethodAccessorImpl#invoke
public Object invoke(Object obj, Object[] args)
        throws IllegalArgumentException, InvocationTargetException
{
  // We can't inflate methods belonging to vm-anonymous classes because
  // that kind of class can't be referred to by name, hence can't be
  // found from the generated bytecode.
  if (++numInvocations > ReflectionFactory.inflationThreshold()
      && !ReflectUtil.isVMAnonymousClass(method.getDeclaringClass())) {
    MethodAccessorImpl acc = (MethodAccessorImpl)
      // Past the threshold, switch to the dynamic bytecode implementation
      // Note that there is no locking here
      new MethodAccessorGenerator().
      generateMethod(method.getDeclaringClass(),
                     method.getName(),
                     method.getParameterTypes(),
                     method.getReturnType(),
                     method.getExceptionTypes(),
                     method.getModifiers());
    // parent is the DelegatingMethodAccessorImpl proxy mentioned earlier
    // Once generation is done, switch over to the new accessor
    parent.setDelegate(acc);
  }

  // This is the native method
  return invoke0(method, obj, args);
}

void setParent(DelegatingMethodAccessorImpl parent) {
  this.parent = parent;
}


private static native Object invoke0(Method m, Object obj, Object[] args);

Let's look at the implementation of this native method in the JDK source:

// NativeAccessors.c
JNIEXPORT jobject JNICALL Java_jdk_internal_reflect_NativeMethodAccessorImpl_invoke0
(JNIEnv *env, jclass unused, jobject m, jobject obj, jobjectArray args)
{
    return JVM_InvokeMethod(env, m, obj, args);
}

// jvm.cpp
JVM_ENTRY(jobject, JVM_InvokeMethod(JNIEnv *env, jobject method, jobject obj, jobjectArray args0))
  Handle method_handle;
  if (thread->stack_overflow_state()->stack_available((address) &method_handle) >= JVMInvokeMethodSlack) {
    method_handle = Handle(THREAD, JNIHandles::resolve(method));
    Handle receiver(THREAD, JNIHandles::resolve(obj));
    objArrayHandle args(THREAD, objArrayOop(JNIHandles::resolve(args0)));
    oop result = Reflection::invoke_method(method_handle(), receiver, args, CHECK_NULL);
    jobject res = JNIHandles::make_local(THREAD, result);
    if (JvmtiExport::should_post_vm_object_alloc()) {
      oop ret_type = java_lang_reflect_Method::return_type(method_handle());
      assert(ret_type != NULL, "sanity check: ret_type oop must not be NULL!");
      if (java_lang_Class::is_primitive(ret_type)) {
        // Only for primitive type vm allocates memory for java object.
        // See box() method.
        JvmtiExport::post_vm_object_alloc(thread, result);
      }
    }
    return res;
  } else {
    THROW_0(vmSymbols::java_lang_StackOverflowError());
  }
JVM_END
  
  
  
// reflection.cpp
  // This would be nicer if, say, java.lang.reflect.Method was a subclass
// of java.lang.reflect.Constructor

oop Reflection::invoke_method(oop method_mirror, Handle receiver, objArrayHandle args, TRAPS) {
  oop mirror             = java_lang_reflect_Method::clazz(method_mirror);
  int slot               = java_lang_reflect_Method::slot(method_mirror);
  bool override          = java_lang_reflect_Method::override(method_mirror) != 0;
  objArrayHandle ptypes(THREAD, objArrayOop(java_lang_reflect_Method::parameter_types(method_mirror)));

  oop return_type_mirror = java_lang_reflect_Method::return_type(method_mirror);
  BasicType rtype;
  if (java_lang_Class::is_primitive(return_type_mirror)) {
    rtype = basic_type_mirror_to_basic_type(return_type_mirror);
  } else {
    rtype = T_OBJECT;
  }

  InstanceKlass* klass = InstanceKlass::cast(java_lang_Class::as_Klass(mirror));
  Method* m = klass->method_with_idnum(slot);
  if (m == NULL) {
    THROW_MSG_0(vmSymbols::java_lang_InternalError(), "invoke");
  }
  methodHandle method(THREAD, m);

  return invoke(klass, method, receiver, override, ptypes, rtype, args, true, THREAD);
}

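The invoke method itself is fairly involved, so we won't follow it further. What we can see is that the native implementation goes through a JNI call and then uses the JVM's internal data structures to complete the method invocation.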

dynamic bytecode generation

The approach with dynamic bytecode generation is much faster since it

does not suffer from JNI overhead;
does not need to parse method signature each time, because each method invoked via Reflection has its own unique MethodAccessor;
can be further optimized, e.g. these MethodAccessors can benefit from all regular JIT optimizations like inlining, constant propagation, autoboxing elimination etc.
Note, that this optimization is implemented mostly in Java code without JVM assistance. The only thing HotSpot VM does to make this optimization possible - is skipping bytecode verification for such generated MethodAccessors. Otherwise the verifier would not allow, for example, to call private methods.

Modify the code slightly:

@Test
@SneakyThrows
public void testReflection() {
  Class<?> clazz = Class.forName("com.air.lang.reflect.ReflectTest");
  Method getCountMethod = clazz.getDeclaredMethod("getCount");
  final Object instance = clazz.newInstance();
  for (int i = 0; i < 20; i++) {
    final Object o = getCountMethod.invoke(instance);
    System.out.println("o = " + o);
  }
  // Block on input so the process does not exit
  System.in.read();
}

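Once running, the program invokes the method 20 times, which exceeds the default threshold, so the bytecode is generated automatically.

Call stack printed by the first call: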

java.lang.RuntimeException
at com.air.lang.reflect.ReflectTest.getCount(ReflectTest.java:21)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.air.lang.reflect.ReflectTest.testReflection(ReflectTest.java:83)

Call stack printed by the last call:

java.lang.RuntimeException
at com.air.lang.reflect.ReflectTest.getCount(ReflectTest.java:21)
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.air.lang.reflect.ReflectTest.testReflection(ReflectTest.java:83)

The call stack has changed: NativeMethodAccessorImpl has been replaced by GeneratedMethodAccessor1.
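
The same observation can be made without reading stack traces by eye. The sketch below (class and method names are made up) records, for each reflective call, the class name of the frame that directly invoked the target method. On a JDK 8 runtime like the one used in this post, the recorded name would be expected to change once the threshold is passed; newer JDKs implement reflection differently, so the names printed there will differ and may not change at all.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class AccessorProbe {

    /** Target of the reflective call: reports the class of the frame that called it. */
    public static String callerFrame() {
        StackTraceElement[] stack = new Throwable().getStackTrace();
        // stack[0] is callerFrame itself; stack[1] is whatever invoked it.
        return stack.length > 1 ? stack[1].getClassName() : "<unknown>";
    }

    /** Invokes callerFrame reflectively `calls` times and collects the reported frames. */
    public static List<String> probe(int calls) {
        try {
            Method target = AccessorProbe.class.getMethod("callerFrame");
            List<String> frames = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                frames.add((String) target.invoke(null));
            }
            return frames;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> frames = probe(20);
        for (int i = 0; i < frames.size(); i++) {
            System.out.println((i + 1) + ": " + frames.get(i));
        }
    }
}
```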

Let's use arthas to find the generated class:

[arthas@45767]$ sc -d *GeneratedMethodAccessor1
 class-info        sun.reflect.GeneratedMethodAccessor1
 code-source
 name              sun.reflect.GeneratedMethodAccessor1
 isInterface       false
 isAnnotation      false
 isEnum            false
 isAnonymousClass  false
 isArray           false
 isLocalClass      false
 isMemberClass     false
 isPrimitive       false
 isSynthetic       false
 simple-name       GeneratedMethodAccessor1
 modifier          public
 annotation
 interfaces
 super-class       +-sun.reflect.MethodAccessorImpl
                     +-sun.reflect.MagicAccessorImpl
                       +-java.lang.Object
 class-loader      +-sun.reflect.DelegatingClassLoader@57fa26b7
                     +-sun.misc.Launcher$AppClassLoader@18b4aac2
                       +-sun.misc.Launcher$ExtClassLoader@6d3a7064
 classLoaderHash   57fa26b7

Affect(row-cnt:1) cost in 29 ms.

Decompile the generated class to take a look:

[arthas@45767]$ jad sun.reflect.GeneratedMethodAccessor1

ClassLoader:
+-sun.reflect.DelegatingClassLoader@57fa26b7
  +-sun.misc.Launcher$AppClassLoader@18b4aac2
    +-sun.misc.Launcher$ExtClassLoader@6d3a7064

Location:

/*
 * Decompiled with CFR.
 *
 * Could not load the following classes:
 *  com.air.lang.reflect.ReflectTest
 */
package sun.reflect;

import com.air.lang.reflect.ReflectTest;
import java.lang.reflect.InvocationTargetException;
import sun.reflect.MethodAccessorImpl;

public class GeneratedMethodAccessor1
extends MethodAccessorImpl {
    /*
     * Loose catch block
     */
    public Object invoke(Object object, Object[] objectArray) throws InvocationTargetException {
        ReflectTest reflectTest;
        block5: {
            if (object == null) {
                throw new NullPointerException();
            }
          // A plain cast from Object to the target type ReflectTest
            reflectTest = (ReflectTest)object;
            if (objectArray == null || objectArray.length == 0) break block5;
            throw new IllegalArgumentException();
        }
        try {
          // Call the target method
            return new Integer(reflectTest.getCount());
        }
        catch (Throwable throwable) {
            throw new InvocationTargetException(throwable);
        }
        catch (ClassCastException | NullPointerException runtimeException) {
            throw new IllegalArgumentException(super.toString());
        }
    }
}

Affect(row-cnt:1) cost in 537 ms.

As you can see, the dynamically generated bytecode is not much different from a direct method call. Note that this class's classloader is sun.reflect.DelegatingClassLoader.

DelegatingClassLoader

What is special about DelegatingClassLoader? The code contains no special implementation, so it appears to exist only for classloader isolation.

// sun.reflect.DelegatingClassLoader
// NOTE: this class's name and presence are known to the virtual
// machine as of the fix for 4474172.
class DelegatingClassLoader extends ClassLoader {
    DelegatingClassLoader(ClassLoader parent) {
        super(parent);
    }
}

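A new class loader is used for performance reasons: it allows these generated classes to be unloaded in some situations. A class can only be unloaded once its class loader can be collected, so if the original class loader were used, these newly created classes might never be unloaded. By design they are not meant to stay in memory forever; they only need to exist while they are needed and can be released when memory is tight.
(from 你假笨, "从一起GC血案谈到反射原理", a post that goes from a GC incident to how reflection works)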

first, it avoids any possible security risk of having these bytecodes in the same loader.
Second, it allows the generated bytecodes to be unloaded earlier
than would otherwise be possible, decreasing run-time footprint.

// jdk.internal.reflect.ClassDefiner
/** Utility class which assists in calling defineClass() by
    creating a new class loader which delegates to the one needed in
    order for proper resolution of the given bytecodes to occur. */

class ClassDefiner {
    static final JavaLangAccess JLA = SharedSecrets.getJavaLangAccess();

    /**  We define generated code into a new class loader which
      delegates to the defining loader of the target class. It is
      necessary for the VM to be able to resolve references to the
      target class from the generated bytecodes, which could not occur
      if the generated code was loaded into the bootstrap class
      loader. 

       There are two primary reasons for creating a new loader
      instead of defining these bytecodes directly into the defining
      loader of the target class: first, it avoids any possible
      security risk of having these bytecodes in the same loader.
      Second, it allows the generated bytecodes to be unloaded earlier
      than would otherwise be possible, decreasing run-time
      footprint. 
    */
    static Class<?> defineClass(String name, byte[] bytes, int off, int len,
                                final ClassLoader parentClassLoader)
    {
        ClassLoader newLoader = AccessController.doPrivileged(
            new PrivilegedAction<ClassLoader>() {
                public ClassLoader run() {
                        return new DelegatingClassLoader(parentClassLoader);
                    }
                });
        return JLA.defineClass(newLoader, name, bytes, null, "__ClassDefiner__");
    }
}

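Problems that heavy use of reflection can cause

As noted above, the switch to dynamic bytecode generation at the threshold is not guarded by a lock, and every generated accessor gets its own class loader. Under high concurrency this can produce too many class loaders and classes, each taking up memory.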


open-tracing

2022-07-07T16:19:21.000Z

Open-Tracing

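Modern microservice architectures are becoming widespread. Once a genuinely high-concurrency production system is decoupled into many microservices, key tasks that used to be easy become hard: optimizing user experience, finding the real cause of backend errors, understanding how components call each other, and so on. Distributed tracing systems (Zipkin, Dapper, HTrace, X-Trace, etc.) address this, but they use incompatible APIs and are hard to integrate with one another.
OpenTracing provides a platform-neutral, vendor-neutral API so that developers can easily add or swap tracing systems.

It is essentially a standardization effort, similar to SLF4J for logging, and is still evolving.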

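Trace concepts

1. Trace:
Broadly, a trace represents the execution of a transaction or workflow through a (distributed) system. In the OpenTracing standard, a trace is a directed acyclic graph (DAG) of spans, where each span is a named, timed, contiguous segment of execution within the trace.

2. Span: a span represents a logical unit of work in the system that has a start time and a duration. Spans are related causally through nesting or sequential ordering.

What the TraceId is for

Tying together a single request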

request-id

{
    "RequestId": "4C467B38-3910-447D-87BC-AC049166F216"
    /* 返回结果数据 */
}

When a third party reports a problem, they can quote this id as evidence, which saves a lot of back-and-forth

[qisheng.li@YD-app-api-01 logs]$ curl -sI 'http://api2.yaduo.com/atourlife/duomicang/queryDuoMiCangTabOtherData?appVer=3.6.0&channelId=10005&platType=1&token=7254035f0e3e4d05bc7af3afb54f313e&deviceId=73519b32-c539-3c18-af4c-ce4523938bb9&activitySource=ydaandroid&activeId=&inactiveId='
HTTP/1.1 200
Date: Fri, 12 Mar 2021 06:19:56 GMT
Content-Type: application/json;charset=UTF-8
Content-Length: 2477
Connection: keep-alive
Set-Cookie: acw_tc=2760829916155299964998880ec4036c629fa0b9319095cdd9fffc150bc930;path=/;HttpOnly;Max-Age=1800
ZIPKIN-TRACE-ID: f39f5791988ff5b2

Correlating logs in ELK
Idempotency

OpenZipkin

Brave

Brave is a distributed tracing instrumentation library.
Brave’s dependency-free tracer library works against JRE6+.

It can be thought of simply as an implementation of the standard (compare logback and log4j)

Trace context propagation

 Client Tracer                                                  Server Tracer     
┌───────────────────────┐                                       ┌───────────────────────┐
│                       │                                       │                       │
│   TraceContext        │          Http Request Headers         │   TraceContext        │
│ ┌───────────────────┐ │         ┌───────────────────┐         │ ┌───────────────────┐ │
│ │ TraceId           │ │         │ X-B3-TraceId      │         │ │ TraceId           │ │
│ │                   │ │         │                   │         │ │                   │ │
│ │ ParentSpanId      │ │ Inject  │ X-B3-ParentSpanId │ Extract │ │ ParentSpanId      │ │
│ │                   ├─┼────────>│                   ├─────────┼>│                   │ │
│ │ SpanId            │ │         │ X-B3-SpanId       │         │ │ SpanId            │ │
│ │                   │ │         │                   │         │ │                   │ │
│ │ Sampling decision │ │         │ X-B3-Sampled      │         │ │ Sampling decision │ │
│ └───────────────────┘ │         └───────────────────┘         │ └───────────────────┘ │
│                       │                                       │                       │
└───────────────────────┘                                       └───────────────────────┘

HTTP request

2021-03-12 01:29:19.624 INFO [order-center,f211feedd7b9904e,9c4b9442005296fb,true] --- [o-9301-exec-131] http.request.response.log                :
ip: 192.168.6.214
POST http://192.168.6.215:9301/point/pay/query/list?
x-b3-spanid: 9c4b9442005296fb
x-b3-parentspanid: 5e901c4a1fb6be73
x-b3-sampled: 1
x-b3-traceid: f211feedd7b9904e
appcode: pms
content-type: application/json;charset=UTF-8
accept: */*
host: 192.168.6.215:9301
connection: Keep-Alive
user-agent: Apache-HttpClient/4.5.6 (Java/1.8.0_171)
accept-encoding: gzip,deflate
atour-time-out: 1000,20000
atour-proxyee-info: http://192.168.6.215:9301

{  "chainId" : 440319,  "folioIdList" : [ 2589101966 ]}

ret code 200, start time 1615483759621 --> end time 1615483759624, cost: 3

Headers starting with x-b3 are propagated downstream automatically
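
The inject/extract step in the diagram can be sketched with plain maps. This is an illustration only and does not use Brave's API: `B3Sketch` and its `TraceContext` class are hypothetical, and the sampling flag is simplified to "sampled only if the header is exactly 1". The header names are the X-B3-* names shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class B3Sketch {

    /** Hypothetical stand-in for a tracer's trace context. */
    public static final class TraceContext {
        public final String traceId;
        public final String parentSpanId; // null for a root span
        public final String spanId;
        public final boolean sampled;

        public TraceContext(String traceId, String parentSpanId, String spanId, boolean sampled) {
            this.traceId = traceId;
            this.parentSpanId = parentSpanId;
            this.spanId = spanId;
            this.sampled = sampled;
        }
    }

    /** Client side: copy the context into the outgoing request headers. */
    public static Map<String, String> inject(TraceContext ctx) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-B3-TraceId", ctx.traceId);
        if (ctx.parentSpanId != null) {
            headers.put("X-B3-ParentSpanId", ctx.parentSpanId);
        }
        headers.put("X-B3-SpanId", ctx.spanId);
        headers.put("X-B3-Sampled", ctx.sampled ? "1" : "0");
        return headers;
    }

    /** Server side: rebuild the context from incoming headers; null if there is no trace. */
    public static TraceContext extract(Map<String, String> headers) {
        // HTTP header names are case-insensitive, so look them up that way.
        Map<String, String> ci = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        ci.putAll(headers);
        String traceId = ci.get("X-B3-TraceId");
        if (traceId == null) {
            return null;
        }
        return new TraceContext(traceId, ci.get("X-B3-ParentSpanId"),
                ci.get("X-B3-SpanId"), "1".equals(ci.get("X-B3-Sampled")));
    }

    public static void main(String[] args) {
        TraceContext client = new TraceContext(
                "f211feedd7b9904e", "5e901c4a1fb6be73", "9c4b9442005296fb", true);
        TraceContext server = extract(inject(client));
        System.out.println(server.traceId + " sampled=" + server.sampled);
    }
}
```

Because every hop copies the same trace id, a single id such as f211feedd7b9904e is enough to find all log lines of one request.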

Sampling:

                                Server Tracer     
                              ┌───────────────────────┐
 Health check request         │                       │
┌───────────────────┐         │   TraceContext        │
│ GET /health       │ Extract │ ┌───────────────────┐ │
│ X-B3-Sampled: 0   ├─────────┼>│ NoOp              │ │
└───────────────────┘         │ └───────────────────┘ │
                              └───────────────────────┘

zipkin

Reporting

Reporting method

@Bean
Tracing tracing(@Value("${spring.application.name}") String serviceName, @Value("${spring.zipkin.base-url:}") String zipkinServer) {
  Reporter reporter = Reporter.NOOP;
  if (StringUtils.isNotBlank(zipkinServer)) {
    reporter = AsyncReporter.builder(OkHttpSender.create(zipkinServer))
      .queuedMaxSpans(1000) // historical constraint. Note: AsyncReporter supports memory bounds
      .messageTimeout(1, TimeUnit.SECONDS)
      .metrics(ReporterMetrics.NOOP_METRICS)
      .build(SpanBytesEncoder.JSON_V2);
  }
  final SamplerProperties samplerProperties = new SamplerProperties();
  // Sample everything by default
  samplerProperties.setProbability(1);
  return Tracing.newBuilder()
    .sampler(new ProbabilityBasedSampler(samplerProperties))
    .localServiceName(serviceName)
    .propagationFactory(ExtraFieldPropagation.newFactory(B3Propagation.FACTORY, "user-name"))
    .currentTraceContext(Slf4jCurrentTraceContext.create(ThreadLocalCurrentTraceContext.newBuilder()
                                                         .build()))
    .spanReporter(reporter)
    .build();
}

- Sampling
- Reporter
- Eureka support

Impact of an outage
- Zipkin UI
- Zipkin storage
MySQL
- A toy
Elasticsearch
- Tuning
  - translog
  - refresh_interval
  - _id
- How many days to retain
- Scheduled deletion script
- Elasticsearch template

System integration

Spring-Cloud

Sleuth

Sleuth configures everything you need to get started. This includes where trace data (spans) are reported to, how many traces to keep (sampling), if remote fields (baggage) are sent, and which libraries are traced.
Spring Cloud Sleuth integrates with the OpenZipkin Brave tracer via the bridge that is available in the spring-cloud-sleuth-brave module.

baggage
- Request-level debug logging switch
  @see Sleuth-debug-flag - Atour Wiki

@NewSpan
@SpanTag
@ContinueSpan

Troubleshooting

References

Zipkin-server operations - Atour Wiki
What is Distributed Tracing?
Spring Cloud Sleuth 2.0 usage overview - BTStream's Blog
GitHub - spring-cloud/spring-cloud-sleuth: Distributed tracing for spring cloud
OpenTracing basic principles - Zhihu
OpenTracing documentation, Chinese edition
GitHub - openzipkin/brave: Java distributed tracing implementation compatible with Zipkin backend services.
Sleuth-debug-flag - Atour Wiki
Introducing to Zipkin - Distribution Tracing - ITZone
OpenZipkin · A distributed tracing system
GitHub - openzipkin/b3-propagation: Repository that describes and sometimes implements B3 propagation
Qunar full-link tracing and debugging (干货 | Qunar全链路跟踪及Debug)
zipkin-Kibana
elk-Discover - Kibana

tomcat-startup-2

2021-12-05T12:07:45.000Z

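Picking up from the previous post: we traced the startup script to the Bootstrap class and found that it is only a messenger that forwards every call to Catalina via reflection. In the official wording this is a roundabout approach, whose purpose is to avoid exposing Tomcat's internal libraries on the class path.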
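
As a reminder of what that forwarding looks like, here is a minimal sketch of calling a start method purely by class name. It is not Tomcat's Bootstrap code: `FakeCatalina` is a stand-in for the real Catalina class, and it is loaded from the sketch's own class loader rather than from a dedicated one as Tomcat does.

```java
import java.lang.reflect.Method;

public class BootstrapSketch {

    /** Stand-in for org.apache.catalina.startup.Catalina. */
    public static class FakeCatalina {
        public boolean started;

        public void start() {
            started = true;
        }
    }

    /** Loads the class by name, instantiates it and calls start() via reflection. */
    public static Object startByReflection(String className) {
        try {
            ClassLoader loader = BootstrapSketch.class.getClassLoader();
            Class<?> startupClass = loader.loadClass(className);
            Object daemon = startupClass.getConstructor().newInstance();
            Method start = startupClass.getMethod("start");
            start.invoke(daemon);
            return daemon;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot start " + className, e);
        }
    }

    public static void main(String[] args) {
        Object daemon = startByReflection(FakeCatalina.class.getName());
        System.out.println("started = " + ((FakeCatalina) daemon).started);
    }
}
```

The caller only needs a class name and a method name at compile time, which is what lets the launcher stay free of compile-time references to the container's internals.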

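In this post we analyze the startup of Catalina and of Tomcat's key internal components.

First, the overall component picture. As declared in server.xml, the main components are Catalina, Server, Service, Connector, Engine, Host, Context and Wrapper, plus components not drawn in the diagram such as Valve and Listener.

Component startup order: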

Catalina

Startup/Shutdown shell program for Catalina.

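Catalina parses the command-line arguments and holds the Server object. Its main functions:

start
- Digester parses server.xml
- calls the Server's init method
stop
- ShutdownHook
configtest

From the earlier analysis we know that Bootstrap calls Catalina's start method directly via reflection. The start method is implemented as follows: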

// org.apache.catalina.startup.Catalina#start
    /**
     * Start a new server instance.
     */
    public void start() {

        if (getServer() == null) {
          // The first call lands here: load() parses server.xml and initializes the corresponding components
            load();
        }

        if (getServer() == null) {
            log.fatal("Cannot start server. Server instance is not configured.");
            return;
        }

        long t1 = System.nanoTime();

        // Start the new server
        try {
          // Call the Server's start method
            getServer().start();
        } catch (LifecycleException e) {
            log.fatal(sm.getString("catalina.serverStartFail"), e);
            try {
                getServer().destroy();
            } catch (LifecycleException e1) {
                log.debug("destroy() failed for failed Server ", e1);
            }
            return;
        }

        long t2 = System.nanoTime();
        if(log.isInfoEnabled()) {
            log.info("Server startup in " + ((t2 - t1) / 1000000) + " ms");
        }

        // Register shutdown hook
        if (useShutdownHook) {
            if (shutdownHook == null) {
                shutdownHook = new CatalinaShutdownHook();
            }
            Runtime.getRuntime().addShutdownHook(shutdownHook);

            // If JULI is being used, disable JULI's shutdown hook since
            // shutdown hooks run in parallel and log messages may be lost
            // if JULI's hook completes before the CatalinaShutdownHook()
            LogManager logManager = LogManager.getLogManager();
            if (logManager instanceof ClassLoaderLogManager) {
                ((ClassLoaderLogManager) logManager).setUseShutdownHook(
                        false);
            }
        }

      // Bootstrap sets await to true on startup
      // Call the Server's await; once it returns, call our own stop method
        if (await) {
            await();
            stop();
        }
    }

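The load method is where server.xml is actually parsed, so we will not go through it here. load also calls the Server's init method to initialize it, and binds the Server to the Catalina that owns it.

After initialization, start calls the Server's start method directly, which triggers the startup of the components it contains. It also registers a JVM shutdown hook, so Catalina's stop method is called when the JVM shuts down as well.

Finally it calls the Server's await method and waits for the Server's lifecycle to end.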

Server

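Server is one of Tomcat's more important components; the default implementation is StandardServer. Its main functions:

Manage Service components
- addService
- removeService
- findService
Listen on the shutdown port
Naming-related functions
Allow a ParentClassLoader to be set (covered together with class loading in a later post)

Server implements the Lifecycle interface. We focus on its initInternal and startInternal methods.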

initInternal

// org.apache.catalina.core.StandardServer#initInternal
@Override
    protected void initInternal() throws LifecycleException {

        super.initInternal();

        // Register global String cache
        // Note although the cache is global, if there are multiple Servers
        // present in the JVM (may happen when embedding) then the same cache
        // will be registered under multiple names
        onameStringCache = register(new StringCache(), "type=StringCache");

        // Register the MBeanFactory
        MBeanFactory factory = new MBeanFactory();
        factory.setContainer(this);
        onameMBeanFactory = register(factory, "type=MBeanFactory");

        // Register the naming resources
        globalNamingResources.init();

        // Populate the extension validator with JARs from common and shared
        // class loaders
       // omitted...
      
        // Initialize our defined Services
        for (int i = 0; i < services.length; i++) {
            services[i].init();
        }
    }

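From the earlier post we know that Server extends LifecycleMBeanBase by default and is therefore exposed to JMX automatically. Here Server additionally registers further MBeans by hand (StringCache and MBeanFactory), then initializes the naming resources and the extension validator. Last and most important, it calls init on every Service the Server contains, triggering their initialization.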

startInternal

// org.apache.catalina.core.StandardServer#startInternal
    @Override
    protected void startInternal() throws LifecycleException {

        fireLifecycleEvent(CONFIGURE_START_EVENT, null);
        setState(LifecycleState.STARTING);

        globalNamingResources.start();

        // Start our defined Services
        synchronized (servicesLock) {
            for (int i = 0; i < services.length; i++) {
                services[i].start();
            }
        }
    }

Besides the events that the base class fires by default, this method fires its own CONFIGURE_START_EVENT and then starts the naming resources. Finally it calls start on each Service.

await

Catalina calls the Server's await to wait until the Server stops serving. await is implemented as follows:

// org.apache.catalina.core.StandardServer#await
 /**
     * Wait until a proper shutdown command is received, then return.
     * This keeps the main thread alive - the thread pool listening for http
     * connections is daemon threads.
     */
    @Override
    public void await() {
        // Negative values - don't wait on port - tomcat is embedded or we just don't like ports
        if( port == -2 ) {
            // undocumented yet - for embedding apps that are around, alive.
            return;
        }
      // With port == -1 there is no shutdown port: just check every 10s whether the server should stop
      // The loop polls the stopAwait flag, which is of course volatile
        if( port==-1 ) {
            try {
                awaitThread = Thread.currentThread();
                while(!stopAwait) {
                    try {
                        Thread.sleep( 10000 );
                    } catch( InterruptedException ex ) {
                        // continue and check the flag
                    }
                }
            } finally {
                awaitThread = null;
            }
            return;
        }

      // Open a ServerSocket on the shutdown port and listen for incoming commands
        // Set up a server socket to wait on
        try {
            awaitSocket = new ServerSocket(port, 1,
                    InetAddress.getByName(address));
        } catch (IOException e) {
            log.error("StandardServer.await: create[" + address
                               + ":" + port
                               + "]: ", e);
            return;
        }

        try {
            awaitThread = Thread.currentThread();

            // Loop waiting for a connection and a valid command
            while (!stopAwait) {
                ServerSocket serverSocket = awaitSocket;
                if (serverSocket == null) {
                    break;
                }

                // Wait for the next connection
                Socket socket = null;
                StringBuilder command = new StringBuilder();
                try {
                    InputStream stream;
                    long acceptStartTime = System.currentTimeMillis();
                    try {
                        socket = serverSocket.accept();
                        socket.setSoTimeout(10 * 1000);  // Ten seconds
                        stream = socket.getInputStream();
                    } catch (SocketTimeoutException ste) {
                        // This should never happen but bug 56684 suggests that
                        // it does.
                        log.warn(sm.getString("standardServer.accept.timeout",
                                Long.valueOf(System.currentTimeMillis() - acceptStartTime)), ste);
                        continue;
                    } catch (AccessControlException ace) {
                        log.warn("StandardServer.accept security exception: "
                                + ace.getMessage(), ace);
                        continue;
                    } catch (IOException e) {
                        if (stopAwait) {
                            // Wait was aborted with socket.close()
                            break;
                        }
                        log.error("StandardServer.await: accept: ", e);
                        break;
                    }

                    // Read a set of characters from the socket
                    int expected = 1024; // Cut off to avoid DoS attack
                    while (expected < shutdown.length()) {
                        if (random == null)
                            random = new Random();
                        expected += (random.nextInt() % 1024);
                    }
                    while (expected > 0) {
                        int ch = -1;
                        try {
                            ch = stream.read();
                        } catch (IOException e) {
                            log.warn("StandardServer.await: read: ", e);
                            ch = -1;
                        }
                        // Control character or EOF (-1) terminates loop
                        if (ch < 32 || ch == 127) {
                            break;
                        }
                        command.append((char) ch);
                        expected--;
                    }
                } finally {
                    // Close the socket now that we are done with it
                    try {
                        if (socket != null) {
                            socket.close();
                        }
                    } catch (IOException e) {
                        // Ignore
                    }
                }

                // Match against our command string
                boolean match = command.toString().equals(shutdown);
                if (match) {
                    log.info(sm.getString("standardServer.shutdownViaPort"));
                    break;
                } else
                    log.warn("StandardServer.await: Invalid command '"
                            + command.toString() + "' received");
            }
        } finally {
            ServerSocket serverSocket = awaitSocket;
            awaitThread = null;
            awaitSocket = null;

            // Close the server socket and return
            if (serverSocket != null) {
                try {
                    serverSocket.close();
                } catch (IOException e) {
                    // Ignore
                }
            }
        }
    }

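When a shutdown port is configured, the Server listens on it. If the command received matches the configured shutdown string (SHUTDOWN here), await returns and Catalina then calls its own stop method:

<Server port="8005" shutdown="SHUTDOWN">

A quick test: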

➜  bin  telnet localhost 8005
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
sdf
Connection closed by foreign host.

# The command above was wrong, so Tomcat ignored it; port 8005 still accepts connections
➜  bin  telnet localhost 8005
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
SHUTDOWN
Connection closed by foreign host.

# Tomcat has now been shut down
➜  bin  telnet localhost 8005
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
telnet: Unable to connect to remote host

When the shutdown happens, the following line is written to catalina.out:

27-Nov-2021 21:00:14.933 INFO [main] org.apache.catalina.core.StandardServer.await A valid shutdown command was received via the shutdown port. Stopping the Server instance.

If Tomcat ever shuts down for no apparent reason, it is worth checking whether someone sent a command to this port.
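To make the await loop easier to picture, here is a stripped-down sketch of the same idea. It is not Tomcat's code: `ShutdownPortSketch`, `port()`, `await()` and `send()` are hypothetical names, the socket binds to an OS-chosen port on the loopback address, and the error handling is much thinner than in StandardServer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class ShutdownPortSketch {
    private final ServerSocket serverSocket;
    private final String shutdownCommand;

    ShutdownPortSketch(String shutdownCommand) throws IOException {
        // Port 0 asks the OS for any free port; 50 is the usual default backlog
        this.serverSocket = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        this.shutdownCommand = shutdownCommand;
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    /** Blocks until a client sends the shutdown command; returns how many other commands were rejected. */
    int await() throws IOException {
        int rejected = 0;
        try {
            while (true) {
                StringBuilder command = new StringBuilder();
                try (Socket socket = serverSocket.accept()) {
                    socket.setSoTimeout(10_000); // do not wait forever on a silent client
                    InputStream in = socket.getInputStream();
                    int remaining = 1024;        // cap the command length
                    while (remaining-- > 0) {
                        int ch = in.read();
                        // A control character or EOF (-1) ends the command
                        if (ch < 32 || ch == 127) {
                            break;
                        }
                        command.append((char) ch);
                    }
                }
                if (command.toString().equals(shutdownCommand)) {
                    return rejected;
                }
                rejected++;
            }
        } finally {
            serverSocket.close();
        }
    }

    /** Test helper: connects, sends one line, disconnects. */
    static void send(int port, String text) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port)) {
            OutputStream out = socket.getOutputStream();
            out.write((text + "\n").getBytes(StandardCharsets.US_ASCII));
            out.flush();
        }
    }
}
```

Running `await()` on one thread and calling `send(port, "sdf")` followed by `send(port, "SHUTDOWN")` from another reproduces the telnet session above: the first command is ignored and the second makes `await()` return.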

Service

A “Service” is a collection of one or more “Connectors” that share
a single “Container” Note: A “Service” is not itself a “Container”,
so you may not define subcomponents such as “Valves” at this level.

A Service ties several Connectors to a single Container. Its main functions:

Manage the Engine
- getContainer/setContainer
Manage Connector components
- addConnector
- findConnectors
- removeConnector
Manage executors
- addExecutor
- findExecutors
- getExecutor
- removeExecutor
Manage the Mapper/MapperListener

initInternal

init is equally unremarkable: it calls init in turn on each managed Engine/Connector/Executor/MapperListener.

startInternal

Like initInternal, it calls start on the child components.

Other notes

Connectors are stored in an array, and every modification takes a lock:

/**
     * The set of Connectors associated with this Service.
     */
protected Connector connectors[] = new Connector[0];
private final Object connectorsLock = new Object();

 /**
     * Add a new Connector to the set of defined Connectors, and associate it
     * with this Service's Container.
     *
     * @param connector The Connector to be added
     */
    @Override
    public void addConnector(Connector connector) {

        synchronized (connectorsLock) {
          // omitted
        }
    }
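Since the body of addConnector is elided above, here is a minimal sketch of the general "replace the array under a lock" pattern. It is an illustration rather than StandardService's code: `ConnectorArraySketch` is a hypothetical class, `String` stands in for `Connector`, and returning a defensive copy from `findConnectors` is my own choice.

```java
import java.util.Arrays;

final class ConnectorArraySketch {
    // String stands in for Connector in this sketch
    private String[] connectors = new String[0];
    private final Object connectorsLock = new Object();

    void addConnector(String connector) {
        synchronized (connectorsLock) {
            // Build a new array one slot longer and swap it in
            String[] results = Arrays.copyOf(connectors, connectors.length + 1);
            results[connectors.length] = connector;
            connectors = results;
        }
    }

    boolean removeConnector(String connector) {
        synchronized (connectorsLock) {
            int index = Arrays.asList(connectors).indexOf(connector);
            if (index < 0) {
                return false;
            }
            // Copy everything except the removed slot into a shorter array
            String[] results = new String[connectors.length - 1];
            System.arraycopy(connectors, 0, results, 0, index);
            System.arraycopy(connectors, index + 1, results, index,
                    connectors.length - index - 1);
            connectors = results;
            return true;
        }
    }

    String[] findConnectors() {
        synchronized (connectorsLock) {
            return connectors.clone();
        }
    }
}
```

Writers never mutate an array that a reader may already hold; they always publish a fresh one, which keeps iteration over a previously returned array safe.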

When an important property changes, a PropertyChangeEvent is fired:

/**
 * The property change support for this component.
 */
protected final PropertyChangeSupport support = new PropertyChangeSupport(this);

// Report this property change to interested listeners
support.firePropertyChange("container", oldEngine, this.engine);
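The two fragments above can be put together into a small runnable sketch. `PropertyChangeSketch` is a hypothetical class and `String` stands in for `Engine`; only `java.beans.PropertyChangeSupport` itself is the real JDK API.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

final class PropertyChangeSketch {
    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private String engine; // String stands in for Engine

    void addPropertyChangeListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener);
    }

    void setContainer(String engine) {
        String oldEngine = this.engine;
        this.engine = engine;
        // Report this property change to interested listeners
        support.firePropertyChange("container", oldEngine, this.engine);
    }

    public static void main(String[] args) {
        PropertyChangeSketch service = new PropertyChangeSketch();
        service.addPropertyChangeListener(event -> System.out.println(
                event.getPropertyName() + ": " + event.getOldValue() + " -> " + event.getNewValue()));
        service.setContainer("engineA");
        service.setContainer("engineB");
    }
}
```

One detail worth knowing: `firePropertyChange` does not fire an event when the old and new values are equal and non-null, so setting the same engine twice notifies listeners only once.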

Engine

If used, an Engine is always the top level Container in a Catalina hierarchy.
It is useful in the following types of scenarios:
You wish to use Interceptors that see every single request processed
by the entire engine.
You wish to run Catalina in with a standalone HTTP connector, but still
want support for multiple virtual hosts.

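An Engine's child containers must be Host containers, and the Engine itself must be the top-level container, meaning it cannot have a parent container. Valves configured on the Engine can intercept every request, and several virtual hosts can be configured under it.

The default implementation is StandardEngine. It extends ContainerBase, which implements the management of child containers and of ContainerListeners.

initInternal

Engine has no special implementation of its own; the logic lives in ContainerBase: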

// org.apache.catalina.core.ContainerBase#initInternal
@Override
protected void initInternal() throws LifecycleException {
  BlockingQueue<Runnable> startStopQueue = new LinkedBlockingQueue<>();
  startStopExecutor = new ThreadPoolExecutor(
    getStartStopThreadsInternal(),
    getStartStopThreadsInternal(), 10, TimeUnit.SECONDS,
    startStopQueue,
    new StartStopThreadFactory(getName() + "-startStop-"));
  startStopExecutor.allowCoreThreadTimeOut(true);
  super.initInternal();
}

It only initializes a startStopExecutor.

startInternal

The logic is also in ContainerBase:

// org.apache.catalina.core.ContainerBase#startInternal
    @Override
    protected synchronized void startInternal() throws LifecycleException {

        // Start our subordinate components, if any

        // Start our child containers, if any
        Container children[] = findChildren();
        List<Future<Void>> results = new ArrayList<>();
        for (int i = 0; i < children.length; i++) {
            // Child containers are started on the thread pool created above
            results.add(startStopExecutor.submit(new StartChild(children[i])));
        }

        boolean fail = false;
        for (Future<Void> result : results) {
            try {
                result.get();
            } catch (Exception e) {
                log.error(sm.getString("containerBase.threadedStartFailed"), e);
                fail = true;
            }

        }
        if (fail) {
            throw new LifecycleException(
                    sm.getString("containerBase.threadedStartFailed"));
        }

        // Start the Valves in our pipeline (including the basic), if any
        if (pipeline instanceof Lifecycle)
            ((Lifecycle) pipeline).start();


        setState(LifecycleState.STARTING);

      // Background thread
        // Start our thread
        threadStart();

    }
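The "submit every child, then wait on every Future" pattern used by startInternal can be sketched in isolation. `ParallelStartSketch`, `Child` and `startAll` are hypothetical names, and a plain fixed thread pool replaces Tomcat's startStopExecutor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelStartSketch {

    /** Hypothetical child component; only start() matters here. */
    interface Child {
        void start() throws Exception;
    }

    /** Starts every child on a pool, waits for all of them, and returns how many failed. */
    static int startAll(List<Child> children, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> results = new ArrayList<>();
            for (Child child : children) {
                Callable<Void> task = () -> {
                    child.start();
                    return null;
                };
                results.add(pool.submit(task));
            }

            int failures = 0;
            for (Future<Void> result : results) {
                try {
                    // get() blocks until this child has finished starting
                    result.get();
                } catch (ExecutionException e) {
                    // The child's own exception is available as e.getCause()
                    failures++;
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }
}
```

As in ContainerBase, a failing child does not stop the others from being started; the failures are only collected after every Future has been waited on.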

Host

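The Engine's child container is the Host container, which corresponds to the host in a URL. It is configured in server.xml like this:

<Host name="localhost" appBase="webapps" unpackWARs="true" autoDeploy="true">

The configuration specifies the host's deployment directory (webapps, for example), whether WAR files are unpacked automatically, whether applications are deployed automatically, and similar attributes. The default implementation is StandardHost. Its init and start have no special logic beyond installing the error report valve. The valve mechanism will be covered in detail later, as part of request processing.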

Context

A Context is a Container that represents a servlet context, and therefore an individual web application, in the Catalina servlet engine.

A Context represents one Tomcat application, that is, one directory under appBase. It can contain one or more Servlets.

Wrapper

Standard implementation of the Wrapper interface that represents an individual servlet definition. No child Containers are allowed, and the parent Container must be a Context.

A Wrapper wraps a servlet. The default implementation is StandardWrapper; its init and start have no special logic.

Connector

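The Connector component handles network connections, protocol parsing and so on. Network protocol handling is a very important part of Tomcat, and the implementations of the different protocols will be analyzed separately later.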

initInternal

// org.apache.catalina.connector.Connector#initInternal
    @Override
    protected void initInternal() throws LifecycleException {

        super.initInternal();

        // Initialize adapter
        adapter = new CoyoteAdapter(this);
        protocolHandler.setAdapter(adapter);

      // omitted
      
        try {
            protocolHandler.init();
        } catch (Exception e) {
            throw new LifecycleException(
                    sm.getString("coyoteConnector.protocolHandlerInitializationFailed"), e);
        }
    }

This is mainly the initialization of the protocolHandler.

startInternal

// org.apache.catalina.connector.Connector#startInternal
    @Override
    protected void startInternal() throws LifecycleException {

        // Validate settings before starting
        if (getPort() < 0) {
            throw new LifecycleException(sm.getString(
                    "coyoteConnector.invalidPort", Integer.valueOf(getPort())));
        }

        setState(LifecycleState.STARTING);

        try {
            protocolHandler.start();
        } catch (Exception e) {
            String errPrefix = "";
            if(this.service != null) {
                errPrefix += "service.getName(): \"" + this.service.getName() + "\"; ";
            }

            throw new LifecycleException
                (errPrefix + " " + sm.getString
                 ("coyoteConnector.protocolHandlerStartFailed"), e);
        }
    }

Likewise, the work is delegated to the protocolHandler.

Executor

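Executor is also a standard Tomcat component; its default implementation class is StandardThreadExecutor. It can be configured under the Service element in server.xml and is not configured by default. Tomcat ships an example: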

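<!--
<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
    maxThreads="150" minSpareThreads="4"/>
-->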

If an Executor is defined here, Connectors can share it. This is wired up while server.xml is parsed:

// org.apache.catalina.startup.ConnectorCreateRule#begin
 @Override
    public void begin(String namespace, String name, Attributes attributes)
            throws Exception {
        Service svc = (Service)digester.peek();
        Executor ex = null;
        if ( attributes.getValue("executor")!=null ) {
          // If the executor attribute is set, look up the matching executor in the service
            ex = svc.getExecutor(attributes.getValue("executor"));
        }
        Connector con = new Connector(attributes.getValue("protocol"));
        if (ex != null) {
          // Use the shared executor
            setExecutor(con, ex);
        }
        String sslImplementationName = attributes.getValue("sslImplementationName");
        if (sslImplementationName != null) {
            setSSLImplementationName(con, sslImplementationName);
        }
        digester.push(con);
    }

executor A reference to the name in an Executor element. If this attribute is set, and the named executor exists, the connector will use the executor, and all the other thread attributes will be ignored. Note that if a shared executor is not specified for a connector then the connector will use a private, internal executor to provide the thread pool

initInternal

No special logic.

startInternal

//org.apache.catalina.core.StandardThreadExecutor#startInternal
 /**
     * Start the component and implement the requirements
     * of {@link org.apache.catalina.util.LifecycleBase#startInternal()}.
     *
     * @exception LifecycleException if this component detects a fatal error
     *  that prevents this component from being used
     */
    @Override
    protected void startInternal() throws LifecycleException {

        taskqueue = new TaskQueue(maxQueueSize);
        TaskThreadFactory tf = new TaskThreadFactory(namePrefix,daemon,getThreadPriority());
      // Note: this is Tomcat's own ThreadPoolExecutor implementation
        executor = new ThreadPoolExecutor(getMinSpareThreads(), getMaxThreads(), maxIdleTime, TimeUnit.MILLISECONDS,taskqueue, tf);
        executor.setThreadRenewalDelay(threadRenewalDelay);
        if (prestartminSpareThreads) {
            executor.prestartAllCoreThreads();
        }
        taskqueue.setParent(executor);

        setState(LifecycleState.STARTING);
    }

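Nothing special here, except that this Executor is Tomcat's own implementation and behaves somewhat differently from the JDK's default executor. A later post will analyze it separately.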

MapperListener

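MapperListener implements both the ContainerListener and LifecycleListener interfaces, so it can listen for the ContainerEvents that containers emit. It mainly serves the Mapper: based on the events it observes, it registers the corresponding information in the Mapper.

This component does not override initInternal. In startInternal it registers itself as a listener on the Engine and on each of the Engine's child containers: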

//org.apache.catalina.mapper.MapperListener#addListeners
 /**
     * Add this mapper to the container and all child containers
     *
     * @param container
     */
    private void addListeners(Container container) {
        container.addContainerListener(this);
        container.addLifecycleListener(this);
        for (Container child : container.findChildren()) {
          // recurse
            addListeners(child);
        }
    }

It also registers the Host components' information with the Mapper:

// org.apache.catalina.mapper.MapperListener#registerHost
 /**
     * Register host.
     */
    private void registerHost(Host host) {

        String[] aliases = host.findAliases();
        mapper.addHost(host.getName(), aliases, host);

        for (Container container : host.findChildren()) {
            if (container.getState().isAvailable()) {
              // mapping information of the child containers
                registerContext((Context) container);
            }
        }
        if(log.isDebugEnabled()) {
            log.debug(sm.getString("mapperListener.registerHost",
                    host.getName(), domain, service));
        }
    }

And so on: every level from Engine -> Host -> Context -> Wrapper registers its mapping information in the Mapper, which supports the lookups performed later.
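The recursive walk over the container tree is easy to try out on a toy model. `ListenerTreeSketch` and `Node` are hypothetical stand-ins for the real container classes, and the "listener" is just an opaque object.

```java
import java.util.ArrayList;
import java.util.List;

final class ListenerTreeSketch {

    /** Hypothetical stand-in for a container with children and listeners. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        final List<Object> listeners = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Attaches the listener to the node and to every descendant, recording the visit order. */
    static List<String> addListeners(Node node, Object listener, List<String> visited) {
        node.listeners.add(listener);
        visited.add(node.name);
        for (Node child : node.children) {
            // Recurse into the child containers
            addListeners(child, listener, visited);
        }
        return visited;
    }

    public static void main(String[] args) {
        Node engine = new Node("engine").add(
                new Node("host").add(
                        new Node("context").add(new Node("wrapper"))));
        System.out.println(addListeners(engine, new Object(), new ArrayList<>()));
    }
}
```

The walk is depth-first, so a single listener instance ends up attached to every level of the hierarchy.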

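Besides registering information in the Mapper automatically at startup, MapperListener also observes the corresponding changes when components are added dynamically: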

// org.apache.catalina.mapper.MapperListener#lifecycleEvent
 @Override
    public void lifecycleEvent(LifecycleEvent event) {
        if (event.getType().equals(Lifecycle.AFTER_START_EVENT)) {
          
        } else if (event.getType().equals(Lifecycle.BEFORE_STOP_EVENT)) {
          
        }
    }

// org.apache.catalina.mapper.MapperListener#containerEvent

   @Override
    public void containerEvent(ContainerEvent event) {

        if (Container.ADD_CHILD_EVENT.equals(event.getType())) {
          
        } else if (Container.REMOVE_CHILD_EVENT.equals(event.getType())) {
            // No need to unregister - life-cycle listener will handle this when
            // the child stops
        } else if (Host.ADD_ALIAS_EVENT.equals(event.getType())) {
            // Handle dynamically adding host aliases
        } else if (Host.REMOVE_ALIAS_EVENT.equals(event.getType())) {
            // Handle dynamically removing host aliases
        } else if (Wrapper.ADD_MAPPING_EVENT.equals(event.getType())) {
            // Handle dynamically adding wrappers
        } else if (Wrapper.REMOVE_MAPPING_EVENT.equals(event.getType())) {
            // Handle dynamically removing wrappers
        } else if (Context.ADD_WELCOME_FILE_EVENT.equals(event.getType())) {
            // Handle dynamically adding welcome files
        } else if (Context.REMOVE_WELCOME_FILE_EVENT.equals(event.getType())) {
            // Handle dynamically removing welcome files
        } else if (Context.CLEAR_WELCOME_FILES_EVENT.equals(event.getType())) {
            // Handle dynamically clearing welcome files
        }
    }

Mapper

Mapper, which implements the servlet API mapping rules (which are derived
from the HTTP rules).

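As the name suggests, the Mapper does the mapping: when a request comes in, it finds the matching container from the request's host, URI and other parameters.

The mapping code is in org.apache.catalina.mapper.Mapper#internalMap. We will analyze the mapping process in the later posts on request processing.

This class does not implement any interface.

Summary

This post took a quick tour of the basic components involved in Tomcat startup, looked at each component's initInternal and startInternal methods, and walked through Tomcat's initialization flow.

References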

tomcat-component-lifecycle

2021-11-27T11:08:08.000Z

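Tomcat abstracts a component's lifecycle into a set of states and defines a state machine for the transitions between them. This is captured in the Lifecycle interface, through which all components are managed.

The Lifecycle interface

The Lifecycle interface defines three groups of functionality:

Methods for the component lifecycle (init, start, stop, destroy and so on). These methods trigger changes in the component's state; the state transitions for each method are shown in the figure:

Methods for reading the current state (getState/getStateName)
Methods for managing listeners (addLifecycleListener, findLifecycleListeners, removeLifecycleListener)

Lifecycle is a very basic interface in Tomcat: every Tomcat component implements it directly or indirectly. The types that extend it are shown in the figure.

As the figure shows, Tomcat's Server, Service and Container interfaces all extend Lifecycle. These common components usually do not implement the interface directly; instead they extend LifecycleBase (LifecycleBase -> Lifecycle) or LifecycleMBeanBase (LifecycleMBeanBase -> LifecycleBase -> Lifecycle).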

LifecycleBase

Base implementation of the {@link Lifecycle} interface that implements the
state transition rules for {@link Lifecycle#start()} and
{@link Lifecycle#stop()}

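This class implements the LifecycleListener management and the component state management defined by the interface. Subclasses do not need to care about state transitions or listener notification; they only implement the corresponding abstract methods: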

protected abstract void startInternal() throws LifecycleException;
protected abstract void initInternal() throws LifecycleException;
protected abstract void stopInternal() throws LifecycleException;
protected abstract void destroyInternal() throws LifecycleException;

Take this class's implementation of init as an example:

// org.apache.catalina.util.LifecycleBase#init
@Override
public final synchronized void init() throws LifecycleException {
  if (!state.equals(LifecycleState.NEW)) {
    invalidTransition(Lifecycle.BEFORE_INIT_EVENT);
  }

  try {
    setStateInternal(LifecycleState.INITIALIZING, null, false);
    initInternal();
    setStateInternal(LifecycleState.INITIALIZED, null, false);
  } catch (Throwable t) {
    ExceptionUtils.handleThrowable(t);
    setStateInternal(LifecycleState.FAILED, null, false);
    throw new LifecycleException(
      sm.getString("lifecycleBase.initFail",toString()), t);
  }
}

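The code already checks the state transition: init can only be called from the NEW state. The call to the abstract method initInternal is bracketed by the transition from INITIALIZING to INITIALIZED, and if an exception occurs the state is moved to FAILED automatically.

setStateInternal also takes care of notifying the listeners: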

// org.apache.catalina.util.LifecycleBase#setStateInternal
private synchronized void setStateInternal(LifecycleState state,
                                           Object data, boolean check) throws LifecycleException {

  if (log.isDebugEnabled()) {
    log.debug(sm.getString("lifecycleBase.setState", this, state));
  }

  if (check) {
    // Must have been triggered by one of the abstract methods (assume
    // code in this class is correct)
    // null is never a valid state
    if (state == null) {
      invalidTransition("null");
      // Unreachable code - here to stop eclipse complaining about
      // a possible NPE further down the method
      return;
    }

    // Any method can transition to failed
    // startInternal() permits STARTING_PREP to STARTING
    // stopInternal() permits STOPPING_PREP to STOPPING and FAILED to
    // STOPPING
    if (!(state == LifecycleState.FAILED ||
          (this.state == LifecycleState.STARTING_PREP &&
           state == LifecycleState.STARTING) ||
          (this.state == LifecycleState.STOPPING_PREP &&
           state == LifecycleState.STOPPING) ||
          (this.state == LifecycleState.FAILED &&
           state == LifecycleState.STOPPING))) {
      // No other transition permitted
      invalidTransition(state.name());
    }
  }

  this.state = state;
  // The event that corresponds to this state transition
  String lifecycleEvent = state.getLifecycleEvent();
  if (lifecycleEvent != null) {
    fireLifecycleEvent(lifecycleEvent, data);
  }
}

// org.apache.catalina.util.LifecycleBase#fireLifecycleEvent
protected void fireLifecycleEvent(String type, Object data) {
  LifecycleEvent event = new LifecycleEvent(this, type, data);
  for (LifecycleListener listener : lifecycleListeners) {
    listener.lifecycleEvent(event);
  }
}

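This way listeners are also notified of state transitions. Note that they are all notified synchronously on the thread that performs the transition, so avoid doing heavy work in a listener.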
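The template-method structure described above fits in a few lines. This is a deliberately reduced sketch, not LifecycleBase itself: `LifecycleSketch` is a hypothetical class, it models only init, it uses a plain `Consumer<State>` instead of LifecycleListener, and unlike Tomcat it also notifies listeners about FAILED.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

abstract class LifecycleSketch {
    enum State { NEW, INITIALIZING, INITIALIZED, FAILED }

    private final List<Consumer<State>> listeners = new CopyOnWriteArrayList<>();
    private State state = State.NEW;

    void addListener(Consumer<State> listener) {
        listeners.add(listener);
    }

    synchronized State getState() {
        return state;
    }

    /** Template method: owns the state checks and transitions, subclasses only supply initInternal(). */
    final synchronized void init() {
        if (state != State.NEW) {
            throw new IllegalStateException("init() is not allowed in state " + state);
        }
        try {
            setState(State.INITIALIZING);
            initInternal();
            setState(State.INITIALIZED);
        } catch (RuntimeException e) {
            setState(State.FAILED);
            throw e;
        }
    }

    protected abstract void initInternal();

    private void setState(State next) {
        state = next;
        // Listeners run synchronously on the calling thread
        for (Consumer<State> listener : listeners) {
            listener.accept(next);
        }
    }
}
```

A subclass that only overrides `initInternal()` gets the NEW check, the INITIALIZING/INITIALIZED transitions and the listener callbacks for free, which is the point of the base class.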

The event for each state:

// org.apache.catalina.LifecycleState
NEW(false, null),
INITIALIZING(false, Lifecycle.BEFORE_INIT_EVENT),
INITIALIZED(false, Lifecycle.AFTER_INIT_EVENT),
STARTING_PREP(false, Lifecycle.BEFORE_START_EVENT),
STARTING(true, Lifecycle.START_EVENT),
STARTED(true, Lifecycle.AFTER_START_EVENT),
STOPPING_PREP(true, Lifecycle.BEFORE_STOP_EVENT),
STOPPING(false, Lifecycle.STOP_EVENT),
STOPPED(false, Lifecycle.AFTER_STOP_EVENT),
DESTROYING(false, Lifecycle.BEFORE_DESTROY_EVENT),
DESTROYED(false, Lifecycle.AFTER_DESTROY_EVENT),
FAILED(false, null);

LifecycleMBeanBase

LifecycleMBeanBase extends LifecycleBase and also implements the JmxEnabled interface:

public interface JmxEnabled extends MBeanRegistration {

    /**
     * @return the domain under which this component will be / has been
     * registered.
     */
    String getDomain();


    /**
     * Specify the domain under which this component should be registered. Used
     * with components that cannot (easily) navigate the component hierarchy to
     * determine the correct domain to use.
     *
     * @param domain The name of the domain under which this component should be
     *               registered
     */
    void setDomain(String domain);


    /**
     * @return the name under which this component has been registered with JMX.
     */
    ObjectName getObjectName();
}

JmxEnabled extends javax.management.MBeanRegistration and is used to expose the component as an MBean. arthas can list the MBeans that Tomcat exposes:

[arthas@62513]$ mbean
Catalina:type=Service
Catalina:type=StringCache
Catalina:type=Valve,host=localhost,context=/servlet,name=NonLoginAuthenticator
Catalina:type=JspMonitor,WebModule=//localhost/servlet,name=jsp,J2EEApplication=none,J2EEServer=none
Catalina:type=NamingResources,host=localhost,context=/servlet
Catalina:type=WebResourceRoot,host=localhost,context=/atour_crawler_war
Catalina:type=ThreadPool,name="ajp-nio-8009"

A Service is exposed here. It is StandardService, which extends LifecycleMBeanBase and is therefore exposed automatically. Let's look at how that is implemented:


// org.apache.catalina.util.LifecycleMBeanBase#initInternal
/**
     * Sub-classes wishing to perform additional initialization should override
     * this method, ensuring that super.initInternal() is the first call in the
     * overriding method.
     */
    @Override
    protected void initInternal() throws LifecycleException {

        // If oname is not null then registration has already happened via
        // preRegister().
        if (oname == null) {
            mserver = Registry.getRegistry(null, null).getMBeanServer();

            oname = register(this, getObjectNameKeyProperties());
        }
    }

// org.apache.catalina.util.LifecycleMBeanBase#destroyInternal
  /**
     * Sub-classes wishing to perform additional clean-up should override this
     * method, ensuring that super.destroyInternal() is the last call in the
     * overriding method.
     */
    @Override
    protected void destroyInternal() throws LifecycleException {
        unregister(oname);
    }

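During initialization, a component that has not yet been registered with the Registry is registered automatically. Note that a subclass overriding this method must not forget to call the parent's initInternal. When the component's lifecycle ends, it is likewise removed from the Registry automatically.

The registration logic: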

    /**
     * Default domain for MBeans if none can be determined
     */
    public static final String DEFAULT_MBEAN_DOMAIN = "Catalina";

// org.apache.catalina.util.LifecycleMBeanBase#register
protected final ObjectName register(Object obj,
            String objectNameKeyProperties) {

  // Construct an object name with the right domain
  StringBuilder name = new StringBuilder(getDomain());
  name.append(':');
  name.append(objectNameKeyProperties);

  ObjectName on = null;

  try {
    on = new ObjectName(name.toString());

    Registry.getRegistry(null, null).registerComponent(obj, on, null);
  } catch (MalformedObjectNameException e) {
    log.warn(sm.getString("lifecycleMBeanBase.registerFail", obj, name),
             e);
  } catch (Exception e) {
    log.warn(sm.getString("lifecycleMBeanBase.registerFail", obj, name),
             e);
  }

  return on;
}

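The registered name has the form domain:component name, and the default domain is Catalina. The component name comes from getObjectNameKeyProperties, an abstract method left as a hook for subclasses. Here is how StandardService implements it: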

// org.apache.catalina.core.StandardService#getObjectNameKeyProperties
    @Override
    public final String getObjectNameKeyProperties() {
        return "type=Service";
    }

This matches the arthas output above.
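The way the name is assembled can be checked with the JDK's own `javax.management.ObjectName`. The sketch below only builds and inspects the name; it does not register anything, and `ObjectNameSketch` and `build` are hypothetical names.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

final class ObjectNameSketch {

    /** Joins a domain and a key-property string the same way register() does above. */
    static ObjectName build(String domain, String keyProperties)
            throws MalformedObjectNameException {
        StringBuilder name = new StringBuilder(domain);
        name.append(':');
        name.append(keyProperties);
        return new ObjectName(name.toString());
    }

    public static void main(String[] args) throws Exception {
        ObjectName service = build("Catalina", "type=Service");
        System.out.println(service.getDomain() + " / " + service.getKeyProperty("type"));
    }
}
```

The part after the colon is a comma-separated list of key=value pairs, which is why longer names in the arthas listing, such as the Valve entry, carry host, context and name keys as well.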

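Summary

Tomcat manages its components through the Lifecycle interface, which defines methods such as init/start/stop/destroy. It also provides abstract base classes that hide the state transitions and the listener mechanism from subclasses, and LifecycleMBeanBase offers a uniform way to expose components to JMX.

As for when these components' init/start/stop/destroy methods are actually called, the following posts continue with the startup process.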

tomcat-startup

2021-11-20T11:25:08.000Z

Startup script

startup.sh

Tomcat is usually started with the $CATALINA_HOME/bin/startup.sh script:

➜  bin  cat startup.sh
#!/bin/sh

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# -----------------------------------------------------------------------------
# Start Script for the CATALINA Server
# -----------------------------------------------------------------------------

# Better OS/400 detection: see Bugzilla 31132
os400=false
case "`uname`" in
OS400*) os400=true;;
esac

# resolve links - $0 may be a softlink
PRG="$0"

while [ -h "$PRG" ] ; do
  ls=`ls -ld "$PRG"`
  link=`expr "$ls" : '.*-> \(.*\)$'`
  if expr "$link" : '/.*' > /dev/null; then
    PRG="$link"
  else
    PRG=`dirname "$PRG"`/"$link"
  fi
done

PRGDIR=`dirname "$PRG"`
EXECUTABLE=catalina.sh

# Check that target executable exists
if $os400; then
  # -x will Only work on the os400 if the files are:
  # 1. owned by the user
  # 2. owned by the PRIMARY group of the user
  # this will not work if the user belongs in secondary groups
  eval
else
  if [ ! -x "$PRGDIR"/"$EXECUTABLE" ]; then
    echo "Cannot find $PRGDIR/$EXECUTABLE"
    echo "The file is absent or does not have execute permission"
    echo "This file is needed to run this program"
    exit 1
  fi
fi

exec "$PRGDIR"/"$EXECUTABLE" start "$@"

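In the end this script calls catalina.sh, passing start plus our command-line arguments.

Besides start, catalina.sh supports other commands, so it acts as the entry point for the other scripts: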

➜  bin  catalina.sh
Using CATALINA_BASE:   /Users/qishengli/software/apache-tomcat-8.5.32
Using CATALINA_HOME:   /Users/qishengli/software/apache-tomcat-8.5.32
Using CATALINA_TMPDIR: /Users/qishengli/software/apache-tomcat-8.5.32/temp
Using JRE_HOME:        /Users/qishengli/software/jdk8/jre
Using CLASSPATH:       /Users/qishengli/software/apache-tomcat-8.5.32/bin/bootstrap.jar:/Users/qishengli/software/apache-tomcat-8.5.32/bin/tomcat-juli.jar
Usage: catalina.sh ( commands ... )
commands:
  debug             Start Catalina in a debugger
  debug -security   Debug Catalina with a security manager
  jpda start        Start Catalina under JPDA debugger
  run               Start Catalina in the current window
  run -security     Start in the current window with security manager
  start             Start Catalina in a separate window
  start -security   Start in a separate window with security manager
  stop              Stop Catalina, waiting up to 5 seconds for the process to end
  stop n            Stop Catalina, waiting up to n seconds for the process to end
  stop -force       Stop Catalina, wait up to 5 seconds and then use kill -KILL if still running
  stop n -force     Stop Catalina, wait up to n seconds and then use kill -KILL if still running
  configtest        Run a basic syntax check on server.xml - check exit code for result
  version           What version of tomcat are you running?
Note: Waiting for the process to end and use of the -force option require that $CATALINA_PID is defined

For example, version:

➜  bin  catalina.sh version
Using CATALINA_BASE:   /Users/qishengli/software/apache-tomcat-8.5.32
Using CATALINA_HOME:   /Users/qishengli/software/apache-tomcat-8.5.32
Using CATALINA_TMPDIR: /Users/qishengli/software/apache-tomcat-8.5.32/temp
Using JRE_HOME:        /Users/qishengli/software/jdk8/jre
Using CLASSPATH:       /Users/qishengli/software/apache-tomcat-8.5.32/bin/bootstrap.jar:/Users/qishengli/software/apache-tomcat-8.5.32/bin/tomcat-juli.jar
Server version: Apache Tomcat/8.5.32
Server built:   Jun 20 2018 19:50:35 UTC
Server number:  8.5.32.0
OS Name:        Mac OS X
OS Version:     11.6
Architecture:   aarch64
JVM Version:    1.8.0_282-b08
JVM Vendor:     Azul Systems, Inc.

Here is the part of the source that handles start:

elif [ "$1" = "start" ] ; then

  # CATALINA_PID handling omitted here
  shift
  touch "$CATALINA_OUT"
  if [ "$1" = "-security" ] ; then
    if [ $have_tty -eq 1 ]; then
      echo "Using Security Manager"
    fi
    shift
    eval $_NOHUP "\"$_RUNJAVA\"" "\"$LOGGING_CONFIG\"" $LOGGING_MANAGER $JAVA_OPTS $CATALINA_OPTS \
      -D$ENDORSED_PROP="\"$JAVA_ENDORSED_DIRS\"" \
      -classpath "\"$CLASSPATH\"" \
      -Djava.security.manager \
      -Djava.security.policy=="\"$CATALINA_BASE/conf/catalina.policy\"" \
      -Dcatalina.base="\"$CATALINA_BASE\"" \
      -Dcatalina.home="\"$CATALINA_HOME\"" \
      -Djava.io.tmpdir="\"$CATALINA_TMPDIR\"" \
      org.apache.catalina.startup.Bootstrap "$@" start \
      >> "$CATALINA_OUT" 2>&1 "&"

  else
    eval $_NOHUP "\"$_RUNJAVA\"" "\"$LOGGING_CONFIG\"" $LOGGING_MANAGER $JAVA_OPTS $CATALINA_OPTS \
      -D$ENDORSED_PROP="\"$JAVA_ENDORSED_DIRS\"" \
      -classpath "\"$CLASSPATH\"" \
      -Dcatalina.base="\"$CATALINA_BASE\"" \
      -Dcatalina.home="\"$CATALINA_HOME\"" \
      -Djava.io.tmpdir="\"$CATALINA_TMPDIR\"" \
      org.apache.catalina.startup.Bootstrap "$@" start \
      >> "$CATALINA_OUT" 2>&1 "&"

  fi

  if [ ! -z "$CATALINA_PID" ]; then
    echo $! > "$CATALINA_PID"
  fi

  echo "Tomcat started."

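Essentially it passes the environment variables detected earlier to the java command as arguments. The script sources bin/setenv.sh by default (when the file is readable), so Tomcat's environment variables are usually set there. For example, on my machine: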

➜  bin  cat setenv.sh
export CATALINA_OPTS="-agentpath:/Users/qishengli/Downloads/async-profiler-2.5-macos/build/libasyncProfiler.so=start,event=cpu,interval=1ms,file=profile.html   -Djava.rmi.server.logCalls=true   -Dsun.rmi.server.logLevel=debug"

The corresponding lines in catalina.sh:

➜  bin  grep -n  setenv catalina.sh
24:#   setenv.sh in CATALINA_BASE/bin to keep your customizations separate.
145:# but allow them to be specified in setenv.sh, in rare case when it is needed.
148:if [ -r "$CATALINA_BASE/bin/setenv.sh" ]; then
149:  . "$CATALINA_BASE/bin/setenv.sh"
150:elif [ -r "$CATALINA_HOME/bin/setenv.sh" ]; then
151:  . "$CATALINA_HOME/bin/setenv.sh"

The command we end up with looks like this:

"/Users/qishengli/software/jdk8/jre/bin/java" "-Djava.util.logging.config.file=/Users/qishengli/software/apache-tomcat-8.5.32/conf/logging.properties" -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -agentpath:/Users/qishengli/Downloads/async-profiler-2.5-macos/build/libasyncProfiler.so=start,event=cpu,interval=1ms,file=profile.html -Djava.rmi.server.logCalls=true -Dsun.rmi.server.logLevel=debug -Dignore.endorsed.dirs="" -classpath "/Users/qishengli/software/apache-tomcat-8.5.32/bin/bootstrap.jar:/Users/qishengli/software/apache-tomcat-8.5.32/bin/tomcat-juli.jar" -Dcatalina.base="/Users/qishengli/software/apache-tomcat-8.5.32" -Dcatalina.home="/Users/qishengli/software/apache-tomcat-8.5.32" -Djava.io.tmpdir="/Users/qishengli/software/apache-tomcat-8.5.32/temp" org.apache.catalina.startup.Bootstrap start &

Finally we arrive at the Java side: org.apache.catalina.startup.Bootstrap start.

idea

/Users/qishengli/software/apache-tomcat-8.5.32/bin/catalina.sh run
NOTE: Picked up JDK_JAVA_OPTIONS:  --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED
-Dcatalina.base=/Users/qishengli/Library/Caches/JetBrains/IntelliJIdea2021.2/tomcat/15632928-a384-44e8-ba78-fe9ca3f37059
[2021-10-31 05:19:05,458] Artifact web:war exploded: Waiting for server connection to start artifact deployment...

IDEA calls the run command of catalina.sh directly:

elif [ "$1" = "run" ]; then

  shift
  if [ "$1" = "-security" ] ; then
    if [ $have_tty -eq 1 ]; then
      echo "Using Security Manager"
    fi
    shift
    eval exec "\"$_RUNJAVA\"" "\"$LOGGING_CONFIG\"" $LOGGING_MANAGER $JAVA_OPTS $CATALINA_OPTS \
      -D$ENDORSED_PROP="\"$JAVA_ENDORSED_DIRS\"" \
      -classpath "\"$CLASSPATH\"" \
      -Djava.security.manager \
      -Djava.security.policy=="\"$CATALINA_BASE/conf/catalina.policy\"" \
      -Dcatalina.base="\"$CATALINA_BASE\"" \
      -Dcatalina.home="\"$CATALINA_HOME\"" \
      -Djava.io.tmpdir="\"$CATALINA_TMPDIR\"" \
      org.apache.catalina.startup.Bootstrap "$@" start
  else
    eval exec "\"$_RUNJAVA\"" "\"$LOGGING_CONFIG\"" $LOGGING_MANAGER $JAVA_OPTS $CATALINA_OPTS \
      -D$ENDORSED_PROP="\"$JAVA_ENDORSED_DIRS\"" \
      -classpath "\"$CLASSPATH\"" \
      -Dcatalina.base="\"$CATALINA_BASE\"" \
      -Dcatalina.home="\"$CATALINA_HOME\"" \
      -Djava.io.tmpdir="\"$CATALINA_TMPDIR\"" \
      org.apache.catalina.startup.Bootstrap "$@" start
  fi

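Compared with starting through the script, there are three differences:

No PID file is created
eval exec is used instead of eval, so the java process replaces the shell and runs in the foreground rather than being sent to the background with &
-Dcatalina.base=xxx points catalina.base at a directory generated by IDEA (by default Tomcat reads web.xml from under catalina.base)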

catalina.sh run starts tomcat in the foreground, displaying the logs on the console that you started it. Hitting Ctrl-C will terminate tomcat.
startup.sh will start tomcat in the background. You’ll have to tail -f logs/catalina.out to see the logs.
Both will do the same things, apart from the foreground/background distinction.

From here on, the flow continues in the Java code.

Startup flow in the Java code

Bootstrap

The purpose of this roundabout approach is to keep the Catalina internal classes (and any
other classes they depend on, such as an XML parser) out of the system
class path and therefore not visible to application level classes.

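Bootstrap is only a thin shell: it first initializes org.apache.catalina.startup.Catalina and then calls its start method. The comment quoted above explains why: it keeps Tomcat's internal classes invisible to application code. They are not on the class path, which contains only two jars, bin/bootstrap.jar and bin/tomcat-juli.jar; all other internal jars live in the lib directory, which is not on the class path.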

// org.apache.catalina.startup.Bootstrap#start
    /**
     * Start the Catalina daemon.
     * @throws Exception Fatal start error
     */
    public void start()
        throws Exception {
        if( catalinaDaemon==null ) init();

        Method method = catalinaDaemon.getClass().getMethod("start", (Class [] )null);
        method.invoke(catalinaDaemon, (Object [])null);

    }
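
The reflective call above can be tried in isolation. The following is a minimal sketch, not Tomcat code: FakeDaemon is a hypothetical stand-in for the Catalina instance, and startReflectively is an illustrative helper. It shows that the caller needs no compile-time reference to the daemon's type, which is what lets Bootstrap load Catalina through a separate class loader.

```java
import java.lang.reflect.Method;

final class ReflectiveStartSketch {
    // Hypothetical stand-in for org.apache.catalina.startup.Catalina.
    public static final class FakeDaemon {
        private boolean started;

        public void start() {
            started = true;
        }

        public boolean isStarted() {
            return started;
        }
    }

    // Looks up and invokes a public no-arg "start" method by name only.
    static void startReflectively(Object daemon) {
        try {
            Method method = daemon.getClass().getMethod("start", (Class<?>[]) null);
            method.invoke(daemon, (Object[]) null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not start daemon", e);
        }
    }

    public static void main(String[] args) {
        FakeDaemon daemon = new FakeDaemon();
        startReflectively(daemon);
        System.out.println("started = " + daemon.isStarted());
    }
}
```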

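During initialization, three class loaders are created: commonLoader, catalinaLoader, and sharedLoader. All three are essentially URLClassLoaders that differ only in the paths they load from, which can be configured in catalina.properties: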

# List of comma-separated paths defining the contents of the "common"
# classloader.
common.loader="${catalina.base}/lib","${catalina.base}/lib/*.jar","${catalina.home}/lib","${catalina.home}/lib/*.jar"

# List of comma-separated paths defining the contents of the "server"
# classloader.
server.loader=

#
# List of comma-separated paths defining the contents of the "shared"
# classloader.
shared.loader=

This part touches on Tomcat's class loading mechanism, which will get its own article, so it can be skipped for now.
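
Even without the details, the idea that these loaders are plain URLClassLoaders over different paths can be sketched with the JDK alone. This is an illustration under assumptions, not Tomcat's Bootstrap code: createLoader, the class LoaderSketch, and the directory names "lib" and "shared/lib" are all hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

final class LoaderSketch {
    // Builds a URLClassLoader over the given paths; class lookups are
    // delegated to `parent` first, which is URLClassLoader's default behavior.
    static URLClassLoader createLoader(ClassLoader parent, String... paths) {
        URL[] urls = new URL[paths.length];
        for (int i = 0; i < paths.length; i++) {
            try {
                urls[i] = Paths.get(paths[i]).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException(paths[i], e);
            }
        }
        return new URLClassLoader(urls, parent);
    }

    public static void main(String[] args) {
        ClassLoader system = ClassLoader.getSystemClassLoader();
        // Hypothetical paths, chosen only to show the parent/child chain.
        URLClassLoader common = createLoader(system, "lib");
        URLClassLoader shared = createLoader(common, "shared/lib");
        System.out.println(common.getParent() == system);
        System.out.println(shared.getParent() == common);
    }
}
```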

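Once the baton is handed to Catalina, the work turns to parsing the configuration files and starting Tomcat's components, which the second article will cover.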

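Startup via the IDEA Tomcat run configuration

The flame graph shows that the Servlet is loaded on an RMI thread:

Debugging to obtain the socket information of that thread

shows that this RMI call is initiated by IDEA, with Tomcat acting as the server: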

➜  conf  lsof -i:54276
COMMAND   PID      USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
java    28192 qishengli   84u  IPv6 0x9b294f61ef653d83      0t0  TCP localhost:54268->localhost:54276 (ESTABLISHED)
idea    55040 qishengli  227u  IPv4 0x9b294f61f4b22e13      0t0  TCP localhost:54276->localhost:54268 (ESTABLISHED)

Looking at IDEA's thread dump at that moment, we can find the matching thread stack:

"javaee connector" #5620 prio=4 os_prio=31 cpu=17.64ms elapsed=926.91s tid=0x000000036be2e400 nid=0x4e78b runnable  [0x000000039bbb9000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(java.base@11.0.12/Native Method)
        at java.net.SocketInputStream.socketRead(java.base@11.0.12/SocketInputStream.java:115)
        at java.net.SocketInputStream.read(java.base@11.0.12/SocketInputStream.java:168)
        at java.net.SocketInputStream.read(java.base@11.0.12/SocketInputStream.java:140)
        at java.io.BufferedInputStream.fill(java.base@11.0.12/BufferedInputStream.java:252)
        at java.io.BufferedInputStream.read(java.base@11.0.12/BufferedInputStream.java:271)
        - locked <0x0000000794f63408> (a java.io.BufferedInputStream)
        at java.io.DataInputStream.readByte(java.base@11.0.12/DataInputStream.java:270)
        at sun.rmi.transport.StreamRemoteCall.executeCall(java.rmi@11.0.12/StreamRemoteCall.java:240)
        at sun.rmi.server.UnicastRef.invoke(java.rmi@11.0.12/UnicastRef.java:164)
        at jdk.jmx.remote.internal.rmi.PRef.invoke(jdk.remoteref/Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(java.management.rmi@11.0.12/Unknown Source)
        at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(java.management.rmi@11.0.12/RMIConnector.java:1021)
        at com.intellij.javaee.oss.util.AbstractConnectorCommand.invokeOperation(AbstractConnectorCommand.java:139)
        at org.jetbrains.idea.tomcat.admin.TomcatAdminServerBase$2.doExecute(TomcatAdminServerBase.java:159)
        at org.jetbrains.idea.tomcat.admin.TomcatAdminServerBase$2.doExecute(TomcatAdminServerBase.java:155)
        at com.intellij.javaee.oss.util.AbstractConnectorCommand$1.call(AbstractConnectorCommand.java:36)
        at java.util.concurrent.FutureTask.run(java.base@11.0.12/FutureTask.java:264)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.12/ThreadPoolExecutor.java:1128)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.12/ThreadPoolExecutor.java:628)
        at java.lang.Thread.run(java.base@11.0.12/Thread.java:829)

This class is not in the IDEA Community Edition sources, so I decompiled org.jetbrains.idea.tomcat.admin.TomcatAdminServerBase with arthas to get the source:

[arthas@55040]$ jad org.jetbrains.idea.tomcat.admin.TomcatAdminServerBase$2

ClassLoader:
+-PluginClassLoader(plugin=PluginDescriptor(name=Tomcat and TomEE, id=Tomcat, descriptorPath=plugin.xml, path=/Applications/IntelliJ IDEA.app/Contents/plugins/Tomcat, version=2
  12.5284.40, package=null), packagePrefix=null, instanceId=190, state=active)

Location:


        /*
         * Decompiled with CFR.
         *
         * Could not load the following classes:
         *  org.jetbrains.idea.tomcat.admin.TomcatJmxAdminServerBase
         *  org.jetbrains.idea.tomcat.admin.TomcatJmxAdminServerBase$TomcatConnectorCommandBase
         */
        package org.jetbrains.idea.tomcat.admin;

        import java.io.IOException;
        import javax.management.JMException;
        import javax.management.MBeanServerConnection;
        import javax.management.ObjectName;
        import org.jetbrains.idea.tomcat.admin.TomcatJmxAdminServerBase;

        class TomcatAdminServerBase.2
        extends TomcatJmxAdminServerBase.TomcatConnectorCommandBase<String> {
            final /* synthetic */ String val$contextPath;
            final /* synthetic */ String val$deploymentPath;

            TomcatAdminServerBase.2(String string, String string2) {
                this.val$contextPath = string;
                this.val$deploymentPath = string2;
                super((TomcatJmxAdminServerBase)TomcatAdminServerBase.this);
            }

            protected String doExecute(MBeanServerConnection connection) throws JMException, IOException {
/*159*/         return (String)TomcatAdminServerBase.2.invokeOperation((MBeanServerConnection)connection, (ObjectName)TomcatAdminServerBase.2.createObjectName((String)TomcatAdminServerBase.this.getFactoryObjectName()), (String)"createStandardContext", (Object[])new Object[]{TomcatAdminServerBase.this.getHostObjectName(), this.val$contextPath, this.val$deploymentPath});
            }

            protected Integer getTimeoutSeconds() {
/*167*/         return null;
            }
        }

Affect(row-cnt:1) cost in 2415 ms.

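This is where Tomcat's createStandardContext gets called.

Why does IDEA do this?

By invoking Tomcat's dynamic MBean over RMI, IDEA can explicitly specify the application's class directory, so the app does not have to be placed in Tomcat's own deployment directory:

Likewise, configuration changes made in the IDE are reflected in the server.xml that IDEA generates: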

➜  conf  cat  /Users/qishengli/Library/Caches/JetBrains/IntelliJIdea2021.2/tomcat/15632928-a384-44e8-ba78-fe9ca3f37059/conf/server.xml
<Server port="8005" shutdown="SHUTDOWN">
  <Listener className="org.apache.catalina.startup.VersionLoggerListener" />
  <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
  <Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener" />
  <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener" />
  <Listener className="org.apache.catalina.core.ThreadLocalLeakPreventionListener" />
  <GlobalNamingResources>
    <Resource name="UserDatabase" auth="Container" type="org.apache.catalina.UserDatabase" description="User database that can be updated and saved" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" pathname="conf/tomcat-users.xml" />
  </GlobalNamingResources>
  <Service name="Catalina">
    <Connector port="8087" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
    <Engine name="Catalina" defaultHost="localhost">
      <Realm className="org.apache.catalina.realm.LockOutRealm">
        <Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase" />
      </Realm>
      <Host name="localhost" appBase="/Users/qishengli/software/apache-tomcat-8.5.32/webapps" unpackWARs="true" autoDeploy="true" deployOnStartup="false" deployIgnore="^(?!(manager)|(tomee)$).*">
        <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs" prefix="localhost_access_log" suffix=".txt" pattern="%h %l %u %t &quot;%r&quot; %s %b" />
      </Host>
    </Engine>
  </Service>
</Server>

JMC also shows that the MBeans exposed by Tomcat include operations, which can be invoked over RMI:


web.xml

2021-10-16T09:45:06.884Z

The load-on-startup element

Servlets are initialized either lazily at request processing time or eagerly during
deployment. In the latter case, they are initialized in the order indicated by
their load-on-startup elements.

When the web container starts, servlets can be loaded either lazily or eagerly.

The value of load-on-startup decides which of the two is used.

If the value is a negative integer, or the element is not present, the
container is free to load the servlet whenever it chooses. If the value is a positive
integer or 0, the container must load and initialize the servlet as the application is
deployed.

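If the element is absent, or present with a negative value, the container may load the servlet whenever it sees fit.

If the value is a positive integer or 0, the container must load and initialize the servlet when the application is deployed (that is, at startup).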

The container must guarantee that servlets marked with lower integers
are loaded before servlets marked with higher integers.

The smaller the value, the higher the priority: the container loads those servlets first.

The container may choose
the order of loading of servlets with the same load-on-startup value.

If the values are equal, the loading order is up to the container (and may differ between implementations).
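
The rules quoted above can be condensed into a small ordering function. This is only a sketch of the rules, not how any container implements them: LoadOnStartupSketch and eagerOrder are hypothetical names, and keeping declaration order for equal values is just one choice a container is free to make.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LoadOnStartupSketch {
    // Returns the servlets that must be loaded at deployment, lowest value first.
    // Negative values are left out: the container may load those lazily.
    static List<String> eagerOrder(Map<String, Integer> loadOnStartup) {
        List<Map.Entry<String, Integer>> eager = new ArrayList<>();
        for (Map.Entry<String, Integer> e : loadOnStartup.entrySet()) {
            if (e.getValue() >= 0) {
                eager.add(e);
            }
        }
        // List.sort is stable, so equal values keep their declaration order here;
        // the specification leaves that order to the container.
        eager.sort((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> e : eager) {
            names.add(e.getKey());
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, Integer> servlets = new LinkedHashMap<>();
        servlets.put("lazy", -1);
        servlets.put("b", 2);
        servlets.put("a", 0);
        servlets.put("c", 2);
        System.out.println(eagerOrder(servlets)); // prints [a, b, c]
    }
}
```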

References

Java Servlet Specification 3.0

Migrating hexo to Ubuntu

2021-10-16T09:45:06.741Z

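After switching to Ubuntu, I installed node with apt, so by default it requires sudo. Once hexo is installed that way, it also has to be run with sudo.
The ownership of node's directories needs to be changed. First check the npm prefix:

➜ qsli.github.com (hexo|✚1…) npm config get prefix
/usr/local

Change the owner:

sudo chown -R $(whoami) $(npm config get prefix)/{lib/node_modules,bin,share}

After changing the owner, hexo runs normally.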

References

03 - Fixing npm permissions | npm Documentation

git-commit-id takes too long to run

2021-08-08T10:07:18.000Z

Background

The Maven git-commit-id plugin can generate a git.properties file when a jar is released, carrying along some information from git. A sample git.properties:

#Generated by Git-Commit-Id-Plugin
#Sun Jun 27 14:58:49 CST 2021
git.branch=6b3dbc38d106181da431300c928cc961d2454c66
git.build.host=2926090-11428607-20260424
git.build.time=20210627145849282
git.build.user.email=
git.build.user.name=
git.build.version=1.0.2458
git.commit.id=6b3dbc38d106181da431300c928cc961d2454c66
git.commit.id.abbrev=6b3dbc38
git.commit.time=20210627145707000
git.commit.user.email=xx@xx.com
git.commit.user.name=xx
git.remote.origin.url=git@git.xxx

In practice, the plugin always takes a long time on some large projects:

2021-07-16 21:23:25.000 [INFO] [INFO] --- git-commit-id-plugin:2.2.6:revision (get-the-git-infos) @ xxx ---
2021-07-16 21:24:03.000 [INFO] [INFO]

This module alone took 38s. With several modules the cost becomes considerable, and a single release can take several minutes.

Cause

Which git properties are collected can be configured in XML:

 <plugin>
   <groupId>pl.project13.maven</groupId>
   <artifactId>git-commit-id-plugin</artifactId>
   <version>${git-commit-id-plugin.version}</version>
   <executions>
     <execution>
       <id>get-the-git-infos</id>
       <goals>
         <goal>revision</goal>
       </goals>
     </execution>
   </executions>
   <configuration>
     <verbose>false</verbose>
     <abbrevLength>8</abbrevLength>
     <dateFormat>yyyyMMddHHmmssSSS</dateFormat>
     <failOnNoGitDirectory>false</failOnNoGitDirectory>
     <failOnUnableToExtractRepoInfo>false</failOnUnableToExtractRepoInfo>
     <generateGitPropertiesFile>true</generateGitPropertiesFile>
     <gitDescribe>
       <skip>true</skip>
     </gitDescribe>
     <includeOnlyProperties>
       <include>git.branch</include>
       <include>git.build</include>
       <include>git.commit.id</include>
       <include>git.commit.time</include>
       <include>git.commit.user</include>
       <include>git.remote.origin.url</include>
     </includeOnlyProperties>
   </configuration>
</plugin>

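Some properties are expensive to obtain because they require walking the entire commit history (Tags, for example). Our configuration does not include that property, yet the run was still slow. A flame graph of the execution showed:

The flame graph clearly shows a recursive walk over the git history, triggered by getTags. Reading the source reveals that older versions of the plugin compute first and filter afterwards, so every property is computed whether or not it is configured: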

// pl.project13.maven.git.GitCommitIdMojo#execute

// 1. Load all git data, including tags
loadGitData(properties);
loadBuildData(properties);
loadShortDescribe(properties);
propertiesReplacer.performReplacement(properties, replacementProperties);
// 2. Filter by the configured parameters: the full data set is always loaded, and only the user's view of it is filtered
propertiesFilterer.filter(properties, includeOnlyProperties, this.prefixDot);
propertiesFilterer.filterNot(properties, excludeProperties, this.prefixDot);
logProperties();


// 1. pl.project13.maven.git.GitDataProvider#loadGitData
// getTags() is called directly here rather than through a provider, so all tags are computed eagerly, which is slow
put(properties, GitCommitPropertyConstant.TAGS, getTags());
put(properties,GitCommitPropertyConstant.CLOSEST_TAG_NAME, getClosestTagName());
put(properties,GitCommitPropertyConstant.CLOSEST_TAG_COMMIT_COUNT, getClosestTagCommitCount());

loadGitData computes every property and puts it into properties, and only afterwards does propertiesFilterer filter them. In the latest version of the code this has been changed to:

// pl.project13.core.GitDataProvider#loadGitData
// Now in provider form: these are method references, so passing them as arguments does not trigger the computation
maybePut(properties, GitCommitPropertyConstant.TAGS, this::getTags);
maybePut(properties,GitCommitPropertyConstant.CLOSEST_TAG_NAME, this::getClosestTagName);


// pl.project13.core.GitDataProvider#maybePut
protected void maybePut(@Nonnull Properties properties, String key, SupplierEx<String> value)
  throws GitCommitIdExecutionException {
  String keyWithPrefix = prefixDot + key;
  if (properties.stringPropertyNames().contains(keyWithPrefix)) {
    String propertyValue = properties.getProperty(keyWithPrefix);
    log.info("Using cached {} with value {}", keyWithPrefix, propertyValue);
  } else if (PropertiesFilterer.isIncluded(keyWithPrefix, includeOnlyProperties, excludeProperties)) {
    // Only properties that pass the filter (i.e. are configured) call get() and trigger the computation
    String propertyValue = value.get();
    log.info("Collected {} with value {}", keyWithPrefix, propertyValue);
    PropertyManager.putWithoutPrefix(properties, keyWithPrefix, propertyValue);
  }
}

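The new code uses the provider pattern, which is lazy: the computation only happens when get is actually called. Upgrading the plugin was therefore enough, and the run time dropped to milliseconds.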
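
The difference between eager and lazy evaluation can be reproduced with the JDK's java.util.function.Supplier. The sketch below is a simplified illustration, not the plugin's code: LazyPutSketch, expensiveTags, the call counter, and the plain Set used as the include filter are all assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class LazyPutSketch {
    // Counts how often the expensive computation actually runs.
    static final AtomicInteger expensiveCalls = new AtomicInteger();

    // Stand-in for an expensive lookup such as walking the history for tags.
    static String expensiveTags() {
        expensiveCalls.incrementAndGet();
        return "v1.0,v1.1";
    }

    // Evaluates the supplier only when the key is actually wanted.
    static void maybePut(Properties props, Set<String> included,
                         String key, Supplier<String> value) {
        if (included.contains(key)) {
            props.setProperty(key, value.get());
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        Set<String> included = new HashSet<>();
        included.add("git.branch");

        // "git.tags" is not included, so expensiveTags() never runs.
        maybePut(props, included, "git.tags", LazyPutSketch::expensiveTags);
        System.out.println("expensive calls: " + expensiveCalls.get());
    }
}
```

With an eager signature such as put(props, key, expensiveTags()), the argument would be evaluated before the method body runs, regardless of the filter.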

JGit

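Besides the bug in the computation logic, there is also a performance gap between JGit and native git.

Someone in the issue tracker also reported slow runs caused by too many tags; switching to the local git binary brought the time down from 38s to 3.6s: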

Long execution times with jgit · Issue #408 · git-commit-id/git-commit-id-maven-plugin

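The author's reply also compares JGit with native git. With JGit you need not worry about output format changes between git versions (JGit takes care of those). With native git the plugin parses git's output itself, so the parsing may break when the git version changes. That is why JGit is the default.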

using the native git binary should usually give your build some performance boost, it may randomly break if you upgrade your git version and it decides to print information in a different format suddenly. As rule of thumb, keep using the default jgit implementation until you notice performance problems within your build

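Conclusion

Older versions of git-commit-id (2.2.6 at least) have a flaw in the computation logic (compute first, filter later); upgrading fixes it and brings the time down from seconds to milliseconds
JGit is slower than native git and can be swapped out when necessary
The release script can add --depth=1 when cloning, to avoid fetching unneeded commit history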


keepalive

2021-03-29T08:02:45.000Z

TCP keep alive

Keepalive in the TCP stack: after a connection has been idle for a certain time, the kernel sends keepalive probes.

[qisheng.li@YD-order-center-01 ~]$ sudo sysctl -a | grep keep
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200

tcp_keepalive_time
the interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the connection is marked to need keepalive, this counter is not used any further
Once a connection has been idle for tcp_keepalive_time, the kernel considers it in need of keepalive probing
tcp_keepalive_intvl
the interval between subsequential keepalive probes, regardless of what the connection has exchanged in the meantime
The interval between two consecutive probes
tcp_keepalive_probes
the number of unacknowledged probes to send before considering the connection dead and notifying the application layer
The number of probes
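
Put together, the three parameters give a rough upper bound on how long a dead peer can go unnoticed on an idle connection. The sketch below is plain arithmetic under that reading of the definitions above; KeepAliveMath and its method name are made up for illustration.

```java
final class KeepAliveMath {
    // Idle time before the first probe, plus one interval per unanswered probe.
    static long secondsUntilDeclaredDead(long keepaliveTime, long intvl, int probes) {
        return keepaliveTime + intvl * probes;
    }

    public static void main(String[] args) {
        // Values from the sysctl output above: 7200 + 75 * 9
        System.out.println(secondsUntilDeclaredDead(7200, 75, 9)); // prints 7875
    }
}
```

With the defaults shown, that is 7875 seconds, a little over two hours. The kernel only probes sockets that have SO_KEEPALIVE enabled, which in Java is done with java.net.Socket#setKeepAlive(true).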

HTTP keep alive

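HTTP/1.1 uses keep-alive by default: the connection is not closed after a request completes. This only provides connection reuse; no liveness probing is involved.

Reuse is negotiated through the Connection: Keep-Alive header. From HTTP/1.1 on, keep-alive is the default unless close is declared explicitly.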

parameters
A comma-separated list of parameters, each consisting of an identifier and a value separated by the equal sign ('='). The following identifiers are possible:
timeout: indicating the minimum amount of time an idle connection has to be kept opened (in seconds). Note that timeouts longer than the TCP timeout may be ignored if no keep-alive TCP message is set at the transport level.
max: indicating the maximum number of requests that can be sent on this connection before closing it. Unless 0, this value is ignored for non-pipelined connections as another request will be sent in the next response. An HTTP pipeline can use it to limit the pipelining.

Example response:

HTTP/1.1 200 OK
Connection: Keep-Alive
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Thu, 11 Aug 2016 15:23:13 GMT
Keep-Alive: timeout=5, max=1000
Last-Modified: Mon, 25 Jul 2016 04:32:39 GMT
Server: Apache

(body)

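Browsers

With a persistent connection, the TCP connection stays open after the response has been sent, so the browser can keep sending requests over the same TCP connection. Keeping the connection open saves the connection setup time on the next request and speeds up resource loading. For example, the images embedded in a web page often all come from the same site; once a persistent connection is established, it can be reused to request those other resources without opening a new TCP connection.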

Nginx

http {
    upstream  BACKEND {
        server   192.168.0.1:8080  weight=1 max_fails=2 fail_timeout=30s;
        server   192.168.0.2:8080  weight=1 max_fails=2 fail_timeout=30s;

        keepalive 300;        # this one is important!
    }

    server {
        listen 8080 default_server;
        server_name "";

        location /  {
            proxy_pass http://BACKEND;
            proxy_set_header Host  $host;
            proxy_set_header x-forwarded-for $remote_addr;
            proxy_set_header X-Real-IP $remote_addr;
            add_header Cache-Control no-store;
            add_header Pragma  no-cache;

            proxy_http_version 1.1;                    # these two are best set as well
            proxy_set_header Connection "";

            client_max_body_size  3072k;
            client_body_buffer_size 128k;
        }
    }
}

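By default nginx already enables keep-alive for client connections. That works for most cases, but some special scenarios call for tuning a few parameters.
To do so, edit the nginx configuration file (conf/nginx.conf under the nginx installation directory):

http {
    keepalive_timeout  120s 120s; # default 75s
    keepalive_requests 10000;     # default 100
}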

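keepalive_timeout
The first parameter sets how long an idle keep-alive client connection stays open on the server side; a value of 0 disables keep-alive client connections. The optional second parameter sets the value of the "Keep-Alive: timeout=time" response header. The two values may differ.
keepalive_requests
Sets the maximum number of requests that can be served over one keep-alive connection; once that number is reached, the connection is closed. The default is 100 (1000 since nginx 1.19.10).
keepalive
The *connections* parameter sets the maximum number of idle keepalive connections to upstream servers that are preserved in the cache of each worker process. When this number is exceeded, the least recently used connections are closed.
Similar to maxIdle in a connection pool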

Tomcat

Setting	Notes
`keepAliveTimeout`	The number of milliseconds this Connector will wait for another HTTP request before closing the connection. The default value is to use the value that has been set for the connectionTimeout attribute. Use a value of -1 to indicate no (i.e. infinite) timeout.
`maxKeepAliveRequests`	The maximum number of HTTP requests which can be pipelined until the connection is closed by the server. Setting this attribute to 1 will disable HTTP/1.0 keep-alive, as well as HTTP/1.1 keep-alive and pipelining. Setting this to -1 will allow an unlimited amount of pipelined or keep-alive HTTP requests. If not specified, this attribute is set to 100.

HttpClient

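Apache HttpClient has no liveness mechanism either; connection reuse relies on keep-alive in the HTTP protocol. HttpClient provides a periodic task that cleans up expired and idle connections.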

/**
     * Closes connections that have been idle longer than the given period
     * of time and evicts them from the pool.
     *
     * @param idletime maximum idle time.
     * @param tunit time unit.
     */
// org.apache.http.pool.AbstractConnPool#closeIdle
public void closeIdle(final long idletime, final TimeUnit tunit) {
  Args.notNull(tunit, "Time unit");
  long time = tunit.toMillis(idletime);
  if (time < 0) {
    time = 0;
  }
  final long deadline = System.currentTimeMillis() - time;
  enumAvailable(new PoolEntryCallback<T, C>() {

    @Override
    public void process(final PoolEntry<T, C> entry) {
      // Close entries that have been idle longer than idletime
      if (entry.getUpdated() <= deadline) {
        entry.close();
      }
    }
  });
}


/**
     * Closes expired connections and evicts them from the pool.
     */
// org.apache.http.pool.AbstractConnPool#closeExpired
public void closeExpired() {
  final long now = System.currentTimeMillis();
  enumAvailable(new PoolEntryCallback<T, C>() {

    @Override
    public void process(final PoolEntry<T, C> entry) {
      // Expired entries, e.g. per Keep-Alive: timeout=5, max=1000
      if (entry.isExpired(now)) {
        entry.close();
      }
    }

  });
}


/**
     * Enumerates all available connections.
     *
     * @since 4.3
     */
// org.apache.http.pool.AbstractConnPool#enumAvailable
protected void enumAvailable(final PoolEntryCallback<T, C> callback) {
  this.lock.lock();
  try {
    final Iterator<E> it = this.available.iterator();
    while (it.hasNext()) {
      final E entry = it.next();
      callback.process(entry);
      if (entry.isClosed()) {
        final RouteSpecificPool<T, C, E> pool = getPool(entry.getRoute());
        pool.remove(entry);
        it.remove();
      }
    }
    purgePoolMap();
  } finally {
    this.lock.unlock();
  }
}
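
The eviction logic above boils down to comparing each entry's last-update time against a deadline. The following is a simplified, self-contained sketch of that idea, not HttpClient's implementation: IdleEvictionSketch, its Entry type, and the explicit now parameter are assumptions made for illustration, and the real pool additionally takes a lock and tracks per-route pools.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class IdleEvictionSketch {
    // Hypothetical pool entry: only the fields needed for idle eviction.
    static final class Entry {
        final String id;
        final long updated; // epoch millis when the entry was last returned
        boolean closed;

        Entry(String id, long updated) {
            this.id = id;
            this.updated = updated;
        }
    }

    private final List<Entry> available = new ArrayList<>();

    void release(Entry entry) {
        available.add(entry);
    }

    int size() {
        return available.size();
    }

    // Closes and removes entries idle longer than idleTime, measured at `now`.
    int closeIdle(long idleTime, TimeUnit unit, long now) {
        long deadline = now - Math.max(0, unit.toMillis(idleTime));
        int evicted = 0;
        for (Iterator<Entry> it = available.iterator(); it.hasNext();) {
            Entry entry = it.next();
            if (entry.updated <= deadline) {
                entry.closed = true;
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    public static void main(String[] args) {
        IdleEvictionSketch pool = new IdleEvictionSketch();
        pool.release(new Entry("old", 1_000));
        pool.release(new Entry("fresh", 9_000));
        // At now = 10_000 with a 5 s idle limit, the deadline is 5_000.
        System.out.println(pool.closeIdle(5, TimeUnit.SECONDS, 10_000)); // prints 1
    }
}
```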

When a connection is returned to the pool, the response headers determine whether it can be reused:

// org.apache.http.impl.execchain.MinimalClientExec#execute
// The connection is in or can be brought to a re-usable state.
if (reuseStrategy.keepAlive(response, context)) {
  // Set the idle duration of this connection
  final long duration = keepAliveStrategy.getKeepAliveDuration(response, context);
  // How long the connection stays valid
  releaseTrigger.setValidFor(duration, TimeUnit.MILLISECONDS);
  // Mark it as reusable
  releaseTrigger.markReusable();
} else {
  releaseTrigger.markNonReusable();
}


// org.apache.http.impl.conn.PoolingHttpClientConnectionManager#releaseConnection
public void releaseConnection(
  final HttpClientConnection managedConn,
  final Object state,
  final long keepalive, final TimeUnit tunit) {
  Args.notNull(managedConn, "Managed connection");
  synchronized (managedConn) {
    final CPoolEntry entry = CPoolProxy.detach(managedConn);
    if (entry == null) {
      return;
    }
    final ManagedHttpClientConnection conn = entry.getConnection();
    try {
      if (conn.isOpen()) {
        entry.setState(state);
        // Set the entry's expiry time
        entry.updateExpiry(keepalive, tunit != null ? tunit : TimeUnit.MILLISECONDS);
        // Debug logging
        if (this.log.isDebugEnabled()) {
          final String s;
          if (keepalive > 0) {
            s = "for " + (double) keepalive / 1000 + " seconds";
          } else {
            s = "indefinitely";
          }
          this.log.debug("Connection " + format(entry) + " can be kept alive " + s);
        }
      }
    } finally {
      this.pool.release(entry, conn.isOpen() && entry.isRouteComplete());
      if (this.log.isDebugEnabled()) {
        this.log.debug("Connection released: " + format(entry) + formatStats(entry.getRoute()));
      }
    }
  }
}

ConnectionKeepAliveStrategy

// org.apache.http.conn.ConnectionKeepAliveStrategy
public interface ConnectionKeepAliveStrategy {

  /**
     * Returns the duration of time which this connection can be safely kept
     * idle. If the connection is left idle for longer than this period of time,
     * it MUST not reused. A value of 0 or less may be returned to indicate that
     * there is no suitable suggestion.
     *
     * When coupled with a {@link org.apache.http.ConnectionReuseStrategy}, if
     * {@link org.apache.http.ConnectionReuseStrategy#keepAlive(
     *   HttpResponse, HttpContext)} returns true, this allows you to control
     * how long the reuse will last. If keepAlive returns false, this should
     * have no meaningful impact
     *
     * @param response
     *            The last response received over the connection.
     * @param context
     *            the context in which the connection is being used.
     *
     * @return the duration in ms for which it is safe to keep the connection
     *         idle, or <=0 if no suggested duration.
     */
  long getKeepAliveDuration(HttpResponse response, HttpContext context);

}

Default implementation:

// org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy
@Immutable
public class DefaultConnectionKeepAliveStrategy implements ConnectionKeepAliveStrategy {

    public static final DefaultConnectionKeepAliveStrategy INSTANCE = new DefaultConnectionKeepAliveStrategy();

    public long getKeepAliveDuration(final HttpResponse response, final HttpContext context) {
        Args.notNull(response, "HTTP response");
        // Iterate over the Keep-Alive response header
        final HeaderElementIterator it = new BasicHeaderElementIterator(
                response.headerIterator(HTTP.CONN_KEEP_ALIVE));
        while (it.hasNext()) {
            final HeaderElement he = it.nextElement();
            final String param = he.getName();
            final String value = he.getValue();
            if (value != null && param.equalsIgnoreCase("timeout")) {
                try {
                    // Parse the timeout value (seconds) and convert it to milliseconds
                    return Long.parseLong(value) * 1000;
                } catch(final NumberFormatException ignore) {
                }
            }
        }
        return -1;
    }

}
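
The timeout lookup can be tried out without HttpClient on the classpath. The following is a minimal sketch that works on the raw header value instead of `HeaderElementIterator`; the class `KeepAliveHeaderSketch` and its method are hypothetical names introduced only for illustration.

```java
// Hypothetical helper, not part of HttpClient: mimics the timeout lookup on a raw header value.
final class KeepAliveHeaderSketch {

    /** Returns the keep-alive duration in milliseconds, or -1 if no usable timeout is present. */
    static long keepAliveMillis(String headerValue) {
        if (headerValue == null) {
            return -1;
        }
        // A Keep-Alive header typically looks like "timeout=5, max=100".
        for (String element : headerValue.split(",")) {
            String[] pair = element.trim().split("=", 2);
            if (pair.length == 2 && pair[0].trim().equalsIgnoreCase("timeout")) {
                try {
                    // The header carries seconds; the caller expects milliseconds.
                    return Long.parseLong(pair[1].trim()) * 1000;
                } catch (NumberFormatException ignore) {
                    // Malformed values are skipped, as in the default strategy.
                }
            }
        }
        return -1;
    }
}
```

A result of -1 corresponds to the "indefinitely" branch in the release code shown earlier, which is why many applications wrap the default strategy and substitute their own upper bound.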

ConnectionReuseStrategy

// org.apache.http.ConnectionReuseStrategy
public interface ConnectionReuseStrategy {

    /**
     * Decides whether a connection can be kept open after a request.
     * If this method returns false, the caller MUST
     * close the connection to correctly comply with the HTTP protocol.
     * If it returns true, the caller SHOULD attempt to
     * keep the connection open for reuse with another request.
     * <p>
     * One can use the HTTP context to retrieve additional objects that
     * may be relevant for the keep-alive strategy: the actual HTTP
     * connection, the original HTTP request, target host if known,
     * number of times the connection has been reused already and so on.
     * <p>
     * If the connection is already closed, false is returned.
     * The stale connection check MUST NOT be triggered by a
     * connection reuse strategy.
     *
     * @param response
     *          The last response received over that connection.
     * @param context   the context in which the connection is being
     *          used.
     *
     * @return true if the connection is allowed to be reused, or
     *         false if it MUST NOT be reused
     */
    boolean keepAlive(HttpResponse response, HttpContext context);

}

Default implementation:

// org.apache.http.impl.DefaultConnectionReuseStrategy
// see interface ConnectionReuseStrategy
public boolean keepAlive(final HttpResponse response,
                         final HttpContext context) {
  Args.notNull(response, "HTTP response");
  Args.notNull(context, "HTTP context");

  // Check for a self-terminating entity. If the end of the entity will
  // be indicated by closing the connection, there is no keep-alive.
  final ProtocolVersion ver = response.getStatusLine().getProtocolVersion();
  final Header teh = response.getFirstHeader(HTTP.TRANSFER_ENCODING);
  if (teh != null) {
    // A Transfer-Encoding other than chunked means the connection cannot be reused,
    // because, as the comment above notes, such entities are terminated by closing the connection
    if (!HTTP.CHUNK_CODING.equalsIgnoreCase(teh.getValue())) {
      return false;
    }
  } else {
    // A response that may carry a body but has no valid Content-Length must also close the connection;
    // this is required by the RFC
    if (canResponseHaveBody(response)) {
      final Header[] clhs = response.getHeaders(HTTP.CONTENT_LEN);
      // Do not reuse if not properly content-length delimited
      if (clhs.length == 1) {
        final Header clh = clhs[0];
        try {
          final int contentLen = Integer.parseInt(clh.getValue());
          if (contentLen < 0) {
            return false;
          }
        } catch (final NumberFormatException ex) {
          return false;
        }
      } else {
        return false;
      }
    }
  }

  // Check for the "Connection" header. If that is absent, check for
  // the "Proxy-Connection" header. The latter is an unspecified and
  // broken but unfortunately common extension of HTTP.
  // Read the Connection header
  HeaderIterator hit = response.headerIterator(HTTP.CONN_DIRECTIVE);
  if (!hit.hasNext()) {
    hit = response.headerIterator("Proxy-Connection");
  }

  // Experimental usage of the "Connection" header in HTTP/1.0 is
  // documented in RFC 2068, section 19.7.1. A token "keep-alive" is
  // used to indicate that the connection should be persistent.
  // Note that the final specification of HTTP/1.1 in RFC 2616 does not
  // include this information. Neither is the "Connection" header
  // mentioned in RFC 1945, which informally describes HTTP/1.0.
  //
  // RFC 2616 specifies "close" as the only connection token with a
  // specific meaning: it disables persistent connections.
  //
  // The "Proxy-Connection" header is not formally specified anywhere,
  // but is commonly used to carry one token, "close" or "keep-alive".
  // The "Connection" header, on the other hand, is defined as a
  // sequence of tokens, where each token is a header name, and the
  // token "close" has the above-mentioned additional meaning.
  //
  // To get through this mess, we treat the "Proxy-Connection" header
  // in exactly the same way as the "Connection" header, but only if
  // the latter is missing. We scan the sequence of tokens for both
  // "close" and "keep-alive". As "close" is specified by RFC 2068,
  // it takes precedence and indicates a non-persistent connection.
  // If there is no "close" but a "keep-alive", we take the hint.

  if (hit.hasNext()) {
    try {
      final TokenIterator ti = createTokenIterator(hit);
      boolean keepalive = false;
      while (ti.hasNext()) {
        final String token = ti.nextToken();
        if (HTTP.CONN_CLOSE.equalsIgnoreCase(token)) {
          // Connection: close means the connection must not be reused
          return false;
        } else if (HTTP.CONN_KEEP_ALIVE.equalsIgnoreCase(token)) {
          // continue the loop, there may be a "close" afterwards
          // Connection: Keep-Alive
          keepalive = true;
        }
      }
      if (keepalive)
      {
        return true;
        // neither "close" nor "keep-alive", use default policy
      }

    } catch (final ParseException px) {
      // invalid connection header means no persistent connection
      // we don't have logging in HttpCore, so the exception is lost
      return false;
    }
  }

  // From HTTP/1.1 onwards, connections are kept alive by default
  // default since HTTP/1.1 is persistent, before it was non-persistent
  return !ver.lessEquals(HttpVersion.HTTP_1_0);
}
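
The token scan at the end of `keepAlive` is easier to follow in isolation. This is a simplified sketch that leaves out the Transfer-Encoding and Content-Length checks and takes the already-split Connection tokens as input; `ReuseDecisionSketch` and `canReuse` are illustrative names, not HttpCore API.

```java
import java.util.List;

// Hypothetical helper: only the "Connection" token logic, not the full HttpCore strategy.
final class ReuseDecisionSketch {

    /**
     * @param http11OrLater    whether the response uses HTTP/1.1 or a later version
     * @param connectionTokens tokens of the Connection header, empty if the header is absent
     */
    static boolean canReuse(boolean http11OrLater, List<String> connectionTokens) {
        boolean keepAlive = false;
        for (String token : connectionTokens) {
            if ("close".equalsIgnoreCase(token)) {
                return false; // "close" always wins, wherever it appears
            }
            if ("keep-alive".equalsIgnoreCase(token)) {
                keepAlive = true; // keep scanning, a "close" may still follow
            }
        }
        // An explicit keep-alive is honoured; otherwise fall back to the protocol default.
        return keepAlive || http11OrLater;
    }
}
```

With this rule an HTTP/1.0 response is reusable only when it carries `Connection: keep-alive`, while an HTTP/1.1 response is reusable unless it carries `Connection: close`.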

Application layer

Dubbo

// org.apache.dubbo.remoting.exchange.support.header.HeartbeatTimerTask#doTask
@Override
protected void doTask(Channel channel) {
  try {
    Long lastRead = lastRead(channel);
    Long lastWrite = lastWrite(channel);
    if ((lastRead != null && now() - lastRead > heartbeat)
        || (lastWrite != null && now() - lastWrite > heartbeat)) {
      Request req = new Request();
      req.setVersion(Version.getProtocolVersion());
      req.setTwoWay(true);
      req.setEvent(HEARTBEAT_EVENT);
      channel.send(req);
      if (logger.isDebugEnabled()) {
        logger.debug("Send heartbeat to remote channel " + channel.getRemoteAddress()
                     + ", cause: The channel has no data-transmission exceeds a heartbeat period: "
                     + heartbeat + "ms");
      }
    }
  } catch (Throwable t) {
    logger.warn("Exception when heartbeat to remote channel " + channel.getRemoteAddress(), t);
  }
}
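
The condition in `doTask` boils down to a pure function of the two timestamps. The sketch below isolates it so the boundary cases can be checked; it is not Dubbo code, and `HeartbeatCheckSketch` with its parameter names is an assumption made for illustration.

```java
// Hypothetical helper: the idle check from doTask, extracted as a pure function.
final class HeartbeatCheckSketch {

    /**
     * A heartbeat is due when either direction has been idle for longer than the
     * heartbeat period. A null timestamp means "never recorded" and is ignored.
     */
    static boolean shouldSendHeartbeat(Long lastRead, Long lastWrite, long now, long heartbeatMillis) {
        return (lastRead != null && now - lastRead > heartbeatMillis)
                || (lastWrite != null && now - lastWrite > heartbeatMillis);
    }
}
```

Note that the check uses OR: a channel that keeps writing but never receives anything still triggers a heartbeat, which is what lets the sender notice a silent peer.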

Druid

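Before Druid 1.0.27, DruidDataSource recommended testWhileIdle to keep connections valid, but many scenarios still need explicit connection keep-alive. Since version 1.0.28 there is a keepAlive option, which is disabled by default.
When using keepAlive, a recent version is recommended, for example 1.1.21 or later.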

Hikari CP

⏳keepaliveTime
This property controls how frequently HikariCP will attempt to keep a connection alive, in order to prevent it from being timed out by the database or network infrastructure. This value must be less than the maxLifetime value. A “keepalive” will only occur on an idle connection. When the time arrives for a “keepalive” against a given connection, that connection will be removed from the pool, “pinged”, and then returned to the pool. The ‘ping’ is one of either: invocation of the JDBC4 isValid() method, or execution of the connectionTestQuery. Typically, the duration out-of-the-pool should be measured in single digit milliseconds or even sub-millisecond, and therefore should have little or no noticible performance impact. The minimum allowed value is 30000ms (30 seconds), but a value in the range of minutes is most desirable. Default: 0 (disabled)

⏳idleTimeout
This property controls the maximum amount of time that a connection is allowed to sit idle in the pool. This setting only applies when minimumIdle is defined to be less than maximumPoolSize. Idle connections will not be retired once the pool reaches minimumIdle connections. Whether a connection is retired as idle or not is subject to a maximum variation of +30 seconds, and average variation of +15 seconds. A connection will never be retired as idle before this timeout. A value of 0 means that idle connections are never removed from the pool. The minimum allowed value is 10000ms (10 seconds). Default: 600000 (10 minutes)

References

spring-transaction

2021-03-23T09:35:02.000Z

Usage

JDBC transactions

@Test
@SneakyThrows
public void testTransaction() {
    try (Connection conn = DriverManager.getConnection(connectString)) {
        conn.setAutoCommit(false);
        try (PreparedStatement psts = conn.prepareStatement("update words set word=CONCAT(word, '++') where id=?")) {
            // First update statement
            psts.setInt(1, 2);
            psts.executeUpdate();
            // Second update statement
            // Deliberately throw an exception before it runs
            int i = 1/0;
            psts.setInt(1, 3);
            psts.executeUpdate();
            // Commit the transaction
            conn.commit();
        } catch (Throwable t) {
            conn.rollback();
        }

    }
}

Result (MySQL general query log):

2021-03-23T03:07:15.228849Z   93 Connect   root@localhost on test using TCP/IP
2021-03-23T03:07:15.235215Z   93 Query     /* mysql-connector-java-8.0.20 (Revision: afc0a13cd3c5a0bf57eaa809ee0ee6df1fd5ac9b) */SELECT  @@session.auto_increment_increment AS auto_increment_increment, @@character_set_client AS character_set_client, @@character_set_connection AS character_set_connection, @@character_set_results AS character_set_results, @@character_set_server AS character_set_server, @@collation_server AS collation_server, @@collation_connection AS collation_connection, @@init_connect AS init_connect, @@interactive_timeout AS interactive_timeout, @@license AS license, @@lower_case_table_names AS lower_case_table_names, @@max_allowed_packet AS max_allowed_packet, @@net_write_timeout AS net_write_timeout, @@performance_schema AS performance_schema, @@sql_mode AS sql_mode, @@system_time_zone AS system_time_zone, @@time_zone AS time_zone, @@transaction_isolation AS transaction_isolation, @@wait_timeout AS wait_timeout
2021-03-23T03:07:15.257727Z   93 Query     SET character_set_results = NULL
2021-03-23T03:07:15.261596Z   93 Query     SET autocommit=0
2021-03-23T03:07:15.296076Z   93 Query     update words set word=CONCAT(word, '++') where id=2
2021-03-23T03:07:15.305666Z   93 Query     rollback
2021-03-23T03:07:15.330138Z   93 Query     rollback
2021-03-23T03:07:15.347768Z   93 Quit

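As the example shows, using the raw JDBC API means obtaining a connection, setting its properties, creating a statement, closing every resource, and committing or rolling back the transaction. These steps are boilerplate code and a natural fit for the template method pattern, which hides the details.

Spring provides JdbcTemplate to simplify JDBC development. For transactions it offers declarative and programmatic transaction management.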
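
To make the template idea concrete, here is a minimal sketch of how the commit/rollback/close boilerplate can be moved behind a single `execute` call. It is not Spring code: `RecordingConnection` is a made-up stand-in for `java.sql.Connection` that only records calls, and `TxTemplateSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for java.sql.Connection that only records which calls were made.
final class RecordingConnection {
    final List<String> calls = new ArrayList<>();
    void setAutoCommit(boolean on) { calls.add("autocommit=" + on); }
    void commit() { calls.add("commit"); }
    void rollback() { calls.add("rollback"); }
    void close() { calls.add("close"); }
}

// The template owns the transaction boundaries; callers supply only the business logic.
final class TxTemplateSketch {
    private final RecordingConnection conn;

    TxTemplateSketch(RecordingConnection conn) {
        this.conn = conn;
    }

    <T> T execute(Supplier<T> action) {
        conn.setAutoCommit(false);
        try {
            T result = action.get();
            conn.commit();      // reached only if the action completed normally
            return result;
        } catch (RuntimeException ex) {
            conn.rollback();    // any failure in the action undoes the work
            throw ex;
        } finally {
            conn.close();       // resources are released on both paths
        }
    }
}
```

A caller then writes only `new TxTemplateSketch(conn).execute(() -> doWork())`, where `doWork` is whatever business logic is needed, and the failure path no longer has to be repeated at every call site.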

Declarative transaction management

<bean id="mindTransactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
  <property name="dataSource" ref="mindDataSource"/>
</bean>


<tx:annotation-driven/>

Usage

@Transactional(transactionManager = "mindTransactionManager", readOnly = true)
public void query() {
  final MindEntity mindEntity = minderDao.selectOne(1);
  log.info("mindEntity = {}", mindEntity);
}

Programmatic transaction management

@Bean(name = "orderShardingTransactionTemplate")
public TransactionTemplate transactionTemplate(
  @Qualifier("orderShardingTransactionManager") DataSourceTransactionManager dataSourceTransactionManager) {
  final TransactionTemplate transactionTemplate = new TransactionTemplate(dataSourceTransactionManager);
  transactionTemplate.setTimeout(60);
  return transactionTemplate;
}

Usage:

transactionTemplate.execute(status -> { /* transactional work goes here */ return null; });

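Source code analysis

Declarative transactions are ultimately proxied via AOP to TransactionInterceptor (org.springframework.transaction.interceptor.TransactionInterceptor). The processing logic can be traced from the configuration classes and is not covered here.

The walkthrough below starts from programmatic transactions.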

TransactionTemplate

// org.springframework.transaction.support.TransactionTemplate#execute
@Override
@Nullable
public <T> T execute(TransactionCallback<T> action) throws TransactionException {
  Assert.state(this.transactionManager != null, "No PlatformTransactionManager set");

  if (this.transactionManager instanceof CallbackPreferringPlatformTransactionManager) {
    return ((CallbackPreferringPlatformTransactionManager) this.transactionManager).execute(this, action);
  }
  else {
    // Obtain the TransactionStatus
    TransactionStatus status = this.transactionManager.getTransaction(this);
    T result;
    try {
      // Run the business code
      result = action.doInTransaction(status);
    }
    catch (RuntimeException | Error ex) {
      // Transactional code threw application exception -> rollback
      // Roll back
      rollbackOnException(status, ex);
      throw ex;
    }
    catch (Throwable ex) {
      // Transactional code threw unexpected exception -> rollback
      // Roll back
      rollbackOnException(status, ex);
      throw new UndeclaredThrowableException(ex, "TransactionCallback threw undeclared checked exception");
    }
    // Commit the transaction
    this.transactionManager.commit(status);
    return result;
  }
}

There is little to see in the template itself: the overall logic is delegated to PlatformTransactionManager.

PlatformTransactionManager

This is the central interface in Spring’s transaction infrastructure.
Applications can use this directly, but it is not primarily meant as API:
Typically, applications will work with either TransactionTemplate or
declarative transaction demarcation through AOP.

PlatformTransactionManager defines three methods: getTransaction, commit, and rollback.

public interface PlatformTransactionManager {

/**
 * Return a currently active transaction or create a new one, according to
 * the specified propagation behavior.
 * Note that parameters like isolation level or timeout will only be applied
 * to new transactions, and thus be ignored when participating in active ones.
 * <p>Furthermore, not all transaction definition settings will be supported
 * by every transaction manager: A proper transaction manager implementation
 * should throw an exception when unsupported settings are encountered.
 * <p>An exception to the above rule is the read-only flag, which should be
 * ignored if no explicit read-only mode is supported. Essentially, the
 * read-only flag is just a hint for potential optimization.
 * @param definition TransactionDefinition instance (can be {@code null} for defaults),
 * describing propagation behavior, isolation level, timeout etc.
 * @return transaction status object representing the new or current transaction
 * @throws TransactionException in case of lookup, creation, or system errors
 * @throws IllegalTransactionStateException if the given transaction definition
 * cannot be executed (for example, if a currently active transaction is in
 * conflict with the specified propagation behavior)
 * @see TransactionDefinition#getPropagationBehavior
 * @see TransactionDefinition#getIsolationLevel
 * @see TransactionDefinition#getTimeout
 * @see TransactionDefinition#isReadOnly
 */
TransactionStatus getTransaction(TransactionDefinition definition) throws TransactionException;

/**
 * Commit the given transaction, with regard to its status. If the transaction
 * has been marked rollback-only programmatically, perform a rollback.
 * <p>If the transaction wasn't a new one, omit the commit for proper
 * participation in the surrounding transaction. If a previous transaction
 * has been suspended to be able to create a new one, resume the previous
 * transaction after committing the new one.
 * <p>Note that when the commit call completes, no matter if normally or
 * throwing an exception, the transaction must be fully completed and
 * cleaned up. No rollback call should be expected in such a case.
 * <p>If this method throws an exception other than a TransactionException,
 * then some before-commit error caused the commit attempt to fail. For
 * example, an O/R Mapping tool might have tried to flush changes to the
 * database right before commit, with the resulting DataAccessException
 * causing the transaction to fail. The original exception will be
 * propagated to the caller of this commit method in such a case.
 * @param status object returned by the {@code getTransaction} method
 * @throws UnexpectedRollbackException in case of an unexpected rollback
 * that the transaction coordinator initiated
 * @throws HeuristicCompletionException in case of a transaction failure
 * caused by a heuristic decision on the side of the transaction coordinator
 * @throws TransactionSystemException in case of commit or system errors
 * (typically caused by fundamental resource failures)
 * @throws IllegalTransactionStateException if the given transaction
 * is already completed (that is, committed or rolled back)
 * @see TransactionStatus#setRollbackOnly
 */
void commit(TransactionStatus status) throws TransactionException;

/**
 * Perform a rollback of the given transaction.
 * <p>If the transaction wasn't a new one, just set it rollback-only for proper
 * participation in the surrounding transaction. If a previous transaction
 * has been suspended to be able to create a new one, resume the previous
 * transaction after rolling back the new one.
 * <p>Do not call rollback on a transaction if commit threw an exception.
 * The transaction will already have been completed and cleaned up when commit
 * returns, even in case of a commit exception. Consequently, a rollback call
 * after commit failure will lead to an IllegalTransactionStateException.
 * @param status object returned by the {@code getTransaction} method
 * @throws TransactionSystemException in case of rollback or system errors
 * (typically caused by fundamental resource failures)
 * @throws IllegalTransactionStateException if the given transaction
 * is already completed (that is, committed or rolled back)
 */
void rollback(TransactionStatus status) throws TransactionException;

}

AbstractPlatformTransactionManager

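AbstractPlatformTransactionManager mainly implements two things:

Suspending and resuming transactions (propagation behavior)
Notifying TransactionSynchronization callbacks of the transaction state

Obtaining a transaction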

/**
 * This implementation handles propagation behavior. Delegates to
 * {@code doGetTransaction}, {@code isExistingTransaction}
 * and {@code doBegin}.
 * @see #doGetTransaction
 * @see #isExistingTransaction
 * @see #doBegin
 */
@Override
public final TransactionStatus getTransaction(TransactionDefinition definition) throws TransactionException {
  // Template method: left for subclasses to implement
  Object transaction = doGetTransaction();

  // Cache debug flag to avoid repeated checks.
  boolean debugEnabled = logger.isDebugEnabled();

  if (definition == null) {
    // Use defaults if no transaction definition given.
    definition = new DefaultTransactionDefinition();
  }

  // Check whether a transaction already exists;
  // isExistingTransaction is also a template method provided by the subclass
  if (isExistingTransaction(transaction)) {
    // Existing transaction found -> check propagation behavior to find out how to behave.
    // Handle propagation and suspension for the existing transaction
    // and return the result of handleExistingTransaction directly
    return handleExistingTransaction(definition, transaction, debugEnabled);
  }

  // ------------------------------------- New transaction flow -------------------------------------
  // Check definition settings for new transaction.
  // Validate the transaction timeout
  if (definition.getTimeout() < TransactionDefinition.TIMEOUT_DEFAULT) {
    throw new InvalidTimeoutException("Invalid transaction timeout", definition.getTimeout());
  }

  // No existing transaction found -> check propagation behavior to find out how to proceed.
  if (definition.getPropagationBehavior() == TransactionDefinition.PROPAGATION_MANDATORY) {
    throw new IllegalTransactionStateException(
      "No existing transaction found for transaction marked with propagation 'mandatory'");
  }
  else if (definition.getPropagationBehavior() == TransactionDefinition.PROPAGATION_REQUIRED ||
           definition.getPropagationBehavior() == TransactionDefinition.PROPAGATION_REQUIRES_NEW ||
           definition.getPropagationBehavior() == TransactionDefinition.PROPAGATION_NESTED) {
    // just suspend active synchronizations, if any
    // The return value holds the previously active state so it can be resumed later
    SuspendedResourcesHolder suspendedResources = suspend(null);
    if (debugEnabled) {
      logger.debug("Creating new transaction with name [" + definition.getName() + "]: " + definition);
    }
    try {
      boolean newSynchronization = (getTransactionSynchronization() != SYNCHRONIZATION_NEVER);
      // Build the TransactionStatus on which all later transaction operations are based;
      // it carries all the necessary information
      DefaultTransactionStatus status = newTransactionStatus(
        definition, transaction, true, newSynchronization, debugEnabled, suspendedResources);
      // Template method implemented by subclasses;
      // propagation behavior needs no further handling here because it was dealt with above
      doBegin(transaction, definition);
      // Initialize the transaction state kept in ThreadLocal
      prepareSynchronization(status, definition);
      return status;
    }
    catch (RuntimeException ex) {
      // On failure, resume the previously suspended transaction
      resume(null, suspendedResources);
      throw ex;
    }
    catch (Error err) {
      // On failure, resume the previously suspended transaction
      resume(null, suspendedResources);
      throw err;
    }
  }
  else {
    // Create "empty" transaction: no actual transaction, but potentially synchronization.
    if (definition.getIsolationLevel() != TransactionDefinition.ISOLATION_DEFAULT && logger.isWarnEnabled()) {
      logger.warn("Custom isolation level specified but no actual transaction initiated; " +
                  "isolation level will effectively be ignored: " + definition);
    }
    boolean newSynchronization = (getTransactionSynchronization() == SYNCHRONIZATION_ALWAYS);
    return prepareTransactionStatus(definition, null, true, newSynchronization, debugEnabled, null);
  }
}
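
The branch taken when no transaction exists yet depends only on the propagation behavior. The following sketch restates that branch as a small function. The `Propagation` enum mirrors the `PROPAGATION_*` constants of `TransactionDefinition`, but the enum and the class `NoExistingTxSketch` are illustrative names, not Spring types.

```java
// Hypothetical enum mirroring TransactionDefinition.PROPAGATION_* constants.
enum Propagation { REQUIRED, SUPPORTS, MANDATORY, REQUIRES_NEW, NOT_SUPPORTED, NEVER, NESTED }

final class NoExistingTxSketch {

    /**
     * Decides what happens when no transaction is active.
     * Returns true if a real transaction is started, false if an "empty" one is used.
     */
    static boolean startsRealTransaction(Propagation propagation) {
        switch (propagation) {
            case MANDATORY:
                // MANDATORY requires an existing transaction, so this is an error.
                throw new IllegalStateException(
                        "No existing transaction found for transaction marked with propagation 'mandatory'");
            case REQUIRED:
            case REQUIRES_NEW:
            case NESTED:
                return true;   // doBegin is called
            default:
                return false;  // SUPPORTS, NOT_SUPPORTED, NEVER: run without a transaction
        }
    }
}
```

This also shows why REQUIRED, REQUIRES_NEW and NESTED behave identically for the outermost call: they only differ once a transaction already exists, which is handled in handleExistingTransaction.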

Committing a transaction

// org.springframework.transaction.support.AbstractPlatformTransactionManager#commit
/**
 * This implementation of commit handles participating in existing
 * transactions and programmatic rollback requests.
 * Delegates to {@code isRollbackOnly}, {@code doCommit}
 * and {@code rollback}.
 * @see org.springframework.transaction.TransactionStatus#isRollbackOnly()
 * @see #doCommit
 * @see #rollback
 */
@Override
public final void commit(TransactionStatus status) throws TransactionException {
  if (status.isCompleted()) {
    throw new IllegalTransactionStateException(
      "Transaction is already completed - do not call commit or rollback more than once per transaction");
  }

  DefaultTransactionStatus defStatus = (DefaultTransactionStatus) status;
  if (defStatus.isLocalRollbackOnly()) {
    if (defStatus.isDebug()) {
      logger.debug("Transactional code has requested rollback");
    }
    processRollback(defStatus);
    return;
  }
  if (!shouldCommitOnGlobalRollbackOnly() && defStatus.isGlobalRollbackOnly()) {
    if (defStatus.isDebug()) {
      logger.debug("Global transaction is marked as rollback-only but transactional code requested commit");
    }
    processRollback(defStatus);
    // Throw UnexpectedRollbackException only at outermost transaction boundary
    // or if explicitly asked to.
    if (status.isNewTransaction() || isFailEarlyOnGlobalRollbackOnly()) {
      throw new UnexpectedRollbackException(
        "Transaction rolled back because it has been marked as rollback-only");
    }
    return;
  }

  processCommit(defStatus);
}
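
The checks in `commit` can be summarised as a decision table. The sketch below encodes it as a pure function, assuming `shouldCommitOnGlobalRollbackOnly()` returns false; `CommitOutcome` and `CommitDecisionSketch` are hypothetical names used only for this illustration.

```java
// Hypothetical result type for the decision made in commit().
enum CommitOutcome { ROLLED_BACK, ROLLED_BACK_AND_THROW, COMMIT }

final class CommitDecisionSketch {

    // Assumes shouldCommitOnGlobalRollbackOnly() is false.
    static CommitOutcome decide(boolean localRollbackOnly, boolean globalRollbackOnly,
                                boolean newTransaction, boolean failEarly) {
        if (localRollbackOnly) {
            // The transactional code itself asked for a rollback: roll back quietly.
            return CommitOutcome.ROLLED_BACK;
        }
        if (globalRollbackOnly) {
            // Another participant marked the transaction rollback-only. The exception
            // surfaces only at the outermost boundary or when failing early is requested.
            return (newTransaction || failEarly)
                    ? CommitOutcome.ROLLED_BACK_AND_THROW
                    : CommitOutcome.ROLLED_BACK;
        }
        return CommitOutcome.COMMIT;
    }
}
```

The second branch is the origin of the familiar "Transaction rolled back because it has been marked as rollback-only" error: an inner participant failed, the outer code swallowed the exception, and the outermost commit then finds the global flag set.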




// org.springframework.transaction.support.AbstractPlatformTransactionManager#processCommit
/**
 * Process an actual commit.
 * Rollback-only flags have already been checked and applied.
 * @param status object representing the transaction
 * @throws TransactionException in case of commit failure
 */
private void processCommit(DefaultTransactionStatus status) throws TransactionException {
  try {
    boolean beforeCompletionInvoked = false;
    try {
      prepareForCommit(status);
      // The following calls notify the registered TransactionSynchronization callbacks of the transaction state
      triggerBeforeCommit(status);
      triggerBeforeCompletion(status);
      beforeCompletionInvoked = true;
      boolean globalRollbackOnly = false;
      if (status.isNewTransaction() || isFailEarlyOnGlobalRollbackOnly()) {
        globalRollbackOnly = status.isGlobalRollbackOnly();
      }
      if (status.hasSavepoint()) {
        if (status.isDebug()) {
          logger.debug("Releasing transaction savepoint");
        }
        status.releaseHeldSavepoint();
      }
      else if (status.isNewTransaction()) {
        if (status.isDebug()) {
          logger.debug("Initiating transaction commit");
        }
        // Template method implemented by subclasses
        doCommit(status);
      }
      // Throw UnexpectedRollbackException if we have a global rollback-only
      // marker but still didn't get a corresponding exception from commit.
      if (globalRollbackOnly) {
        throw new UnexpectedRollbackException(
          "Transaction silently rolled back because it has been marked as rollback-only");
      }
    }
    catch (UnexpectedRollbackException ex) {
      // can only be caused by doCommit
      // Notify the synchronization callbacks
      triggerAfterCompletion(status, TransactionSynchronization.STATUS_ROLLED_BACK);
      throw ex;
    }
    catch (TransactionException ex) {
      // can only be caused by doCommit
      if (isRollbackOnCommitFailure()) {
        doRollbackOnCommitException(status, ex);
      }
      else {
        // Notify the synchronization callbacks
        triggerAfterCompletion(status, TransactionSynchronization.STATUS_UNKNOWN);
      }
      throw ex;
    }
    catch (RuntimeException ex) {
      if (!beforeCompletionInvoked) {
        // Notify the synchronization callbacks
        triggerBeforeCompletion(status);
      }
      doRollbackOnCommitException(status, ex);
      throw ex;
    }
    catch (Error err) {
      if (!beforeCompletionInvoked) {
        triggerBeforeCompletion(status);
      }
      doRollbackOnCommitException(status, err);
      throw err;
    }

    // Trigger afterCommit callbacks, with an exception thrown there
    // propagated to callers but the transaction still considered as committed.
    try {
      triggerAfterCommit(status);
    }
    finally {
      triggerAfterCompletion(status, TransactionSynchronization.STATUS_COMMITTED);
    }

  }
  finally {
    // Clear the state held in TransactionSynchronizationManager,
    // invoke the corresponding cleanup callback,
    // and resume any suspended transaction
    cleanupAfterCompletion(status);
  }
}
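
The order of the synchronization callbacks on the successful path is easy to lose in the nested try blocks. The sketch below records that order for a commit that succeeds. It deliberately omits the failure branches (apart from the cleanup that always runs), and `CommitCallbackOrderSketch` is an illustrative class, not Spring code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recorder for the callback order of a successful processCommit.
final class CommitCallbackOrderSketch {
    final List<String> events = new ArrayList<>();

    void commit(Runnable doCommit) {
        try {
            events.add("beforeCommit");
            events.add("beforeCompletion");
            doCommit.run();                          // the subclass hook
            events.add("afterCommit");
            events.add("afterCompletion(COMMITTED)");
        } finally {
            events.add("cleanupAfterCompletion");    // runs whether or not the commit succeeded
        }
    }
}
```

The practical consequence is that work placed in an afterCommit callback runs only once the database commit has actually happened, while beforeCommit callbacks can still cause the transaction to fail.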

Rolling back a transaction

// org.springframework.transaction.support.AbstractPlatformTransactionManager#rollback
/**
 * This implementation of rollback handles participating in existing
 * transactions. Delegates to {@code doRollback} and
 * {@code doSetRollbackOnly}.
 * @see #doRollback
 * @see #doSetRollbackOnly
 */
@Override
public final void rollback(TransactionStatus status) throws TransactionException {
  if (status.isCompleted()) {
    throw new IllegalTransactionStateException(
      "Transaction is already completed - do not call commit or rollback more than once per transaction");
  }

  DefaultTransactionStatus defStatus = (DefaultTransactionStatus) status;
  processRollback(defStatus);
}


/**
 * Process an actual rollback.
 * The completed flag has already been checked.
 * @param status object representing the transaction
 * @throws TransactionException in case of rollback failure
 */
private void processRollback(DefaultTransactionStatus status) {
  try {
    try {
      triggerBeforeCompletion(status);
      if (status.hasSavepoint()) {
        if (status.isDebug()) {
          logger.debug("Rolling back transaction to savepoint");
        }
        status.rollbackToHeldSavepoint();
      }
      else if (status.isNewTransaction()) {
        if (status.isDebug()) {
          logger.debug("Initiating transaction rollback");
        }
        // Template method implemented by subclasses
        doRollback(status);
      }
      else if (status.hasTransaction()) {
        if (status.isLocalRollbackOnly() || isGlobalRollbackOnParticipationFailure()) {
          if (status.isDebug()) {
            logger.debug("Participating transaction failed - marking existing transaction as rollback-only");
          }
          doSetRollbackOnly(status);
        }
        else {
          if (status.isDebug()) {
            logger.debug("Participating transaction failed - letting transaction originator decide on rollback");
          }
        }
      }
      else {
        logger.debug("Should roll back transaction but cannot - no transaction available");
      }
    }
    catch (RuntimeException ex) {
      // Trigger the synchronization callbacks
      triggerAfterCompletion(status, TransactionSynchronization.STATUS_UNKNOWN);
      throw ex;
    }
    catch (Error err) {
      // Trigger the synchronization callbacks
      triggerAfterCompletion(status, TransactionSynchronization.STATUS_UNKNOWN);
      throw err;
    }
    // Trigger the synchronization callbacks
    triggerAfterCompletion(status, TransactionSynchronization.STATUS_ROLLED_BACK);
  }
  finally {
    // Clear the state held in TransactionSynchronizationManager,
    // invoke the corresponding cleanup callback,
    // and resume any suspended transaction
    cleanupAfterCompletion(status);
  }
}
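
The if/else chain in `processRollback` picks one of five actions. Here is a minimal sketch of that selection as a pure function; `RollbackAction` and `RollbackDecisionSketch` are hypothetical names, and the boolean parameters stand in for the corresponding status and manager flags.

```java
// Hypothetical result type for the branch taken in processRollback().
enum RollbackAction {
    ROLLBACK_TO_SAVEPOINT, ROLLBACK_TRANSACTION, MARK_ROLLBACK_ONLY, LEAVE_TO_ORIGINATOR, NOTHING_TO_ROLL_BACK
}

final class RollbackDecisionSketch {

    static RollbackAction decide(boolean hasSavepoint, boolean newTransaction, boolean hasTransaction,
                                 boolean localRollbackOnly, boolean globalRollbackOnParticipationFailure) {
        if (hasSavepoint) {
            return RollbackAction.ROLLBACK_TO_SAVEPOINT;   // nested transaction
        }
        if (newTransaction) {
            return RollbackAction.ROLLBACK_TRANSACTION;    // this scope owns the transaction: doRollback
        }
        if (hasTransaction) {
            // Participating in an outer transaction: it cannot be rolled back here,
            // so at most it is marked for rollback.
            return (localRollbackOnly || globalRollbackOnParticipationFailure)
                    ? RollbackAction.MARK_ROLLBACK_ONLY
                    : RollbackAction.LEAVE_TO_ORIGINATOR;
        }
        return RollbackAction.NOTHING_TO_ROLL_BACK;
    }
}
```

Only the scope that started the transaction ever issues the physical rollback; participants merely set a flag that the outermost commit later checks.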

DataSourceTransactionManager

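This class is an implementation of AbstractPlatformTransactionManager. Any code that obtains its connections through a javax.sql.DataSource can use it to manage transactions.

Implementations of the key template methods:

Beginning a transaction: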

//org.springframework.jdbc.datasource.DataSourceTransactionManager#doGetTransaction
@Override
protected Object doGetTransaction() {
  DataSourceTransactionObject txObject = new DataSourceTransactionObject();
  txObject.setSavepointAllowed(isNestedTransactionAllowed());
  // Look up the holder bound to the current thread (ThreadLocal)
  ConnectionHolder conHolder =
    (ConnectionHolder) TransactionSynchronizationManager.getResource(this.dataSource);
  txObject.setConnectionHolder(conHolder, false);
  return txObject;
}

/**
 * This implementation sets the isolation level but ignores the timeout.
 */
@Override
protected void doBegin(Object transaction, TransactionDefinition definition) {
  DataSourceTransactionObject txObject = (DataSourceTransactionObject) transaction;
  Connection con = null;

  try {
    // No ConnectionHolder yet, or the existing one is already synchronized with a transaction: create a new one
    if (txObject.getConnectionHolder() == null ||
        txObject.getConnectionHolder().isSynchronizedWithTransaction()) {
      // Obtain a connection from the data source
      Connection newCon = this.dataSource.getConnection();
      if (logger.isDebugEnabled()) {
        logger.debug("Acquired Connection [" + newCon + "] for JDBC transaction");
      }
      txObject.setConnectionHolder(new ConnectionHolder(newCon), true);
    }

    // Mark the holder as synchronized with the transaction
    txObject.getConnectionHolder().setSynchronizedWithTransaction(true);
    con = txObject.getConnectionHolder().getConnection();
    // Remember the previous isolation level so it can be restored later
    Integer previousIsolationLevel = DataSourceUtils.prepareConnectionForTransaction(con, definition);
    txObject.setPreviousIsolationLevel(previousIsolationLevel);

    // Switch to manual commit if necessary. This is very expensive in some JDBC drivers,
    // so we don't want to do it unnecessarily (for example if we've explicitly
    // configured the connection pool to set it already).
    // Switch the connection to manual commit
    if (con.getAutoCommit()) {
      txObject.setMustRestoreAutoCommit(true);
      if (logger.isDebugEnabled()) {
        logger.debug("Switching JDBC Connection [" + con + "] to manual commit");
      }
      con.setAutoCommit(false);
    }
    // Mark the transaction as active
    txObject.getConnectionHolder().setTransactionActive(true);

    // Transaction timeout
    int timeout = determineTimeout(definition);
    if (timeout != TransactionDefinition.TIMEOUT_DEFAULT) {
      txObject.getConnectionHolder().setTimeoutInSeconds(timeout);
    }

    // Bind the session holder to the thread.
    // A newly created connection holder is handed over to Spring (ThreadLocal),
    // stored as a DataSource --> ConnectionHolder mapping
    if (txObject.isNewConnectionHolder()) {
      TransactionSynchronizationManager.bindResource(getDataSource(), txObject.getConnectionHolder());
    }
  }

  catch (Throwable ex) {
    // Release the connection on failure
    if (txObject.isNewConnectionHolder()) {
      DataSourceUtils.releaseConnection(con, this.dataSource);
      txObject.setConnectionHolder(null, false);
    }
    throw new CannotCreateTransactionException("Could not open JDBC Connection for transaction", ex);
  }
}

Committing the transaction:

// org.springframework.jdbc.datasource.DataSourceTransactionManager#doCommit
@Override
protected void doCommit(DefaultTransactionStatus status) {
  DataSourceTransactionObject txObject = (DataSourceTransactionObject) status.getTransaction();
  Connection con = txObject.getConnectionHolder().getConnection();
  if (status.isDebug()) {
    logger.debug("Committing JDBC transaction on Connection [" + con + "]");
  }
  try {
    con.commit();
  }
  catch (SQLException ex) {
    throw new TransactionSystemException("Could not commit JDBC transaction", ex);
  }
}

Rolling back the transaction:

@Override
protected void doRollback(DefaultTransactionStatus status) {
  DataSourceTransactionObject txObject = (DataSourceTransactionObject) status.getTransaction();
  Connection con = txObject.getConnectionHolder().getConnection();
  if (status.isDebug()) {
    logger.debug("Rolling back JDBC transaction on Connection [" + con + "]");
  }
  try {
    con.rollback();
  }
  catch (SQLException ex) {
    throw new TransactionSystemException("Could not roll back JDBC transaction", ex);
  }
}

Cleanup:

// org.springframework.jdbc.datasource.DataSourceTransactionManager#doCleanupAfterCompletion
@Override
protected void doCleanupAfterCompletion(Object transaction) {
  DataSourceTransactionObject txObject = (DataSourceTransactionObject) transaction;

  // Remove the connection holder from the thread, if exposed.
  // That is, clear the connection holder bound in the ThreadLocal
  if (txObject.isNewConnectionHolder()) {
    TransactionSynchronizationManager.unbindResource(this.dataSource);
  }

  // Reset connection.
  // Restore the connection to its original state (auto-commit, isolation level)
  Connection con = txObject.getConnectionHolder().getConnection();
  try {
    if (txObject.isMustRestoreAutoCommit()) {
      con.setAutoCommit(true);
    }
    DataSourceUtils.resetConnectionAfterTransaction(con, txObject.getPreviousIsolationLevel());
  }
  catch (Throwable ex) {
    logger.debug("Could not reset JDBC Connection after transaction", ex);
  }

  if (txObject.isNewConnectionHolder()) {
    if (logger.isDebugEnabled()) {
      logger.debug("Releasing JDBC Connection [" + con + "] after transaction");
    }
    // Release the connection; the holder was unbound above, so this closes it (or returns it to the pool)
    DataSourceUtils.releaseConnection(con, this.dataSource);
  }

  // Clear the holder
  txObject.getConnectionHolder().clear();
}
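
The begin and cleanup steps above form a save-and-restore pair around the unit of work. Below is a minimal sketch of that pattern only. `FakeConnection` and `runInTransaction` are hypothetical names invented for this illustration, not Spring or JDBC APIs, and thread binding, isolation level and timeout handling are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class AutoCommitRestoreSketch {

    // Hypothetical stand-in for java.sql.Connection, reduced to what the sketch needs.
    static final class FakeConnection {
        boolean autoCommit = true;
        final List<String> calls = new ArrayList<>();

        void setAutoCommit(boolean value) {
            autoCommit = value;
            calls.add("setAutoCommit(" + value + ")");
        }

        void commit() {
            calls.add("commit");
        }

        void rollback() {
            calls.add("rollback");
        }
    }

    // Runs one unit of work using a begin -> commit/rollback -> cleanup sequence.
    static void runInTransaction(FakeConnection con, Runnable work) {
        // begin: switch to manual commit only if needed, and remember that we did
        boolean mustRestoreAutoCommit = false;
        if (con.autoCommit) {
            mustRestoreAutoCommit = true;
            con.setAutoCommit(false);
        }
        try {
            work.run();
            con.commit();
        } catch (RuntimeException ex) {
            con.rollback();
            throw ex;
        } finally {
            // cleanup: hand the connection back in the state it was received in
            if (mustRestoreAutoCommit) {
                con.setAutoCommit(true);
            }
        }
    }

    public static void main(String[] args) {
        FakeConnection con = new FakeConnection();
        runInTransaction(con, () -> con.calls.add("work"));
        // prints [setAutoCommit(false), work, commit, setAutoCommit(true)]
        System.out.println(con.calls);
    }
}
```

Remembering whether auto-commit was switched is what allows the cleanup step to skip the extra `setAutoCommit(true)` call when the pool already hands out manual-commit connections.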

TransactionalEventListener

/**
 * @author 代故
 * @date 2021/3/22 2:45 PM
 */
@Service
@Slf4j
public class TransactionalEventListenerTest {

    @TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
    public void listen(ApplicationEvent event) {
        log.info("received after commit event {}", event);
    }

    @TransactionalEventListener(phase = TransactionPhase.AFTER_ROLLBACK)
    public void listenRollback(ApplicationEvent event) {
        log.info("received after rollback event {}", event);
    }
}


/**
 * @author 代故
 * @date 2021/3/22 2:58 PM
 */
public class MyTransactionEvent extends ApplicationEvent {
    /**
     * Create a new ApplicationEvent.
     *
     * @param source the object on which the event initially occurred (never {@code null})
     */
    public MyTransactionEvent(Object source) {
        super(source);
    }
}


@Transactional(transactionManager = "proxyDataSourceTransactionManager", readOnly = true)
public void query() {
  final MindEntity mindEntity = minderDao.selectOne(1);
  log.info("mindEntity = {}", mindEntity);
  final List wordEntities = wordDao.selectAll(new RowBounds(0, 20));
  log.info("wordEntities = {}", wordEntities);
  transactionService.query0();
  applicationContext.publishEvent(new MyTransactionEvent(transactionService));
}

This makes it possible to listen for events at different phases of a transaction.
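
The idea behind phase-bound listeners is that an event published inside a transaction is held back until the outcome is known. The following is a minimal sketch of that idea in plain Java, not Spring's implementation; `DeferredEventsSketch`, `Phase`, `listen`, `publish` and `complete` are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class DeferredEventsSketch {

    enum Phase { AFTER_COMMIT, AFTER_ROLLBACK }

    private final List<Object> pending = new ArrayList<>();
    private final List<Consumer<Object>> afterCommit = new ArrayList<>();
    private final List<Consumer<Object>> afterRollback = new ArrayList<>();

    void listen(Phase phase, Consumer<Object> listener) {
        (phase == Phase.AFTER_COMMIT ? afterCommit : afterRollback).add(listener);
    }

    // Called inside the transaction: the event is only recorded, not delivered yet.
    void publish(Object event) {
        pending.add(event);
    }

    // Called by the (hypothetical) transaction manager once the outcome is known.
    void complete(boolean committed) {
        List<Consumer<Object>> targets = committed ? afterCommit : afterRollback;
        for (Object event : pending) {
            for (Consumer<Object> listener : targets) {
                listener.accept(event);
            }
        }
        pending.clear();
    }

    public static void main(String[] args) {
        DeferredEventsSketch tx = new DeferredEventsSketch();
        tx.listen(Phase.AFTER_COMMIT, e -> System.out.println("after commit: " + e));
        tx.listen(Phase.AFTER_ROLLBACK, e -> System.out.println("after rollback: " + e));
        tx.publish("order-created");
        System.out.println("still inside the transaction");
        tx.complete(true);
    }
}
```

Running `main` prints the "still inside the transaction" line first and the "after commit" line second, which is the ordering that matters: the listener never observes an event from a transaction that has not finished.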

TransactionSynchronizationManager

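This is the central ThreadLocal management class, used for transaction synchronization. Both the MyBatis SqlSession and the JDBC Connection go through this class to interact with ThreadLocal storage.

Main fields: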

public abstract class TransactionSynchronizationManager {

  private static final Log logger = LogFactory.getLog(TransactionSynchronizationManager.class);

  // Resources bound to the current thread, for example:
  // DataSource -> connection (ConnectionHolder)
  // SqlSessionFactory -> sqlSession (SqlSessionHolder)
  private static final ThreadLocal<Map<Object, Object>> resources =
      new NamedThreadLocal<Map<Object, Object>>("Transactional resources");

  // Transaction callbacks: notified when the state of the current transaction changes, so they can do cleanup
  private static final ThreadLocal<Set<TransactionSynchronization>> synchronizations =
      new NamedThreadLocal<Set<TransactionSynchronization>>("Transaction synchronizations");

  // Name of the current transaction
  private static final ThreadLocal<String> currentTransactionName =
      new NamedThreadLocal<String>("Current transaction name");

  // Read-only status
  private static final ThreadLocal<Boolean> currentTransactionReadOnly =
      new NamedThreadLocal<Boolean>("Current transaction read-only status");

  // Isolation level
  private static final ThreadLocal<Integer> currentTransactionIsolationLevel =
      new NamedThreadLocal<Integer>("Current transaction isolation level");

  // Whether an actual transaction is active
  private static final ThreadLocal<Boolean> actualTransactionActive =
      new NamedThreadLocal<Boolean>("Actual transaction active");

}

The main resource-management methods:

// org.springframework.transaction.support.TransactionSynchronizationManager#bindResource
/**
 * Bind the given resource for the given key to the current thread.
 * @param key the key to bind the value to (usually the resource factory)
 * @param value the value to bind (usually the active resource object)
 * @throws IllegalStateException if there is already a value bound to the thread
 * @see ResourceTransactionManager#getResourceFactory()
 */
public static void bindResource(Object key, Object value) throws IllegalStateException {
  Object actualKey = TransactionSynchronizationUtils.unwrapResourceIfNecessary(key);
  Assert.notNull(value, "Value must not be null");
  Map<Object, Object> map = resources.get();
  // set ThreadLocal Map if none found
  if (map == null) {
    map = new HashMap<Object, Object>();
    resources.set(map);
  }
  Object oldValue = map.put(actualKey, value);
  // Transparently suppress a ResourceHolder that was marked as void...
  if (oldValue instanceof ResourceHolder && ((ResourceHolder) oldValue).isVoid()) {
    oldValue = null;
  }
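  // When a transaction is suspended its bound resources must be unbound first;
  // otherwise starting a new transaction on the same DataSource would throw here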
  if (oldValue != null) {
    throw new IllegalStateException("Already value [" + oldValue + "] for key [" +
                                    actualKey + "] bound to thread [" + Thread.currentThread().getName() + "]");
  }
  if (logger.isTraceEnabled()) {
    logger.trace("Bound value [" + value + "] for key [" + actualKey + "] to thread [" +
                 Thread.currentThread().getName() + "]");
  }
}


/**
 * Unbind a resource for the given key from the current thread.
 * @param key the key to unbind (usually the resource factory)
 * @return the previously bound value (usually the active resource object)
 * @throws IllegalStateException if there is no value bound to the thread
 * @see ResourceTransactionManager#getResourceFactory()
 */
public static Object unbindResource(Object key) throws IllegalStateException {
  Object actualKey = TransactionSynchronizationUtils.unwrapResourceIfNecessary(key);
  Object value = doUnbindResource(actualKey);
  if (value == null) {
    throw new IllegalStateException(
      "No value for key [" + actualKey + "] bound to thread [" + Thread.currentThread().getName() + "]");
  }
  return value;
}
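
The bind/unbind contract can be tried out without Spring. What follows is a minimal sketch, not Spring's implementation: `ThreadBoundResources` is a hypothetical class, and key unwrapping and the void-holder check are omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified stand-in for the resource map in TransactionSynchronizationManager.
final class ThreadBoundResources {

    // One map per thread: resource factory (key) -> resource holder (value)
    private static final ThreadLocal<Map<Object, Object>> RESOURCES = new ThreadLocal<>();

    private ThreadBoundResources() {
    }

    static void bind(Object key, Object value) {
        Objects.requireNonNull(value, "Value must not be null");
        Map<Object, Object> map = RESOURCES.get();
        if (map == null) {
            map = new HashMap<>();
            RESOURCES.set(map);
        }
        Object old = map.putIfAbsent(key, value);
        if (old != null) {
            throw new IllegalStateException("Already value [" + old + "] for key [" + key + "]");
        }
    }

    static Object get(Object key) {
        Map<Object, Object> map = RESOURCES.get();
        return map == null ? null : map.get(key);
    }

    static Object unbind(Object key) {
        Map<Object, Object> map = RESOURCES.get();
        Object value = map == null ? null : map.remove(key);
        if (value == null) {
            throw new IllegalStateException("No value for key [" + key + "]");
        }
        if (map.isEmpty()) {
            // Drop the empty map so the thread does not keep it alive
            RESOURCES.remove();
        }
        return value;
    }

    public static void main(String[] args) throws InterruptedException {
        Object dataSource = new Object();
        bind(dataSource, "connection-1");
        System.out.println("this thread sees: " + get(dataSource));

        Thread other = new Thread(() -> System.out.println("other thread sees: " + get(dataSource)));
        other.start();
        other.join();

        unbind(dataSource);
    }
}
```

The second thread sees `null` for the same key, which is the property that lets each thread run its own transaction against a shared DataSource.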

ResourceHolderSupport

This class provides reference counting:

/**
 * Increase the reference count by one because the holder has been requested
 * (i.e. someone requested the resource held by it).
 */
public void requested() {
  this.referenceCount++;
}

/**
 * Decrease the reference count by one because the holder has been released
 * (i.e. someone released the resource held by it).
 */
public void released() {
  this.referenceCount--;
}

/**
 * Return whether there are still open references to this holder.
 */
public boolean isOpen() {
  return (this.referenceCount > 0);
}

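Spring binds stateful objects such as SqlSession and Connection to a ThreadLocal.

Within a transaction the same connection has to be reused. What Spring stores in the ThreadLocal is a subclass of ResourceHolderSupport, and every request for the resource increments the count by one: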

// org.springframework.jdbc.datasource.DataSourceUtils#doGetConnection
if (TransactionSynchronizationManager.isSynchronizationActive()) {
  logger.debug("Registering transaction synchronization for JDBC Connection");
  // Use same Connection for further JDBC actions within the transaction.
  // Thread-bound object will get removed by synchronization at transaction completion.
  ConnectionHolder holderToUse = conHolder;
  if (holderToUse == null) {
    holderToUse = new ConnectionHolder(con);
  }
  else {
    holderToUse.setConnection(con);
  }
  // Increment the reference count
  holderToUse.requested();
  TransactionSynchronizationManager.registerSynchronization(
    new ConnectionSynchronization(holderToUse, dataSource));
  holderToUse.setSynchronizedWithTransaction(true);
  if (holderToUse != conHolder) {
    TransactionSynchronizationManager.bindResource(dataSource, holderToUse);
  }
}

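When the count drops to zero, the connection can be considered unused and can be reclaimed. This is similar in spirit to reference-counting garbage collection.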
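
The counting rule can be shown in isolation. This is a minimal sketch under the assumption that the owner checks the count once at the end of the unit of work; `CountedResource` and `closeIfUnused` are hypothetical names, and only `requested`, `released` and `isOpen` mirror the methods quoted above.

```java
// Hypothetical holder: only the counting logic mirrors ResourceHolderSupport.
final class CountedResource {

    private int referenceCount;
    private boolean closed;

    void requested() {
        referenceCount++;
    }

    void released() {
        referenceCount--;
    }

    boolean isOpen() {
        return referenceCount > 0;
    }

    // Called at the end of the unit of work: close only if nobody still holds a reference.
    boolean closeIfUnused() {
        if (!isOpen() && !closed) {
            closed = true;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CountedResource holder = new CountedResource();
        holder.requested();   // outer DAO call
        holder.requested();   // nested DAO call reuses the same holder
        holder.released();
        System.out.println(holder.closeIfUnused()); // prints false: one reference remains
        holder.released();
        System.out.println(holder.closeIfUnused()); // prints true: the count reached zero
    }
}
```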

MyBatis Source Code Analysis (Part 3): Spring Integration

2021-03-23T00:08:22.000Z

Usage

Dependency:

<dependency>
  <groupId>org.mybatis</groupId>
  <artifactId>mybatis-spring</artifactId>
  <version>1.3.0</version>
</dependency>

Configuration:

<bean id="mindSqlSessionFactory" class="org.mybatis.spring.SqlSessionFactoryBean">
  <property name="dataSource" ref="mindDataSource"/>
  <property name="configLocation" value="classpath:mybatis-config.xml"/>
  <property name="mapperLocations" value="classpath*:mapper/minder/*Mapper.xml"/>
  <property name="plugins">
    <array>
      <ref bean="sqlInterceptor"/>
    </array>
  </property>
</bean>

<bean class="org.mybatis.spring.mapper.MapperScannerConfigurer">
  <property name="sqlSessionFactoryBeanName" value="mindSqlSessionFactory" />
  <property name="basePackage" value="com.air.persistence.minder.dao"/>
</bean>

The mapper interface also needs to be managed by Spring:

@Repository
public interface MinderDao {

    MindEntity selectOne(@Param("id") int id);
}

With the configuration above, MinderDao is managed by the Spring container and can simply be injected wherever it is needed:

@Service
@Slf4j
public class TransactionService {

    @Resource
    private MinderDao minderDao;
  
    public MindEntity query() {
        final MindEntity mindEntity = minderDao.selectOne(1);
        return mindEntity;
    }
}

Source Code Analysis

SqlSessionFactoryBean

FactoryBean that creates an MyBatis SqlSessionFactory.
This is the usual way to set up a shared MyBatis {@code SqlSessionFactory} in a Spring application context;
the SqlSessionFactory can then be passed to MyBatis-based DAOs via dependency injection.

/**
   * {@inheritDoc}
   */
@Override
public SqlSessionFactory getObject() throws Exception {
  if (this.sqlSessionFactory == null) {
    afterPropertiesSet();
  }

  return this.sqlSessionFactory;
}

/**
   * {@inheritDoc}
   */
@Override
public void afterPropertiesSet() throws Exception {
  notNull(dataSource, "Property 'dataSource' is required");
  notNull(sqlSessionFactoryBuilder, "Property 'sqlSessionFactoryBuilder' is required");
  state((configuration == null && configLocation == null) || !(configuration != null && configLocation != null),
        "Property 'configuration' and 'configLocation' can not specified with together");

  this.sqlSessionFactory = buildSqlSessionFactory();
}

// Populate the Configuration and build the SqlSessionFactory
protected SqlSessionFactory buildSqlSessionFactory() throws IOException {

  Configuration configuration;

  XMLConfigBuilder xmlConfigBuilder = null;
  if (this.configuration != null) {
    // A Configuration object was supplied explicitly
    configuration = this.configuration;
    if (configuration.getVariables() == null) {
      configuration.setVariables(this.configurationProperties);
    } else if (this.configurationProperties != null) {
      configuration.getVariables().putAll(this.configurationProperties);
    }
  } else if (this.configLocation != null) {
    // The location of a single XML configuration file was supplied
    xmlConfigBuilder = new XMLConfigBuilder(this.configLocation.getInputStream(), null, this.configurationProperties);
    configuration = xmlConfigBuilder.getConfiguration();
  } else {
    if (LOGGER.isDebugEnabled()) {
      LOGGER.debug("Property `configuration` or 'configLocation' not specified, using default MyBatis Configuration");
    }
    configuration = new Configuration();
    configuration.setVariables(this.configurationProperties);
  }

  if (this.objectFactory != null) {
    configuration.setObjectFactory(this.objectFactory);
  }

  if (this.objectWrapperFactory != null) {
    configuration.setObjectWrapperFactory(this.objectWrapperFactory);
  }

  if (this.vfs != null) {
    configuration.setVfsImpl(this.vfs);
  }

  if (hasLength(this.typeAliasesPackage)) {
    String[] typeAliasPackageArray = tokenizeToStringArray(this.typeAliasesPackage,
                                                           ConfigurableApplicationContext.CONFIG_LOCATION_DELIMITERS);
    for (String packageToScan : typeAliasPackageArray) {
      configuration.getTypeAliasRegistry().registerAliases(packageToScan,
                                                           typeAliasesSuperType == null ? Object.class : typeAliasesSuperType);
      if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Scanned package: '" + packageToScan + "' for aliases");
      }
    }
  }

  if (!isEmpty(this.typeAliases)) {
    for (Class<?> typeAlias : this.typeAliases) {
      configuration.getTypeAliasRegistry().registerAlias(typeAlias);
      if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Registered type alias: '" + typeAlias + "'");
      }
    }
  }

  // Register the MyBatis plugins (interceptors)
  if (!isEmpty(this.plugins)) {
    for (Interceptor plugin : this.plugins) {
      configuration.addInterceptor(plugin);
      if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Registered plugin: '" + plugin + "'");
      }
    }
  }

  // TypeHandler
  if (hasLength(this.typeHandlersPackage)) {
    String[] typeHandlersPackageArray = tokenizeToStringArray(this.typeHandlersPackage,
                                                              ConfigurableApplicationContext.CONFIG_LOCATION_DELIMITERS);
    for (String packageToScan : typeHandlersPackageArray) {
      configuration.getTypeHandlerRegistry().register(packageToScan);
      if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Scanned package: '" + packageToScan + "' for type handlers");
      }
    }
  }

  if (!isEmpty(this.typeHandlers)) {
    for (TypeHandler<?> typeHandler : this.typeHandlers) {
      configuration.getTypeHandlerRegistry().register(typeHandler);
      if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Registered type handler: '" + typeHandler + "'");
      }
    }
  }

  if (this.databaseIdProvider != null) {//fix #64 set databaseId before parse mapper xmls
    try {
      configuration.setDatabaseId(this.databaseIdProvider.getDatabaseId(this.dataSource));
    } catch (SQLException e) {
      throw new NestedIOException("Failed getting a databaseId", e);
    }
  }

  if (this.cache != null) {
    configuration.addCache(this.cache);
  }

  if (xmlConfigBuilder != null) {
    try {
      xmlConfigBuilder.parse();

      if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Parsed configuration file: '" + this.configLocation + "'");
      }
    } catch (Exception ex) {
      throw new NestedIOException("Failed to parse config resource: " + this.configLocation, ex);
    } finally {
      ErrorContext.instance().reset();
    }
  }

  // Transaction
  if (this.transactionFactory == null) {
    // Note: SpringManagedTransactionFactory is the default
    this.transactionFactory = new SpringManagedTransactionFactory();
  }

  configuration.setEnvironment(new Environment(this.environment, this.transactionFactory, this.dataSource));

  // Locations of the mapper XML files
  if (!isEmpty(this.mapperLocations)) {
    for (Resource mapperLocation : this.mapperLocations) {
      if (mapperLocation == null) {
        continue;
      }

      try {
        XMLMapperBuilder xmlMapperBuilder = new XMLMapperBuilder(mapperLocation.getInputStream(),
                                                                 configuration, mapperLocation.toString(), configuration.getSqlFragments());
        xmlMapperBuilder.parse();
      } catch (Exception e) {
        throw new NestedIOException("Failed to parse mapping resource: '" + mapperLocation + "'", e);
      } finally {
        ErrorContext.instance().reset();
      }

      if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Parsed mapper file: '" + mapperLocation + "'");
      }
    }
  } else {
    if (LOGGER.isDebugEnabled()) {
      LOGGER.debug("Property 'mapperLocations' was not specified or no matching resources found");
    }
  }

  // Build the SqlSessionFactory
  return this.sqlSessionFactoryBuilder.build(configuration);
}

MapperScannerConfigurer

BeanDefinitionRegistryPostProcessor that searches recursively starting from a base package for
interfaces and registers them as MapperFactoryBean. Note that only interfaces with at
least one method will be registered; concrete classes will be ignored.

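This class scans the configured packages for mapper interfaces and produces proxy DAO objects. What gets registered after scanning is a MapperFactoryBean. The object itself is created as follows: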

// org.mybatis.spring.mapper.MapperFactoryBean#getObject
/**
   * {@inheritDoc}
   */
public class MapperFactoryBean<T> extends SqlSessionDaoSupport implements FactoryBean<T> {
  @Override
  public T getObject() throws Exception {
    // sqlSession.getMapper creates the MapperProxy-backed proxy object, which ties the DAO to this sqlSession
    return getSqlSession().getMapper(this.mapperInterface);
  }
  
  public void setSqlSessionFactory(SqlSessionFactory sqlSessionFactory) {
    if (!this.externalSqlSession) {
      // A SqlSessionTemplate is used here; it also implements the SqlSession interface
      this.sqlSession = new SqlSessionTemplate(sqlSessionFactory);
    }
  }
}
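
How an interface with no implementation class ends up as a usable object can be shown with a plain JDK dynamic proxy. This is a minimal sketch of the mechanism only, not MyBatis's MapperProxy: `MapperProxySketch`, `UserMapper` and the echoed "statement id" string are hypothetical, and no SQL is executed.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;

final class MapperProxySketch implements InvocationHandler {

    // Hypothetical mapper interface, standing in for a DAO such as MinderDao.
    interface UserMapper {
        String selectName(int id);
    }

    private final Class<?> mapperInterface;

    private MapperProxySketch(Class<?> mapperInterface) {
        this.mapperInterface = mapperInterface;
    }

    // No implementation class exists; a JDK dynamic proxy answers every interface call.
    static <T> T getMapper(Class<T> mapperInterface) {
        Object proxy = Proxy.newProxyInstance(
                mapperInterface.getClassLoader(),
                new Class<?>[] {mapperInterface},
                new MapperProxySketch(mapperInterface));
        return mapperInterface.cast(proxy);
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (method.getDeclaringClass() == Object.class) {
            // toString/equals/hashCode are answered by the handler itself
            return method.invoke(this, args);
        }
        // A real mapper proxy would look up the mapped statement and run it through the SqlSession.
        // Here the statement id and the arguments are simply echoed back.
        return mapperInterface.getSimpleName() + "." + method.getName() + Arrays.toString(args);
    }

    public static void main(String[] args) {
        UserMapper mapper = getMapper(UserMapper.class);
        System.out.println(mapper.selectName(1)); // prints UserMapper.selectName[1]
    }
}
```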

A brief look at the scanning process:

public class MapperScannerConfigurer implements BeanDefinitionRegistryPostProcessor, InitializingBean, ApplicationContextAware, BeanNameAware {
  
  @Override
  public void postProcessBeanDefinitionRegistry(BeanDefinitionRegistry registry) {
    if (this.processPropertyPlaceHolders) {
      processPropertyPlaceHolders();
    }

    ClassPathMapperScanner scanner = new ClassPathMapperScanner(registry);
    scanner.setAddToConfig(this.addToConfig);
    scanner.setAnnotationClass(this.annotationClass);
    scanner.setMarkerInterface(this.markerInterface);
    scanner.setSqlSessionFactory(this.sqlSessionFactory);
    scanner.setSqlSessionTemplate(this.sqlSessionTemplate);
    scanner.setSqlSessionFactoryBeanName(this.sqlSessionFactoryBeanName);
    scanner.setSqlSessionTemplateBeanName(this.sqlSessionTemplateBeanName);
    scanner.setResourceLoader(this.applicationContext);
    scanner.setBeanNameGenerator(this.nameGenerator);
    scanner.registerFilters();
    // Start scanning for classes under the configured base packages
    scanner.scan(StringUtils.tokenizeToStringArray(this.basePackage, ConfigurableApplicationContext.CONFIG_LOCATION_DELIMITERS));
  }
}

// org.mybatis.spring.mapper.ClassPathMapperScanner#processBeanDefinitions
// Bean registration:
private void processBeanDefinitions(Set<BeanDefinitionHolder> beanDefinitions) {
  GenericBeanDefinition definition;
  for (BeanDefinitionHolder holder : beanDefinitions) {
    definition = (GenericBeanDefinition) holder.getBeanDefinition();

    if (logger.isDebugEnabled()) {
      logger.debug("Creating MapperFactoryBean with name '" + holder.getBeanName() 
                   + "' and '" + definition.getBeanClassName() + "' mapperInterface");
    }

    // the mapper interface is the original class of the bean
    // but, the actual class of the bean is MapperFactoryBean
    // What gets registered is a MapperFactoryBean; its first constructor argument is the mapper interface type
    definition.getConstructorArgumentValues().addGenericArgumentValue(definition.getBeanClassName()); // issue #59
    definition.setBeanClass(this.mapperFactoryBean.getClass());

    definition.getPropertyValues().add("addToConfig", this.addToConfig);

    // Add the sqlSessionFactory property
    boolean explicitFactoryUsed = false;
    if (StringUtils.hasText(this.sqlSessionFactoryBeanName)) {
      definition.getPropertyValues().add("sqlSessionFactory", new RuntimeBeanReference(this.sqlSessionFactoryBeanName));
      explicitFactoryUsed = true;
    } else if (this.sqlSessionFactory != null) {
      definition.getPropertyValues().add("sqlSessionFactory", this.sqlSessionFactory);
      explicitFactoryUsed = true;
    }

    // Add the sqlSessionTemplate property
    if (StringUtils.hasText(this.sqlSessionTemplateBeanName)) {
      if (explicitFactoryUsed) {
        logger.warn("Cannot use both: sqlSessionTemplate and sqlSessionFactory together. sqlSessionFactory is ignored.");
      }
      definition.getPropertyValues().add("sqlSessionTemplate", new RuntimeBeanReference(this.sqlSessionTemplateBeanName));
      explicitFactoryUsed = true;
    } else if (this.sqlSessionTemplate != null) {
      if (explicitFactoryUsed) {
        logger.warn("Cannot use both: sqlSessionTemplate and sqlSessionFactory together. sqlSessionFactory is ignored.");
      }
      definition.getPropertyValues().add("sqlSessionTemplate", this.sqlSessionTemplate);
      explicitFactoryUsed = true;
    }

    if (!explicitFactoryUsed) {
      if (logger.isDebugEnabled()) {
        logger.debug("Enabling autowire by type for MapperFactoryBean with name '" + holder.getBeanName() + "'.");
      }
      definition.setAutowireMode(AbstractBeanDefinition.AUTOWIRE_BY_TYPE);
    }
  }
}

SqlSessionTemplate

As the code above shows, the SqlSession underneath the mapper proxy is actually a SqlSessionTemplate. Internally, SqlSessionTemplate uses a JDK dynamic proxy that delegates the actual calls to SqlSessionInterceptor:

// org.mybatis.spring.SqlSessionTemplate
public class SqlSessionTemplate implements SqlSession, DisposableBean {

  private final SqlSessionFactory sqlSessionFactory;

  private final ExecutorType executorType;

  private final SqlSession sqlSessionProxy;

  private final PersistenceExceptionTranslator exceptionTranslator;
  
  /**
   * Constructs a Spring managed {@code SqlSession} with the given
   * {@code SqlSessionFactory} and {@code ExecutorType}.
   * A custom {@code SQLExceptionTranslator} can be provided as an
   * argument so any {@code PersistenceException} thrown by MyBatis
   * can be custom translated to a {@code RuntimeException}
   * The {@code SQLExceptionTranslator} can also be null and thus no
   * exception translation will be done and MyBatis exceptions will be
   * thrown
   *
   * @param sqlSessionFactory
   * @param executorType
   * @param exceptionTranslator
   */
  public SqlSessionTemplate(SqlSessionFactory sqlSessionFactory, ExecutorType executorType,
      PersistenceExceptionTranslator exceptionTranslator) {

    notNull(sqlSessionFactory, "Property 'sqlSessionFactory' is required");
    notNull(executorType, "Property 'executorType' is required");

    this.sqlSessionFactory = sqlSessionFactory;
    this.executorType = executorType;
    this.exceptionTranslator = exceptionTranslator;
    // Create the JDK dynamic proxy; actual calls are forwarded to SqlSessionInterceptor
    this.sqlSessionProxy = (SqlSession) newProxyInstance(
        SqlSessionFactory.class.getClassLoader(),
        new Class[] { SqlSession.class },
        new SqlSessionInterceptor());
  }
  
  
  /**
   * {@inheritDoc}
   */
  @Override
  public <T> T selectOne(String statement) {
    // Every SqlSession method is forwarded to the proxy
    return this.sqlSessionProxy.<T> selectOne(statement);
  }
}

// org.mybatis.spring.SqlSessionTemplate.SqlSessionInterceptor
/**
   * Proxy needed to route MyBatis method calls to the proper SqlSession got
   * from Spring's Transaction Manager
   * It also unwraps exceptions thrown by {@code Method#invoke(Object, Object...)} to
   * pass a {@code PersistenceException} to the {@code PersistenceExceptionTranslator}.
   */
private class SqlSessionInterceptor implements InvocationHandler {
  @Override
  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    // The key step: obtain the SqlSession
    SqlSession sqlSession = getSqlSession(
      SqlSessionTemplate.this.sqlSessionFactory,
      SqlSessionTemplate.this.executorType,
      SqlSessionTemplate.this.exceptionTranslator);
    try {
      // Invoke the corresponding method on the delegate via reflection
      Object result = method.invoke(sqlSession, args);
      if (!isSqlSessionTransactional(sqlSession, SqlSessionTemplate.this.sqlSessionFactory)) {
        // force commit even on non-dirty sessions because some databases require
        // a commit/rollback before calling close()
        // Not managed by a Spring transaction: force a commit
        sqlSession.commit(true);
      }
      return result;
    } catch (Throwable t) {
      Throwable unwrapped = unwrapThrowable(t);
      if (SqlSessionTemplate.this.exceptionTranslator != null && unwrapped instanceof PersistenceException) {
        // release the connection to avoid a deadlock if the translator is not loaded. See issue #22
        // Make sure the sqlSession is closed on failure
        closeSqlSession(sqlSession, SqlSessionTemplate.this.sqlSessionFactory);
        sqlSession = null;
        // Translate into one of Spring's standard exceptions
        Throwable translated = SqlSessionTemplate.this.exceptionTranslator.translateExceptionIfPossible((PersistenceException) unwrapped);
        if (translated != null) {
          unwrapped = translated;
        }
      }
      throw unwrapped;
    } finally {
      if (sqlSession != null) {
        closeSqlSession(sqlSession, SqlSessionTemplate.this.sqlSessionFactory);
      }
    }
  }
}

SqlSessionUtils

SqlSessionUtils is a utility class provided by mybatis-spring. It handles binding the SqlSession to the ThreadLocal and releasing it once the transaction ends:

// org.mybatis.spring.SqlSessionUtils#getSqlSession(org.apache.ibatis.session.SqlSessionFactory, org.apache.ibatis.session.ExecutorType, org.springframework.dao.support.PersistenceExceptionTranslator)
public static SqlSession getSqlSession(SqlSessionFactory sessionFactory, ExecutorType executorType, PersistenceExceptionTranslator exceptionTranslator) {

  notNull(sessionFactory, NO_SQL_SESSION_FACTORY_SPECIFIED);
  notNull(executorType, NO_EXECUTOR_TYPE_SPECIFIED);

  SqlSessionHolder holder = (SqlSessionHolder) TransactionSynchronizationManager.getResource(sessionFactory);

  SqlSession session = sessionHolder(executorType, holder);
  if (session != null) {
    return session;
  }

  if (LOGGER.isDebugEnabled()) {
    LOGGER.debug("Creating a new SqlSession");
  }

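  // Nothing is bound to the ThreadLocal, so open a new session
  session = sessionFactory.openSession(executorType);
  // Hand the newly opened session over to Spring: when transaction synchronization is active,
  // it is bound to the current thread's ThreadLocal through TransactionSynchronizationManager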
  registerSessionHolder(sessionFactory, executorType, exceptionTranslator, session);

  return session;
}

// org.mybatis.spring.SqlSessionUtils#closeSqlSession
/**
   * Checks if {@code SqlSession} passed as an argument is managed by Spring {@code TransactionSynchronizationManager}
   * If it is not, it closes it, otherwise it just updates the reference counter and
   * lets Spring call the close callback when the managed transaction ends
   *
   * @param session
   * @param sessionFactory
   */
public static void closeSqlSession(SqlSession session, SqlSessionFactory sessionFactory) {
  notNull(session, NO_SQL_SESSION_SPECIFIED);
  notNull(sessionFactory, NO_SQL_SESSION_FACTORY_SPECIFIED);

  // Look up the holder in the ThreadLocal
  SqlSessionHolder holder = (SqlSessionHolder) TransactionSynchronizationManager.getResource(sessionFactory);
  if ((holder != null) && (holder.getSqlSession() == session)) {
    // Managed by Spring
    if (LOGGER.isDebugEnabled()) {
      LOGGER.debug("Releasing transactional SqlSession [" + session + "]");
    }
    // Only decrement the reference count (referenceCount--); Spring does the rest.
    // getSqlSession registered a SqlSessionSynchronization, which closes the session when the transaction completes
    holder.released();
  } else {
    // Not managed by Spring
    if (LOGGER.isDebugEnabled()) {
      LOGGER.debug("Closing non transactional SqlSession [" + session + "]");
    }
    // Must be closed manually
    session.close();
  }
}

Thread Safety

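SqlSession is not thread-safe, so sharing one instance across threads would inevitably lead to races. Spring's well-known answer is to bind a separate instance to each thread.
The sqlSession inside SqlSessionTemplate is actually a stateless proxy object. On every call it uses the utility class to create a session or reuse the one already in the ThreadLocal, which avoids the multithreading problem.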
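
The combination of a shared, stateless proxy and a per-thread delegate can be sketched in plain Java. This is an illustration of the pattern rather than the mybatis-spring code: `ThreadBoundSessionSketch`, `Session` and `SHARED` are hypothetical names, and the "session" merely reports its own id.

```java
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadBoundSessionSketch {

    // Hypothetical stand-in for a non-thread-safe SqlSession.
    interface Session {
        String describe();
    }

    private static final AtomicInteger IDS = new AtomicInteger();

    private static Session newSession() {
        int id = IDS.incrementAndGet();
        return () -> "session-" + id;
    }

    // Each thread lazily gets its own session, similar in spirit to a thread-bound holder.
    private static final ThreadLocal<Session> CURRENT =
            ThreadLocal.withInitial(ThreadBoundSessionSketch::newSession);

    // The shared proxy holds no session state; it looks up the calling thread's session on every call.
    static final Session SHARED = (Session) Proxy.newProxyInstance(
            Session.class.getClassLoader(),
            new Class<?>[] {Session.class},
            (proxy, method, args) -> method.invoke(CURRENT.get(), args));

    public static void main(String[] args) throws InterruptedException {
        String mine = SHARED.describe();
        String[] theirs = new String[1];
        Thread other = new Thread(() -> theirs[0] = SHARED.describe());
        other.start();
        other.join();
        // The same SHARED object resolves to a different session on each thread
        System.out.println(mine + " vs " + theirs[0]);
    }
}
```

Because the proxy itself carries no mutable state, a single instance can be injected into any number of singleton beans and called from any thread.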

SqlSessionSynchronization

When the transaction state changes, this callback is notified and performs some cleanup:

  /**
   * Callback for cleaning up resources. It cleans TransactionSynchronizationManager and
   * also commits and closes the {@code SqlSession}.
   * It assumes that {@code Connection} life cycle will be managed by
   * {@code DataSourceTransactionManager} or {@code JtaTransactionManager}
   */
  private static final class SqlSessionSynchronization extends TransactionSynchronizationAdapter {
    
    /**
     * {@inheritDoc}
     */
    @Override
    public void suspend() {
      if (this.holderActive) {
        if (LOGGER.isDebugEnabled()) {
          LOGGER.debug("Transaction synchronization suspending SqlSession [" + this.holder.getSqlSession() + "]");
        }
        // Unbind from the ThreadLocal to make room for the new transaction
        TransactionSynchronizationManager.unbindResource(this.sessionFactory);
      }
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void resume() {
      if (this.holderActive) {
        if (LOGGER.isDebugEnabled()) {
          LOGGER.debug("Transaction synchronization resuming SqlSession [" + this.holder.getSqlSession() + "]");
        }
        TransactionSynchronizationManager.bindResource(this.sessionFactory, this.holder);
      }
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void beforeCommit(boolean readOnly) {
      // Connection commit or rollback will be handled by ConnectionSynchronization or
      // DataSourceTransactionManager.
      // But, do cleanup the SqlSession / Executor, including flushing BATCH statements so
      // they are actually executed.
      // SpringManagedTransaction will no-op the commit over the jdbc connection
      // TODO This updates 2nd level caches but the tx may be rolledback later on! 
      if (TransactionSynchronizationManager.isActualTransactionActive()) {
        try {
          if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Transaction synchronization committing SqlSession [" + this.holder.getSqlSession() + "]");
          }
          this.holder.getSqlSession().commit();
        } catch (PersistenceException p) {
          if (this.holder.getPersistenceExceptionTranslator() != null) {
            DataAccessException translated = this.holder
              .getPersistenceExceptionTranslator()
              .translateExceptionIfPossible(p);
            if (translated != null) {
              throw translated;
            }
          }
          throw p;
        }
      }
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void beforeCompletion() {
      // Issue #18 Close SqlSession and deregister it now
      // because afterCompletion may be called from a different thread
      if (!this.holder.isOpen()) {
        if (LOGGER.isDebugEnabled()) {
          LOGGER.debug("Transaction synchronization deregistering SqlSession [" + this.holder.getSqlSession() + "]");
        }
        TransactionSynchronizationManager.unbindResource(sessionFactory);
        this.holderActive = false;
        if (LOGGER.isDebugEnabled()) {
          LOGGER.debug("Transaction synchronization closing SqlSession [" + this.holder.getSqlSession() + "]");
        }
        this.holder.getSqlSession().close();
      }
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void afterCompletion(int status) {
      if (this.holderActive) {
        // afterCompletion may have been called from a different thread
        // so avoid failing if there is nothing in this one
        if (LOGGER.isDebugEnabled()) {
          LOGGER.debug("Transaction synchronization deregistering SqlSession [" + this.holder.getSqlSession() + "]");
        }
        TransactionSynchronizationManager.unbindResourceIfPossible(sessionFactory);
        this.holderActive = false;
        if (LOGGER.isDebugEnabled()) {
          LOGGER.debug("Transaction synchronization closing SqlSession [" + this.holder.getSqlSession() + "]");
        }
        this.holder.getSqlSession().close();
      }
      this.holder.reset();
    }
  }
}
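
The order in which these callbacks fire relative to the actual commit or rollback is easier to see in a toy model. The sketch below assumes the usual ordering (beforeCommit, beforeCompletion, then the commit itself, then afterCompletion; on rollback, beforeCommit is skipped). `SynchronizationOrderSketch` and its trimmed-down `Synchronization` interface are hypothetical, and the logged strings only name what a session synchronization would do at each step.

```java
import java.util.ArrayList;
import java.util.List;

final class SynchronizationOrderSketch {

    // Hypothetical, trimmed-down callback interface.
    interface Synchronization {
        void beforeCommit();
        void beforeCompletion();
        void afterCompletion(boolean committed);
    }

    // Plays the role of the transaction manager and returns the order in which things happened.
    static List<String> complete(boolean commit) {
        List<String> log = new ArrayList<>();
        Synchronization sessionSync = new Synchronization() {
            @Override
            public void beforeCommit() {
                log.add("flush session");
            }

            @Override
            public void beforeCompletion() {
                log.add("unbind and close session");
            }

            @Override
            public void afterCompletion(boolean committed) {
                log.add(committed ? "committed" : "rolled back");
            }
        };

        if (commit) {
            sessionSync.beforeCommit();
            sessionSync.beforeCompletion();
            log.add("connection.commit");
        } else {
            sessionSync.beforeCompletion();
            log.add("connection.rollback");
        }
        sessionSync.afterCompletion(commit);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(complete(true));
        System.out.println(complete(false));
    }
}
```

The point to take away is that the session is flushed and closed before the connection is committed, which is why the session-level commit does not need to touch the JDBC connection.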

SpringManagedTransaction

SpringManagedTransaction handles the lifecycle of a JDBC connection.
It retrieves a connection from Spring’s transaction manager and returns it back to it
when it is no longer needed.
If Spring’s transaction handling is active it will no-op all commit/rollback/close calls
assuming that the Spring transaction manager will do the job.
If it is not it will behave like JdbcTransaction.

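This Transaction implementation is also something MyBatis customized to fit into the Spring ecosystem. Acquiring and releasing the connection are both delegated to DataSourceUtils, a utility class provided by Spring.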

public class SpringManagedTransaction implements Transaction {
  /**
   * {@inheritDoc}
   */
  @Override
  public Connection getConnection() throws SQLException {
    if (this.connection == null) {
      openConnection();
    }
    return this.connection;
  }
  
  /**
   * Gets a connection from Spring transaction manager and discovers if this
   * {@code Transaction} should manage connection or let it to Spring.
   * 
   * It also reads autocommit setting because when using Spring Transaction MyBatis
   * thinks that autocommit is always false and will always call commit/rollback
   * so we need to no-op that calls.
   */
  private void openConnection() throws SQLException {
    // Delegate to Spring's utility class to obtain the connection
    this.connection = DataSourceUtils.getConnection(this.dataSource);
    this.autoCommit = this.connection.getAutoCommit();
    this.isConnectionTransactional = DataSourceUtils.isConnectionTransactional(this.connection, this.dataSource);

    if (LOGGER.isDebugEnabled()) {
      LOGGER.debug(
          "JDBC Connection ["
              + this.connection
              + "] will"
              + (this.isConnectionTransactional ? " " : " not ")
              + "be managed by Spring");
    }
  }
  
  /**
   * {@inheritDoc}
   */
  @Override
  public void close() throws SQLException {
    // Delegate to Spring's utility class to release the connection
    DataSourceUtils.releaseConnection(this.connection, this.dataSource);
  }
  
}

Besides acquiring and releasing the connection, this class also treats commit and rollback specially:

/**
   * {@inheritDoc}
   */
@Override
public void commit() throws SQLException {
  // For a transactional connection the commit is skipped here; Spring performs the final commit
  if (this.connection != null && !this.isConnectionTransactional && !this.autoCommit) {
    if (LOGGER.isDebugEnabled()) {
      LOGGER.debug("Committing JDBC Connection [" + this.connection + "]");
    }
    this.connection.commit();
  }
}

/**
   * {@inheritDoc}
   */
@Override
public void rollback() throws SQLException {
  // For a transactional connection the rollback is skipped here; Spring performs the final rollback
  if (this.connection != null && !this.isConnectionTransactional && !this.autoCommit) {
    if (LOGGER.isDebugEnabled()) {
      LOGGER.debug("Rolling back JDBC Connection [" + this.connection + "]");
    }
    this.connection.rollback();
  }
}
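The guard shared by commit() and rollback() can be isolated as a small truth function. This is only an illustrative sketch: the class and method names are made up for the example, and booleans stand in for the real fields.

```java
class CommitGuardSketch {

    /**
     * Same condition as the guard in commit() and rollback() above: act on the
     * connection only if it exists, is not bound to a Spring-managed
     * transaction, and is not in auto-commit mode.
     */
    static boolean shouldActLocally(boolean hasConnection,
                                    boolean isConnectionTransactional,
                                    boolean autoCommit) {
        return hasConnection && !isConnectionTransactional && !autoCommit;
    }

    public static void main(String[] args) {
        // Bound to a Spring transaction: MyBatis skips, Spring finishes the transaction.
        System.out.println(shouldActLocally(true, true, false));
        // Not transactional, auto-commit off: MyBatis commits or rolls back itself.
        System.out.println(shouldActLocally(true, false, false));
        // Auto-commit on: there is nothing to commit explicitly.
        System.out.println(shouldActLocally(true, false, true));
    }
}
```

Only the second case makes MyBatis touch the connection; in the other two the call is a no-op.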

DataSourceUtils

DataSourceUtils is a utility class provided by Spring. What it mainly adds is the management of a ThreadLocal-based cache of connections:

// org.springframework.jdbc.datasource.DataSourceUtils#getConnection
public static Connection getConnection(DataSource dataSource) throws CannotGetJdbcConnectionException {
  try {
    return doGetConnection(dataSource);
  }
  catch (SQLException ex) {
    throw new CannotGetJdbcConnectionException("Could not get JDBC Connection", ex);
  }
}

/**
 * Actually obtain a JDBC Connection from the given DataSource.
 * Same as {@link #getConnection}, but throwing the original SQLException.
 * Is aware of a corresponding Connection bound to the current thread, for example
 * when using {@link DataSourceTransactionManager}. Will bind a Connection to the thread
 * if transaction synchronization is active (e.g. if in a JTA transaction).
 * <p>Directly accessed by {@link TransactionAwareDataSourceProxy}.
 * @param dataSource the DataSource to obtain Connections from
 * @return a JDBC Connection from the given DataSource
 * @throws SQLException if thrown by JDBC methods
 * @see #doReleaseConnection
 */
public static Connection doGetConnection(DataSource dataSource) throws SQLException {
  Assert.notNull(dataSource, "No DataSource specified");

  // Look up the ConnectionHolder bound to the current thread (ThreadLocal)
  ConnectionHolder conHolder = (ConnectionHolder) TransactionSynchronizationManager.getResource(dataSource);
  if (conHolder != null && (conHolder.hasConnection() || conHolder.isSynchronizedWithTransaction())) {
    // Reference count +1
    conHolder.requested();
    if (!conHolder.hasConnection()) {
      logger.debug("Fetching resumed JDBC Connection from DataSource");
      conHolder.setConnection(dataSource.getConnection());
    }
    return conHolder.getConnection();
  }
  // Else we either got no holder or an empty thread-bound holder here.
  // Obtain a new connection
  logger.debug("Fetching JDBC Connection from DataSource");
  Connection con = dataSource.getConnection();
  // The connection is only reused when a transaction is active
  if (TransactionSynchronizationManager.isSynchronizationActive()) {
    // On the first call there is no ConnectionHolder in the ThreadLocal yet,
    // so the new connection is registered with Spring here
    logger.debug("Registering transaction synchronization for JDBC Connection");
    // Use same Connection for further JDBC actions within the transaction.
    // Thread-bound object will get removed by synchronization at transaction completion.
    ConnectionHolder holderToUse = conHolder;
    if (holderToUse == null) {
      holderToUse = new ConnectionHolder(con);
    }
    else {
      holderToUse.setConnection(con);
    }
    // Reference count +1
    // referenceCount++
    holderToUse.requested();
    // Register a callback that handles the holder when the transaction state changes
    TransactionSynchronizationManager.registerSynchronization(
        new ConnectionSynchronization(holderToUse, dataSource));
    holderToUse.setSynchronizedWithTransaction(true);
    if (holderToUse != conHolder) {
      // Bind to the ThreadLocal
      TransactionSynchronizationManager.bindResource(dataSource, holderToUse);
    }
  }

  return con;
}

// org.springframework.jdbc.datasource.DataSourceUtils#releaseConnection
/**
 * Close the given Connection, obtained from the given DataSource,
 * if it is not managed externally (that is, not bound to the thread).
 * @param con the Connection to close if necessary
 * (if this is {@code null}, the call will be ignored)
 * @param dataSource the DataSource that the Connection was obtained from
 * (may be {@code null})
 * @see #getConnection
 */
public static void releaseConnection(Connection con, DataSource dataSource) {
  try {
    doReleaseConnection(con, dataSource);
  }
  catch (SQLException ex) {
    logger.debug("Could not close JDBC Connection", ex);
  }
  catch (Throwable ex) {
    logger.debug("Unexpected exception on closing JDBC Connection", ex);
  }
}


public static void doReleaseConnection(Connection con, DataSource dataSource) throws SQLException {
  if (con == null) {
    return;
  }
  if (dataSource != null) {
    ConnectionHolder conHolder = (ConnectionHolder) TransactionSynchronizationManager.getResource(dataSource);
    if (conHolder != null && connectionEquals(conHolder, con)) {
      // It's the transactional Connection: Don't close it.
      // Transactional connection: only the count is decremented here; the actual release
      // happens after the transaction completes, in ConnectionSynchronization
      // referenceCount--
      conHolder.released();
      return;
    }
  }
  logger.debug("Returning JDBC Connection to DataSource");
  // Non-transactional connection: close it right away
  doCloseConnection(con, dataSource);
}
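The thread-bound holder with its reference count can be reduced to a few lines of plain Java. The sketch below is not Spring's implementation: a String stands in for the JDBC Connection, the holder is always bound (as if transaction synchronization were active), and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ThreadBoundHolderSketch {

    /** Stand-in for ConnectionHolder: wraps a resource and counts its users. */
    static final class Holder {
        final String connection;   // a String stands in for java.sql.Connection
        private int referenceCount;

        Holder(String connection) { this.connection = connection; }

        void requested() { referenceCount++; }
        void released()  { referenceCount--; }
        boolean isOpen() { return referenceCount > 0; }
    }

    // Stand-in for the thread-bound resources: "data source" -> holder, one map per thread.
    private static final ThreadLocal<Map<Object, Holder>> RESOURCES =
            ThreadLocal.withInitial(HashMap::new);

    // Number of "physical" connections opened; not thread-safe, fine for a single-threaded demo.
    private static int opened = 0;

    /** Returns the connection bound to this thread, opening and binding one if needed. */
    static String getConnection(Object dataSource) {
        Holder holder = RESOURCES.get().get(dataSource);
        if (holder == null) {
            holder = new Holder("conn-" + (++opened));
            RESOURCES.get().put(dataSource, holder);
        }
        holder.requested();
        return holder.connection;
    }

    /** Only decrements the count; the bound connection stays available to the transaction. */
    static void releaseConnection(Object dataSource) {
        Holder holder = RESOURCES.get().get(dataSource);
        if (holder != null) {
            holder.released();
        }
    }

    static boolean isOpen(Object dataSource) {
        Holder holder = RESOURCES.get().get(dataSource);
        return holder != null && holder.isOpen();
    }

    static int openedCount() { return opened; }
}
```

Two calls to getConnection on the same thread share one connection, and the holder only reports itself as unused once every caller has released it.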

ConnectionSynchronization

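ConnectionSynchronization ultimately implements the TransactionSynchronization interface. Whenever the state of the current transaction changes (for example when it is suspended or completed), AbstractPlatformTransactionManager notifies the registered TransactionSynchronization callbacks. Transactional connections are closed inside ConnectionSynchronization.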

// org.springframework.jdbc.datasource.DataSourceUtils.ConnectionSynchronization
/**
 * Callback for resource cleanup at the end of a non-native JDBC transaction
 * (e.g. when participating in a JtaTransactionManager transaction).
 * @see org.springframework.transaction.jta.JtaTransactionManager
 */
private static class ConnectionSynchronization extends TransactionSynchronizationAdapter {

  private final ConnectionHolder connectionHolder;

  private final DataSource dataSource;

  private int order;

  private boolean holderActive = true;

  public ConnectionSynchronization(ConnectionHolder connectionHolder, DataSource dataSource) {
    this.connectionHolder = connectionHolder;
    this.dataSource = dataSource;
    this.order = getConnectionSynchronizationOrder(dataSource);
  }

  @Override
  public int getOrder() {
    return this.order;
  }

  @Override
  public void suspend() {
    // For example, with propagation REQUIRES_NEW a new connection is created every time
    // and the current transaction is suspended; this callback runs on suspension.
    // The connection is cached in the ThreadLocal as dataSource -> conn,
    // so the previous conn must be unbound before the new connection can work properly.
    if (this.holderActive) {
      // dataSource -> conn: remove the connection that belongs to the previous transaction
      TransactionSynchronizationManager.unbindResource(this.dataSource);
      // isOpen checks whether the reference count is greater than 0; if it is,
      // someone is still using the connection and it is not closed here
      if (this.connectionHolder.hasConnection() && !this.connectionHolder.isOpen()) {
        // Release Connection on suspend if the application doesn't keep
        // a handle to it anymore. We will fetch a fresh Connection if the
        // application accesses the ConnectionHolder again after resume,
        // assuming that it will participate in the same transaction.
        releaseConnection(this.connectionHolder.getConnection(), this.dataSource);
        this.connectionHolder.setConnection(null);
      }
    }
  }

  @Override
  public void resume() {
    if (this.holderActive) {
      // When the transaction is resumed, its ConnectionHolder has to be bound again
      TransactionSynchronizationManager.bindResource(this.dataSource, this.connectionHolder);
    }
  }

  @Override
  public void beforeCompletion() {
    // Release Connection early if the holder is not open anymore
    // (that is, not used by another resource like a Hibernate Session
    // that has its own cleanup via transaction synchronization),
    // to avoid issues with strict JTA implementations that expect
    // the close call before transaction completion.
    // Before the transaction completes:
    // if the connection has no references left at this stage, close it right away
    if (!this.connectionHolder.isOpen()) {
      TransactionSynchronizationManager.unbindResource(this.dataSource);
      this.holderActive = false;
      if (this.connectionHolder.hasConnection()) {
        releaseConnection(this.connectionHolder.getConnection(), this.dataSource);
      }
    }
  }

  @Override
  public void afterCompletion(int status) {
    // If we haven't closed the Connection in beforeCompletion,
    // close it now. The holder might have been used for other
    // cleanup in the meantime, for example by a Hibernate Session.
    if (this.holderActive) {
      // The thread-bound ConnectionHolder might not be available anymore,
      // since afterCompletion might get called from a different thread.
      TransactionSynchronizationManager.unbindResourceIfPossible(this.dataSource);
      // Mark the holder as inactive
      this.holderActive = false;
      if (this.connectionHolder.hasConnection()) {
        releaseConnection(this.connectionHolder.getConnection(), this.dataSource);
        // Reset the ConnectionHolder: It might remain bound to the thread.
        this.connectionHolder.setConnection(null);
      }
    }
    // Clear the state of this holder
    this.connectionHolder.reset();
  }
}
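Why suspend() has to unbind before an inner transaction can bind its own connection is easier to see in a stripped-down model. The following sketch assumes a single dataSource -> connection slot per thread, as described in the comments above; Strings stand in for the data source and connections, and every name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class SuspendResumeSketch {

    // One "dataSource -> connection" slot per thread.
    private static final ThreadLocal<Map<String, String>> BOUND =
            ThreadLocal.withInitial(HashMap::new);

    static void bind(String dataSource, String conn) { BOUND.get().put(dataSource, conn); }
    static String unbind(String dataSource) { return BOUND.get().remove(dataSource); }
    static String current(String dataSource) { return BOUND.get().get(dataSource); }

    /**
     * Models an inner REQUIRES_NEW unit of work and returns the connection
     * that the inner work observed.
     */
    static String runInNewTransaction(String dataSource, String innerConn) {
        String suspended = unbind(dataSource);    // suspend(): hide the outer connection
        try {
            bind(dataSource, innerConn);          // the inner transaction binds its own
            return current(dataSource);           // inner work sees innerConn
        } finally {
            unbind(dataSource);                   // the inner transaction completes
            if (suspended != null) {
                bind(dataSource, suspended);      // resume(): restore the outer connection
            }
        }
    }
}
```

Because the key is the data source, both transactions would compete for the same slot if the outer binding were left in place.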

MyBatis source code analysis (2): proxy class generation

2021-03-19T09:09:51.000Z

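MyBatis lets you work with the database directly through SqlSession:

final List selectAll = sqlSession.selectList("selectAll", null, new RowBounds(10, 20));

However, this style is not type-safe, and passing parameters this way is error-prone.

Using a proxy class

MyBatis also supports generating a proxy class for a mapper interface: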

<mapper namespace="com.air.mybatis.sqlsession.WordsDao">
    <select id="selectAll" fetchSize="3" resultSetType="SCROLL_INSENSITIVE" resultType="com.air.mybatis.sqlsession.WordEntity">
       select * from words
    </select>
</mapper>

Note that the namespace must be the fully qualified name of WordsDao.

package com.air.mybatis.sqlsession;

import org.apache.ibatis.session.RowBounds;

import java.util.List;

/**
 * @author 代故
 * @date 2021/3/19 2:36 PM
 */
public interface WordsDao {

    List<WordEntity> selectAll(RowBounds rowBounds);
}

Test code:

// com.air.mybatis.sqlsession.SqlSessionTest#testProxy
@Test
@SneakyThrows
public void testProxy() {
  try (Reader reader = Resources.getResourceAsReader("mybatis-config.xml")) {
    // Create the SqlSessionFactory
    SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(reader);
    // Obtain a SqlSession
    try (SqlSession sqlSession = sqlSessionFactory.openSession()) {
      final WordsDao mapper = sqlSession.getMapper(WordsDao.class);
      final List<WordEntity> wordEntities = mapper.selectAll(new RowBounds(1, 10));
      System.out.println("words = " + wordEntities);
    }
  }
}

Source code analysis

How the proxy class is generated

Start from sqlSession.getMapper(WordsDao.class); to get the overall picture:

// org.apache.ibatis.session.defaults.DefaultSqlSession#getMapper
@Override
public <T> T getMapper(Class<T> type) {
  return configuration.getMapper(type, this);
}

// org.apache.ibatis.session.Configuration#getMapper
public <T> T getMapper(Class<T> type, SqlSession sqlSession) {
  return mapperRegistry.getMapper(type, sqlSession);
}

// org.apache.ibatis.binding.MapperRegistry#getMapper
private final Map<Class<?>, MapperProxyFactory<?>> knownMappers = new HashMap<>();
@SuppressWarnings("unchecked")
public <T> T getMapper(Class<T> type, SqlSession sqlSession) {
  // Look up the factory registered for this type
  final MapperProxyFactory<T> mapperProxyFactory = (MapperProxyFactory<T>) knownMappers.get(type);
  if (mapperProxyFactory == null) {
    throw new BindingException("Type " + type + " is not known to the MapperRegistry.");
  }
  try {
    // Create the proxy instance
    return mapperProxyFactory.newInstance(sqlSession);
  } catch (Exception e) {
    throw new BindingException("Error getting mapper instance. Cause: " + e, e);
  }
}

// org.apache.ibatis.binding.MapperProxyFactory
/**
 * @author Lasse Voss
 */
public class MapperProxyFactory<T> {

  private final Class<T> mapperInterface;
  private final Map<Method, MapperMethodInvoker> methodCache = new ConcurrentHashMap<>();

  public MapperProxyFactory(Class<T> mapperInterface) {
    this.mapperInterface = mapperInterface;
  }

  public Class<T> getMapperInterface() {
    return mapperInterface;
  }

  public Map<Method, MapperMethodInvoker> getMethodCache() {
    return methodCache;
  }

  // Create the JDK proxy object; actual calls are delegated to mapperProxy
  @SuppressWarnings("unchecked")
  protected T newInstance(MapperProxy<T> mapperProxy) {
    return (T) Proxy.newProxyInstance(mapperInterface.getClassLoader(), new Class[] { mapperInterface }, mapperProxy);
  }

  // This is the entry point
  public T newInstance(SqlSession sqlSession) {
    final MapperProxy<T> mapperProxy = new MapperProxy<>(sqlSession, mapperInterface, methodCache);
    // Create the JDK proxy object; actual calls are delegated to mapperProxy
    return newInstance(mapperProxy);
  }
  }
}
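The same mechanism can be tried without MyBatis on the classpath. In this rough sketch, DemoDao is a made-up interface standing in for WordsDao, and the handler only records which statement id would be executed (interface name plus method name) instead of running SQL.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class MapperProxySketch {

    /** Hypothetical mapper interface standing in for WordsDao. */
    interface DemoDao {
        List<String> selectAll();
    }

    /** Plays the role of MapperProxy: every interface call lands in invoke(). */
    static final class RecordingHandler implements InvocationHandler {
        private final Class<?> mapperInterface;
        final List<String> executedStatementIds = new ArrayList<>();

        RecordingHandler(Class<?> mapperInterface) {
            this.mapperInterface = mapperInterface;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            if (Object.class.equals(method.getDeclaringClass())) {
                // toString/hashCode/equals are answered by the handler itself.
                return method.invoke(this, args);
            }
            // A real implementation would run the mapped SQL; the sketch only
            // records the statement id and returns an empty result.
            executedStatementIds.add(mapperInterface.getName() + "." + method.getName());
            return new ArrayList<String>();
        }
    }

    /** Plays the role of MapperProxyFactory#newInstance. */
    @SuppressWarnings("unchecked")
    static <T> T newMapper(Class<T> mapperInterface, InvocationHandler handler) {
        return (T) Proxy.newProxyInstance(
                mapperInterface.getClassLoader(), new Class<?>[] {mapperInterface}, handler);
    }

    public static void main(String[] args) {
        RecordingHandler handler = new RecordingHandler(DemoDao.class);
        DemoDao dao = newMapper(DemoDao.class, handler);
        dao.selectAll();
        System.out.println(handler.executedStatementIds);
    }
}
```

The statement id is derived from the interface name, which is why the mapper XML namespace has to match the interface.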

What the client ends up with is a JDK dynamic proxy backed by MapperProxy (com.sun.proxy.$Proxy6). Next, let's look at what happens when a method is called on it:

// org.apache.ibatis.binding.MapperProxy#invoke
@Override
public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
  try {
    if (Object.class.equals(method.getDeclaringClass())) {
      return method.invoke(this, args);
    } else {
      // Call invoke on the Invoker that corresponds to this method, e.g.
      // public abstract java.util.List com.air.mybatis.sqlsession.WordsDao.selectAll(org.apache.ibatis.session.RowBounds)
      return cachedInvoker(method).invoke(proxy, method, args, sqlSession);
    }
  } catch (Throwable t) {
    throw ExceptionUtil.unwrapThrowable(t);
  }
}

private MapperMethodInvoker cachedInvoker(Method method) throws Throwable {
  try {
    // The first call for a method creates a MapperMethodInvoker
    return methodCache.computeIfAbsent(method, m -> {
      if (m.isDefault()) {
        try {
          if (privateLookupInMethod == null) {
            return new DefaultMethodInvoker(getMethodHandleJava8(method));
          } else {
            return new DefaultMethodInvoker(getMethodHandleJava9(method));
          }
        } catch (IllegalAccessException | InstantiationException | InvocationTargetException
                 | NoSuchMethodException e) {
          throw new RuntimeException(e);
        }
      } else {
        // The actual logic lives in MapperMethod
        return new PlainMethodInvoker(new MapperMethod(mapperInterface, method, sqlSession.getConfiguration()));
      }
    });
  } catch (RuntimeException re) {
    Throwable cause = re.getCause();
    throw cause == null ? re : cause;
  }
}
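The caching step in cachedInvoker relies on ConcurrentHashMap#computeIfAbsent: the invoker is built on the first call for a Method and reused afterwards. A small sketch with a hypothetical Invoker interface (standing in for MapperMethodInvoker) and a counter shows the effect.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class InvokerCacheSketch {

    /** Hypothetical stand-in for MapperMethodInvoker. */
    interface Invoker {
        Object invoke(Object[] args);
    }

    private final Map<Method, Invoker> methodCache = new ConcurrentHashMap<>();

    /** Counts how many invokers were actually built. */
    final AtomicInteger created = new AtomicInteger();

    Invoker cachedInvoker(Method method) {
        // The mapping function only runs when no invoker is cached for this Method.
        return methodCache.computeIfAbsent(method, m -> {
            created.incrementAndGet();
            String name = m.getName();   // the expensive "parsing" happens once
            return args -> name + "/" + (args == null ? 0 : args.length);
        });
    }
}
```

Calling cachedInvoker twice with the same Method returns the same Invoker instance and leaves the counter at one.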


// org.apache.ibatis.binding.MapperMethod#execute

  public Object execute(SqlSession sqlSession, Object[] args) {
    Object result;
    switch (command.getType()) {
      case INSERT: {
        // Convert the arguments into the parameter object that SqlSession expects
        Object param = method.convertArgsToSqlCommandParam(args);
        // Call the underlying sqlSession's insert method and wrap the result
        result = rowCountResult(sqlSession.insert(command.getName(), param));
        break;
      }
      case UPDATE: {
        Object param = method.convertArgsToSqlCommandParam(args);
        result = rowCountResult(sqlSession.update(command.getName(), param));
        break;
      }
      case DELETE: {
        Object param = method.convertArgsToSqlCommandParam(args);
        result = rowCountResult(sqlSession.delete(command.getName(), param));
        break;
      }
      case SELECT:
        if (method.returnsVoid() && method.hasResultHandler()) {
          executeWithResultHandler(sqlSession, args);
          result = null;
        } else if (method.returnsMany()) {
          result = executeForMany(sqlSession, args);
        } else if (method.returnsMap()) {
          result = executeForMap(sqlSession, args);
        } else if (method.returnsCursor()) {
          result = executeForCursor(sqlSession, args);
        } else {
          Object param = method.convertArgsToSqlCommandParam(args);
          result = sqlSession.selectOne(command.getName(), param);
          if (method.returnsOptional()
              && (result == null || !method.getReturnType().equals(result.getClass()))) {
            result = Optional.ofNullable(result);
          }
        }
        break;
      case FLUSH:
        result = sqlSession.flushStatements();
        break;
      default:
        throw new BindingException("Unknown execution method for: " + command.getName());
    }
    if (result == null && method.getReturnType().isPrimitive() && !method.returnsVoid()) {
      throw new BindingException("Mapper method '" + command.getName()
          + " attempted to return null from a method with a primitive return type (" + method.getReturnType() + ").");
    }
    return result;
  }

// Result of insert/update/delete; rowCount is the number of affected rows returned by SqlSession
// org.apache.ibatis.binding.MapperMethod#rowCountResult
private Object rowCountResult(int rowCount) {
  final Object result;
  if (method.returnsVoid()) {
    result = null;
  } else if (Integer.class.equals(method.getReturnType()) || Integer.TYPE.equals(method.getReturnType())) {
    result = rowCount;
  } else if (Long.class.equals(method.getReturnType()) || Long.TYPE.equals(method.getReturnType())) {
    result = (long)rowCount;
  } else if (Boolean.class.equals(method.getReturnType()) || Boolean.TYPE.equals(method.getReturnType())) {
    // Can also be converted to boolean
    result = rowCount > 0;
  } else {
    throw new BindingException("Mapper method '" + command.getName() + "' has an unsupported return type: " + method.getReturnType());
  }
  return result;
}
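The conversion rules of rowCountResult can be checked in isolation. This sketch mirrors the branches above but takes the return type as a parameter; the class name and the exception type are assumptions made for the example.

```java
class RowCountResultSketch {

    /** Maps the affected-row count to the mapper method's declared return type. */
    static Object convert(int rowCount, Class<?> returnType) {
        if (void.class.equals(returnType)) {
            return null;
        }
        if (Integer.class.equals(returnType) || int.class.equals(returnType)) {
            return rowCount;               // boxed to Integer
        }
        if (Long.class.equals(returnType) || long.class.equals(returnType)) {
            return (long) rowCount;        // boxed to Long
        }
        if (Boolean.class.equals(returnType) || boolean.class.equals(returnType)) {
            return rowCount > 0;           // true if at least one row was affected
        }
        throw new IllegalArgumentException("unsupported return type: " + returnType);
    }
}
```

So an update method declared as boolean reports whether any row changed, while int and long expose the count itself.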

Registering the proxy class

// org.apache.ibatis.builder.xml.XMLMapperBuilder#bindMapperForNamespace
private void bindMapperForNamespace() {
  String namespace = builderAssistant.getCurrentNamespace();
  if (namespace != null) {
    Class<?> boundType = null;
    try {
      boundType = Resources.classForName(namespace);
    } catch (ClassNotFoundException e) {
      //ignore, bound type is not required
    }
    // If the namespace is a class, e.g. WordsDao, add it to the mapper registry
    if (boundType != null) {
      if (!configuration.hasMapper(boundType)) {
        // Spring may not know the real resource name so we set a flag
        // to prevent loading again this resource from the mapper interface
        // look at MapperAnnotationBuilder#loadXmlResource
        configuration.addLoadedResource("namespace:" + namespace);
        configuration.addMapper(boundType);
      }
    }
  }
}

// org.apache.ibatis.binding.MapperRegistry#addMapper
public <T> void addMapper(Class<T> type) {
  if (type.isInterface()) {
    if (hasMapper(type)) {
      throw new BindingException("Type " + type + " is already known to the MapperRegistry.");
    }
    boolean loadCompleted = false;
    try {
      knownMappers.put(type, new MapperProxyFactory<>(type));
      // It's important that the type is added before the parser is run
      // otherwise the binding may automatically be attempted by the
      // mapper parser. If the type is already known, it won't try.
      MapperAnnotationBuilder parser = new MapperAnnotationBuilder(config, type);
      parser.parse();
      loadCompleted = true;
    } finally {
      if (!loadCompleted) {
        knownMappers.remove(type);
      }
    }
  }
}
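addMapper registers the type before parsing and removes it again if parsing fails. The sketch below reproduces just that pattern; a Runnable stands in for the annotation parser and a String for the proxy factory, both purely illustrative.

```java
import java.util.HashMap;
import java.util.Map;

class MapperRegistrySketch {

    private final Map<Class<?>, String> knownMappers = new HashMap<>();

    boolean hasMapper(Class<?> type) {
        return knownMappers.containsKey(type);
    }

    /** Registers an interface, runs the parser, and rolls back on failure. */
    void addMapper(Class<?> type, Runnable parser) {
        if (!type.isInterface()) {
            return;                        // only interfaces can be mappers
        }
        if (hasMapper(type)) {
            throw new IllegalStateException("Type " + type + " is already known.");
        }
        boolean loadCompleted = false;
        try {
            // Register first so that a re-entrant lookup during parsing sees the type.
            knownMappers.put(type, "factory for " + type.getSimpleName());
            parser.run();
            loadCompleted = true;
        } finally {
            if (!loadCompleted) {
                knownMappers.remove(type); // leave no half-registered mapper behind
            }
        }
    }
}
```

A parser that throws therefore leaves the registry exactly as it was before the call.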

References

浅析MyBatis的动态代理原理 (A brief look at how MyBatis dynamic proxies work)

MyBatis source code analysis (1)

2021-03-18T12:31:01.000Z

Core components

SqlSession

SqlSession is the user-facing class of MyBatis. It is used like this:

@Test
@SneakyThrows
public void testSelect() {
  try (Reader reader = Resources.getResourceAsReader("mybatis-config.xml")) {
    // Create the SqlSessionFactory
    SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(reader);
    // Obtain a SqlSession and make sure it is closed afterwards
    try (SqlSession sqlSession = sqlSessionFactory.openSession()) {
      // Execute the SQL
      final List selectAll = sqlSession.selectList("selectAll", null, new RowBounds(10,20));
      System.out.println("selectAll = " + selectAll);
    }
  }
}

How a SqlSession is created:

Execution flow:

Executor

The interface at this layer mainly operates on MappedStatement:

/**
 * @author Clinton Begin
 */
public interface Executor {

  ResultHandler NO_RESULT_HANDLER = null;

  int update(MappedStatement ms, Object parameter) throws SQLException;

  <E> List<E> query(MappedStatement ms, Object parameter, RowBounds rowBounds, ResultHandler resultHandler, CacheKey cacheKey, BoundSql boundSql) throws SQLException;

  <E> List<E> query(MappedStatement ms, Object parameter, RowBounds rowBounds, ResultHandler resultHandler) throws SQLException;

  <E> Cursor<E> queryCursor(MappedStatement ms, Object parameter, RowBounds rowBounds) throws SQLException;

  List<BatchResult> flushStatements() throws SQLException;

  void commit(boolean required) throws SQLException;

  void rollback(boolean required) throws SQLException;

  CacheKey createCacheKey(MappedStatement ms, Object parameterObject, RowBounds rowBounds, BoundSql boundSql);

  boolean isCached(MappedStatement ms, CacheKey key);

  void clearLocalCache();

  void deferLoad(MappedStatement ms, MetaObject resultObject, String property, CacheKey key, Class<?> targetType);

  Transaction getTransaction();

  void close(boolean forceRollback);

  boolean isClosed();

  void setExecutorWrapper(Executor executor);

}

Result caching

When a session is created, you can specify which kind of executor to use:

// org.apache.ibatis.session.Configuration#newExecutor(org.apache.ibatis.transaction.Transaction, org.apache.ibatis.session.ExecutorType)
public Executor newExecutor(Transaction transaction, ExecutorType executorType) {
  executorType = executorType == null ? defaultExecutorType : executorType;
  executorType = executorType == null ? ExecutorType.SIMPLE : executorType;
  Executor executor;
  if (ExecutorType.BATCH == executorType) {
    executor = new BatchExecutor(this, transaction);
  } else if (ExecutorType.REUSE == executorType) {
    // Caches PreparedStatements
    executor = new ReuseExecutor(this, transaction);
  } else {
    executor = new SimpleExecutor(this, transaction);
  }
  // If the second-level cache is enabled, decorate the executor with CachingExecutor
  if (cacheEnabled) {
    executor = new CachingExecutor(executor);
  }
  // Plugin mechanism, covered in detail later
  executor = (Executor) interceptorChain.pluginAll(executor);
  return executor;
}

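Session-level cache (first-level cache)

The first-level cache is enabled by default.

The first-level cache is scoped to a single SqlSession at most. With multiple SqlSessions, or in a distributed environment, writes to the database can cause stale (dirty) reads, so setting the cache scope to STATEMENT is recommended.

configuration.setLocalCacheScope(LocalCacheScope.valueOf(props.getProperty("localCacheScope", "SESSION")));

If nothing is configured, the default is session scope. Configuration example:

<setting name="localCacheScope" value="SESSION"/>

An Executor is bound to a session, so this cache is session-level, which in practice means connection-level. Once the connection is closed, the cache is gone as well.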

// org.apache.ibatis.executor.BaseExecutor#query(org.apache.ibatis.mapping.MappedStatement, java.lang.Object, org.apache.ibatis.session.RowBounds, org.apache.ibatis.session.ResultHandler)
@Override
public <E> List<E> query(MappedStatement ms, Object parameter, RowBounds rowBounds, ResultHandler resultHandler) throws SQLException {
  BoundSql boundSql = ms.getBoundSql(parameter);
  CacheKey key = createCacheKey(ms, parameter, rowBounds, boundSql);
  return query(ms, parameter, rowBounds, resultHandler, key, boundSql);
}


// org.apache.ibatis.executor.BaseExecutor#query(org.apache.ibatis.mapping.MappedStatement, java.lang.Object, org.apache.ibatis.session.RowBounds, org.apache.ibatis.session.ResultHandler, org.apache.ibatis.cache.CacheKey, org.apache.ibatis.mapping.BoundSql)
@Override
public <E> List<E> query(MappedStatement ms, Object parameter, RowBounds rowBounds, ResultHandler resultHandler, CacheKey key, BoundSql boundSql) throws SQLException {
  ErrorContext.instance().resource(ms.getResource()).activity("executing a query").object(ms.getId());
  if (closed) {
    throw new ExecutorException("Executor was closed.");
  }
  if (queryStack == 0 && ms.isFlushCacheRequired()) {
    clearLocalCache();
  }
  List<E> list;
  try {
    queryStack++;
    // Try the local cache first
    list = resultHandler == null ? (List<E>) localCache.getObject(key) : null;
    if (list != null) {
      // Handle the cached result
      handleLocallyCachedOutputParameters(ms, key, parameter, boundSql);
    } else {
      list = queryFromDatabase(ms, parameter, rowBounds, resultHandler, key, boundSql);
    }
  } finally {
    queryStack--;
  }
  if (queryStack == 0) {
    for (DeferredLoad deferredLoad : deferredLoads) {
      deferredLoad.load();
    }
    // issue #601
    deferredLoads.clear();
    if (configuration.getLocalCacheScope() == LocalCacheScope.STATEMENT) {
      // issue #482
      clearLocalCache();
    }
  }
  return list;
}

// org.apache.ibatis.executor.BaseExecutor#queryFromDatabase
private <E> List<E> queryFromDatabase(MappedStatement ms, Object parameter, RowBounds rowBounds, ResultHandler resultHandler, CacheKey key, BoundSql boundSql) throws SQLException {
  List<E> list;
  // Put a placeholder into the cache
  localCache.putObject(key, EXECUTION_PLACEHOLDER);
  try {
    list = doQuery(ms, parameter, rowBounds, resultHandler, boundSql);
  } finally {
    // Remove the placeholder
    localCache.removeObject(key);
  }
  // Store the real result in the cache
  localCache.putObject(key, list);
  if (ms.getStatementType() == StatementType.CALLABLE) {
    localOutputParameterCache.putObject(key, parameter);
  }
  return list;
}
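The difference between the SESSION and STATEMENT scopes can be modelled with a plain map. This is a simplified sketch, not BaseExecutor: a String is used as the cache key, a Supplier plays the database, and the cache is cleared after every query in STATEMENT scope and on every update.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class LocalCacheSketch {

    enum Scope { SESSION, STATEMENT }

    private final Map<String, Object> localCache = new HashMap<>();
    private final Scope scope;

    /** How many times the "database" was actually queried. */
    int databaseHits = 0;

    LocalCacheSketch(Scope scope) {
        this.scope = scope;
    }

    Object query(String cacheKey, Supplier<Object> database) {
        Object result = localCache.get(cacheKey);
        if (result == null) {
            databaseHits++;
            result = database.get();
            localCache.put(cacheKey, result);
        }
        if (scope == Scope.STATEMENT) {
            localCache.clear();   // nothing survives past the statement
        }
        return result;
    }

    /** Writes through the same session flush the local cache. */
    void update() {
        localCache.clear();
    }
}
```

With SESSION scope a repeated query is answered from the map; a write made through a different session never reaches update() here, which is the stale-read risk mentioned above.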

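Statement-level cache (second-level cache)

CachingExecutor adds a cache that is looked up through the MappedStatement (ms.getCache() returns the Cache configured for the mapper's namespace). All other logic is delegated to the wrapped Executor.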

// org.apache.ibatis.executor.CachingExecutor#query(org.apache.ibatis.mapping.MappedStatement, java.lang.Object, org.apache.ibatis.session.RowBounds, org.apache.ibatis.session.ResultHandler, org.apache.ibatis.cache.CacheKey, org.apache.ibatis.mapping.BoundSql)
@Override
public <E> List<E> query(MappedStatement ms, Object parameterObject, RowBounds rowBounds, ResultHandler resultHandler, CacheKey key, BoundSql boundSql)
  throws SQLException {
  // Statement-level cache; it can be enabled in the configuration file
  Cache cache = ms.getCache();
  if (cache != null) {
    flushCacheIfRequired(ms);
    if (ms.isUseCache() && resultHandler == null) {
      ensureNoOutParams(ms, boundSql);
      @SuppressWarnings("unchecked")
      List<E> list = (List<E>) tcm.getObject(cache, key);
      // Cache miss
      if (list == null) {
        // Delegate the query to the underlying executor
        list = delegate.query(ms, parameterObject, rowBounds, resultHandler, key, boundSql);
        // Put the result into the cache
        tcm.putObject(cache, key, list); // issue #578 and #116
      }
      return list;
    }
  }
  // Caching is not enabled; delegate directly to the underlying implementation
  return delegate.query(ms, parameterObject, rowBounds, resultHandler, key, boundSql);
}

The logic of the class that does the actual work:

// org.apache.ibatis.executor.SimpleExecutor#doQuery
@Override
public <E> List<E> doQuery(MappedStatement ms, Object parameter, RowBounds rowBounds, ResultHandler resultHandler, BoundSql boundSql) throws SQLException {
  Statement stmt = null;
  try {
    Configuration configuration = ms.getConfiguration();
    // Create the StatementHandler
    StatementHandler handler = configuration.newStatementHandler(wrapper, ms, parameter, rowBounds, resultHandler, boundSql);
    // Pass configured properties such as fetchSize and timeout on to the Statement
    stmt = prepareStatement(handler, ms.getStatementLog());
    // Delegate the query to the StatementHandler
    return handler.query(stmt, resultHandler);
  } finally {
    closeStatement(stmt);
  }
}

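The Cache implementation uses the decorator pattern:

SynchronizedCache -> LoggingCache -> SerializedCache -> LruCache -> PerpetualCache
The individual Cache implementations are described below; combining them gives a Cache different capabilities.
SynchronizedCache: a synchronized cache. The implementation is simple: its methods are marked synchronized.
LoggingCache: a logging decorator that records the cache hit ratio. When DEBUG logging is enabled, it prints the hit ratio.
SerializedCache: serialization. Values are serialized before being stored, so the cache hands out a copy of the instance, which keeps it thread-safe.
LruCache: a Cache implementation based on the LRU algorithm; it evicts the least recently used key/value.
PerpetualCache: the most basic cache class. Its implementation is simple and uses a HashMap directly.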
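To illustrate how such decorators stack, here is a reduced sketch with three layers: a HashMap-backed base cache, an LRU decorator that tracks key order in an access-ordered LinkedHashMap, and a decorator that counts hits. The interface and class names are invented for the example and the real MyBatis classes have more responsibilities.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class CacheDecoratorSketch {

    interface Cache {
        void put(Object key, Object value);
        Object get(Object key);
        void remove(Object key);
    }

    /** Base layer: a plain HashMap, in the spirit of PerpetualCache. */
    static final class PerpetualLike implements Cache {
        private final Map<Object, Object> map = new HashMap<>();
        public void put(Object key, Object value) { map.put(key, value); }
        public Object get(Object key) { return map.get(key); }
        public void remove(Object key) { map.remove(key); }
    }

    /** Evicts the least recently used key from the delegate once capacity is exceeded. */
    static final class LruLike implements Cache {
        private final Cache delegate;
        private final Map<Object, Object> keyOrder;
        private Object eldestKey;

        LruLike(Cache delegate, final int capacity) {
            this.delegate = delegate;
            // accessOrder = true: get() moves a key to the "most recently used" end.
            this.keyOrder = new LinkedHashMap<Object, Object>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Object, Object> eldest) {
                    boolean evict = size() > capacity;
                    if (evict) {
                        eldestKey = eldest.getKey();
                    }
                    return evict;
                }
            };
        }

        public void put(Object key, Object value) {
            delegate.put(key, value);
            keyOrder.put(key, key);
            if (eldestKey != null) {
                delegate.remove(eldestKey);
                eldestKey = null;
            }
        }

        public Object get(Object key) {
            keyOrder.get(key);   // touch the key to refresh its position
            return delegate.get(key);
        }

        public void remove(Object key) {
            keyOrder.remove(key);
            delegate.remove(key);
        }
    }

    /** Counts requests and hits, in the spirit of LoggingCache. */
    static final class LoggingLike implements Cache {
        private final Cache delegate;
        private int requests;
        private int hits;

        LoggingLike(Cache delegate) { this.delegate = delegate; }

        public void put(Object key, Object value) { delegate.put(key, value); }

        public Object get(Object key) {
            requests++;
            Object value = delegate.get(key);
            if (value != null) {
                hits++;
            }
            return value;
        }

        public void remove(Object key) { delegate.remove(key); }

        double hitRatio() { return requests == 0 ? 0.0 : (double) hits / requests; }
    }
}
```

Each layer only knows the Cache interface, so layers can be added or left out without touching the others.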

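The second-level cache lives across sessions, so there is a considerable risk of reading stale data. In addition, most internet applications are distributed: instances normally share no state and scale horizontally. A local cache breaks that statelessness and can easily return stale data, so it should be used with caution.

PreparedStatement cache (PSCache)

Also known as PSCache. It corresponds to ReuseExecutor, and this cache is session-level as well. Besides caching at the MyBatis layer, caching can also be done in the MySQL driver and in the MySQL server; see the post "jdbc预编译缓存加速sql执行" (speeding up SQL execution with the JDBC prepared-statement cache) on KL's blog.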

// org.apache.ibatis.executor.ReuseExecutor#prepareStatement
private Statement prepareStatement(StatementHandler handler, Log statementLog) throws SQLException {
  Statement stmt;
  BoundSql boundSql = handler.getBoundSql();
  String sql = boundSql.getSql();
  if (hasStatementFor(sql)) {
    // Take the statement from the cache
    stmt = getStatement(sql);
    applyTransactionTimeout(stmt);
  } else {
    Connection connection = getConnection(statementLog);
    stmt = handler.prepare(connection, transaction.getTimeout());
    putStatement(sql, stmt);
  }
  handler.parameterize(stmt);
  return stmt;
}

// private final Map<String, Statement> statementMap = new HashMap<>();
private boolean hasStatementFor(String sql) {
  try {
    return statementMap.keySet().contains(sql) && !statementMap.get(sql).getConnection().isClosed();
  } catch (SQLException e) {
    return false;
  }
}

private Statement getStatement(String s) {
  return statementMap.get(s);
}

private void putStatement(String sql, Statement stmt) {
  statementMap.put(sql, stmt);
}
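The reuse logic boils down to a map keyed by the SQL text plus a validity check. In this sketch FakeStatement is a made-up stand-in for java.sql.Statement, and its connectionClosed flag replaces the getConnection().isClosed() call.

```java
import java.util.HashMap;
import java.util.Map;

class StatementReuseSketch {

    /** Hypothetical stand-in for a prepared java.sql.Statement. */
    static final class FakeStatement {
        final String sql;
        boolean connectionClosed;

        FakeStatement(String sql) { this.sql = sql; }
    }

    private final Map<String, FakeStatement> statementMap = new HashMap<>();

    /** How many statements were actually prepared. */
    int prepared = 0;

    FakeStatement prepareStatement(String sql) {
        FakeStatement cached = statementMap.get(sql);
        if (cached != null && !cached.connectionClosed) {
            return cached;                       // reuse: same SQL, connection still open
        }
        prepared++;
        FakeStatement fresh = new FakeStatement(sql);
        statementMap.put(sql, fresh);            // replaces a stale entry if there was one
        return fresh;
    }
}
```

Since the key is the SQL string, only statements with identical text (placeholders included) are reused.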

StatementHandler

StatementHandler mainly deals with the JDBC Statement (java.sql.Statement). It wraps the operations on Statement and acts as an isolation layer between MyBatis and JDBC.

Interface:

/**
 * @author Clinton Begin
 */
public interface StatementHandler {

  Statement prepare(Connection connection, Integer transactionTimeout)
      throws SQLException;

  void parameterize(Statement statement)
      throws SQLException;

  void batch(Statement statement)
      throws SQLException;

  int update(Statement statement)
      throws SQLException;

  <E> List<E> query(Statement statement, ResultHandler resultHandler)
      throws SQLException;

  <E> Cursor<E> queryCursor(Statement statement)
      throws SQLException;

  BoundSql getBoundSql();

  ParameterHandler getParameterHandler();

}

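As you can see, the parameters in this interface are all Statement rather than MyBatis's own MappedStatement.

Inheritance hierarchy:

RoutingStatementHandler does the routing: based on the statement type it delegates to SimpleStatementHandler, CallableStatementHandler, or PreparedStatementHandler.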

public RoutingStatementHandler(Executor executor, MappedStatement ms, Object parameter, RowBounds rowBounds, ResultHandler resultHandler, BoundSql boundSql) {

  switch (ms.getStatementType()) {
    case STATEMENT:
      delegate = new SimpleStatementHandler(executor, ms, parameter, rowBounds, resultHandler, boundSql);
      break;
    case PREPARED:
      delegate = new PreparedStatementHandler(executor, ms, parameter, rowBounds, resultHandler, boundSql);
      break;
    case CALLABLE:
      delegate = new CallableStatementHandler(executor, ms, parameter, rowBounds, resultHandler, boundSql);
      break;
    default:
      throw new ExecutorException("Unknown statement type: " + ms.getStatementType());
  }
}

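TypeHandler

TypeHandler is responsible for type conversion, similar to Spring's ConversionService. It is used in two places: setting the parameters for the placeholders of a PreparedStatement, and converting the rows of a ResultSet into objects.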

/**
 * @author Clinton Begin
 */
public interface TypeHandler<T> {

  void setParameter(PreparedStatement ps, int i, T parameter, JdbcType jdbcType) throws SQLException;

  T getResult(ResultSet rs, String columnName) throws SQLException;

  T getResult(ResultSet rs, int columnIndex) throws SQLException;

  T getResult(CallableStatement cs, int columnIndex) throws SQLException;

}

ParameterHandler

For example, if the database column is a VARCHAR and the value passed to MyBatis is a bean, the conversion can be done at this layer:

@Override
public void setNonNullParameter(PreparedStatement ps, int i, T parameter, JdbcType jdbcType) throws SQLException {

  try {
    // Bean -> json string
    ps.setString(i, ObjectUtil.toJson(parameter));
  } catch (JsonProcessingException e) {
    throw new RuntimeException(e);
  }
}

The default implementation is org.apache.ibatis.scripting.defaults.DefaultParameterHandler:

// org.apache.ibatis.scripting.defaults.DefaultParameterHandler#setParameters
@Override
public void setParameters(PreparedStatement ps) {
  ErrorContext.instance().activity("setting parameters").object(mappedStatement.getParameterMap().getId());
  List<ParameterMapping> parameterMappings = boundSql.getParameterMappings();
  if (parameterMappings != null) {
    for (int i = 0; i < parameterMappings.size(); i++) {
      ParameterMapping parameterMapping = parameterMappings.get(i);
      if (parameterMapping.getMode() != ParameterMode.OUT) {
        Object value;
        String propertyName = parameterMapping.getProperty();
        if (boundSql.hasAdditionalParameter(propertyName)) { // issue #448 ask first for additional params
          value = boundSql.getAdditionalParameter(propertyName);
        } else if (parameterObject == null) {
          value = null;
        } else if (typeHandlerRegistry.hasTypeHandler(parameterObject.getClass())) {
          value = parameterObject;
        } else {
          MetaObject metaObject = configuration.newMetaObject(parameterObject);
          value = metaObject.getValue(propertyName);
        }
        // Get the TypeHandler for this parameter; it was already resolved when the mapping was parsed
        TypeHandler typeHandler = parameterMapping.getTypeHandler();
        JdbcType jdbcType = parameterMapping.getJdbcType();
        if (value == null && jdbcType == null) {
          jdbcType = configuration.getJdbcTypeForNull();
        }
        try {
          // Use the typeHandler to perform the type conversion
          typeHandler.setParameter(ps, i + 1, value, jdbcType);
        } catch (TypeException e) {
          throw new TypeException("Could not set parameters for mapping: " + parameterMapping + ". Cause: " + e, e);
        } catch (SQLException e) {
          throw new TypeException("Could not set parameters for mapping: " + parameterMapping + ". Cause: " + e, e);
        }
      }
    }
  }
}
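
The branch chain above resolves each placeholder's value in a fixed order, then binds it at a 1-based JDBC index. The sketch below reproduces only that ordering in plain Java. `ParamResolver`, `SIMPLE_TYPES`, and `UserQuery` are hypothetical names; the set of simple types stands in for `typeHandlerRegistry.hasTypeHandler`, and field reflection stands in for `MetaObject`, so this is a simplification rather than what MyBatis does internally.

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical parameter object used for the demonstration.
class UserQuery {
  final String name;
  final Integer age;

  UserQuery(String name, Integer age) {
    this.name = name;
    this.age = age;
  }
}

class ParamResolver {
  // Assumption: these types can be bound directly, like types with a registered TypeHandler.
  static final Set<Class<?>> SIMPLE_TYPES =
      new HashSet<>(Arrays.asList(String.class, Integer.class, Long.class));

  // Resolves one placeholder value using the same precedence as setParameters.
  static Object resolve(String property, Object parameterObject, Map<String, Object> additional) {
    if (additional.containsKey(property)) {
      return additional.get(property); // 1. additional parameters win
    }
    if (parameterObject == null) {
      return null; // 2. nothing to read from
    }
    if (SIMPLE_TYPES.contains(parameterObject.getClass())) {
      return parameterObject; // 3. the object itself is the value
    }
    try { // 4. otherwise read the named property from the object
      Field field = parameterObject.getClass().getDeclaredField(property);
      field.setAccessible(true);
      return field.get(parameterObject);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("No property: " + property, e);
    }
  }

  // Returns index -> value, using i + 1 because JDBC placeholders are 1-based.
  static Map<Integer, Object> bind(List<String> properties, Object parameterObject,
      Map<String, Object> additional) {
    Map<Integer, Object> bound = new LinkedHashMap<>();
    for (int i = 0; i < properties.size(); i++) {
      bound.put(i + 1, resolve(properties.get(i), parameterObject, additional));
    }
    return bound;
  }
}
```

With a `UserQuery("kl", 30)` and an additional parameter named `limit`, binding the properties `name`, `limit`, `age` takes index 1 and index 3 from the bean and index 2 from the additional map. When the parameter object is itself a simple type such as a `Long`, it is bound as-is regardless of the property name.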

ResultSetHandler

Converts the ResultSet returned by JDBC into the return type declared on the mapped statement.

/**
 * @author Clinton Begin
 */
// Handles whole result sets (batch)
public interface ResultSetHandler {

  <E> List<E> handleResultSets(Statement stmt) throws SQLException;

  <E> Cursor<E> handleCursorResultSets(Statement stmt) throws SQLException;

  void handleOutputParameters(CallableStatement cs) throws SQLException;

}

/**
 * @author Clinton Begin
 */
// Handles a single row
public interface ResultHandler<T> {

  void handleResult(ResultContext<? extends T> resultContext);

}

Default implementation:

// org.apache.ibatis.executor.resultset.DefaultResultSetHandler#handleResultSet
// Called inside a for loop
private void handleResultSet(ResultSetWrapper rsw, ResultMap resultMap, List<Object> multipleResults, ResultMapping parentMapping) throws SQLException {
  try {
    if (parentMapping != null) {
      handleRowValues(rsw, resultMap, null, RowBounds.DEFAULT, parentMapping);
    } else {
      if (resultHandler == null) {
        // Default ResultHandler
        DefaultResultHandler defaultResultHandler = new DefaultResultHandler(objectFactory);
        handleRowValues(rsw, resultMap, defaultResultHandler, rowBounds, null);
        multipleResults.add(defaultResultHandler.getResultList());
      } else {
        handleRowValues(rsw, resultMap, resultHandler, rowBounds, null);
      }
    }
  } finally {
    // issue #228 (close resultsets)
    closeResultSet(rsw.getResultSet());
  }
}
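
The `resultHandler == null` branch above decides where rows end up: with no custom handler, rows are collected into a list and appended to the results; with a custom handler, rows are only pushed to the callback. Below is a plain-Java sketch of that dispatch. `RowHandler`, `ListRowHandler`, and `RowPump` are hypothetical names, not MyBatis types, and the rows are simply an `Iterable` rather than a ResultSet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-row callback, playing the role of ResultHandler<T>.
interface RowHandler<T> {
  void handleRow(T row);
}

// Default behaviour: remember every row in a list.
class ListRowHandler<T> implements RowHandler<T> {
  private final List<T> rows = new ArrayList<>();

  @Override
  public void handleRow(T row) {
    rows.add(row);
  }

  List<T> getRows() {
    return rows;
  }
}

class RowPump {
  // Returns the collected rows when no custom handler is given, otherwise an empty list.
  static <T> List<T> pump(Iterable<T> rows, RowHandler<T> custom) {
    if (custom == null) {
      ListRowHandler<T> fallback = new ListRowHandler<>();
      for (T row : rows) {
        fallback.handleRow(row);
      }
      return fallback.getRows();
    }
    for (T row : rows) {
      custom.handleRow(row); // the caller owns the rows; nothing is accumulated here
    }
    return new ArrayList<>();
  }
}
```

Passing `null` as the handler returns all rows as a list, while passing a callback returns an empty list and leaves it to the callback to keep whatever it needs. That is useful for streaming large results without holding them all in memory.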

// org.apache.ibatis.executor.resultset.DefaultResultSetHandler#createUsingConstructor
private Object createUsingConstructor(ResultSetWrapper rsw, Class<?> resultType, List<Class<?>> constructorArgTypes, List<Object> constructorArgs, Constructor<?> constructor) throws SQLException {
  boolean foundValues = false;
  for (int i = 0; i < constructor.getParameterTypes().length; i++) {
    Class<?> parameterType = constructor.getParameterTypes()[i];
    String columnName = rsw.getColumnNames().get(i);
    // Look up the TypeHandler for this parameter type and column
    TypeHandler<?> typeHandler = rsw.getTypeHandler(parameterType, columnName);
    // Convert the column value to the parameter type
    Object value = typeHandler.getResult(rsw.getResultSet(), columnName);
    constructorArgTypes.add(parameterType);
    constructorArgs.add(value);
    foundValues = value != null || foundValues;
  }
  return foundValues ? objectFactory.create(resultType, constructorArgTypes, constructorArgs) : null;
}
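
The constructor-based mapping above can be reduced to three steps: pick a converter per constructor parameter type, convert each column value positionally, and instantiate the object only if at least one value was non-null. A self-contained sketch of those steps follows. `ConstructorMapper`, `CONVERTERS`, and `Account` are hypothetical names; the converter map stands in for the TypeHandler lookup done by `rsw.getTypeHandler`, and a plain list of raw values stands in for a ResultSet row.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical result type with a single constructor.
class Account {
  final String name;
  final Integer balance;

  Account(String name, Integer balance) {
    this.name = name;
    this.balance = balance;
  }
}

class ConstructorMapper {
  // Assumption: a tiny converter registry keyed by target type, in place of TypeHandlers.
  static final Map<Class<?>, Function<Object, Object>> CONVERTERS = new HashMap<>();

  static {
    CONVERTERS.put(String.class, v -> v == null ? null : v.toString());
    CONVERTERS.put(Integer.class, v -> v == null ? null : Integer.valueOf(v.toString()));
  }

  // Convenience for the demo: returns the first declared constructor of the type.
  static Constructor<?> onlyConstructor(Class<?> type) {
    return type.getDeclaredConstructors()[0];
  }

  // Converts column values positionally, then invokes the constructor.
  // Returns null when every converted value is null, mirroring the foundValues flag.
  static Object create(Constructor<?> constructor, List<?> columnValues) {
    Class<?>[] types = constructor.getParameterTypes();
    Object[] args = new Object[types.length];
    boolean foundValues = false;
    for (int i = 0; i < types.length; i++) {
      Function<Object, Object> converter = CONVERTERS.get(types[i]);
      if (converter == null) {
        throw new IllegalArgumentException("No converter for " + types[i]);
      }
      args[i] = converter.apply(columnValues.get(i));
      foundValues = args[i] != null || foundValues;
    }
    if (!foundValues) {
      return null;
    }
    try {
      return constructor.newInstance(args);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Given the raw row `["kl", "42"]`, `create` builds an `Account` whose `balance` is the Integer 42. A row of all nulls produces `null` instead of an empty object, which is the same decision the `foundValues` flag makes in the original method.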