记录一次CPU无规律的跑满问题排查过程
特不靠谱 2023-11-29 10:15:45 522人已围观
1、一大早就收到了服务器告警提醒,进入后台一看CPU间歇性的跑满,徘徊在死机的边缘;
2、登录服务器,top查看一下,也没发现明显的异常,感觉跟平常也差不多:
3、检查系统后台日志,也未发现异常日志问题,持续观察top中CPU的波动,发现JAVA进程有问题:
4、使用 top -Hp <pid> 命令(为Java进程的id号)查看该Java进程内所有线程的资源占用情况。PID=1806
发现1808线程存在很大的波动20%-95%。这里需要将线程(tid=1808)转化成16进制,方便后面查询日志使用,使用 printf "%x\n" <tid> 命令(tid指线程的id号)将以上10进制的线程号转换为16进制:
输出结果是:710 ,加上16进制前面2位:0x 组合后结果:0x710 , 这个编号会对应后面jstack日志中的:nid=0x710 这个关键值。
5、通过使用dk自带命令jstack获取该java进程的线程快照并输入到文件中,命令: jstack -l PID > ./jstack_result.txt (使用第3、4步发现的进程id=1806)
6、日志文件中寻找,nid=0x710 的信息(出现在倒数第三行)
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode): "NioBlockingSelector.BlockPoller-2" daemon prio=10 tid=0x00007fd8883c5800 nid=0x71a runnable [0x00007fd88639c000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87) - locked <0x00000000e3e00d58> (a sun.nio.ch.Util$2) - locked <0x00000000e3e00d68> (a java.util.Collections$UnmodifiableSet) - locked <0x00000000e3e00d10> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98) at org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBlockingSelector.java:339) Locked ownable synchronizers: - None "NioBlockingSelector.BlockPoller-1" daemon prio=10 tid=0x00007fd888360000 nid=0x719 runnable [0x00007fd88649d000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87) - locked <0x00000000e3e00e80> (a sun.nio.ch.Util$2) - locked <0x00000000e3e00e90> (a java.util.Collections$UnmodifiableSet) - locked <0x00000000e3e00e38> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98) at org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBlockingSelector.java:339) Locked ownable synchronizers: - None "GC Daemon" daemon prio=10 tid=0x00007fd88825d000 nid=0x718 in Object.wait() [0x00007fd886a0c000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000000e3e00f08> (a sun.misc.GC$LatencyLock) at sun.misc.GC$Daemon.run(GC.java:117) - locked <0x00000000e3e00f08> (a sun.misc.GC$LatencyLock) Locked ownable synchronizers: - None "Service Thread" daemon prio=10 tid=0x00007fd88808d800 nid=0x716 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "C2 CompilerThread1" daemon prio=10 tid=0x00007fd88808b000 nid=0x715 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "C2 CompilerThread0" daemon prio=10 tid=0x00007fd888088800 nid=0x714 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "Signal Dispatcher" daemon prio=10 tid=0x00007fd888086800 nid=0x713 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "Finalizer" daemon prio=10 tid=0x00007fd888065800 nid=0x712 in Object.wait() [0x00007fd887998000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135) - locked <0x00000000e3e01128> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209) Locked ownable synchronizers: - None "Reference Handler" daemon prio=10 tid=0x00007fd888063800 nid=0x711 in Object.wait() [0x00007fd887a99000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:503) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133) - locked <0x00000000e3e011d0> (a java.lang.ref.Reference$Lock) Locked ownable synchronizers: - None "main" prio=10 tid=0x00007fd888009000 nid=0x70f runnable [0x00007fd8918d7000] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398) at java.net.ServerSocket.implAccept(ServerSocket.java:530) at java.net.ServerSocket.accept(ServerSocket.java:498) at org.apache.catalina.core.StandardServer.await(StandardServer.java:470) at org.apache.catalina.startup.Catalina.await(Catalina.java:781) at org.apache.catalina.startup.Catalina.start(Catalina.java:727) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:294) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:428) Locked ownable synchronizers: - None "VM Thread" prio=10 tid=0x00007fd88805f000 nid=0x710 runnable "VM Periodic Task Thread" prio=10 tid=0x00007fd888099000 nid=0x717 waiting on condition JNI global references: 510
7、"VM Thread" prio=10 tid=0x00007fd88805f000 nid=0x710 runnable 这又是个啥?这里的VM Thread一行的最后显示nid=0x710,这里nid的意思就是操作系统线程id的意思。而VM Thread指的就是垃圾回收的线程。这里我们基本上可以确定,当前系统缓慢的原因主要是垃圾回收过于频繁,导致GC停顿时间较长。我们通过如下命令:jstat -gcutil PID 可以查看GC的情况,如下图:
可以看到,这里FGC指的是Full GC数量,这里高达15328次,而且还在不断增长。在这里我们就可以区分导致CPU过高的原因具体是Full GC次数过多还是代码中有比较耗时的计算了。如果是Full GC次数过多,那么通过jstack得到的线程信息会是类似于VM Thread之类的线程,而如果是代码中有比较耗时的计算,那么我们得到的就是一个线程的具体堆栈信息。
附录信息:
1、https://zhuanlan.zhihu.com/p/656557189