一? 問題描述
今天測試組更新測試環(huán)境后發(fā)現(xiàn)系統(tǒng)卡頓,無法辦理任何業(yè)務(wù),重啟系統(tǒng)后問題仍然存在。已經(jīng)到項目后期,迭代測試時間十分緊張。此問題直接影響到項目進度
二 排查過程
1.查看進程資源占用情況
Linux 下常用top命令顯示系統(tǒng)中各個進程的資源占用狀況,查看資源發(fā)現(xiàn)PID26799進程信息中
CPU使用率達到了333%。詳細信息如下
root@10.0.254.59 15:57:17 /data/jboss-4.2.3-9.0.0-8078/bin # top
top - 16:01:10 up 241 days, 22:37, 6 users, load average: 8.83, 3.23, 2.69
Tasks: 207 total, 1 running, 206 sleeping, 0 stopped, 0 zombie
%Cpu(s): 84.0 us, 0.7 sy, 0.0 ni, 15.2 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 16266800 total, 170440 free, 9203876 used, 6892484 buff/cache
KiB Swap: 4194300 total, 4066812 free, 127488 used. 6589968 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26799 root 20 0 8730000 4.6g 28024 S 333.0 29.9 11:09.94 java
27727 root 20 0 8728572 2.3g 36332 S 1.0 14.6 12:27.05 java
21841 root 20 0 5464404 1.6g 16284 S 0.7 10.2 145:29.47 java
9 root 20 0 0 0 0 S 0.3 0.0 11:51.47 rcu_sched
13985 root 20 0 162148 2408 1588 R 0.3 0.0 0:00.01 top
24029 root 20 0 162856 3156 1608 S 0.3 0.0 0:45.30 top
1 root 20 0 193640 4364 2832 S 0.0 0.0 19:42.60 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:01.32 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:28.28 ksoftirqd/0
2. 查看進程中占用線程情況
使用top -Hp命令查找該進程中CPU使用最高的線程,發(fā)現(xiàn)26801、26802、26803、28604線程CPU使用都達到了100%
root@10.0.254.59 16:01:15 /data/jboss-4.2.3-9.0.0-8078/bin # top -Hp 26799
top - 16:01:25 up 241 days, 22:37, 6 users, load average: 8.14, 3.35, 2.74
Threads: 553 total, 4 running, 549 sleeping, 0 stopped, 0 zombie
%Cpu(s): 98.5 us, 1.5 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16266800 total, 169656 free, 9204352 used, 6892792 buff/cache
KiB Swap: 4194300 total, 4066812 free, 127488 used. 6589608 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26801 root 20 0 8730000 4.6g 28024 R 99.9 29.9 1:06.08 java
26803 root 20 0 8730000 4.6g 28024 R 99.9 29.9 1:06.22 java
26804 root 20 0 8730000 4.6g 28024 R 99.9 29.9 1:06.30 java
26802 root 20 0 8730000 4.6g 28024 R 99.9 29.9 1:06.11 java
26799 root 20 0 8730000 4.6g 28024 S 0.0 29.9 0:00.00 java
26800 root 20 0 8730000 4.6g 28024 S 0.0 29.9 0:00.05 java
26805 root 20 0 8730000 4.6g 28024 S 0.0 29.9 0:08.64 java
26806 root 20 0 8730000 4.6g 28024 S 0.0 29.9 0:00.01 java
3. 獲取進程的棧信息
使用jstack將PID26799進程的棧信息導(dǎo)出到臨時文件以便分析
root@10.0.254.59 16:01:58 /data/jboss-4.2.3-9.0.0-8078/bin # jstack 26799 >pid26799.txt
4. 執(zhí)行printf 將線程號格式化為16進制
26801、26802、26803、28604分別對應(yīng)十六進制數(shù)068b1,、068b2、068b3、068b4
例如:
root@10.0.254.59 16:02:59 /data/jboss-4.2.3-9.0.0-8078/bin # printf "0%x\n" 26802
068b2
5. 查找線程的棧信息
通過占用轉(zhuǎn)換后的十六進制數(shù)去導(dǎo)出的pid26799.txt中查找棧信息。發(fā)現(xiàn)4個線程都是GC任務(wù)
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007faef408e800 nid=0x68b6 in Object.wait() [0x00007faee1603000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0x00000006c043d188> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"VM Thread" os_prio=0 tid=0x00007faef4085000 nid=0x68b5 runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007faef4020000 nid=0x68b1 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007faef4022000 nid=0x68b2 runnable
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007faef4024000 nid=0x68b3 runnable
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007faef4025800 nid=0x68b4 runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00007faef40d4800 nid=0x68bd waiting on condition
6.查看GC垃圾回收
發(fā)現(xiàn)YGC(年輕代回收次數(shù))?和FGC(老年代垃圾回收次數(shù)) 都比較大,且YGCT(年輕代回收消耗時間)?和FGCT(老年代垃圾回收消耗時間)都比較長,說明系統(tǒng)因為GC才出現(xiàn)的卡頓現(xiàn)象。因為,在執(zhí)行垃圾收集算法時,Java應(yīng)用程序的其他所有線程都被掛起,在Java中的這個現(xiàn)象叫做Stop-The-World機制簡稱STW。 這是一種全局暫?,F(xiàn)象。
root@10.0.254.59 16:05:45 /data/STARSMS/jboss-4.2.3-9.0.0-8078/bin # jstat -gc 26799 5000
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
200192.0 211456.0 0.0 0.0 919552.0 919552.0 2796544.0 2709387.4 180608.0 166270.8 22400.0 18766.9 61 17.137 33 394.263 411.400
200192.0 211456.0 0.0 0.0 919552.0 919552.0 2796544.0 2709387.4 180608.0 166270.8 22400.0 18766.9 61 17.137 33 394.263 411.400
200192.0 211456.0 0.0 0.0 919552.0 919552.0 2796544.0 2709387.2 180608.0 166292.3 22400.0 18766.9 61 17.137 34 409.916 427.052
200192.0 211456.0 0.0 0.0 919552.0 919552.0 2796544.0 2709387.2 180608.0 166292.3 22400.0 18766.9 61 17.137 34 409.916 427.052
200192.0 211456.0 0.0 0.0 919552.0 919552.0 2796544.0 2709387.2 180608.0 166292.3 22400.0 18766.9 61 17.137 34 409.916 427.052
200192.0 211456.0 0.0 0.0 919552.0 919552.0 2796544.0 2709236.9 180608.0 166306.0 22400.0 18769.6 61 17.137 35 426.082 443.219
200192.0 211456.0 0.0 0.0 919552.0 919552.0 2796544.0 2709236.9 180608.0 166306.0 22400.0 18769.6 61 17.137 35 426.082 443.219
200192.0 211456.0 0.0 0.0 919552.0 919552.0 2796544.0 2709236.9 180608.0 166306.0
7. 堆轉(zhuǎn)儲,分析dump.hprof 文件
先通過? arhtas? 的heapdump命令得到hprof文件。(?參考:arthas)通過瀏覽器下載: http://localhost:8563/arthas-output/?
[arthas@9312]$ heapdump arthas-output/dump.hprof
Dumping heap to arthas-output/dump.hprof ...
Heap dump file created
?
然后分析hprof文件, 分析方法有很多中我使用的是JDK自帶的bin/jvisualvm.exe 分析
文件- 裝入-選擇下載的dump.hprof文件
因為SWT時線程被掛起,所以泄露的程序應(yīng)該有線程信息, 根據(jù)對象和線程棧發(fā)現(xiàn)了可疑程序
"autoArrivalTimeChangeQuartzSchedule_Worker-1" #361 prio=5 os_prio=0 tid=0x00007fae4419c800 nid=0x6e97 runnable [0x00007fae36f2d000]
java.lang.Thread.State: RUNNABLE
at java.text.DateFormatSymbols.<init>(DateFormatSymbols.java:145)
at sun.util.locale.provider.DateFormatSymbolsProviderImpl.getInstance(DateFormatSymbolsProviderImpl.java:85)
at java.text.DateFormatSymbols.getProviderInstance(DateFormatSymbols.java:364)
at java.text.DateFormatSymbols.getInstance(DateFormatSymbols.java:340)
at java.util.Calendar.getDisplayName(Calendar.java:2110)
at java.text.SimpleDateFormat.subFormat(SimpleDateFormat.java:1125)
at java.text.SimpleDateFormat.format(SimpleDateFormat.java:966)
at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
at java.text.DateFormat.format(DateFormat.java:345)
at com.star.sms.business.customer.job.GwNpDealContext.getDateList(GwNpDealContext.java:277)
8.分析線上代碼
使用arhtas 的jad命令查看線上代碼,發(fā)現(xiàn)275-279行提交的代碼存在死循環(huán)。至此找到內(nèi)存泄露代碼
[arthas@9312]$ jad com.star.sms.business.customer.job.GwNpDealContext getDateList
ClassLoader:
+-org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader@105c375d[stariboss-customer:1.0.0(id=130)]
+-org.eclipse.osgi.baseadaptor.BaseAdaptor$1@d280fb
Location:
/E:/Jboss/jboss-4.2.3-9.0.0R1-8089-hebei-dev/server/default/tmp/stariboss_configuration/org.eclipse.osgi/bundles/130/1/bundlefile
public List<String> getDateList(String startTime, String endTime, GwNpUserType type) {
ArrayList<String> dateList = new ArrayList<String>();
/*246*/ SimpleDateFormat format = null;
/*247*/ int calendarField = 0;
/*248*/ if (type.equals((Object)GwNpUserType.HOR)) {
format = new SimpleDateFormat("yyyyMMddHH");
/*250*/ calendarField = 11;
/*251*/ } else if (type.equals((Object)GwNpUserType.DAY)) {
format = new SimpleDateFormat("yyyyMMdd");
/*253*/ calendarField = 5;
/*254*/ } else if (type.equals((Object)GwNpUserType.MONTH)) {
format = new SimpleDateFormat("yyyyMM");
/*256*/ calendarField = 2;
} else {
/*258*/ this.log.info((Object)"=====GwNpUserType is INVALID======");
/*259*/ return dateList;
}
try {
/*263*/ Date startTime1 = null;
/*264*/ Date endTime1 = null;
/*265*/ if (StringUtils.hasText((String)startTime)) {
/*266*/ startTime1 = format.parse(startTime);
/*267*/ endTime1 = format.parse(endTime.substring(0, startTime.length()));
} else {
/*269*/ startTime = this.assigStartTime(startTime, calendarField);
/*270*/ startTime1 = DateFormatUtil.parseStrToDate4((String)startTime);
/*271*/ endTime1 = DateFormatUtil.parseStrToDate4((String)endTime);
}
/*274*/ Calendar tempStart = Calendar.getInstance();
/*275*/ tempStart.setTime(startTime1);
while (endTime1.getTime() >= startTime1.getTime()) {
/*277*/ dateList.add(format.format(startTime1));
/*279*/ startTime1 = tempStart.getTime();
}
}
catch (Exception e) {
// empty catch block
}
/*284*/ return dateList;
}
三 問題總結(jié)
本問題是因為開發(fā)提交了幾行未經(jīng)測試的代碼引起, 此代碼直接引起內(nèi)存泄露,系統(tǒng)頻繁GC。這也是提交未測試代碼引起故障的典型案例。文章來源:http://www.zghlxwxcb.cn/news/detail-422900.html
?上一篇:了解高可用文章來源地址http://www.zghlxwxcb.cn/news/detail-422900.html
到了這里,關(guān)于頻繁GC引起卡頓問題排查與解決的文章就介紹完了。如果您還想了解更多內(nèi)容,請在右上角搜索TOY模板網(wǎng)以前的文章或繼續(xù)瀏覽下面的相關(guān)文章,希望大家以后多多支持TOY模板網(wǎng)!