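Keywords: Android Framework, dynamic library, dynamic linking, Binder
1. What happened
After an Android Studio update, installing an app made the device restart. The boot animation played, but the restart did not begin from the first boot screen, so this was a framework restart rather than a full reboot. The tombstones showed that surfaceflinger had crashed inside libbinder.so. What exactly installing an app does to trigger this is unclear; in theory, if the install itself were at fault, the crash should be in PackageManagerService. Setting Android Studio aside, and even though the crash is inside the Binder library, the first suspect is still the way surfaceflinger uses Binder.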
2. Root cause analysis
(The analysis below reproduces the problem on a RockPi4B.)
The tombstone file looks like this:
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'rockchip/rk3399_ROCKPI4B_Android11/rk3399_ROCKPI4B_Android11:11/RQ3A.210705.001/eng.kryo.20240128.131540:userdebug/release-keys'
Revision: '0'
ABI: 'arm64'
Timestamp: 2024-01-28 23:42:32+0800
pid: 11263, tid: 11289, name: Binder:11263_2 >>> /system/bin/surfaceflinger <<<
uid: 1000
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x791c0e6cd0
x0 b4000079ce312fd8 x1 000000005f444d50 x2 000000791d20c9c0 x3 000000791d20c940
x4 0000000000000010 x5 0000000000000018 x6 b400007a7e312e50 x7 b4000079ce312fd8
x8 000000791c0e6c50 x9 0000000010000000 x10 0000000000000001 x11 0000000000000002
x12 0000000000000000 x13 0000007baff5e020 x14 0001096dce96e5c4 x15 0000000029aaaaf0
x16 0000007bafd6c420 x17 0000007bafd29e30 x18 000000791c592000 x19 000000791d20c940
x20 0000000000000010 x21 000000791d20c9c0 x22 b4000079ce312fd8 x23 000000005f444d50
x24 000000791d20d000 x25 0000000000000000 x26 ffffffff000003e8 x27 b400007a7e313014
x28 0000000000000000 x29 000000791d20c8d0
lr 0000007bafd167bc sp 000000791d20c8c0 pc 0000007bafd1685c pst 0000000080000000
backtrace:
#00 pc 000000000004985c /system/lib64/libbinder.so (android::BBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+228) (BuildId: d5e42e998e9031430bee87f595521231)
#01 pc 00000000000524a8 /system/lib64/libbinder.so (android::IPCThreadState::executeCommand(int)+1032) (BuildId: d5e42e998e9031430bee87f595521231)
#02 pc 0000000000051fec /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+156) (BuildId: d5e42e998e9031430bee87f595521231)
#03 pc 000000000005282c /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+60) (BuildId: d5e42e998e9031430bee87f595521231)
#04 pc 0000000000078e10 /system/lib64/libbinder.so (android::PoolThread::threadLoop()+24) (BuildId: d5e42e998e9031430bee87f595521231)
#05 pc 000000000001567c /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+260) (BuildId: c081ab14bd4aef44c9c459d77d8c9b48)
#06 pc 0000000000014f14 /system/lib64/libutils.so (thread_data_t::trampoline(thread_data_t const*)+412) (BuildId: c081ab14bd4aef44c9c459d77d8c9b48)
#07 pc 00000000000b0c08 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+64) (BuildId: 0a481e8df134382e9d3effff2fce8b74)
#08 pc 00000000000505d0 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: 0a481e8df134382e9d3effff2fce8b74)
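Use addr2line to find the source line at the crash site. An AOSP build generates a symbols directory under out that keeps an extra, unstripped copy of each .so file for tracing problems like this. Pass it the crash address 4985c: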
aarch64-linux-android-addr2line -e out/target/product/rk3399_ROCKPI4B_Android11/symbols/system/lib64/libbinder.so 4985c
#frameworks/native/libs/binder/Binder.cpp:188
This points to line 188 of Binder.cpp:
status_t BBinder::transact(
    uint32_t code, const Parcel& data, Parcel* reply, uint32_t flags)
{
    data.setDataPosition(0);
    status_t err = NO_ERROR;
    switch (code) {
        case PING_TRANSACTION:
            err = pingBinder();
            break;
        case EXTENSION_TRANSACTION:
            err = reply->writeStrongBinder(getExtension());
            break;
        case DEBUG_PID_TRANSACTION:
            err = reply->writeInt32(getDebugPid());
            break;
        default:
            err = onTransact(code, data, reply, flags); // line 188
            break;
    }
    //... ...
}
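The cause of the crash is still hard to see from the source alone. The tombstone reports SEGV_MAPERR, which suggests a bad memory access, so the next step is to look at the disassembly of libbinder.so with objdump -S: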
... ...
49840: 1400001a b 498a8 <_ZN7android7BBinder8transactEjRKNS_6ParcelEPS1_j@@Base+0x130>
49844: f94002c8 ldr x8, [x22]
49848: aa1603e0 mov x0, x22
4984c: 2a1703e1 mov w1, w23
49850: aa1503e2 mov x2, x21
49854: aa1303e3 mov x3, x19
49858: 2a1403e4 mov w4, w20
4985c: f9404108 ldr x8, [x8,#128]
49860: d63f0100 blr x8
... ...
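The instruction at 4985c loads from the address in x8 plus 128 bytes (register-relative addressing). x8 holds 0x791c0e6c50, and adding 128 (0x80) gives exactly the faulting address 0x791c0e6cd0. The loaded value would then be written back to x8.
The tombstone also prints the process memory map (the same information pmap shows). Because of address space layout randomization (ASLR), the addresses may differ on every run: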
memory map (836 entries): (fault address prefixed with --->)
... ...
00000079'1afe2000-00000079'1bc05fff --- 0 c24000
00000079'1bc06000-00000079'1bc07fff rw- 0 2000
00000079'1bc08000-00000079'1bfe1fff --- 0 3da000
00000079'1bfe2000-00000079'1bfe2fff --- 0 1000
00000079'1bfe3000-00000079'1c0defff rw- 0 fc000 [anon:stack_and_tls:11290]
00000079'1c0df000-00000079'1c0dffff --- 0 1000
--->Fault address falls at 00000079'1c0e6cd0 between mapped regions
00000079'1c113000-00000079'1c591fff --- 0 47f000
00000079'1c592000-00000079'1c593fff rw- 0 2000
00000079'1c594000-00000079'1d112fff --- 0 b7f000
00000079'1d113000-00000079'1d113fff --- 0 1000
... ...
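The faulting address falls in a hole between mappings, so the pointer being dereferenced is invalid (a dangling or corrupted pointer rather than a literal null pointer). Comparing the assembly with the C++ source, x0 holds this, and w1, x2, x3 and w4 carry the four arguments code, data, reply and flags. ldr x8, [x8,#128] is meant to load the function pointer for onTransact into x8, and blr x8 then branches to that routine. The crash therefore means the address of onTransact could not be read.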
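The address check described above can be written down as a minimal sketch. The helper names (`Region`, `isMapped`, `loadAddress`, `regionsAroundFault`) are made up for illustration; the numbers are copied from the tombstone, and only the two mappings adjacent to the fault are listed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One line of the tombstone "memory map" section: [start, end], inclusive.
struct Region {
    uint64_t start;
    uint64_t end;
};

// Returns true if addr lies inside any of the listed regions.
bool isMapped(const std::vector<Region>& regions, uint64_t addr) {
    for (const Region& r : regions) {
        assert(r.start <= r.end);  // sanity check on the input
        if (addr >= r.start && addr <= r.end) return true;
    }
    return false;
}

// Address touched by `ldr x8, [x8, #imm]`: base register plus immediate.
uint64_t loadAddress(uint64_t base, uint64_t imm) {
    return base + imm;
}

// The two mappings on either side of the fault, copied from the tombstone.
std::vector<Region> regionsAroundFault() {
    return {
        {0x791c0df000ULL, 0x791c0dffffULL},
        {0x791c113000ULL, 0x791c591fffULL},
    };
}
```

Calling `isMapped(regionsAroundFault(), loadAddress(0x791c0e6c50ULL, 128))` returns false: the computed address 0x791c0e6cd0 sits between the two mappings, which matches the "between mapped regions" line in the tombstone.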
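This analysis alone cannot tell whether x8 itself was corrupted at run time or whether the memory it points to was changed underneath it. Either way, the Binder library itself is an unlikely culprit: Binder is a core system facility, and a bug there would have shown up elsewhere long ago. The good news is that the problem reproduces every time, so there is plenty of room for trial and error.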
Searching the surfaceflinger-related code again turned up a library that MTK had added for it, libsurface_ext.so. It extends surfaceflinger with some extra features and implements a binder service named SurfaceExtService that is registered with ServiceManager:
extern "C" void createSurfaceExtService() {
    const sp<IServiceManager> sm = defaultServiceManager();
    if (sm == nullptr) {
        LOGE("Can't get ServiceManager");
    } else {
        sp<IBinder> binder = sm->checkService(String16(SERVICE_NAME));
        if (binder != nullptr) {
            LOGW("SurfaceExtService added");
        } else {
            sp<SurfaceExtService> sfext = new SurfaceExtService();
            sm->addService(String16(SERVICE_NAME), sfext, false);
            LOGI("SurfaceExtService add");
        }
    }
}
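Compiling the SurfaceExtService registration out with a macro, rebuilding, pushing the result and installing the app from Android Studio again, surfaceflinger no longer crashed. That narrows the problem down to the binder-related functionality of libsurface_ext.so.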
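3. Solving the problem
After reading the code over and over for a very long time without finding anything wrong, a sudden idea helped: restore the original code, rebuild and run surfaceflinger, then inspect its memory mappings with cat /proc/[pid]/maps | grep libsurface_ext. It turned out that libsurface_ext.so was not in the memory map at all.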
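The same check can be done from inside a process. The following is a minimal sketch, assuming a Linux or Android system where /proc/self/maps is readable; the function names `mapsTextMentions` and `isLoadedInSelf` are hypothetical helpers, not part of any Android API.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Returns true if any line of /proc/<pid>/maps-style text mentions libName.
bool mapsTextMentions(const std::string& mapsText, const std::string& libName) {
    assert(!libName.empty());  // an empty needle would match every line
    std::istringstream in(mapsText);
    std::string line;
    while (std::getline(in, line)) {
        if (line.find(libName) != std::string::npos) return true;
    }
    return false;
}

// Convenience wrapper: checks the mappings of the calling process.
bool isLoadedInSelf(const std::string& libName) {
    std::ifstream maps("/proc/self/maps");
    if (!maps.is_open()) return false;
    std::ostringstream buf;
    buf << maps.rdbuf();
    return mapsTextMentions(buf.str(), libName);
}
```

Logging `isLoadedInSelf("libsurface_ext.so")` right after the library is supposed to have been loaded would show the same result as the grep: the library is no longer mapped.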
Searching the code shows where the .so is loaded dynamically:
SurfaceExtServiceHelper.cpp
//
// Created by kryo on 1/28/24.
//
#include <log/log.h>
#include <dlfcn.h>
void createSurfaceExtServiceProxy() {
    void* soHandle = dlopen("libsurface_ext.so", RTLD_LAZY);
    if (soHandle) {
        void (*createSurfaceExtPtr)();
        createSurfaceExtPtr = (decltype(createSurfaceExtPtr))(dlsym(soHandle, "createSurfaceExtService"));
        if (NULL == createSurfaceExtPtr) {
            dlclose(soHandle);
            soHandle = nullptr;
            ALOGE("createSurfaceExtService not found");
        } else {
            createSurfaceExtPtr();
            dlclose(soHandle); // ?
            ALOGD("createSurfaceExtPtr()");
        }
    } else {
        soHandle = nullptr;
        ALOGE("lib_surface_ext.so not found");
    }
}
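The logic is: dlopen loads libsurface_ext.so, dlsym looks up the symbol createSurfaceExtService to get a function pointer, and the function is called through that pointer (somewhat like reflection in Java). The log line createSurfaceExtPtr() is printed as expected.
At first glance nothing is wrong, so why is libsurface_ext.so missing from maps? A closer look shows that dlclose() is called right after createSurfaceExtPtr(). After removing that call, rebuilding and pushing, the problem no longer reproduced. All that detour without finding the bug, and this was the trap all along.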
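A minimal sketch of the corrected loading pattern follows. It is not the vendor's actual fix; `loadAndStart` is a hypothetical helper, and the sketch assumes the entry point takes no arguments, as createSurfaceExtService does. The only behavioural change is that the handle is returned to the caller instead of being closed after a successful call.

```cpp
#include <cassert>
#include <dlfcn.h>

// Loads libName, calls its no-argument entry point, and returns the handle
// WITHOUT calling dlclose(), so the library's code and vtables stay mapped.
// Returns nullptr if the library or the symbol cannot be found.
void* loadAndStart(const char* libName, const char* entryName) {
    assert(libName != nullptr && entryName != nullptr);
    void* handle = dlopen(libName, RTLD_LAZY);
    if (handle == nullptr) return nullptr;

    using EntryFn = void (*)();
    auto entry = reinterpret_cast<EntryFn>(dlsym(handle, entryName));
    if (entry == nullptr) {
        dlclose(handle);  // safe: nothing from the library is in use yet
        return nullptr;
    }
    entry();
    return handle;  // intentionally kept open for the life of the process
}
```

A caller in surfaceflinger would store the result of `loadAndStart("libsurface_ext.so", "createSurfaceExtService")` in a static variable and never close it. On toolchains where dlopen lives in a separate library, the sketch has to be linked with -ldl.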
Going back to the failed onTransact call: the memory that x8 pointed into was inside the range where libsurface_ext.so had been mapped, and dlclose later unloaded the library. Looking at the tombstone again, its memory map section does not list this .so either.
The library uses binder and overrides BBinder::onTransact:
status_t BBinder::onTransact(
uint32_t code, const Parcel& data, Parcel* reply, uint32_t /*flags*/);
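The likely sequence during a Binder call is this: the code reads the object's vtable pointer, loads the overridden onTransact function pointer from the slot at offset 128 bytes in that vtable, and jumps to it. The vtable of the library's Binder object lived inside libsurface_ext.so, so once the library was unloaded the load of the function pointer hit unmapped memory and the process crashed.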
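To make the two loads concrete, here is a small simulation of virtual dispatch with a hand-built table. It is only a model: `Slot`, `FakeObject` and `callAtOffset` are invented names, and real vtable layout is defined by the compiler's C++ ABI. On a 64-bit target each slot is 8 bytes, so a byte offset of 128 corresponds to slot 16.

```cpp
#include <cassert>
#include <cstddef>

using Slot = int (*)(int);  // stand-in for one virtual function entry

struct FakeObject {
    const Slot* vptr;  // first word of the object: pointer to the table
};

// Mirrors the crashing sequence: read the table pointer from the object,
// load the function pointer at a byte offset, then call through it.
int callAtOffset(const FakeObject& obj, std::size_t byteOffset, int arg) {
    assert(obj.vptr != nullptr);
    assert(byteOffset % sizeof(Slot) == 0);
    const Slot fn = obj.vptr[byteOffset / sizeof(Slot)];  // ldr x8, [x8, #off]
    return fn(arg);                                       // blr x8
}

// Example target placed in one slot of the table.
int addOne(int v) { return v + 1; }
```

If the table is backed by memory that has since been unmapped, the indexed load inside `callAtOffset` is the step that faults, which corresponds to the `ldr x8, [x8,#128]` instruction in the tombstone.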
Running dumpsys, service list and similar commands triggers binder calls into surfaceflinger, and all of them reproduce the crash. That is probably also why installing an app from Android Studio reproduces it.
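Summary
When code dynamically loads a .so that adds binder functionality, keep the handle returned by dlopen and call dlclose() only after everything that depends on the library has finished. Otherwise the library's reference count can drop to 0 and the system unloads it from memory. For a process like surfaceflinger that never exits, there is no need to close the handle at all. The blame for this one goes to the MTK developers.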
Reference
- The underlying principles of relocation in the Linux dynamic linking process
- "Programmer's Self-Cultivation: Linking, Loading and Libraries" (《程序员的自我修养:链接、装载与库》)