Background

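Ever since I switched to Wayland, epiphany has basically refused to start. I rarely use this browser, so I never looked into it. Today is the weekend, so I set aside some time for a quick look.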

Investigation

Start with gdb:

 ❯ gdb epiphany 
GNU gdb (GDB) 12.1
This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.archlinux.org 
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.

(gdb) r
Starting program: /usr/bin/epiphany 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

...



Thread 1 "epiphany" received signal SIGSEGV, Segmentation fault.
wl_resource_get_destroy_listener (resource=0x0, notify=0x7ffff1003980 <(anonymous namespace)::ClientBundleEGL::bufferDestroyListenerCallback(wl_listener*, void*)>) at ../wayland-1.21.0/src/wayland-server.c:850
850		if (resource_is_deprecated(resource))


(gdb) bt
#0  wl_resource_get_destroy_listener
    (resource=0x0, notify=0x7ffff1003980 <(anonymous namespace)::ClientBundleEGL::bufferDestroyListenerCallback(wl_listener*, void*)>) at ../wayland-1.21.0/src/wayland-server.c:850
#1  0x00007ffff1005b4b in (anonymous namespace)::ClientBundleEGL::findImage (bufferResource=0x0, this=0x55555686c9c0)
    at ../WPEBackend-fdo/src/view-backend-exportable-fdo-egl.cpp:270
#2  (anonymous namespace)::ClientBundleEGL::exportBuffer(wl_resource*) (this=0x55555686c9c0, bufferResource=0x0)
    at ../WPEBackend-fdo/src/view-backend-exportable-fdo-egl.cpp:181
#3  0x00007ffff18cb536 in ffi_call_unix64 () at ../src/x86/unix64.S:105
#4  0x00007ffff18c8037 in ffi_call_int
    (cif=<optimized out>, fn=<optimized out>, rvalue=<optimized out>, avalue=<optimized out>, closure=<optimized out>)
    at ../src/x86/ffi64.c:672
#5  0x00007ffff0403ada in wl_closure_invoke ([email protected]=0x555556cd4880, target=<optimized out>, 
    [email protected]=0x555556c7ba20, opcode=[email protected]=6, data=<optimized out>, [email protected]=0x555556a18b40, flags=2)
    at ../wayland-1.21.0/src/connection.c:1025
#6  0x00007ffff0408010 in wl_client_connection_data (fd=<optimized out>, mask=<optimized out>, data=<optimized out>)
    at ../wayland-1.21.0/src/wayland-server.c:437
#7  0x00007ffff04069e2 in wl_event_loop_dispatch (loop=0x5555559b30a0, timeout=<optimized out>)
    at ../wayland-1.21.0/src/event-loop.c:1027
#8  0x00007ffff10068c5 in operator() (__closure=0x0, base=0x555555a5f670) at ../WPEBackend-fdo/src/ws.cpp:77
#9  _FUN(GSource*, GSourceFunc, gpointer) () at ../WPEBackend-fdo/src/ws.cpp:86
#10 0x00007ffff734cc6b in g_main_dispatch (context=0x5555559210e0) at ../glib/glib/gmain.c:3417
#11 g_main_context_dispatch (context=0x5555559210e0) at ../glib/glib/gmain.c:4135
#12 0x00007ffff73a3001 in g_main_context_iterate.constprop.0
    (context=[email protected]=0x5555559210e0, block=[email protected]=1, dispatch=[email protected]=1, self=<optimized out>)
    at ../glib/glib/gmain.c:4211
--Type <RET> for more, q to quit, c to continue without paging--c
#13 0x00007ffff734a392 in g_main_context_iteration ([email protected]=0x5555559210e0, [email protected]=1) at ../glib/glib/gmain.c:4276
#14 0x00007ffff750730e in g_application_run (application=0x5555559601d0, [email protected]=1, [email protected]=0x7fffffffd3b8) at ../glib/gio/gapplication.c:2569
#15 0x0000555555558714 in main (argc=<optimized out>, argv=<optimized out>) at ../epiphany/src/ephy-main.c:428
(gdb) 
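
The session above can also be captured without the interactive prompts, which is handy when the backtrace is going into a bug report. A minimal sketch, assuming a gdb build with debuginfod support like the one shown; the helper name `crash_bt` is made up for illustration:

```shell
# crash_bt is a hypothetical helper: run a program under gdb in batch mode
# and print a backtrace once it stops (for example on SIGSEGV).
#
# -batch   exit when the listed commands are done instead of prompting
# -iex     run a gdb command before the program is loaded; here it turns
#          debuginfod on up front so gdb does not ask the y/n question
# -ex      the same "r" and "bt" commands typed at the (gdb) prompt
# --args   treat the remaining words as the program and its arguments
crash_bt() {
  gdb -batch \
      -iex 'set debuginfod enabled on' \
      -ex run \
      -ex bt \
      --args "$@"
}
```

Usage would be something like `crash_bt epiphany 2>&1 | tee epiphany-bt.txt`. If the program does not crash, gdb simply waits until it exits.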

The backtrace itself is not much use, really. The short segmentation fault message already tells us the cause:

Thread 1 "epiphany" received signal SIGSEGV, Segmentation fault.
wl_resource_get_destroy_listener (resource=0x0, notify=0x7ffff1003980 <(anonymous namespace)::ClientBundleEGL::bufferDestroyListenerCallback(wl_listener*, void*)>) at ../wayland-1.21.0/src/wayland-server.c:850
850		if (resource_is_deprecated(resource))

wl_resource_get_destroy_listener was called with a null pointer as its first argument, resource, so the resource_is_deprecated(resource) call on line 850 raised SIGSEGV.

At this point you probably think I am about to roll up my sleeves and read the code?

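No. First, I planned to submit this backtrace as a GNOME issue. GitLab's issue tracker turned out to be fairly smart: as I typed “Epiphany crash Under Wayland”, it suggested a few existing issues that might describe the same problem, so I clicked through to https://gitlab.gnome.org/GNOME/epiphany/-/issues/1832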

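The creator of that issue had gone one step further than I did: he also tried starting the browser on X11 and found that it worked fine. I tried GDK_BACKEND=x11 epiphany, and it worked fine for me as well.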
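
Until a fix lands, forcing the X11 backend for this one program is a usable workaround. A small sketch; the wrapper name `with_x11` is hypothetical, and it assumes XWayland is available in the session:

```shell
# with_x11 is a hypothetical wrapper: run one command with GTK forced onto
# its X11 backend. The variable is set only for that command, so the rest
# of the shell session keeps its normal (Wayland) environment.
with_x11() {
  GDK_BACKEND=x11 "$@"
}
```

For example, `with_x11 epiphany` starts the browser on X11 without exporting GDK_BACKEND for everything else.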

Someone on the GNOME side replied, Michael Catanzaro @mcatanzaro (strictly speaking, he works at Red Hat):

Hi, you’ll need to report this on the wpebackend-fdo issue tracker, here. Good luck….

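The issue was then closed right away and given the Not GNOME label.

Well, GNOME people do close issues quickly.

An upstream problem, so: closed.

It looked like a sad story.

Three minutes later, though, he replied again: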

Actually, looks like it is already fixed by https://github.com/Igalia/WPEBackend-fdo/pull/176/ which is awaiting review. (CC @aperezdc)

The main part of the fix is what we described above: before calling wl_resource_get_destroy_listener, check that the first argument is not a null pointer.

Besides that commit, though, the PR author also made a second one:

Only delete images in releaseImage

Deleting images in bufferDestroyListenerCallback is incorrect, and caused a double free.

https://github.com/Igalia/WPEBackend-fdo/pull/176/commits/fcf330cc3036808b6fb83ee7a6cef4f5ff9e00c8

So specialist work is best left to specialists. Had we fixed this ourselves, we would probably have ended up with only the first commit, the null check.

Doing it yourself

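Waiting for the upstream's upstream to merge the fix, then for upstream to pick it up, then for the packager to update the package can take quite a long time. Fortunately, building a package with a patch applied is very easy on Arch. This may well be one of Arch's greatest charms.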

Download the official https://github.com/archlinux/svntogit-packages/blob/packages/wpebackend-fdo/trunk/PKGBUILD and add a single patch command to it.

As for the patch file, any GitHub PR can be fetched as a patch by appending .patch to its URL: https://github.com/Igalia/WPEBackend-fdo/pull/176.patch

From 3318283ffe62a536cfbff307c77505d848d7098f Mon Sep 17 00:00:00 2001
From: Jordy Vieira <[email protected]>
Date: Sat, 9 Jul 2022 17:17:14 -0300
Subject: [PATCH 1/2] Fix SIGSEGV

---
 src/view-backend-exportable-fdo-egl.cpp | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/view-backend-exportable-fdo-egl.cpp b/src/view-backend-exportable-fdo-egl.cpp
index 09bb2bf..1a73269 100644
--- a/src/view-backend-exportable-fdo-egl.cpp
+++ b/src/view-backend-exportable-fdo-egl.cpp
@@ -267,9 +267,11 @@ class ClientBundleEGL final : public ClientBundle {
 private:
     struct wpe_fdo_egl_exported_image* findImage(struct wl_resource* bufferResource)
     {
-        if (auto* listener = wl_resource_get_destroy_listener(bufferResource, bufferDestroyListenerCallback)) {
-            struct wpe_fdo_egl_exported_image* image;
-            return wl_container_of(listener, image, bufferDestroyListener);
+        if (bufferResource) {
+            if (auto* listener = wl_resource_get_destroy_listener(bufferResource, bufferDestroyListenerCallback)) {
+                struct wpe_fdo_egl_exported_image* image;
+                return wl_container_of(listener, image, bufferDestroyListener);
+            }
         }
 
         return nullptr;

From fcf330cc3036808b6fb83ee7a6cef4f5ff9e00c8 Mon Sep 17 00:00:00 2001
From: Jordy Vieira <[email protected]>
Date: Sat, 9 Jul 2022 19:16:03 -0300
Subject: [PATCH 2/2] Only delete images in releaseImage

Deleting images in bufferDestroyListenerCallback is incorrect, and
caused a double free.
---
 src/view-backend-exportable-fdo-egl.cpp | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/src/view-backend-exportable-fdo-egl.cpp b/src/view-backend-exportable-fdo-egl.cpp
index 1a73269..0031222 100644
--- a/src/view-backend-exportable-fdo-egl.cpp
+++ b/src/view-backend-exportable-fdo-egl.cpp
@@ -247,8 +247,6 @@ class ClientBundleEGL final : public ClientBundle {
 
     void releaseImage(struct wpe_fdo_egl_exported_image* image)
     {
-        image->exported = false;
-
         if (image->bufferResource)
             viewBackend->releaseBuffer(image->bufferResource);
         else
@@ -297,9 +295,6 @@ class ClientBundleEGL final : public ClientBundle {
         image = wl_container_of(listener, image, bufferDestroyListener);
 
         image->bufferResource = nullptr;
-
-        if (!image->exported)
-            deleteImage(image);
     }
 };
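
Putting the two steps together, the PKGBUILD change might look roughly like the following. This is a sketch rather than the exact file used here: the local patch file name and the `$pkgname-$pkgver` source directory are assumptions, so check them against the real PKGBUILD.

```shell
# Sketch of the lines added to the wpebackend-fdo PKGBUILD.
pkgname=wpebackend-fdo
pkgver=1.12.0

# Assumed local name for the PR patch. In the real PKGBUILD it would be
# listed in the source array so that makepkg downloads it, e.g.
#   "$_patch::https://github.com/Igalia/WPEBackend-fdo/pull/176.patch"
_patch=wpebackend-fdo-pr176.patch

prepare() {
  # Assumption: the release tarball unpacks to $srcdir/$pkgname-$pkgver.
  cd "$srcdir/$pkgname-$pkgver" || return 1
  # -N skips hunks that look already applied, -p1 strips the a/ and b/
  # prefixes seen in the diff above, -i names the patch file to read.
  patch -Np1 -i "$srcdir/$_patch"
}
```

Because the source array gains an entry, the checksum array needs a matching one as well (updpkgsums from pacman-contrib can regenerate it). After that, makepkg builds the patched package as usual.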

I have put the built package here:

https://github.com/ttys3/my-archlinux-pkgbuild/releases/tag/wpebackend-fdo-1.12.0-2

Install it with paru -U ./wpebackend-fdo-1.12.0-2-x86_64.pkg.tar.zst.

After testing, it no longer crashes.

Refs

https://wiki.archlinux.org/title/Debuginfod

https://github.com/archlinux/svntogit-packages/blob/packages/wpebackend-fdo/trunk/PKGBUILD