Following up on the investigation of the recently reported Mesa problem triggered by Neverball and Metisse, Ademar reports that “the workaround is not effective with at least ATI 9250 video cards, where we now have a crash at a different place”. We set up a system with an ATI Radeon 9250 and, indeed, it still crashes:

Mesa: Mesa 7.0.1 DEBUG build Oct  1 2007 18:52:02
Mesa warning: couldn't open libtxc_dxtn.so, software DXTn compression/decompression unavailable
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1218062640 (LWP 17221)]
_generic_read_RGBA_span_RGB565_MMX () at x86/read_rgba_span_x86.S:590
590             pushl   MASK_565_H

A segfault in a push instruction sounds very odd. Read on to see where the differential diagnosis session with Boto and Salem led us. And unlike the previous patch, this one resulted in a real fix for a real problem:

Don’t read the patch if you want to find the bug yourself based on the scenario description below.

As we said before, a segfault in a push instruction with an immediate argument is quite uncommon, unless something very bad happens to the stack pointer. But in this case, it contains a sane value, in a valid page and inside the process stack area:

(gdb) info registers
eax            0xaef1292a       -1359926998
ecx            0x4      4
edx            0xae1a900c       -1373990900
ebx            0xb6bf60ec       -1228971796
esp            0xbfe8a56c       0xbfe8a56c
ebp            0xbfe8a5c8       0xbfe8a5c8
…

(gdb) print *(int*)0xbfe8a56c
$6 = -1231486065
(gdb) print (*(int*)0xbfe8a56c = 0)
$7 = 0

Checking /proc/<pid>/smaps shows us that:

bfe7b000-bfe90000 rwxp bfe7b000 00:00 0          [stack]
Size:                 84 kB
Rss:                  56 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        56 kB
Referenced:           56 kB
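
The same check can be done mechanically. The following is only a small sketch in Python, using the esp value and the smaps line exactly as shown above; the helper name parse_mapping is ours and not part of any tool. It confirms that esp lies inside the [stack] mapping, that the mapping is writable, and that the 4 bytes a pushl would write still fall inside it.

```python
# Values copied from the gdb and smaps output above.
ESP = 0xbfe8a56c
SMAPS_LINE = "bfe7b000-bfe90000 rwxp bfe7b000 00:00 0          [stack]"


def parse_mapping(line):
    """Split one smaps header line into (start, end, perms, name).

    Hypothetical helper written for this post, not a library function.
    """
    fields = line.split()
    start_hex, end_hex = fields[0].split("-")
    name = fields[5] if len(fields) > 5 else ""
    return int(start_hex, 16), int(end_hex, 16), fields[1], name


start, end, perms, name = parse_mapping(SMAPS_LINE)

# Is the stack pointer inside the mapping reported by the kernel?
in_stack = start <= ESP < end

# Size of the mapping, to compare with the "Size:" field.
size_kb = (end - start) // 1024

# pushl first decrements esp by 4, then stores 4 bytes at the new esp.
push_target = ESP - 4
push_ok = start <= push_target and "w" in perms

# How many bytes are left between esp and the bottom of the mapping.
room_below = ESP - start

print(name, in_stack, size_kb, push_ok, room_below)
```

The computed size matches the 84 kB reported by smaps, and there are still tens of kilobytes of mapped stack below esp, so running out of stack is not what is happening here.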

Here is the context where the problem happens (MASK_565_H is defined as 0x0000001f and MASK_565_L is 0x07e0f800):

574 _generic_read_RGBA_span_RGB565_MMX:
575
576 #ifdef USE_INNER_EMMS
577     emms
578 #endif
579
580     movl    4(%esp), %eax   /* source pointer */
581     movl    8(%esp), %edx   /* destination pointer */
582     movl    12(%esp), %ecx  /* number of pixels to copy */
583
584 /* Kevin F. Quinn 2nd July 2006
585  * Replace data segment constants with text-segment instructions
586     movq    mask_565, %mm5
587     movq    prescale, %mm6
588     movq    scale, %mm7
589  */
590     pushl   MASK_565_H
591     pushl   MASK_565_L
592     movq    (%esp), %mm5
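
To make clear what lines 590 to 592 are meant to produce, here is a rough sketch in Python that models the stack as a little-endian byte string. It takes the two pushes at face value, as pushes of the two 32-bit constants, and shows the quadword that movq (%esp), %mm5 would then load. The function stack_after_pushes is a hypothetical helper for illustration only; it is not Mesa code.

```python
import struct

# Constants as given in the text above.
MASK_565_H = 0x0000001f
MASK_565_L = 0x07e0f800


def stack_after_pushes(*values):
    """Model successive 32-bit pushes on a little-endian x86 stack.

    Returns the bytes starting at the final esp and going upward.
    Hypothetical helper for illustration only.
    """
    memory = b""
    for value in values:
        # Each push lands at a lower address, so it goes in front.
        memory = struct.pack("<I", value) + memory
    return memory


# Same order as in the listing: high half first, then low half.
stack = stack_after_pushes(MASK_565_H, MASK_565_L)

# movq (%esp), %mm5 reads 8 bytes starting at esp.
(quadword,) = struct.unpack("<Q", stack)

# The same 8 bytes seen as four 16-bit lanes, lowest address first.
words = struct.unpack("<4H", stack)

print(hex(quadword))
print([hex(w) for w in words])
```

Seen as 16-bit lanes, the result is 0xf800, 0x07e0, 0x001f and 0, which are the red, green and blue bit masks of an RGB565 pixel. That is consistent with the commented-out movq mask_565, %mm5 that these instructions replace.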

Now you have all the data you need to spot the bug, and the solution is really very simple. Can you solve the problem?
