SAOC LLDB D integration: 1st Weekly Update

Hi D community!

I’m here to describe what I’ve done during the first week on the Symmetry Autumn of Code.

liblldbd

During the discussion for the milestones plan with my mentor, I decided to advance some work and wrote a simple C API around D runtime demangler to expose the D demangler API into a C interface. This would allow in the future to implement an LLDB language plugin into the LLVM. The source code is available on Github, liblldbd.

Alternatives to liblldbd

In the meanwhile, we decided to focus on porting libiberty demangler codebase to the LLVM upstream repository since it would provide much more benefits and acceptance to be upstreamed. So the liblldbd is a plan B if libiberty is not accepted by the LLVM team.

Port of libiberty demangler

Right after we finished the plan, in which you can follow up here, I started porting libiberty and integrate the code into the LLVM core. Similarly to Rust demangler, I tried to follow up some patches on the LLVM review platform and the awesome documentation that LLVM provides.

This ended up being relatively easy to plug into the LLVM codebase, since most of the demangler logic was isolated in one file, thanks to Iain (@ibuclaw) for the excelent code. Because I didn’t expect this to be so plug and play I decided to extensively test the code using the robust test suite that LLVM provides.

Testing

First, I started to port the libiberty test suite for D demangling and right after wrote some libfuzzer tests and ran it with an address sanitizer and UB sanitizer.

Security vulnerabilities

The libfuzzer results took some time to show up but I got some interesting outputs from there. The most interesting one was a heap/stack buffer overflow. I also managed to find a null dereferencing. Both, with a crafted malicious mangle name, can trigger a segmentation fault or undefined behaviour by reading/writing to a protected memory space.

I wrote a patch to fix both issues and contacted MITRE for standard vulnerabilities reporting procedure, since GCC is widely used and can potentially cause some issues. I pushed those patches into the GCC mailing list, and I’m currently waiting for appreciation. You can check those two patches here and here.

After patching the code I ran the fuzzer again and after some hours the fuzzer reported a timeout with a huge number of recursive calls. I carefully analyzed the generated output mangle that the fuzzer created and found out that it is a very repetitive name. Doing some superficial analysis I found out that those recursive calls are creating exponential time complexity and can cause the demangler to wait for hours or even days to complete. I believe that this can also be used to maliciously cause a denial of service, although I didn’t have much time to profile it yet.

To have some discussion about this I’m going to create a thread on the GCC security mailing list and express some solutions to mitigate those problems, such as integrating part of the codebase into the OSS fuzzer.

Before that, I’m waiting for a reply to the message I sent to MITRE, which was forwarded to Red Hat security team for further appreciation.

I don’t really know if this is crucial to share now, but I saved the fuzzer result, if anyone is interested in researching more ideas of crafted mangles to feed the address/UB sanitizer.

LLDB integration

The last task I was working on (today) was on finalizing the LLDB integration. I still need to write some tests but the most important fact is that it is already working! My LLDB tree can successfully pretty print the mangled names. My fork is available on my Github, here.

Some considerations

From the first time I built LLVM I found out that compiling it with debug information is extremely costly in terms of memory usage, since linking all those symbols at once can consume a lot of RAM. I recommend you build it with Release flags.

Here is my cmake config so far, if someone wants to test my work at any point.

cmake -S llvm -B build -G Ninja \
       -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;lldb" \
       -DCMAKE_BUILD_TYPE=Release \
       -DLLDB_EXPORT_ALL_SYMBOLS=0 \
       -DLLVM_ENABLE_ASSERTIONS=ON \
       -DLLVM_CCACHE_BUILD=ON \
       -DLLVM_LINK_LLVM_DYLIB=ON \
       -DCLANG_LINK_CLANG_DYLIB=ON

To build LLDB, you can do something like:

cmake --build build -- lldb -j$(nproc --all)

What’s next?

Next week, I’m going to have an eye on the time complexity problem, try to solve it, restructure the code to look a bit more C++ish and finishing the LLDB test suite to finally start upstreaming my changes. Although, this can take a while, since there is a challenge, described in the plan, which is dual-licensing the GCC codebase with LLVM codebase. This is cooperatively being handled by Mathias (my mentor), Iain and GCC team.

You can also read this on the D programming language forum, here, and discuss there.

Read about the next week.