My personal website source code

commit 8aaf140dd381d0e39c31d42d0cf7b239628a0195
parent b9c04eb5b947724a795cf933a19b226cd28cad4b
Author: Luís Ferreira <contact@lsferreira.net>
Date:   Thu, 30 Sep 2021 17:01:01 +0100

content: posts: add 1st SAOC 2021 week update

Signed-off-by: Luís Ferreira <contact@lsferreira.net>

Diffstat:
A content/posts/d-saoc-2021-01.md | 124 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 124 insertions(+), 0 deletions(-)

diff --git a/content/posts/d-saoc-2021-01.md b/content/posts/d-saoc-2021-01.md
@@ -0,0 +1,124 @@

# SAOC LLDB D integration: 1st Weekly Update

Hi D community!

I'm here to describe what I've done during the first week of the Symmetry
Autumn of Code.

## `liblldbd`

While discussing the milestone plan with my mentor, I decided to get a head
start. I wrote a simple C API around the D runtime demangler, which exposes the D
demangler through a C interface. In the future, this would allow implementing
an LLDB language plugin in LLVM. The source code is available on GitHub:
[liblldbd](https://github.com/ljmf00/liblldbd).

### Alternatives to `liblldbd`

Meanwhile, we decided to focus on porting the `libiberty` demangler codebase to
the upstream LLVM repository. That approach brings more benefits and is more likely
to be accepted upstream. `liblldbd` is therefore a plan B in case the LLVM team
does not accept the `libiberty` port.

## Port of `libiberty` demangler

Right after we finished the plan, which you can follow
[here](https://pad.riseup.net/p/r.05c919765a66f89368a3fc28c98432db), I started
porting `libiberty` and integrating the code into the LLVM core. I followed
some patches on the [LLVM review platform](https://reviews.llvm.org/), similar
to what was done for the Rust demangler. I also relied on the awesome
documentation that LLVM provides.

This turned out to be relatively easy to plug into the LLVM codebase, because
most of the demangler logic is isolated in a single file. Thanks to Iain
(@ibuclaw) for the excellent code! I didn't expect it to be this
plug-and-play, so I decided to test the code extensively using the robust
testing infrastructure that LLVM provides.

## Testing

First, I ported the `libiberty` test suite for D demangling. Right after that,
I wrote some `libfuzzer` tests and ran them with the address sanitizer and the UB
sanitizer.
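A `libfuzzer` test is a single function that the fuzzer calls repeatedly with generated inputs. The minimal sketch below is illustrative only and is not the harness used for this work. `hypothetical_d_demangle` is a made-up stand-in for the real demangler entry point that only strips a `_D` prefix. `LLVMFuzzerTestOneInput` is the real entry point name that libFuzzer looks for.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for the demangler under test. It accepts names that
// start with "_D" and returns a malloc'd copy of the rest, mimicking a C API
// whose result the caller must free(). Returns nullptr for anything else.
static char *hypothetical_d_demangle(const char *mangled) {
  if (std::strncmp(mangled, "_D", 2) != 0)
    return nullptr;
  std::size_t len = std::strlen(mangled + 2);
  char *out = static_cast<char *>(std::malloc(len + 1));
  if (out == nullptr)
    return nullptr;
  std::memcpy(out, mangled + 2, len + 1);  // Copies the terminating NUL too.
  return out;
}

// Entry point that libFuzzer calls once per generated input.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t *data, std::size_t size) {
  // Fuzzer inputs are raw bytes without a terminating NUL, while demanglers
  // expect a C string, so copy them into a NUL-terminated buffer first.
  std::string input(reinterpret_cast<const char *>(data), size);
  char *result = hypothetical_d_demangle(input.c_str());
  // Free the result; otherwise leak detection (LeakSanitizer, which runs
  // alongside ASan on supported platforms) would report it.
  std::free(result);
  return 0;  // libFuzzer expects 0 for ordinary inputs.
}
```

With Clang, you would typically build such a target with `clang++ -g -O1 -fsanitize=fuzzer,address,undefined fuzz_d_demangle.cpp -o fuzz_d_demangle`; the file name here is hypothetical. `-fsanitize=fuzzer` links in libFuzzer's own `main`, which is why the sketch defines none.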

### Security vulnerabilities

The `libfuzzer` results took some time to show up, but they produced some
interesting findings. The most interesting one was a heap/stack buffer
overflow. I also found a null pointer dereference. A maliciously crafted
mangled name can exploit either bug to trigger a segmentation fault or undefined
behaviour by reading from or writing to protected memory.

I wrote patches to fix both issues. Since GCC is widely used and these bugs can
potentially cause real problems, I also contacted MITRE, following the standard
vulnerability reporting procedure. I submitted the patches to the GCC mailing list
and I'm currently waiting for them to be reviewed. You can check the two
patches
[here](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579985.html)
and
[here](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579987.html).

After patching the code, I ran the fuzzer again. After some hours, it reported
a timeout with a huge number of recursive calls. I carefully analyzed the
mangled name the fuzzer generated and found that it is very repetitive. A
superficial analysis suggests that those recursive calls have exponential time
complexity. They can keep the demangler running for hours or even days. I
believe an attacker could also exploit this to cause a denial of service,
although I haven't had much time to profile it yet.

To discuss this, I'm going to start a thread on the GCC security mailing list.
There, I'll propose some solutions to mitigate these problems, such as
integrating part of the codebase into OSS-Fuzz.

Before that, I'm waiting for a reply to the message I sent to MITRE. MITRE
forwarded it to the Red Hat security team for further assessment.

I'm not sure whether this is worth sharing now, but I saved the fuzzer results.
They may help anyone interested in researching more crafted mangled names to
feed to the address/UB sanitizers.
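The actual fixes are in the patches linked above. The sketch below is not the code from those patches; it only illustrates the general class of bug. D encodes each part of a qualified name as a length-prefixed identifier (an `LName`), so `3foo3bar` stands for `foo.bar`. The sketch parses these identifiers without trusting the length read from the input. `parse_lname` and `demangle_qualified_name` are hypothetical names, and the sketch ignores everything after the qualified name (types, templates, back references).

```cpp
#include <cstddef>
#include <limits>
#include <optional>
#include <string>
#include <string_view>

// Parses one length-prefixed identifier (e.g. "3foo") from the front of
// `input` and advances `input` past it. On malformed input it returns
// std::nullopt and leaves `input` unchanged instead of reading past the end.
std::optional<std::string_view> parse_lname(std::string_view &input) {
  constexpr std::size_t kMax = std::numeric_limits<std::size_t>::max();
  std::size_t len = 0;
  std::size_t digits = 0;
  while (digits < input.size() && input[digits] >= '0' && input[digits] <= '9') {
    std::size_t digit = static_cast<std::size_t>(input[digits] - '0');
    // A crafted name can contain an arbitrarily long run of digits, so
    // reject lengths that would overflow instead of wrapping around.
    if (len > (kMax - digit) / 10)
      return std::nullopt;
    len = len * 10 + digit;
    ++digits;
  }
  if (digits == 0 || len == 0)
    return std::nullopt;
  std::string_view rest = input.substr(digits);
  // Never trust the declared length: it must fit in what is actually left.
  if (len > rest.size())
    return std::nullopt;
  std::string_view name = rest.substr(0, len);
  input = rest.substr(len);
  return name;
}

// Demangles only the qualified-name part of "_D<LName><LName>..." into
// "a.b.c". Failure is explicit in the return type (std::nullopt) rather
// than being signalled by a null pointer.
std::optional<std::string> demangle_qualified_name(std::string_view mangled) {
  if (mangled.substr(0, 2) != "_D")
    return std::nullopt;
  mangled.remove_prefix(2);
  std::string out;
  // Keep consuming LNames while the next character starts a length prefix.
  while (!mangled.empty() && mangled.front() >= '0' && mangled.front() <= '9') {
    std::optional<std::string_view> name = parse_lname(mangled);
    if (!name)
      return std::nullopt;
    if (!out.empty())
      out += '.';
    out.append(name->data(), name->size());
  }
  if (out.empty())
    return std::nullopt;
  return out;
}
```

For example, `demangle_qualified_name("_D3foo3bar3bazFZv")` yields `foo.bar.baz`. A name such as `_D99foo`, whose declared length exceeds the remaining input, is rejected instead of being read out of bounds.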

## LLDB integration

The last task I worked on (today) was finalizing the LLDB integration. I still
need to write some tests, but the most important fact is that it already works!
My LLDB tree can successfully pretty-print D mangled names. My fork is
available on my GitHub,
[here](https://github.com/ljmf00/llvm-project/tree/add-d-demangler).

### Some considerations

The first time I built LLVM, I found that compiling it with debug information
is extremely costly in terms of memory usage. Linking all those symbols at
once can consume a lot of RAM. I recommend building it with the `Release`
build type.

Here is my `cmake` configuration so far, if anyone wants to test my work at
any point:

```sh
# Configure a Release build of LLVM with Clang, libc++, libc++abi and LLDB,
# using Ninja and ccache.
cmake -S llvm -B build -G Ninja \
  -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;lldb" \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLDB_EXPORT_ALL_SYMBOLS=0 \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_CCACHE_BUILD=ON \
  -DLLVM_LINK_LLVM_DYLIB=ON \
  -DCLANG_LINK_CLANG_DYLIB=ON
```

To build LLDB, you can run something like:

```sh
# Arguments after `--` are passed to Ninja: build only the lldb target,
# with one job per processor.
cmake --build build -- lldb -j$(nproc --all)
```

## What's next?

Next week, I'm going to look into the time complexity problem and try to solve
it. I'll also restructure the code to look a bit more C++-ish and finish the
LLDB test suite so I can finally start upstreaming my changes. However, this
may take a while because of a challenge described in the plan: dual-licensing
the relevant GCC code under the LLVM license. Mathias (my mentor), Iain and the
GCC team are handling this together.
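One generic way to bound exponential behaviour like the timeout described above is to give the recursive parser a fixed work budget. The parser then fails once that budget is exhausted. This is not necessarily the fix that will end up upstream. The sketch below uses a hypothetical toy grammar, not D's mangling: `a` is a leaf, and each `D` expands everything after it twice. A run of `n` `D`s followed by `a` therefore costs 2^(n+1) - 1 calls. This loosely mimics how a very repetitive input can blow up a naive recursive demangler. `BudgetedExpander` is an illustrative name.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical toy grammar: 'a' expands to "a"; 'D' expands the rest of the
// input twice. Every call spends one unit of a fixed work budget.
class BudgetedExpander {
 public:
  explicit BudgetedExpander(std::size_t budget) : budget_(budget) {}

  // Returns the expansion, or std::nullopt if expansion fails (unknown
  // character, missing leaf) or the work budget runs out.
  std::optional<std::string> expand(std::string_view input) {
    std::string out;
    if (!expand_at(input, 0, out))
      return std::nullopt;
    return out;
  }

 private:
  bool expand_at(std::string_view input, std::size_t pos, std::string &out) {
    // Charge one unit of work per call and give up once the budget is spent,
    // instead of running for hours on a pathological input.
    if (budget_ == 0)
      return false;
    --budget_;
    if (pos >= input.size())
      return false;  // Ran off the end without reaching a leaf.
    switch (input[pos]) {
      case 'a':
        out += 'a';
        return true;
      case 'D':
        // Two recursive calls per 'D': this branching is what makes the
        // total work exponential in the number of repetitions.
        return expand_at(input, pos + 1, out) && expand_at(input, pos + 1, out);
      default:
        return false;
    }
  }

  std::size_t budget_;
};
```

A recursion depth limit alone would not catch this case. The call stack here is only `n + 1` frames deep; the branching, not the depth, makes the work exponential. Each `BudgetedExpander` spends its budget across calls, so a fresh instance is needed per input.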