commit 8aaf140dd381d0e39c31d42d0cf7b239628a0195
parent b9c04eb5b947724a795cf933a19b226cd28cad4b
Author: Luís Ferreira <contact@lsferreira.net>
Date: Thu, 30 Sep 2021 17:01:01 +0100
content: posts: add 1st SAOC 2021 week update
Signed-off-by: Luís Ferreira <contact@lsferreira.net>
Diffstat:
1 file changed, 124 insertions(+), 0 deletions(-)
diff --git a/content/posts/d-saoc-2021-01.md b/content/posts/d-saoc-2021-01.md
@@ -0,0 +1,124 @@
+# SAOC LLDB D integration: 1st Weekly Update
+
+Hi D community!
+
+I'm here to describe what I've done during the first week on the Symmetry
+Autumn of Code.
+
+## `liblldbd`
+
+During the discussion for the milestones plan with my mentor, I decided to
+advance some work and wrote a simple C API around D runtime demangler to expose
+the D demangler API into a C interface. This would allow in the future to
+implement an LLDB language plugin into the LLVM. The source code is available
+on Github,
+[liblldbd](https://github.com/ljmf00/liblldbd).
+
+### Alternatives to `liblldbd`
+
+In the meanwhile, we decided to focus on porting libiberty demangler codebase
+to the LLVM upstream repository since it would provide much more benefits and
+acceptance to be upstreamed. So the `liblldbd` is a plan B if libiberty is not
+accepted by the LLVM team.
+
+## Port of `libiberty` demangler
+
+Right after we finished the plan, in which you can follow up
+[here](https://pad.riseup.net/p/r.05c919765a66f89368a3fc28c98432db), I started
+porting `libiberty` and integrate the code into the LLVM core. Similarly to
+Rust demangler, I tried to follow up some patches on the [LLVM review
+platform](https://reviews.llvm.org/) and the awesome documentation that LLVM
+provides.
+
+This ended up being relatively easy to plug into the LLVM codebase, since most
+of the demangler logic was isolated in one file, thanks to Iain (@ibuclaw) for
+the excelent code. Because I didn't expect this to be so plug and play I
+decided to extensively test the code using the robust test suite that LLVM
+provides.
+
+## Testing
+
+First, I started to port the `libiberty` test suite for D demangling and right
+after wrote some `libfuzzer` tests and ran it with an address sanitizer and UB
+sanitizer.
+
+### Security vulnerabilities
+
+The `libfuzzer` results took some time to show up but I got some interesting
+outputs from there. The most interesting one was a heap/stack buffer overflow.
+I also managed to find a null dereferencing. Both, with a crafted malicious
+mangle name, can trigger a segmentation fault or undefined behaviour by
+reading/writing to a protected memory space.
+
+I wrote a patch to fix both issues and contacted MITRE for standard
+vulnerabilities reporting procedure, since GCC is widely used and can
+potentially cause some issues. I pushed those patches into the GCC mailing
+list, and I'm currently waiting for appreciation. You can check those two
+patches
+[here](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579985.html)
+and
+[here](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579987.html).
+
+After patching the code I ran the fuzzer again and after some hours the fuzzer
+reported a timeout with a huge number of recursive calls. I carefully analyzed
+the generated output mangle that the fuzzer created and found out that it is a
+very repetitive name. Doing some superficial analysis I found out that those
+recursive calls are creating exponential time complexity and can cause the
+demangler to wait for hours or even days to complete. I believe that this can
+also be used to maliciously cause a denial of service, although I didn't have
+much time to profile it yet.
+
+To have some discussion about this I'm going to create a thread on the GCC
+security mailing list and express some solutions to mitigate those problems,
+such as integrating part of the codebase into the OSS fuzzer.
+
+Before that, I'm waiting for a reply to the message I sent to MITRE, which was
+forwarded to Red Hat security team for further appreciation.
+
+I don't really know if this is crucial to share now, but I saved the fuzzer
+result, if anyone is interested in researching more ideas of crafted mangles to
+feed the address/UB sanitizer.
+
+## LLDB integration
+
+The last task I was working on (today) was on finalizing the LLDB integration.
+I still need to write some tests but the most important fact is that it is
+already working! My LLDB tree can successfully pretty print the mangled names.
+My fork is available on my Github,
+[here](https://github.com/ljmf00/llvm-project/tree/add-d-demangler).
+
+### Some considerations
+
+From the first time I built LLVM I found out that compiling it with debug
+information is extremely costly in terms of memory usage, since linking all
+those symbols at once can consume a lot of RAM. I recommend you build it with
+`Release` flags.
+
+Here is my `cmake` config so far, if someone wants to test my work at any
+point.
+```
+cmake -S llvm -B build -G Ninja \
+ -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;lldb" \
+ -DCMAKE_BUILD_TYPE=Release \
+ -DLLDB_EXPORT_ALL_SYMBOLS=0 \
+ -DLLVM_ENABLE_ASSERTIONS=ON \
+ -DLLVM_CCACHE_BUILD=ON \
+ -DLLVM_LINK_LLVM_DYLIB=ON \
+ -DCLANG_LINK_CLANG_DYLIB=ON
+```
+
+To build LLDB, you can do something like:
+
+```
+cmake --build build -- lldb -j$(nproc --all)
+```
+
+## What's next?
+
+Next week, I'm going to have an eye on the time complexity problem, try to
+solve it, restructure the code to look a bit more C++ish and finishing the LLDB
+test suite to finally start upstreaming my changes. Although, this can take a
+while, since there is a challenge, described in the plan, which is
+dual-licensing the GCC codebase with LLVM codebase. This is cooperatively being
+handled by Mathias (my mentor), Iain and GCC team.
+