---
title: 'SAOC LLDB D integration: 1st Weekly Update'
date: '2021-09-23T07:32:00+01:00'
tags: ['saoc', 'saoc2021', 'dlang', 'llvm', 'lldb', 'debug', 'debugging']
description: "This post describes what I’ve done on the 1st week of the
Symmetry Autumn of Code 2021, including the proposed liblldbd demangler API
alternative, port of the libiberty demangler to LLVM codebase, tests performed
and security vulnerabilities found including a stack/heap buffer overflow on
the GCC codebase. I also mention some considerations to build the project."
---

# SAOC LLDB D integration: 1st Weekly Update

Hi D community!

I'm here to describe what I've done during the first week of the Symmetry
Autumn of Code.

## `liblldbd`

While discussing the milestones plan with my mentor, I decided to get a head
start and wrote a simple C API around the D runtime demangler, exposing it
through a C interface. This would make it possible to implement an LLDB
language plugin in LLVM later on. The source code is available on GitHub,
[liblldbd](https://github.com/ljmf00/liblldbd).

### Alternatives to `liblldbd`

In the meantime, we decided to focus on porting the libiberty demangler to the
LLVM upstream repository, since it offers more benefits and is more likely to
be accepted upstream. `liblldbd` is therefore plan B, in case the libiberty
port is not accepted by the LLVM team.

## Port of `libiberty` demangler

Right after we finished the plan, which you can read
[here](../../public/assets/posts/d-saoc-2021-01/milestones.md), I started
porting the `libiberty` demangler and integrating the code into the LLVM core.
Following the example of the Rust demangler, I went through some of the patches
on the [LLVM review platform](https://reviews.llvm.org/) and the awesome
documentation that LLVM provides.
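
For readers who haven't looked at D mangling before, here is a minimal sketch of the kind of work a demangler does. It is not the libiberty code or the LLVM port: `toyQualifiedName` is a hypothetical helper written only for this post, and it assumes the simplest possible input, a `_D` prefix followed by length-prefixed identifiers. It ignores type signatures, templates and back references, which is where the real complexity lives.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Toy illustration only (hypothetical helper, not part of libiberty, LLVM or
// liblldbd): extracts the dotted qualified name from a D mangled symbol.
// Anything after the name components (the type signature) is ignored.
std::optional<std::string> toyQualifiedName(std::string_view mangled) {
  // Every D mangled symbol starts with the "_D" prefix.
  if (mangled.substr(0, 2) != "_D")
    return std::nullopt;
  mangled.remove_prefix(2);

  auto isDigit = [](char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
  };

  std::string result;
  // Each name component is encoded as <decimal length><identifier>.
  while (!mangled.empty() && isDigit(mangled.front())) {
    std::size_t length = 0;
    while (!mangled.empty() && isDigit(mangled.front())) {
      length = length * 10 + static_cast<std::size_t>(mangled.front() - '0');
      // Reject lengths that cannot fit in the remaining input. This also
      // keeps `length` from overflowing on hostile input.
      if (length > mangled.size())
        return std::nullopt;
      mangled.remove_prefix(1);
    }
    // Never read past the end of the buffer.
    if (length == 0 || length > mangled.size())
      return std::nullopt;

    if (!result.empty())
      result += '.';
    result.append(mangled.substr(0, length));
    mangled.remove_prefix(length);
  }

  // A symbol without any name component is treated as invalid here.
  if (result.empty())
    return std::nullopt;
  return result;
}
```

For example, `toyQualifiedName("_D4test3fooFZv")` yields `test.foo`. The real demangler additionally decodes the trailing type information (`FZv` here), and the explicit bounds checks in the sketch are exactly the kind of detail that matters once the input is untrusted.
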

This ended up being relatively easy to plug into the LLVM codebase, since most
of the demangler logic was isolated in one file. Thanks to Iain (@ibuclaw) for
the excellent code. Because I didn't expect this to be so plug and play, I
decided to test the code extensively using the robust test suite that LLVM
provides.

## Testing

First, I ported the `libiberty` test suite for D demangling. Right after, I
wrote some `libfuzzer` tests and ran them with the address sanitizer and the UB
sanitizer.

### Security vulnerabilities

The `libfuzzer` results took some time to show up, but I got some interesting
outputs from there. The most interesting one was a heap/stack buffer overflow.
I also managed to find a null pointer dereference. With a maliciously crafted
mangled name, both can trigger a segmentation fault or undefined behaviour by
reading from or writing to protected memory.

I wrote patches to fix both issues and contacted MITRE to follow the standard
vulnerability reporting procedure, since GCC is widely used and these bugs
could potentially cause problems. I sent those patches to the GCC mailing list,
and I'm currently waiting for review. You can check those two patches
[here](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579985.html)
and
[here](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579987.html).

After patching the code I ran the fuzzer again, and after some hours it
reported a timeout with a huge number of recursive calls. I carefully analyzed
the mangled name that the fuzzer generated and found that it is very
repetitive. From a superficial analysis, those recursive calls lead to
exponential time complexity and can cause the demangler to take hours or even
days to complete.
I believe that this can also be used to maliciously cause a denial of service,
although I haven't had much time to profile it yet.

To discuss this, I'm going to start a thread on the GCC security mailing list
and propose some ways to mitigate these problems, such as integrating part of
the codebase into OSS-Fuzz.

Before that, I'm waiting for a reply to the message I sent to MITRE, which was
forwarded to the Red Hat security team for further review.

I don't really know if this is crucial to share now, but I saved the fuzzer
results, in case anyone is interested in researching more ideas for crafted
mangled names to feed to the address/UB sanitizers.

## LLDB integration

The last task I worked on (today) was finalizing the LLDB integration. I still
need to write some tests, but the most important fact is that it is already
working! My LLDB tree can successfully pretty print the mangled names. My fork
is available on my GitHub,
[here](https://github.com/ljmf00/llvm-project/tree/add-d-demangler).

### Some considerations

The first time I built LLVM, I found out that compiling it with debug
information is extremely costly in terms of memory usage, since linking all
those symbols at once can consume a lot of RAM. I recommend building it with
the `Release` build type.

Here is my `cmake` config so far, if someone wants to test my work at any
point.

```bash
cmake -S llvm -B build -G Ninja \
  -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;lldb" \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLDB_EXPORT_ALL_SYMBOLS=0 \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_CCACHE_BUILD=ON \
  -DLLVM_LINK_LLVM_DYLIB=ON \
  -DCLANG_LINK_CLANG_DYLIB=ON
```

To build LLDB, you can do something like:

```bash
cmake --build build -- lldb -j$(nproc --all)
```

## What's next?

Next week, I'm going to look into the time complexity problem and try to solve
it, restructure the code to look a bit more C++ish, and finish the LLDB test
suite so I can finally start upstreaming my changes. This can take a while,
though, since there is a challenge, described in the plan: dual-licensing the
GCC code so that it can be used in the LLVM codebase. This is being handled
cooperatively by Mathias (my mentor), Iain and the GCC team.

You can also read this on the D programming language forum,
[here](https://forum.dlang.org/thread/mailman.437.1632358782.21945.digitalmars-d@puremagic.com),
and discuss there.

Read about the [next week](../d-saoc-2021-02/).