---
title: 'SAOC LLDB D integration: 1st Weekly Update'
date: '2021-09-23T07:32:00+01:00'
tags: ['saoc', 'saoc2021', 'dlang', 'llvm', 'lldb', 'debug', 'debugging']
description: "This post describes what I’ve done in the 1st week of the
Symmetry Autumn of Code 2021, including the proposed liblldbd demangler API
alternative, the port of the libiberty demangler to the LLVM codebase, the
tests performed, and the security vulnerabilities found, including a stack/heap
buffer overflow in the GCC codebase. I also mention some considerations for
building the project."
---

# SAOC LLDB D integration: 1st Weekly Update

Hi D community!

I'm here to describe what I've done during the first week of the Symmetry
Autumn of Code.

## `liblldbd`

While discussing the milestone plan with my mentor, I decided to get ahead on
some work and wrote a simple C API around the D runtime demangler, exposing the
D demangler through a C interface. This would make it possible to implement an
LLDB language plugin in LLVM in the future. The source code is available on
GitHub: [liblldbd](https://github.com/ljmf00/liblldbd).
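To give a rough idea of what calling such a C interface from C++ could look like, here is a minimal sketch. The function name `sketch_d_demangle`, its signature, and its `snprintf`-like return convention are assumptions made for illustration only and are not the actual `liblldbd` API. The body is a stand-in that just echoes its input, where the real library would call into the D runtime demangler.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-compatible entry point (name and signature are assumptions,
// not the real liblldbd interface). It writes at most OutCap bytes, including
// the terminating NUL, to Out and returns the length of the full result.
extern "C" size_t sketch_d_demangle(const char *Mangled, char *Out,
                                    size_t OutCap) {
  // Stand-in body: echo the input unchanged so the sketch is runnable.
  size_t Len = std::strlen(Mangled);
  if (OutCap > 0) {
    size_t N = Len < OutCap - 1 ? Len : OutCap - 1;
    std::memcpy(Out, Mangled, N);
    Out[N] = '\0';
  }
  return Len;
}

// C++ caller: ask for the required size first, then fetch the result.
std::string demangleViaCApi(const std::string &Mangled) {
  size_t Len = sketch_d_demangle(Mangled.c_str(), nullptr, 0);
  std::string Result(Len + 1, '\0');
  sketch_d_demangle(Mangled.c_str(), &Result[0], Result.size());
  Result.resize(Len);
  return Result;
}
```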

### Alternatives to `liblldbd`

In the meantime, we decided to focus on porting the libiberty demangler
codebase to the LLVM upstream repository, since that would bring more benefits
and is more likely to be accepted upstream. So `liblldbd` is plan B in case the
libiberty port is not accepted by the LLVM team.

## Port of `libiberty` demangler

Right after we finished the plan, which you can follow
[here](../../public/assets/posts/d-saoc-2021-01/milestones.md), I started
porting `libiberty` and integrating the code into the LLVM core. Using the
Rust demangler as a reference, I followed some patches on the [LLVM review
platform](https://reviews.llvm.org/) and the awesome documentation that LLVM
provides.

This ended up being relatively easy to plug into the LLVM codebase, since most
of the demangler logic is isolated in one file; thanks to Iain (@ibuclaw) for
the excellent code. Because I didn't expect this to be so plug and play, I
decided to test the code extensively using the robust test suite that LLVM
provides.

## Testing

First, I started porting the `libiberty` test suite for D demangling, and right
after that I wrote some `libFuzzer` tests and ran them with the address
sanitizer and the UB sanitizer.
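For readers unfamiliar with `libFuzzer`, a fuzz target is a single function with a fixed signature that the fuzzer calls repeatedly with generated inputs. The sketch below shows the general shape only: `demangleForFuzzing` is a hypothetical stand-in, not the ported demangler, and a real target would call the demangler entry point instead. Such a target is typically built with something like `clang++ -g -fsanitize=fuzzer,address,undefined`, which links in the fuzzer driver and enables both sanitizers.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the demangler under test. It only strips the
// "_D" prefix; a real fuzz target would call the actual demangler here.
static std::string demangleForFuzzing(const std::string &Mangled) {
  if (Mangled.size() < 2 || Mangled[0] != '_' || Mangled[1] != 'D')
    return std::string();
  return Mangled.substr(2);
}

// Entry point that libFuzzer looks for. The input is arbitrary bytes and is
// not NUL-terminated, so it is copied into a std::string first.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  std::string Input(reinterpret_cast<const char *>(Data), Size);
  (void)demangleForFuzzing(Input);
  return 0; // Crashes and sanitizer reports are what signal a finding.
}
```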

### Security vulnerabilities

The `libFuzzer` results took some time to show up, but I got some interesting
output from them. The most interesting finding was a heap/stack buffer overflow.
I also managed to find a null pointer dereference. With a maliciously crafted
mangled name, both can trigger a segmentation fault or undefined behaviour by
reading from or writing to protected memory.
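As a generic illustration of this class of bug (this is not the actual fix from my patches), consider that D mangled names encode identifiers as a decimal length followed by that many characters, e.g. `3foo`. A parser that trusts the length can read past the end of the input. The sketch below, with the hypothetical helper `readLName`, validates the length against the remaining input before touching any characters.

```cpp
#include <cstddef>
#include <limits>
#include <string>

// Reads a length-prefixed identifier such as "3foo" starting at Pos.
// On success, stores it in Name, advances Pos past it, and returns true.
// On failure, returns false and leaves Pos and Name untouched.
bool readLName(const std::string &Mangled, size_t &Pos, std::string &Name) {
  const size_t Max = std::numeric_limits<size_t>::max();
  size_t Cur = Pos;
  size_t Len = 0;
  bool SawDigit = false;
  while (Cur < Mangled.size() && Mangled[Cur] >= '0' && Mangled[Cur] <= '9') {
    size_t Digit = static_cast<size_t>(Mangled[Cur] - '0');
    // Refuse lengths that would overflow size_t.
    if (Len > (Max - Digit) / 10)
      return false;
    Len = Len * 10 + Digit;
    SawDigit = true;
    ++Cur;
  }
  if (!SawDigit || Len == 0)
    return false;
  // The crucial check: the claimed length must fit in what is left.
  if (Len > Mangled.size() - Cur)
    return false;
  Name.assign(Mangled, Cur, Len);
  Pos = Cur + Len;
  return true;
}
```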

I wrote patches to fix both issues and contacted MITRE to follow the standard
vulnerability reporting procedure, since GCC is widely used and these bugs
could potentially cause problems. I sent the patches to the GCC mailing
list, and I'm currently waiting for review. You can check the two patches
[here](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579985.html)
and
[here](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579987.html).

After patching the code, I ran the fuzzer again, and after some hours it
reported a timeout with a huge number of recursive calls. I carefully analyzed
the mangled name that the fuzzer generated and found that it is very
repetitive. From a superficial analysis, those recursive calls lead to
exponential time complexity and can make the demangler run for hours or even
days before it completes. I believe this could also be used maliciously to
cause a denial of service, although I haven't had much time to profile it yet.
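To make the shape of the problem concrete, here is a toy model, not the D mangling grammar and not a diagnosis of the real code. In this hypothetical grammar, `a` is a leaf and `R` means "parse the previous item twice", so `a` followed by n `R`s costs 2^(n+1) - 1 calls. One common mitigation, sketched here under those assumptions, is a step budget that makes the parser give up once it has done too much work.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Toy recursive parser (hypothetical grammar, for illustration only).
struct ToyParser {
  const std::string &Input;
  uint64_t StepsLeft;

  bool parseItem(size_t Pos) {
    if (StepsLeft == 0)
      return false; // Budget exhausted: give up instead of running on.
    --StepsLeft;
    if (Pos >= Input.size())
      return false;
    if (Input[Pos] == 'a')
      return true; // Leaf item.
    if (Input[Pos] == 'R' && Pos > 0)
      return parseItem(Pos - 1) && parseItem(Pos - 1); // Doubles the work.
    return false; // Unknown character, or 'R' with nothing before it.
  }
};

// Parses the last item of Input with at most Budget calls to parseItem.
// StepsUsed reports how many calls were actually made.
bool parseWithBudget(const std::string &Input, uint64_t Budget,
                     uint64_t &StepsUsed) {
  StepsUsed = 0;
  if (Input.empty())
    return false;
  ToyParser Parser{Input, Budget};
  bool Ok = Parser.parseItem(Input.size() - 1);
  StepsUsed = Budget - Parser.StepsLeft;
  return Ok;
}
```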

To get some discussion going, I'm going to create a thread on the GCC
security mailing list and propose some solutions to mitigate those problems,
such as integrating part of the codebase into OSS-Fuzz.

Before that, I'm waiting for a reply to the message I sent to MITRE, which was
forwarded to the Red Hat security team for further assessment.

I don't really know if this is crucial to share now, but I saved the fuzzer
results in case anyone is interested in exploring more crafted mangled names
to feed to the address/UB sanitizers.

## LLDB integration

The last task I was working on (today) was finalizing the LLDB integration.
I still need to write some tests, but the most important thing is that it is
already working! My LLDB tree can successfully pretty-print the mangled names.
My fork is available on GitHub,
[here](https://github.com/ljmf00/llvm-project/tree/add-d-demangler).

### Some considerations

The first time I built LLVM, I found out that compiling it with debug
information is extremely costly in terms of memory usage, since linking all
those symbols at once can consume a lot of RAM. I recommend building it with
the `Release` build type.

Here is my `cmake` config so far, in case someone wants to test my work at any
point.

```bash
cmake -S llvm -B build -G Ninja \
       -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;lldb" \
       -DCMAKE_BUILD_TYPE=Release \
       -DLLDB_EXPORT_ALL_SYMBOLS=0 \
       -DLLVM_ENABLE_ASSERTIONS=ON \
       -DLLVM_CCACHE_BUILD=ON \
       -DLLVM_LINK_LLVM_DYLIB=ON \
       -DCLANG_LINK_CLANG_DYLIB=ON
```

To build LLDB, you can do something like:

```bash
cmake --build build -- lldb -j$(nproc --all)
```

## What's next?

Next week, I'm going to look into the time complexity problem, try to solve
it, restructure the code to look a bit more C++-ish, and finish the LLDB test
suite so that I can finally start upstreaming my changes. This can take a
while, though, since there is a challenge, described in the plan:
dual-licensing the code between the GCC and LLVM codebases. This is being
handled cooperatively by Mathias (my mentor), Iain, and the GCC team.

You can also read this on the D programming language forum,
[here](https://forum.dlang.org/thread/mailman.437.1632358782.21945.digitalmars-d@puremagic.com),
and discuss it there.

Read about the [next week](../d-saoc-2021-02/).