Let's Crack the EU's Digital COVID Cert!

This month, millions of freshly minted Digital COVID Certificates are being issued in Ireland and across the EU. One thing that's hard to miss is the giant QR code embedded in the document. By now, we're all familiar with the routine of presenting this QR code to verify our vaccination status. But how does it work? What data does it contain? When I got mine the other day, I figured it would make for a fun project, so let's dive in!

Let's use Python, because why not? First, we need to be able to grab QR codes from an image.

from PIL import Image
from pyzbar import pyzbar
from pyzbar.wrapper import ZBarSymbol

image = Image.open("path/to/cert.png")
qr_codes = pyzbar.decode(image, symbols=[ZBarSymbol.QRCODE])

Pretty easy, huh? Except the certs are sent out as PDFs, and PIL can't open those. Let's convert those before trying to read them.

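import pdf2image

# Render each PDF page as a PIL image, then scan each page for QR codes.
images = pdf2image.convert_from_path("path/to/cert.pdf")
qr_codes = [qr for image in images for qr in pyzbar.decode(image, symbols=[ZBarSymbol.QRCODE])]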

Much better. For simplicity's sake, let's discuss a single qr_code from now on. Our QR code data is stored as an array of bytes. Let's decode this into a string so we can manipulate it more easily.

qr_code = qr_codes[0]  # the first QR code found
qr_code_data = qr_code.data.decode("utf-8")
print(qr_code_data)

The result looks something like this: HC1:<gibberish>. Hmm, hardly ideal. Maybe the data is encrypted? The tech specs for the Digital COVID Cert's QR code have been rather helpfully published by the EU. This is an excellent starting point to understand what we're dealing with — the serialization process for QR code data is as follows (credit to the EU eHealth Network):

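[Figure: QR code serialization process. The JSON payload is encoded as CBOR, signed as a COSE message, compressed with zlib, encoded as Base45 and prefixed with HC1: before being rendered as a QR code.]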

Now that we know exactly how encoding works, we can reverse it without breaking a sweat. First, let's reverse the Base45 encoding and decompress the data.

import zlib
from base45 import b45decode

base45_data = qr_code_data[len("HC1:"):]
compressed_data = b45decode(base45_data)
cose_signed_document = zlib.decompress(compressed_data)
print(cose_signed_document)
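If you're curious what b45decode is doing under the hood, here's a rough sketch of Base45 as I understand it from RFC 9285, using only the standard library. The function names (b45encode_sketch, b45decode_sketch, unwrap_hc1) are my own, not part of the base45 package, and the "certificate" at the bottom is made-up bytes rather than real cert data.

```python
import zlib

# Base45 alphabet from RFC 9285: the 45 characters of QR "alphanumeric" mode.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def b45encode_sketch(data: bytes) -> str:
    """Encode bytes as Base45: 2 bytes -> 3 chars, a trailing byte -> 2 chars."""
    out = []
    for i in range(0, len(data), 2):
        chunk = data[i:i + 2]
        n = int.from_bytes(chunk, "big")
        # Emit base-45 digits, least significant first.
        for _ in range(3 if len(chunk) == 2 else 2):
            n, digit = divmod(n, 45)
            out.append(ALPHABET[digit])
    return "".join(out)


def b45decode_sketch(text: str) -> bytes:
    """Decode Base45: 3 chars -> 2 bytes, a trailing pair of chars -> 1 byte."""
    out = bytearray()
    for i in range(0, len(text), 3):
        chunk = text[i:i + 3]
        if len(chunk) == 1:
            raise ValueError("invalid Base45 length")
        # Each character is a base-45 digit, least significant first.
        n = sum(ALPHABET.index(ch) * 45 ** k for k, ch in enumerate(chunk))
        out += n.to_bytes(2 if len(chunk) == 3 else 1, "big")
    return bytes(out)


def unwrap_hc1(qr_text: str) -> bytes:
    """Strip the HC1: prefix, undo Base45, then undo zlib compression."""
    prefix = "HC1:"
    if not qr_text.startswith(prefix):
        raise ValueError("not an HC1 payload")
    return zlib.decompress(b45decode_sketch(qr_text[len(prefix):]))


# Round trip with made-up bytes standing in for the COSE document.
fake_qr_text = "HC1:" + b45encode_sketch(zlib.compress(b"not a real certificate"))
print(unwrap_hc1(fake_qr_text))
```

The reason for Base45 rather than Base64 is that its alphabet matches the characters a QR code can store in its compact alphanumeric mode.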

Now we're getting somewhere — if you've been following along, you should see your name and some other personal info in the output, surrounded by seemingly random characters. From the spec, we know this is COSE/CBOR data.

CBOR stands for Concise Binary Object Representation. It was inspired by JSON, but sacrifices readability for brevity to pack more information into a smaller space.
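To get a feel for how compact CBOR is, here's a toy decoder that handles just enough of the format (integers, byte strings, text strings, arrays and maps) to read a tiny hand-built example. It's a sketch for illustration only: decode_item and the sample bytes are my own invention, it skips floats, tags and indefinite-length items, and the cbor2 library is what you'd use in practice.

```python
import json


def decode_item(data: bytes, pos: int = 0):
    """Decode one CBOR item starting at pos and return (value, next_pos)."""
    # The first byte holds the major type (top 3 bits) and additional info.
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info < 24:
        arg = info  # small values are stored directly in the first byte
    elif info in (24, 25, 26, 27):
        size = 1 << (info - 24)  # 1, 2, 4 or 8 bytes follow
        arg = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("unsupported additional info")

    if major == 0:  # unsigned integer
        return arg, pos
    if major == 1:  # negative integer
        return -1 - arg, pos
    if major == 2:  # byte string
        return data[pos:pos + arg], pos + arg
    if major == 3:  # UTF-8 text string
        return data[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:  # array of arg items
        items = []
        for _ in range(arg):
            item, pos = decode_item(data, pos)
            items.append(item)
        return items, pos
    if major == 5:  # map of arg key/value pairs
        result = {}
        for _ in range(arg):
            key, pos = decode_item(data, pos)
            value, pos = decode_item(data, pos)
            result[key] = value
        return result, pos
    raise ValueError("unsupported major type")


# Hand-built example: the map {1: "IE", -260: {}} in 9 bytes.
sample = bytes.fromhex("a201624945390103a0")
decoded, _ = decode_item(sample)
print(decoded)

# The same structure as JSON text, for a size comparison.
as_json = json.dumps({"1": "IE", "-260": {}})
print(len(sample), "bytes of CBOR vs", len(as_json), "bytes of JSON")
```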

COSE stands for CBOR Object Signing and Encryption. It's the secret sauce of the QR code, containing a digitally signed payload (i.e. your vaccination info) that anyone can verify using a public key. Of course, if you got your hands on the private key, you could issue a cert to absolutely anyone... even Adolf Hitler. That actually happened.

We don't need to delve into COSE/CBOR too deeply here. Let's just crack the code open and grab our data!

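import cbor2
# Newer pycose releases (1.0 and later, as far as I know) use the pycose
# namespace instead: from pycose.messages import CoseMessage
from cose.messages import CoseMessage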

cose_message = CoseMessage.decode(cose_signed_document)
payload = cbor2.loads(cose_message.payload)
print(payload)

It's as simple as that! We can go one step further and print the payload object with some nice formatting and syntax highlighting to make it easy to read.

import json
from pygments import highlight
from pygments.formatters import TerminalFormatter
from pygments.lexers import JsonLexer

payload_json = json.dumps(payload, indent=2)
print(highlight(payload_json, JsonLexer(), TerminalFormatter()))

Here's what the anonymised output looks like for someone who got three doses of Pfizer/BioNTech's Comirnaty mRNA vaccine in Ireland, for example.

{
  "4": 0000000000,
  "6": 0000000000,
  "1": "IE",
  "-260": {
    "1": {
      "nam": {
        "fnt": "SMITH",
        "gnt": "JOHN",
        "fn": "Smith",
        "gn": "John"
      },
      "dob": "XXXX-XX-XX",
      "v": [
        {
          "dn": 3.0,
          "sd": 3.0,
          "dt": "XXXX-XX-XX",
          "ci": "URN:UVCI:01:IE:XXXXXXXX",
          "tg": "840539006",
          "vp": "1119349007",
          "mp": "EU/1/20/1528",
          "ma": "ORG-100030215",
          "co": "IE",
          "is": "Department of Health Ireland"
        }
      ],
      "ver": "1.3.0"
    }
  }
}

All of the JSON keys are abbreviated to help squeeze the data into a single QR code. The EU has published a JSON schema spec you can use to figure out what's what. For example:

4 is the timestamp the cert expires at.
6 is the timestamp the cert was issued at.
fn is the family name and gn is the given name.
ci is a unique identifier for the Digital COVID Cert.
tg is the targeted disease (i.e. COVID-19).
vp, mp and ma contain info on the vaccine and its manufacturer.
dn is the number of doses given.
sd is the total number of doses recommended.
dt is the date of the most recent dose in YYYY-MM-DD format.
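To make those abbreviations a bit friendlier, here's a small sketch that turns a decoded payload into a readable summary. It follows the standard CWT claim numbering (1 is the issuer, 4 is the expiry time, 6 is the issued-at time). The summarise helper and every value in sample_payload are made up for illustration; note that the payload you get straight from cbor2 has integer keys like 4 and -260, which only become strings once dumped to JSON.

```python
from datetime import datetime, timezone

# Hypothetical payload with made-up values, shaped like a decoded cert.
sample_payload = {
    4: 1656633600,  # expires at (Unix timestamp)
    6: 1625097600,  # issued at (Unix timestamp)
    1: "IE",        # issuer
    -260: {
        1: {
            "nam": {"fn": "Smith", "gn": "John"},
            "v": [{"dn": 3.0, "sd": 3.0, "dt": "2021-12-01"}],
        }
    },
}


def to_date(timestamp: int) -> str:
    """Convert a Unix timestamp to a YYYY-MM-DD date in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()


def summarise(payload: dict) -> dict:
    """Pull the human-readable bits out of a decoded cert payload."""
    cert = payload[-260][1]
    dose = cert["v"][0]  # assumes a single vaccination entry, as in the output above
    return {
        "issuer": payload[1],
        "issued_at": to_date(payload[6]),
        "expires_at": to_date(payload[4]),
        "name": f'{cert["nam"]["gn"]} {cert["nam"]["fn"]}',
        "doses": f'{int(dose["dn"])}/{int(dose["sd"])}',
        "last_dose": dose["dt"],
    }


print(summarise(sample_payload))
```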

As you can see, your name, date of birth and vaccination record (down to the date of your last dose) can be extracted from the QR code in a couple of lines of code. The QR code's data may be digitally signed, but it's not encrypted. Think twice before posting a screenshot of your QR code if you're not comfortable sharing all of that info.

You can check out the full source on GitHub. If there's interest, then I'll do a follow-up on how to verify the QR code yourself and what happens if you try to make a fake one.

Over to you — fire up Python, decode your QR code and see what you find!