The Push Protocol and the Web

by Vihan Bhargava

Push notifications help with user retention and engagement, and the Push protocols now make them available on the web.

A few years ago, if you had asked someone where the trend was going, they would probably have said apps: an app for pizza, an app for clothes, an app for everything. They weren’t wrong. Apps have taken off, and vendors have provided many conveniences for them, notably Push Notifications for delivering updates. The web, however, hasn’t gone away. ‘Web apps’ are making something of a comeback, as we see in newer technologies such as AMP and especially PWAs (Progressive Web Apps), which are backed by Service Workers.

As a totally non-technical explanation, service workers are essentially ‘downloaded’ scripts that let you run JS for your site even when the site isn’t open (including offline). It is impractical to keep a heavy V8 instance running JavaScript with a WebSocket open to every site’s server, so service workers can’t receive Push Notifications that way. Instead, each vendor uses its own delivery protocol (the details are up to the implementation) and then hands the message to the JavaScript handler for the Push Notification.

Information on this is pretty scattered, and most of it is about specific libraries, so in this article I’ll cover how to build a modern Push Notification server in a language-agnostic way using the Push API.


#VAPID

VAPID (RFC link) is one of the key (no pun intended) ways your server and Push Notifications are authenticated. “VAPID keys” are essentially a regular old key pair on the P-256 elliptic curve.

These keys can be easily generated with the openssl ecparam and ec utilities:

openssl ecparam -name prime256v1 -genkey -noout -out PRIVATE_KEY.pem
openssl ec -in PRIVATE_KEY.pem -pubout -out PUBLIC_KEY.pem

What we’ll be doing with these keys is:

  1. Provide the public key to the client
  2. Use the private key to sign our “JWT”, which lets the push server validate that the Push Notifications we send are indeed coming from us

#Client

Now let’s go back to the client. You’ll have two pieces of JavaScript on the frontend:

  1. Setup code: ordinary JS which asks the user for permission to display notifications and then registers for push^[Please, please, please don’t show a modal on page load offering PNs, or fire the permission request right away. Nobody cares about PNs until they do, so leave it for later in the UI flow.]
  2. Service Worker: displays the notification

Now your client needs the public key in raw, unencoded form. Both the PEM and DER formats wrap the key in extra structure (it is not just base64), so what I recommend is using an OpenSSL library^[If you are using Python, M2Crypto] to read the key and exposing an endpoint which returns the raw key (it should begin with 0x04).
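If pulling in an OpenSSL binding feels heavy, here is a minimal standard-library sketch of the same idea. It leans on an assumption worth stating loudly: a P-256 public key in the `PUBLIC KEY` PEM format decodes to 91 bytes of DER, a 26-byte header followed by the 65-byte raw point. The function name `raw_p256_public_key` is made up for this example, and a real crypto library will be more robust against unexpected input.

```python
import base64


def raw_p256_public_key(pem_text: str) -> bytes:
    """Return the 65-byte raw point (0x04 || X || Y) from a P-256 public-key PEM."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    # Drop the "-----BEGIN/END PUBLIC KEY-----" armor and decode the base64 body.
    body = "".join(line for line in lines if line and not line.startswith("-----"))
    der = base64.b64decode(body)
    # Assumption: a P-256 SubjectPublicKeyInfo is exactly 91 bytes,
    # a 26-byte ASN.1 header followed by the 65-byte uncompressed point.
    if len(der) != 91:
        raise ValueError("not a P-256 public key in SubjectPublicKeyInfo form")
    point = der[-65:]
    if point[0] != 0x04:
        raise ValueError("expected an uncompressed point starting with 0x04")
    return point
```

Your endpoint would then return these 65 bytes as the response body, for example `raw_p256_public_key(open('PUBLIC_KEY.pem').read())`.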

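If you do have such an endpoint, here’s some sample code for fetching the key:

const response = await fetch("/my/pubkey/endpoint", { method: "POST" });
// Read the raw bytes of the response body (note the capital B in arrayBuffer)
const key = new Uint8Array(await response.arrayBuffer());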

Now, when appropriate, prompt the user for permission to display Notifications. The notification API uses Promises in some browsers and a callback in others, so to support both we’ll use this:

const permission = await new Promise((resolve, reject) => {
  const promise = Notification.requestPermission(resolve);
  if (promise) promise.then(resolve, reject);
});

const allowedToDisplayNotifs = permission === "granted";

Now we can ‘set up’ Push Notifications. This step DOES NOT talk to your server; you must send the resulting subscription details to your server yourself:

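// Assumes your service worker has already been registered
const registration = await navigator.serviceWorker.ready;
const pushSubscription = await registration.pushManager.subscribe({
  userVisibleOnly: true,
  applicationServerKey: key,
});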

The userVisibleOnly key specifies that we’ll only use this channel to send user-facing notifications. This is required to be true in some browsers (including Chrome at time of writing).

This pushSubscription object will look a little like:

{
  "endpoint": "https://push.example.com/unique-auth-token",
  "keys": {
    "p256dh": "<long string>",
    "auth": "<shorter string>"
  }
}

The endpoint is all we need to send a PN, but if we wish to send a payload (which, yeah, we probably want to do), we’ll use the p256dh and auth fields to encrypt it. This way only your server and the client can ever read the PNs (the intermediate PN server run by Chrome/Firefox can’t).

Now what you should do is set up an endpoint which receives all these values and stores them in your database.
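As a rough sketch of what that endpoint might do with the request body before storing it: the p256dh and auth values arrive as URL-safe base64 without padding, so they need the padding restored before decoding. The function names here are hypothetical, and the length checks assume a 65-byte P-256 point and a 16-byte auth secret.

```python
import base64
import json


def b64url_decode(value: str) -> bytes:
    """Decode URL-safe base64 that may be missing its trailing '=' padding."""
    return base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))


def parse_subscription(raw_json: str) -> dict:
    """Validate a pushSubscription JSON document and decode its key material."""
    subscription = json.loads(raw_json)
    client_public_key = b64url_decode(subscription["keys"]["p256dh"])
    auth_secret = b64url_decode(subscription["keys"]["auth"])
    # Assumption: p256dh is a raw uncompressed P-256 point, auth is 16 random bytes.
    if len(client_public_key) != 65 or client_public_key[0] != 0x04:
        raise ValueError("p256dh is not an uncompressed P-256 point")
    if len(auth_secret) != 16:
        raise ValueError("auth secret should be 16 bytes")
    return {
        "endpoint": subscription["endpoint"],
        "p256dh": client_public_key,
        "auth": auth_secret,
    }
```

Storing the decoded bytes (or the original strings, if you prefer) is a design choice; decoding up front just means a malformed subscription is rejected at registration time rather than when you first try to send.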

#Sending the Notification

Now when it’s time to send a notification we have to do quite a bit of work to prepare the payload. There are two crypto-things we need to do:

  • Tell the PN server that we are in fact the server
  • Encrypt the payload so the PN server can’t snoop in on the notifications

#Key Derivation

So we’ll need to create some EC keys to pair with the client’s. At the end we’ll have 5 keys going around:

 Server           +
  and      Client | Server
 Client    Public | Public
  know            |
         +------------------+
  Only            |
  owner   Client  | Server
  knows   Private | Private
                  +

Combining the keys diagonally lets us derive another key, and both diagonals give the same result: `Client Public + Server Private = Server Public + Client Private`. This combination is known as a ‘Shared Secret’. The private keys are never exposed, yet only the client and the server know the secret, even in the presence of a MITM. This process is known as ‘ECDH’ (Diffie-Hellman with Elliptic Curve key pairs).

Here’s a code example in Python (there are 10000 examples for Node.js on Google’s dev blog if that’s your thing):

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.asymmetric import ec
from base64 import urlsafe_b64decode

# A fresh server key pair for this exchange
private_key = ec.generate_private_key(ec.SECP256R1(), default_backend())

# KEY_DATA is the subscription's p256dh string; browsers drop the base64
# padding, so restore it before decoding
client_public_key = urlsafe_b64decode(KEY_DATA + '=' * (-len(KEY_DATA) % 4))

# EllipticCurvePublicKey.from_encoded_point parses the raw 0x04-prefixed point
# (it replaces the older EllipticCurvePublicNumbers.from_encoded_point)
shared_secret = private_key.exchange(
    ec.ECDH(),
    ec.EllipticCurvePublicKey.from_encoded_point(
        ec.SECP256R1(),
        client_public_key
    )
)
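To see why both diagonals agree, here is a toy illustration. Note the swap: it uses classic finite-field Diffie-Hellman with plain integers from the standard library, not elliptic curves, and the modulus is far too small for real use. The names and parameters are chosen only for this sketch; the point is just that each side mixes its own private value with the other side’s public value and lands on the same number.

```python
import secrets

# Toy parameters: a 127-bit Mersenne prime and a small base.
# Real Web Push uses the P-256 elliptic curve instead.
P = 2**127 - 1
G = 3


def keypair():
    """Return (private, public) where public = G^private mod P."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


client_private, client_public = keypair()
server_private, server_public = keypair()

# Each side combines its own private key with the other side's public key.
secret_on_server = pow(client_public, server_private, P)
secret_on_client = pow(server_public, client_private, P)

# Both are G^(client_private * server_private) mod P, so they match.
assert secret_on_server == secret_on_client
```

An eavesdropper sees only the two public values, which is why the secret stays private even though nothing secret crosses the wire.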

Both we (the server) and the client can now compute this ‘shared secret’, so we can derive symmetric keys from it and use symmetric encryption for the payload.

#Parameter Derivation

At the very end we’ll use AES to encrypt the message, but first we have to produce the encryption parameters to pass to it: a ‘key’ and an ‘iv’. The inputs for this step have a binary format; for the official source I’ll refer you to [the RFC section][2].

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from base64 import urlsafe_b64decode

prk = HKDF(
    algorithm=hashes.SHA256(),
    length=32,
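    # AUTH_DATA is the subscription's auth string; restore the dropped base64 padding
    salt=urlsafe_b64decode(AUTH_DATA + '=' * (-len(AUTH_DATA) % 4)),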
    info=b'Content-Encoding: auth\0',
    backend=default_backend()
).derive(shared_secret)

This prk is a secure, ‘random enough’ key that both we and the client know. We’ll use it to generate two other values that we’ll pass into the AES function. First we build a ‘context’ from the two public keys:

from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat
from struct import pack

# Use the public half of the same key pair that produced shared_secret,
# serialized as a raw uncompressed point (0x04 || X || Y)
server_public_key = private_key.public_key().public_bytes(
    Encoding.X962,
    PublicFormat.UncompressedPoint
)

# Label, then each key prefixed with its length as a 2-byte big-endian integer
context = (
    b'P-256\0'
    + pack('!H', len(client_public_key)) + client_public_key
    + pack('!H', len(server_public_key)) + server_public_key
)
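To make the binary layout concrete, here is a small standard-library sketch that builds the same structure from dummy keys and checks its size. The helper name `build_context` and the dummy key bytes are made up for illustration; real keys are 65-byte uncompressed P-256 points.

```python
from struct import pack


def build_context(client_public_key: bytes, server_public_key: bytes) -> bytes:
    """Label, then each key prefixed by its length as a 2-byte big-endian integer."""
    return (
        b"P-256\0"
        + pack("!H", len(client_public_key)) + client_public_key
        + pack("!H", len(server_public_key)) + server_public_key
    )


# Dummy 65-byte "keys", only to show the layout.
dummy_client_key = b"\x04" + b"\x11" * 64
dummy_server_key = b"\x04" + b"\x22" * 64
example_context = build_context(dummy_client_key, dummy_server_key)

# 6 bytes of label + (2 + 65) for the client key + (2 + 65) for the server key
assert len(example_context) == 140
# 65 encoded as a big-endian unsigned short
assert example_context[6:8] == b"\x00\x41"
```

The order matters: the client (recipient) key comes first and the server (sender) key second, matching the line above.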

Now that we have this ‘context’, which identifies the keys used for the encryption, we’ll create the two encryption parameters by attaching a different type label to it for each one. For this we’ll also need a random salt:

from M2Crypto.m2 import rand_bytes
salt = rand_bytes(16)

Now we’ll generate the two params (nonce and CEK) from the prk. Only the Content-Encoding label and the output length vary between the two:

nonce = HKDF(
    algorithm=hashes.SHA256(),
    length=12,
    salt=salt,
    info=b'Content-Encoding: nonce\0' + context,
    backend=default_backend()
).derive(prk)

cek = HKDF(
    algorithm=hashes.SHA256(),
    length=16,
    salt=salt,
    info=b'Content-Encoding: aesgcm\0' + context,
    backend=default_backend()
).derive(prk)
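Since the aim here is to stay language-agnostic, it may help to see what HKDF does under the hood. Below is a minimal sketch of HKDF with SHA-256 (RFC 5869) using only the standard library, applied to placeholder inputs. The `demo_` values are all zeros and purely illustrative; in the real flow they would be the prk, salt and context built above.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF with SHA-256: extract a fixed-size key, then expand it to `length` bytes."""
    # Extract: condense the input keying material, keyed by the salt.
    pseudo_random_key = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks over (previous block || info || counter).
    output, block, counter = b"", b"", 1
    while len(output) < length:
        block = hmac.new(
            pseudo_random_key, block + info + bytes([counter]), hashlib.sha256
        ).digest()
        output += block
        counter += 1
    return output[:length]


# Placeholder inputs, for illustration only.
demo_prk = bytes(32)
demo_salt = bytes(16)
demo_context = b"P-256\0" + bytes(134)

# Same key material and salt; only the label and the length differ.
demo_nonce = hkdf_sha256(demo_prk, demo_salt, b"Content-Encoding: nonce\0" + demo_context, 12)
demo_cek = hkdf_sha256(demo_prk, demo_salt, b"Content-Encoding: aesgcm\0" + demo_context, 16)
```

Because the label is part of the `info` input, the nonce and the CEK come out unrelated even though everything else is identical.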

#Diagram

Here’s a diagram of what we just did:

Server Private      Shared     Client
      +        ---> Secret     'auth'
Client Public         |          |
                      +----+-----+
                           |
                           v
                          PRK
Server Public              |
      +           +--------+----------+
Client Public     |                   |
      |           v                   v
      +-------> nonce <--- Salt ---> CEK
                  +                   +
                  |      payload      |
                  |         |         |
                  |         v         |
                  +---> encrypted <---+
                         payload

#Encrypting the payload

This is the final part. We take the ‘nonce’ and CEK that we generated before and use them as the parameters to the AES cipher. The implementation is simple enough: we are using AES-128-GCM, which we can do with:

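from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend

cipher = Cipher(algorithms.AES(cek), modes.GCM(nonce), default_backend())
encryptor = cipher.encryptor()
# The aesgcm content encoding expects a two-byte padding length in front of
# the data; b'\x00\x00' means no padding
plaintext = b'\x00\x00' + PAYLOAD_DATA
# GCM's 16-byte authentication tag is appended to the ciphertext
encrypted_payload = encryptor.update(plaintext) + encryptor.finalize() + encryptor.tag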
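One detail worth spelling out is the framing of the plaintext. Assuming the aesgcm content encoding used throughout this article, the data handed to the cipher starts with a two-byte big-endian padding length, followed by that many zero bytes, followed by the message. Here is a small sketch of that framing; the helper name `pad_record` is made up for this example.

```python
from struct import pack


def pad_record(payload: bytes, padding: int = 0) -> bytes:
    """Prefix the payload with a 2-byte padding length and `padding` zero bytes."""
    if not 0 <= padding <= 0xFFFF:
        raise ValueError("padding length must fit in two bytes")
    return pack("!H", padding) + b"\x00" * padding + payload


# No padding: just the two-byte zero length in front of the message.
assert pad_record(b"hi") == b"\x00\x00hi"
# Three bytes of padding hide the true length of the message a little.
assert pad_record(b"hi", 3) == b"\x00\x03\x00\x00\x00hi"
```

Padding is optional, but it lets you hide how long the real message is from anyone watching ciphertext sizes.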

That’s it for the payload! Now we just have to create the JWT header.

#Preparing the JWT