Caching part #1: Cache-control header

This won’t be an exhaustive tutorial. Caching is necessary because it reduces loading times, bandwidth, and disk and CPU usage, thus saving costs; but it is also very specific to each scenario, and requires good knowledge and fine-tuning. Usually caching pays off when it meets one of two goals:

  1. It caches frequently accessed data that does not vary.
  2. It caches data that is computationally expensive to compute otherwise.

We meet the first (#1) case when we have a good cache-hit ratio, that is, the data is frequently accessed in common patterns that do not change.

We meet the second (#2) criterion when, despite perhaps having a bad cache-hit ratio, each hit saves us a good amount of resources.

It is possible to get both scenarios at once: frequently accessed data that does not change, but is computationally expensive to calculate.
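
As a toy illustration of the second goal, here is a minimal in-memory memoization sketch (the function name and the "expensive" computation are made up for this example):

```javascript
// Minimal in-memory cache sketch: store expensive results by key.
// "slowSquare" is a made-up stand-in for an expensive computation.
const cache = new Map();
let computations = 0;

function slowSquare( n )
{
  computations++; // Track how often we actually compute.
  return n * n;
}

function cachedSquare( n )
{
  if ( !cache.has( n ) )
    cache.set( n, slowSquare( n ) );
  return cache.get( n );
}

console.log( cachedSquare( 4 ) ); // Computed.
console.log( cachedSquare( 4 ) ); // Cache hit, no recomputation.
console.log( computations );
```

Every cache in this guide, whether in the browser or in NGINX, is conceptually a variation of this key-to-value lookup with a policy on top.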


But caching has some downsides:

  • Every layer of caching introduces more components in the system, making the system more complex.
  • Caching introduces the chance to have wrong/stale critical data if used incorrectly.
  • Caching can decrease performance when we have a high cache-miss ratio on data that is rather trivial to compute.

That’s why we have different caching techniques and several caching levels, and the more you know about it and your business requirements, the better you can implement caching in your application.

So, even if I wanted to, I couldn’t make a tutorial for every tool and every scenario; this is more a mini-guide introducing some caching concepts than an in-depth tutorial.

In this post, I’d like to introduce the Cache-Control header, which can be present in HTTP responses.

Let’s see how it can be useful, but also how it can be dangerous and produce undesired results.

When this header is present, you’re basically telling your browser:

Trust me, this resource won’t change for at least X time, so please, don’t bother me again for it.

Without further ado, I shall give a demonstration:

Environment configuration:
  • OS: Windows 10 20H2
  • NodeJS: v16.5.0
  • NGINX/OpenResty: openresty/1.19.3.2

I installed OpenResty locally on my Windows machine because it is easier to manage, but you can easily install it on Linux/Mac or use a docker image.

OpenResty is usually shipped with useful built-in modules and a nice Lua integration, so I’ll take it any day over plain NGINX.

The source code can be found on GitHub.

Note: From now on, I’ll use OpenResty and NGINX interchangeably.

NodeJS Code:
const app = Express();

app.get( "/" , function( req , res )
{
  res.sendFile( "index.html" , { root : Path.resolve() });
});

// Middleware to log requests that start with 'counter'.
app.all( "/counter*" , function( req , res , next )
{
  const path = `${req.baseUrl}${req.path}`;
  console.log( req.method , path , Moment().format( "HH:mm:ss" ) );
  next();
});

let counter = 0;

// Without cache.
app.get( "/counter" , function( req , res )
{
  res.json({ from : "counter" , counter : ++counter });
});

// With Cache-control.
app.get( "/counter-cache-control" , function( req , res )
{
  const seconds = 10;
  res.set( "Cache-control" , `private, max-age=${seconds}` );
  res.json({ from : "counter-cache-control" , counter : ++counter });
});

// Create the server.
app.listen( Env.PORT , () =>
{
  console.log( `Server running at: http://localhost:${Env.PORT}` );
});

Basically, that’s the full code (ignoring imports).

To see the effect, we should use a browser, which will handle the Cache-Control header as expected, so I crafted a webpage that will be served on the root path / when you start the server:

node index.js
Server running at: http://localhost:4010

In short, the page has two buttons, and each button has a corresponding route:

  // Click handler for: counter-btn.
  $( "#counter-btn" ).click( async function()
  {
    const res  = await fetch( "/counter" );
    const data = await res.json();
    console.log( data );
    $( "#counter" ).html( data.counter );
  });

  // Click handler for: counter-cache-control-btn.
  $( "#counter-cache-control-btn" ).click( async function()
  {
    const res  = await fetch( "/counter-cache-control" );
    const data = await res.json();
    console.log( data );
    $( "#counter-cache-control" ).html( data.counter );
  });

Let’s trigger the first button, counter-btn, 3 times; it will call the /counter route three times, producing the following (if you’re following along and running the project, you can open the browser console; I’ll paste the results here):

{from: "counter", counter: 1}
{from: "counter", counter: 2}
{from: "counter", counter: 3}

Now let’s use the second one, counter-cache-control-btn, which calls the /counter-cache-control route; let’s click it 3 times as well:

{from: "counter-cache-control", counter: 4}
{from: "counter-cache-control", counter: 4}
{from: "counter-cache-control", counter: 4}

We got the same response 3 times.

If we open the browser’s inspection panel, in the Network tab, the /counter request returned these headers:

Connection: keep-alive
Content-Length: 29
Content-Type: application/json; charset=utf-8
ETag: W/"1d-nlRHVEZmqGVmoOM63ZhCzoh/284"
Keep-Alive: timeout=5
X-Powered-By: Express

And if we watch now the /counter-cache-control request:

Cache-control: private, max-age=10
Connection: keep-alive
Content-Length: 43
Content-Type: application/json; charset=utf-8
ETag: W/"2b-rErROKrobKVM07UdPuw99nu6d0k"
Keep-Alive: timeout=5
X-Powered-By: Express

And now we got our header: Cache-control: private, max-age=10!
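
As a side note, the header value is just a comma-separated list of directives (some with a value, like max-age, some without, like private). A tiny sketch of how such a value could be parsed:

```javascript
// Sketch: split a Cache-Control header value into its directives.
// Valueless directives (e.g. "private") are mapped to true.
function parseCacheControl( value )
{
  const directives = {};
  for ( const part of value.split( "," ) )
  {
    const [ key , val ] = part.trim().split( "=" );
    directives[ key.toLowerCase() ] = val !== undefined ? val : true;
  }
  return directives;
}

console.log( parseCacheControl( "private, max-age=10" ) );
// { private: true, "max-age": "10" }
```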

So, this header is telling the browser: Hey, I’m valid for 10 seconds; asking the server again within that window would return the same information, so when this resource is requested again before those 10 seconds have passed, re-use this response.

If we inspect the logs of our NodeJS server, we get:

GET /counter 16:33:57
GET /counter 16:33:58
GET /counter 16:33:58
GET /counter-cache-control 16:35:54

So, because we issued two extra calls to /counter-cache-control within those 10 seconds, the browser didn’t even bother to call the server again.
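
Conceptually, the browser’s decision boils down to comparing the response’s age against max-age. A simplified sketch of that freshness check (real browsers also account for other directives and the Age header):

```javascript
// Simplified freshness check: a cached response is fresh while its age
// (seconds since it was received) is below max-age.
function isFresh( receivedAtMs , maxAgeSeconds , nowMs )
{
  const ageSeconds = ( nowMs - receivedAtMs ) / 1000;
  return ageSeconds < maxAgeSeconds;
}

const receivedAt = Date.now();
console.log( isFresh( receivedAt , 10 , receivedAt + 5000 ) );  // Fresh: 5s old.
console.log( isFresh( receivedAt , 10 , receivedAt + 15000 ) ); // Stale: 15s old.
```

While isFresh returns true, the browser serves the cached response without contacting the server, which is exactly what we saw in the logs above.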

This is very useful when you want to reduce the latency and load on your server, but also consider that if a resource turns out to be stale or no longer valid, you’ll need to wait for that time to pass or ask your users to clear their cache. Another solution is changing the request route or your base URL/domain, so the browser will fetch that resource again.
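
That “change the URL” workaround is usually done by appending a version (or build hash) as a query parameter, so the browser treats it as a brand-new resource. A small sketch (the parameter name v and the version values are arbitrary, just for illustration):

```javascript
// Cache-busting sketch: appending a version query parameter makes the
// browser see a different URL, bypassing the previously cached entry.
function bustCache( url , version )
{
  const separator = url.includes( "?" ) ? "&" : "?";
  return `${url}${separator}v=${version}`;
}

console.log( bustCache( "/counter-cache-control" , "2" ) );
// "/counter-cache-control?v=2"
```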

Same example, but with NGINX/OpenResty:

I included the nginx.conf file at the root of the project, so you can move it to your NGINX/OpenResty /conf folder and test it.

You can find the instructions to download it, or even build it from source, at this link provided by OpenResty; granted, you’ll need to read a bit more, but it will be worth it.

Because nginx.conf is short (it’s just a simple demo), I’ll paste it here as well:

worker_processes auto;

events {
  worker_connections 1024;
}

http {
  include mime.types;
  default_type application/octet-stream;

  # Set up the target NodeJS server.
  upstream my_node_server {
    server localhost:4010;
    keepalive 512;
  }

  # Set up the proxy server.
  server {
    listen 4040;
    server_name localhost;

    location / {
      proxy_pass $scheme://my_node_server; # Redirect request to my_node_server.
      expires 5s; # Set/Override Cache-Control max-age.
      add_header Cache-Control "private"; # Set Cache-Control mode.
      proxy_hide_header X-Powered-By; # Clear header: X-Powered-By.
      add_header X-Powered-By "NGINX"; # Re-set header: X-Powered-By.
    }
  }
}
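
For completeness, if some routes should never be cached by the browser (login pages, user-specific APIs), a location block like the following could opt them out (the /api/ path here is hypothetical, just for illustration):

```nginx
# Hypothetical: disable browser caching for a specific path.
location /api/ {
  proxy_pass $scheme://my_node_server;
  add_header Cache-Control "no-store"; # Tell clients not to cache at all.
}
```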

Let’s start the OpenResty web server, and instead of using the NodeJS URL directly (localhost:4010 in my case), we will use the OpenResty one: localhost:4040 (configured in the nginx.conf file).

As you can see, we use NGINX to override the Cache-Control header on every response returned by our proxied server (NodeJS): the expires 5s directive translates into Cache-Control: max-age=5, and add_header appends a second Cache-Control: private header, which is why two Cache-Control lines show up below.

Responses in the browser console look like this if we repeat the same previous clicks to the buttons:

{from: "counter", counter: 5}
{from: "counter", counter: 5}
{from: "counter", counter: 5}
{from: "counter-cache-control", counter: 6}
{from: "counter-cache-control", counter: 6}
{from: "counter-cache-control", counter: 6}

So, now both routes (/counter & /counter-cache-control) have Cache-Control: max-age=5.

Let’s take a quick look at the headers of both responses:

For /counter:

Cache-Control: max-age=5
Cache-Control: private
Content-Length: 29
Content-Type: application/json; charset=utf-8
ETag: W/"1d-kvZ6xs94Djn/iMBMG1FsTNbXxis"
Server: openresty/1.19.3.2
X-Powered-By: NGINX

For /counter-cache-control:

Cache-control: max-age=5
Cache-Control: private
Content-Length: 43
Content-Type: application/json; charset=utf-8
ETag: W/"2b-eK7VHwqwEN3Z78JpH6MgIpdWDQ0"
Server: openresty/1.19.3.2
X-Powered-By: NGINX

In the next part of this tutorial, we’ll talk about how to implement another layer of caching, this time on NGINX/OpenResty itself, so even if the browser still makes a request, we can skip our server and let NGINX handle it by itself.

Thanks for reading, see you on the next one!