This won't be an exhaustive tutorial. Caching is necessary because it reduces loading times, bandwidth, disk and CPU usage, thus saving costs, but it is also very specific to each scenario and requires good knowledge and fine-tuning. Usually, caching pays off when it meets at least one of two goals:
1. We have a good cache-hit ratio: the data is frequently accessed in common patterns and does not change often.
2. Despite maybe having a bad cache-hit ratio, every hit we do get saves a good amount of resources.
It is possible to get both at once: data that is frequently accessed, rarely changes, and is computationally expensive to calculate.
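As a minimal sketch of that combined scenario in plain JavaScript (buildExpensiveReport is a hypothetical expensive function, and the TTL is arbitrary):
// Hypothetical in-memory cache: serve the stored result while it is fresh,
// recompute only after the TTL expires.
let cachedReport = null;
let cachedAt = 0;
const TTL_MS = 60000;
function getReport()
{
    // Cache hit: the stored result is still fresh, skip the expensive work.
    if ( cachedReport !== null && Date.now() - cachedAt < TTL_MS ) return cachedReport;
    // Cache miss: recompute and remember when we did it.
    cachedReport = buildExpensiveReport();
    cachedAt = Date.now();
    return cachedReport;
}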
But caching has its downsides as well, the classic one being stale data (more on that later).
That's why we have different caching techniques and several caching levels, and the more you know about them and about your business requirements, the better you can implement caching in your application.
So, even if I wanted to, I couldn't write a tutorial for every tool and every scenario; this is more of a mini-guide introducing some caching concepts than an in-depth tutorial.
In this post, I'd like to introduce the Cache-Control header, which can be present in HTTP responses.
Let's see how it can be useful, but also how it can be dangerous and produce undesired results.
When this header is present, you’re basically telling your browser:
Trust me, this resource won’t change in at least X time, so please, don’t bother me again for this.
Without further ado, I shall give a demonstration:
I installed OpenResty locally on my Windows machine because it is easier to manage, but you can easily install it on Linux/Mac or use a Docker image.
OpenResty is usually shipped with useful built-in modules and a nice Lua integration, so I’ll take it any day over plain NGINX.
The source code can be found on GitHub.
Note: From now on, I'll use OpenResty and NGINX interchangeably.
// Env is assumed here to be a tiny config helper exposing PORT (4010 in this demo).
const Express = require( "express" );
const Path = require( "path" );
const Moment = require( "moment" );
const Env = { PORT: process.env.PORT || 4010 };
const app = Express();
app.get( "/" , function( req , res )
{
res.sendFile( "index.html" , { root : Path.resolve() });
});
// Middleware to log requests that start with 'counter'.
app.all( "/counter*" , function( req , res , next )
{
const path = `${req.baseUrl}${req.path}`;
console.log( req.method , path , Moment().format( "HH:mm:ss" ) );
next();
});
let counter = 0;
// Without cache.
app.get( "/counter" , function( req , res )
{
res.json({ from : "sample" , counter : ++counter });
});
// With Cache-control.
app.get( "/counter-cache-control" , function( req , res )
{
const seconds = 10;
res.set( "Cache-control" , `private, max-age=${seconds}` );
res.json({ from : "sample-cache-control" , counter : ++counter });
});
// Create the server.
app.listen( Env.PORT , () =>
{
console.log( `Server running at: http://localhost:${Env.PORT}` );
});
Basically, that's the full code.
To get the point across, we need a browser that handles the Cache-Control header as expected, so I crafted a webpage that is served on the root path / when you start the server:
node index.js
Server running at: http://localhost:4010
In short, the page has two buttons, and each button calls a corresponding route:
// Click handler for: counter-btn.
$( "#counter-btn" ).click( async function()
{
const res = await fetch( "/counter" );
const data = await res.json();
console.log( data );
$( "#counter" ).html( data.counter );
});
// Click handler for: counter-cache-control-btn.
$( "#counter-cache-control-btn" ).click( async function()
{
const res = await fetch( "/counter-cache-control" );
const data = await res.json();
console.log( data );
$( "#counter-cache-control" ).html( data.counter );
});
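As an aside (not used in this demo), fetch accepts a cache option if you ever want a handler like the ones above to bypass the browser's HTTP cache and always hit the server:
// "no-store" makes fetch skip the HTTP cache entirely for this request.
const res = await fetch( "/counter-cache-control" , { cache : "no-store" } );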
Let's trigger the first button, counter-btn, 3 times; it will call the /counter route three times, so we'll have this (if you're following along and running the project, you can open the browser console, I'll paste the results here):
{from: "counter", counter: 1}
{from: "counter", counter: 2}
{from: "counter", counter: 3}
Now let's use the second one, counter-cache-control-btn, which calls the /counter-cache-control route, and click it 3 times as well:
{from: "counter-cache-control", counter: 4}
{from: "counter-cache-control", counter: 4}
{from: "counter-cache-control", counter: 4}
We got the same response 3 times.
If we look at the Network tab of the browser's inspection panel, the /counter request returned these headers:
Connection: keep-alive
Content-Length: 29
Content-Type: application/json; charset=utf-8
ETag: W/"1d-nlRHVEZmqGVmoOM63ZhCzoh/284"
Keep-Alive: timeout=5
X-Powered-By: Express
Note the ETag header: Express generates it automatically, and it enables conditional revalidation, but the browser still has to contact the server for that, so it doesn't save us the round trip. And if we now look at the /counter-cache-control request:
Cache-control: private, max-age=10
Connection: keep-alive
Content-Length: 43
Content-Type: application/json; charset=utf-8
ETag: W/"2b-rErROKrobKVM07UdPuw99nu6d0k"
Keep-Alive: timeout=5
X-Powered-By: Express
And now we got our header: Cache-control: private, max-age=10!
So, this header is telling the browser: Hey, I'm valid for 10 seconds; asking the server again within that window would just get you the same information, so if I'm requested again before those 10 seconds have passed, re-use this response. The private directive means only the end user's browser may cache it, not shared caches such as proxies or CDNs.
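For reference, a few other standard Cache-Control values (none of them are used in this demo) can be set the same way:
// Never store this response anywhere.
res.set( "Cache-control" , "no-store" );
// Store it, but revalidate with the server before every re-use.
res.set( "Cache-control" , "no-cache" );
// Allow shared caches (proxies, CDNs) to store it for 5 minutes.
res.set( "Cache-control" , "public, max-age=300" );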
If we inspect the logs of our NodeJS server, we get:
GET /counter 16:33:57
GET /counter 16:33:58
GET /counter 16:33:58
GET /counter-cache-control 16:35:54
So basically, because we issued the two extra calls to /counter-cache-control within those 10 seconds, the browser didn't even bother to call the server again.
This is very useful when you want to reduce the latency and load on your server, but also consider that if your resource turns stale or invalid, you'll need to wait for that time to pass or ask your users to clear their cache. Another solution is changing the request route or your base URL/domain, so the browser will fetch that resource again.
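A sketch of that last idea (ASSET_VERSION is a made-up name): since the browser caches per URL, bumping a version in the query string makes it treat the resource as brand new:
// Hypothetical cache busting: a new URL means a new cache entry,
// so bumping ASSET_VERSION on deploy forces a fresh request.
const ASSET_VERSION = 2;
const res = await fetch( `/counter-cache-control?v=${ASSET_VERSION}` );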
I included the nginx.conf file at the root of the project, so you can move it to your NGINX/OpenResty /conf folder and test it.
You can find the instructions to download it, or even build it from source, at this link provided by OpenResty; granted, you'll need to read a bit more, but it will be worth it.
Since the nginx.conf is short (it's just for a simple demo), I'll paste it here as well:
worker_processes auto;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;

    # Set up the target NodeJS server.
    upstream my_node_server {
        server localhost:4010;
        keepalive 512;
    }

    # Set up the proxy server.
    server {
        listen 4040;
        server_name localhost;

        location / {
            proxy_pass $scheme://my_node_server;  # Forward the request to my_node_server.
            expires 5s;                           # Set/override the Cache-Control max-age.
            add_header Cache-Control "private";   # Set the Cache-Control mode.
            proxy_hide_header X-Powered-By;       # Clear header: X-Powered-By.
            add_header X-Powered-By "NGINX";      # Re-set header: X-Powered-By.
        }
    }
}
Let's start the OpenResty web server, and instead of using the NodeJS URL directly (localhost:4010 in my case), we'll use the OpenResty one: localhost:4040 (configured in the nginx.conf file).
As you can see, we use NGINX to override the Cache-Control header on every response returned by our proxied server (NodeJS).
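Before going back to the browser, a quick way to verify the override from the command line (assuming both servers are running):
curl -i http://localhost:4010/counter
curl -i http://localhost:4040/counter
The first response (straight from NodeJS) carries no Cache-Control header, while the second (through OpenResty) does.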
If we repeat the same clicks on both buttons, the responses in the browser console look like this:
{from: "counter", counter: 5}
{from: "counter", counter: 5}
{from: "counter", counter: 5}
{from: "counter-cache-control", counter: 6}
{from: "counter-cache-control", counter: 6}
{from: "counter-cache-control", counter: 6}
So, now both routes (/counter and /counter-cache-control) have Cache-Control: max-age=5. Note that each response actually carries two Cache-Control headers, one produced by expires and one by add_header; clients combine them, so the effective value is private, max-age=5.
Let's take a quick look at the headers of both responses:
For /counter:
Cache-Control: max-age=5
Cache-Control: private
Content-Length: 29
Content-Type: application/json; charset=utf-8
ETag: W/"1d-kvZ6xs94Djn/iMBMG1FsTNbXxis"
Server: openresty/1.19.3.2
X-Powered-By: NGINX
For /counter-cache-control:
Cache-Control: max-age=5
Cache-Control: private
Content-Length: 43
Content-Type: application/json; charset=utf-8
ETag: W/"2b-eK7VHwqwEN3Z78JpH6MgIpdWDQ0"
Server: openresty/1.19.3.2
X-Powered-By: NGINX
In the next part of this tutorial, we'll talk about how to implement another caching layer, this time on NGINX/OpenResty itself, so even if the browser still makes a request, we can skip our NodeJS server and let NGINX handle it by itself.
Thanks for reading, see you on the next one!