In the previous part of this demonstration, we used C# .NET Core to create a log classifier. In this continuation, we’ll implement the same thing, but in our dear NodeJS.
I’ll start by copy-pasting the task from my previous entry (which you should read if you haven’t).
The task consists of classifying an Android log file that contains lines like these:
12-17 19:31:36.263 1795 1825 I PowerManager_screenOn: DisplayPowerStatesetColorFadeLevel: level=1.0
12-17 19:31:36.263 5224 5283 I SendBroadcastPermission: action:android.com.huawei.bone.NOTIFY_SPORT_DATA, mPermissionType:0
12-17 19:31:36.264 1795 1825 D DisplayPowerController: Animating brightness: target=21, rate=40
12-17 19:31:36.264 1795 2750 I PowerManager_screenOn: DisplayPowerState Updating screen state: state=ON, backlight=823
12-17 19:31:36.264 1795 2750 I HwLightsService: back light level before map = 823
12-17 19:31:36.264 1795 1825 D DisplayPowerController: Animating brightness: target=21, rate=40
12-17 19:31:36.264 1795 1825 V KeyguardServiceDelegate: onScreenTurnedOn()
12-17 19:31:36.264 1795 1825 I WindowManger_keyguard: onScreenTurnedOn()
12-17 19:31:36.264 1795 1825 D DisplayPowerController: Display ready!
12-17 19:31:36.264 1795 1825 D DisplayPowerController: Finished business...
12-17 19:31:36.264 2852 3328 D KeyguardService: Caller checkPermission fail
12-17 19:31:36.264 2852 3328 D KeyguardService: KGSvcCall onScreenTurnedOn.
12-17 19:31:36.264 2852 3328 D KeyguardViewMediator: notifyScreenTurnedOn
12-17 19:31:36.265 2852 2852 D KeyguardViewMediator: handleNotifyScreenTurnedOn
12-17 19:31:36.265 2852 2852 I PhoneStatusBar: onScreenTurnedOn
12-17 19:31:36.265 2852 2852 D KGWallpaper_Magazine: getNextIndex: 0; from 5 to 5; size: 44
12-17 19:31:36.265 2852 2852 I HwLockScreenReporter: report msg is :{picture: Deepwater-05-2.3.001-bigpicture_05_8.jpg}
12-17 19:31:36.265 2852 2852 W HwLockScreenReporter: report result = falsereport type:162 msg:{picture: Deepwater-05-2.3.001-bigpicture_05_8.jpg, channelId: 05}
Our goal is to read the log file and classify every log line by its category/type, which is defined as the single capital letter located before the process/activity name.
1825 I WindowManger_keyguard...
→ Category is I

2852 D KeyguardViewMediator: handleNotifyScreenTurnedOn...
→ Category is D

Every log type must be written to a separate log file, which must preserve the initial order.
All possible categories/types are: I, D, V, W, E, F
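To make the rule concrete, here is a minimal sketch of extracting the category letter from a single line (the sample line is taken from the log above; the regular expression is the same one used later in this post):
// Minimal sketch: extract the category letter from one log line.
const sampleLine = "12-17 19:31:36.264 1795 1825 V KeyguardServiceDelegate: onScreenTurnedOn()";

// Skip the run of digits, dashes, dots and spaces that precedes the category,
// then capture the single capital letter.
const match = sampleLine.match( /[\d\-\.\s]+([A-Z])/ );

console.log( match[ 1 ] ); // → "V"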
Let’s copy-paste (again) from my previous post.
BEGIN COPY PASTE
With NodeJS, one could leverage several cores, either with Cluster, which spawns multiple processes that communicate via IPC, or with Workers, which implement threads and can share the same memory.
But NodeJS is not the best environment for this kind of operation. Beyond asynchronous I/O (processing requests, connecting to databases, serving files, doing minimal data transformation, physical disk writes/reads that are limited by the hardware rather than the language), it is simply not the right tool. I mean, if you need it, you can use it without having to change either your language or your working environment.
END COPY PASTE
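Just as a reference for what that quote means (none of this is used in the final code), a minimal worker_threads sketch, where the main thread sends a line to a worker and gets the category back, could look like this:
// Minimal worker_threads sketch, for reference only.
const { Worker, isMainThread, parentPort } = require( "worker_threads" );

if( isMainThread )
{
    // Main thread: spawn a worker from this same file and print what it sends back.
    const worker = new Worker( __filename );
    worker.on( "message" , ( category ) => console.log( "Worker classified the line as:" , category ) );
    worker.postMessage( "12-17 19:31:36.264 1795 1825 V KeyguardServiceDelegate: onScreenTurnedOn()" );
}
else
{
    // Worker thread: classify the received line and send the category back.
    parentPort.once( "message" , ( line ) => parentPort.postMessage( line.match( /[\d\-\.\s]+([A-Z])/ )[ 1 ] ) );
}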
In order to come close to the C# performance, we’ll need to use Workers. But that means quite a bit of extra setup (or we could make our development easier by pulling in yet another JS library):
- As we will probably want to share the array between threads, we would need to store that string data in a SharedArrayBuffer.
- And if we ever need to move data between threads without cloning it, we’ll need to use a Transferable object.

Those things make development take more time and also make it more error prone.
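For illustration only (again, not used in the final code, and with hypothetical buffer sizes and worker file name), a sketch of those two concepts could look like this:
// Illustrative sketch of the two concepts above (not used in this post).
const { Worker } = require( "worker_threads" );

// SharedArrayBuffer: the same memory becomes visible from every thread it is posted to.
const shared = new SharedArrayBuffer( 1024 );

// Transferable: a regular ArrayBuffer can be moved (not cloned) to another thread.
const movable = new ArrayBuffer( 1024 );

const worker = new Worker( "./some-worker.js" ); // hypothetical worker file
worker.postMessage( { shared , movable } , [ movable ] ); // second argument = transfer list

// After postMessage, <movable> is detached in this thread,
// while <shared> stays usable from both threads at the same time.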
In order to be fair and match the simplicity that C# offers, we’ll forget about all of that and simply implement two versions.
Without further ctrl+c & ctrl+v, let’s begin:
My computer setup is the following:
As always, the source code of this post is available on GitHub.
// Node core modules used by both implementations below.
// LogPath, LogTypes and createOutput are helpers defined in the repository linked above.
const fs = require( "fs" );
const os = require( "os" );
const events = require( "events" );
const readline = require( "readline" );

const classify_with_filestreams = async function()
{
const reader = readline.createInterface({
input : fs.createReadStream( LogPath ) ,
output : null ,
});
const outputStreams = createOutput.streams( LogTypes );
// Handle each new <line> as it becomes available, in an asynchronous way.
for await( const line of reader )
{
const matchesIterator = line.matchAll( /[\d\-\.\s]+([A-Z])/g );
const matches = matchesIterator.next().value;
const identifier = matches[1];
const writeable = outputStreams[ identifier ];
if( !writeable.write( line ) ) // If the stream buffer is full, wait until is drained.
await events.once( writeable , "drain" ); // Wait for the drain event.
writeable.write( os.EOL ); // Append a new line.
};
// Close all write streams.
for( const [ key , value ] of Object.entries( outputStreams ) )
{
value.close();
}
reader.close();
};
This is the correct implementation if you don’t want to use workers (threads); later you’ll see why.
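The createOutput.streams helper is not shown in this post (it lives in the repository); a plausible sketch, assuming it simply maps every log type to its own write stream in a hypothetical ./output directory, could be:
// Hypothetical sketch of createOutput.streams; the real helper is in the repository.
const fs = require( "fs" );

const createOutput =
{
    streams : function( logTypes )
    {
        const streams = {};
        for( const type of logTypes )
            streams[ type ] = fs.createWriteStream( `./output/${ type }.log` );
        return streams;
    } ,
};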
const classify_with_files = async function()
{
const reader = readline.createInterface({
input : fs.createReadStream( LogPath ) ,
output : null ,
terminal : false ,
});
const writePromises = {};
const writeTask = [];
// Handle each new <line> as it becomes available, in an asynchronous way.
for await( const line of reader )
{
const matchesIterator = line.matchAll( /[\d\-\.\s]+([A-Z])/g );
const matches = matchesIterator.next().value;
const identifier = matches[1];
writeTask.push({ type : identifier , line : line + os.EOL });
};
const outputFiles = await createOutput.files( LogTypes );
// Write each line to its target file, waiting for the previous write of the same type to finish.
for await( const task of writeTask )
{
await writePromises[ task.type ];
writePromises[ task.type ] = outputFiles[ task.type ].write( task.line );
}
// Close all file handles.
for( const [ key , value ] of Object.entries( outputFiles ) )
{
value.close();
}
reader.close();
};
This would be a good implementation if we needed to await write() on longer strings written fewer times. But in this scenario we’re writing small strings many times and waiting for each one. That wait is needed for obvious reasons, and the official NodeJS documentation also states it:
It is unsafe to use fs.write() multiple times on the same file without waiting for the callback. For this scenario, fs.createWriteStream() is recommended.
Quoted from here.
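Similarly, createOutput.files is not shown; a plausible sketch, assuming it opens one promise-based FileHandle per log type (which is why write() returns a promise in the second implementation), could be:
// Hypothetical sketch of createOutput.files; the real helper is in the repository.
const fs = require( "fs" );

const createOutput =
{
    files : async function( logTypes )
    {
        const files = {};
        for( const type of logTypes )
            files[ type ] = await fs.promises.open( `./output/${ type }.log` , "w" ); // hypothetical path
        return files;
    } ,
};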
| Method | Time | Memory |
| --- | --- | --- |
| File streams | 3730 ms | 73.07 MB |
| Files (without streams) | 36406 ms | 1028.51 MB |
I usually run the code more than once, but this time it’s really not needed: the 2nd implementation was clearly much slower.
In our second scenario, this code:
for await( const task of writeTask )
{
await writePromises[ task.type ];
writePromises[ task.type ] = outputFiles[ task.type ].write( task.line );
}
is totally bottlenecking our application. An optimization would be concatenating lines into larger chunks, but it would be even better to simply use the 1st implementation.
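As a rough idea of that batching optimization (an untested sketch, reusing the writeTask and outputFiles variables from the second implementation), the lines could be grouped per type before writing:
// Untested sketch: build one big string per log type,
// then issue a single write per file instead of thousands of small, awaited ones.
const batches = {};
for( const task of writeTask )
    batches[ task.type ] = ( batches[ task.type ] || "" ) + task.line;

await Promise.all(
    Object.entries( batches ).map( ( [ type , chunk ] ) => outputFiles[ type ].write( chunk ) )
);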
Even though I love NodeJS, and I could pull some hacks and whatnot to make it perform on par with C#, it wouldn’t be as clean and easy as what C# provides. Also, this is a very small example; as I said in my previous post, in a real scenario we’ll probably do more data transformations than a single regular expression. And your program should be about getting those transformations right rather than just making them faster. Yes, we need them to be fast, so we choose the right tool that makes it both easier and faster.