Reactor paradigm for multi-thread enabling your application

If you are working on a highly stateful system and have restricted parallel access to your application by any single user, howdy! We are in the same boat.

Concurrency is tough: not so much to implement as to support, to explain to your users, and to debug when things go wrong. Especially if you are carrying a huge baggage of statefulness, multi-threading your application is like holidaying on Cayo Santiago 🙂

Preface

Generally, any web application is multi-threaded unless you somehow restrict it to one thread per user. How does a web application end up single-threaded? Probably through some session-scoped locking mechanism. We did that. We had locks at session scope, each user request carried an identifier to indicate its context, and we multiplexed the context threads onto the HTTP threads. The HTTP thread processing a request acquires a lock over the session, so that the user cannot perform parallel operations on session-scoped data and leave transactions in an inconsistent state.

Moving to the cloud with such a design will cost you heavily, because your resources are underutilized. So we had the challenge of allowing parallel processing in the application. No single approach could solve our problem, but here I am going to discuss one plausible solution.
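To make the session-scoped locking concrete, here is a minimal sketch in Python. It is not our actual implementation: `Session`, `get_session` and `handle_request` are hypothetical names, and `time.sleep` stands in for the real request processing. Every request for the same session is serialized behind one lock.

```python
import threading
import time


class Session:
    """Hypothetical session object: one lock guarding all session-scoped data."""

    def __init__(self):
        self.lock = threading.Lock()
        self.data = {}


_sessions = {}
_sessions_guard = threading.Lock()  # protects the registry itself


def get_session(session_id):
    """Return the session for this id, creating it on first use."""
    with _sessions_guard:
        if session_id not in _sessions:
            _sessions[session_id] = Session()
        return _sessions[session_id]


def handle_request(session_id, context_id):
    """Process one request while holding the session lock for its whole duration."""
    session = get_session(session_id)
    with session.lock:
        count = session.data.get("requests", 0)   # read session state
        time.sleep(0.01)                           # stand-in for the real work
        session.data["requests"] = count + 1       # write back; no other request ran in between
        session.data["last_context"] = context_id


def run_parallel_requests(session_id, n):
    """Fire n requests for one session in parallel and return the request count."""
    threads = [
        threading.Thread(target=handle_request, args=(session_id, i))
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return get_session(session_id).data["requests"]
```

The count stays consistent because the read, the work and the write all happen under the lock. The price is that the requests run strictly one after another, even while they are only sleeping.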

Learning from Event-driven programming models

In event-driven programming, the execution flow is determined by events rather than by a fixed procedural sequence. Consider the Node.js snippet below, which reads a file.

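const fs = require('fs');

// Start the read and return immediately; the callback runs once the read completes.
fs.readFile('input.txt', function (err, data) {
   if (err) {
      // The read failed: report the error and stop.
      return console.error(err);
   }
   console.log("Asynchronous read: " + data.toString());
});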

You call a function to read the file and pass in a callback function, which the runtime invokes once the file operation completes.

These runtimes claim to handle thousands of concurrent requests, unlike thread-per-request platforms such as Java. That's because the server is highly utilized here: when the current task is waiting on an operation that does not use the CPU, such as I/O, the runtime sets it aside and switches to another task that is ready to run.

There is another advantage to this model. While a task is waiting on an operation like I/O, it cannot perform any operation on shared data, so it is essentially safe to let another task execute. This advantage was very attractive, and we attempted to enable multi-threading with this paradigm.
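The switching described above can be sketched with Python's `asyncio`, used here only as a stand-in for the Node.js event loop. The names `request_a`, `request_b` and `io_done` are made up for the illustration, and the event simulates the completion of a file read.

```python
import asyncio


async def main():
    log = []
    io_done = asyncio.Event()  # stands in for "the file read has completed"

    async def request_a():
        log.append("a: start I/O")
        await io_done.wait()   # not using the CPU, so the loop runs something else
        log.append("a: I/O finished")

    async def request_b():
        log.append("b: runs while a waits")
        io_done.set()          # simulate the I/O completing

    # Both requests share one thread; the loop switches at each await.
    await asyncio.gather(request_a(), request_b())
    return log


def run_demo():
    return asyncio.run(main())
```

Request `b` gets the CPU only because `a` has nothing to do until its I/O completes. No second thread is involved.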

Reactor Paradigm

We had to identify execution flows which are context safe, like I/O operations. The system has to act and react to these operations by releasing session-scoped locks and allowing other threads waiting for execution to proceed. Suppose the current thread is performing a file operation, as in our example above: the system has to release the session-scoped lock, perform the operation outside the lock context, and once the operation is complete the thread has to wait for the session lock again. Essentially the application is still single-threaded but intelligently sequences thread execution, much like thread scheduling on a single-core machine. This can make your application pseudo multi-threaded, increasing its throughput, server utilization and usability.

The challenge with this approach is in dealing with shared data such as session-scoped data. As you release and re-acquire the locks, the shared data might have changed its state, leading to a dirty read or a dirty write. This is a common problem with concurrency and there are many ways to deal with it: optimistic locking, pessimistic locking, journaling, and so on.
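The release-and-re-acquire step can be sketched as follows. This is an illustration under assumptions, not our production code: `outside_session_lock`, `handle_request` and `run_two_requests` are hypothetical names, and the barrier exists only to show that two requests of the same session really are inside their I/O phase at the same time.

```python
import threading
from contextlib import contextmanager


@contextmanager
def outside_session_lock(lock):
    """Release a lock the caller already holds, then re-acquire it afterwards."""
    lock.release()
    try:
        yield
    finally:
        lock.acquire()  # queue up for the session again before touching its data


def handle_request(lock, session_data, name, slow_io):
    with lock:
        session_data["started"].append(name)      # shared state: lock held
        with outside_session_lock(lock):
            result = slow_io()                    # context-safe I/O: lock released
        session_data["results"][name] = result    # shared state: lock held again


def run_two_requests():
    lock = threading.Lock()
    session_data = {"started": [], "results": {}}
    both_in_io = threading.Barrier(2)

    def slow_io():
        # Returns only if both requests are in their I/O phase together,
        # which could not happen if the session lock were held during I/O.
        both_in_io.wait(timeout=5)
        return "done"

    threads = [
        threading.Thread(target=handle_request, args=(lock, session_data, name, slow_io))
        for name in ("a", "b")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return session_data
```

Only one thread at a time ever touches `session_data`, so the application is still single-threaded with respect to the session. The waiting is what overlaps.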
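One simple way to notice that shared data changed while the lock was released is a version counter, sketched below. The names `VersionedSession`, `add_to_total` and `fetch_amount` are hypothetical. The sketch assumes every write goes through `write` while holding the lock.

```python
import threading


class VersionedSession:
    """Session data with a version counter that is bumped on every write."""

    def __init__(self):
        self.lock = threading.Lock()
        self.version = 0
        self.data = {}

    def write(self, key, value):
        """Caller must hold self.lock."""
        self.data[key] = value
        self.version += 1


def add_to_total(session, fetch_amount):
    """Add a fetched amount to data['total'], retrying if the session changed meanwhile."""
    with session.lock:
        while True:
            seen_version = session.version
            base = session.data.get("total", 0)
            session.lock.release()           # leave the lock for the I/O
            try:
                amount = fetch_amount()
            finally:
                session.lock.acquire()       # back under the lock
            if session.version == seen_version:
                session.write("total", base + amount)
                return session.data["total"]
            # Another request wrote in the meantime: `base` is stale, so re-read.
```

Without the version check, the value read before the I/O would silently overwrite whatever another request wrote in the meantime.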

Locking is the most convenient way to handle the problem. We could acquire a lock on each shared data object as a thread accesses it, but this kind of fine-grained locking can lead to deadlocks when two threads acquire the same locks in a different order. The other way is to be pessimistic and acquire a lock on the shared data as a whole: a shared data lock. With a single lock there is no lock ordering to get wrong, which is why coarse, pessimistic locking is often the recommended way to avoid deadlocks in enterprise systems.

We could maintain a record of accesses and version the writes; this is called journaling. It is similar to how version control systems detect conflicting changes. Journaling can be done in many ways, and I prefer to keep the subject out of the scope of this document.
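A minimal sketch of the shared data lock, with hypothetical names (`SharedData`, `cart`, `profile`): two operations touch the same two objects in opposite order. With one lock per object that ordering is the classic recipe for a deadlock. With a single lock over everything it is harmless.

```python
import threading


class SharedData:
    """All session-scoped objects guarded by one lock (hypothetical layout)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.cart = {"items": 0}
        self.profile = {"updates": 0}


def update_cart_then_profile(shared):
    with shared.lock:                 # one lock covers both objects
        shared.cart["items"] += 1
        shared.profile["updates"] += 1


def update_profile_then_cart(shared):
    with shared.lock:                 # opposite access order, same single lock
        shared.profile["updates"] += 1
        shared.cart["items"] += 1


def run(shared, rounds=1000):
    """Run both operations concurrently and return the final counters."""
    def repeat(operation):
        for _ in range(rounds):
            operation(shared)

    threads = [
        threading.Thread(target=repeat, args=(operation,))
        for operation in (update_cart_then_profile, update_profile_then_cart)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.cart["items"], shared.profile["updates"]
```

The trade-off is parallelism: two operations that touch unrelated objects still wait for each other.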
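Just to give a flavour of the idea, here is one of the many possible shapes a journal could take. It is a sketch under assumptions, not a recommendation. All names (`JournaledStore`, `Journal`, `ConflictError`) are made up, the caller is assumed to hold the session lock during `commit`, and writes to keys that were never read are not checked.

```python
class ConflictError(Exception):
    """Raised when a key read by this unit of work was changed by someone else."""


class JournaledStore:
    """Shared data where every key carries a version number."""

    def __init__(self):
        self.values = {}
        self.versions = {}

    def begin(self):
        return Journal(self)


class Journal:
    """Records the versions a unit of work read and the writes it wants to make."""

    def __init__(self, store):
        self.store = store
        self.reads = {}    # key -> version seen at first read
        self.writes = {}   # key -> pending value, applied only on commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.reads.setdefault(key, self.store.versions.get(key, 0))
        return self.store.values.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validate first: everything we read must still be at the version we saw.
        for key, seen in self.reads.items():
            if self.store.versions.get(key, 0) != seen:
                raise ConflictError(key)
        # Then apply the buffered writes and bump their versions.
        for key, value in self.writes.items():
            self.store.values[key] = value
            self.store.versions[key] = self.store.versions.get(key, 0) + 1
```

If two units of work both read `total` and the first one commits, the second one's `commit` raises `ConflictError` instead of overwriting the first result. What to do next (retry, merge, report to the user) is a separate decision.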

Conclusion

The reactor paradigm gives us a promising strategy for converting single-threaded applications to multi-threaded ones, but not without the overhead of locking. The reason behind the single-threaded nature of the application, shared data, will keep haunting us in the form of the locking strategy needed to handle it.