Supporting Named capturing groups before Java 7

Java 7 introduced named capturing groups in regular expressions, which is cool. But I had a requirement to support named capturing groups on earlier versions of Java too. The requirement came from my own mobile application, where I had to read a regular expression from the user and give the user a way to bind the matching parts of the data to specified variables. Named capturing groups were the perfect solution because they serve both purposes, but I didn't want to restrict my application to a specific version of Android.

So I added support for named capturing groups with a wrapper, and the wrapper turned out to be very straightforward. Here it goes:

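import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Parser {
    // Matches the "(?<name>" opener of a named group. Requiring a leading
    // letter skips lookbehinds such as (?<= and (?<!.
    private static final Pattern GROUP_NAME =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Group names in declaration order; parse() fills in the values.
    // Explicit type arguments: the diamond operator itself needs Java 7.
    private final Map<String, String> parseResult = new LinkedHashMap<String, String>();
    private final Pattern regex;

    public Parser(String pattern) {
        // Compile once, using the expression with the group names removed.
        this.regex = Pattern.compile(reformatRegex(pattern));
    }

    public Map<String, String> parse(String message) {
        Matcher matcher = regex.matcher(message);
        boolean matched = matcher.matches();
        int groupIndex = 1;
        // Names are bound by position; values are reset to null when there is no match.
        for (Map.Entry<String, String> entry : parseResult.entrySet()) {
            boolean hasGroup = matched && groupIndex <= matcher.groupCount();
            entry.setValue(hasGroup ? matcher.group(groupIndex++) : null);
        }

        return parseResult;
    }

    // Records each group name and rewrites "(?<name>" to a plain "(".
    private String reformatRegex(String pattern) {
        Matcher matcher = GROUP_NAME.matcher(pattern);
        StringBuffer newRegex = new StringBuffer();
        while (matcher.find()) {
            parseResult.put(matcher.group(1), null);
            matcher.appendReplacement(newRegex, "(");
        }
        matcher.appendTail(newRegex);

        return newRegex.toString();
    }
}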

The important part of the code is the reformatRegex() method. The Parser class pre-processes the expression by converting named capturing groups to numbered groups, and builds a map of group names that parse() later fills with the matched values. Because values are assigned by position, the wrapper assumes every capturing group in the expression is named.
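To make the conversion concrete, here is a minimal standalone sketch of the same idea. The class name NamedGroupDemo, the helper names stripNames and parse, and the sample e-mail pattern are my own illustrative choices and are not part of the wrapper above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {
    // Matches "(?<name>"; the leading-letter rule skips lookbehinds like (?<= and (?<!.
    private static final Pattern GROUP_NAME =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Rewrites every "(?<name>" to "(" and collects the names in order.
    public static String stripNames(String pattern, List<String> names) {
        Matcher m = GROUP_NAME.matcher(pattern);
        StringBuffer plain = new StringBuffer();
        while (m.find()) {
            names.add(m.group(1));
            m.appendReplacement(plain, "(");
        }
        m.appendTail(plain);
        return plain.toString();
    }

    // Matches the whole message and binds group i+1 to the i-th name.
    // Assumes every capturing group in the pattern is named.
    public static Map<String, String> parse(String pattern, String message) {
        List<String> names = new ArrayList<String>();
        Matcher m = Pattern.compile(stripNames(pattern, names)).matcher(message);
        Map<String, String> result = new LinkedHashMap<String, String>();
        if (m.matches()) {
            for (int i = 0; i < names.size() && i < m.groupCount(); i++) {
                result.put(names.get(i), m.group(i + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String pattern = "(?<user>\\w+)@(?<host>[\\w.]+)";
        // prints (\w+)@([\w.]+)
        System.out.println(stripNames(pattern, new ArrayList<String>()));
        // prints {user=alice, host=example.com}
        System.out.println(parse(pattern, "alice@example.com"));
    }
}
```

The sketch deliberately avoids the diamond operator and other Java 7 syntax, since the whole point is to run on older runtimes.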


Reactor paradigm for multi-thread enabling your application

If you are working on a highly stateful system and have restricted parallel access to your application by any single user, howdy! We are in the same boat.

Concurrency is tough: not so much to implement as to support, to explain to your users, and to debug when things go wrong. Especially if you are carrying a huge baggage of statefulness, multi-threading your application is like holidaying on Cayo Santiago 🙂

Preface

Generally any web application is multi-threaded unless you somehow restrict it to one thread per user. How does a web application end up single-threaded? Probably through some session-scoped locking mechanism, maybe? We did that. We had locks at session scope, each user request carried an identifier to indicate its context, and we multiplexed the context threads onto the HTTP threads. The processing HTTP thread acquires a lock on the session so that the user can't perform parallel operations on session-scoped data and cause inconsistent transactions.

Moving to the cloud with such a design will cost you heavily because your resources are underutilized. So we had the challenge of allowing parallel processing in the application. No single approach could solve our problem, but here I am going to discuss one plausible solution.

Learning from Event-driven programming models

In event-driven programming the execution flow is determined by events rather than by a fixed procedural sequence. Consider the code snippet below, which reads a file in Node.js.

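var fs = require('fs');

// Start the read and return immediately; the callback runs when the data is ready.
fs.readFile('input.txt', function (err, data) {
   if (err) {
      return console.error(err);
   }
   console.log("Asynchronous read: " + data.toString());
});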

You call a function to read the file and pass in a callback function that runs once the runtime has finished the file operation.

Runtimes like this claim to handle thousands of concurrent requests, unlike the thread-per-request model common in Java applications. That's because the server is highly utilized here: when the current task is performing an operation that doesn't use the CPU, the runtime pushes it to the background and switches to another task that is waiting to execute.

There is another advantage to this model. While a thread is blocked on an operation like I/O, it cannot touch shared data, so it is essentially safe to let another thread execute. This advantage was very attractive, and we attempted to enable multi-threading with this paradigm.

Reactor Paradigm

We had to identify execution flows that are context safe, such as I/O operations. The system has to react to these operations by releasing the session-scoped lock and allowing other threads waiting for execution to proceed. Suppose the current thread is performing a file operation, as in our example above: the system releases the session-scoped lock, performs the operation outside the lock, and once the operation is complete the thread waits to re-acquire the session lock. Essentially the application is still single-threaded but sequences thread execution intelligently, much like thread scheduling on a single-core machine. This makes your application pseudo multi-threaded, increasing its throughput, server utilization and usability.
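The release-and-re-acquire step can be sketched roughly as follows. This is only an illustration under my own assumptions: the class SessionLockDemo and the methods runOutsideLock and handleRequest are hypothetical names, and a java.util.concurrent ReentrantLock stands in for whatever session-scoped lock your application uses.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.locks.ReentrantLock;

public class SessionLockDemo {
    // Stand-in for the session-scoped lock (one per user session).
    private final ReentrantLock sessionLock = new ReentrantLock();

    public boolean isSessionLockHeld() {
        return sessionLock.isHeldByCurrentThread();
    }

    // Runs a context-safe operation (e.g. file or network I/O) outside the
    // session lock, then waits to re-acquire the lock before returning.
    public <T> T runOutsideLock(Callable<T> contextSafeOperation) throws Exception {
        int holds = sessionLock.getHoldCount();
        if (holds == 0) {
            throw new IllegalStateException("session lock not held");
        }
        // Fully release a reentrant lock so that waiting threads can proceed.
        for (int i = 0; i < holds; i++) {
            sessionLock.unlock();
        }
        try {
            return contextSafeOperation.call();
        } finally {
            // Restore the original hold count.
            for (int i = 0; i < holds; i++) {
                sessionLock.lock();
            }
        }
    }

    public String handleRequest(Callable<String> ioOperation) throws Exception {
        sessionLock.lock();
        try {
            // ... work on session-scoped data under the lock ...
            String data = runOutsideLock(ioOperation);
            // Session data may have changed while the lock was released.
            return data;
        } finally {
            sessionLock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        final SessionLockDemo session = new SessionLockDemo();
        String result = session.handleRequest(
                () -> session.isSessionLockHeld() ? "lock held during I/O" : "lock released during I/O");
        System.out.println(result); // prints "lock released during I/O"
    }
}
```

Anything read from the session before runOutsideLock must be treated as stale afterwards, because other threads may have run in between.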

The challenge with this approach is in dealing with shared data such as session-scoped data. Between releasing and re-acquiring the lock, the shared data might have changed state, leading to a dirty read or a dirty write. This is a common problem with concurrency and there are many ways to deal with it: optimistic locking, pessimistic locking, journaling and so on.
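One way to at least detect that shared data changed while the lock was released is to stamp it with a version number. The sketch below is a hypothetical illustration (the class VersionedSessionData and its methods are invented for this example, not taken from our system): a write succeeds only if nobody else has written since the version the caller last saw.

```java
public class VersionedSessionData {
    private long version = 0;
    private String value;

    public synchronized long version() {
        return version;
    }

    public synchronized String read() {
        return value;
    }

    // Unconditional write; bumps the version so stale writers can be detected.
    public synchronized void write(String newValue) {
        value = newValue;
        version++;
    }

    // Writes only if the data is unchanged since expectedVersion.
    // Returns false when another write happened in between.
    public synchronized boolean writeIfUnchanged(long expectedVersion, String newValue) {
        if (version != expectedVersion) {
            return false;
        }
        value = newValue;
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedSessionData data = new VersionedSessionData();
        data.write("initial");
        long seen = data.version();   // remember the version before releasing the lock
        data.write("changed by another request"); // simulates a write made in the meantime
        boolean applied = data.writeIfUnchanged(seen, "my update");
        System.out.println(applied);     // prints false
        System.out.println(data.read()); // prints "changed by another request"
    }
}
```

When the check fails, the caller has to re-read the data and redo its work, which is the price of not holding the lock across the I/O.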

Locking is the most convenient way to handle the problem. We could acquire a lock on each shared data object as a thread accesses it, but this kind of fine-grained locking can lead to deadlocks when two threads take the same locks in a different order. The other way is to be pessimistic and acquire a single lock on the shared data as a whole: a shared data lock. In enterprise systems this coarse, pessimistic locking is often the recommended choice because it avoids deadlocks.

We could also maintain a record of accesses and version the writes; this is called journaling. It is similar to how version control systems handle conflicting changes. Journaling can be done in many ways and I prefer to keep the subject out of the scope of this document.

Conclusion

The reactor paradigm gives us a promising strategy for converting single-threaded applications to multi-threaded ones, but not without the overhead of locking. Shared data, the very reason the application was single-threaded in the first place, keeps haunting us in the form of the locking strategy needed to protect it.

Amidst the knowledge