Wednesday, June 16, 2010

Group pattern matching with regular expressions in Java and Scala

Group capture in regular expressions has been around for some time, but I have only recently used it, and I think it is quite nice and worth blogging about.

The use case is pretty straightforward: you have a string of data and you want to extract values from it based on a pattern. One example is a date such as “16-Jun-2010”, from which you want the day, month and year. Another is an email address, from which you want the username and domain. I will show an example of extracting the day, month and year values from a string using regular expressions. Regular expressions allow us to match the format of the string as well as to group matches within the string, so that we can get at our day, month and year values.
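
The email case mentioned above can be sketched the same way. This is only an illustration: the class name EmailGroupSketch, the helper split and the sample address are made up for this example, and the pattern is deliberately simplistic (it just splits on a single '@') rather than a full email validator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailGroupSketch {
  // Simplistic pattern, assumed for illustration only (not RFC 5322):
  // group 1 = everything before the '@', group 2 = everything after it.
  private static final Pattern EMAIL = Pattern.compile("([^@\\s]+)@([^@\\s]+)");

  // Hypothetical helper: returns {username, domain},
  // or null when the input does not have that shape.
  public static String[] split(String email) {
    Matcher m = EMAIL.matcher(email);
    if (!m.matches()) {
      return null;
    }
    return new String[] { m.group(1), m.group(2) };
  }

  public static void main(String[] args) {
    String[] parts = split("jane.doe@example.com");
    System.out.println("Username: " + parts[0]);
    System.out.println("Domain: " + parts[1]);
  }
}
```

Each pair of parentheses in the pattern is one group, numbered from 1 in the order the opening parentheses appear; group(0) is the whole match.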

Java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCaptureEx {
  public static void main(String[] args) {
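    String input = "16-Jun-2010";
    // Three capturing groups: two digits, three letters, four digits.
    String patternStr = "(\\d{2})-([a-zA-Z]{3})-(\\d{4})";

    Pattern pattern = Pattern.compile(patternStr);
    Matcher matcher = pattern.matcher(input);
    // find() looks for the pattern anywhere in the input.
    // groupCount() is the number of groups in the pattern (3 here).
    if (matcher.find() && matcher.groupCount() == 3) {
      // Groups are numbered from 1; group(0) is the whole match.
      System.out.println("Day is: " + matcher.group(1));
      System.out.println("Month is: " + matcher.group(2));
      System.out.println("Year is: " + matcher.group(3));
    } else {
      System.out.println("No match found or unexpected match found");
    }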
  }
}
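
One detail worth knowing about the Java version: find() succeeds if the pattern occurs anywhere in the input, while matches() requires the whole input to fit the pattern. Scala's Regex extractor behaves like matches(). The sketch below reuses the same date pattern to show the difference; the class name, helper methods and padded sample string are made up for illustration.

```java
import java.util.regex.Pattern;

public class FindVsMatchesSketch {
  private static final Pattern DATE =
      Pattern.compile("(\\d{2})-([a-zA-Z]{3})-(\\d{4})");

  // True if the date pattern occurs anywhere inside the input.
  public static boolean foundAnywhere(String input) {
    return DATE.matcher(input).find();
  }

  // True only if the entire input is a date in this format.
  public static boolean matchesWhole(String input) {
    return DATE.matcher(input).matches();
  }

  public static void main(String[] args) {
    String padded = "Posted on 16-Jun-2010 at noon";
    System.out.println(foundAnywhere(padded));        // true
    System.out.println(matchesWhole(padded));         // false
    System.out.println(matchesWhole("16-Jun-2010"));  // true
  }
}
```

If you want to reject input with extra text around the date, matches() is probably the better choice; find() is handy when the date is embedded in a longer string.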


The Scala version uses Scala’s Regex class to simplify matters a little. Calling .r on a string compiles the pattern for you, so there is no explicit compile step. Regex is also an extractor: when it is used in a pattern, it matches the string against the regular expression and binds each captured group to the corresponding variable. The only thing we need to concern ourselves with is Scala’s pattern matching. The pattern we are interested in matching on looks like:

DateRegex(day, month, year)

We then use Scala’s match expression (similar to switch in Java) on the input string. If the pattern DateRegex(day, month, year) matches the string, then we have a match, and day, month and year are bound to the three captured groups.

Scala
object RegExGroupCapture {
  def main(args : Array[String]) : Unit = {
    val Input = "16-Jun-2010"
    val DateRegex = """(\d{2})-([a-zA-Z]{3})-(\d{4})""".r
  
    Input match {
      case DateRegex(day, month, year) => {
        println("match found")
        println("Day: " + day)
        println("Month: " + month)
        println("Year: " + year)
      }
      case _ => println("No match found")
    }
  }
}
