Quantcast
Channel: linq – adamprescott.net
Viewing all articles
Browse latest Browse all 8

Group Strings Using LINQ and Regular Expressions

$
0
0

I was working on a problem yesterday where I needed to combine strings that were the same except for one part. Here’s a simplified version of the problem:

Input Array:
"Adam likes apples."
"Adam likes bananas."

Desired Output:
"Adam likes apples and bananas."

It was a no-brainer to use regular expressions to do the matching and parsing, but I couldn’t figure out immediately how to use them in to accomplish my goal. I decided to use LINQ’s ToLookup method to create groups of matching items, and then loop through the groups to implement my combine logic.

The first step is to define a regular expression that lets me do two things. It needs to let me create a group “key,” and it needs to let me extract the data part that I’m ultimately trying to combine. For the simple example above, I can use the following pattern:

^(Adam likes )(.*)\.$

I can create the lookup using the regular expression like so:

var input = new[] 
    {
        "Adam likes apples.",
        "Adam likes bananas.",
    };
var regex = new Regex(@"^(Adam likes )(.*)\.$");
var lookup = input.ToLookup(x => regex.Replace(x, "$1"), x => x);

The final step is to loop through the lookup’s keys and do processing on the groups:

foreach (var key in lookup.Select(x => x.Key).ToList())
{
    if (lookup[key].Count() > 1)
    {
        var items = string.Join(
            " and ",
            lookup[key].Select(x => regex.Replace(x, "$2")).ToArray());
        
        var output = regex.Replace(
            lookup[key].First(),
            string.Format("$1{0}.", items));
        Console.WriteLine(output);
    }
    else
    {
        Console.WriteLine(lookup[key].First());
    }
}

Here’s another example to illustrate how this might be useful:

static void Main(string[] args)
{
    var input = new[] 
        {
            "Adam ate 3 apples.",
            "Adam ate 1 apple.",
            "Adam ate 1 banana.",
            "Adam ate 1 banana.",
            "Adam ate 1 orange.",
        };
    var regex = new Regex(@"^(Adam ate)\s+(\d+)\s+(.*?)s?\.$");
    var lookup = input.ToLookup(x => regex.Replace(x, "$1$3"), x => x);
    foreach (var key in lookup.Select(x => x.Key).ToList())
    {
        if (lookup[key].Count() > 1)
        {
            int sum = 0;
            foreach (var item in lookup[key])
            {
                sum += int.Parse(regex.Replace(item, "$2"));
            }

            var target = regex.Replace(lookup[key].First(), "$3");
            if (sum > 1)
            {
                target += "s";
            }
            var output = regex.Replace(
                lookup[key].First(),
                string.Format("$1 {0} {1}.", sum, target));
            Console.WriteLine(output);
        }
        else
        {
            Console.WriteLine(lookup[key].First());
        }
    }
    Console.ReadLine();
}

// Output:
//   Adam ate 4 apples.
//   Adam ate 2 bananas.
//   Adam ate 1 orange.


Viewing all articles
Browse latest Browse all 8

Trending Articles