I was working on a problem yesterday where I needed to combine strings that were the same except for one part. Here’s a simplified version of the problem:
Input Array:
"Adam likes apples."
"Adam likes bananas."
Desired Output:
"Adam likes apples and bananas."
It was a no-brainer to use regular expressions to do the matching and parsing, but I couldn’t immediately figure out how to use them to accomplish my goal. I decided to use LINQ’s ToLookup method to create groups of matching items, and then loop through the groups to implement my combine logic.
The first step is to define a regular expression that lets me do two things. It needs to let me create a group “key,” and it needs to let me extract the data part that I’m ultimately trying to combine. For the simple example above, I can use the following pattern:
^(Adam likes )(.*)\.$
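To make the two capture groups concrete, here is a minimal sketch that runs the pattern against a single input and prints what each group holds. The class name PatternDemo and the helper Split are made up for illustration; only the pattern itself comes from the example above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Same pattern as above: group 1 is the shared prefix (the "key"),
    // group 2 is the varying part (the data to combine).
    private static readonly Regex Pattern = new Regex(@"^(Adam likes )(.*)\.$");

    // Hypothetical helper: returns (key, data), or null when the input does not match.
    public static Tuple<string, string> Split(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            return null;
        }
        return Tuple.Create(m.Groups[1].Value, m.Groups[2].Value);
    }

    public static void Main()
    {
        var parts = Split("Adam likes apples.");
        Console.WriteLine("key  = '{0}'", parts.Item1);  // key  = 'Adam likes '
        Console.WriteLine("data = '{0}'", parts.Item2);  // data = 'apples'
    }
}
```

Note that the trailing period is matched by the pattern but sits outside both groups, so it has to be added back when the combined string is built.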
I can create the lookup using the regular expression like so:
var input = new[]
{
    "Adam likes apples.",
    "Adam likes bananas.",
};
var regex = new Regex(@"^(Adam likes )(.*)\.$");
var lookup = input.ToLookup(x => regex.Replace(x, "$1"), x => x);
The final step is to loop through the lookup’s keys and do processing on the groups:
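foreach (var key in lookup.Select(x => x.Key).ToList())
{
    if (lookup[key].Count() > 1)
    {
        // Pull the data part ($2) out of every item in the group and join them.
        var items = string.Join(
            " and ",
            lookup[key].Select(x => regex.Replace(x, "$2")).ToArray());
        // Rebuild the sentence from the shared prefix ($1) and the joined items.
        var output = regex.Replace(
            lookup[key].First(),
            string.Format("$1{0}.", items));
        Console.WriteLine(output);
    }
    else
    {
        // Nothing to combine; print the single item unchanged.
        Console.WriteLine(lookup[key].First());
    }
}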
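One thing to watch with the loop above: the joined items are embedded in a replacement pattern, so a `$` in the data, or a digit directly after `$1`, can change how the substitution is read. The following is a sketch of one alternative, not the approach used in this post: it reads the groups with Match and builds the result by plain concatenation. It also uses GroupBy in place of ToLookup. The names CombineSketch and Combine are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CombineSketch
{
    private static readonly Regex Pattern = new Regex(@"^(Adam likes )(.*)\.$");

    // Hypothetical helper: merges inputs that share the same group-1 prefix.
    public static List<string> Combine(IEnumerable<string> input)
    {
        var result = new List<string>();

        // Strings that do not match use themselves as the key,
        // which mirrors what regex.Replace(x, "$1") returns for a non-match.
        var groups = input.GroupBy(x =>
        {
            Match m = Pattern.Match(x);
            return m.Success ? m.Groups[1].Value : x;
        });

        foreach (var group in groups)
        {
            var items = group.ToList();

            // Single items and non-matching strings are passed through unchanged.
            if (items.Count == 1 || !Pattern.IsMatch(items[0]))
            {
                result.Add(items[0]);
                continue;
            }

            // Concatenate directly so the data is never parsed as a replacement pattern.
            var parts = items.Select(x => Pattern.Match(x).Groups[2].Value);
            result.Add(group.Key + string.Join(" and ", parts) + ".");
        }

        return result;
    }

    public static void Main()
    {
        var input = new[] { "Adam likes apples.", "Adam likes bananas.", "Eve likes pears." };
        foreach (var line in Combine(input))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // Adam likes apples and bananas.
        // Eve likes pears.
    }
}
```

If you prefer to keep the Replace-based version, writing the group reference as `${1}` removes the ambiguity with a following digit, though a `$` inside the data would still need escaping.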
Here’s another example to illustrate how this might be useful:
// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
static void Main(string[] args)
{
    var input = new[]
    {
        "Adam ate 3 apples.",
        "Adam ate 1 apple.",
        "Adam ate 1 banana.",
        "Adam ate 1 banana.",
        "Adam ate 1 orange.",
    };
    // $1 = prefix, $2 = count, $3 = item name without a trailing "s".
    var regex = new Regex(@"^(Adam ate)\s+(\d+)\s+(.*?)s?\.$");
    // Group by prefix plus singular item name so "apple" and "apples" share a key.
    var lookup = input.ToLookup(x => regex.Replace(x, "$1$3"), x => x);
    foreach (var key in lookup.Select(x => x.Key).ToList())
    {
        if (lookup[key].Count() > 1)
        {
            // Add up the counts ($2) of every item in the group.
            int sum = 0;
            foreach (var item in lookup[key])
            {
                sum += int.Parse(regex.Replace(item, "$2"));
            }
            // Re-pluralize the item name when the total is more than one.
            var target = regex.Replace(lookup[key].First(), "$3");
            if (sum > 1)
            {
                target += "s";
            }
            var output = regex.Replace(
                lookup[key].First(),
                string.Format("$1 {0} {1}.", sum, target));
            Console.WriteLine(output);
        }
        else
        {
            Console.WriteLine(lookup[key].First());
        }
    }
    Console.ReadLine();
}
// Output:
// Adam ate 4 apples.
// Adam ate 2 bananas.
// Adam ate 1 orange.
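The grouping in this second example depends on the lazy `(.*?)` followed by the optional `s?`, which drops a trailing "s" so that singular and plural forms end up with the same key. Here is a small sketch that prints the key computed for a few inputs, using the same pattern and the same "$1$3" replacement. The names KeyDemo and KeyFor are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyDemo
{
    // Same pattern as the example above.
    private static readonly Regex Pattern =
        new Regex(@"^(Adam ate)\s+(\d+)\s+(.*?)s?\.$");

    // Hypothetical helper: the lookup key is the prefix plus the singular item name.
    public static string KeyFor(string input)
    {
        return Pattern.Replace(input, "$1$3");
    }

    public static void Main()
    {
        var samples = new[] { "Adam ate 3 apples.", "Adam ate 1 apple.", "Adam ate 1 orange." };
        foreach (var s in samples)
        {
            Console.WriteLine("{0} -> {1}", s, KeyFor(s));
        }
        // Prints:
        // Adam ate 3 apples. -> Adam ateapple
        // Adam ate 1 apple. -> Adam ateapple
        // Adam ate 1 orange. -> Adam ateorange
    }
}
```

The key is never shown to the user, so the missing space between "ate" and the item name is harmless. Keep in mind that the pattern treats any trailing "s" as a plural marker, so a word that naturally ends in "s" (such as "glass") would not round-trip correctly.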
