Understanding The Best C# Set Theory Methods Between Lists


Introduction

C# lists can be compared with the built-in LINQ methods Union, Intersect, and Except. These operations quickly return the values that appear in either list, the values that appear in both lists, or the values that appear in one list but not the other.

C# also provides set operations through HashSet, a collection type with its own IntersectWith, UnionWith, and ExceptWith methods. That lets us compare the LINQ methods with HashSet in terms of performance and setup.

I'll go into more detail about each of these operations for comparing lists.

Intersection Methods

Union Image

Based on the diagram, the intersection is where both circles overlap. Items that exist in both circles are the intersection.

From the code perspective, the intersection of two lists is the set of elements they have in common. For example, take two lists of five numbers, one holding odd numbers (1, 3, 5, 7, 9) and the other holding multiples of 3 (0, 3, 6, 9, 12). The intersection of the two lists is 3 and 9. Finding these shared elements is one of the most common things we want to know about two lists.

LINQ's Intersect Method

LINQ provides a built-in method, Intersect, that returns the common elements as an IEnumerable, which we can convert to a list with ToList. This gives us a one-liner that is readable and describes what it is doing without us having to guess.

List<int> list1 = new List<int>() { 1, 3, 5, 7, 9 };//odd numbers
List<int> list2 = new List<int>() { 0, 3, 6, 9, 12 };//multiples of 3

List<int> intersectList = list1.Intersect(list2).ToList();//Get common elements between the lists
Console.WriteLine("intersectList:" + String.Join(",",intersectList));//Print to screen
Code Output
intersectList:3,9

This function successfully returned only the two common elements, 3 and 9.
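One detail the example above does not show is how Intersect treats repeated values. The sketch below uses two made-up lists (`first` and `second` are illustrative names, not from the original example) to show that each common value comes back once, in the order it first appears in the first list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Both lists contain repeated values (illustrative data).
List<int> first = new List<int> { 3, 3, 9, 9, 5 };
List<int> second = new List<int> { 9, 3, 3, 12 };

// Intersect yields each common value once, in the order it appears in `first`.
List<int> common = first.Intersect(second).ToList();
Console.WriteLine("common:" + string.Join(",", common)); // prints common:3,9
```

So the result behaves like a set even though both inputs are plain lists.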

LINQ's Intersect Method With Objects

Next, we take a more complicated case where we have a list of objects. We'll create a class called Transaction that has a user id, a price, and a timestamp. In this class, I override Equals and GetHashCode to define when two objects count as the same element. Here, two transactions are equal when they have the same user id and the same date; the price is ignored. I use the same ids as in the previous example, and the date is the same for all the transactions. See the example below.


List<Transaction> list1 = new List<Transaction>();
list1.Add(new Transaction(1, 30, 1));//Generate a new Transaction
list1.Add(new Transaction(3, 40, 1));//Generate a new Transaction
list1.Add(new Transaction(5, 60, 1));//Generate a new Transaction
list1.Add(new Transaction(7, 70, 1));//Generate a new Transaction
list1.Add(new Transaction(9, 80, 1));//Generate a new Transaction


List<Transaction> list2 = new List<Transaction>();
list2.Add(new Transaction(0, 40,1));//Generate a new Transaction
list2.Add(new Transaction(3, 60,1));//Generate a new Transaction
list2.Add(new Transaction(6, 80,1));//Generate a new Transaction
list2.Add(new Transaction(9, 100,1));//Generate a new Transaction
list2.Add(new Transaction(12, 110,1));//Generate a new Transaction

List<Transaction> intersectList = list1.Intersect(list2).ToList();//Get common elements between the lists
Console.WriteLine("intersectList:" + String.Join(",", intersectList));//Print to screen

class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
intersectList:3,9

These are the ids of the intersection, which is correct and matches the output of the previous example.
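Overriding Equals and GetHashCode is one way to tell LINQ what "the same element" means. As a possible alternative, and assuming .NET 6 or later, IntersectBy takes a key selector instead. The sketch below stands in for the Transaction class with simple tuples (an assumption made only to keep the snippet short) and matches on the user id.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Tuples stand in for the Transaction class here (illustrative only).
var list1 = new List<(int UserId, int Price)> { (1, 30), (3, 40), (5, 60), (7, 70), (9, 80) };
var list2 = new List<(int UserId, int Price)> { (0, 40), (3, 60), (6, 80), (9, 100), (12, 110) };

// Keep the elements of list1 whose UserId also occurs among the keys taken from list2.
var common = list1.IntersectBy(list2.Select(t => t.UserId), t => t.UserId).ToList();

Console.WriteLine("intersectList:" + string.Join(",", common.Select(t => t.UserId))); // prints intersectList:3,9
```

The returned elements come from the first list, so the prices in the result are 40 and 80, not 60 and 100.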

LINQ Intersect Speed Test

The next thing to test is how this LINQ method performs. In this test, I generate two lists of one million objects each and find the elements common to both lists.

Parameters for the test

10 Tests

10 function calls per test

2 lists and 1 million objects per list


using System.Diagnostics;
int numberOfTests = 10;//Number of tests 
List<double> testSpeedList = new List<double>();

for (int i = 0; i < numberOfTests; i++)
{
    testSpeedList.Add(GetIntersectWithLINQMethodSpeedTest());
}
Console.WriteLine($"LINQ Intersect Method Average speed:{Math.Round(testSpeedList.Average())}ms, In {numberOfTests} tests");

double GetIntersectWithLINQMethodSpeedTest()
{
    int numberOfFunctionCalls = 10;//Number of function calls made
    Stopwatch stopwatch = new Stopwatch();
    List<Transaction> list1 = GetInitialTransactionList();//Get initial randomly generated list
    List<Transaction> list2 = GetInitialTransactionList();//Get initial randomly generated list
    for (int i = 0; i < numberOfFunctionCalls; i++)
    {
        stopwatch.Start();//Resume the Stopwatch timer so elapsed time accumulates across calls
        List<Transaction> intersectList = GetIntersectWithLINQ(list1, list2);//Call the LINQ Intersect wrapper
        stopwatch.Stop();//Pause the Stopwatch timer
    }
    Console.WriteLine($"Function calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.Minutes}m {stopwatch.Elapsed.Seconds}s {stopwatch.Elapsed.Milliseconds}ms");
    return stopwatch.Elapsed.TotalMilliseconds;
}

List<Transaction> GetIntersectWithLINQ(List<Transaction> list1, List<Transaction> list2)
{
    List<Transaction> intersectList = list1.Intersect(list2).ToList();//Get common elements between the lists
    return intersectList;
}
Supporting Code
int GetRandomInt(int maxNumber, int minNumber = 1)
{
    Random random = new Random();//Create a Random instance
    int randomInt = random.Next(minNumber, maxNumber);//Get a random number from minNumber (inclusive) up to maxNumber (exclusive)
    return randomInt;
}


List<Transaction> GetInitialTransactionList()
{
    List<Transaction> transactionList = new List<Transaction>();//Create a new, empty Transaction list
    int numberOfObjectsToCreate = 1000000;
    int maxTransactionNumber = 100000;
    int maxUserId = 9999;
    int minUserId = 1000;
    int maxDays = 5;
    for (int i = 0; i < numberOfObjectsToCreate; i++)
    {
        transactionList.Add(new Transaction(GetRandomInt(maxUserId, minUserId), GetRandomInt(maxTransactionNumber), GetRandomInt(maxDays)));//Add a new Transaction to the list
    }
    return transactionList.OrderBy(x => x.TimeStamp).ThenBy(y => y.UserId).ToList();//Order the transactions by date and then user id
}

class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
Function calls:10, In 0m 2s 752ms
Function calls:10, In 0m 2s 799ms
Function calls:10, In 0m 2s 691ms
Function calls:10, In 0m 2s 660ms
Function calls:10, In 0m 2s 774ms
Function calls:10, In 0m 2s 649ms
Function calls:10, In 0m 2s 645ms
Function calls:10, In 0m 2s 648ms
Function calls:10, In 0m 2s 624ms
Function calls:10, In 0m 2s 669ms
LINQ Intersect Method Average speed:2692ms, In 10 tests

With one million objects per list, LINQ's Intersect completes each test of 10 calls in about 2.7 seconds on average. That seems reasonably fast, so let's see if there are any faster methods.
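To get a feel for where that time goes, here is a rough sketch of the idea behind Intersect. It is not the library's source code, and `IntersectSketch` is a hypothetical name. The point is that each call hashes the whole second list into a lookup set before it walks the first list, so with a million elements per list every call hashes about two million elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that mimics the general approach of LINQ's Intersect.
IEnumerable<T> IntersectSketch<T>(IEnumerable<T> first, IEnumerable<T> second)
{
    var lookup = new HashSet<T>(second); // hash every element of the second sequence
    foreach (T item in first)
    {
        // Remove returns true only the first time a common element is seen,
        // so repeated values in `first` are yielded once.
        if (lookup.Remove(item))
        {
            yield return item;
        }
    }
}

List<int> result = IntersectSketch(new[] { 1, 3, 5, 7, 9 }, new[] { 0, 3, 6, 9, 12 }).ToList();
Console.WriteLine("intersectList:" + string.Join(",", result)); // prints intersectList:3,9
```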

HashSet's IntersectWith Method

HashSet provides its own intersect method, IntersectWith. Note that this method returns nothing and modifies set1 in place, so after the call set1 holds only the common elements and can be converted to a list. In the following example, I generate two HashSets and use IntersectWith to find the common elements, just like with the LINQ method.

HashSet<Transaction> set1 = new HashSet<Transaction>();
set1.Add(new Transaction(1, 30, 1));//Generate a new Transaction
set1.Add(new Transaction(3, 40, 1));//Generate a new Transaction
set1.Add(new Transaction(5, 60, 1));//Generate a new Transaction
set1.Add(new Transaction(7, 70, 1));//Generate a new Transaction
set1.Add(new Transaction(9, 80, 1));//Generate a new Transaction


HashSet<Transaction> set2 = new HashSet<Transaction>();
set2.Add(new Transaction(0, 40, 1));//Generate a new Transaction
set2.Add(new Transaction(3, 60, 1));//Generate a new Transaction
set2.Add(new Transaction(6, 80, 1));//Generate a new Transaction
set2.Add(new Transaction(9, 100, 1));//Generate a new Transaction
set2.Add(new Transaction(12, 110, 1));//Generate a new Transaction

set1.IntersectWith(set2);//Keep only the elements also in set2 (modifies set1 in place)
List<Transaction> intersectList = set1.ToList();
Console.WriteLine("intersectList:" + String.Join(",", intersectList));//Print to screen
Supporting Code
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
intersectList:3,9

HashSet produces the same output as LINQ.
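Because IntersectWith changes the set it is called on, the original contents of set1 are gone after the call. If the original set is still needed, one option is to intersect a copy instead. The sketch below uses plain integers and a hypothetical variable named `common`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var set1 = new HashSet<int> { 1, 3, 5, 7, 9 };
var set2 = new HashSet<int> { 0, 3, 6, 9, 12 };

// Copy set1 first so the original set is left untouched.
var common = new HashSet<int>(set1);
common.IntersectWith(set2);

Console.WriteLine("set1 count:" + set1.Count); // prints set1 count:5
// HashSet does not guarantee an enumeration order, so sort before printing.
Console.WriteLine("intersectSet:" + string.Join(",", common.OrderBy(x => x))); // prints intersectSet:3,9
```

The copy costs extra time and memory, so it only makes sense when both the original set and the intersection are needed afterwards.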

HashSet IntersectWith Speed Test

Next is a speed test for HashSet's IntersectWith. Since HashSet lookups are generally fast, I would expect its built-in set operations to be fast as well.


using System.Diagnostics;
int numberOfTests = 10;//Number of tests 
List<double> testSpeedList = new List<double>();

for (int i = 0; i < numberOfTests; i++)
{
    testSpeedList.Add(GetIntersectWithHashSetMethodSpeedTest());
}
Console.WriteLine($"HashSet IntersectWith Method Average speed:{Math.Round(testSpeedList.Average())}ms, In {numberOfTests} tests");

double GetIntersectWithHashSetMethodSpeedTest()
{
    int numberOfFunctionCalls = 10;//Number of function calls made
    Stopwatch stopwatch = new Stopwatch();
    HashSet<Transaction> set1 = GetInitialTransactionList();//Get initial randomly generated set
    HashSet<Transaction> set2 = GetInitialTransactionList();//Get initial randomly generated set
    for (int i = 0; i < numberOfFunctionCalls; i++)
    {
        stopwatch.Start();//Resume the Stopwatch timer so elapsed time accumulates across calls
        List<Transaction> intersectList = GetHashsetIntersectWith(set1, set2);//Call the HashSet IntersectWith wrapper (set1 is modified by each call)
        stopwatch.Stop();//Pause the Stopwatch timer
    }
    Console.WriteLine($"Function calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.Minutes}m {stopwatch.Elapsed.Seconds}s {stopwatch.Elapsed.Milliseconds}ms");
    return stopwatch.Elapsed.TotalMilliseconds;
}

List<Transaction> GetHashsetIntersectWith(HashSet<Transaction> set1, HashSet<Transaction> set2)
{
    set1.IntersectWith(set2);//Keep only the elements also in set2 (modifies set1 in place)
    List<Transaction> intersectList = set1.ToList();
    return intersectList;
}
Supporting Code
int GetRandomInt(int maxNumber, int minNumber = 1)
{
    Random random = new Random();//Create a Random instance
    int randomInt = random.Next(minNumber, maxNumber);//Get a random number from minNumber (inclusive) up to maxNumber (exclusive)
    return randomInt;
}


HashSet<Transaction> GetInitialTransactionList()
{
    HashSet<Transaction> transactionList = new HashSet<Transaction>();//Create a new, empty set; Add ignores a Transaction equal to one already in the set
    int numberOfObjectsToCreate = 1000000;
    int maxTransactionNumber = 100000;
    int maxUserId = 9999;
    int minUserId = 1000;
    int maxDays = 5;
    for (int i = 0; i < numberOfObjectsToCreate; i++)
    {
        transactionList.Add(new Transaction(GetRandomInt(maxUserId, minUserId), GetRandomInt(maxTransactionNumber), GetRandomInt(maxDays)));//Add a new Transaction to the list
    }
    return transactionList;
}
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
Function calls:10, In 0m 0s 51ms
Function calls:10, In 0m 0s 47ms
Function calls:10, In 0m 0s 48ms
Function calls:10, In 0m 0s 50ms
Function calls:10, In 0m 0s 48ms
Function calls:10, In 0m 0s 47ms
Function calls:10, In 0m 0s 48ms
Function calls:10, In 0m 0s 48ms
Function calls:10, In 0m 0s 47ms
Function calls:10, In 0m 0s 49ms
HashSet IntersectWith Method Average speed:49ms, In 10 tests

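At an average of 49ms per test, HashSet's IntersectWith is significantly faster than LINQ's Intersect method in this setup. Keep in mind that the two tests do not do the same amount of work. A HashSet drops any element equal to one it already contains, and Equals only compares the user id and the date, so each set ends up with far fewer than one million entries. The time spent building the sets is not measured, and IntersectWith shrinks set1 in place, so calls after the first one run against the already reduced set.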
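When reading these numbers, it helps to know how many elements the sets in the test actually hold. The supporting code draws user ids with Next(1000, 9999) and days with Next(1, 5), and the upper bound of Next is exclusive, so there are at most 8,999 × 4 = 35,996 distinct (user id, day) combinations. The sketch below repeats the same draws with tuples in place of the Transaction class and a fixed seed (both are assumptions made to keep the snippet short and repeatable).

```csharp
using System;
using System.Collections.Generic;

var random = new Random(42); // fixed seed, chosen only so runs are repeatable
var set = new HashSet<(int UserId, int Day)>();
int attemptedAdds = 1_000_000;

for (int i = 0; i < attemptedAdds; i++)
{
    // Same ranges as the supporting code; Add ignores values already in the set.
    set.Add((random.Next(1000, 9999), random.Next(1, 5)));
}

int maxDistinct = (9999 - 1000) * (5 - 1); // 8,999 user ids x 4 days = 35,996
Console.WriteLine($"Attempted adds: {attemptedAdds}, elements kept: {set.Count}, upper bound: {maxDistinct}");
```

So the HashSet test intersects sets of roughly 36 thousand elements, while the LINQ test hashes two lists of one million elements on every call.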

Union Methods

Union Image

The union is every unique entry from both circles, so both circles are highlighted. For lists, it is all unique elements from both lists. Union is a common operation when combining two lists into one. We'll first look at LINQ's Union method.

LINQ's Union Method

LINQ provides a compact one-liner for this as well, and since it works on any list it comes in handy. We'll start with a simple example: a list of odd numbers and a list of multiples of 3.

List<int> list1 = new List<int>() { 1, 3, 5, 7, 9 };//odd numbers
List<int> list2 = new List<int>() { 0, 3, 6, 9, 12 };//multiples of 3

List<int> unionList = list1.Union(list2).ToList();//Get all unique elements between the lists
Console.WriteLine("unionList:" + String.Join(",", unionList));//Print to screen

Code Output
unionList:1,3,5,7,9,0,6,12
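To make the "unique elements" part concrete, the sketch below compares Union with Concat on the same two lists. Concat simply appends the second list, while Union drops the repeated 3 and 9. The variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<int> list1 = new List<int> { 1, 3, 5, 7, 9 };  // odd numbers
List<int> list2 = new List<int> { 0, 3, 6, 9, 12 }; // multiples of 3

List<int> concatenated = list1.Concat(list2).ToList();        // keeps duplicates
List<int> distinct = list1.Concat(list2).Distinct().ToList(); // duplicates removed afterwards
List<int> union = list1.Union(list2).ToList();                // duplicates removed in one step

Console.WriteLine("concatList:" + string.Join(",", concatenated)); // prints concatList:1,3,5,7,9,0,3,6,9,12
Console.WriteLine("concat count:" + concatenated.Count + ", union count:" + union.Count); // prints concat count:10, union count:8
```

Concat followed by Distinct contains the same elements as Union, so Union is the shorter way to write it.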

Next is another example that uses objects.


List<Transaction> list1 = new List<Transaction>();
list1.Add(new Transaction(1, 30, 1));//Generate a new Transaction
list1.Add(new Transaction(3, 40, 1));//Generate a new Transaction
list1.Add(new Transaction(5, 60, 1));//Generate a new Transaction
list1.Add(new Transaction(7, 70, 1));//Generate a new Transaction
list1.Add(new Transaction(9, 80, 1));//Generate a new Transaction


List<Transaction> list2 = new List<Transaction>();
list2.Add(new Transaction(0, 40, 1));//Generate a new Transaction
list2.Add(new Transaction(3, 60, 1));//Generate a new Transaction
list2.Add(new Transaction(6, 80, 1));//Generate a new Transaction
list2.Add(new Transaction(9, 100, 1));//Generate a new Transaction
list2.Add(new Transaction(12, 110, 1));//Generate a new Transaction

List<Transaction> unionList = list1.Union(list2).ToList();//Get all unique elements between the lists
Console.WriteLine("unionList:" + String.Join(",", unionList));//Print to screen

Supporting Code
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
unionList:1,3,5,7,9,0,6,12

This is the correct output: the odd numbers and the multiples of 3 combined into one list, with 3 and 9 appearing only once.

LINQ's Union Speed Test

This will be the same test as conducted for Intersection except we'll use the union method.



using System.Diagnostics;
int numberOfTests = 10;//Number of tests 
List<double> testSpeedList = new List<double>();

for (int i = 0; i < numberOfTests; i++)
{
    testSpeedList.Add(GetUnionWithLINQMethodSpeedTest());
}
Console.WriteLine($"LINQ Union Method Average speed:{Math.Round(testSpeedList.Average())}ms, In {numberOfTests} tests");

double GetUnionWithLINQMethodSpeedTest()
{
    int numberOfFunctionCalls = 10;//Number of function calls made
    Stopwatch stopwatch = new Stopwatch();
    List<Transaction> list1 = GetInitialTransactionList();//Get initial randomly generated list
    List<Transaction> list2 = GetInitialTransactionList();//Get initial randomly generated list
    for (int i = 0; i < numberOfFunctionCalls; i++)
    {
        stopwatch.Start();//Resume the Stopwatch timer so elapsed time accumulates across calls
        List<Transaction> unionList = GetUnionWithLINQ(list1, list2);//Call the LINQ Union wrapper
        stopwatch.Stop();//Pause the Stopwatch timer
    }
    Console.WriteLine($"Function calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.Minutes}m {stopwatch.Elapsed.Seconds}s {stopwatch.Elapsed.Milliseconds}ms");
    return stopwatch.Elapsed.TotalMilliseconds;
}

List<Transaction> GetUnionWithLINQ(List<Transaction> list1, List<Transaction> list2)
{
    List<Transaction> unionList = list1.Union(list2).ToList();//Get all unique elements from both lists
    return unionList;
}
Supporting Code
int GetRandomInt(int maxNumber, int minNumber = 1)
{
    Random random = new Random();//Create a Random instance
    int randomInt = random.Next(minNumber, maxNumber);//Get a random number from minNumber (inclusive) up to maxNumber (exclusive)
    return randomInt;
}


List<Transaction> GetInitialTransactionList()
{
    List<Transaction> transactionList = new List<Transaction>();//Create a new, empty Transaction list
    int numberOfObjectsToCreate = 1000000;
    int maxTransactionNumber = 100000;
    int maxUserId = 9999;
    int minUserId = 1000;
    int maxDays = 5;
    for (int i = 0; i < numberOfObjectsToCreate; i++)
    {
        transactionList.Add(new Transaction(GetRandomInt(maxUserId, minUserId), GetRandomInt(maxTransactionNumber), GetRandomInt(maxDays)));//Add a new Transaction to the list
    }
    return transactionList.OrderBy(x => x.TimeStamp).ThenBy(y => y.UserId).ToList();//Order the transactions by date and then user id
}

class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}

Code Output
Function calls:10, In 0m 3s 478ms
Function calls:10, In 0m 3s 370ms
Function calls:10, In 0m 3s 443ms
Function calls:10, In 0m 3s 479ms
Function calls:10, In 0m 3s 354ms
Function calls:10, In 0m 3s 422ms
Function calls:10, In 0m 3s 499ms
Function calls:10, In 0m 3s 541ms
Function calls:10, In 0m 3s 464ms
Function calls:10, In 0m 3s 441ms
LINQ Union Method Average speed:3450ms, In 10 tests

LINQ's Union method completes each test in about three and a half seconds, which is reasonable. We'll see if HashSet can improve on this.

HashSet Union Method

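A HashSet can also be passed to Union. Note that the Union called in this example is still LINQ's extension method, which works on any IEnumerable and returns a new sequence without modifying either set. HashSet's own method is UnionWith, which returns nothing and adds the elements of the other collection to the set it is called on. Either way it is a one-liner. Let's see an example now.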

HashSet<Transaction> set1 = new HashSet<Transaction>();
set1.Add(new Transaction(1, 30, 1));//Generate a new Transaction
set1.Add(new Transaction(3, 40, 1));//Generate a new Transaction
set1.Add(new Transaction(5, 60, 1));//Generate a new Transaction
set1.Add(new Transaction(7, 70, 1));//Generate a new Transaction
set1.Add(new Transaction(9, 80, 1));//Generate a new Transaction


HashSet<Transaction> set2 = new HashSet<Transaction>();
set2.Add(new Transaction(0, 40, 1));//Generate a new Transaction
set2.Add(new Transaction(3, 60, 1));//Generate a new Transaction
set2.Add(new Transaction(6, 80, 1));//Generate a new Transaction
set2.Add(new Transaction(9, 100, 1));//Generate a new Transaction
set2.Add(new Transaction(12, 110, 1));//Generate a new Transaction

List<Transaction> unionList = set1.Union(set2).ToList();//LINQ's Union: get all unique elements from both sets without modifying them
Console.WriteLine("unionList:" + String.Join(",", unionList));//Print to screen
Supporting Code
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
unionList:1,3,5,7,9,0,6,12

This is the expected result for a union method.
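HashSet also has an in-place counterpart, UnionWith, which adds the elements of another collection to the set it is called on and returns nothing. The sketch below uses plain integers and copies set1 first so the original stays intact (the copy and the variable name `union` are choices made for this illustration).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var set1 = new HashSet<int> { 1, 3, 5, 7, 9 };
var set2 = new HashSet<int> { 0, 3, 6, 9, 12 };

// UnionWith returns void and modifies the set it is called on,
// so work on a copy to keep set1 unchanged.
var union = new HashSet<int>(set1);
union.UnionWith(set2);

// HashSet does not guarantee an enumeration order, so sort before printing.
Console.WriteLine("unionSet:" + string.Join(",", union.OrderBy(x => x))); // prints unionSet:0,1,3,5,6,7,9,12
```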

HashSet Union Speed Test

We also need to test the speed of this method. Will it continue the trend of HashSet operations being faster than their list counterparts? Let's see in the example below.


using System.Diagnostics;
int numberOfTests = 10;//Number of tests 
List<double> testSpeedList = new List<double>();

for (int i = 0; i < numberOfTests; i++)
{
    testSpeedList.Add(GetUnionHashSetMethodSpeedTest());
}
Console.WriteLine($"HashSet Union Method Average speed:{Math.Round(testSpeedList.Average())}ms, In {numberOfTests} tests");

double GetUnionHashSetMethodSpeedTest()
{
    int numberOfFunctionCalls = 10;//Number of function calls made
    Stopwatch stopwatch = new Stopwatch();
    HashSet<Transaction> set1 = GetInitialTransactionList();//Get initial randomly generated set
    HashSet<Transaction> set2 = GetInitialTransactionList();//Get initial randomly generated set
    for (int i = 0; i < numberOfFunctionCalls; i++)
    {
        stopwatch.Start();//Resume the Stopwatch timer so elapsed time accumulates across calls
        List<Transaction> unionList = GetHashsetUnion(set1, set2);//Call the Union wrapper on the two sets
        stopwatch.Stop();//Pause the Stopwatch timer
    }
    Console.WriteLine($"Function calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.Minutes}m {stopwatch.Elapsed.Seconds}s {stopwatch.Elapsed.Milliseconds}ms");
    return stopwatch.Elapsed.TotalMilliseconds;
}

List<Transaction> GetHashsetUnion(HashSet<Transaction> set1, HashSet<Transaction> set2)
{
    List<Transaction> unionList = set1.Union(set2).ToList();//LINQ's Union: get all unique elements from both sets
    return unionList;
}
Supporting Code
int GetRandomInt(int maxNumber, int minNumber = 1)
{
    Random random = new Random();//Create a Random instance
    int randomInt = random.Next(minNumber, maxNumber);//Get a random number from minNumber (inclusive) up to maxNumber (exclusive)
    return randomInt;
}


HashSet<Transaction> GetInitialTransactionList()
{
    HashSet<Transaction> transactionList = new HashSet<Transaction>();//Create a new, empty set; Add ignores a Transaction equal to one already in the set
    int numberOfObjectsToCreate = 1000000;
    int maxTransactionNumber = 100000;
    int maxUserId = 9999;
    int minUserId = 1000;
    int maxDays = 5;
    for (int i = 0; i < numberOfObjectsToCreate; i++)
    {
        transactionList.Add(new Transaction(GetRandomInt(maxUserId, minUserId), GetRandomInt(maxTransactionNumber), GetRandomInt(maxDays)));//Add a new Transaction to the list
    }
    return transactionList;
}
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
Function calls:10, In 0m 0s 87ms
Function calls:10, In 0m 0s 81ms
Function calls:10, In 0m 0s 84ms
Function calls:10, In 0m 0s 83ms
Function calls:10, In 0m 0s 79ms
Function calls:10, In 0m 0s 84ms
Function calls:10, In 0m 0s 82ms
Function calls:10, In 0m 0s 81ms
Function calls:10, In 0m 0s 84ms
Function calls:10, In 0m 0s 83ms
HashSet Union Method Average speed:83ms, In 10 tests

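Of the two union tests, the one on HashSets is the fastest at about 83ms. It is another sub-second result for HashSet, while both LINQ tests on lists took more than 2 seconds. Note that the sets hold only unique elements, so they are much smaller than the one-million-element lists used in the LINQ test.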

Except Methods

Union Image

Except returns everything in the first list that is not in the second list. So if I had two lists, this method would remove from the first list every element that also exists in the second list. Let's look at an example.

LINQ's Except Method

LINQ provides an Except method to get the difference between two lists. It returns all items from list 1 that do not appear in list 2.

List<int> list1 = new List<int>() { 1, 3, 5, 7, 9 };//odd numbers
List<int> list2 = new List<int>() { 0, 3, 6, 9, 12 };//multiples of 3

List<int> exceptList = list1.Except(list2).ToList();//Get the elements of list1 that do not appear in list2
Console.WriteLine("exceptList:" + String.Join(",", exceptList));//Print to screen

Code Output
exceptList:1,5,7
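Unlike Intersect and Union, the result of Except depends on which list comes first. The sketch below swaps the two lists and also shows one way to get the elements that are in either list but not in both, using HashSet's SymmetricExceptWith. The variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<int> list1 = new List<int> { 1, 3, 5, 7, 9 };  // odd numbers
List<int> list2 = new List<int> { 0, 3, 6, 9, 12 }; // multiples of 3

List<int> onlyInFirst = list1.Except(list2).ToList();  // elements of list1 not in list2
List<int> onlyInSecond = list2.Except(list1).ToList(); // elements of list2 not in list1

// Symmetric difference: elements that appear in exactly one of the two lists.
var inExactlyOne = new HashSet<int>(list1);
inExactlyOne.SymmetricExceptWith(list2);

Console.WriteLine("onlyInFirst:" + string.Join(",", onlyInFirst));   // prints onlyInFirst:1,5,7
Console.WriteLine("onlyInSecond:" + string.Join(",", onlyInSecond)); // prints onlyInSecond:0,6,12
// HashSet does not guarantee an enumeration order, so sort before printing.
Console.WriteLine("inExactlyOne:" + string.Join(",", inExactlyOne.OrderBy(x => x))); // prints inExactlyOne:0,1,5,6,7,12
```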

Here is another example using objects.


List<Transaction> list1 = new List<Transaction>();
list1.Add(new Transaction(1, 30, 1));//Generate a new Transaction
list1.Add(new Transaction(3, 40, 1));//Generate a new Transaction
list1.Add(new Transaction(5, 60, 1));//Generate a new Transaction
list1.Add(new Transaction(7, 70, 1));//Generate a new Transaction
list1.Add(new Transaction(9, 80, 1));//Generate a new Transaction


List<Transaction> list2 = new List<Transaction>();
list2.Add(new Transaction(0, 40, 1));//Generate a new Transaction
list2.Add(new Transaction(3, 60, 1));//Generate a new Transaction
list2.Add(new Transaction(6, 80, 1));//Generate a new Transaction
list2.Add(new Transaction(9, 100, 1));//Generate a new Transaction
list2.Add(new Transaction(12, 110, 1));//Generate a new Transaction

List<Transaction> exceptList = list1.Except(list2).ToList();//Get the elements of list1 that do not appear in list2
Console.WriteLine("exceptList:" + String.Join(",", exceptList));//Print to screen
Supporting Code
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
exceptList:1,5,7

LINQ's Except Speed Test

Next, we'll test how this Except method performs under a load of two lists with one million objects each. This is the same test as in the previous examples.


using System.Diagnostics;
int numberOfTests = 10;//Number of tests 
List<double> testSpeedList = new List<double>();

for (int i = 0; i < numberOfTests; i++)
{
    testSpeedList.Add(GetExceptWithLINQMethodSpeedTest());
}
Console.WriteLine($"LINQ Except Method Average speed:{Math.Round(testSpeedList.Average())}ms, In {numberOfTests} tests");

double GetExceptWithLINQMethodSpeedTest()
{
    int numberOfFunctionCalls = 10;//Number of function calls made
    Stopwatch stopwatch = new Stopwatch();
    List<Transaction> list1 = GetInitialTransactionList();//Get initial randomly generated list
    List<Transaction> list2 = GetInitialTransactionList();//Get initial randomly generated list
    for (int i = 0; i < numberOfFunctionCalls; i++)
    {
        stopwatch.Start();//Resume the Stopwatch timer so elapsed time accumulates across calls
        List<Transaction> exceptList = GetExceptWithLINQ(list1, list2);//Call the LINQ Except wrapper
        stopwatch.Stop();//Pause the Stopwatch timer
    }
    Console.WriteLine($"Function calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.Minutes}m {stopwatch.Elapsed.Seconds}s {stopwatch.Elapsed.Milliseconds}ms");
    return stopwatch.Elapsed.TotalMilliseconds;
}

List<Transaction> GetExceptWithLINQ(List<Transaction> list1, List<Transaction> list2)
{
    List<Transaction> intersectList = list1.Except(list2).ToList();//Get all elements from list 1 that do not appear in list 2
    return intersectList;
}
Supporting Code
int GetRandomInt(int maxNumber, int minNumber = 1)
{
    Random random = new Random();//Create Random class
    int randomInt = random.Next(minNumber, maxNumber);//Get a random number from minNumber (inclusive) up to maxNumber (exclusive)
    return randomInt;
}


List<Transaction> GetInitialTransactionList()
{
    List<Transaction> transactionList = new List<Transaction>();//Create new Transaction list as empty
    int numberOfObjectsToCreate = 1000000;
    int maxTransactionNumber = 100000;
    int maxUserId = 9999;
    int minUserId = 1000;
    int maxDays = 5;
    for (int i = 0; i < numberOfObjectsToCreate; i++)
    {
        transactionList.Add(new Transaction(GetRandomInt(maxUserId, minUserId), GetRandomInt(maxTransactionNumber), GetRandomInt(maxDays)));//Add a new Transaction to the list
    }
    return transactionList.OrderBy(x => x.TimeStamp).ThenBy(y => y.UserId).ToList();//Order the transactions by date and then user id
}

class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
Function calls:10, In 0m 3s 458ms
Function calls:10, In 0m 3s 420ms
Function calls:10, In 0m 3s 420ms
Function calls:10, In 0m 3s 379ms
Function calls:10, In 0m 3s 322ms
Function calls:10, In 0m 3s 453ms
Function calls:10, In 0m 3s 623ms
Function calls:10, In 0m 3s 364ms
Function calls:10, In 0m 3s 780ms
Function calls:10, In 0m 3s 444ms
LINQ Except Method Average speed:3467ms, In 10 tests

LINQ's Except comes in at about three and a half seconds for 10 calls on the two lists. This is very similar to the other LINQ methods tested so far.

HashSet's ExceptWith Method

HashSet also has a built-in method for the difference of two sets, ExceptWith. It is compact and easy to use. Note that it modifies the first set in place instead of returning a new collection, so if you need a list you have to convert the set to a list after the operation. Let's look at an example.

HashSet<Transaction> set1 = new HashSet<Transaction>();
set1.Add(new Transaction(1, 30, 1));//Generate a new Transaction
set1.Add(new Transaction(3, 40, 1));//Generate a new Transaction
set1.Add(new Transaction(5, 60, 1));//Generate a new Transaction
set1.Add(new Transaction(7, 70, 1));//Generate a new Transaction
set1.Add(new Transaction(9, 80, 1));//Generate a new Transaction


HashSet<Transaction> set2 = new HashSet<Transaction>();
set2.Add(new Transaction(0, 40, 1));//Generate a new Transaction
set2.Add(new Transaction(3, 60, 1));//Generate a new Transaction
set2.Add(new Transaction(6, 80, 1));//Generate a new Transaction
set2.Add(new Transaction(9, 100, 1));//Generate a new Transaction
set2.Add(new Transaction(12, 110, 1));//Generate a new Transaction

set1.ExceptWith(set2);//Remove from set1 every element that also appears in set2 (set1 is modified in place)
List<Transaction> exceptList = set1.ToList();
Console.WriteLine("exceptList:" + String.Join(",", exceptList));//Print to screen
Supporting Code
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
exceptList:1,5,7
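
Because ExceptWith changes the set it is called on, the original contents of that set are gone afterward. If the original set is still needed, one option is to run the operation on a copy. The following is a minimal sketch with integers, assuming the copy constructor of HashSet is acceptable for your data size; the variable names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

HashSet<int> oddNumbers = new HashSet<int>() { 1, 3, 5, 7, 9 };
HashSet<int> multiplesOfThree = new HashSet<int>() { 0, 3, 6, 9, 12 };

HashSet<int> workingCopy = new HashSet<int>(oddNumbers);//Copy the set so the original stays untouched
workingCopy.ExceptWith(multiplesOfThree);//Only the copy is modified

Console.WriteLine("original count:" + oddNumbers.Count);//Prints original count:5
Console.WriteLine("difference:" + String.Join(",", workingCopy.OrderBy(x => x)));//Prints difference:1,5,7
```

The OrderBy call is only there to make the printed output predictable, since a HashSet does not promise any particular enumeration order.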

HashSet's ExceptWith Method Speed Test

Next, I'll run the speed test for HashSet's ExceptWith method. Since the other HashSet methods were fast, I expect this one to be fast as well. This will be the same test as for the other methods.



using System.Diagnostics;
int numberOfTests = 10;//Number of tests 
List<double> testSpeedList = new List<double>();

for (int i = 0; i < numberOfTests; i++)
{
    testSpeedList.Add(GetExceptWithHashSetMethodSpeedTest());
}
Console.WriteLine($"HashSet ExceptWith Method Average speed:{Math.Round(testSpeedList.Average())}ms, In {numberOfTests} tests");

double GetExceptWithHashSetMethodSpeedTest()
{
    int numberOfFunctionCalls = 10;//Number of function calls made
    Stopwatch stopwatch = new Stopwatch();
    HashSet<Transaction> set1 = GetInitialTransactionList();//Get initial randomly generated set
    HashSet<Transaction> set2 = GetInitialTransactionList();//Get initial randomly generated set
    for (int i = 0; i < numberOfFunctionCalls; i++)
    {
        stopwatch.Start();//Resume the Stopwatch timer (elapsed time accumulates across calls)
        List<Transaction> exceptList = GetHashsetExceptWith(set1, set2);//Note: set1 is modified, so calls after the first start from the already reduced set
        stopwatch.Stop();//Pause the Stopwatch timer
    }
    Console.WriteLine($"Function calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.Minutes}m {stopwatch.Elapsed.Seconds}s {stopwatch.Elapsed.Milliseconds}ms");
    return stopwatch.Elapsed.TotalMilliseconds;
}

List<Transaction> GetHashsetExceptWith(HashSet<Transaction> set1, HashSet<Transaction> set2)
{
    set1.ExceptWith(set2);//Remove from set 1 every element that also appears in set 2
    List<Transaction> exceptList = set1.ToList();
    return exceptList;
}
Supporting Code
int GetRandomInt(int maxNumber, int minNumber = 1)
{
    Random random = new Random();//Create Random class
    int randomInt = random.Next(minNumber, maxNumber);//Get a random number from minNumber (inclusive) up to maxNumber (exclusive)
    return randomInt;
}


HashSet<Transaction> GetInitialTransactionList()
{
    HashSet<Transaction> transactionList = new HashSet<Transaction>();//Create new Transaction set as empty
    int numberOfObjectsToCreate = 1000000;
    int maxTransactionNumber = 100000;
    int maxUserId = 9999;
    int minUserId = 1000;
    int maxDays = 5;
    for (int i = 0; i < numberOfObjectsToCreate; i++)
    {
        transactionList.Add(new Transaction(GetRandomInt(maxUserId, minUserId), GetRandomInt(maxTransactionNumber), GetRandomInt(maxDays)));//Add a new Transaction to the set (a Transaction equal to one already in the set is ignored)
    }
    return transactionList;
}
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}

Code Output
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 5ms
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 6ms
HashSet ExceptWith Method Average speed:6ms, In 10 tests
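
When reading this number, it helps to remember what the HashSet test actually measures. The sets are built before the timer starts, and a HashSet keeps only one Transaction per distinct UserId and day. With the ranges used in GetInitialTransactionList (Random.Next excludes its upper bound, so UserId runs from 1000 to 9998 and day from 1 to 4), that is at most 8999 × 4 = 35,996 elements per set, not one million. ExceptWith also shrinks set1 on the first call, so the later calls have less work to do. The sketch below is one possible way to time a closer comparison with the LINQ test, under the assumption that the input starts out as lists: it rebuilds the sets from the lists inside the timed region on every call. The (UserId, Day) tuples are a hypothetical stand-in for Transaction, and BuildRandomKeys is a helper invented for this illustration. No timings are quoted here because they depend on the machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

//Hypothetical stand-in for Transaction: a (UserId, Day) tuple compares by value,
//which mirrors the Equals/GetHashCode overrides used in the article.
List<(int UserId, int Day)> BuildRandomKeys(int count, int seed)
{
    Random random = new Random(seed);//Fixed seed so runs are repeatable
    List<(int UserId, int Day)> keys = new List<(int UserId, int Day)>(count);
    for (int i = 0; i < count; i++)
    {
        keys.Add((random.Next(1000, 9999), random.Next(1, 5)));//Same ranges as GetInitialTransactionList
    }
    return keys;
}

List<(int UserId, int Day)> keyList1 = BuildRandomKeys(1000000, 1);
List<(int UserId, int Day)> keyList2 = BuildRandomKeys(1000000, 2);

int numberOfFunctionCalls = 10;
int distinctKeysInList1 = 0;
Stopwatch stopwatch = new Stopwatch();
for (int i = 0; i < numberOfFunctionCalls; i++)
{
    stopwatch.Start();
    HashSet<(int UserId, int Day)> freshSet1 = new HashSet<(int UserId, int Day)>(keyList1);//Rebuilt on every call
    HashSet<(int UserId, int Day)> freshSet2 = new HashSet<(int UserId, int Day)>(keyList2);
    distinctKeysInList1 = freshSet1.Count;//Size of the set before the difference is taken
    freshSet1.ExceptWith(freshSet2);
    stopwatch.Stop();
}
Console.WriteLine($"Distinct keys in list 1:{distinctKeysInList1} of {keyList1.Count} items");
Console.WriteLine($"Function calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.TotalMilliseconds}ms (set construction included)");
```

If your data already lives in HashSets, the original test is the relevant one. If it starts out as lists, the construction cost measured here is part of the real price of using ExceptWith.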

Conclusion

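| Overall Rank | Method | Speed Test | Order Preserved |
|---|---|---|---|
| 1 | HashSet's ExceptWith | 6ms | No |
| 2 | HashSet's IntersectWith | 49ms | No |
| 3 | HashSet's Union | 49ms | No |
| 4 | LINQ's Intersect | 2692ms | Yes |
| 5 | LINQ's Union | 3450ms | Yes |
| 6 | LINQ's Except | 3467ms | Yes |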

The best methods for intersect, union, and except are the ones on the HashSet collection type. They are very fast operations even when dealing with two lists of 1 million objects. These methods are best when the order of the items doesn't matter, because HashSet does not guarantee any ordering. Depending on the size of the list, though, you could just use the HashSet operations and then sort the result afterward.

If you need to keep the order of the list and want to keep things simple, the LINQ functions are worth looking into, as long as the dataset you are dealing with isn't too big.
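
As a rough illustration of the two ways to combine HashSet speed with an ordered result, here is a small sketch using integers. Option 1 filters the original list against a HashSet lookup, so the list's own order survives. Option 2 takes the difference on sets and sorts afterward. The variable names are made up for this example, and neither option was part of the speed tests above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<int> orderedList = new List<int>() { 9, 7, 5, 3, 1 };//The order we want to keep
List<int> otherList = new List<int>() { 0, 3, 6, 9, 12 };

//Option 1: filter the list against a HashSet lookup; the list's order is kept
HashSet<int> lookup = new HashSet<int>(otherList);
List<int> keptInOrder = orderedList.Where(x => !lookup.Contains(x)).ToList();
Console.WriteLine("keptInOrder:" + String.Join(",", keptInOrder));//Prints keptInOrder:7,5,1

//Option 2: take the difference on sets, then sort the result
HashSet<int> resultSet = new HashSet<int>(orderedList);
resultSet.ExceptWith(otherList);
List<int> sortedAfterward = resultSet.OrderBy(x => x).ToList();
Console.WriteLine("sortedAfterward:" + String.Join(",", sortedAfterward));//Prints sortedAfterward:1,5,7
```

Note that Option 1 keeps repeated values from the first list, while Option 2 removes them because the list goes through a HashSet first.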

Do you have any methods for Set Theory operations? Post in the comments.
