Understanding The Best Set Theory Methods Between Lists


Introduction

Comparisons between lists can be made with the built-in LINQ methods Union, Intersect, and Except. These operations make it quick to get the values that two lists have in common, the values that appear in either list, or the values that appear in one list but not the other.

C# also provides built-in set operations on HashSet, a collection type with its own intersection, union, and except methods. So we can compare the LINQ methods with their HashSet counterparts in both performance and setup.

I'll go into more detail about each type of operation for comparing lists.

Intersection Methods

Union Image

Based on the diagram, the intersection is where both circles overlap. Items that exist in both circles form the intersection.

From the code perspective, the intersection of two lists is the set of elements they have in common. For example, suppose I have two lists of 5 numbers, where one list holds odd numbers (1, 3, 5, 7, 9) and the other holds multiples of 3 (0, 3, 6, 9, 12). The intersection of the two lists is then 3 and 9. This is often one of the things we're interested in when comparing two lists.

LINQ's Intersect Method

LINQ provides a built-in method that returns the common elements as an IEnumerable, which we can convert to a list. This gives us a powerful one-liner that does exactly what we want. The method is also readable: its name describes what it does without us having to guess. Examine the code below with two lists of ints.

List<int> list1 = new List<int>() { 1, 3, 5, 7, 9 };//odd numbers
List<int> list2 = new List<int>() { 0, 3, 6, 9, 12 };//multiples of 3

List<int> intersectList = list1.Intersect(list2).ToList();//Get common elements between the lists
Console.WriteLine("intersectList:" + String.Join(",",intersectList));//Print to screen
Code Output
intersectList:3,9

This method successfully returned only the two common elements, 3 and 9.

LINQ's Intersect Method with objects

Next we'll take a more complicated case where we have a list of objects. We'll create a class called Transaction that has an id, a price, and a timestamp. In this class I will override Equals and GetHashCode to define what makes two objects equal. In this case, equality is based on the id and the date of the timestamp, so the price is ignored. I will use the same ids as in the previous example, and the date will be the same for all the transactions. See the example below.


List<Transaction> list1 = new List<Transaction>();
list1.Add(new Transaction(1, 30, 1));//Generate a new Transaction
list1.Add(new Transaction(3, 40, 1));//Generate a new Transaction
list1.Add(new Transaction(5, 60, 1));//Generate a new Transaction
list1.Add(new Transaction(7, 70, 1));//Generate a new Transaction
list1.Add(new Transaction(9, 80, 1));//Generate a new Transaction


List<Transaction> list2 = new List<Transaction>();
list2.Add(new Transaction(0, 40,1));//Generate a new Transaction
list2.Add(new Transaction(3, 60,1));//Generate a new Transaction
list2.Add(new Transaction(6, 80,1));//Generate a new Transaction
list2.Add(new Transaction(9, 100,1));//Generate a new Transaction
list2.Add(new Transaction(12, 110,1));//Generate a new Transaction

List<Transaction> intersectList = list1.Intersect(list2).ToList();//Get common elements between the lists
Console.WriteLine("intersectList:" + String.Join(",", intersectList));//Print to screen

class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
intersectList:3,9

These are the ids in the intersection, and this is correct. It is the same output that we got from the previous example.
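If overriding Equals and GetHashCode on the class is not an option, the comparison key can instead be supplied at the call site. The following is a minimal sketch, assuming .NET 6 or later (where Enumerable.IntersectBy is available). Value tuples stand in for the Transaction class so the snippet is self-contained; the tuple shape and the Day field are illustrative assumptions, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Value tuples stand in for the article's Transaction class (assumption for brevity).
var list1 = new List<(int UserId, int Price, int Day)>
{
    (1, 30, 1), (3, 40, 1), (5, 60, 1), (7, 70, 1), (9, 80, 1)
};
var list2 = new List<(int UserId, int Price, int Day)>
{
    (0, 40, 1), (3, 60, 1), (6, 80, 1), (9, 100, 1), (12, 110, 1)
};

// Compare on (UserId, Day) only, so Price is ignored just like in the Equals override.
var intersectList = list1
    .IntersectBy(list2.Select(t => (t.UserId, t.Day)), t => (t.UserId, t.Day))
    .ToList();

// The result contains the elements taken from list1.
Console.WriteLine("intersectList:" + string.Join(",", intersectList.Select(t => t.UserId)));
// prints "intersectList:3,9"
```

The trade-off is that the key has to be repeated at every call site, whereas the Equals and GetHashCode overrides define equality once for the whole class.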

LINQ Intersect Speed Test

The next thing we need to test is the performance of this LINQ method. In this test, I will generate two lists of one million objects each and find the common elements between them.

Parameters for the test

10 Tests

10 function calls per test

2 lists and 1 million objects per list


using System.Diagnostics;
int numberOfTests = 10;//Number of tests 
List<double> testSpeedList = new List<double>();

for (int i = 0; i < numberOfTests; i++)
{
    testSpeedList.Add(GetIntersectWithLINQMethodSpeedTest());
}
Console.WriteLine($"LINQ Intersect Method Average speed:{Math.Round(testSpeedList.Average())}ms, In {numberOfTests} tests");

double GetIntersectWithLINQMethodSpeedTest()
{
    int numberOfFunctionCalls = 10;//Number of function calls made
    Stopwatch stopwatch = new Stopwatch();
    List<Transaction> list1 = GetInitialTransactionList();//Get intial random generated list
    List<Transaction> list2 = GetInitialTransactionList();//Get intial random generated list
    for (int i = 0; i < numberOfFunctionCalls; i++)
    {
        stopwatch.Start();//Start the Stopwatch timer
        List<Transaction> intersectList = GetIntersectWithLINQ(list1, list2);//Run the LINQ Intersect method under test
        stopwatch.Stop();//Stop the Stopwatch timer
    }
    Console.WriteLine($"Function calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.Minutes}m {stopwatch.Elapsed.Seconds}s {stopwatch.Elapsed.Milliseconds}ms");
    return stopwatch.Elapsed.TotalMilliseconds;
}

List<Transaction> GetIntersectWithLINQ(List<Transaction> list1, List<Transaction> list2)
{
    List<Transaction> intersectList = list1.Intersect(list2).ToList();//Get common elements between the lists
    return intersectList;
}
Supporting Code
int GetRandomInt(int maxNumber, int minNumber = 1)
{
    Random random = new Random();//Create Random class
    int randomInt = random.Next(minNumber, maxNumber);//Get a random number from minNumber (inclusive) up to maxNumber (exclusive)
    return randomInt;
}


List<Transaction> GetInitialTransactionList()
{
    List<Transaction> transactionList = new List<Transaction>();//Create a new, empty Transaction list
    int numberOfObjectsToCreate = 1000000;
    int maxTransactionNumber = 100000;
    int maxUserId = 9999;
    int minUserId = 1000;
    int maxDays = 5;
    for (int i = 0; i < numberOfObjectsToCreate; i++)
    {
        transactionList.Add(new Transaction(GetRandomInt(maxUserId, minUserId), GetRandomInt(maxTransactionNumber), GetRandomInt(maxDays)));//Add a new Transaction to the list
    }
    return transactionList.OrderBy(x => x.TimeStamp).ThenBy(y => y.UserId).ToList();//Order the transactions by date and then user id
}

class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
Function calls:10, In 0m 2s 752ms
Function calls:10, In 0m 2s 799ms
Function calls:10, In 0m 2s 691ms
Function calls:10, In 0m 2s 660ms
Function calls:10, In 0m 2s 774ms
Function calls:10, In 0m 2s 649ms
Function calls:10, In 0m 2s 645ms
Function calls:10, In 0m 2s 648ms
Function calls:10, In 0m 2s 624ms
Function calls:10, In 0m 2s 669ms
LINQ Intersect Method Average speed:2692ms, In 10 tests

With one million objects per list, LINQ's Intersect completes each test of 10 calls in about 2.7 seconds on average, or roughly 270ms per call. That seems decently fast, so let's see if there are any faster methods.
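To reason about where that time goes, it helps to picture how a hash-based intersection over two plain sequences can work. The sketch below is a hypothetical helper written for illustration, not the LINQ source: it hashes the second sequence on every call and then streams the first one through it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (illustration only): hash the second sequence once per call,
// then walk the first sequence and keep the items found in the lookup.
static IEnumerable<T> IntersectSketch<T>(IEnumerable<T> first, IEnumerable<T> second)
{
    var lookup = new HashSet<T>(second); // build cost is paid on every call
    foreach (T item in first)
    {
        // Remove returns true only the first time an item is found,
        // so each common element is yielded at most once.
        if (lookup.Remove(item))
        {
            yield return item;
        }
    }
}

var common = new List<int>(IntersectSketch(new[] { 1, 3, 5, 7, 9 }, new[] { 0, 3, 6, 9, 12 }));
Console.WriteLine(string.Join(",", common)); // prints "3,9"
```

Under this model, each call on two one-million-element lists hashes every element of both lists, which is work a pre-built HashSet does not have to repeat.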

HashSet's IntersectWith Method

HashSet provides its own intersect method, IntersectWith. Note that this method modifies set1 in place and returns nothing, so set1 needs to be converted to a list after the operation is completed. In the following example, I will generate two HashSets and use IntersectWith to find the common elements, just like the LINQ method.

HashSet<Transaction> set1 = new HashSet<Transaction>();
set1.Add(new Transaction(1, 30, 1));//Generate a new Transaction
set1.Add(new Transaction(3, 40, 1));//Generate a new Transaction
set1.Add(new Transaction(5, 60, 1));//Generate a new Transaction
set1.Add(new Transaction(7, 70, 1));//Generate a new Transaction
set1.Add(new Transaction(9, 80, 1));//Generate a new Transaction


HashSet<Transaction> set2 = new HashSet<Transaction>();
set2.Add(new Transaction(0, 40, 1));//Generate a new Transaction
set2.Add(new Transaction(3, 60, 1));//Generate a new Transaction
set2.Add(new Transaction(6, 80, 1));//Generate a new Transaction
set2.Add(new Transaction(9, 100, 1));//Generate a new Transaction
set2.Add(new Transaction(12, 110, 1));//Generate a new Transaction

set1.IntersectWith(set2);//Get common elements between the lists
List<Transaction> intersectList = set1.ToList();
Console.WriteLine("intersectList:" + String.Join(",", intersectList));//Print to screen
Supporting Code
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
intersectList:3,9

HashSet produces exactly the same output as LINQ.
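Because IntersectWith changes the set it is called on, the original contents of set1 are lost after the call. When the original set is still needed, one option is to intersect a copy instead. This is a minimal sketch using ints rather than Transaction objects to keep it short; the variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;

var set1 = new HashSet<int> { 1, 3, 5, 7, 9 };
var set2 = new HashSet<int> { 0, 3, 6, 9, 12 };

// Copy set1 first so the in-place operation does not destroy the original.
var common = new HashSet<int>(set1);
common.IntersectWith(set2);

Console.WriteLine("set1 count:" + set1.Count);     // prints "set1 count:5"
Console.WriteLine("common count:" + common.Count); // prints "common count:2"
```

The copy costs one extra pass over set1, so it is only worth doing when the original set is reused afterwards.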

HashSet IntersectWith Speed Test

Next I will show a speed test for HashSet's IntersectWith. Since HashSet has its own built-in methods, and HashSet in general tends to be fast, I would expect IntersectWith to be really fast.


using System.Diagnostics;
int numberOfTests = 10;//Number of tests 
List<double> testSpeedList = new List<double>();

for (int i = 0; i < numberOfTests; i++)
{
    testSpeedList.Add(GetIntersectWithHashSetMethodSpeedTest());
}
Console.WriteLine($"HashSet IntersectWith Method Average speed:{Math.Round(testSpeedList.Average())}ms, In {numberOfTests} tests");

double GetIntersectWithHashSetMethodSpeedTest()
{
    int numberOfFunctionCalls = 10;//Number of function calls made
    Stopwatch stopwatch = new Stopwatch();
    HashSet<Transaction> set1 = GetInitialTransactionList();//Get intial random generated set
    HashSet<Transaction> set2 = GetInitialTransactionList();//Get intial random generated set
    for (int i = 0; i < numberOfFunctionCalls; i++)
    {
        stopwatch.Start();//Start the Stopwatch timer
        List<Transaction> intersectList = GetHashsetIntersectWith(set1, set2);//Run the HashSet IntersectWith method under test
        stopwatch.Stop();//Stop the Stopwatch timer
    }
    Console.WriteLine($"Function calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.Minutes}m {stopwatch.Elapsed.Seconds}s {stopwatch.Elapsed.Milliseconds}ms");
    return stopwatch.Elapsed.TotalMilliseconds;
}

List<Transaction> GetHashsetIntersectWith(HashSet<Transaction> list1, HashSet<Transaction> list2)
{
    list1.IntersectWith(list2);//Get common elements between the lists
    List<Transaction> intersectList = list1.ToList();
    return intersectList;
}
Supporting Code
int GetRandomInt(int maxNumber, int minNumber = 1)
{
    Random random = new Random();//Create Random class
    int randomInt = random.Next(minNumber, maxNumber);//Get a random number from minNumber (inclusive) up to maxNumber (exclusive)
    return randomInt;
}


HashSet<Transaction> GetInitialTransactionList()
{
    HashSet<Transaction> transactionList = new HashSet<Transaction>();//Create a new, empty Transaction set (Add ignores duplicates)
    int numberOfObjectsToCreate = 1000000;
    int maxTransactionNumber = 100000;
    int maxUserId = 9999;
    int minUserId = 1000;
    int maxDays = 5;
    for (int i = 0; i < numberOfObjectsToCreate; i++)
    {
        transactionList.Add(new Transaction(GetRandomInt(maxUserId, minUserId), GetRandomInt(maxTransactionNumber), GetRandomInt(maxDays)));//Add a new Transaction to the list
    }
    return transactionList;
}
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
Function calls:10, In 0m 0s 51ms
Function calls:10, In 0m 0s 47ms
Function calls:10, In 0m 0s 48ms
Function calls:10, In 0m 0s 50ms
Function calls:10, In 0m 0s 48ms
Function calls:10, In 0m 0s 47ms
Function calls:10, In 0m 0s 48ms
Function calls:10, In 0m 0s 48ms
Function calls:10, In 0m 0s 47ms
Function calls:10, In 0m 0s 49ms
HashSet IntersectWith Method Average speed:49ms, In 10 tests

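At an average of 49ms per test, HashSet's IntersectWith is significantly faster than LINQ's Intersect in this setup. Keep in mind that the two tests are not measuring identical work: the sets are built before the stopwatch starts, a HashSet drops duplicates as it is filled (UserId and day allow at most 8,999 × 4 = 35,996 distinct transactions here), and IntersectWith shrinks set1 in place, so calls after the first run on an already-intersected set.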
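The size of the sets in this test can be checked with a small experiment. The sketch below draws one million random (UserId, Day) keys using the same ranges as GetInitialTransactionList and counts how many distinct keys a HashSet retains. The tuple key and the fixed seed are assumptions made to keep the snippet short and repeatable.

```csharp
using System;
using System.Collections.Generic;

var random = new Random(42); // fixed seed so repeated runs draw the same keys
var keys = new HashSet<(int UserId, int Day)>();
int draws = 1_000_000;

for (int i = 0; i < draws; i++)
{
    // Same ranges as the test data: UserId in [1000, 9999), Day in [1, 5).
    keys.Add((random.Next(1000, 9999), random.Next(1, 5)));
}

// Upper bound on distinct keys: 8,999 user ids times 4 days.
int maxDistinct = (9999 - 1000) * (5 - 1);
Console.WriteLine($"draws:{draws}, distinct:{keys.Count}, upper bound:{maxDistinct}");
```

However many objects are generated, the set can never hold more than 35,996 entries, so the HashSet tests operate on far smaller inputs than the list tests.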

Union Methods

Union Image

Union is all unique entries from both circles, so both circles are highlighted. For lists, it is all unique elements from both lists. Union is a common operation when combining two lists into one. We'll first look at LINQ's Union method; see the example below.

LINQ's Union Method

LINQ provides a simple, compact one-liner for this, and since it is a built-in method that works on lists it comes in handy. We'll start with a simple example: a list of odd numbers and a list of multiples of 3.

List<int> list1 = new List<int>() { 1, 3, 5, 7, 9 };//odd numbers
List<int> list2 = new List<int>() { 0, 3, 6, 9, 12 };//multiples of 3

List<int> unionList = list1.Union(list2).ToList();//Get all unique elements between the lists
Console.WriteLine("unionList:" + String.Join(",", unionList));//Print to screen

Code Output
unionList:1,3,5,7,9,0,6,12
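To see what Union adds over simply appending one list to the other, it can be compared with LINQ's Concat, which keeps duplicates. This is a small illustrative sketch using the same two lists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list1 = new List<int> { 1, 3, 5, 7, 9 };   // odd numbers
var list2 = new List<int> { 0, 3, 6, 9, 12 };  // multiples of 3

// Concat appends list2 to list1 and keeps the repeated 3 and 9.
var concatList = list1.Concat(list2).ToList();
// Union yields each distinct value once, in order of first appearance.
var unionList = list1.Union(list2).ToList();

Console.WriteLine("concatList:" + string.Join(",", concatList)); // prints "concatList:1,3,5,7,9,0,3,6,9,12"
Console.WriteLine("unionList:" + string.Join(",", unionList));   // prints "unionList:1,3,5,7,9,0,6,12"
```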

Next is another example, this time using objects.


List<Transaction> list1 = new List<Transaction>();
list1.Add(new Transaction(1, 30, 1));//Generate a new Transaction
list1.Add(new Transaction(3, 40, 1));//Generate a new Transaction
list1.Add(new Transaction(5, 60, 1));//Generate a new Transaction
list1.Add(new Transaction(7, 70, 1));//Generate a new Transaction
list1.Add(new Transaction(9, 80, 1));//Generate a new Transaction


List<Transaction> list2 = new List<Transaction>();
list2.Add(new Transaction(0, 40, 1));//Generate a new Transaction
list2.Add(new Transaction(3, 60, 1));//Generate a new Transaction
list2.Add(new Transaction(6, 80, 1));//Generate a new Transaction
list2.Add(new Transaction(9, 100, 1));//Generate a new Transaction
list2.Add(new Transaction(12, 110, 1));//Generate a new Transaction

List<Transaction> unionList = list1.Union(list2).ToList();//Get all unique elements between the lists
Console.WriteLine("unionList:" + String.Join(",", unionList));//Print to screen

Supporting Code
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
unionList:1,3,5,7,9,0,6,12

This is the correct output: the odd numbers and the multiples of 3 combined into one list without duplicates.

LINQ's Union Speed Test

This will be the same test as conducted for Intersection except we'll use the union method.



using System.Diagnostics;
int numberOfTests = 10;//Number of tests 
List<double> testSpeedList = new List<double>();

for (int i = 0; i < numberOfTests; i++)
{
    testSpeedList.Add(GetUnionWithLINQMethodSpeedTest());
}
Console.WriteLine($"LINQ Union Method Average speed:{Math.Round(testSpeedList.Average())}ms, In {numberOfTests} tests");

double GetUnionWithLINQMethodSpeedTest()
{
    int numberOfFunctionCalls = 10;//Number of function calls made
    Stopwatch stopwatch = new Stopwatch();
    List<Transaction> list1 = GetInitialTransactionList();//Get intial random generated list
    List<Transaction> list2 = GetInitialTransactionList();//Get intial random generated list
    for (int i = 0; i < numberOfFunctionCalls; i++)
    {
        stopwatch.Start();//Start the Stopwatch timer
        List<Transaction> unionList = GetUnionWithLINQ(list1, list2);//Run the LINQ Union method under test
        stopwatch.Stop();//Stop the Stopwatch timer
    }
    Console.WriteLine($"Function calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.Minutes}m {stopwatch.Elapsed.Seconds}s {stopwatch.Elapsed.Milliseconds}ms");
    return stopwatch.Elapsed.TotalMilliseconds;
}

List<Transaction> GetUnionWithLINQ(List<Transaction> list1, List<Transaction> list2)
{
    List<Transaction> unionList = list1.Union(list2).ToList();//Get all unique elements from both lists
    return unionList;
}
Supporting Code
int GetRandomInt(int maxNumber, int minNumber = 1)
{
    Random random = new Random();//Create Random class
    int randomInt = random.Next(minNumber, maxNumber);//Get a random number from minNumber (inclusive) up to maxNumber (exclusive)
    return randomInt;
}


List<Transaction> GetInitialTransactionList()
{
    List<Transaction> transactionList = new List<Transaction>();//Create a new, empty Transaction list
    int numberOfObjectsToCreate = 1000000;
    int maxTransactionNumber = 100000;
    int maxUserId = 9999;
    int minUserId = 1000;
    int maxDays = 5;
    for (int i = 0; i < numberOfObjectsToCreate; i++)
    {
        transactionList.Add(new Transaction(GetRandomInt(maxUserId, minUserId), GetRandomInt(maxTransactionNumber), GetRandomInt(maxDays)));//Add a new Transaction to the list
    }
    return transactionList.OrderBy(x => x.TimeStamp).ThenBy(y => y.UserId).ToList();//Order the transactions by date and then user id
}

class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}

Code Output
Function calls:10, In 0m 3s 478ms
Function calls:10, In 0m 3s 370ms
Function calls:10, In 0m 3s 443ms
Function calls:10, In 0m 3s 479ms
Function calls:10, In 0m 3s 354ms
Function calls:10, In 0m 3s 422ms
Function calls:10, In 0m 3s 499ms
Function calls:10, In 0m 3s 541ms
Function calls:10, In 0m 3s 464ms
Function calls:10, In 0m 3s 441ms
LINQ Union Method Average speed:3450ms, In 10 tests

LINQ's Union method completes each test of 10 calls in about three and a half seconds, which is reasonable. We'll see if HashSet can improve on this.

HashSet Union Method

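HashSet can also be combined with Union. Note that the Union call used here is LINQ's extension method, which works on any IEnumerable and returns a new sequence without changing either set; HashSet's own method is UnionWith, which modifies the set in place. The nice thing about Union is that it is a one-liner. Let's see an example now.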

HashSet<Transaction> set1 = new HashSet<Transaction>();
set1.Add(new Transaction(1, 30, 1));//Generate a new Transaction
set1.Add(new Transaction(3, 40, 1));//Generate a new Transaction
set1.Add(new Transaction(5, 60, 1));//Generate a new Transaction
set1.Add(new Transaction(7, 70, 1));//Generate a new Transaction
set1.Add(new Transaction(9, 80, 1));//Generate a new Transaction


HashSet<Transaction> set2 = new HashSet<Transaction>();
set2.Add(new Transaction(0, 40, 1));//Generate a new Transaction
set2.Add(new Transaction(3, 60, 1));//Generate a new Transaction
set2.Add(new Transaction(6, 80, 1));//Generate a new Transaction
set2.Add(new Transaction(9, 100, 1));//Generate a new Transaction
set2.Add(new Transaction(12, 110, 1));//Generate a new Transaction

List<Transaction> unionList = set1.Union(set2).ToList();//Get all unique elements between the lists
Console.WriteLine("unionList:" + String.Join(",", unionList));//Print to screen
Supporting Code
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
unionList:1,3,5,7,9,0,6,12

This is the expected result for a union.
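The example above calls Union on a HashSet, which resolves to LINQ's extension method. For comparison, here is a minimal sketch of HashSet's own in-place method, UnionWith, using ints to keep it short. The result is sorted before printing because HashSet does not guarantee an enumeration order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var set1 = new HashSet<int> { 1, 3, 5, 7, 9 };
var set2 = new HashSet<int> { 0, 3, 6, 9, 12 };

// UnionWith returns void and adds the missing elements of set2 directly into set1.
set1.UnionWith(set2);

// Sort for display only; the set itself has no defined order.
var sorted = set1.OrderBy(x => x).ToList();
Console.WriteLine("unionWith:" + string.Join(",", sorted)); // prints "unionWith:0,1,3,5,6,7,9,12"
```

Unlike Union, this leaves no separate result collection: set1 itself becomes the union, and set2 is unchanged.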

HashSet Union Speed Test

We also need to test the speed of this method. Will it continue the trend of HashSet methods being faster than their LINQ counterparts? Let's see in the example below.


using System.Diagnostics;
int numberOfTests = 10;//Number of tests 
List<double> testSpeedList = new List<double>();

for (int i = 0; i < numberOfTests; i++)
{
    testSpeedList.Add(GetUnionHashSetMethodSpeedTest());
}
Console.WriteLine($"HashSet Union Method Average speed:{Math.Round(testSpeedList.Average())}ms, In {numberOfTests} tests");

double GetUnionHashSetMethodSpeedTest()
{
    int numberOfFunctionCalls = 10;//Number of function calls made
    Stopwatch stopwatch = new Stopwatch();
    HashSet<Transaction> set1 = GetInitialTransactionList();//Get intial random generated set
    HashSet<Transaction> set2 = GetInitialTransactionList();//Get intial random generated set
    for (int i = 0; i < numberOfFunctionCalls; i++)
    {
        stopwatch.Start();//Start the Stopwatch timer
        List<Transaction> unionList = GetHashsetUnion(set1, set2);//Run Union on the two HashSets
        stopwatch.Stop();//Stop the Stopwatch timer
    }
    Console.WriteLine($"Function calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.Minutes}m {stopwatch.Elapsed.Seconds}s {stopwatch.Elapsed.Milliseconds}ms");
    return stopwatch.Elapsed.TotalMilliseconds;
}

List<Transaction> GetHashsetUnion(HashSet<Transaction> list1, HashSet<Transaction> list2)
{
    List<Transaction> unionList  = list1.Union(list2).ToList();//Get all unique elements between the lists
    return unionList;
}
Supporting Code
int GetRandomInt(int maxNumber, int minNumber = 1)
{
    Random random = new Random();//Create Random class
    int randomInt = random.Next(minNumber, maxNumber);//Get a random number from minNumber (inclusive) up to maxNumber (exclusive)
    return randomInt;
}


HashSet<Transaction> GetInitialTransactionList()
{
    HashSet<Transaction> transactionList = new HashSet<Transaction>();//Create a new, empty Transaction set (Add ignores duplicates)
    int numberOfObjectsToCreate = 1000000;
    int maxTransactionNumber = 100000;
    int maxUserId = 9999;
    int minUserId = 1000;
    int maxDays = 5;
    for (int i = 0; i < numberOfObjectsToCreate; i++)
    {
        transactionList.Add(new Transaction(GetRandomInt(maxUserId, minUserId), GetRandomInt(maxTransactionNumber), GetRandomInt(maxDays)));//Add a new Transaction to the list
    }
    return transactionList;
}
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
Function calls:10, In 0m 0s 87ms
Function calls:10, In 0m 0s 81ms
Function calls:10, In 0m 0s 84ms
Function calls:10, In 0m 0s 83ms
Function calls:10, In 0m 0s 79ms
Function calls:10, In 0m 0s 84ms
Function calls:10, In 0m 0s 82ms
Function calls:10, In 0m 0s 81ms
Function calls:10, In 0m 0s 84ms
Function calls:10, In 0m 0s 83ms
HashSet Union Method Average speed:83ms, In 10 tests

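Of the two union methods, HashSet is the faster at about 83ms per test. It is another sub-second performance from HashSet, while both LINQ tests on lists take over 2 seconds. Note that each HashSet holds only distinct transactions (at most 35,996 here), so it is much smaller than a one-million-element list.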

Except Methods

Union Image

Except returns the items from the first list that do not appear in the second list. So if I have two lists, this method removes from the first list every item that also exists in the second list. Let's look at an example.

LINQ's Except Method

LINQ provides an Except method to get the difference between two lists. The result is all items from list 1 that do not appear in list 2.

List<int> list1 = new List<int>() { 1, 3, 5, 7, 9 };//odd numbers
List<int> list2 = new List<int>() { 0, 3, 6, 9, 12 };//multiples of 3

List<int> exceptList = list1.Except(list2).ToList();//Get elements of list1 that are not in list2
Console.WriteLine("exceptList:" + String.Join(",", exceptList));//Print to screen

Code Output
exceptList:1,5,7
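Unlike Intersect and Union, the result of Except depends on which list it is called on. The following short sketch, using the same two lists, shows both directions side by side; the variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list1 = new List<int> { 1, 3, 5, 7, 9 };   // odd numbers
var list2 = new List<int> { 0, 3, 6, 9, 12 };  // multiples of 3

// Elements of list1 that are not in list2.
var onlyInList1 = list1.Except(list2).ToList();
// Elements of list2 that are not in list1.
var onlyInList2 = list2.Except(list1).ToList();

Console.WriteLine("onlyInList1:" + string.Join(",", onlyInList1)); // prints "onlyInList1:1,5,7"
Console.WriteLine("onlyInList2:" + string.Join(",", onlyInList2)); // prints "onlyInList2:0,6,12"
```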

Here is another example using objects.


List<Transaction> list1 = new List<Transaction>();
list1.Add(new Transaction(1, 30, 1));//Generate a new Transaction
list1.Add(new Transaction(3, 40, 1));//Generate a new Transaction
list1.Add(new Transaction(5, 60, 1));//Generate a new Transaction
list1.Add(new Transaction(7, 70, 1));//Generate a new Transaction
list1.Add(new Transaction(9, 80, 1));//Generate a new Transaction


List<Transaction> list2 = new List<Transaction>();
list2.Add(new Transaction(0, 40, 1));//Generate a new Transaction
list2.Add(new Transaction(3, 60, 1));//Generate a new Transaction
list2.Add(new Transaction(6, 80, 1));//Generate a new Transaction
list2.Add(new Transaction(9, 100, 1));//Generate a new Transaction
list2.Add(new Transaction(12, 110, 1));//Generate a new Transaction

List<Transaction> exceptList = list1.Except(list2).ToList();//Get elements of list1 that are not in list2
Console.WriteLine("exceptList:" + String.Join(",", exceptList));//Print to screen
Supporting Code
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
exceptList:1,5,7

LINQ's Except Speed Test

Next we'll test how this Except method performs under a load of two lists with one million objects each. This is the same test as in the previous examples.


using System.Diagnostics;
int numberOfTests = 10;//Number of tests 
List<double> testSpeedList = new List<double>();

for (int i = 0; i < numberOfTests; i++)
{
    testSpeedList.Add(GetExceptWithLINQMethodSpeedTest());
}
Console.WriteLine($"LINQ Except Method Average speed:{Math.Round(testSpeedList.Average())}ms, In {numberOfTests} tests");

double GetExceptWithLINQMethodSpeedTest()
{
    int numberOfFunctionCalls = 10;//Number of function calls made
    Stopwatch stopwatch = new Stopwatch();
    List<Transaction> list1 = GetInitialTransactionList();//Get intial random generated list
    List<Transaction> list2 = GetInitialTransactionList();//Get intial random generated list
    for (int i = 0; i < numberOfFunctionCalls; i++)
    {
        stopwatch.Start();//Start the Stopwatch timer
        List<Transaction> exceptList = GetExceptWithLINQ(list1, list2);//Run the LINQ Except method under test
        stopwatch.Stop();//Stop the Stopwatch timer
    }
    Console.WriteLine($"Function calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.Minutes}m {stopwatch.Elapsed.Seconds}s {stopwatch.Elapsed.Milliseconds}ms");
    return stopwatch.Elapsed.TotalMilliseconds;
}

List<Transaction> GetExceptWithLINQ(List<Transaction> list1, List<Transaction> list2)
{
    List<Transaction> exceptList = list1.Except(list2).ToList();//Get all elements from list 1 that do not appear in list 2
    return exceptList;
}
Supporting Code
int GetRandomInt(int maxNumber, int minNumber = 1)
{
    Random random = new Random();//Create Random class
    int randomInt = random.Next(minNumber, maxNumber);//Get a random number from minNumber (inclusive) up to maxNumber (exclusive)
    return randomInt;
}


List<Transaction> GetInitialTransactionList()
{
    List<Transaction> transactionList = new List<Transaction>();//Create an empty Transaction list
    int numberOfObjectsToCreate = 1000000;
    int maxTransactionNumber = 100000;
    int maxUserId = 9999;
    int minUserId = 1000;
    int maxDays = 5;
    for (int i = 0; i < numberOfObjectsToCreate; i++)
    {
        transactionList.Add(new Transaction(GetRandomInt(maxUserId, minUserId), GetRandomInt(maxTransactionNumber), GetRandomInt(maxDays)));//Add a new Transaction to the list
    }
    return transactionList.OrderBy(x => x.TimeStamp).ThenBy(y => y.UserId).ToList();//Order the transactions by date and then user id
}

class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
Function calls:10, In 0m 3s 458ms
Function calls:10, In 0m 3s 420ms
Function calls:10, In 0m 3s 420ms
Function calls:10, In 0m 3s 379ms
Function calls:10, In 0m 3s 322ms
Function calls:10, In 0m 3s 453ms
Function calls:10, In 0m 3s 623ms
Function calls:10, In 0m 3s 364ms
Function calls:10, In 0m 3s 780ms
Function calls:10, In 0m 3s 444ms
LINQ Except Method Average speed:3467ms, In 10 tests

LINQ's Except comes in at about three and a half seconds per test, where each test makes 10 calls (roughly 350ms per call). This is very similar to the other LINQ methods tested so far.

HashSet's ExceptWith Method

HashSet also has a built-in method for the difference of two sets, ExceptWith. It is compact and easy to use. Note that it modifies the first set in place, so that set needs to be converted to a list after the operation if a list is what you want. Let's look at an example.

HashSet<Transaction> set1 = new HashSet<Transaction>();
set1.Add(new Transaction(1, 30, 1));//Generate a new Transaction
set1.Add(new Transaction(3, 40, 1));//Generate a new Transaction
set1.Add(new Transaction(5, 60, 1));//Generate a new Transaction
set1.Add(new Transaction(7, 70, 1));//Generate a new Transaction
set1.Add(new Transaction(9, 80, 1));//Generate a new Transaction


HashSet<Transaction> set2 = new HashSet<Transaction>();
set2.Add(new Transaction(0, 40, 1));//Generate a new Transaction
set2.Add(new Transaction(3, 60, 1));//Generate a new Transaction
set2.Add(new Transaction(6, 80, 1));//Generate a new Transaction
set2.Add(new Transaction(9, 100, 1));//Generate a new Transaction
set2.Add(new Transaction(12, 110, 1));//Generate a new Transaction

set1.ExceptWith(set2);//Remove from set 1 every element that also appears in set 2
List<Transaction> exceptList = set1.ToList();
Console.WriteLine("exceptList:" + String.Join(",", exceptList));//Print to screen
Supporting Code
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}
Code Output
exceptList:1,5,7
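
Because ExceptWith changes the set it is called on, the original contents of set1 are gone after the call. If both the original set and the difference are needed, one option is to copy the set first. The following is a minimal sketch of that idea using ints for brevity; the variable name difference is just an illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

HashSet<int> set1 = new HashSet<int> { 1, 3, 5, 7, 9 };   // odd numbers
HashSet<int> set2 = new HashSet<int> { 0, 3, 6, 9, 12 };  // multiples of 3

// Copy set1 so the original is left untouched by ExceptWith.
HashSet<int> difference = new HashSet<int>(set1);
difference.ExceptWith(set2);

// HashSet does not guarantee enumeration order, so sort before printing.
Console.WriteLine("difference:" + string.Join(",", difference.OrderBy(x => x))); // difference:1,5,7
Console.WriteLine("set1 count:" + set1.Count); // set1 count:5
```

The copy costs extra time and memory, so it only makes sense when the original set is still needed afterwards.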

HashSet's ExceptWith Method Speed Test

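I will complete the test for HashSet's ExceptWith method. It is expected to be fast, as the other HashSet methods were also fast. This uses the same test setup as the other methods, with two caveats when reading the result. First, ExceptWith modifies set1 in place, so only the first of the 10 calls has anything to remove. Second, a HashSet drops duplicates as it is filled, so the one million generated transactions collapse to at most 35,996 distinct items per set (8,999 possible user ids times 4 possible days), which is far fewer than the one million items in the LINQ lists.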



using System.Diagnostics;
int numberOfTests = 10;//Number of tests 
List<double> testSpeedList = new List<double>();

for (int i = 0; i < numberOfTests; i++)
{
    testSpeedList.Add(GetExceptWithHashSetMethodSpeedTest());
}
Console.WriteLine($"HashSet ExceptWith Method Average speed:{Math.Round(testSpeedList.Average())}ms, In {numberOfTests} tests");

double GetExceptWithHashSetMethodSpeedTest()
{
    int numberOfFunctionCalls = 10;//Number of function calls made
    Stopwatch stopwatch = new Stopwatch();
    HashSet<Transaction> set1 = GetInitialTransactionList();//Get initial randomly generated set
    HashSet<Transaction> set2 = GetInitialTransactionList();//Get initial randomly generated set
    for (int i = 0; i < numberOfFunctionCalls; i++)
    {
        stopwatch.Start();//Start (or resume) the Stopwatch timer
        List<Transaction> exceptList = GetHashsetExceptWith(set1, set2);//Get all elements from set 1 that do not appear in set 2
        stopwatch.Stop();//Pause the Stopwatch timer
    }
    Console.WriteLine($"Function calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.Minutes}m {stopwatch.Elapsed.Seconds}s {stopwatch.Elapsed.Milliseconds}ms");
    return stopwatch.Elapsed.TotalMilliseconds;
}

List<Transaction> GetHashsetExceptWith(HashSet<Transaction> set1, HashSet<Transaction> set2)
{
    set1.ExceptWith(set2);//Remove from set 1 every element that also appears in set 2 (modifies set 1)
    List<Transaction> exceptList = set1.ToList();
    return exceptList;
}
Supporting Code
int GetRandomInt(int maxNumber, int minNumber = 1)
{
    Random random = new Random();//Create Random class
    int randomInt = random.Next(minNumber, maxNumber);//Get a random number from minNumber (inclusive) up to maxNumber (exclusive)
    return randomInt;
}


HashSet<Transaction> GetInitialTransactionList()
{
    HashSet<Transaction> transactionList = new HashSet<Transaction>();//Create an empty Transaction set
    int numberOfObjectsToCreate = 1000000;
    int maxTransactionNumber = 100000;
    int maxUserId = 9999;
    int minUserId = 1000;
    int maxDays = 5;
    for (int i = 0; i < numberOfObjectsToCreate; i++)
    {
        transactionList.Add(new Transaction(GetRandomInt(maxUserId, minUserId), GetRandomInt(maxTransactionNumber), GetRandomInt(maxDays)));//Add a new Transaction to the list
    }
    return transactionList;
}
class Transaction
{
    public Transaction(int userId, int price, int day)
    {
        UserId = userId;
        Price = price;
        TimeStamp = new DateTime(2022, 12, day);//Keep year and month fixed and let the day vary
    }
    public int UserId { get; set; }
    public int Price { get; set; }
    public DateTime TimeStamp { get; set; }

    public override bool Equals(object? obj)
    {
        return obj is Transaction transaction &&
               UserId == transaction.UserId &&
               TimeStamp.Day == transaction.TimeStamp.Day &&
               TimeStamp.Month == transaction.TimeStamp.Month &&
               TimeStamp.Year == transaction.TimeStamp.Year;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(UserId, TimeStamp.Year, TimeStamp.Month, TimeStamp.Day);
    }

    public override string? ToString()
    {
        return UserId.ToString();
    }
}

Code Output
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 5ms
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 6ms
Function calls:10, In 0m 0s 6ms
HashSet ExceptWith Method Average speed:6ms, In 10 tests
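
When comparing this number with the LINQ result, it can help to check how many items the sets really hold and what repeated ExceptWith calls do. The sketch below is a rough check, not part of the original test: it uses (UserId, Day) tuples as a stand-in for Transaction, since those are the only values that take part in equality, and a fixed random seed, which is an assumption made for repeatable runs.

```csharp
using System;
using System.Collections.Generic;

Random random = new Random(42); // fixed seed is an assumption, for repeatable runs

// Build a set from one million random (userId, day) pairs using the same
// ranges as the test above. The upper bound of Next is exclusive, so user ids
// run from 1000 to 9998 and days from 1 to 4.
HashSet<(int UserId, int Day)> BuildSet()
{
    var set = new HashSet<(int UserId, int Day)>();
    for (int i = 0; i < 1_000_000; i++)
    {
        set.Add((random.Next(1000, 9999), random.Next(1, 5)));
    }
    return set;
}

var set1 = BuildSet();
var set2 = BuildSet();

int distinctBefore = set1.Count;   // cannot exceed 8999 * 4 = 35996
set1.ExceptWith(set2);
int afterFirstCall = set1.Count;
set1.ExceptWith(set2);             // nothing left in common, so nothing is removed
int afterSecondCall = set1.Count;

Console.WriteLine($"Distinct items before: {distinctBefore}");
Console.WriteLine($"After first ExceptWith: {afterFirstCall}");
Console.WriteLine($"After second ExceptWith: {afterSecondCall}");
```

For a closer like-for-like comparison, one option would be to rebuild or copy set1 before each timed call so every call starts from the same data.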

Conclusion

Overall Rank | Method                   | Speed Test | Order Preserved
1            | HashSet's ExceptWith     | 6ms        | No
2            | HashSet's IntersectWith  | 49ms       | No
3            | HashSet's Union          | 49ms       | No
4            | LINQ's Intersect         | 2692ms     | Yes
5            | LINQ's Union             | 3450ms     | Yes
6            | LINQ's Except            | 3467ms     | Yes

The best methods for intersect, union and except are those of the HashSet collection type. They were very fast in these tests, even when starting from two collections built from one million objects. These methods are best when the order of the items doesn't matter, because HashSet does not guarantee any ordering. Depending on the size of the result, though, you could just use the HashSet operations and then sort afterwards.
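
As a minimal sketch of the "HashSet first, sort afterwards" idea, the example below runs ExceptWith and then orders the result by day and user id, similar to how the lists in the LINQ tests were ordered. The (UserId, Day) tuples and the sample values are assumptions chosen to keep the example short; they stand in for the Transaction class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var set1 = new HashSet<(int UserId, int Day)>
{
    (9, 2), (1, 1), (7, 1), (3, 2), (5, 2)
};
var set2 = new HashSet<(int UserId, int Day)>
{
    (3, 2), (9, 2)
};

// Fast set difference; the enumeration order of set1 is not guaranteed.
set1.ExceptWith(set2);

// Restore a well-defined order on the (usually much smaller) result.
List<(int UserId, int Day)> sorted = set1
    .OrderBy(t => t.Day)
    .ThenBy(t => t.UserId)
    .ToList();

Console.WriteLine(string.Join(",", sorted.Select(t => $"{t.UserId}/day{t.Day}")));
// 1/day1,7/day1,5/day2
```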

If you need the list to keep its order and you want to keep things simple, then the LINQ functions are worth looking into, provided the dataset you are dealing with isn't too big.

Do you have any methods for Set Theory operations? Post in the comments.
