Best Removal Method In C# Of Duplicates From An Array

Best Removal Method Of Duplicates From An Array Banner Image

Introduction

To remove duplicate items from a list we have a couple of options in C#. One is to use LINQ Distinct to remove the items and then apply ToArray to convert the IEnumerable to an array. The other option is to write a method. Sometimes LINQ doesn't perform well and I'm going to test both of these methods to see which is faster. For one smaller array, it may not matter so much which one we choose, but if you are working with large datasets then you'll pick the fastest method.

Video With Examples

Summary Table

Analysis TypeLINQ DistinctFor Loop With HashSet And List
.NET Version Available>= .NET 3.5>= .NET 3.5
PerformanceApproaches O(N)Approaches O(N)
Lines Of Code111
Order MaintainedNoYes

Distinct Values Using LINQ Distinct

This is a two-step process. LINQ Distinct returns an IEnumerable which then has to be converted to an array. This method also returns an unordered sequence that has no duplicates. The nice thing about this method it only takes up one line of code and is very readable since it does exactly what it says it does. And since it has been available since .NET 3.5, it should be available on you're system unless your using something much older. In terms of performance, it needs to go through the entire array to find any duplicates and remove them.

LINQ Distinct Video Example

LINQ Distinct Code Example

This code example shows the different types and how each one of the duplicates is removed. First, the starting array is printed and then you can compare it to after LINQ distinct is used and indeed the duplicates are removed.

string[] exerciseIdeasArray = new string[5] { "Squats", "Deadlifts", "Bench press", "Deadlifts", "Squats" };//Declare a string array
int[] intArray = new int[10] { 1, 2, 3, 3, 4, 6, 6, 7, 1, 35 };//Declare an int array
double[] doubleArray = new double[10] { 1.2, 1.5, 2.5, 1.5, 2.5, 44.9, 821, 28, 1.2, 10 };//Declare a double array
float[] floatArray = new float[10] { 1f, 2f, 3f, 4f, 5f, 1f, 2f, 10f, 8f, 9f };//Declare a float array
bool[] boolArray = new bool[5] { true, false, true, false, false};//Declare a bool array

RemoveDuplicates<string>(exerciseIdeasArray);
RemoveDuplicates<int>(intArray);
RemoveDuplicates<double>(doubleArray);
RemoveDuplicates<float>(floatArray);
RemoveDuplicates<bool>(boolArray);

void RemoveDuplicates<T>(T[] array)
{
    Console.WriteLine($"Start:{string.Join(",", array)}");//Print original array
    T[] distinctArray = array.Distinct().ToArray();//Use LINQ Distinct to remove duplicates from the array
    Console.WriteLine($"End:{string.Join(",", distinctArray)}");//Print original array after duplicates are removed
    Console.WriteLine();//Print original array after duplicates are removed

}
Code Output
Start:Squats,Deadlifts,Bench press,Deadlifts,Squats
End:Squats,Deadlifts,Bench press

Start:1,2,3,3,4,6,6,7,1,35
End:1,2,3,4,6,7,35

Start:1.2,1.5,2.5,1.5,2.5,44.9,821,28,1.2,10
End:1.2,1.5,2.5,44.9,821,28,10

Start:1,2,3,4,5,1,2,10,8,9
End:1,2,3,4,5,10,8,9

Start:True,False,True,False,False
End:True,False

Distinct Values Using A For Loop With HashSet

Next, this is a method I wrote myself and have used at other times to remove duplicates. HashSet is useful in removing duplicates because it doesn't store any duplicates themselves. This can be used anytime after .NET 3.5 so it is just as available as LINQ Distinct is.

For Loop With HashSet And List Example Code

This method loops through the array and saves the distinct values in a list and hashset. The hashset is used

string[] exerciseIdeasArray = new string[5] { "Pull-ups", "Lunges", "Rows", "Lunges", "Pull-ups" };//Declare a string array
int[] intArray = new int[10] { 1, 4, 3, 3, 4, 6, 6, 4, 1, 35 };//Declare an int array
double[] doubleArray = new double[10] { 1.4, 1.5, 2.5, 1.5, 2.5, 44.9, 821, 28, 1.4, 10 };//Declare a double array
float[] floatArray = new float[10] { 12f, 89f, 3f, 4f, 5f, 12f, 2f, 10f, 8f, 89f };//Declare a float array
bool[] boolArray = new bool[5] { false, true, true, true, false };//Declare a bool array

RemoveDuplicates<string>(exerciseIdeasArray);
RemoveDuplicates<int>(intArray);
RemoveDuplicates<double>(doubleArray);
RemoveDuplicates<float>(floatArray);
RemoveDuplicates<bool>(boolArray);

void RemoveDuplicates<T>(T[] array)
{
    Console.WriteLine($"Start:{string.Join(",", array)}");//Print original array
    HashSet<T> distinctValues = new HashSet<T>();//Store distinct values so it they can be quickly checked against later
    List<T> list = new List<T>();//Store the distinct values to preserve the order
    for (int i = 0; i < array.Length; i++)//Loop through the array
    {
        if (!distinctValues.Contains(array[i]))//If not in Hashset then add it to the list.
        {
            distinctValues.Add(array[i]);//Add distinst value to HashSet
            list.Add(array[i]);//Add distinct value to list to preseve order.
        }
    }
    T[] distinctArray = list.ToArray();//Convert list to array
    Console.WriteLine($"End:{string.Join(",", distinctArray)}");//Print original array after duplicates are removed
    Console.WriteLine();//Print original array after duplicates are removed
}
Code Output
Start:Pull-ups,Lunges,Rows,Lunges,Pull-ups
End:Pull-ups,Lunges,Rows

Start:1,4,3,3,4,6,6,4,1,35
End:1,4,3,6,35

Start:1.4,1.5,2.5,1.5,2.5,44.9,821,28,1.4,10
End:1.4,1.5,2.5,44.9,821,28,10

Start:12,89,3,4,5,12,2,10,8,89
End:12,89,3,4,5,2,10,8

Start:False,True,True,True,False
End:False,True
C# Programming for Unity Game Development Specialization

COURSERA

C# Programming for Unity Game Development Specialization

4.7 (2,230 reviews)

Beginner level

No previous experience is necessary

3-months at 10 hours a week (At your pace)

$59 / month or audit, and financial aid available

If you want to learn more about C# then check out this course on Coursera. It's a 3 month course to take you through the basics of C#.

You can get started with programming using C# and apply it to Unity games in this beginner-level program. Facilitated by the University of Colorado, the program offers a chance to grasp the basics of C# and utilize the knowledge to develop Unity games.

[Full Disclosure: As an affiliate, I receive compensation if you purchase through this link]

Start Your 7-day Free Trial

Performance

Next, I want to test the performance. It will be an average of 10 tests. Each test will have 100 function calls of the test method. Each test method will have 1 million array objects to check for distinct.

Test Parameters
Test ParametersTotal
Tests10
Function Calls Per Test100
Objects Per Function Call1 Million

LINQ Distinct Speed Test Code

void TestMethod(string[] array)
{
    string[] distinctArray = array.Distinct().ToArray();//Use LINQ Distinct to remove duplicates from the array
}

For Loop With HashSet And List Code

void TestMethod(string[] array)
{
    HashSet<string> distinctValues = new HashSet<string>();//Store distinct values so it they can be quickly checked against later
    List<string> list = new List<string>();//Store the distinct values to preserve the order
    for (int i = 0; i < array.Length; i++)//Loop through the array
    {
        if (!distinctValues.Contains(array[i]))//If not in HashSet then add it to the list.
        {
            distinctValues.Add(array[i]);//Add distinst value to HashSet
            list.Add(array[i]);//Add distinct value to list to preseve order.
        }
    }
    string[] distinctArray = list.ToArray();//Convert list to array
}

LINQ Distinct Speed Test Code Output

Test 1:Function Calls:100, In 0m 8s 283ms
Test 2:Function Calls:100, In 0m 8s 263ms
Test 3:Function Calls:100, In 0m 8s 279ms
Test 4:Function Calls:100, In 0m 8s 327ms
Test 5:Function Calls:100, In 0m 8s 355ms
Test 6:Function Calls:100, In 0m 8s 346ms
Test 7:Function Calls:100, In 0m 8s 336ms
Test 8:Function Calls:100, In 0m 9s 206ms
Test 9:Function Calls:100, In 0m 8s 475ms
Test 10:Function Calls:100, In 0m 8s 485ms
LINQ Distinct Average Speed:8436ms, In 10 Tests

For Loop With HashSet And List Speed Test Code Output

Test 1:Function Calls:100, In 0m 10s 774ms
Test 2:Function Calls:100, In 0m 10s 985ms
Test 3:Function Calls:100, In 0m 10s 179ms
Test 4:Function Calls:100, In 0m 10s 539ms
Test 5:Function Calls:100, In 0m 10s 398ms
Test 6:Function Calls:100, In 0m 10s 650ms
Test 7:Function Calls:100, In 0m 10s 517ms
Test 8:Function Calls:100, In 0m 11s 65ms
Test 9:Function Calls:100, In 0m 10s 645ms
Test 10:Function Calls:100, In 0m 10s 488ms
LINQ Distinct Average Speed:10625ms, In 10 Tests
Full LINQ Distinct Speed Test Code
using System.Diagnostics;

int numberOfTests = 10;//Number of tests 
int numberOfFunctionCalls = 100;//Number of function calls made per test
int numberOfObjectsToCreate = 1000000;//Number test objects
int lengthOfRandomString = 50;
string testName = "LINQ Distinct";//Test name to print to average

void TestMethod(string[] array)
{
    string[] distinctArray = array.Distinct().ToArray();//Use LINQ Distinct to remove duplicates from the array
}

List<double> testSpeedList = new List<double>();
for (int testIndex = 0; testIndex < numberOfTests; testIndex++)
{
    testSpeedList.Add(StartTest(testIndex));
}
Console.WriteLine($"{testName} Average Speed:{Math.Round(testSpeedList.Average())}ms, In {numberOfTests} Tests");

double StartTest(int testIndex)
{
    Stopwatch stopwatch = new Stopwatch();
    string[] testData = GetArrayData();//Get intial random generated data
    for (int i = 0; i < numberOfFunctionCalls; i++)
    {
        stopwatch.Start();//Start the Stopwatch timer
        TestMethod(testData);//
        stopwatch.Stop();//Stop the Stopwatch timer
    }
    stopwatch.Stop();//Stop the Stopwatch timer
    Console.WriteLine($"Test {testIndex + 1}:Function Calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.Minutes}m {stopwatch.Elapsed.Seconds}s {stopwatch.Elapsed.Milliseconds}ms");
    return stopwatch.Elapsed.TotalMilliseconds;
}

string[] GetArrayData()
{
    string[] testData = new string[numberOfObjectsToCreate];
    for (int i = 0; i < numberOfObjectsToCreate; i++)
    {
        string key = GenerateRandomString(lengthOfRandomString);

        string value = GenerateRandomString(lengthOfRandomString);
        testData[i] = value;

    }
    return testData;
}

string GenerateRandomString(int length)
{
    Random random = new Random();
    const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    return new string(Enumerable.Repeat(chars, length)
      .Select(s => s[random.Next(s.Length)]).ToArray());
}

Conclusion

Overall RankMethodSpeed
1LINQ Distinct8436ms
2For Loop With HashSet And List10625ms

The best method for removing duplicates is LINQ Distinct. It is the fastest method, with short and sweet syntax. It is very readable when reviewing the code. But the only downside to it is that is not guaranteed to preserve the order so if you need the order preserved then you'll need to use the next method on the list.

For Loop With HashSet And List is two seconds slower with a million objects isn't that bad but it does preserve the order since a list is involved. It does have more code with it so this is good if you need to order the array preserved otherwise it's better to go with LINQ Distinct in most cases.

Know any other ways to remove duplicates from an array? Let me know in the comments below.

Get Latest Updates