In C#, Get A Substring Before Or After A Certain Character

Introduction

Pulling a piece out of a larger string comes up constantly when parsing user input, database values, and data generated by scripts. In C#, that often means extracting a name, a date, or some other field and returning it as a substring. There is more than one way to do this; the approaches below cover the common case of cutting a string before or after a particular character.

Summary Table

| Analysis Type | IndexOf and Substring | String Concat And LINQ | String Split |
| --- | --- | --- | --- |
| Rank | 1 | 3 | 2 |
| Speed Test | 2562ms | 16299ms | 12930ms |
| .NET Version Available | >= .NET Framework 1.1 | .NET Framework 4.0 | .NET Framework 3.5 |
| Lines Of Code | 4 | 1 | 4 |

Get A Substring After A Certain Character

Substring After By IndexOf and Substring

The first step is to find the character's index with IndexOf. That index plus one becomes the start of the new substring, and every character from there to the end of the string is returned.

Starting String
012345678
sixty-two

Approach

Take the compound number sixty-two, which contains a hyphen. To get all the text after the first hyphen, use the string IndexOf method to find the hyphen (or any other character we need). Here IndexOf returns 5. We don't want to start the substring at 5 because that would include the hyphen, so we add one to get the start position just after it, index 6. The substring then covers indexes 6 to 8. This Substring overload also needs a length for the new string, which is the length of the original string minus the start index: 9 - 6 = 3. See the code example below.

Code Example
string compoundNumber = "sixty-two";//Original string before manipulation
int indexCharLength = 1;//Length of the character we want to skip over for the substring
char indexChar = '-';//Character that we want to find the index of
int substringStartIndex = compoundNumber.IndexOf(indexChar) + indexCharLength;//Get the index of the hyphen plus 1 to get the start index just after the hyphen
int length = compoundNumber.Length - substringStartIndex;//This yields a length of 3 since the start index is 6 and the string length is 9
string number = compoundNumber.Substring(substringStartIndex, length);//Start index is 6 and length is 3, so the substring covers indexes 6, 7, and 8 to give the "two" at the end
Console.WriteLine("compoundNumber:" + compoundNumber);//Print to screen starting string
Console.WriteLine("number:" + number);//Print to screen ending string
Code Output
compoundNumber:sixty-two
number:two

Multiple String Approach

That works for a single string. To run it against several strings where the hyphen sits at a different position in each, the code needs to be placed into a function that can be called many times. Below is an example.

Code Example
List<string> compoundNumberList = new List<string>() { "sixty-two", "seventy-five", "fifty-nine", "ninety-three" };
foreach (string compoundNumber in compoundNumberList)
{
    string number = GetSubstringAfterOfSpecificChar(compoundNumber);//Call function to get the substring after the character
    Console.WriteLine("compoundNumber:" + compoundNumber + ", number:" + number);//Print original and function output
}
string GetSubstringAfterOfSpecificChar(string compoundNumber)
{
    const int indexCharLength = 1;//Length of the character we want to skip over for the substring
    const char indexChar = '-';//Character that we want to find the index to
    int substringStartIndex = compoundNumber.IndexOf(indexChar) + indexCharLength;//Get an index of the hyphen plus 1 to get the start index just after the hyphen
    string number = compoundNumber.Substring(substringStartIndex);//Start index is determined by IndexOf then the rest of the string is returned in the substring
    return number;
}
Code Output
compoundNumber:sixty-two, number:two
compoundNumber:seventy-five, number:five
compoundNumber:fifty-nine, number:nine
compoundNumber:ninety-three, number:three
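
One case the function above does not cover is an input with no hyphen at all. IndexOf returns -1 when the character is not found, so adding 1 gives a start index of 0 and the whole input comes back unchanged. Below is a minimal sketch of a guarded variant. The name GetSubstringAfterOrEmpty and the choice to return an empty string are assumptions made for illustration, not part of the original code.

```csharp
using System;

// Hypothetical helper: the name and the empty-string fallback are assumptions.
string GetSubstringAfterOrEmpty(string text, char indexChar)
{
    int index = text.IndexOf(indexChar);//-1 when the character is not present
    if (index < 0)
    {
        return string.Empty;//Nothing comes "after" a character that is missing
    }
    return text.Substring(index + 1);//Everything to the right of the first match
}

Console.WriteLine(GetSubstringAfterOrEmpty("sixty-two", '-'));//two
Console.WriteLine(GetSubstringAfterOrEmpty("sixty", '-').Length);//0
```

Whether a missing character should produce an empty string, the original string, or an exception depends on the data being parsed, so treat the fallback as a placeholder.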

Multiple Hyphen With Old Approach

The next challenge is an input string with more than one hyphen. Let's see what happens when we run the same code on a different input set, because the result may not be what we want. Suppose the inputs are phone numbers and we want to show only the last 4 digits and block out the rest of the number.

Code Example
List<string> compoundNumberList = new List<string>() { "324-380-3289", "845-389-3288", "383-383-9599", "849-458-3939" };
foreach (string compoundNumber in compoundNumberList)
{
    string number = GetSubstringToLeftOfSpecificChar(compoundNumber);//Call function to get the substring after the first hyphen
    Console.WriteLine("compoundNumber:" + compoundNumber + ", number:***-***-" + number);//Print original and function output
}
string GetSubstringToLeftOfSpecificChar(string compoundNumber)
{
    const int indexCharLength = 1;//Length of the character we want to skip over for the substring
    const char indexChar = '-';//Character that we want to find the index to
    int substringStartIndex = compoundNumber.IndexOf(indexChar) + indexCharLength;//Get index of the hyphen plus 1 to get start index just after the hyphen
    string number = compoundNumber.Substring(substringStartIndex);//Start index is determined by IndexOf then the rest of the string is returned in the substring
    return number;
}
Code Output
compoundNumber:324-380-3289, number:***-***-380-3289
compoundNumber:845-389-3288, number:***-***-389-3288
compoundNumber:383-383-9599, number:***-***-383-9599
compoundNumber:849-458-3939, number:***-***-458-3939
Starting Phone Number Example
Index:  0  1  2  3  4  5  6  7  8  9 10 11
Char:   3  2  4  -  3  8  0  -  3  2  8  9

What Went Wrong When There Are Multiple Hyphens?

As you can see from the output, the substring was cut at the first hyphen (index 3), so it runs from index 4 onwards. What we want is to cut at the second hyphen (index 7) and take index 8 onwards.

Multiple Hyphens With Correct Approach

To do this we need to use LastIndexOf instead of IndexOf to get the last hyphen in the string. See the example below.

Code Example
List<string> compoundNumberList = new List<string>() { "324-380-3289", "845-389-3288", "383-383-9599", "849-458-3939" };
foreach (string compoundNumber in compoundNumberList)
{
    string number = GetSubstringToAfterASpecificChar(compoundNumber);//Call function to get the substring after the last hyphen
    Console.WriteLine("compoundNumber:" + compoundNumber + ", number:***-***-" + number);//Print original and function output
}
string GetSubstringToAfterASpecificChar(string compoundNumber)
{
    const int indexCharLength = 1;//Length of the character we want to skip over for the substring
    const char indexChar = '-';//Character that we want to find the index to
    int substringStartIndex = compoundNumber.LastIndexOf(indexChar) + indexCharLength;//Get an index of the last hyphen plus 1 to get the start index just after the hyphen
    string number = compoundNumber.Substring(substringStartIndex);//Start index is determined by LastIndexOf then the rest of the string is returned in the substring
    return number;
} 
Code Output
compoundNumber:324-380-3289, number:***-***-3289
compoundNumber:845-389-3288, number:***-***-3288
compoundNumber:383-383-9599, number:***-***-9599
compoundNumber:849-458-3939, number:***-***-3939

Multiple Hyphens With Correct Approach Recap

As we can see, using LastIndexOf now correctly gets the position of the last hyphen, so the substring is only the last 4 digits of the phone number.
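
The difference between the two calls is easiest to see by printing both indexes for the same input. This is a small sketch using one of the phone numbers from the example above; the variable names are chosen only for illustration.

```csharp
using System;

string phoneNumber = "324-380-3289";
int firstHyphen = phoneNumber.IndexOf('-');//Index of the first hyphen
int lastHyphen = phoneNumber.LastIndexOf('-');//Index of the last hyphen

Console.WriteLine("IndexOf:" + firstHyphen);//IndexOf:3
Console.WriteLine("LastIndexOf:" + lastHyphen);//LastIndexOf:7
Console.WriteLine(phoneNumber.Substring(firstHyphen + 1));//380-3289
Console.WriteLine(phoneNumber.Substring(lastHyphen + 1));//3289
```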

Substring After By String Concat And LINQ

string startingText = "Matthew.Smith@example.com";

char indexChar = '@';//Character to search for
int index = startingText.IndexOf(indexChar) + 1;//First index after the character

string domain = string.Concat(startingText.Take(Range.StartAt(index)));//Get all characters after the specific character (the Take overload that accepts a Range requires .NET 6 or later)

Console.WriteLine(domain);
Code Output
example.com
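
The Take overload that accepts a Range was added in .NET 6. On earlier versions, Skip should give the same result. The sketch below reuses the sample address from above and is meant as an alternative spelling of the same idea, not a faster one.

```csharp
using System;
using System.Linq;

string startingText = "Matthew.Smith@example.com";
char indexChar = '@';//Character to search for
int index = startingText.IndexOf(indexChar) + 1;//First index after the '@'

string domain = string.Concat(startingText.Skip(index));//Skip everything up to and including the '@'

Console.WriteLine(domain);//example.com
```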

Substring After By String Split

Video:Performance Minded Coding | Avoid String Split for C# Substring After A Character

https://youtube.com/shorts/sgjr30pF9hE?feature=share

string fileNameWithExtension = "Street430_10302001.csv";//Original filename with extension before manipulation
const char indexChar = '.';//Character to split the string on
const int extensionIndex = 1;//After the split, index 0 holds the file name and index 1 holds the text after the period
string[] splitUpFileName = fileNameWithExtension.Split(indexChar);//Split the string at each period
string extension = splitUpFileName[extensionIndex];//Take the part after the period
Console.WriteLine("fileNameWithExtension:" + fileNameWithExtension);//Print to screen starting string
Console.WriteLine("extension:" + extension);//Print to screen ending string
Code Output
fileNameWithExtension:Street430_10302001.csv
extension:csv
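
A fixed index of 1 only works when the string contains exactly one period. With more than one, Split returns more than two parts and index 1 is no longer the last one. The sketch below shows this with a file name that has two periods and takes the final element instead; the sample file name is an assumption made for illustration.

```csharp
using System;

string fileNameWithExtension = "Street430.10302001.csv";//Sample name with two periods
string[] parts = fileNameWithExtension.Split('.');//Three parts: Street430, 10302001, csv

Console.WriteLine(parts.Length);//3
Console.WriteLine(parts[1]);//10302001, which is not the extension
Console.WriteLine(parts[parts.Length - 1]);//csv
```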

Get A Substring Before A Certain Character

Substring Before By IndexOf and Substring

To get the text before a certain character, use IndexOf to find that character, then pass 0 as the start of the substring and the index returned by IndexOf as its length. See the example below.

Starting String

Street430_10302001.csv

Problem Statement

We want to find the file name and exclude the file extension. We are only given the file name with an extension from the database and have to parse the string to get the file name only.

Approach

To solve this we use IndexOf, as in previous examples, to find the index of the period, which is the endpoint for our substring. Once that is found, we form the substring by passing a start index of zero and a length equal to the index of the period. See the code example below.

Code Example
string fileNameWithExtension = "Street430_10302001.csv";//Original filename with extension before manipulation
const int startIndex = 0;//Start index for the substring
char indexChar = '.';//Character that we want to find the index of
int substringEndIndex = fileNameWithExtension.IndexOf(indexChar);//Get index of the period
string fileName = fileNameWithExtension.Substring(startIndex, substringEndIndex);//Start index is 0 and the length is the index of the period
Console.WriteLine("fileNameWithExtension:" + fileNameWithExtension);//Print to screen starting string
Console.WriteLine("fileName:" + fileName);//Print to screen ending string
Code Output
fileNameWithExtension:Street430_10302001.csv
fileName:Street430_10302001
New Problem Statement

Now, what if we want to extract the file name from several file names with extensions, where the length of the file name may be different each time? Let's see an example.

Code Example
List<string> fileNameWithExtensions = new List<string>() { "Street430_10302001.xlsx", "Building430_1152003.csv", "Shop4620_01032020.xlsm", "Drive5405_12042021.xls" };
foreach (string fileNameWithExtension in fileNameWithExtensions)
{
    string fileName = GetSubstringBeforeASpecificChar(fileNameWithExtension);//Call function to get the substring before the character
    Console.WriteLine("fileNameWithExtension:" + fileNameWithExtension + ", fileName:" + fileName);//Print original and function output
}
string GetSubstringBeforeASpecificChar(string fileNameWithExtension)
{
    const int startIndex = 0;//Start index for the substring
    const char indexChar = '.';//Character that we want to find the index to
    int substringEndIndex = fileNameWithExtension.IndexOf(indexChar);//Get index of the period
    string fileName = fileNameWithExtension.Substring(startIndex, substringEndIndex);//Start index is 0 and the length is the index of the period
    return fileName;
}
Code Output
fileNameWithExtension:Street430_10302001.xlsx, fileName:Street430_10302001
fileNameWithExtension:Building430_1152003.csv, fileName:Building430_1152003
fileNameWithExtension:Shop4620_01032020.xlsm, fileName:Shop4620_01032020
fileNameWithExtension:Drive5405_12042021.xls, fileName:Drive5405_12042021
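
The function above assumes every input contains a period. When it does not, IndexOf returns -1 and Substring(0, -1) throws an ArgumentOutOfRangeException. Below is a minimal sketch of a guarded variant. The name GetSubstringBeforeOrWhole and the decision to return the input unchanged are assumptions made for illustration.

```csharp
using System;

// Hypothetical helper: the name and the return-the-input fallback are assumptions.
string GetSubstringBeforeOrWhole(string text, char indexChar)
{
    int index = text.IndexOf(indexChar);//-1 when the character is not present
    if (index < 0)
    {
        return text;//No period found, so there is nothing to cut off
    }
    return text.Substring(0, index);//Everything to the left of the first match
}

Console.WriteLine(GetSubstringBeforeOrWhole("Street430_10302001.csv", '.'));//Street430_10302001
Console.WriteLine(GetSubstringBeforeOrWhole("README", '.'));//README
```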
New Problem Statement

Now suppose that we have a new set of data that includes file names with multiple periods in them. How would the current code handle this use case? Let's see below.

Code Example
List<string> fileNameWithExtensions = new List<string>() { "Street430.10302001.xlsx", "Building430.1152003.csv", "Shop4620.01032020.xlsm", "Drive5405.12042021.xls" };
foreach (string fileNameWithExtension in fileNameWithExtensions)
{
    string fileName = GetSubstringBeforeASpecificChar(fileNameWithExtension);//Call function to get the substring before the character
    Console.WriteLine("fileNameWithExtension:" + fileNameWithExtension + ", fileName:" + fileName);//Print original and function output
}
string GetSubstringBeforeASpecificChar(string fileNameWithExtension)
{
    const int startIndex = 0;//Start index for the substring
    const char indexChar = '.';//Character that we want to find the index to
    int substringEndIndex = fileNameWithExtension.IndexOf(indexChar);//Get index of the period
    string fileName = fileNameWithExtension.Substring(startIndex, substringEndIndex);//Start index is 0 and the length is the index of the first period
    return fileName;
}
Code Output
fileNameWithExtension:Street430.10302001.xlsx, fileName:Street430
fileNameWithExtension:Building430.1152003.csv, fileName:Building430
fileNameWithExtension:Shop4620.01032020.xlsm, fileName:Shop4620
fileNameWithExtension:Drive5405.12042021.xls, fileName:Drive5405
What Went Wrong When There Are Multiple Periods?

As we can see, the file name is cut off at the first period when there are multiple periods. To correct this we need to use LastIndexOf instead of IndexOf, because we want the last period, the one next to the extension. See the corrected code below.

Corrected Code Example
List<string> fileNameWithExtensions = new List<string>() { "Street430.10302001.xlsx", "Building430.1152003.csv", "Shop4620.01032020.xlsm", "Drive5405.12042021.xls" };
foreach (string fileNameWithExtension in fileNameWithExtensions)
{
    string fileName = GetSubstringBeforeASpecificChar(fileNameWithExtension);//Call function to get the substring before the character
    Console.WriteLine("fileNameWithExtension:" + fileNameWithExtension + ", fileName:" + fileName);//Print original and function output
}
string GetSubstringBeforeASpecificChar(string fileNameWithExtension)
{
    const int startIndex = 0;//Start index for the substring
    const char indexChar = '.';//Character that we want to find the index to
    int substringEndIndex = fileNameWithExtension.LastIndexOf(indexChar);//Get index of the last period
    string fileName = fileNameWithExtension.Substring(startIndex, substringEndIndex);//Start index is 0 and the length is the index of the last period
    return fileName;
}
Code Output
fileNameWithExtension:Street430.10302001.xlsx, fileName:Street430.10302001
fileNameWithExtension:Building430.1152003.csv, fileName:Building430.1152003
fileNameWithExtension:Shop4620.01032020.xlsm, fileName:Shop4620.01032020
fileNameWithExtension:Drive5405.12042021.xls, fileName:Drive5405.12042021
Multiple Periods With Correct Approach Recap

The corrected code now returns the full file name without the extension.
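
For the specific case of file names, .NET also has a built-in method, Path.GetFileNameWithoutExtension in System.IO, that removes only the last extension. The sketch below is offered as a point of comparison rather than as a replacement for the general technique, since it only applies to file names and also strips any directory portion of the path.

```csharp
using System;
using System.IO;

string fileNameWithExtension = "Street430.10302001.xlsx";
string fileName = Path.GetFileNameWithoutExtension(fileNameWithExtension);//Removes only the text from the last period onwards

Console.WriteLine(fileName);//Street430.10302001
```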

Substring Before By String Concat And LINQ

Code Example

string startingText = "Matthew.Smith@example.com";

char indexChar = '@';//Character to stop at

string localPart = string.Concat(startingText.TakeWhile(x => x != indexChar));//Get all characters before the specific character

Console.WriteLine(localPart);

Code Output
Matthew.Smith

Substring Before By String Split

string fileNameWithExtension = "Street430_10302001.csv";//Original filename with extension before manipulation
const char indexChar = '.';//Character to split the string on
const int fileNameIndex = 0;//After the string split the index with the file name is 0
string[] splitUpFileName = fileNameWithExtension.Split(indexChar);//Split the string at each period
string fileName = splitUpFileName[fileNameIndex];//index is 0 to get only the file name, index = 1 would be the extension
Console.WriteLine("fileNameWithExtension:" + fileNameWithExtension);//Print to screen starting string
Console.WriteLine("fileName:" + fileName);//Print to screen ending string
Code Output
fileNameWithExtension:Street430_10302001.csv
fileName:Street430_10302001

Performance

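The result for each approach is the average time over 10 tests. Each test calls the test method 100 times in a loop, and each call processes a list of 1 million randomly generated strings of 50 characters. Note that the String Concat And LINQ test method extracts the text before the character, while the other two extract the text after it.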

IndexOf And Substring Code Example

void TestMethod(List<string> list)
{
    foreach (string str in list)
    {
        int indexCharLength = 1;//Length of the character we want to skip over for the substring
        int substringStartIndex = str.IndexOf(indexChar) + indexCharLength;//Get an index of the hyphen plus 1 to get the start index just after the hyphen
        int length = str.Length - substringStartIndex;//This will yield a length
        string afterText = str.Substring(substringStartIndex, length);//get a substring
    }
}

String Concat And LINQ Code Example

void TestMethod(List<string> list)
{
    foreach (string str in list)
    {
        string substringBefore = string.Concat(str.TakeWhile(x => x != indexChar));
    }
}

String Split Code Example

void TestMethod(List<string> list)
{
    foreach (string str in list)
    {
        const int fileNameIndex = 1;
        string[] stringSplit = str.Split(indexChar);//Split the string at each occurrence of the character
        string substring = stringSplit[fileNameIndex];//Get substring after character
    }
}
IndexOf And Substring Code Output
Test 1:Function Calls:100, In 0m 2s 584ms
Test 2:Function Calls:100, In 0m 2s 504ms
Test 3:Function Calls:100, In 0m 2s 462ms
Test 4:Function Calls:100, In 0m 2s 511ms
Test 5:Function Calls:100, In 0m 2s 556ms
Test 6:Function Calls:100, In 0m 2s 686ms
Test 7:Function Calls:100, In 0m 2s 557ms
Test 8:Function Calls:100, In 0m 2s 605ms
Test 9:Function Calls:100, In 0m 2s 529ms
Test 10:Function Calls:100, In 0m 2s 619ms
IndexOf and Substring Average Speed:2562ms, In 10 Tests
String Concat And LINQ Code Output
Test 1:Function Calls:100, In 0m 16s 708ms
Test 2:Function Calls:100, In 0m 16s 307ms
Test 3:Function Calls:100, In 0m 16s 322ms
Test 4:Function Calls:100, In 0m 16s 173ms
Test 5:Function Calls:100, In 0m 16s 239ms
Test 6:Function Calls:100, In 0m 16s 329ms
Test 7:Function Calls:100, In 0m 16s 257ms
Test 8:Function Calls:100, In 0m 16s 296ms
Test 9:Function Calls:100, In 0m 16s 178ms
Test 10:Function Calls:100, In 0m 16s 172ms
String Concat Average Speed:16299ms, In 10 Tests
String Split Code Output
Test 1:Function Calls:100, In 0m 12s 987ms
Test 2:Function Calls:100, In 0m 12s 869ms
Test 3:Function Calls:100, In 0m 13s 123ms
Test 4:Function Calls:100, In 0m 12s 858ms
Test 5:Function Calls:100, In 0m 12s 835ms
Test 6:Function Calls:100, In 0m 12s 838ms
Test 7:Function Calls:100, In 0m 13s 69ms
Test 8:Function Calls:100, In 0m 12s 989ms
Test 9:Function Calls:100, In 0m 12s 908ms
Test 10:Function Calls:100, In 0m 12s 820ms
String Split Average Speed:12930ms, In 10 Tests
Full Test Code
using System.Diagnostics;

int numberOfTests = 10;//Number of tests 
int numberOfFunctionCalls = 100;//Number of function calls made per test
int numberOfObjectsToCreate = 1000000;//Number test objects
int lengthOfRandomString = 50;
char indexChar = '-';//Character that we want to find the index to

string testName = "IndexOf and Substring";//Test name to print to average

void TestMethod(List<string> list)
{
    foreach (string str in list)
    {
        int indexCharLength = 1;//Length of the character we want to skip over for the substring
        int substringStartIndex = str.IndexOf(indexChar) + indexCharLength;//Get an index of the hyphen plus 1 to get the start index just after the hyphen
        int length = str.Length - substringStartIndex;//This will yield a length
        string afterText = str.Substring(substringStartIndex, length);//get a substring
    }
}

List<double> testSpeedList = new List<double>();
for (int testIndex = 0; testIndex < numberOfTests; testIndex++)
{
    testSpeedList.Add(StartTest(testIndex));
}
Console.WriteLine($"{testName} Average Speed:{Math.Round(testSpeedList.Average())}ms, In {numberOfTests} Tests");

double StartTest(int testIndex)
{
    Stopwatch stopwatch = new Stopwatch();
    List<string> testData = GetListData();//Get initial randomly generated data
    for (int i = 0; i < numberOfFunctionCalls; i++)
    {
        stopwatch.Start();//Start (or resume) the Stopwatch timer
        TestMethod(testData);//Run the method under test against the whole list
        stopwatch.Stop();//Pause the timer so only time spent in TestMethod accumulates
    }
    Console.WriteLine($"Test {testIndex + 1}:Function Calls:{numberOfFunctionCalls}, In {stopwatch.Elapsed.Minutes}m {stopwatch.Elapsed.Seconds}s {stopwatch.Elapsed.Milliseconds}ms");
    return stopwatch.Elapsed.TotalMilliseconds;
}

List<string> GetListData()
{
    List<string> testData = new List<string>(numberOfObjectsToCreate);
    for (int i = 0; i < numberOfObjectsToCreate; i++)
    {
        string value = "";
        while (!value.Contains(indexChar))
        {
            value = GenerateRandomString(lengthOfRandomString);
        }

        testData.Add(value);
    }
    return testData;
}

string GenerateRandomString(int length)
{
    Random random = new Random();
    string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + indexChar + indexChar + indexChar + indexChar + "abcdefghijklmnopqrstuvwxyz@";
    return new string(Enumerable.Repeat(chars, length)
      .Select(s => s[random.Next(s.Length)]).ToArray());
}


Conclusion

IndexOf and Substring is the fastest method by far, which makes it the best method for this use case. It may not fit on just one line of code, but its speed and flexibility more than make up for it.

String Split reads simply, but it took about five times as long as IndexOf and Substring in this test and should be avoided when dealing with large amounts of data.

String Concat and LINQ should be avoided in this use case because it was the slowest of the three, at more than six times the IndexOf and Substring time. Even though it all fits on one line of code, it just isn't worth it.
