Identifying Common Performance Issues in C#

C:\Dave\Storey
7 min read · Jul 3, 2018

Introduction

As developers, we can get so deep into solving a problem that we lose track of the bigger picture. Code evolves over time, and sometimes it ends up repurposed or misused beyond its original design. Here I will share some of the common failures I identified in our codebase:

  1. Deferred execution
  2. Incorrect data types
  3. Excessive/careless use of reflection
  4. Unnecessary object creation and excessive garbage collection

Let’s take a deeper look at each of these topics.

Deferred Execution

Lambdas are an immensely powerful and useful part of modern programming platforms like .NET (LINQ). They allow us to perform complex tasks with very little code thanks to some syntactic sugar, but at the same time very few programmers actually appreciate what is happening under the hood. Take, for instance, this piece of code:

var mike = people.Where(p => p.Name == "Mike").Select(p => new User(p));

What does the variable mike contain? Most developers will say it contains a collection of User objects for all people with the name "Mike"; however, this is wrong. What it actually contains is a deferred-execution query (think of it less as the thing you want and more as the recipe for getting it).

This means that every time we enumerate the variable mike, the program will re-evaluate the query to produce the correct result. The theory is sound: at runtime the application does not know whether the people collection has changed; a new object may have been added to it. But it also means that deferred-execution expressions assigned to properties that are frequently accessed but never change are just burning CPU for no reason.

Thankfully there is an easy way around this: simply call one of the many extension methods that force the expression to be evaluated, such as ToArray(), ToList(), Any() or Count(), or any LINQ method that produces a single value, such as people.First(p => p.Name == "Mike"). For more info about deferred execution I'd recommend taking a look at this great blog post by Charlie Calvert.
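To make the difference concrete, here is a minimal sketch. The Person and User types below are assumed purely for illustration; the point is that ToList() evaluates the query once, so repeated reads of the result no longer re-run the Where/Select.

using System.Collections.Generic;
using System.Linq;

// Minimal illustrative types (assumed, not from the original codebase).
public class Person { public string Name { get; set; } }
public class User
{
    public User(Person p) { Name = p.Name; }
    public string Name { get; }
}

public static class DeferredExecutionDemo
{
    public static void Run(List<Person> people)
    {
        // Deferred: this query re-runs every time it is enumerated.
        var mikesQuery = people.Where(p => p.Name == "Mike").Select(p => new User(p));

        // Materialised: ToList() evaluates the query once and stores the results.
        var mikesList = mikesQuery.ToList();

        // Reading mikesList repeatedly does not re-run the Where/Select;
        // enumerating mikesQuery repeatedly would.
        var count1 = mikesList.Count;
        var count2 = mikesList.Count;
    }
}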

In summary:

1: Think carefully about how you use LINQ expressions.

2: Think about how many times you are accessing expression results.

Incorrect Data Types

.NET developers seem to have a real passion for the List<T> data type. Every time we want to store or build a collection of items, we seem to default to the old reliable List<T>. Why is this? What's worse, we then often want to quickly extract values out of this list, or evaluate whether an item exists, but for lookups a list is anything but quick.

Thankfully, .NET provides a number of helpful data types for ensuring uniqueness, fast keyed access and logical set operations. Let’s take another look at our previous example:

return people.Where(p => p.Name == "Mike").Select(p => new User(p));

It would seem that we are trying to quickly find people using their first name. This feels like a code smell; a simple enumerable of people does not seem to be the correct data type. Imagine if we have thousands of people: we would have to loop over every record to find any with the name Mike. Maybe we can use a Dictionary<TKey, TValue> or a Lookup<TKey, TElement> instead. Now, assuming we have defined our people variable to be of type Lookup<string, Person>, our code becomes much simpler:

return people["Mike"]?.Select(p => new User(p));

For more information regarding the pros and cons of hash-based data types, please take a look at this blog post by Yan Cui and this Stack Overflow article comparing the speed trade-offs of List<T> and HashSet<T>.

Excessive/careless use of reflection

Over the years (and iterations of the .NET Framework) it’s become common knowledge that reflection is slow. For those not familiar with the concept, reflection is essentially using .NET to interrogate the metadata of types and objects, and to invoke them, dynamically at runtime. However, this is not to say that reflection is a bad thing; many IoC frameworks and test frameworks use reflection to great effect without drastically impacting performance. But we should be mindful that this is a case of using the right tool for the right job.

Within our codebase a clear hotspot became visible around the time spent in one function call with the telltale name Invoke():

return _validator ?? _factory.Invoke(fare.ValidFrom);

This seemingly innocent line of code was nested deep within the codebase and was responsible for validating whether a given fare was valid for a given railcard. Let’s dig a little deeper into what it’s actually doing: it returns a previously constructed validator if one is set; otherwise it uses reflection to call a factory function that creates a new validator, passing in the date the fare is valid from. That is already slightly worrying, but what’s more worrying is that it never persists the new validator, so the factory is invoked every single time a validator is requested; in this particular situation this code could be called up to 100 times 😱
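One low-risk fix is simply to persist the validator the first time the factory produces it. The sketch below uses assumed names (IFareValidator, Fare, FareValidatorProvider) since the original types are not shown; only _validator, _factory and fare.ValidFrom come from the snippet above. Whatever _factory does internally, delegate call or reflection, caching its result means we pay that cost once rather than on every request.

using System;

public interface IFareValidator { }

public class Fare
{
    public DateTime ValidFrom { get; set; }
}

public class FareValidatorProvider
{
    private readonly Func<DateTime, IFareValidator> _factory;
    private IFareValidator _validator;

    public FareValidatorProvider(Func<DateTime, IFareValidator> factory)
    {
        _factory = factory;
    }

    public IFareValidator GetValidator(Fare fare)
    {
        // Build the validator once and cache it; later calls reuse the
        // cached instance instead of paying the factory cost again.
        if (_validator == null)
        {
            _validator = _factory.Invoke(fare.ValidFrom);
        }

        return _validator;
    }
}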

Unnecessary Object Creation and garbage collection

Garbage collection in .NET is something that most developers take for granted these days. We are all familiar (at least I hope we are all familiar) with the concept of IDisposable and why it is important to clean up after ourselves; however, sometimes we forget how some seemingly innocent code can leave a big mess for the Garbage Collector to deal with.

Let’s take another code example:

var phrase = "the quick brown fox jumps over the lazy dog";
var words = string.Split(phrase, ' '); return words.Contains("fox");

Seems innocent enough, right? Wrong! Split() will actually create a new string object for each word in the original string, plus an additional array object to hold all the new strings. In this instance that works out to be ten new objects in total (nine words plus one array) created and cleaned up for very little benefit. Suddenly our Garbage Collector is taking vital CPU away from doing our work in order to clean up our mess.

We can easily re-write the logic above into something just as effective, but much more efficient:

var phrase = "the quick brown fox jumps over the lazy dog";
return phrase.IndexOf("fox") > -1;

Now we do not create any excessive objects but manage to achieve the same result and the Garbage Collector is a happy bunny again 😄

Now let’s take a look at another example piece of code:

public bool IsWalk()
{
    var walk = new List<TransportMode>() { TransportMode.Walk };
    return IsTravellingByAnyOf(walk);
}

public bool IsTravellingByAnyOf(ICollection<TransportMode> modes)
{
    return modes.Contains(Mode);
}

Again, this seems innocent enough: we are checking whether the property Mode on our class matches any of the values within a specified collection, something .NET lets us do easily via the Contains() method on ICollection<T>. But look more closely: every time we call IsWalk() we are creating a new List object containing only a single element that never changes. This is just unnecessary waste that must be cleaned up later at the cost of valuable CPU. You could argue that .NET might optimise this away at compile time, but why take the chance? We can easily simplify this code to:

public bool IsWalk()
{
return Mode == TransportMode.Walk;
}

Now we have no garbage to clean up, no extra method call to make, the code is cleaner, and once again we are happy 😄

TL;DR: Don’t bore us, get to the chorus!

So what does all this actually mean to our customers?

Short and simple: we can now serve their complex search needs much faster, meaning they spend less time waiting to see their results.

But what does this mean for us at Trainline?

  1. Our average server CPU usage has been reduced by 30%.
  2. Our code complexity is reduced.
  3. Our test coverage increased.
  4. We can now reduce our number of servers or scale down vertically while still serving the same traffic = ££’s saved.

In Summary

We know our telemetry is important, but if that telemetry doesn’t tell us “why”, or enable us to work out “why”, then it can be argued that it is not providing useful or valuable insight. We should actively strive to have a feedback loop in place that allows us to replay problematic requests and analyse snapshots on our server farms to figure out why our code is not performing.

We also know that optimisation is not hard: squeezing every last ounce of performance from our code is something that should be baked into every team’s review process; moreover, it should be monitored and revisited during the development lifecycle to ensure it hasn’t degraded as the code evolves. Luckily we have a lot of options in this space, from automated benchmarking tools to monitoring build pipelines.
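As one illustration of what that might look like (a sketch only; BenchmarkDotNet is just one of the tools available, and the data here is made up), a small micro-benchmark comparing the List<T> and HashSet<T> lookups discussed earlier could be run as part of a build pipeline:

using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Compares the cost of Contains() on a List<T> versus a HashSet<T>;
// exact numbers will vary by machine, but the relative difference is the point.
[MemoryDiagnoser]
public class ContainsBenchmark
{
    private List<int> _list;
    private HashSet<int> _set;

    [GlobalSetup]
    public void Setup()
    {
        _list = Enumerable.Range(0, 10_000).ToList();
        _set = new HashSet<int>(_list);
    }

    [Benchmark]
    public bool ListContains() => _list.Contains(9_999);

    [Benchmark]
    public bool SetContains() => _set.Contains(9_999);
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<ContainsBenchmark>();
}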

However, a word of warning: it is very easy to prematurely optimise or even over-optimise your code, which can in turn have a negative impact on your metrics. It is important that code is free to evolve and change; but we must also keep testing and evaluating changes to see whether we have improved performance or made it worse.

Originally published at https://medium.com on July 3, 2018.
