Wednesday, October 29, 2014

Microservices... Where to Start?

Micro-services are becoming a "thing" now and are probably the de-facto choice when someone begins a new project with cloud hosting in mind, but where do you start when you have a brownfield project? I don't have any hot answers or amazing insights here; all I can do is describe what my first "micro-service" was and how it came into being.

Over time the application was getting more use and the number of servers involved started to increase; we were using auto-scaling, and the number of instances grew in line with usage but wavered between 8 and 24. This quite rightly caused some consternation, so we tinkered with the number of cores per instance and with the thresholds that trigger scaling up and down, but nothing seemed to reduce the total number of cores in use. We have a hefty amount of logging, with output controlled through logging levels, so we decided to change the levels to get more diagnostic information, and this is when things got interesting. As this is a production system, getting hold of the log information directly was problematic and slow, so we had already been forwarding all the messages to SplunkStorm using the available API; that had worked well for over a year, and we were very impressed with how we could use the data for ad-hoc queries. However, when we changed the logging levels the servers started scaling up and we began to see database errors; unusual ones involving SQL connection issues rather than SQL query errors. We quickly reverted the changes and set about replicating the problem in our CI/SIT environments.

What we realized was that our own logging was causing the performance issues and, even more awkwardly, was also responsible for the SQL connection errors: logging to SplunkStorm via its API was using up the available TCP/IP connections, and this was even more pronounced when we changed the logging level. What we needed was to refactor our logging so that all our data still reached SplunkStorm (and Splunk, as we were also in the process of migrating to SplunkStorm's big brother) with minimal impact on the actual production systems. Thankfully our logging framework used NLog, which we had wrapped in another entity for mocking purposes, so we decided to write a new NLog target that would instead log to a queue (service bus) and then have another service read messages from that queue and forward them to Splunk and SplunkStorm; thus our first micro-service was born.
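
To give an idea of the shape of this, here is a minimal sketch of such a queue-backed NLog target. IMessageQueue and its SendBatch method are hypothetical stand-ins for whatever service-bus client you use, and a real target would also need flushing and error handling:

using System.Collections.Generic;
using NLog.Common;
using NLog.Targets;

[Target("QueueTarget")]
public class QueueTarget : TargetWithLayout
{
  // Hypothetical service-bus abstraction - swap in your own client
  private readonly IMessageQueue _queue;

  public QueueTarget(IMessageQueue queue)
  {
    _queue = queue;
  }

  // The batch overload - NLog hands us a block of pending events at once
  protected override void Write(AsyncLogEventInfo[] logEvents)
  {
    var batch = new List<string>(logEvents.Length);
    foreach (var asyncEvent in logEvents)
    {
      batch.Add(Layout.Render(asyncEvent.LogEvent));
    }

    _queue.SendBatch(batch); // one network call per batch, not per message

    foreach (var asyncEvent in logEvents)
    {
      asyncEvent.Continuation(null); // report success back to NLog
    }
  }
}

public interface IMessageQueue
{
  void SendBatch(IEnumerable<string> messages);
}

In practice you would wrap a target like this in NLog's BufferingWrapper (or AsyncWrapper) so that the batch overload actually receives batches rather than single events.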

The new NLog target took the log messages and batch-pushed them to the queue; a micro-service was then written that monitors the queue, pulls messages off in batches, and pushes them to Splunk and SplunkStorm, also in batches. The initial feasibility spike took half a day, with the final implementation ready and pushed into production the following week. Because we were using .NET we could also take advantage of multiple threads, so we used thread pools to limit the number of Splunk/SplunkStorm messages being sent in parallel. What we found after deployment was that we could scale the main application servers back to 4 instances, with only a pair of single-core services dealing with the logging; we also noticed that the auto-scaling never reaches its old thresholds, and the instance count has been stable ever since. Another advantage is that the queue can now be used by other services to push messages to Splunk, and they can even use the same NLog target in their projects to deal with all the complexities.
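
Conceptually, the forwarding service's main loop looked something like the sketch below. ILogQueue and ISplunkClient are hypothetical abstractions, the cap of four parallel sends is illustrative, and the real service adds retries and graceful shutdown; the SemaphoreSlim is what limits the number of sends in flight:

using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class LogForwarder
{
  private readonly ILogQueue _queue;        // hypothetical queue abstraction
  private readonly ISplunkClient[] _sinks;  // e.g. Splunk and SplunkStorm
  private readonly SemaphoreSlim _throttle = new SemaphoreSlim(4); // cap parallel sends

  public LogForwarder(ILogQueue queue, params ISplunkClient[] sinks)
  {
    _queue = queue;
    _sinks = sinks;
  }

  public void Run(CancellationToken token)
  {
    while (!token.IsCancellationRequested)
    {
      // Pull a batch off the queue and forward it to each sink,
      // never with more than four sends in flight at once
      IList<string> batch = _queue.ReceiveBatch(100);
      if (batch.Count == 0) continue;

      foreach (var sink in _sinks)
      {
        _throttle.Wait(token);
        var target = sink;
        Task.Factory.StartNew(() =>
        {
          try { target.SendBatch(batch); }
          finally { _throttle.Release(); }
        });
      }
    }
  }
}

public interface ILogQueue { IList<string> ReceiveBatch(int maxMessages); }
public interface ISplunkClient { void SendBatch(IList<string> messages); }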

I hope the above shows that your first micro-service does not have to be something elaborate; it can instead deal with a mundane but essential task, and the benefits can be quite astounding.

Monday, October 13, 2014

Excluding code from coverage...

This may (no guarantees) turn into a series of posts on how to refactor your code for testing using simple examples.

This particular example came from a request to add an "Exclude Lines from Coverage" feature to OpenCover. Now, there are many ways this could be achieved, none of which I had any appetite for, as they were either too clunky and/or could make OpenCover very slow. I am also not a big fan of excluding anything from code coverage; though OpenCover has several exclude options, I just thought this was one step too far in the pursuit of that 100% coverage value, as it could too easily be abused. Even if I did think the feature was useful, it still might not get implemented by myself for several days, weeks or months.

But sometimes there are other ways to cover your code without the big refactoring and mocking exercise that can act as a deterrent to doing the right thing.

In this case the user was using EntityFramework and wanted to exclude the code in the catch handlers because they couldn't force EntityFramework to fail on demand - this is quite a common problem in my experience. The user also knew that one approach was to push all that EntityFramework stuff out into another class and then test the exception handling via mocks, but they didn't have the time/appetite to go down that path and thus wanted to exclude the code instead.

I imagined that the user has code that looked something like this:

public void SaveCustomers(ILogger logger)
{
  CustomersEntities ctx = CustomersEntities.Context;
  try
  {
    // awesome stuff with EntityFramework
    ctx.SaveChanges();
  }
  catch(Exception ex)
  {
    // do some awesome logging
    logger.Write(ex);
    throw;
  }
}

and I could see why it would be hard (but not impossible) to test the exception handling. Instead of extracting all the interactions with EntityFramework so that an exception can be thrown during testing, I suggested the following refactoring:

internal void CallWrapper(Action doSomething, ILogger logger)
{
  try
  {
    doSomething();
  }
  catch(Exception ex)
  {
    // do some awesome logging
    logger.Write(ex);
    throw;
  }
}

which I would then use like this:

public void SaveCustomers(ILogger logger)
{
  CustomersEntities ctx = CustomersEntities.Context;
  CallWrapper(() => {
    // awesome stuff with EntityFramework
    ctx.SaveChanges();
  }, logger);
}


My original tests should still pass as before, and I now have a new method that I can test independently.
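
For example, the exception path can now be exercised directly. Here is a sketch using NUnit and a hand-rolled fake; FakeLogger and the CustomerRepository class hosting CallWrapper are hypothetical, ILogger is assumed to have the Write(Exception) method used above, and as CallWrapper is internal you would expose it to the test assembly with InternalsVisibleTo:

[Test]
public void CallWrapper_Logs_And_Rethrows()
{
  var logger = new FakeLogger();             // hypothetical test double for ILogger
  var repository = new CustomerRepository(); // hypothetical class hosting CallWrapper

  Assert.Throws<InvalidOperationException>(() =>
    repository.CallWrapper(() => { throw new InvalidOperationException(); }, logger));

  Assert.IsNotNull(logger.LastException);    // the handler logged before rethrowing
}

private class FakeLogger : ILogger
{
  public Exception LastException { get; private set; }
  public void Write(Exception ex) { LastException = ex; }
}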

I know this isn't the only way to tackle this sort of problem and I'd love to hear about other approaches.

Monday, October 6, 2014

A simple TDD example

I recently posted a response on StackOverflow about TDD and coverage, and I thought it would be worth re-posting the response here. The example is simple, but hopefully it shows how writing the right tests using TDD gives you a better suite of tests for your code than you would probably write if you wrote the tests after the code (which may have been refactored as you developed).

"As the [original] accepted answer has pointed out your actual scenario reduces to collection.Sum() however you will not be able to get away with this every time.

If we use TDD to develop this (overkill, I agree, but easy to explain) we would [possibly] do the following (I am also using NUnit in this example out of preference).

[Test]
public void Sum_Is_Zero_When_No_Entries()
{
    var bomManager = new BomManager();
    Assert.AreEqual(0, bomManager.MethodToTest(new Collection<int>()));
}

and then write the following code (note: we write the minimum to meet the current set of tests)

public int MethodToTest(Collection<int> collection)
{
    var sum = 0;
    return sum;
}

We would then write a new test e.g.

[Test]
[TestCase(new[] { 0 }, 0)]
public void Sum_Is_Calculated_Correctly_When_Entries_Supplied(int[] data, int expected)
{
    var bomManager = new BomManager();
    Assert.AreEqual(expected, bomManager.MethodToTest(new Collection<int>(data)));
}

If we ran our tests they would all pass (green), so we need some new test cases

[TestCase(new[] { 1 }, 1)]
[TestCase(new[] { 1, 2, 3 }, 6)]

In order to satisfy those tests I would need to modify my code e.g.

public int MethodToTest(Collection<int> collection)
{
    var sum = 0;
    foreach (var value in collection)
    {
        sum += value;
    }
    return sum;
}

Now all my tests pass, and if I run that through OpenCover I get 100% sequence and branch coverage - hurrah!... And I did so not by using coverage as my control, but by writing the right tests to support my code.

BUT there is a 'possible' defect... what if I pass in null? Time for a new test to investigate

[Test]
public void Sum_Is_Zero_When_Null_Collection()
{
    var bomManager = new BomManager();
    Assert.AreEqual(0, bomManager.MethodToTest(null));
}

The test fails so we need to update our code e.g.

public int MethodToTest(Collection<int> collection)
{
    var sum = 0;
    if (collection != null)
    {
        foreach (var value in collection)
        {
            sum += value;
        }
    }
    return sum;
}

Now we have tests that support our code rather than tests that test our code, i.e. our tests do not care about how we went about writing the code.

Now we have a good set of tests so we can now safely refactor our code e.g.

public int MethodToTest(IEnumerable<int> collection)
{
    return (collection ?? new int[0]).Sum();
}

And I did so without affecting any of the existing tests."