Wednesday, October 29, 2014

Microservices... Where to Start?

Micro-services are becoming a "thing" now and are probably the de-facto choice when someone begins a new project and is thinking about hosting in the cloud, but where do you start when you have a brownfield project? Now I don't have any hot answers or amazing insights here; all I can do is describe what my first "micro-service" was and how it came into being.

Over time the application was getting more use and the number of servers involved started to increase; we were using auto-scaling, and the number of servers rose in line with usage but wavered between 8 and 24 instances. This quite rightly caused some consternation, so we tinkered with the number of cores for each instance and with the thresholds that trigger scaling up and down, but nothing seemed to alter the total number of cores being used. We have a hefty amount of logging, and we can control the output through logging levels, so we decided to change the logging to try to get more diagnostic information; this is when things got interesting.

As this is a production system, getting hold of the log information was initially problematic and slow, so we had already started forwarding all the messages to SplunkStorm using the available API; all was well (for over a year) and we were very impressed with how we could use that information for ad-hoc queries. However, when we changed the logging levels the servers started scaling and we started to get database errors: unusual ones involving SQL connection issues rather than SQL query errors. We quickly reverted the changes and decided to try to replicate the problem in our CI/SIT environments.

What we realized was that our own logging was causing our performance issues and, even more awkwardly, was also responsible for the SQL connection issues: the logging to SplunkStorm via its API was using up the available TCP/IP connections, and this became even more pronounced when we changed the logging level. What we needed to do was refactor our logging so that we could get all our data into SplunkStorm (and into Splunk, its big brother, which we were also in the process of migrating to) with minimal impact on the actual production systems. Thankfully our logging was built on NLog, which we had wrapped in another entity for mocking purposes, so we decided to write a new NLog target that would instead log to a queue (service-bus) and then have another service read messages from that queue and forward them to Splunk and SplunkStorm; thus our first micro-service was born.
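To give a flavour of what such a target involves, here is a minimal sketch; the IMessageQueue abstraction and its Send method are hypothetical stand-ins for whatever service-bus client is in use, and a real implementation would also need to handle serialisation, retries and failures:

using System.Collections.Generic;
using NLog;
using NLog.Targets;

// Hypothetical abstraction over the service-bus client.
public interface IMessageQueue
{
  void Send(IEnumerable<string> messages);
}

[Target("QueueTarget")]
public sealed class QueueTarget : TargetWithLayout
{
  private readonly IMessageQueue _queue;

  public QueueTarget(IMessageQueue queue)
  {
    _queue = queue;
  }

  // Render each log event with the configured layout and push it to
  // the queue; the forwarding service picks it up from the other end.
  protected override void Write(LogEventInfo logEvent)
  {
    _queue.Send(new[] { Layout.Render(logEvent) });
  }
}

Our real target pushed messages in batches; with NLog that can be done by overriding the batch overload of Write (the one that takes an array of AsyncLogEventInfo) rather than the single-event version shown here.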

The new NLog target took the log messages and batch-pushed them to the queue; a micro-service was then written that monitors the queue, pulls messages off in batches, and pushes them to Splunk and SplunkStorm, also in batches. The initial feasibility spike took half a day, with the final implementation ready and pushed into production the following week. Because we were using .NET we could also take advantage of multiple threads, so we used thread-pools to limit the number of Splunk/SplunkStorm messages being sent in parallel. What we found after deployment was that we could scale our main application servers back to 4 instances, with only a pair of single-core instances dealing with the logging; we also noticed that the auto-scaling never reaches its old thresholds and the instance count has been stable ever since. Another advantage is that the queue can now be used by other services to push messages to Splunk, and they can even use the same NLog target in their projects to deal with all the complexities.
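To make the forwarding side concrete, here is a rough sketch of the kind of loop involved; ILogQueue and ISplunkClient are hypothetical placeholders for the queue client and the two HTTP forwarders, and the semaphore stands in for the thread-pool throttling described above:

using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical abstractions over the queue and the Splunk/SplunkStorm APIs.
public interface ILogQueue
{
  IList<string> ReceiveBatch(int maxMessages);
}

public interface ISplunkClient
{
  void SendBatch(IEnumerable<string> messages);
}

public class LogForwarder
{
  // Limit how many batches are in flight at any one time so the
  // forwarder does not exhaust its own TCP/IP connections.
  private readonly SemaphoreSlim _throttle = new SemaphoreSlim(4);
  private readonly ILogQueue _queue;
  private readonly ISplunkClient _splunk;
  private readonly ISplunkClient _splunkStorm;

  public LogForwarder(ILogQueue queue, ISplunkClient splunk, ISplunkClient splunkStorm)
  {
    _queue = queue;
    _splunk = splunk;
    _splunkStorm = splunkStorm;
  }

  public void Run(CancellationToken token)
  {
    while (!token.IsCancellationRequested)
    {
      // Pull a batch of messages off the queue...
      var batch = _queue.ReceiveBatch(100);
      if (batch.Count == 0) continue;

      // ...and forward it to both endpoints on a pool thread.
      _throttle.Wait(token);
      Task.Run(() =>
      {
        try
        {
          _splunk.SendBatch(batch);
          _splunkStorm.SendBatch(batch);
        }
        finally
        {
          _throttle.Release();
        }
      });
    }
  }
}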

I hope the above shows that your first micro-service does not have to be something elaborate; it can instead deal with a mundane but quite essential task, and the benefits can be quite astounding.

Monday, October 13, 2014

Excluding code from coverage...

This may (no guarantees) turn into a series of posts on how to refactor your code for testing using simple examples.

This particular example came from a request to add an "Exclude Lines from Coverage" feature to OpenCover. Now there are many ways this could be achieved, none of which I had any appetite for, as they were either too clunky and/or could make OpenCover very slow. I am also not a big fan of excluding anything from code coverage; though OpenCover has several exclude options, I just thought that this was one step too far in the pursuit of that 100% coverage value, as it could too easily be abused. Even if I did think the feature was useful, it still might not get implemented by myself for several days, weeks or months.

But sometimes there are other ways to cover your code without a big refactoring and mocking exercise, the prospect of which can act as a deterrent to doing the right thing.

In this case the user was using EntityFramework and wanted to exclude the code in the catch handlers because they couldn't force EntityFramework to crash on demand; this is quite a common problem in my experience. The user also knew that one approach was to push all that EntityFramework stuff out to another class so they could then test their exception handling via mocks, but they didn't have the time/appetite to go down that path and thus wanted to exclude that code.

I imagined that the user has code that looked something like this:

public void SaveCustomers(ILogger logger)
{
  CustomersEntities ctx = CustomersEntities.Context;
  try
  {
    // awesome stuff with EntityFramework
    ctx.SaveChanges();
  }
  catch(Exception ex)
  {
    // do some awesome logging
    logger.Write(ex);
    throw;
  }
}

and I could see why it would be hard (but not impossible) to test the exception handling. So, instead of extracting all the interactions with EntityFramework to make it possible to throw an exception during testing, I suggested the following refactoring:

internal void CallWrapper(Action doSomething, ILogger logger)
{
  try
  {
    doSomething();
  }
  catch(Exception ex)
  {
    // do some awesome logging
    logger.Write(ex);
    throw;
  }
}

which I would then use like this:

public void SaveCustomers(ILogger logger)
{
  CustomersEntities ctx = CustomersEntities.Context;
  CallWrapper(() => {
    // awesome stuff with EntityFramework
    ctx.SaveChanges();
  }, logger);
}


My original tests should continue to pass as before, and I now have a new method that I can test independently.
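For example, assuming CallWrapper lives on a class I will call CustomerRepository (a made-up name, visible to the test assembly via InternalsVisibleTo) and that ILogger exposes the Write(Exception) method used above, an NUnit test for the exception path might look like this, with FakeLogger being a hand-rolled test double:

using System;
using System.Collections.Generic;
using NUnit.Framework;

// A hand-rolled test double for the ILogger interface.
public class FakeLogger : ILogger
{
  public readonly List<Exception> Written = new List<Exception>();
  public void Write(Exception ex) { Written.Add(ex); }
}

[TestFixture]
public class CallWrapperTests
{
  [Test]
  public void CallWrapper_Logs_The_Exception_And_Rethrows()
  {
    var logger = new FakeLogger();
    var repository = new CustomerRepository();

    // The delegate throws, so the wrapper should log and then rethrow.
    Assert.Throws<InvalidOperationException>(() =>
      repository.CallWrapper(() => { throw new InvalidOperationException(); }, logger));

    Assert.AreEqual(1, logger.Written.Count);
  }
}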

I know this isn't the only way to tackle this sort of problem and I'd love to hear about other approaches.

Monday, October 6, 2014

A simple TDD example

I recently posted a response on StackOverflow regarding TDD and coverage, and I thought it would be worth re-posting it here. The example is simple but hopefully shows how writing the right tests using TDD gives you a better suite of tests for your code than you would probably write if you wrote the tests after the code (which may have been refactored as you developed).

"As the [original] accepted answer has pointed out your actual scenario reduces to collection.Sum() however you will not be able to get away with this every time.

If we use TDD to develop this (overkill I agree but easy to explain) we would [possibly] do the following (I am also using NUnit in this example out of preference).

[Test]
public void Sum_Is_Zero_When_No_Entries()
{
    var bomManager = new BomManager();
    Assert.AreEqual(0, bomManager.MethodToTest(new Collection<int>()));
}

and then write the following code (note: we write the minimum to meet the current set of tests)

public int MethodToTest(Collection<int> collection)
{
    var sum = 0;
    return sum;
}

We would then write a new test e.g.

[Test]
[TestCase(new[] { 0 }, 0)]
public void Sum_Is_Calculated_Correctly_When_Entries_Supplied(int[] data, int expected)
{
    var bomManager = new BomManager();
    Assert.AreEqual(expected, bomManager.MethodToTest(new Collection<int>(data)));
}

If we ran our tests they would all pass (green), so we need some new test cases

[TestCase(new[] { 1 }, 1)]
[TestCase(new[] { 1, 2, 3 }, 6)]

In order to satisfy those tests I would need to modify my code e.g.

public int MethodToTest(Collection<int> collection)
{
    var sum = 0;
    foreach (var value in collection)
    {
        sum += value;
    }
    return sum;
}

Now all my tests pass, and if I run them through OpenCover I get 100% sequence and branch coverage - Hurrah!... And I did so without using coverage as my control, but by writing the right tests to support my code.

BUT there is a 'possible' defect... what if I pass in null? Time for a new test to investigate

[Test]
public void Sum_Is_Zero_When_Null_Collection()
{
    var bomManager = new BomManager();
    Assert.AreEqual(0, bomManager.MethodToTest(null));
}

The test fails so we need to update our code e.g.

public int MethodToTest(Collection<int> collection)
{
    var sum = 0;
    if (collection != null)
    {
        foreach (var value in collection)
        {
            sum += value;
        }
    }
    return sum;
}

Now we have tests that support our code rather than tests that test our code, i.e. our tests do not care about how we went about writing our code.

Now we have a good set of tests so we can now safely refactor our code e.g.

public int MethodToTest(IEnumerable<int> collection)
{
    return (collection ?? new int[0]).Sum();
}

And I did so without affecting any of the existing tests."

Thursday, April 3, 2014

Customising New Relic installation during Azure deployments

For about a year we've been running New Relic to monitor our WebRoles running on the Azure platform. Installation has been quite simple, following the instructions initially found on the New Relic site and now available via NuGet; however, two things about this process have been irking me.

First, I wanted to be able to distinguish the CI and Production deployments in the New Relic portal by giving them different names, but the name as it appears in the New Relic portal is controlled through a setting in the web.config and cannot be controlled through the Azure portal.

Second, I wanted to be able to control the licence key we used for CI (free licence, limited functionality) and Production (expensive licence, full functionality) deployments, however the key is embedded in the newrelic.cmd and is applied when the New Relic agent is installed; this is not easy to change during/post deployment.

The initial solution to both these problems involved producing two packages, one for the CI environment(s) and one for the Production environment. Instead of the normal Debug and Release build outputs, a third target, Production, was used, and the web.config was modified during the build process using a transform that changed the name to what was wanted. The licence key issue was resolved by having two newrelic.cmd items in the project and packaging the required one with the appropriate build. This was not ideal, but it worked in a fashion; however, the ProdOps guys were keen on having control over the name and licence key used in production.

Changing the Application name

New Relic gets the Application name from a setting in the web.config and so what is necessary is to read a setting in the Azure configuration and update the web.config. There are many ways to resolve this issue but the approach we took was based on the solution to an identical issue raised on GitHub.  

For completeness I will however reiterate the steps below:

  1. In the ServiceDefinition.csdef file add a setting to the <ConfigurationSettings/> section

     <ConfigurationSettings>
       <Setting name="NewRelicApplicationName" />
     </ConfigurationSettings>

  2. In the ServiceConfiguration file for your environment add a setting that will be used to set the Application name in New Relic

     <ConfigurationSettings>
       <Setting name="NewRelicApplicationName" value="MyApplication" />
     </ConfigurationSettings>

  3. In the WebRole.cs file for your application amend your code with the following

     public class WebRole : RoleEntryPoint
     {
         public override bool OnStart()
         {
             ConfigureNewRelic();

             return base.OnStart();
         }

         private static void ConfigureNewRelic()
         {
             if (RoleEnvironment.IsAvailable && !RoleEnvironment.IsEmulated)
             {
                 string appName;
                 try
                 {
                     appName = RoleEnvironment.GetConfigurationSettingValue("NewRelicApplicationName");
                 }
                 catch (RoleEnvironmentException)
                 {
                     /* nothing we can do so just return */
                     return;
                 }

                 if (string.IsNullOrWhiteSpace(appName))
                     return;

                 using (var server = new ServerManager())
                 {
                     // get the site's web configuration
                     const string siteNameFromServiceModel = "Web";
                     var siteName = string.Format("{0}_{1}", RoleEnvironment.CurrentRoleInstance.Id, siteNameFromServiceModel);
                     var siteConfig = server.Sites[siteName].GetWebConfiguration();

                     // get the appSettings section
                     var appSettings = siteConfig.GetSection("appSettings").GetCollection();
                     AddConfigElement(appSettings, "NewRelic.AppName", appName);
                     server.CommitChanges();
                 }
             }
         }

         private static void AddConfigElement(ConfigurationElementCollection appSettings, string key, string value)
         {
             if (appSettings.Any(t => t.GetAttributeValue("key").ToString() == key))
             {
                 appSettings.Remove(appSettings.First(t => t.GetAttributeValue("key").ToString() == key));
             }

             ConfigurationElement addElement = appSettings.CreateElement("add");
             addElement["key"] = key;
             addElement["value"] = value;
             appSettings.Add(addElement);
         }
     }
And that should be it.

Changing the New Relic licence key

The New Relic licence key is applied when the New Relic agent is installed on the host, so what is needed is to read the Azure configuration when newrelic.cmd is executed as part of the startup tasks (defined in the ServiceDefinition.csdef) and apply the key when the agent is installed. There does not appear to be a way of changing the licence key once the agents have been installed, other than reducing the number of instances to 0 and then scaling back up (I suggest you use the staging slot for this).

  1. In the ServiceDefinition.csdef file add a setting to the <ConfigurationSettings/> section

     <ConfigurationSettings>
       <Setting name="NewRelicLicenceKey" />
     </ConfigurationSettings>

     and add a new Environment variable to the newrelic.cmd startup task that will be set by the new configuration setting

     <Task commandLine="newrelic.cmd" executionContext="elevated" taskType="simple">
       <Environment>
         <Variable name="EMULATED">
           <RoleInstanceValue xpath="/RoleEnvironment/Deployment/@emulated" />
         </Variable>
         <Variable name="NewRelicLicence">
           <!-- http://msdn.microsoft.com/en-us/library/windowsazure/hh404006.aspx -->
           <RoleInstanceValue xpath="/RoleEnvironment/CurrentInstance/ConfigurationSettings/ConfigurationSetting[@name='NewRelicLicenceKey']/@value" />
         </Variable>
         <Variable name="IsWorkerRole" value="false" />
       </Environment>
     </Task>

  2. In the ServiceConfiguration file for your environment add a setting that will be used to set the licence key

     <ConfigurationSettings>
       <Setting name="NewRelicLicenceKey" value="<ADD YOUR KEY HERE>" />
     </ConfigurationSettings>

  3. Edit your newrelic.cmd to use the Environment variable

     :: Update with your license key
     SET LICENSE_KEY=%NewRelicLicenceKey%

Now you should be able to control the New Relic licence key during your deployment.