Sunday, October 2, 2011

Adding OpenCover to TeamCity

Adding OpenCover to the latest version of TeamCity (6.5) couldn't be easier; however, if you need help, follow these simple steps.

1) Download and install OpenCover
2) Download and install ReportGenerator (actually, just unzip it)
3) Register the OpenCover profiler DLLs using the regsvr32 utility

regsvr32 /s x86\OpenCover.Profiler.dll

regsvr32 /s x64\OpenCover.Profiler.dll


4) Using TeamCity add a new Build Step to your configuration
5) Choose Command Line as the runner type, then choose Custom Script for the Run option.
6) Now all that is needed is to set up the commands that run the profiler against your tests. For example, for OpenCover itself the working directory is set to main\bin\debug, and so we have:

"%env.ProgramFiles(x86)%\opencover\opencover.console.exe" "-target:..\..\..\tools\NUnit-2.5.10.11092\bin\net-2.0\nunit-console-x86.exe" -targetargs:"OpenCover.Test.dll /noshadow" -filter:"+[Open*]* -[OpenCover.T*]*" "-output:..\..\..\opencovertests.xml"

"%env.ProgramFiles(x86)%\ReportGenerator\bin\ReportGenerator.exe" ..\..\..\opencovertests.xml ..\..\..\coverage


7) Finally, set up the artifact paths so that you can view the results in TeamCity, e.g.


%teamcity.build.workingDir%\opencovertests.xml

%teamcity.build.workingDir%\coverage\**\*.*


And there you have it, OpenCover running under TeamCity and visual reports provided by ReportGenerator. I am sure you will find ways to improve upon this for your own builds.

Sunday, August 28, 2011

The problem with sequence coverage. (part 2)

Previously I mentioned why relying on sequence coverage alone is not a good idea, as it is possible to have 100% sequence coverage but not 100% code coverage. However, I only described a scenario that used a branch with 2 paths, i.e. the most common form of conditional branch; there is one other member of the conditional branch family in IL, the switch instruction, and this instruction can have many paths.

This time I am using the code from the Newtonsoft.Json library because a) it has tests and b) it is very well covered, at 83% sequence coverage but only 72% (by my calculations) branch coverage. The subject of this investigation is BsonReader::ReadType(BsonType). This method has a very large switch statement, which really is emitted as a switch instruction in IL, with a default and several fall-throughs; a fall-through is where two or more case statements call the same code. The method itself has 98% sequence coverage and 82% branch coverage; the only uncovered code is the handler for the default: path, which is not unexpected, as it handles an enum that should never be set to a value outside the allowed range.

Looking at the branch coverage report (the switch instruction we are interested in is at IL offset 8), the first path (0) is unvisited, but we knew that; the next unvisited branch is #14 and the one after that is #17. Luckily for us the enum used by the switch instruction is well defined, so we can deduce that the method is never called during testing with the values Symbol and TimeStamp, even though the code they would call is covered; in fact we can see from the code that both these enum values are part of the switch/case statement and belong to fall-throughs. So again we see how branch coverage helps identify 'potential' issues and test candidates.
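
To make the fall-through situation concrete, here is a minimal sketch of my own (not the actual Newtonsoft.Json code) in which the shared case body is fully covered while some paths of the underlying IL switch instruction are never taken:

using System;

public enum SampleBsonType { String, Symbol, TimeStamp, Boolean }

public static class SwitchExample
{
    public static object ReadValue(SampleBsonType type)
    {
        switch (type)
        {
            case SampleBsonType.String:
            case SampleBsonType.Symbol:    // fall-through: shares the String code
            case SampleBsonType.TimeStamp: // fall-through: shares the String code
                return "text";             // covered even if tests only ever pass String
            case SampleBsonType.Boolean:
                return true;
            default:
                throw new ArgumentOutOfRangeException("type"); // the typically uncovered default: path
        }
    }
}

If the tests only ever call ReadValue with String and Boolean, every statement above is hit and sequence coverage looks excellent, yet the Symbol and TimeStamp paths of the IL switch instruction remain unvisited; that is exactly the kind of gap the branch coverage report exposes.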

Friday, August 26, 2011

The problem with sequence coverage.

Sequence coverage is probably the simplest coverage metric; the information is packaged in PDB files and can be read using tools like Mono.Cecil. But just because a method has 100% sequence coverage does not mean you have 100% code coverage.

I'll use an example from OpenCover's own dogfood tests to demonstrate what I mean. Here is a method which the report shows as having 100% coverage (sequence point coverage, that is).


However I see an issue: on line 101 there is a condition, i.e. a branch, and yet if the visit count is 1 then there is no possibility that both paths of that branch have been tested. We can therefore infer that even if we had 10000 visits there would still be no guarantee that every path had been covered, even in such a simple method.
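
As an illustration of the kind of method in question, here is a hypothetical example of my own (not the actual dogfood code) that shows the same effect:

public static class CoverageExample
{
    // A single call such as Describe(1) visits every sequence point in this method, so a
    // report would show 100% sequence coverage, yet only the 'true' path of the condition
    // is ever taken; the 'false' path remains untested.
    public static string Describe(int count)
    {
        var suffix = (count == 1) ? "item" : "items";
        return count + " " + suffix;
    }
}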

Looking at the OpenCover results from which the coverage report was generated, we can see that each sequence point has been visited once; however the branch coverage shows that only one of the paths for the condition we identified has been visited (in this case the true path), which is good as that is what we deduced.

So if you are using code coverage tools, do NOT rely on sequence point coverage alone to determine how well covered your code is. Luckily OpenCover now supports branch coverage (as of 25th Aug 2011), and ReportGenerator 1.2 displays most of the information needed to help you identify possible coverage mismatches.

Wednesday, August 10, 2011

OpenCover Performance Impact (part 2)

I think I now have a handle on why I was getting the results I reported earlier, i.e. why OpenCover and PartCover are not some magical performance boosters that add Go Faster stripes to your code.

After a heads up from leppie and his investigations of using OpenCover on his IronScheme project, I realised that I needed to spend some time optimizing how I get data from the profiler and aggregate it into the report. In case you are wondering, the IronScheme test that took just shy of 1 minute to run on my machine took over 60 minutes when running under the profiler. Ouch!

The problem

First of all I should explain what sort of data OpenCover gathers (and why), and then I can describe what I did to improve performance. OpenCover records each visit to a sequence point and stores these visits in shared memory; I did it this way as I am hoping to be able to use the order of visits for some form of path coverage analysis at a later date. After 8000 visits it informs the host process that there is a block ready for processing. The host takes this block, makes a copy, releases the shared memory back to the profiler and then processes the data; after processing the data the host waits for the next message. It was this latter stage that was the bottleneck: the host was spending so much time aggregating the data that the profiler was already ready with the next 8000 points.
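
As a rough sketch of what the host side of this arrangement looks like (hypothetical names, not the actual OpenCover source), the loop is something like:

using System;
using System.IO.MemoryMappedFiles;
using System.Threading;

// A simplified sketch (hypothetical names) of the host-side handling described above:
// wait for the profiler's signal, copy the full block out of shared memory, release the
// memory straight back to the profiler, and only then aggregate the data; that last
// step was the bottleneck.
class VisitBlockPump
{
    const int VisitsPerBlock = 8000;
    const int BlockSize = VisitsPerBlock * 2 * sizeof(int); // two integers per visit record

    public static void Run(MemoryMappedFile sharedMemory,
                           EventWaitHandle blockReady,    // signalled by the profiler
                           EventWaitHandle blockReleased, // signalled by the host
                           Action<byte[]> aggregate)
    {
        using (var view = sharedMemory.CreateViewAccessor(0, BlockSize))
        {
            var buffer = new byte[BlockSize];
            while (true)
            {
                blockReady.WaitOne();                        // a block of visits is full
                view.ReadArray(0, buffer, 0, buffer.Length); // copy it out of shared memory
                blockReleased.Set();                         // profiler can refill the block
                aggregate((byte[])buffer.Clone());           // aggregation was the slow part
            }
        }
    }
}

The key point is that the copy-and-release happens before any aggregation, so in principle the profiler is never kept waiting for the host; in practice the aggregation still had to keep up, which is what the next section is about.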

An (interim) solution

I say interim solution as I am not finished with performance improvements yet but decided that what I had implemented so far was okay for release.

First I looked at how the results were being aggregated and noticed that a lot of the time was being spent looking up the sequence points so that the visit counts could be updated. I switched this to a list and mapped the visit count data onto the model at the end of the profiling run. This helped, but only by bringing the profiling run down to ~40 minutes.

I realised that I just had to get the data out of the way quickly and process it at a later date, so I added a processing thread and a ConcurrentQueue. This was an interesting turn of events: the target process now finished in 4 minutes, but the host took nearly 40 minutes to process the data, memory usage went up to 2.5GB and there was a backlog of 40K messages. Hmmm....

After some toying, whilst looking for inspiration, I noticed that the marshaling of the structure (2 integers) was where most of the time was being spent. I switched this to using BitConverter, which also meant that I could avoid the memory pinning required by the marshaling. Now the target process still ran in just under 4 minutes, but the backlog very rarely reached 20 messages, memory usage stayed at a comfortable level (<100MB), and the results were ready virtually as soon as the target process had closed. I also upped the number of visit points per packet to 16000, but this didn't show any noticeable improvement in performance.
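
The shape of those two changes is roughly as follows (again a sketch with made-up names, not the real implementation):

using System;
using System.Collections.Concurrent;
using System.Threading;

// A rough sketch of the two changes described above: copied blocks are simply queued so
// the shared memory can be handed back immediately, and a background thread decodes each
// record (two 32-bit integers) with BitConverter rather than marshalling a structure,
// which avoids the memory pinning.
class VisitProcessor
{
    readonly ConcurrentQueue<byte[]> _queue = new ConcurrentQueue<byte[]>();

    // Called by the thread that copies blocks out of shared memory.
    public void Enqueue(byte[] copiedBlock)
    {
        _queue.Enqueue(copiedBlock);
    }

    // Runs on the processing thread.
    public void ProcessLoop(Action<int, int> recordVisit)
    {
        byte[] block;
        while (true)
        {
            if (!_queue.TryDequeue(out block))
            {
                Thread.Sleep(1); // nothing queued yet
                continue;
            }
            for (int offset = 0; offset + 8 <= block.Length; offset += 8)
            {
                // the two integers that make up each visit record (names are illustrative)
                int first = BitConverter.ToInt32(block, offset);
                int second = BitConverter.ToInt32(block, offset + 4);
                recordVisit(first, second);
            }
        }
    }
}

Getting the bytes onto the queue is cheap, so the shared memory is released almost immediately; all the decoding and aggregation happens off the hot path, which is why the backlog stayed small.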

I decided this was enough for now and released a version of the profiler.

But what about the earlier results?

Those earlier results though are still cause for thought. Why should the OpenCover dogfood tests be faster, but the IronScheme test be so much slower? Well, the IronScheme tests were doing a lot of loops and were running parts of the code many thousands of times, whereas the dogfood tests were unit tests and the code was only being run a few times before moving on to the next test fixture and the next section of code. I now think the issue is due to the optimization normally performed by the JIT compiler, which is turned off by the profiler: when running the tests without the profiler, the JIT compiler spends time optimizing the code, but that time is not recovered as the code is not run enough times to get a net gain; under the profiler the JIT compiler just compiles the non-optimised, instrumented code that the profiler produces.

So in conclusion you may see some speed improvements if running tests where your code is only visited a few times but if you are doing intensive execution of code then don't be surprised if the performance is degraded.

Sunday, July 24, 2011

OpenCover Performance Impact

So how does OpenCover's profiling impact your testing? The best way to find out is to get some figures so that you can judge for yourself.

I decided to use OpenCover's own tests and the timing value produced by NUnit itself, just like I'd expect any user trying to determine the impact to do, I suppose. I've also added the results from PartCover for comparison. Before I took any numbers I warmed up the code by running it several times beforehand.

          NUnit32    NUnit32 (OpenCover)    NUnit32 (PartCover)    NUnit64    NUnit64 (OpenCover)
Run 1     2.643      2.691                  2.639                  4.544      3.807
Run 2     2.629      2.690                  2.611                  4.426      3.753
Run 3     2.642      2.638                  2.612                  4.460      4.036
Average   2.638      2.673                  2.621                  4.477      3.865

(times in seconds, as reported by NUnit)

I don't know how to interpret these results as they don't make much sense. OpenCover seemed to add on average 1.3% to the total time (which I'd expect), whereas PartCover appears to make the code go faster by 0.64%. I also can't explain why the 64-bit results seem to show that OpenCover improves performance by 13.6%.

I tried to come up with a number of reasons for the above, but the results I keep getting are reasonably consistent, so I decided to post them anyway; perhaps someone else will be able to tell me what is happening.

Saturday, July 9, 2011

Questions about open source and liability in the workplace

Last weekend I attended DDDSydney and one of the most interesting sessions was a panel session about Microsoft and open source (Open Source & Microsoft Ecosystem); though, as these things go, it quickly went off(ish) topic, as expected by the panelists, whom I'll refer to as the crazy Drupal girl and the 3 stooges (honestly, no offence folks, it was highly entertaining).

However it got me thinking about the number of projects where I have come across an unusual bit of open source software that has some use (but has not found a niche or has since been surpassed), only to find that it was introduced by a developer because it was their pet open source project. Now the first question is: "what is the liability under this scenario?"

Did the developer ask first, as they should before using any open source software on a project? If so then the company accepted the situation; but what happens if they did not ask (or the company was not made aware)? Are they still liable, or is the developer liable? I assume it would be the company, as they should have some sort of oversight, but for small overworked teams where process may not be as strong this may get overlooked.

The other issue is what happens if you introduce your pet open source software project and then you leave: who supports it? How do you separate the open source project's needs from the day-job, when they are so intermingled? Does the remaining team support it, and do they have the skills? And what happens if the parting was acrimonious in nature: if they, the team, then raised a legitimate issue, would you fix it or leave them to stew?

I don't have answers to the above; I did title this "Questions about...", and they can be applied universally. The answer to most of them, I suppose, is "it depends". Each situation will be different, I suspect, but I think these types of questions should be asked by any company hoping to use open source software and by any developers wishing to introduce it, whether they are contributors or not.

Personally I have decided NOT to introduce the open source software I develop into my workplace; yes, they could use it and find it useful, but they can also afford commercial alternatives. If someone else suggested it, I'd have to make sure there was an agreement that, should an issue arise that affects them and they want it fixed quickly, I may have to use 'work' time, i.e. there would be no guarantee that it would be done that evening or even that week; after all it is supposed to be fun and not stressful.

Saturday, June 25, 2011

How do we get Users out of [open source] Welfare?

Okay, an odd title, but it is something I've been thinking about for some time, and I suppose it is the source of much of the frustration I have had whilst maintaining PartCover; I am hoping to reverse the situation with OpenCover.

Categorizing open source users

First I'd like to explain that I like to roughly categorize the people involved in open source as follows:

Contributors - these are the guys and gals at the pit-face, developing software, writing documentation and generally striving to make an open source product better.

Investors - these individuals use open source software and help to make the product better via feedback and raising, and following up, issues (probably as it is in their interest to do so).

Benefactors - usually companies that give tools to open source developers or sponsor a project in other ways i.e. free licenses or free hosting e.g. NDepend, JetBrains and GitHub.

Angels - these people provide invaluable advice on just managing an open source project; they may not be actively involved in the development itself but they keep you sane.

Community - our main user base: users of open source who don't actively contribute back, which is why I sometimes refer to them as Welfare. Maybe in the case of this group it is just a failure to engage; the product just works and they have no need to be involved beyond viewing forums and StackOverflow. But I feel that without the involvement of this group a lot of open source software, no matter how good, can fall by the wayside.

But how do we get them involved? Well, first we have to find them. In my case with PartCover, as the project had been abandoned, the users had stopped raising issues on the SourceForge forums and tended to ask questions in other outlets such as StackOverflow or the SharpDevelop and Gallio forums and mailing lists.

Finding the users

I scoured the internet and compiled a list of popular places that PartCover was mentioned or supported. I was surprised to find that PartCover was used or supported by SharpDevelop, TeamCity and TypeMock amongst others (and yet again I am surprised it was abandoned and not adopted by anyone sooner).

StackOverflow seems to be the main place where people ask questions and to keep track of questions I have subscribed to an RSS feed for the partcover tag; and as soon as an opencover tag becomes available, or I get enough rep to create it, I'll subscribe to that.

Twitter is also quite a common medium nowadays so I have also set up the following search filter "opencover OR partcover -rt -via" to see if anyone mentions either of the projects.

Engaging the users

Now that I have found the users, or the majority of them, I have started notifying these lists, forums and projects that PartCover is alive again (and I have started to do the same to inform them about OpenCover), hopefully bringing them back, or at least letting them know that if they have really big issues there is somewhere to go.

Involving the Community users

This is the big ask and I don't have an answer. If the product works then they don't need to talk on the forums or declare their appreciation of a job well done. I think sites like ohloh are trying to address the balance. Some OS projects have a donate button, but I am not sure we are doing open source for the money; though some projects do eventually go commercial, anyone else can pick up the original code and develop it. Maybe the users don't know how to be involved; in the case of my OS projects they are quite specialised and the learning curve may be too much for some. But I don't think you have to be involved only in projects you use a lot.

Possible ways to get involved

If you are good at graphics, why not offer to knock up some graphics for use on web-sites and in the application. [I am quite lucky that Daniel Palme added support for PartCover and OpenCover to his ReportGenerator tool and has done a much better job than I would ever do.]

If you are good at installers, or even if you want to learn more about them, offer to manage them on behalf of the project.

If there is a project you like, support it on forums like StackOverflow and help other users.

Perhaps update the wikis and forums; sometimes the users know how a product works, or how it can best be used, better than the developers.

If your company uses a lot of open source, why not buy some licenses for useful software tools and donate them; geeks love shiny new toys. Quite a few vendors, such as NDepend, will donate licenses to open source projects.

If you have an issue, try to help the developers as much as possible to resolve it by supplying as much information as you can, along with repeatable samples; remember the developers are international and doing this in their own time (and, as you probably know, trying to repeat a scenario from scant information is very frustrating). Maintain contact whilst it is being resolved and let them know when it is.

Okay that's me done on the subject for now, suggestions anyone?

Saturday, June 18, 2011

OpenCover First Beta Release

Okay, the first post on a blog I created many, many months ago and still had not got round to starting. Why the delay? Well, I've just been busy and not had a lot to say; actually some would say I have too much to say, it's just not publishable.

But now I am happy to announce that the first release of OpenCover is now available on GitHub https://github.com/sawilde/opencover/downloads.

"So what?" I hear you say, "we have NCover, dotCover and PartCover [and probably many others with the word cover in the name,] do we need another code coverage tool?" Well, I think the answer is "Yes!" but before I say why a brief history.

About a year ago I adopted PartCover when I found it lost, abandoned and only supporting .NET2; PartCover has a large user base, SharpDevelop and Gallio to name but two, and I felt it was a shame to just let it fall by the wayside. I had also done some work on CoverageEye (another open source tool that was originally hosted on GotDotNet and has since vanished) whilst working for a client in the UK, so I felt I had a fighting chance of doing the upgrade to .NET4; I don't know if my changes ever got uploaded to GotDotNet as I was not in charge of that.

The adoption was far from easy for a number of reasons, one of which was that I was surprised just how little C++ I could actually remember, and it has changed a bit since I last used it in anger. Also, the lack of communication with the original developers meant that I was on my own in working out a) how it worked and b) just what the issues were (a lot of the reported issues had long since been abandoned by their reporters).

At the beginning of the adoption I cloned the SourceForge repository to GitHub, git being the in-thing at the time, and after I was eventually granted access to SourceForge I attempted to maintain both repositories. Due to the lack of permissions on SourceForge, no matter how many times I asked, I eventually abandoned it and kept all development on GitHub; I also updated the SourceForge repository with a number of ReadMe posts pointing to GitHub.

So upgrading PartCover progressed, and thankfully bloggers such as David Broman had already covered the subject of upgrading .NET2 profilers to .NET4 and the things to look out for. That, it would turn out, was the easy bit.

PartCover had 3 main issues (other than the lack of .NET4 support):
1) Memory usage
2) 64 bit support
3) If the target crashed then you got no results.

I'll tackle each of these in turn:
1) Memory - PartCover builds a model of every assembly/method/instrumented point in memory; though I managed to cut down memory usage by moving some of the data gathering to the profiler host, it wasn't enough. PartCover also added 10 IL instructions (23 bytes) for each sequence point identified, plus 4 bytes of allocated memory for the counter.

2) 64 bit support - PartCover used a complex COM + Named Pipe RPC mechanism which thankfully just worked, but I couldn't work out how to upgrade it to 64 bit (a few other helpers have offered and then gone incommunicado; I can only assume the pain was too much).

3) Crashing == no results - this was due to the profiler being shut down unexpectedly and the runtime not calling the ::Shutdown method, and as such all that data never being streamed to the host process; thankfully people were quite happy to fix crashing code, so it was not a major issue, but it was still an annoyance.

All of this would take major rework of substantial portions of the code and the thought was unbearable. I took a few stabs at bits and pieces but got nowhere.

Thankfully I had received some good advice, and though I tried to apply it to PartCover I realised the only way was to start again, taking what I had learned from the guys who wrote PartCover and some ideas I had come across from looking at other open source tools such as CoverageEye and Mono.Cecil.

OpenCover was born.

This time I created a simple COM object supporting the interfaces and then made sure I could compile it in both 32 and 64 bit from day one.

I then decided to make the profiler as simple as possible, so that it stays maintainable, and to move as much of the model handling as possible to the profiler host; thank heavens for Mono.Cecil. The only complex thing was deconstructing the IL and reassembling it after it had been instrumented. OpenCover only inserts 3 IL instructions (9/13 bytes depending on 32/64 bit) per instrumented point; these force a call into the profiler assembly itself, and the C++ code then records the 'hit'.
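
Purely to make that idea concrete, here is a managed stub of my own; it is not the actual mechanism (the real inserted IL calls into the native C++ profiler), but it shows the shape of what each instrumented point does, i.e. hand a unique point id to something that records the visit.

using System;

// Illustrative only: in OpenCover the recording happens in the C++ profiler; this managed
// stub simply stands in for the thing the inserted call reaches.
public static class CoverageRecorderStub
{
    public static void Visited(uint pointId)
    {
        Console.WriteLine("visited point {0}", pointId);
    }
}

// Conceptually, an instrumented statement then becomes:
//   CoverageRecorderStub.Visited(42);   // inserted before the original code
//   /* original statement */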

Finally I decided I had to get the data out of the profiler and into the host as soon as possible. I toyed with WCF and WWSAPI, but that meant I would have no XP support, although at least I could test other ideas; it also meant that if my target/profiler crashed I would lose the last packet of data, which is not drastic but not ideal. Eventually I bit the bullet and switched to using shared memory.

The switch to shared memory has brought a number of benefits, one of which is the ability to handle a number of processes under the same profiling session, both 64 and 32 bit, and to aggregate the results as they all use the same shared memory. I have yet to work out how to set this up via configuration files, but anyone wishing to experiment can do so by modifying the call to ProfilerManager::RunProcess in OpenCover.Host::Program.

So this is where we are now: OpenCover has been released (beta, obviously) and, at the time of writing, some people have actually downloaded it. I am now braced for the issues to come flooding/trickling in.

Feel free to download it and comment, raise issues on GitHub, and get involved; Daniel Palme, he of ReportGenerator fame, is hopefully going to upgrade his tool to include OpenCover.