Saturday, August 25, 2012

MongoDB, Mongoid, MapReduce and Embedded Documents.

I am using Mongoid to store data as documents in a MongoDB database and then run MapReduce queries against that data. I have no trouble mapping data from normal documents or from a single embedded document, but I could not extract data from an embedded collection of documents, i.e.

class Foo
  include Mongoid::Document

  #fields
  field :custom_id, :type => String

  #relations
  embeds_many :bars

end
class Bar
  include Mongoid::Document

  #fields
  field :custom_field, :type => String

  #relations
  embedded_in :foo

end

First, it looks like we need to run the map part of the MapReduce against the parent document and not the child, i.e. Foo.map_reduce(...) will find documents but Bar.map_reduce(...) will not. That is not surprising, as it is also not possible to count all Bar documents by doing Bar.all.count in the Rails console.

A MapReduce query in MongoDB is written as a pair of JavaScript functions: the first does the map by emitting a mini-document of data, and the second reduces (aggregates) the emitted data in some manner. Thinking I had a collection (array), my first attempt to map data from the embedded documents was this:

MAP:
function() {
  if (this.bars == null) return;
  for (var bar in this.bars){
    emit(bar.custom_field, { count: 1 });
  }
}

REDUCE:
function(key, values) {
  var total = 0;
  for ( var i=0; i< values.length; i++ ) {
    total += values[i].count;
  }
  return { count: total };
}

This produced an unusual result such that there was only a single aggregated document with a null key and the count was the total number of child documents (summed across all the parents).

Now I could have just broken the child document out and not embedded it but I didn't want to break the model over something so trivial that must, in my eyes, be possible.

After much googling and reading of forum posts I couldn't find any samples. I eventually spotted some 'unusual' syntax on an unrelated topic, which led me to rewrite the map script like this:

function() {
  if (this.bars == null) return;
  for (var bar in this.bars){
    emit(this.bars[bar].custom_field, { count: 1 });
  }
}

This produced the expected results. It was probably obvious to anyone who knows MongoDB and MapReduce well, but it took me a while to find out and it still isn't that intuitive (though I think I now know why it is this way), so I thought I'd write it up as a bit of a reference.
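A minimal sketch of the likely reason, runnable outside MongoDB: JavaScript's for...in loop walks the property names of an object, which for an array are the index strings "0", "1", and so on, not the elements themselves. So in the first attempt `bar` is a string, `bar.custom_field` is undefined, and every emit shares the same empty key. The `runMap` helper and the sample document are assumptions made for illustration; here `emit` is passed in as a parameter rather than being the global that MongoDB provides.

```javascript
// Hypothetical stand-in for MongoDB's map phase: binds `this` to the
// document and collects every key the map function emits.
function runMap(mapFn, doc) {
  const keys = [];
  mapFn.call(doc, (key, value) => keys.push(key));
  return keys;
}

// First attempt: `bar` is an index string ("0", "1"), not a document.
function firstAttempt(emit) {
  if (this.bars == null) return;
  for (var bar in this.bars) {
    emit(bar.custom_field, { count: 1 });
  }
}

// Working version: use the index string to look up the embedded document.
function secondAttempt(emit) {
  if (this.bars == null) return;
  for (var bar in this.bars) {
    emit(this.bars[bar].custom_field, { count: 1 });
  }
}

// Made-up parent document with two embedded children.
const sampleFoo = {
  custom_id: "foo-1",
  bars: [{ custom_field: "x" }, { custom_field: "y" }],
};

const firstKeys = runMap(firstAttempt, sampleFoo);   // [undefined, undefined]
const secondKeys = runMap(secondAttempt, sampleFoo); // ["x", "y"]
console.log(firstKeys, secondKeys);
```

Since every key in the first attempt is undefined, all the emitted counts are grouped under one key, which matches the single null-keyed result seen earlier.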

Friday, June 8, 2012

The "Pigs" and "Chickens" fable

I think anyone who has heard of Agile and Scrum has heard of the Pigs and Chickens story and how it describes those who are committed to the delivery of the project, the "Pigs", and those who are merely involved, the "Chickens". If not, click on the image and learn more about it.

Implementing Scrum - Pigs and Chickens

However I was just recently re-reading "Death March" by Edward Yourdon (1st Edition) and I came across this response to the parable, in the context of commitment whilst on a death march.
“I’m not sure you will find any old pigs in development perhaps more chickens. I think that kind of commitment continues until (inevitably?) you get into the first death march project – then there is a rude awakening. Either the pig realises what’s happening, this is the slaughterhouse! RUN!! Or the pig is making bacon…” - Paul Mason (Death March).
I just found it quite amusing and thought I should share...




Friday, February 3, 2012

Mutation Testing; a use for re-JIT?

Where to start...
Mutation testing means making small modifications to a program, re-running the originally 'passing' tests that exercise the modified code, and checking that they now fail. It is a way of making sure your tests are actually testing what you believe they are testing.
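As a toy illustration of the idea (in JavaScript rather than IL, and with entirely made-up names), the sketch below inverts a single branch condition and re-runs three tests. Two of them fail against the mutant, which is what we want; the third still passes, which shows it was never really checking the branch.

```javascript
// Hypothetical code under test, built from a branch condition so the
// condition can be swapped for a mutated one.
function makeClassifier(isAdult) {
  return (age) => (isAdult(age) ? "adult" : "minor");
}

const originalCondition = (age) => age >= 18;
// The mutation: invert the logic of that one branch.
const mutatedCondition = (age) => !originalCondition(age);

// Each test returns true when it passes.
const tests = {
  adultIsClassified: (classify) => classify(30) === "adult",
  minorIsClassified: (classify) => classify(5) === "minor",
  // A weak test: it exercises the code but asserts nothing about the branch.
  weakTest: (classify) => typeof classify(30) === "string",
};

function runTests(classify) {
  const results = {};
  for (const [name, test] of Object.entries(tests)) {
    results[name] = test(classify);
  }
  return results;
}

const baseline = runTests(makeClassifier(originalCondition));
const mutated = runTests(makeClassifier(mutatedCondition));

// A test "kills" the mutant if it passed before and fails now.
const killedBy = Object.keys(tests).filter((n) => baseline[n] && !mutated[n]);
const survivedBy = Object.keys(tests).filter((n) => baseline[n] && mutated[n]);
console.log({ killedBy, survivedBy });
```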

Setting the stage...
So how can we do this with .NET? Well, first we need to know which tests execute which code, and we can use OpenCover for that via its tracking-by-test feature. With that feature we can see which tests execute which sequence points and also which branches were exercised; it is this latter information we can take advantage of when creating a mutation testing utility.

New toys to play with...
This mutation tester is going to work at the IL level, so we could use the JIT (just-in-time) compilation hook that OpenCover (and PartCover) already use. However, that would mean either complicated instrumentation, in which we would have to control which path gets exercised, or simpler instrumentation that requires the process under test (e.g. nunit, mstest, ...) to be stopped and restarted each time so that new code can be exercised. With .NET 4.5 (in preview at the time of writing) there is a re-JIT compilation feature that we could use instead; this would allow simple instrumentation without needing to stop and start the process under test. Re-JIT has a number of limitations, but after reviewing them (several times) I don't think any are actual show-stoppers.

However, to make re-JIT useful we need a way of executing a test or tests repeatedly without restarting the application under test, and this isn't possible with nunit and mstest. It should, however, be possible to use the test runners from AutoTest.Net if we host them directly or in a separate process that we can communicate with.

A plan...
So the flow will be something like this (I wonder how well this will stand up to the test of time). I haven't looked at the latest profiler API in depth, but the documentation on MSDN and David Broman's blog seem to indicate this should be possible.

  • Run OpenCover to produce an XML file with a list of what tests exercised what branches
  • For each branch point =>
    • if first branch of method then store the original IL (as we will need repeated access to this IL)
    • (re)instrument the method that contains that branch point and using the original IL of that method invert the logic of only that point
    • execute each test that exercises that branch point => record pass, fail
    • if last branch of method then revert method to original IL
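
The loop above could be sketched roughly as follows. This is only an outline of the control flow, written in JavaScript against a fake in-memory host: `readIL`, `invertBranch`, `applyIL` and `runTest` are hypothetical names, not part of OpenCover or the profiler API, and the coverage list is assumed to be grouped by method.

```javascript
// Fake host: "IL" is just an array of branch opcodes per method, and a test
// passes only while every method still has its original IL.
function makeFakeHost(methods) {
  const current = new Map(Object.entries(methods));
  return {
    current,
    readIL: (method) => current.get(method),
    invertBranch: (il, branch) =>
      il.map((op, i) => (i !== branch ? op : op === "brtrue" ? "brfalse" : "brtrue")),
    applyIL: (method, il) => current.set(method, il),
    runTest: (testName) => [...current].every(([m, il]) => il === methods[m]),
  };
}

function mutateAll(coverage, host) {
  const originals = new Map();
  const report = [];
  coverage.forEach((point, i) => {
    // First branch of a method: store the original IL.
    if (!originals.has(point.method)) {
      originals.set(point.method, host.readIL(point.method));
    }
    // Always mutate from the original IL so only this branch is inverted.
    const original = originals.get(point.method);
    host.applyIL(point.method, host.invertBranch(original, point.branch));
    for (const test of point.tests) {
      report.push({ method: point.method, branch: point.branch, test, passed: host.runTest(test) });
    }
    // Last branch of a method: revert to the original IL.
    const next = coverage[i + 1];
    if (!next || next.method !== point.method) {
      host.applyIL(point.method, original);
      originals.delete(point.method);
    }
  });
  return report;
}

// Made-up methods and coverage data for the demonstration.
const methods = { "Calc.Abs": ["brtrue", "brfalse"], "Calc.Sign": ["brtrue"] };
const coverage = [
  { method: "Calc.Abs", branch: 0, tests: ["absOfNegative"] },
  { method: "Calc.Abs", branch: 1, tests: ["absOfNegative", "absOfPositive"] },
  { method: "Calc.Sign", branch: 0, tests: ["signOfZero"] },
];
const host = makeFakeHost(methods);
const report = mutateAll(coverage, host);
console.log(report);
```

In the report, a test that still passes against a mutated branch is the interesting case, since it suggests the test does not really check that branch.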
All it needs is a name...
All of this will be hosted on GitHub under OpenMutate. Let the games begin....

Saturday, January 21, 2012

Unusual coverage in VB.NET

Recently a user posted on StackOverflow asking why he was seeing unusual coverage results in VB.NET with MSTEST and Visual Studio. The question already had answers that helped the questioner, but I decided to delve a little deeper and find out why the proposed solution worked.

The issue was that in his code sample the End Try was not being shown as covered even though he had exercised the Try and the Catch parts of his code.

First I broke his sample down into something simpler; the offending line is line 12, the End Try.

 07  Function Method() As String  
 08    Try  
 09      Return ""  
 10    Catch ex As Exception  
 11      Return ""  
 12    End Try
 13  End Function  

In debug we can extract the following sequence points (I am, obviously, using OpenCover for this.)

<SequencePoints>  
  <SequencePoint offset="0" ordinal="0" uspid="261" vc="0" ec="32" el="7" sc="5" sl="7"/>  
  <SequencePoint offset="1" ordinal="1" uspid="262" vc="0" ec="12" el="8" sc="9" sl="8"/>  
  <SequencePoint offset="2" ordinal="2" uspid="263" vc="0" ec="22" el="9" sc="13" sl="9"/>  
  <SequencePoint offset="19" ordinal="3" uspid="264" vc="0" ec="30" el="10" sc="9" sl="10"/>  
  <SequencePoint offset="20" ordinal="4" uspid="265" vc="0" ec="22" el="11" sc="13" sl="11"/>  
  <SequencePoint offset="40" ordinal="5" uspid="266" vc="0" ec="16" el="12" sc="9" sl="12"/>  
  <SequencePoint offset="41" ordinal="6" uspid="267" vc="0" ec="17" el="13" sc="5" sl="13"/>  
</SequencePoints>  
(where sl = start line, el = end line, sc = start column, ec = end column and offset = IL offset in decimal)
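
Because the sequence point offsets are decimal while IL listings label instructions in hexadecimal, a small conversion helps when matching the two. A minimal sketch, using the offsets and start lines from the listing above (`ilLabel` is just an illustrative helper name):

```javascript
// Sequence points from the debug listing above: IL offset (decimal) and start line.
const debugPoints = [
  { offset: 0, line: 7 },
  { offset: 1, line: 8 },
  { offset: 2, line: 9 },
  { offset: 19, line: 10 },
  { offset: 20, line: 11 },
  { offset: 40, line: 12 },
  { offset: 41, line: 13 },
];

// Convert a decimal offset into the IL_xxxx label style used in IL listings.
function ilLabel(offset) {
  return "IL_" + offset.toString(16).padStart(4, "0");
}

for (const point of debugPoints) {
  console.log(`line ${point.line} -> ${ilLabel(point.offset)}`);
}
```

So line 12, the End Try, corresponds to IL_0028, and line 13 to IL_0029.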

However these only make sense when you look at the IL...

.method public static
    string Method () cil managed
{
    // Method begins at RVA 0x272c
    // Code size 43 (0x2b)
    .maxstack 2
    .locals init (
        [0] string Method,
        [1] class [mscorlib]System.Exception ex
    )

    IL_0000: nop
    IL_0001: nop
    .try
    {
        IL_0002: ldstr ""
        IL_0007: stloc.0
        IL_0008: leave.s IL_0029

        IL_000a: leave.s IL_0028
    } // end .try
    catch [mscorlib]System.Exception
    {
        IL_000c: dup
        IL_000d: call void [Microsoft.VisualBasic]Microsoft.VisualBasic.CompilerServices.ProjectData::SetProjectError(class [mscorlib]System.Exception)
        IL_0012: stloc.1
        IL_0013: nop
        IL_0014: ldstr ""
        IL_0019: stloc.0
        IL_001a: call void [Microsoft.VisualBasic]Microsoft.VisualBasic.CompilerServices.ProjectData::ClearProjectError()
        IL_001f: leave.s IL_0029

        IL_0021: call void [Microsoft.VisualBasic]Microsoft.VisualBasic.CompilerServices.ProjectData::ClearProjectError()
        IL_0026: leave.s IL_0028
    } // end handler

    IL_0028: nop

    IL_0029: ldloc.0
    IL_002a: ret
} // end of method Module1::Method

As you can see, the End Try line that is causing concern would only be marked as hit (assuming they are using instrumentation similar to OpenCover's) if execution reached the IL instruction at offset 40 (IL_0028). Looking at the IL produced, however, it is not possible to see how you would ever reach that instruction: leave.s is a short jump-like instruction used to exit try/catch/finally blocks, and if you follow the code you will always reach a leave.s that jumps to IL_0029 first.
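
The same argument can be checked mechanically. The sketch below hand-transcribes the control flow of the debug IL above into a successor table and walks it from the two places execution can start: the method entry and the start of the catch handler. It is a simplification that assumes the only exceptions are those thrown inside the try block, and it ignores anything thrown inside the handler itself.

```javascript
// Successors of each instruction in the debug IL, keyed by offset.
// leave.s always jumps, so the instruction after it is not a successor.
const successors = {
  0x00: [0x01],
  0x01: [0x02],
  0x02: [0x07],
  0x07: [0x08],
  0x08: [0x29], // leave.s IL_0029
  0x0a: [0x28], // leave.s IL_0028
  0x0c: [0x0d],
  0x0d: [0x12],
  0x12: [0x13],
  0x13: [0x14],
  0x14: [0x19],
  0x19: [0x1a],
  0x1a: [0x1f],
  0x1f: [0x29], // leave.s IL_0029
  0x21: [0x26],
  0x26: [0x28], // leave.s IL_0028
  0x28: [0x29],
  0x29: [0x2a],
  0x2a: [], // ret
};

// Collect every offset reachable from the given entry points.
function reachable(graph, entryPoints) {
  const seen = new Set();
  const pending = [...entryPoints];
  while (pending.length > 0) {
    const offset = pending.pop();
    if (seen.has(offset)) continue;
    seen.add(offset);
    pending.push(...graph[offset]);
  }
  return seen;
}

// Entry points: start of the method and start of the catch handler.
const live = reachable(successors, [0x00, 0x0c]);
console.log(live.has(0x28)); // false: the End Try sequence point is never reached
console.log(live.has(0x29)); // true
```

The only instructions that jump to IL_0028 are the leave.s at IL_000a and IL_0026, and each of those sits directly after another leave.s, so nothing ever executes them.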

In release the IL changes to something more like what I was expecting beforehand and it has no unusual extra IL...

.method public static
    string Method () cil managed
{
    // Method begins at RVA 0x2274
    // Code size 30 (0x1e)
    .maxstack 2
    .locals init (
        [0] string Method,
        [1] class [mscorlib]System.Exception ex
    )

    .try
    {
        IL_0000: ldstr ""
        IL_0005: stloc.0
        IL_0006: leave.s IL_001c
    } // end .try
    catch [mscorlib]System.Exception
    {
        IL_0008: dup
        IL_0009: call void [Microsoft.VisualBasic]Microsoft.VisualBasic.CompilerServices.ProjectData::SetProjectError(class [mscorlib]System.Exception)
        IL_000e: stloc.1
        IL_000f: ldstr ""
        IL_0014: stloc.0
        IL_0015: call void [Microsoft.VisualBasic]Microsoft.VisualBasic.CompilerServices.ProjectData::ClearProjectError()
        IL_001a: leave.s IL_001c
    } // end handler

    IL_001c: ldloc.0
    IL_001d: ret
} // end of method Module1::Method

but so do the sequence points...

<SequencePoints>
  <SequencePoint offset="0" ordinal="0" uspid="33" vc="0" ec="22" el="9" sc="13" sl="9"/>
  <SequencePoint offset="15" ordinal="1" uspid="34" vc="0" ec="22" el="11" sc="13" sl="11"/>
  <SequencePoint offset="28" ordinal="2" uspid="35" vc="0" ec="17" el="13" sc="5" sl="13"/>
</SequencePoints>

So now you will never see the Try/Catch lines marked as covered, which is not helpful.

So let's try changing the code as suggested and go back to debug (because that is where you will usually be running coverage from).

 15  Function Method2() As String
 16    Dim x As String
 17    Try
 18      x = ""
 19    Catch ex As Exception
 20      x = ""
 21    End Try
 22    Return x
 23  End Function

Again we look at the sequence points...

<SequencePoints>
  <SequencePoint offset="0" ordinal="0" uspid="268" vc="0" ec="33" el="15" sc="5" sl="15"/>
  <SequencePoint offset="1" ordinal="1" uspid="269" vc="0" ec="12" el="17" sc="9" sl="17"/>
  <SequencePoint offset="2" ordinal="2" uspid="270" vc="0" ec="19" el="18" sc="13" sl="18"/>
  <SequencePoint offset="17" ordinal="3" uspid="271" vc="0" ec="30" el="19" sc="9" sl="19"/>
  <SequencePoint offset="18" ordinal="4" uspid="272" vc="0" ec="19" el="20" sc="13" sl="20"/>
  <SequencePoint offset="31" ordinal="5" uspid="273" vc="0" ec="16" el="21" sc="9" sl="21"/>
  <SequencePoint offset="32" ordinal="6" uspid="274" vc="0" ec="17" el="22" sc="9" sl="22"/>
  <SequencePoint offset="36" ordinal="7" uspid="275" vc="0" ec="17" el="23" sc="5" sl="23"/>
</SequencePoints>

and the IL...

.method public static
    string Method2 () cil managed
{
    // Method begins at RVA 0x282c
    // Code size 38 (0x26)
    .maxstack 2
    .locals init (
        [0] string Method2,
        [1] string x,
        [2] class [mscorlib]System.Exception ex
    )

    IL_0000: nop
    IL_0001: nop
    .try
    {
        IL_0002: ldstr ""
        IL_0007: stloc.1
        IL_0008: leave.s IL_001f
    } // end .try
    catch [mscorlib]System.Exception
    {
        IL_000a: dup
        IL_000b: call void [Microsoft.VisualBasic]Microsoft.VisualBasic.CompilerServices.ProjectData::SetProjectError(class [mscorlib]System.Exception)
        IL_0010: stloc.2
        IL_0011: nop
        IL_0012: ldstr ""
        IL_0017: stloc.1
        IL_0018: call void [Microsoft.VisualBasic]Microsoft.VisualBasic.CompilerServices.ProjectData::ClearProjectError()
        IL_001d: leave.s IL_001f
    } // end handler

    IL_001f: nop
    IL_0020: ldloc.1
    IL_0021: stloc.0
    IL_0022: br.s IL_0024

    IL_0024: ldloc.0
    IL_0025: ret
} // end of method Module1::Method2

So for the End Try to be covered we need line 21 to be hit, which is offset 31 (IL_001f). As can be seen, both leave.s instructions jump to that point, so that line will now be marked as covered.