Blog Archive

For those of you who may not know, DataTables.Net is a fantastic jQuery plug-in that creates a data grid presentation and offers support for filtering and server-side paging solutions. Yep, define an endpoint web service, pump down your data and you are good to go. But the cool kids these days are into the No-SQL thing, and one of the great entries into the document database arena is RavenDB. With Raven, you define your domain objects and just store them in the database; Raven will do its magic and persist your objects as documents. Have a List<Customer> to store? Give it to Raven and it will persist the customers as documents in JSON format.

This post will show you how to combine the front end goodness of DataTables with the back end magic of RavenDB. The goal is to provide:

  • The ability to define a single class that indexes your data.
  • Control over what data is selected by defining the columns, sort order and page size in Javascript. In other words, DataTables will tell the server what it wants to pull back.
  • Support for filtering across properties with a single search field, à la Google.
  • Above all, save you time and make you a hero in front of your fans. :)

For the Impatient, Here’s the End Product

There’s a lot of ground that we’ll cover, but for those who want to see the light at the end of the tunnel, here’s what the end solution will look like. You may want to download the solution first so you can follow along in the code. First, your web service or controller will have the following:


[HttpPost]
public JsonResult GetTenants(string jsonAOData)
{
   var tenantPager = new DataTablesPager<Tenant, Tenant_Search>(DocumentStore);
   var results = tenantPager.PrepAOData(jsonAOData)
                 .FilterFormattedList();

   return Json(results);
}

// The core method that is used to get the data is in DataTablesPager.cs
public List<T> Filter(int pageSize, int pageIndex, string[] terms)
{
   var targetList = new List<T>();
   RavenQueryStatistics stats;

   using(var session = docStore.OpenSession())
   {
      if (terms[0].Length > 0)
      {
         // Filtered path: search the analyzed QueryFields, then page the results.
         targetList = session.Query<DataTablesReduceResult, TIndexCreator>()
                               .Customize(x => x.WaitForNonStaleResults())
                               .SearchMultiple(x => x.QueryFields, string.Join(" ", terms),
                                                options: SearchOptions.And)
                               .Statistics(out stats)
                               .Skip(pageIndex)
                               .Take(pageSize)
                               .As<T>()
                               .ToList();

         this.totalDisplayResults = stats.TotalResults;

         // A second query captures the unfiltered total for the grid's record count.
         session.Query<DataTablesReduceResult, TIndexCreator>()
                 .Statistics(out stats)
                 .As<T>()
                 .ToList();
         this.totalResults = stats.TotalResults;
      }
// Code reduced for reading purposes.

Take note of the Generics on the constructor – Tenant is your domain object, Tenant_Search is the class that Raven will use to create the index for retrieving the data, as well as defining what properties you can filter on the object. We’ll cover indexing shortly along with some RavenDB background.
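
If you'd like a concrete picture before we get to the indexing discussion, below is a hedged sketch of what Tenant_Search might look like. It follows the same pattern as the Step_Search index covered later in this post; the Tenant property names are taken from the grid columns used in the demo, and the namespaces are assumptions that may vary with your Raven client version.

using System.Linq;
using Raven.Client.Indexes;          // AbstractIndexCreationTask (namespace assumed)
using Raven.Abstractions.Indexing;   // FieldIndexing (namespace assumed)

public class Tenant_Search : AbstractIndexCreationTask<Tenant, Tenant_Search.ReduceResult>
{
    public class ReduceResult
    {
        public string[] QueryFields { get; set; }
        public string Name { get; set; }
        public string Agent { get; set; }
    }

    public Tenant_Search()
    {
        // Everything the single search box should be able to match goes into QueryFields.
        Map = tenants => from tenant in tenants
                         select new
                         {
                             QueryFields = new[]
                             {
                                 tenant.Name, tenant.Agent, tenant.Center,
                                 tenant.StartDate.ToShortDateString(),
                                 tenant.EndDate.ToShortDateString(),
                                 tenant.DealAmount.ToString()
                             },
                             Name = tenant.Name,
                             Agent = tenant.Agent
                         };

        // Analyze QueryFields so Lucene tokenizes the text for matching.
        Indexes.Add(x => x.QueryFields, FieldIndexing.Analyzed);
    }
}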

Your Javascript will be the following:



var otable;

$(document).ready(function(){
   otable = $("#tenantTable").dataTable({
               "bProcessing": true,
               "bSort": true,
               "sPaginationType": "full_numbers",
               "aoColumnDefs": [
               { "sName": "Name", "aTargets": [0], "bSortable": true, "bSearchable": true },
               { "sName": "Agent", "aTargets": [1], "bSortable": true, "bSearchable": true },
               { "sName": "Center", "aTargets": [2], "bSortable": true, "bSearchable": true },
               { "sName": "StartDate", "aTargets": [3], "bSortable": true, "bSearchable": true },
               { "sName": "EndDate", "aTargets": [4], "bSortable": true, "bSearchable": true },
               { "sName": "DealAmount", "aTargets": [4], "bSortable": true, "bSearchable": true }
               ],
               "oLanguage": {
               "sSearch": "Search all columns:"
            },
               "aaSorting": [[1, "asc"]],
               "iDisplayLength": 7,
               "bServerSide": true,
               "sAjaxSource": "GetTenants",
               "fnServerData": function (sSource, aoData, fnCallback) {

                      var jsonAOData = JSON.stringify(aoData);

                     $.ajax({
                       //dataType: 'json',
                       contentType: "application/json; charset=utf-8",
                       type: "POST",
                       url: sSource,
                       data: "{jsonAOData : '" + jsonAOData + "'}",
                       success: function (msg) {
                         fnCallback(msg);
                       },
                       error: function (XMLHttpRequest, textStatus, errorThrown) {
                         alert(XMLHttpRequest.status);
                         alert(XMLHttpRequest.responseText);

                       }
                     });
         }
   });

   otable.fnSetFilteringDelay(1000);
});

It’s actually longer than the .Net stuff!!! We’ll cover what this means as well.

Getting Data and Providing Search with RavenDB

This post assumes that you have installed RavenDB, can connect to it, know how to store your objects, and can perform queries with LINQ. There are some good tutorials, such as Rob Ashton’s video introduction to Raven, as well as a brief overview by Sensei (me). We’re going to focus on Raven’s inherent full-text search capability and rely on Raven’s built-in paging mechanism to help us achieve our goals. While Raven provides great capability, it is not SQL, and much of what you know about LINQ and LINQ to SQL will help you and paint you into a corner at the same time. We’ll cover those aspects too.

First off, RavenDB is built on top of the search engine Lucene.Net. It is a schema-less database, so up front we will need to identify how we want to fetch data, as the indexes provide super fast data retrieval. Raven indexes reduce the need to devote huge CPU cycles to processing a query, as the index is built from the documents and processed as a background operation. This operation is asynchronous and is performed by Lucene. Without this approach, any query would force a complete scan of all documents. A miserably slow scan. With indexes defined up front, Raven will work quietly to keep the indexes up to date when new documents are created. So why is this important? Well, what you think you are doing in LINQ:


var search = "Bonus";
var steps = session.Query<Step>()
                    .Customize(x => x.WaitForNonStaleResults())
                    .Where(x => x.State.StartsWith(search) || x.WorkflowName.StartsWith(search))
                    .ToList();

 

is really translated to Lucene syntax. So what we get in the end is State:Bonus OR WorkflowName:Bonus. While it is simple to write a query that includes all the properties of an object, if you had an object with 15 properties would you really want to create a monster statement with a ton of ||’s? Hell no! If you look in the TestSuite project of the source code there are a few examples of using pure LINQ queries. Check out the method “CanFilterAccrossAllStringProperties” and you will see where things were headed.

We want to be like Fonzie, and what was Fonzie? Correctomundo – he’s cool. A good solution would be to know what properties a domain object has, and perform a filter against those properties. In other words, it would be really helpful if we could write some code that looks like this:


var propertyFilterSteps = session.Query<Step>()
                                 .Customize(x => x.WaitForNonStaleResults())
                                 .Where(AllStringPropertiesFilter(search,
                                      "Answer,AnsweredBy,Id,Participants,WorkflowName,WorkflowType"))
                                 .ToList();

Here we are using an Expression<Func<T, bool>> to pass in a delimited list of property names, and with a little LINQ query against the class Step we can generate a lambda to process the filter. This is in the test method “CanFilterAccrossAllStringProperties”. It worked great until we needed to include DateTime properties. The code is included in the project, so you can look at it there.
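
For the curious, here is a rough sketch of how such a filter could be assembled. This is not the project’s exact code – the names, and the assumption that every listed property is a string, are mine – but it shows the idea: build one StartsWith test per property name and OR them together into a single lambda.

using System;
using System.Linq;
using System.Linq.Expressions;

public static class PropertyFilterBuilder
{
    // Builds x => x.Prop1.StartsWith(search) || x.Prop2.StartsWith(search) || ...
    // for every property name in the comma-delimited list.
    public static Expression<Func<T, bool>> AllStringPropertiesFilter<T>(string search,
        string delimitedProperties)
    {
        var parameter = Expression.Parameter(typeof(T), "x");
        var searchValue = Expression.Constant(search);
        var startsWith = typeof(string).GetMethod("StartsWith", new[] { typeof(string) });

        Expression body = Expression.Constant(false);

        foreach (var name in delimitedProperties
                                .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                                .Select(p => p.Trim()))
        {
            var property = Expression.Property(parameter, name);
            var test = Expression.Call(property, startsWith, searchValue);
            body = Expression.OrElse(body, test);
        }

        return Expression.Lambda<Func<T, bool>>(body, parameter);
    }
}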

So how do we achieve the goal of having a single text-box-driven search that will query across all types of properties, so that when you type “Spock 2010” it will query the properties you specify for both values, “Spock” and “2010”? Let’s go back to Raven: you can specify an index by mapping which properties you want included in it and how you want Raven / Lucene to parse the text for matching values in that index. Raven provides a class called “AbstractIndexCreationTask” where you define a Map / Reduce function to create your index. In other words, you are picking which properties are included in the index. You can define the output of the Map to be anything that you wish. This is held in a class that we’ll name ReduceResult, and we will query against the properties of that class. We want to tell Raven to take the significant properties and index them in a format that we can match our terms against. So we will create the following index that allows us to filter on any term. This is found in Step_Search.cs in the Index folder:


public class Step_Search : AbstractIndexCreationTask<Step, Step_Search.ReduceResult>
{
	public class ReduceResult
	{
		public string[] QueryFields { get; set; }

		// ... code eliminated for reading purposes
	}

	public Step_Search()
	{
		Map = steps =>
			from step in steps
			select new
			{
				QueryFields = new [] { step.State, step.Answer, step.AnsweredBy, step.WorkflowName,
					step.Created.ToShortDateString(), step.Created.Year.ToString(),
					step.Created.Month.ToString() + "/" + step.Created.Year.ToString() },
				DateCreated = step.Created,
				WorkflowName = step.WorkflowName,
				State = step.State
			};

		Indexes.Add(x => x.QueryFields, FieldIndexing.Analyzed);

		// ... more code eliminated for reading purposes

So what we have done is create an index that has an array of strings. This array holds the text of our properties that we will match against. Raven has a method called Search that will perform a “StartsWith” style match against each object in the array. The call is .Search(x => x.QueryFields, “string to be searched”). If you take a look at the index, we have done some additional things with dates: for one, we create a string representation in ShortDate format, so when the user knows the exact date they can enter it and the Pager will match it. But we want to make things as easy as possible, so we have also created string representations in mm/yyyy format, making it easy for users to filter when they only know the month and year of the item they are looking for. “I think it was April last year …”. This is a big win for users who don’t recall exact details, as it allows them to quickly discover what they are looking for.

One last thing before we move on to making this work with DataTables. Raven provides the Search method that works with the IRavenQueryable collection. Take a look at the DataTablesPager.Filter method and you will see a SearchMultiple method. This is put in place to perform a search for multiple terms. In other words, it will search for “Spock” and then chain a search for “2010” against the IRavenQueryable. Phillip Haydon came up with this approach, which works with partial matches because it gives Lucene the right syntax. Otherwise you end up with weird results: if you feed Lucene “spoc 201” as one string, the tokens Lucene creates with the text analyzer will not pick up what you need. Phillip’s approach bridges this gap by using an extension method to chain the search terms. It is found in the class RavenExtensionMethods.cs, and it basically tokenizes the search string, creates an array, and makes an individual call to the Search() method for each member of the array. It allows us to perform advanced filtering such as partial matches like “spoc 201”. Try this out in the Tenant.aspx page of the WebDemo solution and you’ll see how it works.
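
Phillip’s extension isn’t reproduced verbatim here, but a minimal sketch of the idea – one chained Search() call per token – might look like the following. The names and the Search signature are assumptions based on the older Raven client used above; adjust for your version.

using System;
using System.Linq.Expressions;
using Raven.Client;        // SearchOptions (namespace assumed for this client version)
using Raven.Client.Linq;   // IRavenQueryable<T> and the Search() extension

public static class RavenExtensionMethods
{
    // Splits "spoc 201" into tokens and chains a Search() call per token so
    // Lucene receives each term separately instead of one mangled phrase.
    public static IRavenQueryable<T> SearchMultiple<T>(this IRavenQueryable<T> self,
        Expression<Func<T, object>> fieldSelector,
        string searchTerms,
        SearchOptions options = SearchOptions.Or)
    {
        var terms = searchTerms.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var term in terms)
        {
            self = self.Search(fieldSelector, term, options: options);
        }

        return self;
    }
}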

Outta Breath Yet? Let’s Talk DataTables.Net!!!

Breathing hard yet? Good! There’s more to do – how does this work with DataTables.Net? DataTables uses the following parameters when processing server-side data:

Sent to the server:

• int iDisplayStart – Display start point
• int iDisplayLength – Number of records to display
• int iColumns – Number of columns being displayed (useful for getting individual column search info)
• string sSearch – Global search field
• boolean bEscapeRegex – Global search is regex or not
• boolean bSortable_(int) – Indicator for whether a column is flagged as sortable or not on the client-side
• boolean bSearchable_(int) – Indicator for whether a column is flagged as searchable or not on the client-side
• string sSearch_(int) – Individual column filter
• boolean bEscapeRegex_(int) – Individual column filter is regex or not
• int iSortingCols – Number of columns to sort on
• int iSortCol_(int) – Column being sorted on (you will need to decode this number for your database)
• string sSortDir_(int) – Direction to be sorted: “desc” or “asc”. (Note that the prefix for this variable is wrong in 1.5.x, where iSortDir_(int) was used.)
• string sEcho – Information for DataTables to use for rendering

Reply from the server

In reply to each request for information that DataTables makes to the server, it expects to get a well formed JSON object with the following parameters.

• int iTotalRecords – Total records, before filtering (i.e. the total number of records in the database)
• int iTotalDisplayRecords – Total records, after filtering (i.e. the total number of records after filtering has been applied – not just the number of records being returned in this result set)
• string sEcho – An unaltered copy of sEcho sent from the client side. This parameter will change with each draw (it is basically a draw count), so it is important that this is implemented. Note that it is strongly recommended for security reasons that you ‘cast’ this parameter to an integer in order to prevent Cross Site Scripting (XSS) attacks.
• string sColumns – Optional; this is a string of column names, comma separated (used in combination with sName), which will allow DataTables to reorder data on the client-side if required for display
• array of arrays (mixed) aaData – The data in a 2D array
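
FilterFormattedList() isn’t shown here, but whatever it returns has to carry these documented fields. A hedged sketch of such a reply DTO (the class name is mine, the field names are the documented ones above) could be:

using System.Collections.Generic;

public class DataTablesReply
{
    public string sEcho { get; set; }               // draw count echoed back to DataTables
    public int iTotalRecords { get; set; }          // total before filtering
    public int iTotalDisplayRecords { get; set; }   // total after filtering
    public List<string[]> aaData { get; set; }      // one string[] per grid row
}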

DataTables will POST an AOData object. The class DataTablesPager.cs will handle parsing this object with the method PrepAOData. It’s responsible for determining what properties we are querying, how the data will be sorted, paging size, as well as including any terms for filtering. Because we have used generics, PrepAOData doesn’t care what object in your domain you are using as it is designed to read the properties and match those properties against the list of data items that DataTables has sent up to our application on the server. Again, our goal is to let DataTables dictate what to look for, and as long as we did our work when we created the index we should have great flexibility.
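
PrepAOData itself is in the download, but to make the flow concrete, here is a simplified sketch of the kind of work it has to do: DataTables posts its parameters as an array of name/value pairs, so the pager deserializes that array and pulls out the paging, search and echo values. The types, names and the use of Json.NET here are assumptions, not the project’s exact code.

using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;

public class AODataPair
{
    public string name { get; set; }
    public string value { get; set; }
}

public class PagingRequest
{
    public int DisplayStart { get; set; }
    public int DisplayLength { get; set; }
    public string Search { get; set; }
    public string Echo { get; set; }
}

public static class AODataParser
{
    // Pulls the values the pager cares about out of the serialized AOData array.
    public static PagingRequest Parse(string jsonAOData)
    {
        var pairs = JsonConvert.DeserializeObject<List<AODataPair>>(jsonAOData);
        Func<string, string> valueOf = key => pairs.Where(p => p.name == key)
                                                   .Select(p => p.value)
                                                   .FirstOrDefault();

        return new PagingRequest
        {
            DisplayStart  = int.Parse(valueOf("iDisplayStart") ?? "0"),
            DisplayLength = int.Parse(valueOf("iDisplayLength") ?? "10"),
            Search        = valueOf("sSearch") ?? string.Empty,
            Echo          = valueOf("sEcho")
        };
    }
}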

Let’s look at the Javascript again:

"aoColumnDefs": [
{ "sName": "Name", "aTargets": [0], "bSortable": true, "bSearchable": true },
{ "sName": "Agent", "aTargets": [1], "bSortable": true, "bSearchable": true },
{ "sName": "Center", "aTargets": [2], "bSortable": true, "bSearchable": true },
{ "sName": "StartDate", "aTargets": [3], "bSortable": true, "bSearchable": true },
{ "sName": "EndDate", "aTargets": [4], "bSortable": true, "bSearchable": true },
{ "sName": "DealAmount", "aTargets": [4], "bSortable": true, "bSearchable": true }
],

In the server-side application we have Tenant_Search.cs, which has created an index with the properties Name, Agent, Center, StartDate, EndDate and DealAmount. The Javascript above is DataTables’ way of saying “Hey, server, give me information back in the form of an array of value pairs and, by the way, here are the data columns to use when you get me my stuff.” On the server side we don’t care what the order of columns in the grid will be, as the server assumes that DataTables will take care of that. And indeed, DataTables is supplying the request in the sName value pair. The server fetches it, spits it back to the browser and DataTables munches it. You can change the Javascript and leave your server application alone, as long as you stick to using the fields you included in your index. Just like Fonzie, be cool.

But even cooler is the fact that Raven will handle paging for us: it has a built-in limit of returning up to 128 documents at a slice. Given that its retrieval speed is very fast, this will work very well for us. If you look at the Raven console for each page that you retrieve, you will see a very low time for the fetches. Remember, there is very little processing for the query, as the index has already performed the heavy lifting. For an example of this, the page Tenants.aspx in the WebDemo solution will page and filter 13,000+ documents. It is lightning fast.

Has Your Head Exploded Yet?

This is a lot to digest. Source code is here, along with the means to create 13,000 documents that you can use for testing. Please note that you will be required to pull down the Raven assemblies/packages via NuGet; otherwise the download would be about 36 MB. Work on responding to sorting requests has been started, and hopefully you’ll want to see how that is solved in a future post. But what we have accomplished is a very robust and easy way to display data from our document database, with low effort on the back end application.

Play with the code. The only way we make this better is to critique constructively, adapt to better ideas and grow! Some of the failed experiments have been included in the tests so you can see how things have progressed. They are marked as failures so you can focus on testing the DataTablesPager class. These failures are interesting though, and can provide insight into how the solution was arrived at. Also, the first time you fire up the web site, the Global.asax page will look for the test records and create them. This takes some time, so if you want to skip the wait, those sections are marked for you so you can comment them out and do what you need to. Enjoy.

Antonin Januska was kind enough to let Sensei guest post for him on his awesome blog.  There you’ll find great posts on CSS, UI design and the Bootstrap Framework, among many other goods related to Web development.  As you know, Sensei loves him some RavenDB and DataTables, and has created a tutorial on how to create a paging solution with RavenDB that takes advantage of Lucene and DataTables’ inherent filtering capability.  Teaming up these two pieces of technology is like one of those rare events when Marvel & DC teamed up Spidey and Supes.

Kind of feeling like Stan Lee, so let’s end with Excelsior!!

This post focuses on getting started with RavenDB, so we’ll set aside our focus on workflows for a bit.  It’s included in the ApprovaFlow series as it is an important part of the workflow framework we’re building.  To follow along you might want to get the source code.

RavenDB is a document database that provides a flexible means for storing object graphs.  As you’ll see a document database presents you with a different set of challenges than you are normally presented when using a traditional relational database.

The storage “unit” in RavenDB is a schema-less JSON document.  Because you are working with documents, you have the flexibility to define documents differently; that is, you can support variations to your data without having to re-craft your data model each time you want to add a new property to a class.  You can adopt a “star” pattern for this in SQL, but querying can become difficult.  Raven excels in this situation, and one such sweet spot is:

Dynamic Entities, such as user-customizable entities, entities with a large number of optional fields, etc. – Raven’s schema free nature means that you don’t have to fight a relational model to implement it.

Installing and Running RavenDB

The compiled binaries are easy to install.  Download the latest build and extract the files to a share.  Note that in order to run the management console you are required to install Silverlight.  To start the server, navigate to the extracted folder and double click “Start.cmd”.

Once the server is up and running, the management console will launch itself in your browser.

How To Start Developing

In Visual Studio, reference Raven with Raven.Client.Lightweight.  For CRUD operations and querying this will be all that you will need.

First you will need to connect to the document store.  It is recommended that you do this once per application.  That is accomplished with


var documentStore = new DocumentStore {Url = "http://localhost:8080"};
documentStore.Initialize();
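
Since the guidance is one DocumentStore per application, one hedged way to honor that (certainly not the only way) is to hold the store in a lazily-initialized static and hand out sessions from it:

using System;
using Raven.Client;
using Raven.Client.Document;

public static class Store
{
    // Built once, on first use, and shared for the life of the application.
    private static readonly Lazy<IDocumentStore> instance = new Lazy<IDocumentStore>(() =>
    {
        var store = new DocumentStore { Url = "http://localhost:8080" };
        store.Initialize();
        return store;
    });

    public static IDocumentStore Instance
    {
        get { return instance.Value; }
    }
}

// Usage: using(var session = Store.Instance.OpenSession()) { ... }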

Procedures are carried out using the Unit of Work pattern, and in general you will be using this type of block:


using(var session = documentStore.OpenSession())
{
   //... Do some work
}

RavenDB will work with Plain Old C# Objects and only requires an Id property of type string.  An identity key is generated for Id during the session.  If we were to create multiple objects we would have identities created in succession.  A full discussion of the alternatives to the Id property is here.

Creating a document from your POCOs’ object graphs is very straight forward:


public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Id { get; set; }
    public int DepartmentId { get; set; }
    // ...
}

var person = new Person();

using(var session = documentStore.OpenSession())
{
   session.Store(person);
   session.SaveChanges();
}

Fetching a document can be accomplished in two manners:  by Id or with a LINQ query.  Here’s how to get a document by id:


string person = "Person/1";  //  Raven will have auto-generated a value for us.
using(var session = documentStore.OpenSession())
{
   var fetchedPerson = session.Load<Person>(personId);
   //Do some more work
}

You’ll note that there is no casting or conversion required as Raven will determine the object type and populate the properties for you.

There are naturally cases where you want to query for documents based on attributes other than the Id. Best practice dictates that we should create static indexes on our documents, as these will offer the best performance. RavenDB also has a dynamic index feature that learns from the queries fired at the server, and over time these dynamic indexes are memorialized.

For your first bout with RavenDB you can simply query the documents with LINQ.   The test code takes advantage of the dynamic index feature.  Later you will want to create indexes based on how you will most likely retrieve the documents.  This is different than a traditional RDBMS solution, where the data model is optimized for querying; a document database is NOT.

Continuing with our example of Person documents we would use:


int departmentId = 139;

using(var session = documentStore.OpenSession())
{
   var people = session.Query<Person>()
                          .Where(x => x.DepartmentId == departmentId)
                          .ToList();
}

In the source code for this post there are more examples of querying.
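
When you are ready to move past dynamic indexes, a minimal sketch of the static-index route for the Person example might look like this. It uses the same AbstractIndexCreationTask base class shown earlier in this archive; the index name and namespace are assumptions chosen for illustration.

using System.Linq;
using Raven.Client.Indexes;   // AbstractIndexCreationTask (namespace assumed)

public class People_ByDepartment : AbstractIndexCreationTask<Person>
{
    public People_ByDepartment()
    {
        // Index only the property we expect to query on.
        Map = people => from person in people
                        select new { person.DepartmentId };
    }
}

// Querying against the static index instead of a dynamic one:
// session.Query<Person, People_ByDepartment>()
//        .Where(x => x.DepartmentId == departmentId)
//        .ToList();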

Debugging, Troubleshooting and Dealing with Frustration

Given that this is something new and an open source project, you may find yourself searching for help and more guidelines.  One thing to avail yourself of while troubleshooting is the fact that RavenDB has a REST interface, and you can validate your assumptions – or worse, confirm your errors – by using curl from the command line.  For example, to create a document via HTTP you issue:

curl -X POST http://localhost:8080/docs -d "{ FirstName: 'Bob', LastName: 'Smith', Address: '5 Elm St' }"

Each action that takes place on the RavenDB server is displayed in a log on the server console app.  Sensei had to resort to this technique when troubleshooting some issues when he first started.  This StackOverflow question details the travails.

Another area that threw Sensei for a loop at first was the way RavenDB writes and maintains indexes.  In short, indexing is a background process, and Raven is designed to be “eventually consistent”.  That means there can be a latency between when a change is submitted and saved, and when it is indexed in the repository so that it can be fetched via queries.  When running tests from NUnit, this code did not operate as expected, even though the console reported that the document was created:


session.Store(teamMember);
session.SaveChanges();   // the document is written, but the index may not have caught up yet

int posttestCount = session.Query<TeamMember>()
                           .Count();

According to the documentation you can overcome this inconsistency by declaring that you are willing to wait until RavenDB has finished indexing and the results are no longer stale.   This code will get you the expected results:


int posttestCount = session.Query<TeamMember>()
              .Customize(x => x.WaitForNonStaleResults())
              .Count();

Depending on the number of tests you write, you may wish to run RavenDB in embedded mode for faster results.  This might prove useful for automated testing and builds.  The source code provided in this post does NOT use embedded mode; rather, you need to have your server running, as this gives you the opportunity to inspect documents and acclimate yourself to the database.
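
For reference, a hedged sketch of the embedded-mode alternative looks like the following, assuming the Raven.Client.Embedded package from the same era; running in memory keeps each test run clean and fast.

using Raven.Client.Embedded;

var embeddedStore = new EmbeddableDocumentStore
{
    RunInMemory = true   // nothing written to disk; state disappears when the store is disposed
};
embeddedStore.Initialize();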

There is much more that you can do with RavenDB, such as creating indexes across documents and assigning security to individual documents.  This primer should be enough to get you started.  In the next post we’ll see how RavenDB will fit into the ApprovaFlow framework.  Grab the source, play around and get ready for the next exciting episode.

 

It’s been a while, and as usual Sensei has started something with such bravado and discovered that life offers more bluster and pounding than even he can anticipate.  Hopefully you haven’t given up on the series, ’cause Sensei hasn’t.  Hell, ApprovaFlow is constantly on the forefront, even though it appears that he’s taken a good powder.

Let’s recap our goals and talk about philosophy and direction.  In this long silence a few additional considerations have taken precedence, and this is a good opportunity to assess goals and re-align the development efforts.  In the end ApprovaFlow will:

• Model a workflow in a clear format that is readable by both developer and business user. One set of verbiage for all parties.  Discussed in Simple Workflows With ApprovaFlow and Stateless.

• Allow the state of a workflow to be persisted as an integer, string, etc.  Quickly fetch the state of a workflow.  Discussed in Simple Workflows With ApprovaFlow and Stateless.

• Create pre and post processing methods that can enforce rules or carry out actions when completing a workflow task.  Discussed in ApprovaFlow: Using the Pipe and Filter Pattern to Build a Workflow Processor.

• Introduce new functionality while isolating the impact of the new changes. New components should not break old ones.

• Communicate to the client with a standard set of objects. In other words, your solution domain will not change how the user interface will gather data from the user.

• Use one .aspx page to process user input for any type of workflow.

• Provide the ability to roll your own customizations to the front end or back end of your application.

The astute members of the audience will no doubt say “What about the technical objectives, like how are you going to store all the workflow data?  In flat files?  Will you give me alternatives for storage?  How will I create the workflows, by using Notepad?” Indeed, Sensei has pondered these issues as well and has accumulated a fair amount of failed experiments, with some being quite interesting.   Given time these little experiments may become posts as well, since there are interesting things to learn from these failures.

What ApprovaFlow Will Need To Provide:  Workflow Storage

The biggest issue is storage.  The point of using Stateless was that we wanted flexibility.  Recall that the state of our state machine can be represented with a mere integer or string, which makes it pretty easy to store in a database or a document.  While you could map the Step and Workflow classes to tables in SQL, our domain is using JSON, so it makes sense to gravitate to a storage solution that will easily support that format.  ApprovaFlow will use RavenDB as the document database, but will provide the opportunity for you to use a different solution if you wish.  You’ll find that RavenDB readily provides a document storage format for our workflows that is quite elegant.

As an aside, Sensei experimented with a great alternative to the NoSQL solutions called SisoDb.  This open source project provides you the ability to store your object graphs in SQL Server.  Time permitting, Sensei will share some of his adventures with this neat project.

What ApprovaFlow Will Need to Provide:  Authorization of Actions

While Sensei was off in the weeds learning about RavenDB, he discovered that Ayende created a fantastic mechanism for authorizing user actions on documents.  This authorization of activity can be as granular as denying / allowing updates to occur based on an operation.

Since we want to adhere to principles of flexibility the Authorization features will be implemented as a plug-in, so if you wish to roll your own mechanisms to govern workflow approvals you will be free to do so.

What ApprovaFlow Will Need to Provide:  Admin Tools

Yep.  Sensei is sick of using Notepad to create JSON documents as well.  We want to be able to create the states, the triggers and the target states and save.  We’ll want to assign the filters to specific states and save.  No more text fiddling.  Period.  As Sensei thinks about this, it seems that another pipeline can be created for administration.  Luckily we have a plug-in architecture, so this should be rather straightforward.

Summing It All Up

These are really important things to consider, and as much as Sensei hates changing goals in mid stream, the capabilities discussed above can make life much easier while implementing a workflow system.  In making the decision to use RavenDB, the thought that “a storage solution should not shape the solution domain” kept raising its ugly maw.  But, so what.   We want to finish something, and admittedly this has been a challenge – just look at the lag between posts if you need a reminder.  If Sensei decided to include an IoC container just to remain “loosely” coupled to document storage, we’d get nowhere.  Would you really want to read those posts?  How boring.  Besides, Sensei doesn’t know how to do all that stuff – gonna stick to the stuff he thinks he knows.  Or at least the stuff he can fake.

