Dapper vs Rolling-Your-Own

A follow up to DataTable To List

After my last post, I got a lot of good advice on Reddit, mostly on better ways of handling my SQL data. The main point people were bringing up was using ORMs. Now, I’ve heard of the major ORMs, mostly the Entity framework I guess, but I’ve never met or spoken to anyone who’s particularly excited about. I’ve heard a lot of “Good idea, bad execution”. Now I’m not sure if that’s accurate or fair, but that’s what I’ve heard.

Most of the advice I was receiving was to use something called Dapper. After a little research I realized that Dapper was written (or at least co-written) by the same Marc Gravell that wrote FastMember (the project that sped up my reflection code). Quite a coincidence I’d say, I guess this guy knows what he’s doing in C#.

Anyway, I updated my benchmark code to compare what I was doing to using Dapper. Since Dapper actually handles the execution of the command as well as returning it to a type, I had to change my tests a little bit, so it looks like everything’s slower compared to last time, but in this test I’m timing the SQL query and converting to class.

The query I’m using in Dapper is essentially this

TestTable testData = connection.Query<TestTable>(new CommandDefinition("SELECT TOP 5000 * FROM TestTable));

If you take a look on the Github page for this project you’ll see a few other examples of how this works, but this is the simplest case.

I tested this against reading the command into a SqlDataReader and parsing (something a lot of people were recommending I do instead of converting to a DataTable).

Here’s the results –

Dapper runtimes

Now, I don’t know exactly how accurate this is, my benchmarking is pretty sloppy overall (They have more benchmarking on the project site, so feel free to dig a little deeper), but Dapper was faster, or at least essentially the same speed. Add to that the fact that it’s way easier to use than writing your own custom SqlDataReader parsing method for each stored procedure in your project and Dapper gets the win by a mile.

All I know if that I’m going to be giving a real shot on Monday when I get back to work and probably recommending it to the rest of the guys in my department.

Advertisements

DataTable to List

My new job has gotten me involved in a lot more SQL than I’d ever had to in the past. Now this can be interesting, but there’s a whole ton of boilerplate code that goes along with writing SQL, most of which involves converting DataTables to List.

My general opinion on DataTables is that you should immediately convert them into some equivalent C# object representing the properties you’re returning from SQL. I know this isn’t how everyone feels, especially if you’re then taking this data and converting it into some other model-type data, but for the purposes of this post, we’re going to be doing it, stick with me!

I’ve got three different ways of converting the DataTable to List and I’ve benchmarked their relative speeds for your reading pleasure. Here’s the initial setup so you can replicate it.

Creating the Table

CREATE TABLE [TestTable]
(
    [ID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
    [Bit1] [bit] NOT NULL DEFAULT(1),
    [Bit2] [bit] NOT NULL DEFAULT(1),
    [Bit3] [bit] NOT NULL DEFAULT(1),
    [Bit4] [bit] NOT NULL DEFAULT(1),
    [Bit5] [bit] NOT NULL DEFAULT(1),
    [Bit6] [bit] NULL,
    [Bit7] [bit] NULL,
    [Bit8] [bit] NULL,
    [Bit9] [bit] NULL,
    [Bit10] [bit] NULL,
    [Float1] [float] NOT NULL DEFAULT(0.0),
    [Float2] [float] NOT NULL DEFAULT(0.0),
    [Float3] [float] NOT NULL DEFAULT(0.0),
    [Float4] [float] NOT NULL DEFAULT(0.0),
    [Float5] [float] NOT NULL DEFAULT(0.0),
    [Float6] [float] NULL,
    [Float7] [float] NULL,
    [Float8] [float] NULL,
    [Float9] [float] NULL,
    [Float10] [float] NULL,
    [Int1] [int] NOT NULL DEFAULT(0),
    [Int2] [int] NOT NULL DEFAULT(0),
    [Int3] [int] NOT NULL DEFAULT(0),
    [Int4] [int] NOT NULL DEFAULT(0),
    [Int5] [int] NOT NULL DEFAULT(0),
    [Int6] [int] NULL,
    [Int7] [int] NULL,
    [Int8] [int] NULL,
    [Int9] [int] NULL,
    [Int10] [int] NULL,
    [VarChar1] [varchar](100) NOT NULL DEFAULT('TEST'),
    [VarChar2] [varchar](100) NOT NULL DEFAULT('TEST'),
    [VarChar3] [varchar](100) NOT NULL DEFAULT('TEST'),
    [VarChar4] [varchar](100) NOT NULL DEFAULT('TEST'),
    [VarChar5] [varchar](100) NOT NULL DEFAULT('TEST'),
    [VarChar6] [varchar](100) NULL,
    [VarChar7] [varchar](100) NULL,
    [VarChar8] [varchar](100) NULL,
    [VarChar9] [varchar](100) NULL,
    [VarChar10] [varchar](100) NULL,
    PRIMARY KEY CLUSTERED 
    (
        [ID] ASC
    )
WITH 
    (
        PAD_INDEX = OFF, 
        STATISTICS_NORECOMPUTE = OFF, 
        IGNORE_DUP_KEY = OFF, 
        ALLOW_ROW_LOCKS = ON, 
        ALLOW_PAGE_LOCKS = ON
    ) ON [PRIMARY]
) 
ON [PRIMARY]

I’ve also got the test data here, so feel free to grab it. For this test, I had 5000 rows of data (the test file is 1000 records). For the record, this is all MSSQL, and probably won’t work with SQLite without a little tweaking.

To go along with this, I’ve also got a C# class with properties for each of the Columns in the SQL table, here’s that too

public class TestTable
{
    public int ID { get; set; }
    public bool Bit1 { get; set; }
    public bool Bit2 { get; set; }
    public bool Bit3 { get; set; }
    public bool Bit4 { get; set; }
    public bool Bit5 { get; set; }
    public bool? Bit6 { get; set; }
    public bool? Bit7 { get; set; }
    public bool? Bit8 { get; set; }
    public bool? Bit9 { get; set; }
    public bool? Bit10 { get; set; }
    public double Float1 { get; set; }
    public double Float2 { get; set; }
    public double Float3 { get; set; }
    public double Float4 { get; set; }
    public double Float5 { get; set; }
    public double? Float6 { get; set; }
    public double? Float7 { get; set; }
    public double? Float8 { get; set; }
    public double? Float9 { get; set; }
    public double? Float10 { get; set; }
    public int Int1 { get; set; }
    public int Int2 { get; set; }
    public int Int3 { get; set; }
    public int Int4 { get; set; }
    public int Int5 { get; set; }
    public int? Int6 { get; set; }
    public int? Int7 { get; set; }
    public int? Int8 { get; set; }
    public int? Int9 { get; set; }
    public int? Int10 { get; set; }
    public string VarChar1 { get; set; }
    public string VarChar2 { get; set; }
    public string VarChar3 { get; set; }
    public string VarChar4 { get; set; }
    public string VarChar5 { get; set; }
    public string VarChar6 { get; set; }
    public string VarChar7 { get; set; }
    public string VarChar8 { get; set; }
    public string VarChar9 { get; set; }
    public string VarChar10 { get; set; }
}

1 – ForEach

This was my first crack at it, without putting too much thought into it, this was what I thought would be the most efficient way of doing it. Turns out it’s pretty good.

public static List<TestTable> ToListForEach(DataTable dt)
{
    var data = new List<TestTable>();

    foreach (DataRow row in dt.Rows)
    {
        data.Add(new TestTable()
        {
            ID = Convert.ToInt32(row["ID"]),
            Bit1 = Convert.ToBoolean(row["Bit1"]),
            Bit2 = Convert.ToBoolean(row["Bit2"]),
            Bit3 = Convert.ToBoolean(row["Bit3"]),
            Bit4 = Convert.ToBoolean(row["Bit4"]),
            Bit5 = Convert.ToBoolean(row["Bit5"]),
            Bit6 = Convert.ToBoolean(row["Bit6"]),
            Bit7 = Convert.ToBoolean(row["Bit7"]),
            Bit8 = Convert.ToBoolean(row["Bit8"]),
            Bit9 = Convert.ToBoolean(row["Bit9"]),
            Bit10 = Convert.ToBoolean(row["Bit10"]),
            Float1 = Convert.ToDouble(row["Float1"]),
            Float2 = Convert.ToDouble(row["Float2"]),
            Float3 = Convert.ToDouble(row["Float3"]),
            Float4 = Convert.ToDouble(row["Float4"]),
            Float5 = Convert.ToDouble(row["Float5"]),
            Float6 = Convert.ToDouble(row["Float6"]),
            Float7 = Convert.ToDouble(row["Float7"]),
            Float8 = Convert.ToDouble(row["Float8"]),
            Float9 = Convert.ToDouble(row["Float9"]),
            Float10 = Convert.ToDouble(row["Float10"]),
            Int1 = Convert.ToInt32(row["Int1"]),
            Int2 = Convert.ToInt32(row["Int2"]),
            Int3 = Convert.ToInt32(row["Int3"]),
            Int4 = Convert.ToInt32(row["Int4"]),
            Int5 = Convert.ToInt32(row["Int5"]),
            Int6 = Convert.ToInt32(row["Int6"]),
            Int7 = Convert.ToInt32(row["Int7"]),
            Int8 = Convert.ToInt32(row["Int8"]),
            Int9 = Convert.ToInt32(row["Int9"]),
            Int10 = Convert.ToInt32(row["Int10"]),
            VarChar1 = row["VarChar1"].ToString(),
            VarChar2 = row["VarChar2"].ToString(),
            VarChar3 = row["VarChar3"].ToString(),
            VarChar4 = row["VarChar4"].ToString(),
            VarChar5 = row["VarChar5"].ToString(),
            VarChar6 = row["VarChar6"].ToString(),
            VarChar7 = row["VarChar7"].ToString(),
            VarChar8 = row["VarChar8"].ToString(),
            VarChar9 = row["VarChar9"].ToString(),
            VarChar10 = row["VarChar10"].ToString(),
        });
    }

    return data;
}

OK, so this is pretty good and fast. The downside is when you have actual nullable fields that can actually contain null data. In this test, all the data has been faked out, so we don’t have to deal with that, but when you do, this can slow it down considerably.

When you go from a non-null field like

Int6 = Convert.ToInt32(row["Int6"])

and make it nullable, suddenly you have to start writing your code like this –

Int6 = row.IsNull("Int6") ? new int?() : new int?(Convert.ToInt32(row["Int6"])),

or (as emn13 on reddit pointed out to me, a simpler conversion would be)

Int6 = row["Int6"] as int?

Obviously the more you have of that, the worse it gets. This leads us to

2 – LINQ

One of the guys at my work showed me this way. I’m pretty comfortable with LINQ, but I didn’t know you could get an enumerable for a DataTable, and I knew nothing about the Field structure. Here’s what it looks like

public static List<TestTable> ToListLinq(DataTable dt)
{
    return dt.AsEnumerable().Select(item => new TestTable()
    {
        ID = item.Field<int>(nameof(TestTable.ID)),
        Bit1 = item.Field<bool>(nameof(TestTable.Bit1)),
        Bit2 = item.Field<bool>(nameof(TestTable.Bit2)),
        Bit3 = item.Field<bool>(nameof(TestTable.Bit3)),
        Bit4 = item.Field<bool>(nameof(TestTable.Bit4)),
        Bit5 = item.Field<bool>(nameof(TestTable.Bit5)),
        Bit6 = item.Field<bool?>(nameof(TestTable.Bit6)),
        Bit7 = item.Field<bool?>(nameof(TestTable.Bit7)),
        Bit8 = item.Field<bool?>(nameof(TestTable.Bit8)),
        Bit9 = item.Field<bool?>(nameof(TestTable.Bit9)),
        Bit10 = item.Field<bool?>(nameof(TestTable.Bit10)),
        Float1 = item.Field<double>(nameof(TestTable.Float1)),
        Float2 = item.Field<double>(nameof(TestTable.Float2)),
        Float3 = item.Field<double>(nameof(TestTable.Float3)),
        Float4 = item.Field<double>(nameof(TestTable.Float4)),
        Float5 = item.Field<double>(nameof(TestTable.Float5)),
        Float6 = item.Field<double?>(nameof(TestTable.Float6)),
        Float7 = item.Field<double?>(nameof(TestTable.Float7)),
        Float8 = item.Field<double?>(nameof(TestTable.Float8)),
        Float9 = item.Field<double?>(nameof(TestTable.Float9)),
        Float10 = item.Field<double?>(nameof(TestTable.Float10)),
        Int1 = item.Field<int>(nameof(TestTable.Int1)),
        Int2 = item.Field<int>(nameof(TestTable.Int2)),
        Int3 = item.Field<int>(nameof(TestTable.Int3)),
        Int4 = item.Field<int>(nameof(TestTable.Int4)),
        Int5 = item.Field<int>(nameof(TestTable.Int5)),
        Int6 = item.Field<int?>(nameof(TestTable.Int6)),
        Int7 = item.Field<int?>(nameof(TestTable.Int7)),
        Int8 = item.Field<int?>(nameof(TestTable.Int8)),
        Int9 = item.Field<int?>(nameof(TestTable.Int9)),
        Int10 = item.Field<int?>(nameof(TestTable.Int10)),
        VarChar1 = item.Field<string>(nameof(TestTable.VarChar1)),
        VarChar2 = item.Field<string>(nameof(TestTable.VarChar2)),
        VarChar3 = item.Field<string>(nameof(TestTable.VarChar3)),
        VarChar4 = item.Field<string>(nameof(TestTable.VarChar4)),
        VarChar5 = item.Field<string>(nameof(TestTable.VarChar5)),
        VarChar6 = item.Field<string>(nameof(TestTable.VarChar6)),
        VarChar7 = item.Field<string>(nameof(TestTable.VarChar7)),
        VarChar8 = item.Field<string>(nameof(TestTable.VarChar8)),
        VarChar9 = item.Field<string>(nameof(TestTable.VarChar9)),
        VarChar10 = item.Field<string>(nameof(TestTable.VarChar10)),
    }).ToList();
}

So, I’m going to give you a sneak peak at the end results and let you know that this is the fastest version. If you’re looking to turn a DataTable into a List manually, this is your guy. Super fast, and handles nullable fields with ease.

3 – Reflection

Now, I know everyone gets so hung up on speed with reflection, and that, after all, is the entire reason I’m doing this, but if you’re looking for a nice generic way of converting DataTables, I think you’ll like this.

The trick to speeding this up a lot mostly relies on someone else’s smarts. The snippet below uses a project called FastMember by Marc Gravell. Here he is describing how it came to be. Now, I didn’t spend much time looking into it, I just gave it a try and it worked great, so please feel free to read more on the Github site and his blog if you’re not feeling comfortable. Luckily for us, there’s a Nuget package available for FastMember, so just include that, and use the code below, you’ll be fine.

public static List<TestTable> ToListReflection(DataTable dt)
{
    return (List<TestTable>)dt.DataTableToList<TestTable>();
}

private static readonly IDictionary<Type, IEnumerable<PropertyInfo>> _Properties =
    new Dictionary<Type, IEnumerable<PropertyInfo>>();

public static IEnumerable<T> DataTableToList<T>(this DataTable table) where T : class, new()
{
    var objType = typeof(T);
    IEnumerable<PropertyInfo> properties;

    lock (_Properties)
    {
        if (!_Properties.TryGetValue(objType, out properties))
        {
            properties = objType.GetProperties().Where(property => property.CanWrite);
            _Properties.Add(objType, properties);
        }
    }

    var list = new List<T>(table.Rows.Count);

    Parallel.ForEach<DataRow>(table.AsEnumerable().Skip(1), row => {
        var obj = new T();

        foreach (var prop in properties)
        {
            if (prop != null)
            {
                Type t = Nullable.GetUnderlyingType(prop.PropertyType) ?? prop.PropertyType;

                object propertyValue = (row[prop.Name] == null) ? null : Convert.ChangeType(row[prop.Name], t);

                var accessors = TypeAccessor.Create(objType);
                accessors[obj, prop.Name] = propertyValue;
            }
        }

        list.Add(obj);
    });

    return list;
}

This runs pretty great on its own, but by using Parallel in the ForEach we’re able to at least half our time. If you’re running a 4 core or even an 8 core machine, it can be even better, but YMMV.

So, obviously this method is a little slower, but what you lose in speed, you make up for in flexibility. You’ll never have to add or remove code when you change your stored procedure. You’ll never have to write boiler plate code like in Methods 1 and 2 ever again. Just keep in mind that this works best for small result sets.

Results

OK, here’s the final results.

DataTable to Linq runtimes

It’s pretty much as we’d expect for methods 1 and 2. If you have more nullable fields, 1 will get slower and slower, almost to the speed of method 3, so if you’re looking to do it manually, go with 2. The one thing that amazes me is how close to LINQ speeds we can get with Reflection and FastMember, so give it a try.

If you’re going with 2, here’s a SQL query that will auto generate all the class code for you. It’s pretty sweet.

OK, that’s it. Generally from now on, I think I’ll be using a combination of 2 and 3, depending on the situation. Let me know in the comments if you found anything questionable.

Searching In SQL

My current job has me working in a massive old SQL server environment a lot of the time. I’m trying to update an old VB6 application that has hundreds of stored procedures doing random pieces of the work along the way.

Since this is my first time using SQL in any meaningful way, I’m not 100% sure if this is a normal setup, but in the server there are 100s of tables and also 100s of stored procedures, oh and just for fun, there are 100s of functions of each type. This server was created a long time ago and not a lot of thought was put into the naming of anything, which leads me to having a nightmare finding anything.

Here’s an example of a couple of stored procedures I have

usp_GetSomethingImportant

usp_Get_SomethingImportant

usp_Get_Something_Important

Every time the underscore is in a different position, all the sorting moves them around, so even though these 3 stored procedures are used together they’re sorted differently!

Never mind the fact that there are multiple different types of Functions in SQL (I’m not exactly sure why or what the differences are, but they exist).

So, why am I telling you all of this, is it just to complain? Well, partly (it is ridiculous), but mostly I’m here to talk about my favorite new SQL helper utility!

It’s called SQL Hunting Dog. I honestly can’t believe this isn’t a feature just built into SQL, but here we are. Microsoft seems to have major blind spots.

Hunting Dog lets you search a sql database for anything based on the name. I know, right, soooooooo simple, but there it is. Once you find the thing you’re looking for, it can either do it’s own Select or Modify query, or it can expand all the folders in the sql explorer section to jump you right to the thing you’re looking for. That’s my favorite feature, as then you have access to all the usual SQL magic (create to, modify etc).

Also, if you’re into it (you probably are if you’re reading this blog), the source code is available on BitBucket (the link is on the website), so if there’s something you’re not completely on board with, feel free to tinker away.

If you don’t already have this, you should get it.