Dapper vs Rolling-Your-Own

A follow up to DataTable To List

After my last post, I got a lot of good advice on Reddit, mostly on better ways of handling my SQL data. The main point people were bringing up was using ORMs. Now, I’ve heard of the major ORMs, mostly the Entity framework I guess, but I’ve never met or spoken to anyone who’s particularly excited about. I’ve heard a lot of “Good idea, bad execution”. Now I’m not sure if that’s accurate or fair, but that’s what I’ve heard.

Most of the advice I was receiving was to use something called Dapper. After a little research I realized that Dapper was written (or at least co-written) by the same Marc Gravell that wrote FastMember (the project that sped up my reflection code). Quite a coincidence I’d say, I guess this guy knows what he’s doing in C#.

Anyway, I updated my benchmark code to compare what I was doing to using Dapper. Since Dapper actually handles the execution of the command as well as returning it to a type, I had to change my tests a little bit, so it looks like everything’s slower compared to last time, but in this test I’m timing the SQL query and converting to class.

The query I’m using in Dapper is essentially this

TestTable testData = connection.Query<TestTable>(new CommandDefinition("SELECT TOP 5000 * FROM TestTable));

If you take a look on the Github page for this project you’ll see a few other examples of how this works, but this is the simplest case.

I tested this against reading the command into a SqlDataReader and parsing (something a lot of people were recommending I do instead of converting to a DataTable).

Here’s the results –

Dapper runtimes

Now, I don’t know exactly how accurate this is, my benchmarking is pretty sloppy overall (They have more benchmarking on the project site, so feel free to dig a little deeper), but Dapper was faster, or at least essentially the same speed. Add to that the fact that it’s way easier to use than writing your own custom SqlDataReader parsing method for each stored procedure in your project and Dapper gets the win by a mile.

All I know if that I’m going to be giving a real shot on Monday when I get back to work and probably recommending it to the rest of the guys in my department.

Advertisements

DataTable to List

My new job has gotten me involved in a lot more SQL than I’d ever had to in the past. Now this can be interesting, but there’s a whole ton of boilerplate code that goes along with writing SQL, most of which involves converting DataTables to List.

My general opinion on DataTables is that you should immediately convert them into some equivalent C# object representing the properties you’re returning from SQL. I know this isn’t how everyone feels, especially if you’re then taking this data and converting it into some other model-type data, but for the purposes of this post, we’re going to be doing it, stick with me!

I’ve got three different ways of converting the DataTable to List and I’ve benchmarked their relative speeds for your reading pleasure. Here’s the initial setup so you can replicate it.

Creating the Table

CREATE TABLE [TestTable]
(
    [ID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
    [Bit1] [bit] NOT NULL DEFAULT(1),
    [Bit2] [bit] NOT NULL DEFAULT(1),
    [Bit3] [bit] NOT NULL DEFAULT(1),
    [Bit4] [bit] NOT NULL DEFAULT(1),
    [Bit5] [bit] NOT NULL DEFAULT(1),
    [Bit6] [bit] NULL,
    [Bit7] [bit] NULL,
    [Bit8] [bit] NULL,
    [Bit9] [bit] NULL,
    [Bit10] [bit] NULL,
    [Float1] [float] NOT NULL DEFAULT(0.0),
    [Float2] [float] NOT NULL DEFAULT(0.0),
    [Float3] [float] NOT NULL DEFAULT(0.0),
    [Float4] [float] NOT NULL DEFAULT(0.0),
    [Float5] [float] NOT NULL DEFAULT(0.0),
    [Float6] [float] NULL,
    [Float7] [float] NULL,
    [Float8] [float] NULL,
    [Float9] [float] NULL,
    [Float10] [float] NULL,
    [Int1] [int] NOT NULL DEFAULT(0),
    [Int2] [int] NOT NULL DEFAULT(0),
    [Int3] [int] NOT NULL DEFAULT(0),
    [Int4] [int] NOT NULL DEFAULT(0),
    [Int5] [int] NOT NULL DEFAULT(0),
    [Int6] [int] NULL,
    [Int7] [int] NULL,
    [Int8] [int] NULL,
    [Int9] [int] NULL,
    [Int10] [int] NULL,
    [VarChar1] [varchar](100) NOT NULL DEFAULT('TEST'),
    [VarChar2] [varchar](100) NOT NULL DEFAULT('TEST'),
    [VarChar3] [varchar](100) NOT NULL DEFAULT('TEST'),
    [VarChar4] [varchar](100) NOT NULL DEFAULT('TEST'),
    [VarChar5] [varchar](100) NOT NULL DEFAULT('TEST'),
    [VarChar6] [varchar](100) NULL,
    [VarChar7] [varchar](100) NULL,
    [VarChar8] [varchar](100) NULL,
    [VarChar9] [varchar](100) NULL,
    [VarChar10] [varchar](100) NULL,
    PRIMARY KEY CLUSTERED 
    (
        [ID] ASC
    )
WITH 
    (
        PAD_INDEX = OFF, 
        STATISTICS_NORECOMPUTE = OFF, 
        IGNORE_DUP_KEY = OFF, 
        ALLOW_ROW_LOCKS = ON, 
        ALLOW_PAGE_LOCKS = ON
    ) ON [PRIMARY]
) 
ON [PRIMARY]

I’ve also got the test data here, so feel free to grab it. For this test, I had 5000 rows of data (the test file is 1000 records). For the record, this is all MSSQL, and probably won’t work with SQLite without a little tweaking.

To go along with this, I’ve also got a C# class with properties for each of the Columns in the SQL table, here’s that too

public class TestTable
{
    public int ID { get; set; }
    public bool Bit1 { get; set; }
    public bool Bit2 { get; set; }
    public bool Bit3 { get; set; }
    public bool Bit4 { get; set; }
    public bool Bit5 { get; set; }
    public bool? Bit6 { get; set; }
    public bool? Bit7 { get; set; }
    public bool? Bit8 { get; set; }
    public bool? Bit9 { get; set; }
    public bool? Bit10 { get; set; }
    public double Float1 { get; set; }
    public double Float2 { get; set; }
    public double Float3 { get; set; }
    public double Float4 { get; set; }
    public double Float5 { get; set; }
    public double? Float6 { get; set; }
    public double? Float7 { get; set; }
    public double? Float8 { get; set; }
    public double? Float9 { get; set; }
    public double? Float10 { get; set; }
    public int Int1 { get; set; }
    public int Int2 { get; set; }
    public int Int3 { get; set; }
    public int Int4 { get; set; }
    public int Int5 { get; set; }
    public int? Int6 { get; set; }
    public int? Int7 { get; set; }
    public int? Int8 { get; set; }
    public int? Int9 { get; set; }
    public int? Int10 { get; set; }
    public string VarChar1 { get; set; }
    public string VarChar2 { get; set; }
    public string VarChar3 { get; set; }
    public string VarChar4 { get; set; }
    public string VarChar5 { get; set; }
    public string VarChar6 { get; set; }
    public string VarChar7 { get; set; }
    public string VarChar8 { get; set; }
    public string VarChar9 { get; set; }
    public string VarChar10 { get; set; }
}

1 – ForEach

This was my first crack at it, without putting too much thought into it, this was what I thought would be the most efficient way of doing it. Turns out it’s pretty good.

public static List<TestTable> ToListForEach(DataTable dt)
{
    var data = new List<TestTable>();

    foreach (DataRow row in dt.Rows)
    {
        data.Add(new TestTable()
        {
            ID = Convert.ToInt32(row["ID"]),
            Bit1 = Convert.ToBoolean(row["Bit1"]),
            Bit2 = Convert.ToBoolean(row["Bit2"]),
            Bit3 = Convert.ToBoolean(row["Bit3"]),
            Bit4 = Convert.ToBoolean(row["Bit4"]),
            Bit5 = Convert.ToBoolean(row["Bit5"]),
            Bit6 = Convert.ToBoolean(row["Bit6"]),
            Bit7 = Convert.ToBoolean(row["Bit7"]),
            Bit8 = Convert.ToBoolean(row["Bit8"]),
            Bit9 = Convert.ToBoolean(row["Bit9"]),
            Bit10 = Convert.ToBoolean(row["Bit10"]),
            Float1 = Convert.ToDouble(row["Float1"]),
            Float2 = Convert.ToDouble(row["Float2"]),
            Float3 = Convert.ToDouble(row["Float3"]),
            Float4 = Convert.ToDouble(row["Float4"]),
            Float5 = Convert.ToDouble(row["Float5"]),
            Float6 = Convert.ToDouble(row["Float6"]),
            Float7 = Convert.ToDouble(row["Float7"]),
            Float8 = Convert.ToDouble(row["Float8"]),
            Float9 = Convert.ToDouble(row["Float9"]),
            Float10 = Convert.ToDouble(row["Float10"]),
            Int1 = Convert.ToInt32(row["Int1"]),
            Int2 = Convert.ToInt32(row["Int2"]),
            Int3 = Convert.ToInt32(row["Int3"]),
            Int4 = Convert.ToInt32(row["Int4"]),
            Int5 = Convert.ToInt32(row["Int5"]),
            Int6 = Convert.ToInt32(row["Int6"]),
            Int7 = Convert.ToInt32(row["Int7"]),
            Int8 = Convert.ToInt32(row["Int8"]),
            Int9 = Convert.ToInt32(row["Int9"]),
            Int10 = Convert.ToInt32(row["Int10"]),
            VarChar1 = row["VarChar1"].ToString(),
            VarChar2 = row["VarChar2"].ToString(),
            VarChar3 = row["VarChar3"].ToString(),
            VarChar4 = row["VarChar4"].ToString(),
            VarChar5 = row["VarChar5"].ToString(),
            VarChar6 = row["VarChar6"].ToString(),
            VarChar7 = row["VarChar7"].ToString(),
            VarChar8 = row["VarChar8"].ToString(),
            VarChar9 = row["VarChar9"].ToString(),
            VarChar10 = row["VarChar10"].ToString(),
        });
    }

    return data;
}

OK, so this is pretty good and fast. The downside is when you have actual nullable fields that can actually contain null data. In this test, all the data has been faked out, so we don’t have to deal with that, but when you do, this can slow it down considerably.

When you go from a non-null field like

Int6 = Convert.ToInt32(row["Int6"])

and make it nullable, suddenly you have to start writing your code like this –

Int6 = row.IsNull("Int6") ? new int?() : new int?(Convert.ToInt32(row["Int6"])),

or (as emn13 on reddit pointed out to me, a simpler conversion would be)

Int6 = row["Int6"] as int?

Obviously the more you have of that, the worse it gets. This leads us to

2 – LINQ

One of the guys at my work showed me this way. I’m pretty comfortable with LINQ, but I didn’t know you could get an enumerable for a DataTable, and I knew nothing about the Field structure. Here’s what it looks like

public static List<TestTable> ToListLinq(DataTable dt)
{
    return dt.AsEnumerable().Select(item => new TestTable()
    {
        ID = item.Field<int>(nameof(TestTable.ID)),
        Bit1 = item.Field<bool>(nameof(TestTable.Bit1)),
        Bit2 = item.Field<bool>(nameof(TestTable.Bit2)),
        Bit3 = item.Field<bool>(nameof(TestTable.Bit3)),
        Bit4 = item.Field<bool>(nameof(TestTable.Bit4)),
        Bit5 = item.Field<bool>(nameof(TestTable.Bit5)),
        Bit6 = item.Field<bool?>(nameof(TestTable.Bit6)),
        Bit7 = item.Field<bool?>(nameof(TestTable.Bit7)),
        Bit8 = item.Field<bool?>(nameof(TestTable.Bit8)),
        Bit9 = item.Field<bool?>(nameof(TestTable.Bit9)),
        Bit10 = item.Field<bool?>(nameof(TestTable.Bit10)),
        Float1 = item.Field<double>(nameof(TestTable.Float1)),
        Float2 = item.Field<double>(nameof(TestTable.Float2)),
        Float3 = item.Field<double>(nameof(TestTable.Float3)),
        Float4 = item.Field<double>(nameof(TestTable.Float4)),
        Float5 = item.Field<double>(nameof(TestTable.Float5)),
        Float6 = item.Field<double?>(nameof(TestTable.Float6)),
        Float7 = item.Field<double?>(nameof(TestTable.Float7)),
        Float8 = item.Field<double?>(nameof(TestTable.Float8)),
        Float9 = item.Field<double?>(nameof(TestTable.Float9)),
        Float10 = item.Field<double?>(nameof(TestTable.Float10)),
        Int1 = item.Field<int>(nameof(TestTable.Int1)),
        Int2 = item.Field<int>(nameof(TestTable.Int2)),
        Int3 = item.Field<int>(nameof(TestTable.Int3)),
        Int4 = item.Field<int>(nameof(TestTable.Int4)),
        Int5 = item.Field<int>(nameof(TestTable.Int5)),
        Int6 = item.Field<int?>(nameof(TestTable.Int6)),
        Int7 = item.Field<int?>(nameof(TestTable.Int7)),
        Int8 = item.Field<int?>(nameof(TestTable.Int8)),
        Int9 = item.Field<int?>(nameof(TestTable.Int9)),
        Int10 = item.Field<int?>(nameof(TestTable.Int10)),
        VarChar1 = item.Field<string>(nameof(TestTable.VarChar1)),
        VarChar2 = item.Field<string>(nameof(TestTable.VarChar2)),
        VarChar3 = item.Field<string>(nameof(TestTable.VarChar3)),
        VarChar4 = item.Field<string>(nameof(TestTable.VarChar4)),
        VarChar5 = item.Field<string>(nameof(TestTable.VarChar5)),
        VarChar6 = item.Field<string>(nameof(TestTable.VarChar6)),
        VarChar7 = item.Field<string>(nameof(TestTable.VarChar7)),
        VarChar8 = item.Field<string>(nameof(TestTable.VarChar8)),
        VarChar9 = item.Field<string>(nameof(TestTable.VarChar9)),
        VarChar10 = item.Field<string>(nameof(TestTable.VarChar10)),
    }).ToList();
}

So, I’m going to give you a sneak peak at the end results and let you know that this is the fastest version. If you’re looking to turn a DataTable into a List manually, this is your guy. Super fast, and handles nullable fields with ease.

3 – Reflection

Now, I know everyone gets so hung up on speed with reflection, and that, after all, is the entire reason I’m doing this, but if you’re looking for a nice generic way of converting DataTables, I think you’ll like this.

The trick to speeding this up a lot mostly relies on someone else’s smarts. The snippet below uses a project called FastMember by Marc Gravell. Here he is describing how it came to be. Now, I didn’t spend much time looking into it, I just gave it a try and it worked great, so please feel free to read more on the Github site and his blog if you’re not feeling comfortable. Luckily for us, there’s a Nuget package available for FastMember, so just include that, and use the code below, you’ll be fine.

public static List<TestTable> ToListReflection(DataTable dt)
{
    return (List<TestTable>)dt.DataTableToList<TestTable>();
}

private static readonly IDictionary<Type, IEnumerable<PropertyInfo>> _Properties =
    new Dictionary<Type, IEnumerable<PropertyInfo>>();

public static IEnumerable<T> DataTableToList<T>(this DataTable table) where T : class, new()
{
    var objType = typeof(T);
    IEnumerable<PropertyInfo> properties;

    lock (_Properties)
    {
        if (!_Properties.TryGetValue(objType, out properties))
        {
            properties = objType.GetProperties().Where(property => property.CanWrite);
            _Properties.Add(objType, properties);
        }
    }

    var list = new List<T>(table.Rows.Count);

    Parallel.ForEach<DataRow>(table.AsEnumerable().Skip(1), row => {
        var obj = new T();

        foreach (var prop in properties)
        {
            if (prop != null)
            {
                Type t = Nullable.GetUnderlyingType(prop.PropertyType) ?? prop.PropertyType;

                object propertyValue = (row[prop.Name] == null) ? null : Convert.ChangeType(row[prop.Name], t);

                var accessors = TypeAccessor.Create(objType);
                accessors[obj, prop.Name] = propertyValue;
            }
        }

        list.Add(obj);
    });

    return list;
}

This runs pretty great on its own, but by using Parallel in the ForEach we’re able to at least half our time. If you’re running a 4 core or even an 8 core machine, it can be even better, but YMMV.

So, obviously this method is a little slower, but what you lose in speed, you make up for in flexibility. You’ll never have to add or remove code when you change your stored procedure. You’ll never have to write boiler plate code like in Methods 1 and 2 ever again. Just keep in mind that this works best for small result sets.

Results

OK, here’s the final results.

DataTable to Linq runtimes

It’s pretty much as we’d expect for methods 1 and 2. If you have more nullable fields, 1 will get slower and slower, almost to the speed of method 3, so if you’re looking to do it manually, go with 2. The one thing that amazes me is how close to LINQ speeds we can get with Reflection and FastMember, so give it a try.

If you’re going with 2, here’s a SQL query that will auto generate all the class code for you. It’s pretty sweet.

OK, that’s it. Generally from now on, I think I’ll be using a combination of 2 and 3, depending on the situation. Let me know in the comments if you found anything questionable.