BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage Articles Implementation Strategies for the Repository Pattern with Entity Framework, Dapper, and Chain

Implementation Strategies for the Repository Pattern with Entity Framework, Dapper, and Chain

Bookmarks

Key takeaways

  • Micro-ORMs such as Dapper offer the best performance, but require the most work
  • Fluent ORMs such as Chain are easier to use if you don’t need complex object graphs.
  • Entity Framework’s performance can be improved significantly if you put a lot of work into it.
  • Projections can be tedious, but are needed to get the most performance out of your database.
  • Partial updates can be problematic for full ORMs.

In modern enterprise development it is common to use a multi-layered approach to building one’s data access layer (DAL). When using C#, the lowest layer of the DAL is almost always ADO.NET. However, that can be a clumsy library at times so it is common to layer upon it an ORM. Then to enable mocking and hide the ORM’s details, the whole DAL is wrapped inside a repository.

In this series we’ll be looking at techniques for building a repository using three different styles of ORM:

  • Entity Framework: A tradition “full feature” or “OOP” style ORM
  • Dapper: A lightweight micro-ORM that focuses primarily on result set mapping.
  • Tortuga Chain: A fluent ORM based on functional programming concepts.

This article will focus on the basic functionality that one would find in a typical repository. In part two, we’ll look at advanced techniques that one would implement on a case by case basis.

Insert Operations

For any set of CRUD operations, the simple insert is usually the first one implemented so it can be used to test other operations.

Chain

Chain uses runtime matching between column names and property names. Unless strict mode is enabled, properties on the object that don't exist in the database will be ignored. Likewise, columns that don't have matching properties are not part of the generated SQL.

public int Insert(Employee employee)
        {
            return m_DataSource.Insert("HR.Employee", employee).ToInt32().Execute();
        }

Dapper

Without third party extensions, Dapper requires the programmer to manually specify the SQL needed. This includes database-specific logic for returning the newly created primary key.

 public int Insert(Employee employee)
        {
            const string sql = @"INSERT INTO HR.Employee
		(FirstName,
		 MiddleName,
		 LastName,
		 Title,
		 ManagerKey,
		 OfficePhone,
		 CellPhone
		)
VALUES	(@FirstName,
		 @MiddleName,
		 @LastName,
		 @Title,
		 @ManagerKey,
		 @OfficePhone,
		 @CellPhone
		);

SELECT SCOPE_IDENTITY()
";
            using (var con = new SqlConnection(m_ConnectionString))
            {
                con.Open();
                return con.ExecuteScalar<int>(sql, employee);
            }
        }

Entity Framework

Entity framework uses compile-time mapping to generate the SQL at runtime. If there are any properties that don't match a column, they need to be tagged as NotMapped or an error will occur.

public int Insert(Employee employee)
        {
            using (var context = new CodeFirstModels())
            {
                context.Employees.Add(employee);
                context.SaveChanges();
                return employee.EmployeeKey;
            }
        }

Update Operations

Chain

By default, Chain uses the primary key(s) as defined in the database. However, it will honor the Key attribute in the model if the appropriate insert option is set.

public void Update(Employee employee)
        {
            m_DataSource.Update("HR.Employee", employee).Execute();
        }

Dapper

Again, plain Dapper requires the user to manually write the necessary SQL.

public void Update(Employee employee)
        {
            const string sql = @"UPDATE	HR.Employee
SET		FirstName = @FirstName,
		MiddleName = @MiddleName,
		LastName = @LastName,
		Title = @Title,
		ManagerKey = @ManagerKey,
		OfficePhone = @OfficePhone,
		CellPhone = @CellPhone
WHERE	EmployeeKey = @EmployeeKey
";
            using (var con = new SqlConnection(m_ConnectionString))
            {
                con.Open();
                con.Execute(sql, employee);
            }
        }

Entity Framework, Novice

EF looks for the Key attribute to generate the where clause for the update statement.

public void Update(Employee employee)
        {
            using (var context = new CodeFirstModels())
            {
                var entity = context.Employees.Where(e => e.EmployeeKey == employee.EmployeeKey).First();
                entity.CellPhone = employee.CellPhone;
                entity.FirstName = employee.FirstName;
                entity.LastName = employee.LastName;
                entity.ManagerKey = employee.ManagerKey;
                entity.MiddleName = employee.MiddleName;
                entity.OfficePhone = employee.OfficePhone;
                entity.Title = employee.Title;
                context.SaveChanges();
            }
        }

Entity Framework, Intermediate

When working with EF, novices often make a mistake when it comes to performing updates. Attaching an entity that comes in over the wire is easy, but the pattern isn't something you can just stumble across. Here is the corrected example:

public void Update(Employee employee)
        {
            using (var context = new CodeFirstModels())
            {
                context.Entry(employee).State = EntityState.Modified;
                context.SaveChanges();
            }
        }

Read All Operations

Read All operations in EF and Chain are very similar, with EF requiring more lines and Chain longer lines.

As it requires raw SQL, Dapper of course is the most verbose. That said, you can reduce some of Dapper's overhead by using SELECT * instead of manually specifying the column names. This also reduces the chances of a mismatch between the class and SQL statement, at the risk of returning extraneous data.

Chain

In Chain, the ToObject link generates a list of desired columns. The From link generates the SQL by matching the desired list with the list of available columns.

public IList<Employee> GetAll()
        {
            return m_DataSource.From("HR.Employee").ToCollection<Employee>().Execute();
        }

Dapper

As it requires raw SQL, Dapper is the most verbose. Though frowned upon, you can reduce some of Dapper's overhead by using SELECT * instead of manually specifying the column names. This makes it less likely to miss a column at the risk of returning extraneous data.

 public IList<Employee> GetAll()
        {
            using (var con = new SqlConnection(m_ConnectionString))
            {
                con.Open();
                return con.Query<Employee>("SELECT e.EmployeeKey, e.FirstName, e.MiddleName, e.LastName, e.Title, e.ManagerKey, e.OfficePhone, e.CellPhone, e.CreatedDate FROM HR.Employee e").AsList();
            }
        }

Entity Framework

As before, EF uses compile time information to determine how to generate the SQL.

public IList<Employee> GetAll()
        {
            using (var context = new CodeFirstModels())
            {
                return context.Employees.ToList();
            }
        }

Get by Id Operations

Note that each example has a slightly modified syntax to indicate only one object is being returned. The same basic filtering technique can be applied when returning multiple objects.

Chain

Chain heavily relies on "filter objects". These objects are translated directly to parameterized WHERE clauses with an "and" operator between each property.

public Employee Get(int employeeKey)
        {
            return m_DataSource.From("HR.Employee", new { @EmployeeKey = employeeKey }).ToObject<Employee>().Execute();
        } 

Though used less often, Chain also allows the WHERE clause to be expressed as a parameterized string.

If the primary key is scalar, which is to say it only has one column, then you can use a simplified syntax.

public Employee Get(int employeeKey)
        {
            return m_DataSource.GetByKey("HR.Employee", employeeKey).ToObject<Employee>().Execute();
        }

Dapper

Here we see Dapper manually specifying essentially the same SQL Chain and EF would generate.

using (var con = new SqlConnection(m_ConnectionString))
            {
                con.Open();
                return con.Query<Employee>("SELECT e.EmployeeKey, e.FirstName, e.MiddleName, e.LastName, e.Title, e.ManagerKey, e.OfficePhone, e.CellPhone, e.CreatedDate FROM HR.Employee e WHERE e.EmployeeKey = @EmployeeKey", new { @EmployeeKey = employeeKey }).First();
            }

Entity Framework

EF treats everything between the table name and the first ToList or First operation as an expression tree. This tree is evaluated at runtime to generate the SQL.

public Employee Get(int employeeKey)
        {
            using (var context = new CodeFirstModels())
            {
                return context.Employees.Where(e => e.EmployeeKey == employeeKey).First();
            }
        }

Delete Operations

Chain

Chain expects a parameter object that includes the primary keys. Extra properties in the parameter object are ignored. (Bulk deletes are not supported with this syntax.)

public void Delete(int employeeKey)
        {
            m_DataSource.Delete("HR.Employee", new { @EmployeeKey = employeeKey }).Execute();
        }

If you have a scalar primary key, then a simplified syntax is available.

 public void Delete(int employeeKey)
        {
            m_DataSource.DeleteByKey("HR.Employee", employeeKey).Execute();
        }

Dapper

public void Delete(int employeeKey)
        {
            using (var con = new SqlConnection(m_ConnectionString))
            {
                con.Open();
                con.Execute("DELETE FROM HR.Employee WHERE EmployeeKey = @EmployeeKey", new { @EmployeeKey = employeeKey });
            }
        }

Entity Framework, Novice

The novice will generally fetch a record and then immediately delete, discarding all of the information returned.

public void Delete(int employeeKey)
        {
            using (var context = new CodeFirstModels())
            {
                var employee = context.Employees.Where(e => e.EmployeeKey == employeeKey).First();
                context.Employees.Remove(employee);
                context.SaveChanges();
            }
        }

Entity Framework, Intermediate

To avoid the round-trip to the database, inline SQL can be used.

public void Delete(int employeeKey)
        {
            using (var context = new CodeFirstModels())
            {
                context.Database.ExecuteSqlCommand("DELETE FROM HR.Employee WHERE EmployeeKey = @p0", employeeKey);
            }
        }

Projections

Projections are an important part of middle tier development. If more data is pulled back than what's actually needed, databases often lose the ability to use covering indexes, or indexes at all, which can result in serious performance hits.

Chain

As before, Chain will simply select whatever columns are necessary for the indicated object type.

public IList<EmployeeOfficePhone> GetOfficePhoneNumbers()
        {
            return m_DataSource.From("HR.Employee").ToCollection<EmployeeOfficePhone>().Execute();
        }

Dapper

Since Dapper is explicit, it is up to the developer to ensure only the necessary columns are selected.

public IList<EmployeeOfficePhone> GetOfficePhoneNumbers()
        {
            using (var con = new SqlConnection(m_ConnectionString))
            {
                con.Open();
                return con.Query<EmployeeOfficePhone>("SELECT e.EmployeeKey, e.FirstName, e.LastName, e.OfficePhone FROM HR.Employee e").AsList();
            }
        }

Entity Framework

EF requires an extra step, which is often overlooked because it is somewhat tedious.

By including the extra Select expression before calling ToList, EF will generate the correct SQL and avoid returning too much information from the database.

public IList<EmployeeOfficePhone> GetOfficePhoneNumbers()
        {
            using (var context = new CodeFirstModels())
            {
                return context.Employees.Select(e => new EmployeeOfficePhone()
                {
                    EmployeeKey = e.EmployeeKey,
                    FirstName = e.FirstName,
                    LastName = e.LastName,
                    OfficePhone = e.OfficePhone
                }).ToList();
            }
        }

Updating using Projections

Once you have a projected object, it is nice to be able to update the database directly from it. In Chain and Dapper, this comes naturally from their basic patterns. In EF, you have to choose between manually copying properties and writing Dapper style inline SQL.

Chain

Note any column that doesn't have a matching property on the projected class will not affected.

  public void Update(EmployeeOfficePhone employee)
        {
            return m_DataSource.Update("HR.Employee", employee).Execute();
        }

Dapper

public void Update(EmployeeOfficePhone employee)
        {
            const string sql = @"UPDATE	HR.Employee
SET		FirstName = @FirstName,
		LastName = @LastName,
		OfficePhone = @OfficePhone
WHERE	EmployeeKey = @EmployeeKey
";
            using (var con = new SqlConnection(m_ConnectionString))
            {
                con.Open();
                con.Execute(sql, employee);
            }
        }

Entity Framework

public void Update(EmployeeOfficePhone employee)
        {
            using (var context = new CodeFirstModels())
            {
                var entity = context.Employees.Where(e => e.EmployeeKey == employee.EmployeeKey).First();
                entity.FirstName = employee.FirstName;
                entity.LastName = employee.LastName;
                entity.OfficePhone = employee.OfficePhone;
                context.SaveChanges();
            }
        }

Reflexive Inserts

Now we're getting into some more interesting use cases. Reflexive inserts mean you return back the object being inserted. This is usually done to pick up defaulted and calculated fields.

Model

Note for EF and Chain, you need to annotate the property so the library knows the field will be set by the database.

[DatabaseGenerated(DatabaseGeneratedOption.Computed)] //Needed by EF
        [IgnoreOnInsert, IgnoreOnUpdate] //Needed by Chain
        public DateTime? CreatedDate { get; set; }

Chain

Chain allows ToObject to be appended to any insert or update operation.

public Employee InsertAndReturn(Employee employee)
        {
            return m_DataSource.Insert("HR.Employee", employee).ToObject<Employee>().Execute();
        }

Dapper

You can use database specific functionality such as OUTPUT clauses to implement a reflexive insert with Dapper.

public Employee InsertAndReturn(Employee employee)
        {
            const string sql = @"INSERT	INTO HR.Employee
		(FirstName,
		 MiddleName,
		 LastName,
		 Title,
		 ManagerKey,
		 OfficePhone,
		 CellPhone
		)
    OUTPUT 
        Inserted.EmployeeKey,
        Inserted.FirstName,
        Inserted.MiddleName,
        Inserted.LastName,
        Inserted.Title,
        Inserted.ManagerKey,
        Inserted.OfficePhone,
        Inserted.CellPhone,
        Inserted.CreatedDate
VALUES	(@FirstName,
		 @MiddleName,
		 @LastName,
		 @Title,
		 @ManagerKey,
		 @OfficePhone,
		 @CellPhone
		);";
            using (var con = new SqlConnection(m_ConnectionString))
            {
                con.Open();
                return con.Query<Employee>(sql, employee).First();
            }
        }

If we went with a novice level pattern, it is more typical to just to forward a call to the insert method followed by the get method.

 public Employee InsertAndReturn_Novice(Employee employee)
        {
            return Get(Insert(employee));
        }

Entity Framework

Using the aforementioned DatabaseGenerated attribute, you can insert a new entity and read back its computed and/or defaulted columns.

public Employee InsertAndReturn(Employee employee)
        {
            using (var context = new CodeFirstModels())
            {
                context.Employees.Add(employee);
                context.SaveChanges();
                return employee;
            }
        }

Restricted/Partial Updates

Sometimes the application isn't meant to be able to change every column. This is especially true when the model comes directly from the UI and may contain a mix of user updatable and non-updatable fields.

Chain

In Chain, the IgnoreOnInsert and IgnoreOnUpdate attributes are used to restrict inserts and updates. Typically you would put both on a CreatedDate style column to allow the database to default the value. Often you'll put IgnoreOnUpdate on columns such as CreatedBy to prevent accidental changes during an update.

Dapper

Since you explicitly write your insert and update statements, Dapper is the most flexible in this regard.

Entity Framework

EF doesn't offer an easy way to say a given column doesn't participate in insert/update operations, except in the case of computed columns. However, you can simulate it using the read-copy-write pattern for updates.

Upsert

The ability to insert or update a record as a single operation is often desirable, especially when using natural keys.

Chain

Chain exposes Upsert using the same design as Insert and Update. The SQL generated varies by database engine. (e.g. SQL Server uses a MERGE, SQLite uses a batch of statements.)

public int Upsert(Employee employee)
        {
            return m_DataSource.Upsert("HR.Employee", employee).ToInt32().Execute();
        }

Dapper

Dapper requires either multiple round-trips or rather complex database-specific SQL, so we are omitting it from this article.

Entity Framework

In EF, this (process? Function? Add a specific to “this”) is just a variation of our improved Update method.

public int Upsert(Employee employee)
        {
            using (var context = new CodeFirstModels())
            {
                if(employee.EmployeeKey == 0)
                    context.Entry(employee).State = EntityState.Added;
                else
                    context.Entry(employee).State = EntityState.Modified;
                context.SaveChanges();
                return employee.EmployeeKey;
            }
        }

Performance

While lines of code and ease of use are the primary benchmarks in this article, looking at actual performance is also useful.

All benchmarks will include a warmup, followed by a 1000 iteration main loop. The same models are used for each test, which were created using EF's Code First from Database code generator. Each iteration represents the basic set of CRUD operations including a total of 13 creates, reads, updates, and deletes.

I would like to make it clear that these are rough benchmarks using the type of code one would typically see from someone just starting out with each library. There are advanced techniques available for each that can improve the performance, sometimes dramatically.

BenchmarkDotNet Timings

  • Chain: 3.4160 ms (0.2764 ms StdDev)
  • Chain w/ Compiled Materializers: 3.0955 ms (0.1391 ms StdDev)
  • Dapper: 2.7250 ms (0.1840 ms StdDev)
  • Entity Framework, Novice: 13.1078 ms (0.4649 ms StdDev)
  • Entity Framework, Intermediate: 10.1149 ms (0.1952 ms StdDev)
  • Entity Framework, Intermediate w/AsNoTracking: 9.7290 ms (0.3281 ms StdDev)

Conclusion

While any ORM framework can be used to implement a basic repository pattern, the performance and amount of code needed can vary significantly. Choosing between them requires balancing these factors, as well as issues such as database portability, cross-platform support, and developer experience.

In part 2, we’ll look at the advanced use cases that make the repository pattern more than just a thin abstraction layer.

The source code for this article is available on GitHub.

About the Author

Jonathan Allen got his start working on MIS projects for a health clinic in the late 90's, bringing them up from Access and Excel to an enterprise solution by degrees. After spending five years writing automated trading systems for the financial sector, he became a consultant on a variety of projects including the UI for a robotic warehouse, the middle tier for cancer research software, and the big data needs of a major real estate insurance company. In his free time he enjoys studying and writing about martial arts from the 16th century.

Rate this Article

Adoption
Style

BT