Nothing Is Permanent Except Change - How Software Architects Can Embrace Change
Michael Stal discusses system architecture quality, how to avoid architectural erosion, how to deal with refactoring, and design principles for architecture evolution.
The content has been bookmarked!
There was an error bookmarking this content! Please retry.

Posted by Yevhen Shchyholyev on Jul 25, 2011
ORM users have grown accustomed to thinking in terms of .NET and LINQ, forgetting the specifics of particular databases, differences in their functionality and effectiveness of certain SQL-constructions. In this article, we shall briefly discuss some of the problems that the user may face in the process of developing an application for Entity Framework (EF), which is meant to interact with Oracle, MySQL, PostgreSQL or SQLite rather than Microsoft SQL Server. I hope this will be useful both for users who create an EF-application for these databases for the first time as well as for those who create an application that must support interaction with multiple databases, i.e., SQL Server and Oracle. As examples of implementing EF-providers for databases other than SQL Server, we shall use Devart ADO.NET providers.
Presently, Entity Framework supports the following three approaches to development: Database-First, Model-First and Code-First. There is no “best” approach – the choice of the approach depends on the specifics of a particular application and the scope of work. In any case, the user must be ready for a situation when the standard capabilities of Visual Studio’s design-time model development are not convenient enough; either conditioned by their limited functionality or such capabilities are intended for use with SQL Server and can hardly be conveniently used when working with Oracle, for example. For that reason, a number of third-party solutions have been created:
If it is necessary to develop an application that supports several different databases, there are several solutions in Entity Framework. Entity Framework v1/v4 provided no choice – the developer had to use the XML mapping approach. With the release of Entity Framework 4.1, it became possible to develop applications using fluent mapping.
The developer can work entirely in the Model-First approach. Or they may initially use the Database-First approach against the model targeting one DBMS, then switch to the Model-First approach, change the target database by changing the connection and the DDL generation template, after which the mapping and storage parts of the model have to be re-generated. The DDL SQL script for database creation and generated metadata must be saved to obtain a set of CSDL, MSL, SSDL files at each stage. CSDL and MSL are the same, while the SSDL files are different for each database. Further on, the correct EF connection string must be formed based on the provider-specific connection string and the names of required resources for each database.
This is only a brief outline of the process. There are many detailed tutorials on the subject that can be referred to for more information. For example, for more information on SQL Server + Oracle, see the articles Migrating database schema from Microsoft SQL Server to Oracle using Entity Framework and Preparing for multiple databases
Advantages of using XML-mapping for multiple databases:
* Support for all capabilities of Entity Framework mapping.
* When standard VS EDM Designer is used, most operations on creating and editing the model can be performed in the designer.
* If other types of code generation are used instead of standard EntityObjects code generation, the user can get POCO objects or self-tracking entities (STE).
* Customization of generated code is performed through editing T4 templates.
Disadvantages of using XML-mapping:
* Inconvenient, complex modification.
* Standard VS EDM Designer lacks many capabilities and does not support all capabilities of XML mapping, so that the user has to modify XML manually in the XML editor or use a third-party EF-designer.
* T4 code generation templates, especially for POCO and STE, are complex and cannot be conveniently modified.
After fluent mapping functionality became available, the development of applications that target multiple databases has grown significantly easier. It is possible to write the entire code manually (Code-Only); however it also possible to use the designer and generate the code with the help of a code generation template.
For an example of developing an EF Code-First application that targets multiple databases, see the article "Entity Framework Code-First support for Oracle, MySQL, PostgreSQL and SQLite".
Within the SQL Server behavior, when database objects mapped to the corresponding objects in the model, are deleted and created, the entire database is deleted and then created and filled with new tables. DatabaseExists is intended to check if the database exists.
This behavior is not always optimal, since only some users have the required privileges. This behavior cannot be always implemented (for example in Oracle, the schema is the user and it is impossible to delete the user that connected to the database).
That is why for Oracle, MySQL, PostgreSQL and SQLite, there are three types of deletion behavior. The developer can choose which to use:
Database creation means creating tables and relationships between them. For Oracle databases, if the primary key is specified as database-generated with the DatabaseGeneratedOption.Identity option, then a sequence and an insert trigger will be generated for this table to make this column autoincrement. Additionally, if the schema deletion/creation mode is selected, then the entire schema (database) will be created.
The implementation of database existence verification checks if at least one required object exists in the database. The existence of at least one table in MySQL, PostgreSQL, and SQLite is verified as well. Additionally for Oracle, the existence of sequences is verified.
The behavior of the current Code-First implementation is such that in the course of mapping generation the dbo schema name is defined by default, when the names all the tables are generated. That is, the Column class will be mapped to the table "dbo"."Column".
This behavior is suitable for SQL Server in principle, but cannot be used when targeting other databases. This can be corrected by explicitly setting the schema name using the TableAttribute attribute or (for fluent mapping) by using the method .ToTable() for each class of the entity (do not forget about the EdmMetadata table and the intermediate table for many-to-many associations if they are used in the model.) There are no standard means that allow not generating the prefix of the database schema name before the name of the table so that all actions are performed in the current schema/database; however on the level of the EF-provider implementation this workaround can be implemented, as it is in Devart dotConnect.
Entity Framework supports a limited number of .NET types (signed integers, Single, Double, Decimal, String, Byte[], TimeSpan, DateTime, DateTimeOffset, Boolean, and Guid). However not all .NET types from this list are adequately represented in all databases.
Different databases have different data types that represent numeric types. In some of them, they are separated into numerous types. In Oracle, all .NET types can be stored in one NUMBER type. There are no special problems with this arrangement, however developers should ensure that the accuracy and capability of DB-type are sufficient to store the entire range of values of the corresponding .NET-type (or vice versa, depending on the scope of the task). Ideally, the types will have equal capabilities.
As for particulars, it should be remembered that Entity Framework does not support the mapping of .NET unsigned integers, so MySQL unsigned types have to be mapped to signed integers with a greater range. That is, MySQL unsigned integer is mapped to System.Int64 and so on.
Some EF-providers make it possible to customize rules for mapping numeric db-types to .NET-types when a model is built through the Database-First approach, while other providers do not, as they have a fixed set of rules. Normally this functionality is implemented in EF-providers for Oracle.
Also keep in mind Oracle has no autoincrement columns; the identity column is implemented as columns of integer type (i.e., NUMBER(9)) plus the INSERT trigger that assigns a value to this column based on the sequence.
Remember also different database string types have different limitations for the maximum length of the string that can be saved, and there can also be a separation into Unicode/non-Unicode string types.
The most problem-prone area is how empty strings are processed in Oracle. This database stores empty strings "" as NULL; if this peculiarity is not accounted for, there may be problems with saving if the column has the NOT NULL constraint. If the column is nullable, the saving operation is problem-free. However when data is read it can be expected that null will be returned instead of the recorded value "". Application logic must be able to process this peculiarity correctly.
Another source of trouble are requests in which there are strings of different types or Unicode/non-Unicode strings, since Oracle has a number of limitations in this area. In most cases, a fully functional EF-provider can provide some or other workaround and generate correct SQL code.
Developers should pay attention to the accuracy of database-specific DateTime types, namely their capability for storing fractional parts of seconds (System.DateTime allows storing time values with 7 places after the decimal point; its accuracy is 100 nanoseconds). Some database types cannot store them and the accuracy varies depending on a particular type, so sometimes it is impossible to save the entire value stored in System.DateTime without truncating its part. For example:
DataTimeOffset is the most problematic .NET type. It is supported in MS SQL Server 2008, Oracle and SQLite, but not in MS SQL Server 2005, MySQL or PostgreSQL, since these databases do not have a native DateTime type that can store the time zone offset.
In SQL Server there is the rowversion/timestamp type that can be used to conveniently implement concurrency checks. Different approaches have to be used when working with other databases. There is no optimal way to build concurrency checks that work equally well with any database. Certainly, one can set СoncurrencyMode=Fixed for all columns of an entity to check absolutely all values, but this option is the slowest.
Normally we create an UPDATE-trigger that assigns a value to the particular concurrency column. For concurrency checks we can use a numeric, DateTime or Guid (binary) column:
If concurrency checks are made by the floating-point column, pay attention to type accuracy.
Generally in Oracle, concurrency checks cannot be made by the ORA_ROWSCN pseudo-column, since the values of these pseudo-columns are changed only after the transaction is saved.
When the specifics of a particular database are not taken into account, it is quite possible that a LINQ-query can be processed faster in one database and slower in another.
Data-paging performed by the .Skip()/.Take() LINQ-methods can be one of the most resource-consuming operations. Unlike in SQL Server, Skip/Take in Oracle generates two sub-queries with sorting in the inner sub-query, which might negatively impact the performance. It should be noted that paging can be rather ineffective in other databases, especially if sorting is performed by non-indexed columns.
Both the performance of the EF-provider and the availability of support for Batch Updates when the SaveChanges() method is used impact the process of saving changes to the database.
At this stage of development, you need to control generated SQL code both for queries and updates. Since there is no built-in mechanism for monitoring all generated SQL, you can use third-party tools. When working with Devart's EF-providers, you can use the dbMonitor free tool; if you use other EF-providers, you can trace SQL on the database side.
LINQ-queries can be optimized in various ways: a LINQ-query can be re-written, separated in two, some operations can be moved to the server-side logic by creating a stored procedure/function or view. It is also possible to write and execute native SQL without ORM functionality, move a part of logic from LINQ to Entities onto LINQ to Objects by materializing the collection of objects received from the database and performing manipulations on the in the workstation memory, etc.
For general information on various aspects of performance of all EF-providers, see the article “Performance Considerations (Entity Framework)” on MSDN.
When calculations are performed on numeric types or when aggregating functions are used on different databases, it is possible to get slight different values on the same initial dataset, since calculation accuracy may differ for different types.
Let's demonstrate this with the following example.
Table TEST_TABLE with column COLUMN1 of type NUMBER(15,10) or BINARY_DOUBLE in Oracle and DOUBLE(15,10) or DECIMAL(15,10) in MySQL. Initial data:
|
1.0001 |
|
1.0002 |
|
1.0003 |
|
1.0004 |
|
1.0005 |
|
1.0006 |
|
1.0007 |
|
1.0008 |
|
1.0009 |
|
1.0010 |
Query
SELECT AVG(COLUMN1) FROM TEST_TABLE
returns 1.00055 in both cases in Oracle 11g and DECIMAL(15,10) in MySQL. For DOUBLE(15,10) in MySQL 5.1, we shall get 1.00055000782013.
In Entity Framework, the developer can form queries using either LINQ to Entities, or Entity SQL, or their combinations in the form of LINQ to Entities queries with Entity SQL insertions. In any case, such a query is transformed by the EF engine into an intermediate representation as an expression tree and is sent to the EF-provider to generate SQL. Since Entity Framework was initially developed for SQL Server, sometimes EF-providers have to significantly transform the initial expression tree to generate correct SQL for a particular target database. However, it is not always possible.
The most typical examples of a functionality supported only in Microsoft SQL Server are such SQL-commands as CROSS APPLY and OUTER APPLY. Therefore in certain situations (see the MSDN article for descriptions of such situations), Entity Framework converts the query using these constructions. Since Oracle, MySQL, PostgreSQL and SQLite have no corresponding syntactical constructions, it becomes impossible to generate the query and an error will occur in the process of SQL generation.
It is also possible that a complex LINQ-query with subqueries will work in SQL Server but will crash due to the query execution error in Oracle because of the existing limitation on calling identifiers located on non-adjacent levels of hierarchy. Typically this problem becomes actual when the .Skip()/.Take() methods are used in a complex query, since their implementation in Oracle results in additional generation of subqueries, unlike in SQL Server.
Calls to properties and methods of .NET-types in LINQ to Entities queries can often be transformed into the so-called Entity Framework Canonical Functions (see this MSDN article), on which SQL is generated. Canonical functions can also be used directly in Entity SQL.
The corresponding MSDN section contains statements saying all these functions are supported by all EF-providers:
"This section discusses canonical functions that are supported by all data providers, and can be used by all querying technologies."
This statement seems to be over-optimistic, as different DBMS have different functionality and sometimes it is impossible to implement all support capabilities.
Depending on DBMS, the following aspects may be unsupported:
For a detailed comparison of support for canonical functions for Microsoft SQL Server 2005/2008, Oracle, MySQL, PostgreSQL and SQLite, see the article "Entity Framework Canonical Functions".
Different implementations of EF-providers have different functionality. Thus, the issue of whether to generate literal constants or parameters in INSERT/UPDATE/DELETE commands or to enable the user to choose a particular type of behavior is solely the responsibility of provider developers. Equally different can be the behavior as regards sending NULL-values to the database as well as processing DefaultValue. There being no standardized solution to the issue of availability of the Batch Updates functionality in the SaveChanges() process, provider developers are free to resolve it individually – that is why I recommend referring to provider-specific documentation in any particular case. A number of capabilities on adjusting the behavior of DML-operations, including Batch Updates, are implemented in Devart dotConnect providers.
In this article, I have tried to present a brief description of the most common problems the user faces in the process of developing an EF-application for databases other than Microsoft SQL Server. Undoubtedly, many other aspects have not gotten their due share of limelight within this article. To summarize, I would like to present a list of basic recommendations to developers. Regardless of how commonplace and trivial such recommendations might seem, it is wise to follow them, as they retain their validity, no matter what ORM or database you develop applications for.

Yevhen Shchyholyev is a Dev Lead in Devart. He has more than 10 years of experience in working with different databases and is currently engaged in developing ADO.NET providers, run-time and design-time development tools for ORMs. Yevhen has been developing Entity Framework providers for different databases since 2007. He is also an author of numerous Entity-Framework-related articles on the company's blog.
Michael Stal discusses system architecture quality, how to avoid architectural erosion, how to deal with refactoring, and design principles for architecture evolution.
Every developer has had to integrate with another system, API or component. Tis article provides strategies to handle the change and for he separating system boundaries.
Alex Russell talks about the shortcomings of the web platform and how it is evolving in order to adress them. He also explains about how browsers are improving and shares his vision on things to come.
Jeff Lindsay discusses creating distributed and concurrent systems using ZeroMQ – a lightweight message queue-, and gevent – a coroutine-based networking library.
Brian Ketelsen introduces Skynet, a platform for polyglot, distributed and composable services that communicate with each other over RPC/JSON.
Carin Meier tells the story of Alice discovering Monads, meeting three types of monads – Identity, Maybe, State-, and learning how to implement them in Clojure.
The need for agile, queryable, reliable, scalable storage without the pain of SQL schema migration is real. This article uses MongoDB to introduce NoSQL concepts to Java, PHP, and Python developers.
Jérôme Giraud introduces Wink Toolkit, an open source mobile JavaScript framework for HTML5 web or hybrid apps, showing widgets and interactions.
No comments
Watch Thread Reply