BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage News What ASP.NET Core May Bring to the .NET Framework’s String Handling

What ASP.NET Core May Bring to the .NET Framework’s String Handling

Bookmarks
In what was apparently a major miscommunication among Microsoft’s developers and managers, ASP.NET Core 2.0 will in fact be supported on the full the .NET Framework. The change to only offer ASP.NET Core on .NET Core was supposed to be a temporary step to ease development.

From the ASP.NET Core preview release notes,

This preview version of ASP.NET Core 2.0 ships with support for the .NET Core 2.0 SDK only.  Our goal is to ship ASP.NET Core 2.0 on .NET Standard 2.0 so applications can run on .NET Core, Mono and .NET Framework.  As the team was working through the last of their issues before Build, it was uncovered that the preview of ASP.NET Core 2.0 utilized API’s that were outside of .NET Standard 2.0, preventing it from running on .NET Framework. Because of this we limited Preview 1 support .NET Core only so it would not break a developer upgrading an ASP.NET Core 1.x application to ASP.NET Core 2 preview on .NET Framework.

In an interview on The Register, Miguel de Icaza confirmed Microsoft’s commitment to .NET Framework,

I would love to clear up that. I want to hide my face. The .NET Core 2.0 that’s coming, the one that they baked for Build, it’s a preview, and they discovered that they had a number of problems that they couldn’t address fast enough on .NET Framework. So the packages that went out only support running ASP.NET Core 2.0 on .NET Core 2.0. We’ll fix this, and we will run on .NET Framework.

One justification for temporary change remains valid: to continue improving performance ASP.NET Core needs a better string handling library.

Memory Allocation

A long-known problem with .NET is the way virtually all string-handling methods allocate memory. When parsing JSON, XML, etc. it is not unusual to see hundreds or even thousands of tiny string allocations generated from the substring method. Not only does this potentially waste a lot of time making the copy, it puts a lot of stress on the garbage collector and as an application developer there isn’t much you can do about this.

There is actually a good reason for this. Like .NET, strings in Java are immutable. So rather than allocation new strings, Java’s original substring method simply created pointers to the original string. This gave it allocation free substrings, but at the risk of memory leaks. A single character substring could keep a 5 MB string from being garbage collected. (The problem was so bad, that Java changed to allocating substrings in 1.7u6.)

Under that Span<T> proposal, you would be able to choose between allocating and non-allocating substrings. The parsing libraries that ASP.NET Core use could be rewritten to use the non-allocating version internally, improving performance without the risk of a memory leak. They just have to be sure to release any instance of Span<char> by the end of the parsing operation.

This change would also require new versions of primitive parsing function such as Int32.Parse and Int32.TryParse to be effective. Ideally this would be baked into the Base Class Library (BCL) rather than being a separate library. Which returns us to the .NET Framework vs Standard vs Core issue.

There is no question that .NET Core can be changed the fastest. So aside from OS-specific functionality, it will be getting any new feature first. Conversely, .NET Standard won’t see new features until all of the various incarnations of .NET/Mono agree to support it. While Microsoft theoretically owns them all, this is still going to be a lengthy process.

So during its development, it makes sense that ASP.NET Core is built against .NET Core. This will allow the new APIs to be refined with real use cases before they are submitted for standardization.

Default Encoding

Most people don’t know this, but .NET uses UTF16 strings internally. For most use cases this is acceptable and developers don’t need think about encoding unless they are dealing with file or network I/O.

The web is mostly based on UTF8. Again, this is acceptable for most use cases and server-side developers merely need to ensure that whatever internal format they’re using is eventually converted to that encoding.

When we start talking about performance, this becomes a bit of a problem. All web requests start out as UTF8 and then need to be converted into UTF16 before .NET can understand them. Conversely, all of the responses from a .NET server need to be converted from UTF16 back to UTF8.

There are a couple of proposals to eliminate all of this otherwise unnecessary conversion. The first is to create a Utf8String class and matching string handling library. You could then build new parsing libraries that work directly with them. This is a very low-risk option, as it would be entirely opt-in.

The more comprehensive proposal is titled Compact String implementation by Matt Warren. Inspired by a similar proposal in OpenJDK, it would add a type field to strings what would allow them to indicate which encoding they use. This is a much larger change and could negatively impact the plans for Span<T>.

Rate this Article

Adoption
Style

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Community comments

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

BT