Introducing Microsoft Avro

Microsoft has announced its implementation of the Apache Avro wire protocol. Avro is described as a “compact binary data serialization format similar to Thrift or Protocol Buffers” with additional features needed for distributed processing environments such as Hadoop.

In order to make the protocol as fast as possible, the Microsoft Avro Library uses expression trees to build and compile a custom serializer at run time. After the initial hit to compile the serializer into IL code, this should provide significantly better performance than reflection-based algorithms.

Unlike Protocol Buffers, the Avro protocol is self-describing. The schema is transmitted when the connection between client and server is made, usually just once. As a result, neither side has to hard-code the binary format, and you do not pay the price of transmitting the schema with each message.

Because of this, the Microsoft Avro Library can support three modes:

  • Reflection mode. The IL code for the serializer is built based on the schema of .NET types to achieve maximum performance.
  • Generic record mode. The JSON schema of the data can be specified at runtime so that it provides the ability for handling dynamic data with arbitrary schema.
  • Container mode. The library can generate portable files with embedded schema. The file format is compatible with Avro container file specification and can be used across platforms.

When used in reflection mode, the library uses the same DataContract/DataMember attributes that WCF developers are familiar with.
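The idea behind reflection mode, deriving the Avro schema from an existing type rather than writing it by hand, can be illustrated outside .NET. The following is a minimal Python sketch of that idea only, not the Microsoft library's API: it inspects a dataclass and produces an Avro record schema. The `SensorReading` class, the `schema_for` helper, and the type mapping are hypothetical names chosen for illustration.

```python
import dataclasses
import json

# Assumed mapping from Python types to Avro primitive type names.
PRIMITIVES = {bool: "boolean", int: "long", float: "double",
              str: "string", bytes: "bytes"}


def schema_for(cls) -> dict:
    """Build an Avro record schema by inspecting a dataclass's fields."""
    if not dataclasses.is_dataclass(cls):
        raise TypeError("schema_for expects a dataclass")
    return {
        "type": "record",
        "name": cls.__name__,
        "fields": [
            {"name": f.name, "type": PRIMITIVES[f.type]}
            for f in dataclasses.fields(cls)
        ],
    }


@dataclasses.dataclass
class SensorReading:  # hypothetical example type
    location: str
    value: int


schema = schema_for(SensorReading)
print(json.dumps(schema, indent=2))
```

In the .NET library the DataContract/DataMember attributes play the role that the dataclass declaration plays here: they mark which members belong in the schema.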

In generic record mode it is assumed that you don’t have a .NET class predefined to store the data. Instead you use the AvroRecord class in conjunction with a JSON document that describes the format of the data. AvroRecord objects need to be accessed in a late bound manner (C# dynamic, VB Option Strict Off).
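To make the generic approach concrete, here is a minimal Python sketch, not the AvroRecord API, that walks a JSON schema supplied at run time and encodes a plain dictionary using Avro's binary encoding rules (zigzag varints for int/long, length-prefixed UTF-8 strings, fields written in schema order). The schema, the `SensorReading` name, and the `encode` helper are assumptions made for illustration, and only a handful of types are handled.

```python
import json
import struct


def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as Avro does: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negatives to small positives
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode(schema, value) -> bytes:
    """Encode value against a parsed Avro schema (small subset of types)."""
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind in ("int", "long"):
        return zigzag_varint(value)
    if kind == "string":
        raw = value.encode("utf-8")
        return zigzag_varint(len(raw)) + raw
    if kind == "boolean":
        return b"\x01" if value else b"\x00"
    if kind == "double":
        return struct.pack("<d", value)  # 8 bytes, little-endian
    if kind == "record":
        # Fields are concatenated in schema order; no names or tags are written.
        return b"".join(encode(f["type"], value[f["name"]])
                        for f in schema["fields"])
    raise ValueError(f"unsupported type in this sketch: {kind!r}")


# Hypothetical schema, known only at run time as a JSON document.
SCHEMA_JSON = """
{"type": "record", "name": "SensorReading",
 "fields": [{"name": "location", "type": "string"},
            {"name": "value", "type": "long"}]}
"""

schema = json.loads(SCHEMA_JSON)
payload = encode(schema, {"location": "lab", "value": -3})
print(payload)  # b'\x06lab\x05'
```

The payload is five bytes and contains no field names, which is why the reader needs the schema to make sense of it.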

Container mode can be used in conjunction with reflection or generic record mode. Since you are creating files in this mode instead of sending messages over the wire, you can compress and/or encrypt the data using whatever means you prefer. Out of the box you can choose between no compression and deflate, but instructions for building your own codec are included.
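For readers unfamiliar with the container format, the following Python sketch lays out a file as described in the Avro container file specification: a magic header, a metadata map carrying the schema and codec name, a 16-byte sync marker, and data blocks whose bodies are optionally compressed with raw deflate. It is an illustration under stated assumptions, not the Microsoft library's implementation; the helper names, the schema, and the pre-encoded records are hypothetical.

```python
import json
import os
import zlib

MAGIC = b"Obj\x01"  # four-byte magic that opens every container file


def avro_long(n: int) -> bytes:
    """Zigzag varint encoding used for counts and lengths."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def avro_bytes(data: bytes) -> bytes:
    """Length-prefixed byte string (also used for UTF-8 strings)."""
    return avro_long(len(data)) + data


def file_header(schema: dict, codec: str, sync: bytes) -> bytes:
    """Magic, metadata map (schema and codec), then the sync marker."""
    meta = {"avro.schema": json.dumps(schema).encode("utf-8"),
            "avro.codec": codec.encode("ascii")}
    out = MAGIC + avro_long(len(meta))  # one map block holding both entries
    for key, value in meta.items():
        out += avro_bytes(key.encode("utf-8")) + avro_bytes(value)
    return out + avro_long(0) + sync  # a zero count terminates the map


def data_block(encoded_records, codec: str, sync: bytes) -> bytes:
    """Record count, body size in bytes, (compressed) body, sync marker."""
    body = b"".join(encoded_records)
    if codec == "deflate":
        # Negative wbits selects raw deflate with no zlib header or checksum.
        comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
        body = comp.compress(body) + comp.flush()
    elif codec != "null":  # "null" is the spec's name for no compression
        raise ValueError(f"unsupported codec in this sketch: {codec!r}")
    return avro_long(len(encoded_records)) + avro_long(len(body)) + body + sync


# Hypothetical schema and two records already encoded in Avro binary form.
schema = {"type": "record", "name": "SensorReading",
          "fields": [{"name": "location", "type": "string"},
                     {"name": "value", "type": "long"}]}
records = [b"\x06lab\x05", b"\x08roof\x0e"]

sync = os.urandom(16)
contents = (file_header(schema, "deflate", sync)
            + data_block(records, "deflate", sync))
print(len(contents), "bytes")
```

Because the codec name is stored in the header, any reader that wants to open the file must support the codec that was used to write it.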

Community comments

  • What about compatibility?

    by Alexander Shopov,

    Thus far Apache Avro has been multilanguage and multi platform providing bindings out of the box for Java, C#, C, C++.
    Am I reading the article right? - Only the last mode of Microsoft implementation (container mode) can be used across platforms (producing files according to container spec)? Are datastreams from the fast (reflection) and the JSON (generic) modes compatible with upstream Avro or is this implementation just a MS to MS solution?

  • Re: What about compatibility?

    by Jonathan Allen,

    As I understand it, the "reflection" and "generic" modes only deal with how the .NET code interprets the messages. Whatever is on the other side of the connection shouldn't be able to tell which mode is being used.

    For "container" I'm assuming that everyone has to agree on whatever compression/encryption codec you are layering on top.

  • Re: What about compatibility?

    by Alexander Shopov,

    This sounds much, much better. If comparing with Java:

    • Reflection mode seems like .NET's preferable way of using data objects. Would be an analogue to beans which seems the popular way for DO in Java. (but the two are different otherwise)

    • Generic mode is using a Map interface.

    The Java implementation also has a JSON view implementation, but given reflection and generic mode, either Microsoft or someone else can provide the JSON-ish way of using Avro provided the use case is there. Just to add - they say they have published the code under Apache 2 license on CodePlex, which is nice.
