After several years of development, MBrace 1.0 was released last week. MBrace is a programming model for scalable cloud data scripting and programming with F# and C#. The project consists mainly of code libraries and cloud provider runtimes.
The primary component is MBrace.Core, a standalone class library that contains the core MBrace programming model. It provides an API based on computation expressions, which can be used directly or to create libraries like MBrace.Flow. The following example uses a cloud workflow to download remote content over HTTP and store it in cloud files.
open System
open System.Net
open MBrace.Core

let urls = [| ("bing", "http://bing.com"); ("google", "http://google.com") (* more urls *) |]

// Downloads one page and writes it to a cloud file named after the site.
let download (name: string, uri: string) =
    cloud {
        let webClient = new WebClient()
        let! text = webClient.AsyncDownloadString(Uri(uri)) |> Cloud.OfAsync
        do! CloudFile.Delete(sprintf "pages/%s.html" name)
        let! file = CloudFile.WriteAllText(path = sprintf "pages/%s.html" name, text = text)
        return file
    }

// `cluster` is assumed to be an existing handle to a running MBrace cluster.
let filesTask =
    urls |> Array.map download |> Cloud.Parallel |> cluster.CreateProcess
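The cloud { ... } block above is an F# computation expression: the compiler rewrites let! and return into calls on a builder object. As a rough illustration of that mechanism only, the following sketch defines a toy builder in plain F#. The names Delayed, DelayedBuilder, delayed and run are hypothetical and are not part of MBrace; the real cloud builder is considerably richer.

```fsharp
// A toy "delayed computation": nothing runs until it is explicitly executed.
// (Hypothetical type for illustration; not an MBrace type.)
type Delayed<'T> = Delayed of (unit -> 'T)

// The compiler translates let! into Bind and return into Return.
type DelayedBuilder() =
    member this.Bind(m: Delayed<'T>, k: 'T -> Delayed<'U>) : Delayed<'U> =
        Delayed(fun () ->
            let (Delayed f) = m
            let (Delayed g) = k (f ())
            g ())
    member this.Return(x: 'T) : Delayed<'T> = Delayed(fun () -> x)

let delayed = DelayedBuilder()

// Executes a delayed computation and returns its result.
let run (Delayed f) = f ()

// Building the workflow does not evaluate it.
let workflow =
    delayed {
        let! a = Delayed(fun () -> 20)
        let! b = Delayed(fun () -> 22)
        return a + b
    }

printfn "%d" (run workflow) // prints 42
```

As in this sketch, defining a cloud workflow only describes the computation; in the MBrace example it is the call to cluster.CreateProcess that submits it for execution.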
Built on top of MBrace.Core, MBrace.Flow is a distributed streaming library that uses functional pipeline declarations. The next example uses cloud flows to count the IDs that appear on more than one line across some CSV files.
let numberOfDuplicates =
    CloudFlow.OfCloudFilesByLine ["container/data0.csv" ; "container/data1.csv"]
    |> CloudFlow.map (fun line -> line.Split(','))
    |> CloudFlow.map (fun tokens -> int tokens.[0], Array.map int tokens.[1 ..])
    |> CloudFlow.groupBy (fun (id,_) -> id)
    |> CloudFlow.filter (fun (_,values) -> Seq.length values > 1)
    |> CloudFlow.length
    |> cluster.Run
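To make the logic of this pipeline easier to follow, here is a local, in-memory analogue written with the standard Seq module instead of CloudFlow. It is only a sketch: the sample lines are invented for illustration, and it assumes each line has the form id,value,value,... as the parsing step in the flow suggests.

```fsharp
// Made-up sample data standing in for the lines of the CSV files.
let sampleLines =
    [ "1,10,20"
      "2,30"
      "1,40,50"
      "3,60"
      "2,70" ]

let numberOfDuplicatesLocal =
    sampleLines
    // Split each line into its comma-separated tokens.
    |> Seq.map (fun (line: string) -> line.Split(','))
    // The first token is the id; the remaining tokens are integer values.
    |> Seq.map (fun tokens -> int tokens.[0], Array.map int tokens.[1 ..])
    // Collect all rows that share the same id.
    |> Seq.groupBy (fun (id, _) -> id)
    // Keep only ids that occur on more than one line.
    |> Seq.filter (fun (_, rows) -> Seq.length rows > 1)
    // Count those ids.
    |> Seq.length

printfn "%d" numberOfDuplicatesLocal // prints 2 (ids 1 and 2 each occur twice)
```

The cloud flow version has the same shape, but the stages are distributed across the cluster and nothing executes until the flow is passed to cluster.Run.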
Besides the code libraries, the other major components are the MBrace runtime implementations. Azure is currently the only supported provider; AWS support is under development. The Azure runtime implementation includes full support for the MBrace.Core programming model and cluster management helpers. The following shows how to create a cluster of four A3 instances on Azure:
let pubSettingsFile = @"... path to your downloaded publication settings file ... "
let config = DeploymentManager.BeginDeploy(pubSettingsFile, Regions.North_Europe, VMSizes.A3, vmCount = 4)
To help with getting started, two starter kits are available: one uses Azure, the other uses a simulated cluster. The simulated cluster runs on a single machine, providing a way to run and debug distributed code directly on a developer machine without additional infrastructure.
MBrace is an open source project available on GitHub. Contributions can be made at several levels: to the libraries, to the cloud provider runtimes, and to the samples and documentation.