Wolfram|Alpha, the Details Behind the Rhombic Hexecontahedron
Wolfram|Alpha uses symbolic computation in an attempt to make the world’s systematic knowledge computable. It does so by accepting linguistic input rather than a custom set of formulas. The main components of the system are a data curation pipeline, an algorithmic computation system, a linguistic processing system, and an automated presentation system.
Wolfram|Alpha is not a search engine offering links to preexisting web pages, nor does it follow in the footsteps of Wikipedia, which provides a sea of “popular” narrative knowledge. Rather, it tries to answer questions by providing facts through real-time computations.
Wolfram|Alpha does not search the web for results, nor does it take its source data from the web. It has its own internal curated and audited data, coming mostly from systematic primary sources. Even the real-time data (weather, stock market, earthquakes) is curated and compared against valid data, and the results are displayed differently (for example, with dashed lines) if deviations are found.
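The deviation check on real-time data can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Reading` type, the `display_style` function, and the 5% relative tolerance are hypothetical and do not describe how Wolfram|Alpha actually validates its feeds.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """A hypothetical real-time data point from some feed."""
    source: str
    value: float


def display_style(reading: Reading, reference: float, tolerance: float = 0.05) -> str:
    """Return 'solid' when the reading agrees with the reference value,
    'dashed' when it deviates by more than the relative tolerance."""
    if reference == 0:
        # No meaningful relative deviation; fall back to the absolute value.
        deviation = abs(reading.value)
    else:
        deviation = abs(reading.value - reference) / abs(reference)
    return "solid" if deviation <= tolerance else "dashed"


if __name__ == "__main__":
    print(display_style(Reading("feed", 101.0), reference=100.0))
    print(display_style(Reading("feed", 120.0), reference=100.0))
```

A value within tolerance is drawn normally, while a suspicious one is still shown but visually marked, which matches the behavior described above.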
Wolfram|Alpha uses “10+ trillion pieces of data, 50,000+ types of algorithms and models, and linguistic capabilities for 1000+ domains”. It is built on top of Mathematica’s engine, which has been under continuous development since 1986, and currently contains over 5 million lines of symbolic code. It runs on the world’s 66th fastest supercomputer and supports 175 million requests a day. The service is hosted by R Systems and can process 39.6 trillion mathematical operations per second. Details:
The system, called R Smarr, has 4,608 processor cores using 576 quad-core "Harpertown" Xeon machines, 65,536GB of memory, and high-speed InfiniBand data-transfer connections, according to the Top500 site and a Dell case study on the system (PDF). It also uses both the Red Hat Enterprise Linux and Microsoft Windows HPC Server operating systems, according to the Dell paper.
Alpha requests will be served from five co-location facilities, Wolfram Research said. There actually are two supercomputers in the project, with nearly 10,000 processor cores total and hundreds of terabytes of hard drives.
Data is retrieved as Mathematica expressions, which are S-expressions, through a load-on-demand mechanism and a uniform Mathematica language interface. There is a wealth of data in many fields: “mathematics, physics, chemistry, astronomy, geography, linguistics, and finance.” According to the authors, the difference between Wolfram|Alpha and Mathematica is:
Wolfram|Alpha gives small, quick, one-off results on the web. Mathematica is a broad, deep computing environment that lets you handle arbitrarily sophisticated problems. Extensions to both Wolfram|Alpha and Mathematica will bring the two closer together.
The current input language is English, but there are plans to support other languages in the future. Ambiguous questions are resolved as follows:
It ranks possible interpretations, gives results for the one it thinks is most plausible, and gives links to click to get other ones. It often uses your location to rank interpretations—say, favoring cities that are near you.
Some responses take into account the user’s location, which is derived from his or her IP address through GeoIP with a precision of 5 miles.
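As a rough illustration of location-aware ranking, the Python sketch below orders candidate interpretations of an ambiguous city name by great-circle distance from the user. The function names, the candidate list, and the approximate coordinates are assumptions chosen for the example. The real ranking presumably weighs more signals than distance alone.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def rank_interpretations(candidates, user_lat, user_lon):
    """Return candidates sorted so that the nearest one comes first."""
    return sorted(
        candidates,
        key=lambda c: haversine_km(user_lat, user_lon, c["lat"], c["lon"]),
    )


# Hypothetical interpretations of the query "Springfield" (approximate coordinates).
CANDIDATES = [
    {"name": "Springfield, Illinois", "lat": 39.80, "lon": -89.64},
    {"name": "Springfield, Massachusetts", "lat": 42.10, "lon": -72.59},
    {"name": "Springfield, Missouri", "lat": 37.21, "lon": -93.29},
]

if __name__ == "__main__":
    # A user whose IP address resolves to roughly Boston.
    ranked = rank_interpretations(CANDIDATES, 42.36, -71.06)
    print("Most plausible:", ranked[0]["name"])
    print("Other interpretations:", [c["name"] for c in ranked[1:]])
```

The first element plays the role of the interpretation whose results are shown, and the remaining ones correspond to the links offered for the alternatives.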
There is a limit on the processing time allocated to a user. If a request exceeds a time threshold, it is not fully processed and partial results are returned. A Wolfram|Alpha Professional Edition is planned for the near future, with special features including no computational time limit. Other features of the Professional Edition are:
Ability to download in many formats (e.g. spreadsheets, XML, 3D modeling, TeX, etc.)
Uploading of data for analysis (e.g. spreadsheets, text, images, webpages, etc.)
Alternate display formats
Persistent individual or corporate preferences
Ability to store entity definitions
Dynamic interactivity capabilities
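The processing-time limit described above, where a request that exceeds the threshold returns partial results, can be sketched in Python as a cooperative time budget. This is only one assumed way to get that behavior. The `compute_with_budget` function and the idea of splitting a request into independent steps are hypothetical.

```python
import time


def compute_with_budget(steps, budget_seconds, clock=time.monotonic):
    """Run the steps in order until the time budget is used up.

    Returns (results, complete): the results gathered so far and a flag
    telling whether every step was run.
    """
    deadline = clock() + budget_seconds
    results = []
    for step in steps:
        if clock() >= deadline:
            # Budget exhausted: hand back what has been computed so far.
            return results, False
        results.append(step())
    return results, True


if __name__ == "__main__":
    steps = [lambda: "basic facts", lambda: "plots", lambda: "extended analysis"]
    results, complete = compute_with_budget(steps, budget_seconds=1.0)
    print(results, "complete" if complete else "partial")
```

The deadline is checked only between steps, so a step that is already running is not interrupted. An edition without a time limit would correspond to an unbounded budget.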
Wolfram has more plans for the future: “developer APIs, professional and corporate versions, custom versions for internal data, connections with other forms of content, and deployment on emerging mobile and other platforms.”
As an example, consider the result of asking for “Hurricane Katrina”. Its labeled sections are called pods, and each pod can have sub-pods. At the bottom of a response there is a link to the information source and an option to download the result as a PDF.
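The pod structure can be modeled as a small tree. The Python sketch below is illustrative only: the class names, fields, and placeholder titles are assumptions, not Wolfram|Alpha's actual response format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubPod:
    title: str
    content: str


@dataclass
class Pod:
    title: str
    subpods: List[SubPod] = field(default_factory=list)


@dataclass
class Response:
    query: str
    pods: List[Pod] = field(default_factory=list)
    source_link: str = ""  # stands in for the source link at the bottom of a response

    def outline(self) -> List[str]:
        """List pod titles, with sub-pod titles indented beneath each pod."""
        lines = []
        for pod in self.pods:
            lines.append(pod.title)
            for sub in pod.subpods:
                lines.append("  " + sub.title)
        return lines


if __name__ == "__main__":
    # Placeholder pods; the titles and contents are made up for the example.
    response = Response(
        query="Hurricane Katrina",
        pods=[
            Pod("Pod A", [SubPod("Sub-pod A1", "placeholder")]),
            Pod("Pod B", [SubPod("Sub-pod B1", "placeholder"),
                          SubPod("Sub-pod B2", "placeholder")]),
        ],
        source_link="(hypothetical source link)",
    )
    print("\n".join(response.outline()))
```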
Wolfram|Alpha’s logo is a rhombic hexecontahedron.