Saturday, December 31, 2011

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks


Dryad is a lower-level computational model than MapReduce. The programmer has to write a DAG for a specific task: implement the job logic for each vertex, and define what kind of edges connect the vertices. In other words, the programmer first decomposes the task into a DAG. The common parallelization pattern of splitting the input data has to be done manually, or by some library that reads data from a GFS-like distributed storage. There is no built-in shuffle like the one that feeds the reduce step in MapReduce, so the programmer has to build his own for this type of operation (or Dryad's library may already provide one). The communication channels (i.e. the edges) can be fine-tuned by the programmer, for example as a TCP pipe or a temporary file.
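To make the "write your own DAG and your own shuffle" idea concrete, here is a minimal single-process sketch in Python. It is not Dryad's API: `Graph`, `add_vertex`, `add_edge`, `bucket_of` and the other names are hypothetical, and the edges are plain in-memory lists standing in for real channels such as TCP pipes or temporary files. It only illustrates how a word count looks once the partitioning between map and reduce vertices has to be wired by hand.

```python
from collections import defaultdict, deque


class Graph:
    """Toy DAG runner (hypothetical, not Dryad's API)."""

    def __init__(self):
        self.vertices = {}                # name -> fn(list of upstream outputs)
        self.edges = defaultdict(list)    # src -> [dst, ...]
        self.inputs = defaultdict(list)   # dst -> [src, ...]

    def add_vertex(self, name, fn):
        self.vertices[name] = fn

    def add_edge(self, src, dst):
        self.edges[src].append(dst)
        self.inputs[dst].append(src)

    def run(self):
        # Run each vertex once all of its inputs are available (topological order).
        indegree = {v: len(self.inputs[v]) for v in self.vertices}
        ready = deque(v for v, d in indegree.items() if d == 0)
        results = {}
        while ready:
            v = ready.popleft()
            results[v] = self.vertices[v]([results[s] for s in self.inputs[v]])
            for dst in self.edges[v]:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
        if len(results) != len(self.vertices):
            raise ValueError("graph contains a cycle")
        return results


def bucket_of(word, n):
    # Deterministic partition function; this is the hand-written "shuffle" rule.
    return sum(word.encode()) % n


def make_mapper(lines, n_reducers):
    def mapper(_inputs):
        buckets = [[] for _ in range(n_reducers)]
        for line in lines:
            for word in line.split():
                buckets[bucket_of(word, n_reducers)].append((word, 1))
        return buckets
    return mapper


def make_reducer(r):
    def reducer(inputs):
        counts = {}
        for buckets in inputs:            # one entry per upstream map vertex
            for word, c in buckets[r]:    # read only this reducer's bucket
                counts[word] = counts.get(word, 0) + c
        return counts
    return reducer


def merge(inputs):
    out = {}
    for partial in inputs:
        out.update(partial)               # buckets are disjoint, so no key clashes
    return out


def word_count(partitions, n_reducers=2):
    g = Graph()
    for i, lines in enumerate(partitions):
        g.add_vertex(f"map{i}", make_mapper(lines, n_reducers))
    g.add_vertex("merge", merge)
    for r in range(n_reducers):
        g.add_vertex(f"reduce{r}", make_reducer(r))
        for i in range(len(partitions)):
            g.add_edge(f"map{i}", f"reduce{r}")   # every mapper feeds every reducer
        g.add_edge(f"reduce{r}", "merge")
    return g.run()["merge"]
```

Calling `word_count([["a b a"], ["b c"]])` builds two map vertices, two reduce vertices and one merge vertex. The all-to-all edges between the map and reduce layers, together with `bucket_of`, play the role that the framework-provided shuffle plays in MapReduce.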

On the whole, Dryad provides a lower-level computational model, on top of which other computational models (like MapReduce) can be built. But it also raises the bar for its users. I don't think MS will open-source Dryad; that is MS's style: proprietary software, limited community...
