MapReduce-MPI WWW Site

Latest Features and Bug Fixes in the MapReduce-MPI library

This page is a continuous listing of new features and bug fixes for the MapReduce-MPI (MR-MPI) library.

You can download the current tarball, which includes all the features and fixes listed below, from the download page and rebuild MR-MPI from scratch.


What version of the MapReduce-MPI library do you have?

An MR-MPI "version" is the date when it was released, such as 1 Sept 2010. MR-MPI is updated continuously. Every time we fix a bug or add a feature, we release it immediately, as listed below. Each dated copy of MR-MPI contains all the features and bug fixes up to and including that version date. Each time you use the library, the version date is printed to the screen at the end of the run, as part of the cumulative statistics, assuming the verbosity setting is enabled. It is also in the file src/version.h and in the MR-MPI directory name created when you unpack a tarball.
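Since a version is just a release date, checking whether your copy already contains a given fix comes down to comparing two dates. The following is a minimal sketch of that comparison. The helper names parse_version and includes_patch are hypothetical and not part of MR-MPI, and the date format is assumed to match the "day month year" strings used on this page.

```python
from datetime import date

# Month abbreviations as written on this page. "Sept" is included because
# the text above uses "1 Sept 2010".
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Sept": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

def parse_version(text):
    """Turn a version string such as '7 Apr 2014' into a datetime.date."""
    day, month, year = text.split()
    return date(int(year), MONTHS[month], int(day))

def includes_patch(installed, patch):
    """Return True if the installed version is dated on or after the patch."""
    return parse_version(installed) >= parse_version(patch)

print(includes_patch("22 Nov 2013", "20 Jun 2011"))  # True
print(includes_patch("22 Nov 2013", "4 Apr 2014"))   # False
```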



7 Apr 2014

Xiaoban Wu (U Wyoming) found another couple of bugs with the sort_multivalues() method in the library. This patch should fix them.

This is the list of changed files from the 22 Nov 2013 version.


4 Apr 2014

Fixed a logic bug in one of the print() methods which might allow a page of key/values to be accessed when not yet in memory. Also fixed a logic bug that could undercount the memory requirements of the largest key/value pair, which could cause a memory issue if a key/value pair was very large.

Thanks to Xiaoban Wu (U Wyoming) for flagging these issues.

This is the list of changed files from the 22 Nov 2013 version.


22 Nov 2013

Added a couple accessor functions to the C-style interface, to allow access to the KV and KMV pointers from a MapReduce object. This can be done directly in C++, since the pointers are public variables.

This is the list of changed files from the 23 Oct 2013 version.


23 Oct 2013

A couple of the map function C prototypes in the C-style library header were wrong. This patch fixes them. Thanks to John Brunelle (Harvard) for identifying the problem.

This is the list of changed files from the 17 Sep 2013 version.

17 Sep 2013

Made a small change in the Python wrapper to ensure the correct Python modules are imported before an error message is printed for an invalid library load. Thanks to Geoff Fairchild (LANL) for pointing this out.

This is the list of changed files from the 11 Mar 2013 version.


11 Mar 2013

Fixed a typo in the in.sssp example input script for single-source shortest path finding.

This is the list of changed files from the 16 Aug 2012 version.


16 Aug 2012

A couple of files were missing from the repository for yesterday's changes to the Python interface.

This is the list of changed files from the 15 Aug 2012 version.


15 Aug 2012

Simplified the building and installing of the Python interface to the MR-MPI library. See the python/README file for details.

This is the list of changed files from the 10 Aug 2012 version.


10 Aug 2012

Fixed a typo in the Python interface file mrmpi.py.

This is the list of changed files from the 20 Jun 2011 version.


20 Jun 2011

A couple of users found some bugs with the memory allocation logic that manages pages of allocated/deallocated memory within the library, which could lead to memory corruption issues. This patch should fix them.

This is the list of changed files from the 30 May 2011 version.

Thanks to Holger Teutsch and John Wehle for identifying the problems.


30 May 2011

Added some memory allocation debugging code.

This is the list of changed files from the 28 May 2011 version.


28 May 2011

Updated the PDF manuals.

This is the list of changed files from the 27 May 2011 version.


27 May 2011

Relaxed an error check to allow building of the MR-MPI library on systems with 4-byte pointers.

This is the list of changed files from the 18 Mar 2011 version.


18 Mar 2011

Added a histo command to OINK, to calculate a frequency count of unique keys in a set of key/value pairs.
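As an illustration of what a frequency count of unique keys means, here is a minimal serial sketch in plain Python. It is not the OINK implementation. The function name key_histogram and the sample data are hypothetical, and key/value pairs are assumed to be simple tuples.

```python
from collections import Counter

def key_histogram(pairs):
    """Count how often each unique key occurs in a list of (key, value) pairs.

    Returns (key, count) tuples, most frequent first, with ties broken by key.
    """
    counts = Counter(key for key, _value in pairs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]
print(key_histogram(pairs))  # [('a', 3), ('b', 2), ('c', 1)]
```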

This is the list of changed files from the 9 Mar 2011 version.


9 Mar 2011

Tweaked some header files in the OINK scripting framework to allow for more portable 64-bit integer usage.

This is the list of changed files from the 8 Mar 2011 version.


8 Mar 2011

Added some files and commands to OINK, which are implementations of the graph algorithms described in the new paper listed on the publications page of the WWW site.

These include algorithms for R-MAT matrix generation, connected component finding, triangle finding, and maximal independent set identification. Algorithms for PageRank and single-source shortest-path will soon follow.

We also added OINK input scripts to test these commands in the examples directory.

This is the list of changed files from the 11 Feb 2011 version.


11 Feb 2011

Ensure the OINK version stays current with the MR-MPI version in two source files, at each patch.

This is the list of changed files from the 10 Feb 2011 version.


10 Feb 2011

Fixed a bug with edge ordering in the tri_find command in OINK, causing the finder to miss about half the triangles when the input Eij didn't have Vi < Vj.
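To see why edge ordering matters, consider a triangle finder that only looks up edges stored as (Vi, Vj) with Vi < Vj. Any edge supplied in the opposite order is invisible to it unless the input is normalized first. The sketch below is a small serial illustration of that idea in plain Python. It is not the tri_find algorithm, and the function names are hypothetical.

```python
from itertools import combinations

def canonical_edges(edges):
    """Return unique undirected edges, each stored as (vi, vj) with vi < vj."""
    return {(min(a, b), max(a, b)) for a, b in edges if a != b}

def count_triangles(edges):
    """Count triangles after normalizing every edge so that vi < vj."""
    edges = canonical_edges(edges)
    # For each vertex, collect its neighbors with a larger vertex ID.
    higher = {}
    for vi, vj in edges:
        higher.setdefault(vi, set()).add(vj)
    count = 0
    for neighbors in higher.values():
        # Two higher neighbors that are themselves connected close a triangle.
        for vj, vk in combinations(sorted(neighbors), 2):
            if (vj, vk) in edges:
                count += 1
    return count

# Edges are deliberately given in mixed order.
print(count_triangles([(2, 1), (3, 2), (1, 3), (4, 3)]))  # 1
```

Each triangle is counted exactly once, from its lowest-numbered vertex, so the result does not depend on the order in which the input edges were written.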

This is the list of changed files from the 8 Feb 2011 version.


8 Feb 2011

Forgot a couple files in the OINK release yesterday, like the src/Makefile.

This is the list of changed files from the 7 Feb 2011 version.


7 Feb 2011

Release of OINK scripting framework which allows the MR-MPI library to be called from simple input scripts, as well as new MapReduce algorithms to be written within OINK so that data can be passed from one command to another. See the OINK doc pages for more details.

This release also changes the way the MR-MPI library is built to make it easier to build for different targets simultaneously.

This is the list of changed files from the 27 Jan 2011 version.


27 Jan 2011

Added a scan() method to the MR-MPI library to process existing key/value or key/multi-value pairs without changing or deleting them.

This is the list of changed files from the 21 Jan 2011 version.


21 Jan 2011

Added version string to distribution.

This is the list of changed files from the 19 Jan 2011 version.


19 Jan 2011

Changed the calling syntax of several of the map() methods to give more flexibility in how files are processed to generate input. There are options now for reading one or more files, one or more directories of files, recursing through a directory tree to find files, generating filenames only on processor 0 or on every processor, and reading filenames from one or more files.
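The file-discovery options described above can be pictured with a short serial sketch in plain Python. It does not use or reproduce the map() calling syntax. The function names find_files and read_file_list and the recurse flag are hypothetical, chosen only to mirror the options listed.

```python
import os

def find_files(paths, recurse=False):
    """Expand a mix of file and directory names into a sorted list of files.

    Directories contribute only their top-level files unless recurse is True,
    in which case the whole directory tree is walked.
    """
    found = []
    for path in paths:
        if os.path.isfile(path):
            found.append(path)
        elif os.path.isdir(path):
            if recurse:
                for root, _dirs, names in os.walk(path):
                    found.extend(os.path.join(root, name) for name in names)
            else:
                for name in os.listdir(path):
                    full = os.path.join(path, name)
                    if os.path.isfile(full):
                        found.append(full)
    return sorted(found)

def read_file_list(list_file):
    """Read filenames, one per line, from a text file, skipping blank lines."""
    with open(list_file) as handle:
        return [line.strip() for line in handle if line.strip()]
```

In a parallel run, a list like this could be built on one processor and shared, or built independently on every processor, which corresponds to the two filename-generation options mentioned above.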

This is the list of changed files from the 18 Jan 2011 version.


18 Jan 2011

Added some extra diagnostic output about memory usage and fixed a couple bugs with the new memory manager with respect to freeing memory and the sort methods.

This is the list of changed files from the 17 Jan 2011 version.


17 Jan 2011

Added some settings that enable more precise control of memory page usage by the library, especially when your code creates and uses several MapReduce objects.

Note that the default for freepage is 1, which is different behavior than previously, when pages were allocated as needed and never released.

The default for outofcore is 0, which is the same behavior as previously.

The default for zeropage is 0, which is different behavior than recently, but should be invisible to the user.

Also added new variants of the sort_keys(), sort_values(), and sort_multivalues() methods, which allow the use of internal compare functions for standard data types (int, 64-bit int, float, double, strings, etc). If the data to be sorted is one of these types, you no longer need to write and provide your own compare function.
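For readers unfamiliar with compare functions, the sketch below shows the kind of logic one contains for a standard type, here signed 64-bit integers. It is plain Python and does not call the library. The assumption that each key is a raw 8-byte buffer, and the name compare_int64, are illustrative only.

```python
import struct
from functools import cmp_to_key

def compare_int64(a, b):
    """Compare two 8-byte buffers as signed 64-bit integers.

    Returns -1, 0, or 1, following the usual compare-function convention.
    """
    x = struct.unpack("=q", a)[0]
    y = struct.unpack("=q", b)[0]
    return (x > y) - (x < y)

# Keys are assumed to be packed binary values rather than Python ints.
keys = [struct.pack("=q", n) for n in (30, -5, 2**40, 7)]
ordered = sorted(keys, key=cmp_to_key(compare_int64))
print([struct.unpack("=q", k)[0] for k in ordered])  # [-5, 7, 30, 1099511627776]
```

A built-in compare function for a standard type saves the user from writing this unpack-and-compare boilerplate for every sort.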

This is the list of changed files from the 11 Jan 2011 version.


11 Jan 2011

Enhanced the gather() method so that it now works with a pagesize larger than 2 Gbytes.

This is the list of changed files from the 6 Jan 2011 version.


6 Jan 2011

Added some new options to the map() method to enable lists of files or entire directories of files to be read in as part of a map operation.

This is the list of changed files from the 1 Nov 2010 version.


1 Nov 2010

The print() function in the Python interface was causing some problems with Python builds, likely because it is a keyword in Python. This patch renames the function.

This is the list of changed files from the 26 Aug 2010 version.


26 Aug 2010

Forgot to add the relevant doc page for the new broadcast() method to the distribution.

This is the list of changed files from the 25 Aug 2010 version.


25 Aug 2010

Added a broadcast() method for spreading key/value pairs across processors. This can be useful for taking a final gathered answer and distributing it for further processing outside the MapReduce framework.

This is the list of changed files from the 22 Apr 2010 version.


22 Apr 2010

Added a compiler flag option for the memsize and fpath settings, so that the default values of 64 Mbyte pages and the current working directory can be overridden, if desired, when the library is built.

This is the list of changed files from the 15 Apr 2010 version.


16 Apr 2010

Added open() and close() methods to the MR-MPI library, to make it easy to have a single map() or reduce() add key/value pairs to multiple MapReduce objects.

This is the list of changed files from the 14 Apr 2010 version.


15 Apr 2010

Added an internal debug option to KeyMultiValue to make it easier to test user reduce functions that work on multi-block KMV pairs.

This is the list of changed files from the 14 Apr 2010 version.


14 Apr 2010

Added a print() function to the library to print out KeyValue and KeyMultiValue pairs to the screen, which is useful for debugging.

This is the list of changed files from the 15 Mar 2010 version.


15 Mar 2010

New release of MR-MPI library. 7102 lines of code in src dir.

The new version adds the following features:

See this section of the documentation for further details on out-of-core operation.

See this section of the documentation for a discussion of the (very large) limits on data set sizes that the library can now process.


1 Mar 2010

Final update to in-core version before releasing out-of-core version. This upgrade just tweaks a few small cosmetic things.

This is the list of changed files from the 22 Apr 2009 version.


22 Apr 2009

This upgrade adds 3 small features:

1) A timer setting that enables output of timing info from each library call.

2) A new map() option to pass in an existing set of key/value pairs, operate on them via the user's map function (one key/value at a time), producing a new set of key/value pairs. This is an alternative to using the clone() method to turn the key/value pairs into key/multivalue pairs and then perform a reduce() on them.

3) Some clean-up inside the library of KeyValue and KeyMultiValue objects when various calls transform data from one to the other. This should prevent users from invoking a call that uses an old KV or KMV object that was already converted to something else by a previous call.
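The second feature, mapping over an existing set of key/value pairs, can be pictured with the serial sketch below. It is plain Python rather than the library's interface. The names map_pairs, split_words, and emit are hypothetical, and key/value pairs are assumed to be simple tuples.

```python
def map_pairs(pairs, user_map):
    """Apply user_map to each existing (key, value) pair, one at a time.

    user_map receives the key, the value, and an emit callback, and may emit
    zero or more new pairs. The emitted pairs form the new set.
    """
    new_pairs = []

    def emit(key, value):
        new_pairs.append((key, value))

    for key, value in pairs:
        user_map(key, value, emit)
    return new_pairs

def split_words(key, value, emit):
    """Example user map function: emit (word, 1) for each word in the value."""
    for word in value.split():
        emit(word, 1)

docs = [("doc1", "a b"), ("doc2", "b")]
print(map_pairs(docs, split_words))  # [('a', 1), ('b', 1), ('b', 1)]
```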

This is the list of changed files from the 19 Apr 2009 version.


19 Apr 2009

Added some Makefiles for different machines and changed the C library interface to the dummy MPI library, so that pure C programs can be built with a C compiler instead of C++. Ditto for the C examples provided.

This is the list of changed files from the 18 Apr 2009 version.


18 Apr 2009

Added a dummy serial Pypar file to the examples dir, so that the Python script examples can be run (in serial) even if you don't have Pypar (Python wrapper on MPI) installed.

This is the list of changed files from the 17 Apr 2009 version.


17 Apr 2009

Enhanced the C++ and C word-frequency example programs to read arbitrarily sized text files (under 2 Gbytes).

This is the list of changed files from the 16 Apr 2009 version.


16 Apr 2009

Restored some commented-out code in the examples/rmat.py Python script.

This is the list of changed files from the 15 Apr 2009 version.


15 Apr 2009

The Python wrapper files were accidentally removed in the 14 Apr 2009 upgrade due to a faulty script. This upgrade restores them.

This is the list of changed files from the 14 Apr 2009 version.


14 Apr 2009

Added an error check to the gather() function, which now reports an error if the requested processor count is less than 1 or greater than the actual number of processors.
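The condition being checked is simple enough to state as code. The sketch below is a plain Python illustration of that range check, not the library's own code. The function name check_gather_count and the use of ValueError are assumptions.

```python
def check_gather_count(nprocs_requested, nprocs_actual):
    """Validate the number of processors to gather onto.

    The count must be at least 1 and no larger than the number of
    processors actually in use.
    """
    if nprocs_requested < 1 or nprocs_requested > nprocs_actual:
        raise ValueError(
            "invalid processor count %d for gather (must be 1 to %d)"
            % (nprocs_requested, nprocs_actual)
        )
    return nprocs_requested
```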

This is the list of changed files from the 13 Apr 2009 version.


13 Apr 2009

Initial public release of MR-MPI. 3730 lines of code in src dir. This version supersedes earlier distributed beta versions.