
MapReduce sort_values() method

uint64_t MapReduce::sort_values(int (*mycompare)(char *, int, char *, int))
uint64_t MapReduce::sort_values(int flag) 

This calls the sort_values() method of a MapReduce object, which sorts a KeyValue object by its values to produce a new KeyValue object.

For the first variant, you provide a mycompare() function which compares pairs of values for the sort, since the MapReduce object does not know how to interpret the content of your values. The method returns the total number of key/value pairs in the new KeyValue object, which will be the same as in the original.

For the second variant, you can select one of several pre-defined compare functions, so you do not have to write the compare function yourself:

flag = 1 compare 2 integers
flag = 2 compare 2 64-bit unsigned integers
flag = 3 compare 2 floats
flag = 4 compare 2 doubles
flag = 5 compare 2 NULL-terminated strings via strcmp()
flag = 6 compare 2 arbitrary strings via strncmp()

For the flag = 6 case, the 2 strings do not have to be NULL-terminated since only the first N characters are compared, where N is the shorter of the 2 string lengths.
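The flag = 6 behavior described above can be sketched as a standalone function. This is a minimal illustration of the stated semantics, not the library's source; the name compare_strn_sketch() is hypothetical.

```cpp
#include <algorithm>  // std::min
#include <cstddef>    // std::size_t
#include <cstring>    // std::strncmp

// Sketch of the flag = 6 semantics (hypothetical helper, not library code):
// compare only the first N bytes, where N is the shorter of the two lengths.
// Neither value needs to be NULL-terminated, since at most N bytes are read.
int compare_strn_sketch(char *value1, int len1, char *value2, int len2)
{
  int n = std::min(len1, len2);
  return std::strncmp(value1, value2, static_cast<std::size_t>(n));
}
```

One consequence of comparing only the first N characters is that a value which is a prefix of another (e.g. "abc" and "abcxy") compares as equal. Note also that strncmp() returns a negative, zero, or positive result rather than exactly -1, 0, or 1.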

This method is used to sort key/value pairs by value before a KeyValue object is transformed into a KeyMultiValue object, e.g. via the clone(), collapse(), or convert() methods. Note that these operations preserve the order of pairs in the KeyValue object when creating a KeyMultiValue object, which can then be passed to your application for output, e.g. via the reduce() method.

Note, however, that sort_values() does NOT sort values across all processors; it only sorts the values within the KeyValue object on each processor. Thus if you gather() or aggregate() after performing a sort_values(), the sorted order will be lost, since those methods move key/value pairs to new processors.
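The difference between a per-processor sort and a global sort can be illustrated without the library. The sketch below is plain C++ that uses two vectors as stand-ins for the values held on two processors; it makes no MR-MPI or MPI calls, and the function name is hypothetical.

```cpp
#include <algorithm>  // std::sort, std::is_sorted
#include <vector>

// Stand-in for two processors' local values (an assumption for illustration).
// Each list is sorted independently, as an on-processor sort would do, and
// the function reports whether the concatenated result is globally sorted.
bool locally_sorted_is_globally_sorted(std::vector<int> proc0,
                                       std::vector<int> proc1)
{
  std::sort(proc0.begin(), proc0.end());
  std::sort(proc1.begin(), proc1.end());

  std::vector<int> all(proc0);
  all.insert(all.end(), proc1.begin(), proc1.end());
  return std::is_sorted(all.begin(), all.end());
}
```

With local values {5, 1, 9} and {4, 2, 8}, each list ends up sorted on its own, but the combined sequence 1, 5, 9, 2, 4, 8 is not. The result is globally ordered only if the data happened to be partitioned by value range beforehand.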

For the first variant, the user function is called mycompare() in this documentation, and it must have the following interface:

int mycompare(char *value1, int len1, char *value2, int len2) 

Value1 and value2 are pointers to the byte strings for 2 values, of lengths len1 and len2, respectively. Your function should compare them and return -1, 0, or 1 if value1 is less than, equal to, or greater than value2, respectively.
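As a minimal sketch of such a function, the following compares values that are each assumed to hold a single native int. The name compare_int_values() and the assumption about the value layout are illustrative only; adapt the decoding to however your application packs its values.

```cpp
#include <cstring>  // std::memcpy

// Hypothetical compare function for values that each hold one native int.
// It matches the mycompare() interface; the lengths are unused because every
// value is assumed to be exactly sizeof(int) bytes.
int compare_int_values(char *value1, int len1, char *value2, int len2)
{
  (void) len1;
  (void) len2;

  int a, b;
  std::memcpy(&a, value1, sizeof(int));  // copy out to avoid misaligned reads
  std::memcpy(&b, value2, sizeof(int));

  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}
```

Assuming mr points to a MapReduce object that already holds a KeyValue object, the function would be passed as mr->sort_values(compare_int_values). For plain integer values, the predefined flag = 1 comparison should make a hand-written function unnecessary.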

This method is an on-processor operation, requiring no communication. When run in parallel, each processor operates only on the key/value pairs it stores.


Related methods: sort_keys(), sort_multivalues()