MapReduce-MPI WWW Site - MapReduce-MPI Documentation

MapReduce sort_keys() method

uint64_t MapReduce::sort_keys(int (*mycompare)(char *, int, char *, int))
uint64_t MapReduce::sort_keys(int flag) 

This calls the sort_keys() method of a MapReduce object, which sorts a KeyValue object by its keys to produce a new KeyValue object.

For the first variant, you provide a mycompare() function which compares pairs of keys for the sort, since the MapReduce object does not know how to interpret the content of your keys. The method returns the total number of key/value pairs in the new KeyValue object which will be the same as in the original.

For the second variant, you can select one of several pre-defined compare functions, so you do not have to write the compare function yourself:

flag = +/- 1 compare 2 integers
flag = +/- 2 compare 2 64-bit unsigned integers
flag = +/- 3 compare 2 floats
flag = +/- 4 compare 2 doubles
flag = +/- 5 compare 2 NULL-terminated strings via strcmp()
flag = +/- 6 compare 2 arbitrary strings via strncmp()

If the flag is positive, the sorting is done is ascending order; if the flag is negative, the sorting is done is descending order.

For the flag = +/- 6 case, the 2 strings do not have to be NULL-terminated since only the first N characters are compared, where N is the shorter of the 2 string lengths.

This method is used to sort key/value pairs by key before a KeyValue object is transformed into a KeyMultiValue object, e.g. via the clone(), collapse(), or convert() methods. Note that these operations preserve the order of paires in the KeyValue object when creating a KeyMultiValue object, which can then be passed to your application for output, e.g. via the reduce() method. Note however, that sort_keys() does NOT sort keys across all processors but only sorts the keys on each processor within the KeyValue object. Thus if you gather() or aggregate() after performing a sort_keys(), the sorted order will be lost, since those methods move key/value pairs to new processors.

In this example for the first variant, the user function is called mycompare() and it must have the following interface

int mycompare(char *key1, int len1, char *key2, int len2) 

Key1 and key2 are pointers to the byte strings for 2 keys, each of length len1 and len2. Your function should compare them and return a -1, 0, or 1 if key1 is less than, equal to, or greater than key2, respectively.

This method is an on-processor operation, requiring no communication. When run in parallel, each processor operates only on the key/value pairs it stores.


Related methods: sort_values(), sort_multivalues()