MapReduce-MPI WWW Site - MapReduce-MPI Documentation

MapReduce sort_multivalues() method

uint64_t MapReduce::sort_multivalues(int (*mycompare)(char *, int, char *, int))
uint64_t MapReduce::sort_multivalues(int) 

This calls the sort_multivalues() method of a MapReduce object, which sorts the values for each key within a KeyMultiValue object to produce a new KeyMultiValue object.

For the first variant, you provide a mycompare() function which compares pairs of values for the sort, since the MapReduce object does not know how to interpret the content of your values. The method returns the total number of key/multi-value pairs in the new KeyMultiValue object which will be the same as in the original.

For the second variant, you can select one of several pre-defined compare functions, so you do not have to write the compare function yourself:

flag = 1 compare 2 integers
flag = 2 compare 2 64-bit unsigned integers
flag = 3 compare 2 floats
flag = 4 compare 2 doubles
flag = 5 compare 2 NULL-terminated strings via strcmp()
flag = 6 compare 2 arbitrary strings via strncmp()

For the flag = 6 case, the 2 strings do not have to be NULL-terminated since only the first N characters are compared, where N is the shorter of the 2 string lengths.

This method can be used to sort a set of multi-values within a key before they are passed to your application, e.g. via the reduce() method. Note that it typically only makes sense to use sort_multivalues() for a KeyMultiValue object created by the convert() or collate() methods, not KeyMultiValue objects created by the clone() or collapse() or scrunch() methods.

In this example for the first variant, the user function is called mycompare() and it must have the following interface

int mycompare(char *value1, int len1, char *value2, int len2) 

Value1 and value2 are pointers to the byte strings for 2 values, each of length len1 and len2. Your function should compare them and return a -1, 0, or 1 if value1 is less than, equal to, or greater than value2, respectively.

This method is an on-processor operation, requiring no communication. When run in parallel, each processor operates only on the key/multi-value pairs it stores.


Related methods: sort_keys(), sort_values()