cmomand-name params ... -i input1 input2 ,.. -o output1.file output1.ID output2.file output2.ID ...
wordfreq 5 -i v_files -o NULL NULL rmat 10 8 0.25 0.25 0.25 0.25 0.0 12345 -o tmp.rmat NULL degree -i graph/edges -o degree/degree degree
Invoke a named command with the list of parameters it requires, as well as the list of input and output objects it expects. In OINK a named command is a child class that derives from the Command parent class, meaning that it contains several methods that can be called from the OINK framwork. See this section of the manual for details on how to write new named commands and add them to OINK. The list of named commands currently included with OINK are listed on this page. They are also listed in the source code in the file src/style_command.h which is auto-generated each time that OINK is built.
Each named command has a "name", defined in the *.h file for the class, which is the command name used in the input script to invoke the command, e.g. wordfreq or rmat or degree in the examples above.
Any arguments that follow the command name, upto a "-i" or "=o" argument are passed as params to the command before it is invoked, so that it can process and store them as needed. The number and nature of these parameters are defined by the command itself and it should generate errors if they are not specified correctly. The code that processes parameters can be written to allow for optional parameters and keywords within the list of params.
The "-i" and "-o" arguments can be listed in either order. The arguments that follow each of them, either between them, or upto the end of the command, are passed to an "input" and "output" processing routine within the command class. Each command requires a specific number of input and output "definitions", as explained below. Input definitions are single arguments. Output definitions are pairs of arguments. If zero input (or output) definitions are required by the command, then the "-i" (or "-o") argument need not be specified. If 2 output definitions are required, then 4 arguments must follow the "-o".
Typically each required input definition is a form of data input that the command requires. It can come from reading one or more files or from an MR-MPI object that already exists. Similarly, each required output definition is a form of data output that the command produces. It can be stored either in one or more files or in an MR-MPI object that the command creates. In OINK, an MR-MPI object is a thin wrapper on a MapReduce object created via the MR-MPI library. See this doc page for more discussion of MR-MPI objects and input script operations that can be performed on them directly.
Each input definition inputN is one of three things.
First, it can be the ID of an existing MR-MPI object, which wraps a MapReduce object which contains key/value pairs. If inputN matches such an ID, then the second or third options are not considered. In this case, it is assumed that the data stored within the MapReduce object is already in the form that would be produced by the map() method that would read the input from one or more file or directory names.
Second, the inputN can be the path name of a file or directory. Third, it can be a variable defined elsewhere in the OINK input script that contains one or more strings. In the third case inputN should be specified as v_name, where name is the name of the variable. All the different styles of variables (except equal-style) store strings; see the variable command for details. Also note that there is a command-line option -var or -v which can be specified when OINK is executed to store a list of strings in an index-style variable. The strings are treated as a list of file or directory names. Thus in both the first or second case the effect is that a list of one or more file/directory names is passed to the command. The command creates a temporary, unnamed MR-MPI object and invokes a map() method within it, as specified in the code of the command class, using the list of file/directory names as input.
There are several options available which influence how the list of strings specified in the input script are converted into actual file/directory paths passed to the map() method. This include wildcard charcters "%" and "*". See the input command for details.
Each required ouptut definition is a pair of arguments: ouputN.file and outputN.ID. Either or both can be specified as NULL if no output in that form is desired.
The outputN.file argument is the path name of a file. A map() or reduce() or scan() method, as specified in the code of the command class, will be invoked which will write data to that file when the command is finished. A processor-ID (0 to Nprocs-1) will be appended to the filename, so that when running on multiple processors, multiple files will be created.
If the specfied path name does not entirely exist, additional directories in the path name will be created as needed. Also, there are several options available which influence how the file name specified in the input script is converted into the file name actually opened by OINK and written to by the map(), reduce(), or scan() method. This include wildcard charcters "%" and "*". See the output command for details.
The outputN.ID argument is the ID of an MR-MPI object which wraps a MapReduce object. The code in the command class will have created or altered the MR-MPI object and its associated MapReduce object and populated it with data. As a final step, the specified ID is assigned to that MR-MPI object. If the ID is already in use, then the name is removed from the other MR-MPI object. This means that if an outputN.ID is the same as an inputN to the command then the output will effectively overwrite that input.
When the command completes, named MR-MPI objects persist so that they can be used in subsequent input script commands. All unnamed MR-MPI objects are deleted.
When any named command is executed, its elapsed execution time is stored internally by OINK. This value can be accessed by the keyword "time" in an equal-style variable and printed out in the following manner:
variable t equal time rmat 10 8 0.25 0.25 0.25 0.25 0.0 12345 -o tmp.rmat NULL print "Time for RMAT generation = $t"
MR-MPI library commands, mr, MR-MPI library documentation, how to write named commands, input, output