wordfreq command

Syntax:

wordfreq Ntop -i in1 -o out1.file out1.mr 

Examples:

wordfreq 10 -i v_files -o full.list NULL
wordfreq 10 -i v_files -o NULL NULL 

Description:

This is a named command which calculates how often each word occurs in an input data set, which is typically a set of files.

See the named command doc page for various ways in which the -i inputs and -o outputs for a named command can be specified.

In1 stores a set of words. The input is unchanged by this command.

If the input is one or more files, the files are read and each "word" is defined as a string of characters delimited by whitespace. Note that you can pass a list of files as the input after the "-i" argument by using a variable, which in turn can be initialized with a command-line argument to OINK. E.g. this command line would work with the first example above, since it defines the variable "files" that the example references as v_files:

oink_linux -var files *.cpp < in.script 

See this section of the manual and the variable doc page for more details.

Out1 will store the frequency count of all unique words.

Additional statistics can be generated and printed via the Ntop setting. The Ntop words with the highest frequency will be printed to the screen with their counts, in sorted order. If Ntop is 0, nothing is printed.
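
As a rough illustration of the computation described above, here is a minimal serial sketch in Python. It is not the OINK implementation, which runs as MapReduce operations via MR-MPI. The function names word_frequencies and top_words are hypothetical, and printing in descending order of count (with ties broken alphabetically) is an assumption, since this page says only "sorted order".

```python
from collections import Counter


def word_frequencies(texts):
    """Count whitespace-separated words across an iterable of strings.

    Each string stands in for the contents of one input file.
    """
    counts = Counter()
    for text in texts:
        # str.split() with no argument splits on any run of whitespace
        counts.update(text.split())
    return counts


def top_words(counts, ntop):
    """Return the ntop most frequent (word, count) pairs.

    Assumption: descending by count, ties broken alphabetically.
    An ntop of 0 yields nothing, mirroring "If Ntop is 0, nothing is printed".
    """
    if ntop <= 0:
        return []
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:ntop]


if __name__ == "__main__":
    import sys

    # Usage: python wordfreq_sketch.py 10 *.cpp
    ntop = int(sys.argv[1])
    contents = []
    for path in sys.argv[2:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            contents.append(handle.read())
    for word, count in top_words(word_frequencies(contents), ntop):
        print(count, word)
```

In this sketch the full dictionary returned by word_frequencies plays the role of Out1, while top_words corresponds to the Ntop screen output.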

Related commands: none