Counting the number of lines, words, and bytes in a file is useful, but the real flexibility of the Linux wc command comes from working with other commands. Let’s take a look.
What Is the wc Command?
The wc command is a small application. It’s one of the core Linux utilities, so there is no need to install it. It’ll already be on your Linux computer.
You can describe what it does in very few words. It counts the lines, words, and bytes in a file or selection of files and prints the result in a terminal window. It can also take its input from the STDIN stream, meaning the text you want it to process can be piped into it. This is where wc really starts to add value.
It is a great example of the Linux mantra of “do one thing and do it well.” Because it accepts piped input, it can be used in multi-command incantations. As we’ll see, this little standalone utility is actually a great team player.
One way I use wc is as a placeholder in a complicated command or alias I’m cooking up. If the finished command has the potential to be destructive and delete files, I often use wc as a stand-in for the real, dangerous command.
That way, during the development of the command I get visual feedback that each file is being processed as I expected. There’s no chance of anything bad happening while I’m wrestling with the syntax.
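As a rough sketch of that placeholder idea, the snippet below runs wc where a destructive command such as rm would eventually go. The scratch directory and the “*.tmp” pattern are made up for illustration and are not part of any real workflow.

```shell
# Hypothetical scratch directory and file pattern, used only for illustration.
workdir=$(mktemp -d)
touch "$workdir/a.tmp" "$workdir/b.tmp" "$workdir/keep.txt"

# Dry run: wc stands in for the destructive command (for example, rm).
# It prints one line per matched file plus a total, so the selection
# can be checked without changing anything.
find "$workdir" -type f -name '*.tmp' -exec wc -c {} +

# Count how many files the finished command would touch.
matched=$(find "$workdir" -type f -name '*.tmp' | wc -l)
echo "files matched: $matched"
```

Once the list of files looks right, wc can be swapped for the real command.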
As simple as wc is, there are still a few small quirks that you need to know about.
Getting Started With wc
The simplest way to use wc is to pass the name of a text file on the command line.
wc lorem.txt
This causes wc to scan the file and count the lines, words, and bytes, and write them out to the terminal window.
Words are considered anything bounded by whitespace. Whether they are words from a real language or not is irrelevant. If a file contains nothing but “frd g lkj”, it still counts as three words.
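A quick way to see the word rule in action is to pipe a few test strings into wc -w. This is a minimal sketch; the variable names are just for illustration.

```shell
# Three whitespace-delimited tokens count as three words, real or not.
words=$(printf 'frd g lkj\n' | wc -w)
echo "words: $words"

# Runs of spaces, tabs, and newlines all act as a single separator,
# so extra whitespace does not change the count.
spaced=$(printf '  frd \t g\n\nlkj  \n' | wc -w)
echo "words: $spaced"
```

Both commands should report three words.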
Lines are sequences of characters terminated by a newline (line feed) character. It doesn’t matter if the line wraps around in your editor or in the terminal window; until wc encounters a newline character, it’s still the same line. Because wc counts newline characters, text at the very end of a file that isn’t followed by a newline is not included in the line count.
Our first example found one line in the entire file. Here’s the content of the “lorem.txt” file.
All of that counts as a single line because there are no newline characters within the text. Compare this to another file, “lorem2.txt”, and how wc interprets it.
wc counts 15 lines because newline characters have been inserted into the text to start a new line at specific points. However, if you count the lines with text in them, you’ll see there are only 12.
The other three lines are blank lines at the end of the file. These contain only newline characters. Even though there is no text in these lines, a new line has been started and so wc counts them as such.
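The line-counting behavior can be checked with a few throwaway strings. This sketch uses printf so that the newline characters are explicit; the variable names are illustrative only.

```shell
# One newline character means one line.
one=$(printf 'first line\n' | wc -l)

# One line of text followed by three blank lines: four newlines in all.
padded=$(printf 'text\n\n\n\n' | wc -l)

# No newline anywhere in the input, so no line is counted.
unterminated=$(printf 'no newline at the end' | wc -l)

echo "one: $one, padded: $padded, unterminated: $unterminated"
```

The last case is worth remembering: wc -l reports the number of newline characters, so a final line without one does not add to the count.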
We can pass as many files to wc as we like.
wc lorem.txt lorem2.txt
We get the statistics for each individual file and a total for all the files.
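Here is a small, self-contained sketch of the multi-file report. The two files are created in a temporary directory purely for the demonstration, and LC_ALL=C is set on the assumption that we want the summary label in English.

```shell
# Create two small sample files in a scratch directory.
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/a.txt"
printf 'three\n' > "$dir/b.txt"

# One row per file, followed by a "total" row.
report=$(cd "$dir" && LC_ALL=C wc -l a.txt b.txt)
printf '%s\n' "$report"

# The same files selected with a wildcard; keep only the summary row.
total=$(cd "$dir" && LC_ALL=C wc -l *.txt | tail -n 1)
printf '%s\n' "$total"
```

The total row only appears when wc is given more than one file.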
We can also use wildcards so that we can select matching files instead of explicitly named files.
wc *.txt *.?
The Command Line Options
By default, wc will display the lines, words, and bytes in each file. It’s the same as using the -l (lines), -w (words), and -c (bytes) options.
wc -l -w -c lorem.txt
We can specify which combination of figures we wish to see.
wc -l lorem.txt
wc -w lorem.txt
wc -c lorem.txt
wc -l -c lorem.txt
Special attention should be paid to the last figure, generated by the -c (bytes) option. Many people mistake this as counting the characters. It actually counts bytes. The number of characters and the number of bytes might well be the same. But not always.
Let’s look at the contents of a file called “unicode.txt.”
It has three words and a non-Latin alphabet character. We’ll let wc process the file with its default setting of bytes, and we’ll do it again but request characters with the -m (characters) option.
wc unicode.txt
wc -l -w -m unicode.txt
There are more bytes than there are characters.
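The same effect can be reproduced without the sample file. This sketch feeds wc the word “café”, where the é is written as its two UTF-8 bytes. It assumes a UTF-8 locale called C.UTF-8 is available; if it isn’t, wc -m may fall back to counting bytes.

```shell
# "café" plus a newline; \303\251 is the two-byte UTF-8 encoding of é.
bytes=$(printf 'caf\303\251\n' | wc -c)

# Assumption: a UTF-8 locale named C.UTF-8 exists on this system.
chars=$(printf 'caf\303\251\n' | LC_ALL=C.UTF-8 wc -m)

echo "bytes: $bytes"   # 6 bytes: c, a, f, two bytes for é, and the newline
echo "chars: $chars"   # 5 when the UTF-8 locale is available
```

Note that the character count depends on the locale wc runs in, while the byte count never does.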
Let’s have a look at the hex dump of the file and see what’s going on. The -C (canonical) option displays the bytes in the file in lines of 16, with their plain ASCII equivalent (if there is one) shown at the end of the line. If there is no corresponding ASCII character, a period “.” is shown instead.
hexdump -C unicode.txt
In ASCII, a hexadecimal value of 0x20 represents a space character. If we count three values in from the left, we see the next value is a space character. So those first three values, 0x62, 0x6f, and 0x79, represent the letters in “boy.”
Hopping over the 0x20, we see another set of three hexadecimal values: 0x63, 0x61, and 0x74. These spell out “cat.” Hopping over the next space character we see three more values for the letters in “dog.” These are 0x64, 0x6f, and 0x67.
Right behind the word “dog” we can see a space character, 0x20, and five more hexadecimal values. The last two are newline characters, 0x0a.
The other three bytes represent the non-Latin character, which we’ve ringed in green. It is a Unicode character, and it takes three bytes to encode it.
So make sure you know what you’re counting, and remember that bytes and characters need not be the same. Usually, counting bytes is more useful because it tells you what is actually inside the file. Counting by characters gives you the number of things represented by the contents of the file.
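If you want to check individual byte values yourself, a short string can be dumped directly. This sketch uses od rather than hexdump, on the assumption that od is more likely to be installed everywhere; the variable names are illustrative.

```shell
# -An suppresses the offset column; -tx1 prints one hex value per byte.
hex_boy=$(printf 'boy' | od -An -tx1)
hex_space=$(printf ' ' | od -An -tx1)

echo "boy   ->$hex_boy"
echo "space ->$hex_space"
```

The values printed for “boy” and the space character should match the ASCII codes seen in a hex dump of the same text.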
Taking Filenames From a File
There’s another way to provide filenames to wc. You can put the filenames in a file, and pass the name of that file to wc. It opens the file, extracts the filenames, and processes them as if they had been passed on the command line. This allows you to store an arbitrary collection of filenames for re-use.
But there’s a gotcha, and it’s a big one. The filenames must be null terminated, not newline terminated. That is, after each filename there must be a null byte of 0x00 instead of the usual newline byte, 0x0a.
You can’t open an editor and create a file with this format. Typically, files like this are generated by other programs. But, if you have such a file, this is how you would use it.
Here’s our file containing the filenames. Opening it in less shows you the strange “^@” characters that less uses to indicate null bytes.
To use the file with wc, we need to use the --files0-from (read input from) option and pass in the name of the file containing the filenames.
The files are processed exactly as though they were provided on the command line.
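As a hedged sketch of how such a list might be produced and consumed, the snippet below writes a null-terminated list with printf and hands it to wc. The file names, including “names.list”, are invented for the example, and --files0-from is a GNU wc option that may not exist in other implementations.

```shell
# Two sample files in a scratch directory.
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/a.txt"
printf 'three\n' > "$dir/b.txt"

# printf '%s\0' writes each argument followed by a null byte.
printf '%s\0' "$dir/a.txt" "$dir/b.txt" > "$dir/names.list"

# find with -print0 is another common way to produce such a list:
# find "$dir" -name '*.txt' -print0 > "$dir/names.list"

# GNU wc reads the file names from the list instead of the command line.
listed=$(LC_ALL=C wc -l --files0-from="$dir/names.list")
printf '%s\n' "$listed"
```

The report has the same shape as if both files had been named on the command line, including the total row.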
Piping Input to wc
A much more common, flexible, and productive way to send input to wc is to pipe the output from other commands into wc. We can demonstrate this with the echo command.
echo "Count this for me" | wc
echo -e "Count this\nfor me" | wc
The second echo command uses the -e (escaped characters) option to allow escaped sequences like the “\n” newline formatting code. This injects a new line, causing wc to see the input as two lines.
Here’s a cascade of commands, each feeding its output to the next.
find ./* -type f | rev | cut -d'.' -f1 | rev | sort | uniq
- find looks for files (-type f) recursively, starting in the current directory.
- rev reverses the filenames.
- cut extracts the first field (-f1) by defining the field delimiter to be a period “.” and reading from the “front” of the reversed filename up to the first period it finds. We’ve now extracted the file extension, still reversed.
- rev reverses the extracted first field.
- sort sorts them in ascending alphabetical order.
- uniq lists unique entries to the terminal window.
This command lists all of the unique file extensions in the current directory and any subdirectories.
If we added the -c (count) option to the uniq command it would count the occurrences of each extension type. But if we want to know how many different, unique file extensions there are, we can drop wc as the last command on the line, and use the -l (lines) option.
find ./* -type f | rev | cut -d'.' -f1 | rev | sort | uniq | wc -l
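To try the pipeline on known input, this sketch builds a scratch directory with four made-up files and runs both variants: uniq -c for per-extension tallies and wc -l for the number of distinct extensions. It assumes every file has an extension, as the pipeline itself does.

```shell
# Four hypothetical files with three different extensions.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt" "$dir/c.md" "$dir/d.sh"

# Per-extension tallies: uniq -c prefixes each extension with its count.
(cd "$dir" && find ./* -type f | rev | cut -d'.' -f1 | rev | sort | uniq -c)

# Number of distinct extensions: count the lines that uniq emits.
distinct=$(cd "$dir" && find ./* -type f | rev | cut -d'.' -f1 | rev | sort | uniq | wc -l)
echo "distinct extensions: $distinct"
```

With these four files there are three distinct extensions: md, sh, and txt.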
Here’s one last trick wc can do for you. It’ll tell you the length of the longest line in a file. Sadly, it doesn’t tell you which line it is. It just gives you the length.
wc -L taf.c
Beware, though, that wc -L measures display width: each tab advances to the next tab stop, and tab stops are eight columns apart. Viewed in my editor, there are three two-space tabs at the start of that line. Its real length is 124 characters. So the figure reported is artificially inflated.
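The tab behavior is easy to reproduce with a two-character line. This sketch assumes GNU wc, where -L is an extension that reports display width.

```shell
# One tab followed by "x": two characters, but the tab advances the
# column position to the first tab stop before "x" is added.
width=$(printf '\tx\n' | wc -L)
chars=$(printf '\tx' | wc -m)

echo "display width: $width"   # 9 with GNU wc
echo "characters:    $chars"   # 2
```

The gap between the two figures grows with every tab on the line.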
I’d treat this function with a big pinch of salt. And by that I mean don’t use it. Its output is misleading.
Despite its quirks, wc is a great tool to drop into piped commands when you need to count all sorts of values, not just the words in a file.