How to Traverse a Directory Tree on Linux

Directories on Linux let you group files in distinct, separate collections. The downside is it becomes tedious moving from directory to directory to perform a repetitive task. Here’s how to automate that.

All About Directories

The first command you learn when you’re introduced to Linux is probably ls, but cd won’t be far behind it. Understanding directories and how to move around them, particularly nested subdirectories, is a fundamental part of understanding how Linux organizes itself, and how you can organize your own work into files, directories, and subdirectories.

Grasping the concept of a tree of directories—and how to move between them—is one of the many little milestones you pass as you familiarize yourself with the landscape of Linux. Using cd with a path takes you to that directory. Shortcuts like cd ~ or cd on its own take you back to your home directory, and cd .. moves you up one level in the directory tree. Simple.

However, there isn’t an equally simple means of running a command in all directories of a directory tree. There are different ways we can achieve that functionality, but there isn’t a standard Linux command dedicated to that purpose.

Some commands, such as ls, have command-line options that force them to operate recursively, meaning they start in one directory and methodically work through the entire directory tree below that directory. For ls, it’s the -R (recursive) option.

If you need to use a command that doesn’t support recursion, you have to provide the recursive functionality yourself. Here’s how to do that.

The tree Command

The tree command won’t help us with the task at hand, but it does make it easy to see the structure of a directory tree. It draws the tree in a terminal window so that we can get an instant overview of the directories and subdirectories that make up the directory tree, and their relative positions in the tree.

You’ll need to install tree.

On Ubuntu you need to type:

sudo apt install tree

On Fedora, use:

sudo dnf install tree

On Manjaro, the command is:

sudo pacman -S tree

Using tree with no parameters draws out the tree below the current directory.

tree

You can pass a path to tree on the command line.

tree work

The -d (directories) option excludes files and only shows directories.

tree -d work

This is the most convenient way to get a clear view of the structure of a directory tree. The directory tree shown here is the one used in the following examples. There are five text files and eight directories.

Don’t Parse the Output From ls to Traverse Directories

Your first thought might be, if ls can recursively traverse a directory tree, why not use ls to do just that and pipe the output into some other commands that parse the directories and perform some actions?

Parsing the output of ls is considered bad practice. Because of the ability in Linux to create file and directory names containing all sorts of strange characters, it becomes very difficult to create a generic, universally-correct parser.

You might never knowingly create a directory name as preposterous as this, but a mistake in a script or an application might.

[Screenshot: a bizarre directory name]

Parsing legitimate but poorly considered file and directory names is error-prone. There are other methods we can use that are safer and much more robust than relying on interpreting the output of ls.
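
To make the problem concrete, here is a minimal sketch that assumes GNU ls and a throwaway directory created with mktemp; the directory names are invented for the demonstration. It counts the same two directories twice: once by counting lines of ls output, and once with a shell glob.

```shell
# Scratch area holding one ordinary directory and one whose name
# contains a newline (both names are made up for this demo).
demo=$(mktemp -d)
mkdir "$demo/plain"
mkdir "$demo/odd
name"

# Line-based parsing assumes one name per line of ls output.
ls_count=$(ls "$demo" | wc -l)

# A glob hands each name to the loop intact, whatever it contains.
# The trailing slash restricts the matches to directories.
glob_count=0
for entry in "$demo"/*/; do
  glob_count=$((glob_count + 1))
done

echo "Counted from ls output: $ls_count"
echo "Counted with a glob:    $glob_count"

rm -rf "$demo"
```

When GNU ls writes to a pipe it emits the embedded newline as it is, so the line count comes out higher than the number of directories that exist, while the glob count matches.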

Using the find Command

The find command has in-built recursive capabilities, and it also has the ability to run commands for us. This lets us build powerful one-liners. If it’s something you’re likely to want to use in the future, you can turn your one-liner into an alias or a shell function.

This command recursively loops through the directory tree, looking for directories. Each time it finds a directory it prints out the name of the directory and repeats the search inside that directory. Having completed searching one directory, it exits that directory and resumes the search in its parent directory.

find work -type d -execdir echo "In:" {} \;

The order in which the directories are listed shows how the search progresses through the tree. By comparing the output from the tree command to the output from the find one-liner, you’ll see how find searches each directory and subdirectory in turn until it hits a directory with no subdirectories. It then goes back up a level and resumes the search at that level.

Here’s how the command is made up.

  • find: The find command.
  • work: The directory to start the search in. This can be a path.
  • -type d: We’re looking for directories.
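  • -execdir: We’re going to execute a command for each directory we find. find runs the command from the directory that contains the match, which for a matched directory is its parent.
  • echo "In:" {}: This is the command. We’re simply echoing the name of the directory to the terminal window. The “{}” is replaced by the name of the item find has just matched.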
  • \;: This is a semicolon used to terminate the command. We need to escape it with the backslash so that Bash doesn’t interpret it directly.
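
The pieces listed above can be tried out safely in a scratch directory. The following sketch runs the same echo with -exec and with -execdir so the difference in what {} expands to is visible. The directory names work and docs are placeholders, and the exact form of the -execdir names is what GNU find prints; other implementations may differ slightly.

```shell
# Scratch tree: work/ containing a single subdirectory, docs/.
demo=$(mktemp -d)
mkdir -p "$demo/work/docs"

# -exec: the command runs from the directory where find was started,
# and {} is the path relative to the starting point.
exec_out=$(cd "$demo" && find work -type d -exec echo "In:" {} \;)

# -execdir: the command runs from the directory that contains the match,
# and {} is the name relative to that directory.
execdir_out=$(cd "$demo" && find work -type d -execdir echo "In:" {} \;)

printf '%s\n' "With -exec:" "$exec_out" "With -execdir:" "$execdir_out"

rm -rf "$demo"
```

With -exec the nested directory is reported as work/docs, while with -execdir only the final path component is passed to echo.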

With a slight change, we can make the find command return files that match a search pattern. We need to include the -name option and a pattern. In this example, we’re looking for text files that match “*.txt”, and echoing their name to the terminal window.

find work -name "*.txt" -type f -execdir echo "Found:" {} \;

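Whether you search for files or directories depends on what you want to achieve. To run a command once for each directory, use -type d. To run a command on each matching file, use -type f. Bear in mind that -execdir runs the command from the directory that contains the match, so with -type d the working directory is the parent of the directory that was found.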
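
If the command really has to run with each directory as its working directory, one possible approach is to have find start a small shell that changes into the matched directory first. This is a sketch under assumptions: the tree layout is invented, and the echo stands in for whatever per-directory command is needed.

```shell
# Scratch tree with three directories: work, work/docs, and work/src.
demo=$(mktemp -d)
mkdir -p "$demo/work/docs" "$demo/work/src"

# For every directory found, start sh, cd into that directory,
# and run the per-directory command from there.
report=$(
  cd "$demo" &&
  find work -type d -exec sh -c '
    cd "$1" || exit 1
    echo "Running inside: $PWD"
  ' sh {} \;
)

echo "$report"
rm -rf "$demo"
```

The matched name is passed to the inner shell as $1 rather than pasted into the script text, which keeps unusual directory names from being interpreted as shell code.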

This command counts the lines in all text files in the starting directory and subdirectories.

find work -name "*.txt" -type f -execdir wc -l {} \;
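
A one-liner like this is a natural candidate for a shell function. The sketch below wraps it in a function with the hypothetical name count_txt_lines, which takes the starting directory as its only argument and falls back to the current directory. The sample files are invented for the demonstration.

```shell
# Hypothetical helper: count the lines in every .txt file below a directory.
count_txt_lines() {
  find "${1:-.}" -name "*.txt" -type f -execdir wc -l {} \;
}

# Try it on a small scratch tree.
demo=$(mktemp -d)
mkdir -p "$demo/work/notes"
printf 'one\ntwo\n' > "$demo/work/a.txt"
printf 'one\ntwo\nthree\n' > "$demo/work/notes/b.txt"

result=$(count_txt_lines "$demo/work")
echo "$result"

rm -rf "$demo"
```

Placing the function definition in a shell startup file such as ~/.bashrc would make it available in every new terminal session.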

Traversing Directory Trees With a Script

If you need to traverse directories inside a script you could use the find command inside your script. If you need to—or just want to—do the recursive searches yourself, you can do that too.
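
For the first option, a common pattern is to let find produce the names and have a while loop consume them. The sketch below assumes Bash (for read -d '' and process substitution) and a find that supports -print0. The tree and the counters are invented for illustration.

```shell
# Scratch tree with spaces in some names.
demo=$(mktemp -d)
mkdir -p "$demo/work/sub dir"
touch "$demo/work/a.txt" "$demo/work/sub dir/b c.txt"

dir_count=0
file_count=0

# NUL-separated names survive spaces and newlines. Process substitution
# keeps the loop in the current shell, so the counters keep their values.
while IFS= read -r -d '' path; do
  if [[ -d $path ]]; then
    echo "Directory: $path"
    dir_count=$((dir_count + 1))
  else
    echo "File: $path"
    file_count=$((file_count + 1))
  fi
done < <(find "$demo/work" -print0)

echo "$dir_count directories, $file_count files"
rm -rf "$demo"
```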

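#!/bin/bash

# Include hidden names in globs, and expand unmatched globs to nothing
shopt -s dotglob nullglob

function recursive {

  local current_dir dir_or_file

  # Quote "$1" so a name containing spaces or wildcards stays in one piece
  for current_dir in "$1"; do

    echo "Directory command for:" "$current_dir"

    for dir_or_file in "$current_dir"/*; do

      if [[ -d $dir_or_file ]]; then
        recursive "$dir_or_file"
      else
        wc "$dir_or_file"
      fi
    done
  done
}

# Stop if no starting directory was given, so the glob never becomes "/*"
[[ -n $1 ]] || { echo "Usage: $0 directory" >&2; exit 1; }

recursive "$1"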

Copy the text into an editor and save it as “recurse.sh”, then use the chmod command to make it executable.

chmod +x recurse.sh

The script sets two shell options, dotglob and nullglob.

The dotglob setting means file and directory names that start with a period “.” will be returned when wildcard search terms are expanded. This effectively means we’re including hidden files and directories in our search results.

The nullglob setting means search patterns that don’t find any results are treated as an empty or null string. They don’t default to the search term itself. In other words, if we’re searching for everything in a directory by using the asterisk wildcard “*”, but there are no results, we’ll receive a null string instead of a string containing an asterisk. This prevents the script from inadvertently trying to open a directory called “*”, or treating “*” as a file name.
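
The effect of both options is easy to see in a scratch directory. This sketch assumes Bash, since shopt is a Bash builtin; the helper name show_glob and the directory names are invented for the demonstration.

```shell
# One empty directory, and one that holds only a hidden file.
demo=$(mktemp -d)
mkdir "$demo/empty" "$demo/hidden"
touch "$demo/hidden/.secret"

# Print the last path component of everything "$1"/* expands to.
show_glob() {
  local item
  for item in "$1"/*; do
    echo "got: ${item##*/}"
  done
}

# Defaults: an unmatched pattern is passed through as a literal "*",
# and names starting with "." are not matched.
shopt -u dotglob nullglob
empty_default=$(show_glob "$demo/empty")
hidden_default=$(show_glob "$demo/hidden")

# With both options set: an unmatched pattern expands to nothing,
# and hidden names are matched.
shopt -s dotglob nullglob
empty_set=$(show_glob "$demo/empty")
hidden_set=$(show_glob "$demo/hidden")

printf 'default:     [%s] [%s]\n' "$empty_default" "$hidden_default"
printf 'options set: [%s] [%s]\n' "$empty_set" "$hidden_set"

shopt -u dotglob nullglob
rm -rf "$demo"
```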

Next, it defines a function called recursive. This is where the interesting stuff happens.

Two variables are declared, called current_dir and dir_or_file. These are local variables, and can only be referenced within the function.

The positional parameter $1 is also used within the function. Inside a function, $1 holds the first (and only) parameter passed to the function when it is called, not the first parameter passed to the script.

The script uses two for loops, one nested inside the other. The first (outer) for loop is used for two things.

One is to run whatever command you want to have performed in each directory. All we’re doing here is echoing the name of the directory to the terminal window. You could of course use any command or sequence of commands, or call another script function.

The second thing the outer for loop does is to check all file system objects it can find—which will be either files or directories. This is the purpose of the inner for loop. In turn, each file or directory name is passed into the dir_or_file variable.

The dir_or_file variable is then tested in an if statement to see if it is a directory.

  • If it is, the function calls itself and passes the name of the directory as a parameter.
  • If the dir_or_file variable is not a directory, then it must be a file. Any commands that you wish to have applied to the file can be called from the else clause of the if statement. You could also call another function within the same script.
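
As a sketch of the "call another function" idea, the per-directory and per-file work can be moved into two small functions so the traversal itself never needs to change. The names walk, on_directory, and on_file are hypothetical, and the echo commands stand in for real work. Bash is assumed.

```shell
shopt -s dotglob nullglob

# Hypothetical hooks: replace the bodies with the commands you need.
on_directory() { echo "Directory: $1"; }
on_file()      { echo "File: $1"; }

# Visit a directory, then everything below it.
walk() {
  local entry
  on_directory "$1"
  for entry in "$1"/*; do
    if [[ -d $entry ]]; then
      walk "$entry"
    else
      on_file "$entry"
    fi
  done
}

# Try it on a small scratch tree.
demo=$(mktemp -d)
mkdir -p "$demo/work/sub"
touch "$demo/work/a.txt" "$demo/work/sub/b.txt"

walked=$(cd "$demo" && walk work)
echo "$walked"

rm -rf "$demo"
```

Note that the -d test follows symbolic links, so a link that points to a directory is descended into as well.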

The final line in the script calls the recursive function and passes in the first command line parameter $1 as the starting directory to search in. This is what kicks off the whole process.

Let’s run the script.

./recurse.sh work

The directories are traversed, and the point in the script where a command would be run in each directory is indicated by the “Directory command for:” lines. Files that are found have the wc command run on them to count lines, words, and bytes.

The first directory processed is “work”, followed by each nested directory branch of the tree.

An interesting point to note is you can change the order that the directories are processed in, by moving the directory-specific commands from being above the inner for loop to being below it.

Let’s move the “Directory command for:” line to after the done of the inner for loop.

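#!/bin/bash

# Include hidden names in globs, and expand unmatched globs to nothing
shopt -s dotglob nullglob

function recursive {

  local current_dir dir_or_file

  # Quote "$1" so a name containing spaces or wildcards stays in one piece
  for current_dir in "$1"; do

    for dir_or_file in "$current_dir"/*; do

      if [[ -d $dir_or_file ]]; then
        recursive "$dir_or_file"
      else
        wc "$dir_or_file"
      fi

    done

    # Runs only after everything below this directory has been handled
    echo "Directory command for:" "$current_dir"

  done
}

# Stop if no starting directory was given, so the glob never becomes "/*"
[[ -n $1 ]] || { echo "Usage: $0 directory" >&2; exit 1; }

recursive "$1"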

Now we’ll run the script once more.

./recurse.sh work

This time the directories have the commands applied to them from the deepest levels first, working back up the branches of the tree. The directory passed as the parameter to the script is processed last.

If it is important to have deeper directories processed first, this is how you can do it.
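
If the traversal is done with find rather than a hand-written function, the same deepest-first ordering is available through its -depth option. A minimal sketch, using an invented three-level tree:

```shell
# A single chain of nested directories: work/dev/src.
demo=$(mktemp -d)
mkdir -p "$demo/work/dev/src"

# -depth makes find act on a directory's contents before the directory itself.
deepest_first=$(cd "$demo" && find work -depth -type d)
echo "$deepest_first"

rm -rf "$demo"
```

The starting directory, work, is listed last, just as it is processed last by the modified script.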

Recursion Is Weird

It’s like calling yourself on your own phone, and leaving a message for yourself to tell yourself when you next meet you—repeatedly.

It can take some effort before you grasp its benefits, but when you do you’ll see it is a programmatically elegant way to tackle hard problems.
