find

Another very powerful command is ‘find’. I most often use it to delete a set of files across subfolders, which takes a single line. Note that ‘rm -rf’ also removes any directory whose name matches the pattern, so choose the pattern carefully.

find . -name "patternToSearch" -exec rm -rf {} \;
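Because this one-liner deletes without asking, it can help to preview the matches first. The following is a minimal sketch, not a definitive recipe: the scratch directory and the ‘*.tmp’ file names are made up for the demonstration, and ‘-type f’ is added so that only regular files are touched.

```shell
# Hypothetical scratch directory so the example is safe to run anywhere.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/a/old.tmp" "$demo/a/b/old2.tmp" "$demo/a/keep.txt"

# Step 1: list what the pattern matches, without deleting anything.
matches=$(find "$demo" -type f -name "*.tmp" | sort)
printf '%s\n' "$matches"

# Step 2: once the list looks right, rerun the same search with rm attached.
# The trailing "+" passes many file names to a single rm call.
find "$demo" -type f -name "*.tmp" -exec rm -f {} +

# Only keep.txt should remain.
remaining=$(find "$demo" -type f | sort)
printf '%s\n' "$remaining"
```

Running the search twice costs a few seconds, but the first pass shows exactly which paths the second pass will remove.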

 

Deleting files smaller than x kB

From time to time, I need to delete a subset of files that fall below a certain file size. For instance, I often work with large collections of sequencing files. Occasionally some of the libraries fail, and I need to remove them.

Deleting by size alone is risky, though. A more robust approach is to first identify the failed libraries and then delete all files associated with them. To make sure that I neither delete files by accident nor leave any behind, I first use ‘find’ to check that the size filter captures every file of each failed library.

find . -type f -name "*.gz" -size -1600k | sort | cut -d'_' -f1-4 | uniq -c

In this command, I first list the files slated for deletion, namely the ‘.gz’ files below roughly 1600 kB (‘find’ counts the ‘k’ suffix in units of 1024 bytes and rounds file sizes up). I then sort the file names, cut out the sample identifier (the first four underscore-separated fields), and collapse duplicates with ‘uniq -c’, which prints each identifier together with the number of files that matched it.

For instance, if I’m working with paired-end sequencing across four lanes, I expect to find 8 files per sample. A count of 8 for every listed sample tells me that all of its files fall below the threshold. A lower count means some files of that sample are larger and would be left behind by a size-based deletion.
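With many samples, scanning the counts by eye gets tedious. As a sketch, the same pipeline can be extended with ‘awk’ to print only the samples whose count differs from the expected 8. The file names, the scratch directory, and the two oversized files below are all invented for the demonstration.

```shell
# Hypothetical scratch directory with two made-up samples.
demo=$(mktemp -d)
for lane in L001 L002 L003 L004; do
  for read in R1 R2; do
    : > "$demo/P1_libA_S1_idx_${lane}_${read}.fastq.gz"   # failed library: all files tiny
    : > "$demo/P1_libB_S2_idx_${lane}_${read}.fastq.gz"
  done
done

# Make two libB files 1700 KiB each, so they sit above the 1600k threshold.
for read in R1 R2; do
  dd if=/dev/zero of="$demo/P1_libB_S2_idx_L004_${read}.fastq.gz" \
     bs=1024 count=1700 2>/dev/null
done

# Same pipeline as above, plus awk: print identifier and count
# for every sample that does not have exactly 8 small files.
incomplete=$(
  cd "$demo" &&
  find . -type f -name "*.gz" -size -1600k | sort | cut -d'_' -f1-4 | uniq -c |
    awk -v expected=8 '$1 != expected { print $2, $1 }'
)
printf '%s\n' "$incomplete"
```

Here only the second sample is reported, with a count of 6. Empty output would mean every listed sample has the full set of 8 files under the threshold.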

Once I have confirmed that all files are accounted for, I delete them with the following command:

find . -type f -name "*.gz" -size -1600k -delete
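The more robust approach mentioned earlier, deleting by library rather than by size, could look roughly like this. It is a sketch under assumptions: ‘failed.txt’ is a hypothetical file with one sample identifier per line, and the file names are invented.

```shell
# Hypothetical scratch directory with two made-up samples.
demo=$(mktemp -d)
for lane in L001 L002; do
  for read in R1 R2; do
    : > "$demo/P1_libA_S1_idx_${lane}_${read}.fastq.gz"
    : > "$demo/P1_libB_S2_idx_${lane}_${read}.fastq.gz"
  done
done

# failed.txt (hypothetical): one sample identifier per line.
printf '%s\n' "P1_libA_S1_idx" > "$demo/failed.txt"

# Delete every .gz file that belongs to a listed sample, regardless of size.
# Identifiers are used as -name patterns, so they must not contain glob characters.
while IFS= read -r prefix; do
  [ -n "$prefix" ] || continue          # skip empty lines
  find "$demo" -type f -name "${prefix}_*.gz" -delete
done < "$demo/failed.txt"

# Count what is left: only the four files of the second sample.
left=$(find "$demo" -type f -name "*.gz" | wc -l | tr -d ' ')
echo "$left"
```

Replacing ‘-delete’ with ‘-print’ on a first pass gives the same kind of preview as the size-based check.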
