
/unix/ Listing file sizes within multiple directories

As we accumulate more data in this information age, we tend to store files in multiple, hierarchical directories. This is a very good way to organize, let's say, Mac OS X installers. You place files in directories according to their names, e.g. Audion in the A-B directory and DiceBag in the C-D directory. For example, here is a tree listing of my X installers directory:


~/X - Installers/
|-- A-B
|   `-- BLT0.2.0.dmg
|-- C-D
|   |-- Dec2002DevToolsCD.dmg
|   `-- DiceBag.dmg
|-- E-F
|-- G-H
|-- I-J
|-- K-L
|-- M-N
|-- O-P
|-- Q-R
|-- S-T
|   `-- StoneStudio-2003-03-03.dmg
|-- U-V
|-- W-X
`-- Y-Z

Let's say that after about six months you have collected over 700MB of X installers. You would like to back up these files onto a CD-R, but there is too much data for the disc to hold. What do you do now?

You say, "Let's not burn some files." But which files should we leave out? Leaving out a few of the biggest ones is the answer. This way, if you lose your files, you only have a few big files to re-download rather than many small ones. Having fewer files to fetch also minimizes the chance that you have forgotten where one of them came from.

This is the exact problem that I ran into a few weeks ago. I wanted to fill the disc up without moving too many files out of the installers directory. I was over by 12MB. My goal was to remove only one file so that I would be able to burn the rest. I needed to find a file of about 12MB, or slightly larger.
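The size of the overage is simple arithmetic. Here is a minimal sketch, assuming a 700MB disc and a 712MB collection; both figures are illustrative stand-ins rather than measurements from the article, and the variable names are made up for the example.

```shell
#!/bin/sh
# Assumed numbers, chosen to mirror the "over by 12MB" situation.
capacity_kb=$((700 * 1024))   # assumed usable CD-R capacity, in kilobytes
used_kb=$((712 * 1024))       # on a real directory: used_kb=$(du -sk . | awk '{ print $1 }')

# Anything above zero is the amount that has to come off the disc.
over_kb=$((used_kb - capacity_kb))
if [ "$over_kb" -gt 0 ]; then
    echo "Over capacity by $((over_kb / 1024)) MB"
else
    echo "Everything fits"
fi
```

Any single file at least that large can be left out to make the rest fit.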

The Mac OS X Finder would be inefficient for this task. It would only let me look up file sizes one at a time, which would take too long. I needed a list so I could quickly find a file near 12MB to move out of the directory. This is where UNIX comes in.

I knew I could get file info from the ls command. Using the -l flag, I listed the contents of the installers directory:


% ls -l
total 0
drwx------ 5 rroberts staff 170 Sep 6 09:51 A-B
drwx------ 5 rroberts staff 170 Sep 1 19:50 C-D
drwx------ 3 rroberts staff 102 Sep 1 19:50 E-F
drwx------ 2 rroberts staff 68 Sep 1 19:49 G-H
drwx------ 3 rroberts staff 102 Sep 1 19:49 I-J
drwx------ 2 rroberts staff 68 Sep 1 19:49 K-L
drwx------ 3 rroberts staff 102 Sep 1 19:49 M-N
drwx------ 2 rroberts staff 68 Sep 1 19:49 O-P
drwx------ 3 rroberts staff 102 Sep 1 19:49 Q-R
drwx------ 4 rroberts staff 136 Sep 5 17:54 S-T
drwx------ 2 rroberts staff 68 Sep 1 19:49 U-V
drwx------ 2 rroberts staff 68 Sep 1 19:49 W-X
drwx------ 2 rroberts staff 68 Sep 1 19:49 Y-Z

There was a problem, instantly. This didn't give me the information I needed. It gave me the sizes of the directory entries themselves, not of the files inside them. I needed the individual file sizes. So, I tried the next command:


% ls -l *
A-B:
total 2544
-rw-r--r-- 1 rroberts staff 188850 Dec 1 2002 BLT0.2.0.dmg

C-D:
total 645712
-rw-r--r-- 1 rroberts staff 315922255 Dec 10 2002 Dec2002DevToolsCD.dmg
-rwxrwxrwx 1 rroberts staff 14680064 Aug 27 09:12 DiceBag.dmg
E-F:
G-H:
I-J:
K-L:
M-N:
O-P:
Q-R:
S-T:
total 50232
-rwxrwxrwx 1 rroberts staff 25718526 Mar 3 2003 StoneStudio-2003-03-03.dmg
U-V:
W-X:
Y-Z:

This still gave me too much information. The shell expanded * to every subdirectory, and ls listed the contents of each one under its own header, including the empty directories. I didn't need that. So I tried a modified version, ls -ld */* . The */* pattern matches every entry exactly one level below the subdirectories of the current directory, so empty directories simply drop out. The -d flag means that if one of those entries is itself a directory, ls prints a single line for that directory rather than listing the files within it.
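To see the pattern and the flag at work in isolation, here is a small sketch on a throwaway tree. The directory and file names are hypothetical and exist only for the demonstration; it also assumes mktemp is available.

```shell
#!/bin/sh
# Build a scratch tree: two populated directories, one nested directory,
# and one empty directory.
demo=$(mktemp -d)
mkdir -p "$demo/A-B" "$demo/C-D/Nested" "$demo/E-F"
: > "$demo/A-B/one.dmg"
: > "$demo/C-D/two.dmg"
: > "$demo/C-D/Nested/deeper.dmg"

cd "$demo" || exit 1

# The shell, not ls, expands */* into every entry exactly one level down.
# E-F is empty, so it contributes nothing.
expanded=$(printf '%s\n' */*)
echo "$expanded"

# With -d, C-D/Nested shows up as one line; deeper.dmg is not listed.
ls -ld */*

# Clean up the scratch tree.
cd / && rm -rf "$demo"
```

The first command prints only the names the shell hands to ls, which makes it easy to check a pattern before running the real listing.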


% ls -ld */*
-rw-r--r-- 1 rroberts staff 188850 Dec 1 2002 A-B/BLT0.2.0.dmg
-rw-r--r-- 1 rroberts staff 315922255 Dec 10 2002 C-D/Dec2002DevToolsCD.dmg
-rwxrwxrwx 1 rroberts staff 14680064 Aug 27 09:12 C-D/DiceBag.dmg
-rwxrwxrwx 1 rroberts staff 25718526 Mar 3 2003 S-T/StoneStudio-2003-03-03.dmg

That was what I wanted. I got the sizes and the names of the files. However, it's hard to read with the extra information. Next, I filtered the data through awk, a text-processing tool.

Awk allowed me to print individual fields of each line, separated by whitespace. Looking at the previous data, the file sizes and the file names were fields #5 and #9 respectively. I ran the same command, with the data piped into awk, to print out only those fields to standard output.
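The field numbering is easier to see on a single line. This sketch feeds awk one line taken from the ls -ld listing above, then a second, made-up line whose file name contains a space, to show a limit of relying on field #9 alone.

```shell
#!/bin/sh
# One line from the ls -ld output; awk splits it on runs of whitespace.
line='-rwxrwxrwx 1 rroberts staff 14680064 Aug 27 09:12 C-D/DiceBag.dmg'

# NF is the number of fields awk found; $5 is the size, $9 is the path.
fields=$(printf '%s\n' "$line" | awk '{ print NF ": " $5, $9 }')
echo "$fields"

# Hypothetical entry with a space in its name: the name spills into $10,
# so the fields from 9 onward are joined back together.
spaced='-rw-r--r-- 1 rroberts staff 1024 Sep 1 19:50 A-B/My App.dmg'
full=$(printf '%s\n' "$spaced" |
    awk '{ name = $9; for (i = 10; i <= NF; i++) name = name " " $i; print $5, name }')
echo "$full"
```

For the installers directory shown here, none of the names contain spaces, so printing $5 and $9 is enough.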


% ls -ld */* | awk '{ print $5, $9 }'
188850 A-B/BLT0.2.0.dmg
315922255 C-D/Dec2002DevToolsCD.dmg
14680064 C-D/DiceBag.dmg
25718526 S-T/StoneStudio-2003-03-03.dmg

Perfect. The files are listed alphabetically, because the shell expands */* in alphabetical order. Awk limited the data printed out to the file sizes (in bytes) and the file names. Now, to sort it.


% ls -ld */* | awk '{ print $5, $9 }' | sort -nr
315922255 C-D/Dec2002DevToolsCD.dmg
25718526 S-T/StoneStudio-2003-03-03.dmg
14680064 C-D/DiceBag.dmg
188850 A-B/BLT0.2.0.dmg

Done! I found the file I wanted and I only needed to move one file. The burn went successfully and I backed up for a change. ;-)
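The last step, picking the file by eye, can also be handed to the pipeline. The following is a sketch rather than the command used at the time: it takes the four size and path pairs shown above, keeps those at least as large as the 12MB overage, and reports the smallest of them in megabytes. It assumes 1MB means 1048576 bytes.

```shell
#!/bin/sh
# Size (bytes) and path pairs, as printed by the awk step above.
listing='188850 A-B/BLT0.2.0.dmg
315922255 C-D/Dec2002DevToolsCD.dmg
14680064 C-D/DiceBag.dmg
25718526 S-T/StoneStudio-2003-03-03.dmg'

over_bytes=$((12 * 1024 * 1024))   # the 12MB overage, in bytes

# Keep files at least as big as the overage, sort ascending, take the first:
# the smallest single file whose removal lets the rest fit.
candidate=$(printf '%s\n' "$listing" |
    awk -v min="$over_bytes" '$1 + 0 >= min + 0 { printf "%.1f MB %s\n", $1 / 1048576, $2 }' |
    sort -n | head -n 1)
echo "$candidate"
```

On these four files the result is DiceBag.dmg, at 14.0 MB.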

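Another way to do this is to use the du -sk */* command. It works much the same way and gives similar output: the size of each entry in kilobytes of disk space used, followed by its path. Unlike ls -ld, it totals the contents of any directory it is given. Just another way to do the same job. UNIX wouldn't be UNIX if you couldn't do the same job with different tools.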
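A quick way to try the du variant without touching real data is a scratch tree. In this sketch the file names and sizes are invented for the demonstration, and mktemp and /dev/zero are assumed to be available.

```shell
#!/bin/sh
# Scratch tree with one small and one large file.
demo=$(mktemp -d)
mkdir -p "$demo/A-B" "$demo/C-D"
dd if=/dev/zero of="$demo/A-B/small.dmg" bs=1024 count=20   2>/dev/null
dd if=/dev/zero of="$demo/C-D/big.dmg"   bs=1024 count=2000 2>/dev/null

# du -sk already prints the size first and the path second, so no awk step
# is needed; sort -nr orders the lines by that leading number, largest first.
report=$(cd "$demo" && du -sk */* | sort -nr)
echo "$report"

# Clean up the scratch tree.
rm -rf "$demo"
```

Because du counts disk blocks in use rather than bytes of content, its figures can differ somewhat from the ls sizes divided by 1024.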

With careful planning you can do a lot of work in a few seconds. Instead of searching forever in the Finder, I found the file I wanted within a few minutes of tweaking my 'mini-program'. Yep, I said a 'mini-program'. You can think of UNIX commands as mini tools that you combine to make programs that do the work you need done. This way, UNIX fulfills one of its goals: to let the average user become interested in programming.
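One way to keep such a mini-program around is to wrap the pipeline in a shell function. This is only a sketch: the name list_sizes is made up, and it assumes that owner, group, and file names contain no spaces.

```shell
#!/bin/sh
# list_sizes DIR: print "bytes path" for every entry one level below DIR's
# subdirectories, largest first.
list_sizes() {
    # LC_ALL=C keeps the ls date at three fields, so the name stays in $9.
    ( cd "$1" && LC_ALL=C ls -ld */* | awk '{ print $5, $9 }' | sort -nr )
}

# Try it on a scratch tree with files of known byte sizes.
demo=$(mktemp -d)
mkdir -p "$demo/A-B" "$demo/C-D"
printf '%s' 'abc'        > "$demo/A-B/tiny.dmg"     # 3 bytes
printf '%s' 'abcdefghij' > "$demo/C-D/larger.dmg"   # 10 bytes

result=$(list_sizes "$demo")
echo "$result"

# Clean up the scratch tree.
rm -rf "$demo"
```

Running the pipeline in a subshell means the cd does not change the directory of the calling shell.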



--Ron Roberts

Make Backups Religiously
