All Entries Tagged With: "files"
Fixing the Filenames
From: COMMAND LINE KUNG FU: PaulDotCom, Ed Skoudis, Hal Pomeranz, byte_bucket
Hal Helps Out
A friend of mine contacted me the other day with an interesting problem. She was trying to recover some files from the backup of an old BBS. In particular, she was trying to get at the attachments for various postings.
The attachment files were in a big directory, but the file names unhelpfully used an internal attachment ID number from the BBS. So we had file names like “attachment.43567”. Now my friend also had a text file she extracted from the BBS that mapped attachment IDs to the real file names:
43567 sekrit plans.doc
44211 pizza-costs.xls
...
So the task was to take the file of “attachment ID to file name mappings” and use that to rename the files in the attachments directory to their correct file names.
I thought about it for a minute, and realized the solution was actually pretty straightforward:
$ while read -r id file; do mv -- "attachment.$id" "$file"; done <id-to-filename.txt
The trickiest part of the exercise was dealing with the file names that had spaces in them, like “sekrit plans.doc”. Luckily the format of the input file was “ID filename”, which meant that I could treat everything after the first whitespace as the file name. And this is exactly what the builtin “read” command will do for you: in this case it puts the first whitespace delimited token into the $id variable and then jams whatever is left over into the last “$file” variable. Once I got the right information into $file, it was simply a matter of making sure to quote this variable appropriately in the “mv” command inside the loop.
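Before letting mv loose on a directory of irreplaceable files, it can be reassuring to watch how read carves up each line. Here is a minimal dry-run sketch: the two sample lines are the ones from the mapping file above, and the variable name preview is made up for illustration.

```shell
# Dry run: print the mv commands instead of executing them.
# The two sample lines stand in for the contents of id-to-filename.txt.
preview=$(
  printf '%s\n' '43567 sekrit plans.doc' '44211 pizza-costs.xls' |
  while read -r id file; do
    # $id receives the first token; $file receives the rest of the line,
    # embedded spaces included
    printf 'mv attachment.%s "%s"\n' "$id" "$file"
  done
)
printf '%s\n' "$preview"
```

If the printed commands look right, swapping the printf for the real mv is the only change needed.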
So there you go: a quick one-liner for me, but a real time-saver for my friend. And possibly a real time-sink for Tim and Ed as they try to figure out how to do this in their shells. Let’s see, shall we?
Ed Frustrates Hal
Sorry, Hal, but this one just isn’t crushing for me. I know that disappoints you, but sometimes (on fairly rare occasions) we don’t have to work too hard to coax little cmd.exe to do what we want. It does take two little tricks, though, but nothing too freakish.
Here’s the fu:
C:\> for /f "tokens=1,*" %i in (id_to_filename.txt) do @copy attachment.%i "%j"
I’m using a FOR /F loop to read the contents of id_to_filename.txt, one line at a time. Default delimiters of FOR /F parsing are spaces and tabs, which will work just fine for us here, so there’s no need to mess with custom delims. I’ve specified custom parsing of “tokens=1,*”, which will make it assign the first column of the file (the integer in Hal’s example) to my first iterator variable (which is %i). Then, the ,* stuff means to assign all of the rest of the line to my second iterator variable, which will be auto-allocated as %j. The ,* stuff is the first trick, which really comes in handy.
Then, in the body of my loop, I turn off display of commands (@) and invoke the copy command to take the contents of attachment.%i and place it into “%j”. The second trick, those quotes around %j, is important in allowing us to handle any spaces in the file name. Note that I’m using copy instead of move here, because I don’t wanna play Ed-Zilla stomping over the city just in case something goes awry (who’s to say that our id_to_filename.txt file will always look like we expect it to?). I guess you could call it the Hipposhellic oath: First do no harm. After we verify that our copy worked like we wanted with a quick dir command, we can always run “del attachment.*”.
Whatcha got, Tim?
Tim frustrates most people
Sorry Hal, this isn’t too bad in PowerShell either. There are a few ways we can accomplish this task, but I elected to pick the shortest version, which also happens to be the one that brings up something we haven’t covered before. Here are the long version and short version of the fu. (The short version is identical but uses built in aliases)
PS C:\> Get-Content id-to-filename.txt | ForEach-Object { $id,$file = $_.Split(" ",2); Rename-Item -Path attachment.$id -NewName $file }
PS C:\> gc id-to-filename.txt | % { $id,$file = $_.Split(" ",2); ren attachment.$id $file }
The Get-Content cmdlet is used to read the contents of the file, and it is piped into Foreach-Object. Inside the Foreach-Object script block is where the line is split. The first parameter used in the Split method defines the delimiter and the second defines how many items it should be split into.
The only problem is that the Split method’s output is multi-line. Here is an illustration:
PS C:\> gc id-to-filename.txt -TotalCount 1 | % { $_.Split(" ",2); }
43567
sekrit plans.doc
We need both portions of the split to do the rename, so here is where we bring up a new little trick. We can assign the output of Split to several variables at once. Split returns an array, displayed above with one element per line; the first variable ($id) is assigned the first element and the second variable ($file) receives the remainder. After we have the ID and the filename we can easily rename the files.
If we wanted to be a little safer then we could use Copy-Item (alias cp or cpi) instead of Rename-Item (alias ren or rni). Once we have confirmed the copy was successful we can delete all the attachment files by using “Remove-Item attachment.*” (alias del, erase, ri, or rm).
Joining Up
From: COMMAND LINE KUNG FU: PaulDotCom, Ed Skoudis, Hal Pomeranz, byte_bucket
Hal fields a question from IRC
Mr. Bucket passed along the following query from the PaulDotCom IRC channel:
What functionality is available to loop through multiple files, and write the output to a single file with some values on the same line? Ex: If one program gives me the hash of a file, and the other program outputs the name/size/etc of a file, can I output to the same file HASH-FileName-Size
I couldn’t resist chortling with glee when this question came up, because it’s another one of those “easy for Unix, hard for Windows” kinds of tasks. I never can resist sharing these “learning experiences” with my fellow co-authors.
First let’s review our inputs. I’m going to use the openssl utility for generating checksums, since it’s fairly generic to lots of different flavors of Unix at this point:
$ openssl sha1 *
SHA1(001.jpg)= a088531884ee5eb520e98b3e9e18283f29e13d25
SHA1(002.jpg)= 77febb1498b2926ee6a988c97f3457e38736456d
SHA1(003.jpg)= 922bcb001d025d747c2ee56328811a4270b62079
...
As you can see, it’s pretty easy to generate a set of checksums over my directory of image files, but there’s a bunch of cruft around the filename that’s not really helpful. So let me get rid of that with some quick sed action:
$ openssl sha1 * | sed -r 's/SHA1\((.*)\)= (.*)/\1 \2/'
001.jpg a088531884ee5eb520e98b3e9e18283f29e13d25
002.jpg 77febb1498b2926ee6a988c97f3457e38736456d
003.jpg 922bcb001d025d747c2ee56328811a4270b62079
...
That’s better! In the sed expression I’m using the “(.*)” sub-expressions to match the file name and the checksum in each line, and the substitution operator is replacing the original line with just the values of the sub-expressions. Slick.
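To convince yourself the substitution does what it claims, you can feed it a single line by hand. This is a small sketch using one line of the openssl output shown above; it uses -E, which GNU sed accepts as a synonym for -r, and the variable names are just for illustration.

```shell
# One line in the format that "openssl sha1" printed above
line='SHA1(001.jpg)= a088531884ee5eb520e98b3e9e18283f29e13d25'

# \1 is the file name captured inside the parentheses,
# \2 is the checksum captured after "= "
cleaned=$(printf '%s\n' "$line" | sed -E 's/SHA1\((.*)\)= (.*)/\1 \2/')
printf '%s\n' "$cleaned"
```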
Now that we’ve got the checksums, how do we produce the file sizes? I could just use “ls -l” of course. But since the questioner seems to only want “HASH-FileName-Size”, I may as well just use “wc -c” to produce simpler output:
$ wc -c *
4227504 001.jpg
4600982 002.jpg
4271719 003.jpg
...
Now that I know what my inputs are going to be, the question is how to stitch them together? Luckily, Unix includes the join command for putting files together on arbitrary fields (we last saw the join command back in Episode #43). Now I could save the checksum output and the file sizes to separate files and then join the contents of the two files, but bash actually gives us a cooler way to handle this:
$ join -1 1 -2 2 <(openssl sha1 * | sed -r 's/SHA1\((.*)\)= (.*)/\1 \2/') <(wc -c *)
001.jpg a088531884ee5eb520e98b3e9e18283f29e13d25 4227504
002.jpg 77febb1498b2926ee6a988c97f3457e38736456d 4600982
003.jpg 922bcb001d025d747c2ee56328811a4270b62079 4271719
...
See the “<(…)” syntax? That’s a little bit of bash file descriptor magic that allows us to substitute the output of a command in a place where a program would normally be looking for a file name. In this case it saves us the hassle of having to create intermediate output files to join together. The join command itself is pretty simple. We’re telling the program to join the output of the two commands using the file names in the first field of input #1 and the second field of input #2. The only problem is that the join command isn’t producing the “HASH-FileName-Size” output that the original questioner wanted. That’s because join always outputs the joined field first, followed by the remaining fields from the first input (the checksum in this case), followed by the remaining fields from the second input (the file size). We’ll have to use a little awk fu to re-order the fields:
$ join -1 1 -2 2 <(openssl sha1 * | sed -r 's/SHA1\((.*)\)= (.*)/\1 \2/') <(wc -c *) \
| awk '{print $2 " " $1 " " $3}'
a088531884ee5eb520e98b3e9e18283f29e13d25 001.jpg 4227504
77febb1498b2926ee6a988c97f3457e38736456d 002.jpg 4600982
922bcb001d025d747c2ee56328811a4270b62079 003.jpg 4271719
...
Mmmm, that’s a tasty little bit of shell magic, isn’t it? Let’s see what Ed and Tim are cooking up.
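If your shell lacks the “<(…)” syntax, the intermediate-file route mentioned earlier works just as well. Here is a rough sketch of it with canned sample data standing in for the real openssl and wc output; the temporary directory and the file names inside it are made up for illustration.

```shell
# Intermediate-file variant of the join. Sample data replaces the live
# openssl/wc commands so the sketch runs anywhere.
tmp=$(mktemp -d)
printf '%s\n' '001.jpg a088531884ee5eb520e98b3e9e18283f29e13d25' \
              '002.jpg 77febb1498b2926ee6a988c97f3457e38736456d' > "$tmp/name-hash.txt"
# "wc -c *" also emits a trailing "total" line; join drops it because
# nothing in the first file matches it
printf '%s\n' '4227504 001.jpg' '4600982 002.jpg' '8828486 total' > "$tmp/size-name.txt"

# Join field 1 of the first file with field 2 of the second,
# then reorder to hash, name, size
joined=$(join -1 1 -2 2 "$tmp/name-hash.txt" "$tmp/size-name.txt" |
         awk '{print $2 " " $1 " " $3}')
printf '%s\n' "$joined"
rm -r "$tmp"
```

Keep in mind that join expects both inputs to be sorted on the join field, which the shell's sorted expansion of * gives you here.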
Ed retorts snidely:
Choosing a topic just because you think it’s hard for us Windows guys, huh, Hal? Well, aren’t you just a big ball of sunshine, a command-line Scrooge this holiday season? When I first read this one, I thought… “Ugh… this is gonna be hard.” Perhaps I was psyched out by your juvenile trash talk. Or, maybe I’ve just been hanging around in cmd.exe too long, and have gotten used to hard problems.
But, this one turned out to be surprisingly straight-forward and even non-ugly (well, beauty is in the eye of the beholder, I suppose). Here’s the skinny:
C:\> FOR /f "tokens=1-2" %a in (name-hash.txt) do @for /f "tokens=1,2" %m in (length-name.txt) do @if %a==%n echo %b %a %m
a088531884ee5eb520e98b3e9e18283f29e13d25 001.jpg 4227504
77febb1498b2926ee6a988c97f3457e38736456d 002.jpg 4600982
922bcb001d025d747c2ee56328811a4270b62079 003.jpg 4271719
I’m assuming that name-hash.txt contains, well, names and hashes, one pair per line. Likewise, length-name.txt contains lengths and names, again one pair per line.
As we know, FOR /F loops can parse through all kinds of crap, including the contents of files. I use a FOR /F loop with two tokens (giving me two variables) of %a (for the file name) and %b (allocated automagically, holding the hash). For each of the files described in name-hash.txt, I then construct the body of my FOR loop. It contains another FOR /F loop, again with two variables (the original question mentioned “etc” for extra stuff there… if you have more stuff, just up the number of tokens and echo the proper variables at the end). My inner FOR /F loop iterates through the length-name.txt file, placing its values in the variables %m (length) and %n (name).
Now, if I just echoed out %a %b %m %n, I’d be making all of the possible combinations of every pair of two lines in the original files. But, we want to pare that down. We only want to generate some output if the name from name-hash.txt (%a) matches the name from length-name.txt (%n). We do this with a little IF operation comparing the two variables. If they match, we then echo out hash (%b), name (%n), and size (%m).
Admittedly, the performance of this little command isn’t great, as I have to run through every line of name-hash.txt, comparing the name by running through the entirety of length-name.txt. I don’t stop when I’ve found a match, because, well, there could be another match somewhere. Also, if there is no match of the name between the two files, my command ignores that name, not issuing any output. But, I think that makes sense given what the questioner asks.
So, Tim… does PowerShell have a nifty little built-in or something to make this easier than running through a couple of FOR loops? Inquiring minds want to know.
Tim tags in for Ed:
For loops! We don’t need no stinking For loops!
The first thing to do is import the files. Since there is a space between the columns we can use Import-CSV with a delimiter of the space character. Also, there is no header information so we have to specify it.
PS C:\> Import-Csv length.txt,hash.txt -Delimiter " " -Header File,Data
File Data
---- ----
001.jpg 4227504
002.jpg 4600982
003.jpg 4271719
001.jpg a088531884ee5eb520e98b3e9e18283f29e13d25
002.jpg 77febb1498b2926ee6a988c97f3457e38736456d
003.jpg 922bcb001d025d747c2ee56328811a4270b62079
...
We have all the data, so now it can be grouped by the file name using Group-Object (alias group).
PS C:\> Import-Csv length.txt,hash.txt -Delimiter " " -Header File,Data | group file
Count Name Group
----- ---- -----
2 001.jpg {@{File=001.jpg; Data=4227504}, @{File=001.jpg; Data=a088531884ee5eb520e98b3e9e18283f29e13d25}}
2 002.jpg {@{File=002.jpg; Data=4600982}, @{File=002.jpg; Data=77febb1498b2926ee6a988c97f3457e38736456d}}
2 003.jpg {@{File=003.jpg; Data=4271719}, @{File=003.jpg; Data=922bcb001d025d747c2ee56328811a4270b62079}}
...
We have the data grouped like we want, but we still need to massage it a bit so we can get the format we want.
PS C:\> Import-Csv length.txt,hash.txt -Delimiter " " -Header File,Data |
group file | Select @{Name="Hash";Expression={$_.Group[1].Data}}, Name,
@{Name="Length";Expression={$_.Group[0].Data}}
Hash Name Length
---- ---- ------
a088531884ee5eb520e98b3e9e18283f29e13d25 001.jpg 4227504
77febb1498b2926ee6a988c97f3457e38736456d 002.jpg 4600982
922bcb001d025d747c2ee56328811a4270b62079 003.jpg 4271719
...
The Select-Object (alias select) cmdlet allows for custom expressions, which were used to get the hash and the length. The “Group” object contains multiple items and each can be accessed by its index value: 0 is the length and 1 is the hash.
Fileless PowerShell
The initial task was to get the file name, length, and hash from separate files and combine them into one. Let’s try this again without using files.
This would be very easy if PowerShell just had a hashing cmdlet, but it doesn’t. However, we can do hashing by using the .NET library and some very ugly PowerShell. Maybe in v3 we will get a Get-Hash cmdlet, but it seems as likely as the addition of Get-Unicorn or Get-MillionDollars.
So we need some hash, but not the kind that is illegal in 49 states, we need the hash of a file. Here is how we get it.
PS C:\> gci 001.jpg | % { (New-Object System.Security.Cryptography.SHA1CryptoServiceProvider).ComputeHash($_.OpenRead()) }
We use the SHA1CryptoServiceProvider .NET class, but it adds another bump since it doesn’t take files as input and will only take a stream. It isn’t hard to get the stream though, all we need to use is the OpenRead method of our file object. If that wasn’t enough, there is another problem, the output.
PS C:\> gci 001.jpg | % { (New-Object System.Security.Cryptography.SHA1CryptoServiceProvider).ComputeHash($_.OpenRead()) }
160
136
83
24
...
The result is an array of bytes. So we have to convert that to hex and combine it together.
PS C:\> gci 001.jpg | % {$hash=""; (New-Object System.Security.Cryptography.SHA1CryptoServiceProvider).ComputeHash($_.OpenRead()) | % { $hash += $_.ToString("X2") }; $hash}
a088531884ee5eb520e98b3e9e18283f29e13d25
We use the ToString method with the format string X2 to convert each byte to hex. The X converts it to hex, and the 2 will make sure the output is two characters wide (0A vs A). Note that an uppercase X produces uppercase digits (A0 rather than a0); use x2 if you want lowercase output like openssl’s. We then use the variable $hash to stitch our bytes together to get the full hash.
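The byte-to-hex step is easy to check by hand. As a quick sanity check (done here with the Unix shell's printf rather than PowerShell), the first four byte values from the output above should come out as the first eight hex digits of the hash. The variable name hex is just for illustration.

```shell
# %02x prints each value as two lowercase hex digits, zero-padded on the left;
# printf reuses the format string for every argument
hex=$(printf '%02x' 160 136 83 24)
printf '%s\n' "$hex"
```

The result, a0885318, lines up with the start of the SHA1 value openssl reported for 001.jpg.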
Now let’s see the full command.
PS C:\> gci *.* | select @{Name="Hash";Expression={$hash=""; (New-Object System.Security.Cryptography.SHA1CryptoServiceProvider).ComputeHash($_.OpenRead()) |
% { $hash += $_.ToString("X2") }; $hash}}, name, length
Hash Name Length
---- ---- ------
a088531884ee5eb520e98b3e9e18283f29e13d25 001.jpg 4227504
77febb1498b2926ee6a988c97f3457e38736456d 002.jpg 4600982
922bcb001d025d747c2ee56328811a4270b62079 003.jpg 4271719
...
The first thing we do is get all the files in the current directory using Get-ChildItem (aliased as gci or dir). That is piped into Select-Object (aliased as select) to get the hash, filename, and size. The Select-Object cmdlet allows us to get properties of the pipeline object as well as create a custom expression. In our case we will use the custom expression to calculate the hash.
Our results are in object form and can be written to a file with Out-File or Export-Csv.
So the task is complete, but let’s pretend for a second we had the fictional Get-Hash cmdlet. If we had our leprechaun our command might look something like this:
PS C:\> gci *.* | select @{Name="Hash";Expression={Get-Hash $_ sha1}}, name, length
If only getting a hash were easier in Windows.
Embedding and Hiding Files in PDF Documents
From Didier Stevens: http://hacksec.blisque.com/
My corrupted PDF quip inspired me to program another steganography trick: embed a file in a PDF document and corrupt the reference, thereby effectively making the embedded file invisible to the PDF reader.
The PDF specification provides ways to embed files in PDF documents. I’m releasing my Python program to create a PDF file with embedded file (I used make-pdf-embedded.py to create my EICAR.pdf).
Here’s what a PDF document with an embedded file looks like:

/EmbeddedFiles points to the dictionary with the embedded files:

As names defined in the PDF specification are case sensitive, changing the case changes the semantics: /Embeddedfiles has no meaning, and thus the PDF reader ignores it and doesn’t find the embedded file.


Actually, I used this trick in my Brucon puzzle. I used the --stego option of make-pdf-embedded.py:

Of course, once you know the stego trick, it’s easy to recover the embedded file: edit the PDF document with a hex editor and change the case back to /EmbeddedFiles.
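Spotting the trick can be automated as well. The following is only a rough sketch, assuming a grep that supports -a and -o (GNU and BSD grep both do); the sample file is a made-up stand-in rather than a valid PDF, and an obfuscated document would slip straight past a check this simple.

```shell
# Build a tiny stand-in for a tampered document (illustration only)
sample=$(mktemp)
printf '%s\n' '%PDF-1.1' '<< /Type /Catalog /Names << /Embeddedfiles 3 0 R >> >>' > "$sample"

# -a treats binary data as text, -i ignores case, -o prints only the match.
# The second grep then discards the correctly cased name, leaving the oddballs.
suspicious=$(grep -a -i -o '/EmbeddedFiles' "$sample" | grep -v -x '/EmbeddedFiles')
printf '%s\n' "$suspicious"
rm "$sample"
```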
But if you want to make it harder to detect, use PDF obfuscation techniques. Or embed the file twice with incremental updates. First version is the file you want to hide, second version is a decoy…
The PDF language offers so many features to hide and obfuscate data!
Download:
make-pdf_V0_1_2.zip
Hide RAR files in a PNG image
This is an interesting one I found on Shell Fu this morning. It’s similar to my previous post of hiding a .exe in a text file. They both have their uses.
It is possible to hide a rar archive inside a png image file and then retrieve the files from the image.
cat picture.png archive.rar > hidden_archive_in_pic.png
This can also be done on Windows:
copy picture.png + archive.rar hidden_archive_in_pic.png
When you want to retrieve the hidden files, download the image, rename to .rar and extract.
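If you still have the original picture, you can also carve the appended data back out by byte offset instead of renaming the file. This is a small sketch of that alternative, using short text stand-ins rather than a real PNG and RAR; all file names under the temporary directory are made up for illustration.

```shell
# Stand-in files; real use would involve an actual image and archive
tmp=$(mktemp -d)
printf 'PNGDATA' > "$tmp/picture.png"
printf 'RARDATA' > "$tmp/archive.rar"

# Same concatenation as above
cat "$tmp/picture.png" "$tmp/archive.rar" > "$tmp/hidden_archive_in_pic.png"

# The appended bytes start right after the last byte of the original image
offset=$(( $(wc -c < "$tmp/picture.png") + 1 ))
tail -c +"$offset" "$tmp/hidden_archive_in_pic.png" > "$tmp/recovered.rar"

# cmp -s is silent and only reports through its exit status
if cmp -s "$tmp/archive.rar" "$tmp/recovered.rar"; then
  result=identical
else
  result=different
fi
printf '%s\n' "$result"
rm -r "$tmp"
```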
Counting Matching Lines in Files
From: COMMAND LINE KUNG FU: PaulDotCom, Ed Skoudis, Hal Pomeranz, byte_bucket
Hal’s back at it again:
I had another one of those counting problems come up recently, similar to our earlier Browser Count Torture Test challenge. This time my customer needed me to count the number of instances of a particular string in each of several dozen files in a directory. In my case I was looking for particular types of test cases in a software regression test suite, but this is also useful for looking for things like IP addresses in log files, vulnerabilities in assessment tool reports, etc.
For a single file, it would be easy enough to just:
$ grep TEST file1 | wc -l
11
But we want to operate over a large number of files, which means we somehow need to associate the name of the file with the output of “wc -l”.
So I created a loop that does the main part of the work, and then piped the output of the loop into awk for some pretty-printing:
$ for f in *; do echo -n "$f "; grep TEST "$f" | wc -l; done | \
awk '{t = t + $2; print $2 "\t" $1} END {print t "\tTOTAL"}'
11 file1
8 file2
14 file3
31 file4
12 file5
7 file6
3 file7
25 file8
19 file9
22 file10
19 file11
22 file12
10 file13
203 TOTAL
Inside the loop we’re first spitting out the filename and a couple of spaces, but no newline. This means that the output of our “grep … | wc -l” will appear on the same line, immediately following the filename and the spaces.
The only problem I had with the basic loop output was that the file names had very irregular lengths (unlike the sample output above) and it was difficult to read the “wc -l” data because it wasn’t lined up neatly in a column. So I decided to do some post-processing with awk. The main part of the awk code keeps a running total of the values we’ve read in so far (you saw me using this idiom previously in Browser Count Torture Test). But you’ll also notice that it reverses the order of the two columns and also inserts a tab to make things line up nicely (‘print $2 "\t" $1’). In the “END” block we output the “TOTAL” once the entire output from the loop has been processed.
I love the fact that the shell lets me pipe the output of a loop into another tool like awk for further processing. This lets me grind up a bunch of data from many different sources into a single stream and then operate on this stream. It’s an idiom I use a lot.
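For what it's worth, grep can do the per-file counting on its own with -c, which removes the need for the loop in this particular case. Here is a sketch of that variation (not the loop used above), run against three throwaway sample files; it assumes no file name contains a colon, since awk splits on that character.

```shell
# Throwaway sample files in a temporary directory
tmp=$(mktemp -d)
printf '%s\n' 'TEST a' 'other' 'TEST b' > "$tmp/file1"
printf '%s\n' 'nothing here'            > "$tmp/file2"
printf '%s\n' 'TEST c'                  > "$tmp/file3"

# With several files, grep -c prints "name:count" for each one.
# awk takes the last colon-separated field as the count and keeps a total.
report=$(cd "$tmp" && grep -c TEST -- * |
         awk -F: '{t += $NF; print $NF "\t" $1} END {print t "\tTOTAL"}')
printf '%s\n' "$report"
rm -r "$tmp"
```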
Paul Chimes In:
That’s some pretty sweet command kung fu! When I first read this I immediately put it to good use, with some modifications of course. I frequently find myself needing to search through 28,000+ files and look for certain strings. My modifications are as follows:
$ for f in *; do echo -n "$f "; grep -i xss $f | wc -l; done | awk '{t = t + $2; print $2 "\t" $1} END {print t "\tTOTAL"}' | egrep -v '^0' | sort -n
I really didn’t care about files that did not contain at least one occurrence of my search string so I sent it to egrep with “-v” which shows me only results which do NOT contain the search term. My regular expression “^0” reads as, “only show me lines that begin with 0”, which when combined with the “-v” removes all lines that begin with 0. Now, I could have used a filter with awk, but the syntax was not cooperating (i.e. awk /[regex]/ {[code]}). Then I wanted to see a sorted list so I ran it through "sort -n".
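For the record, the awk-side filter can be written as a numeric condition in front of the main block rather than as a regex. This is one possible sketch, fed with three made-up “filename count” lines of the kind the loop produces; the variable name counts is only for illustration.

```shell
# Sample "filename count" lines, as the for loop would emit them
counts=$(printf '%s\n' 'file1 11' 'file2 0' 'file3 3' |
  # The pattern "$2 > 0" runs the block only for files with at least one hit
  awk '$2 > 0 {t += $2; print $2 "\t" $1} END {print t "\tTOTAL"}' |
  sort -n)
printf '%s\n' "$counts"
```

With the zero-count lines dropped inside awk, the separate egrep stage is no longer needed.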
Ed retorts:
Gee, 28,000 files, Paul? Where did ya get that number? Sounds suspiciously like... I dunno... Nessus plug-ins. But, I digress.
OK, Sports Fans... Hang on to your hats, because I'm gonna match Hal's functionality here in cmd.exe, and it's gonna get ugly. Real ugly. But, when we're done, our command will do what Hal wants. And, in the process, it'll take us on an adventure through some interesting and useful features of good ol' cmd.exe, tying together a lot of fu that we've used in piece-parts in previous episodes. It's gonna all come together here and now. Let's dive in!
We start out simple enough:
C:\> find /c "TEST" * 2>nul | find /v ": 0"
---------- FILE1: 11
---------- FILE2: 8
---------- FILE3: 14
Here, I've used the /c option of the find command to count the number of lines inside of each file in my current directory that have the string "TEST". I throw away error messages (2>nul) to avoid cruft about directories in my output. I do a little more post processing by piping my output into find again, to search for lines that do not have (/v) the string ": 0" in them, because we don't want to display files that have our string in them zero times.
That's pretty close to what we want right there. So, we could call it a day and just walk away.
But, no.... we're kinda nuts around here, if you haven't noticed. We must press on to get closer to Hal's insanity.
The --------- stuff that find /c puts in our output is kinda ugly. Let's get rid of that with a little parsing courtesy of FOR /F:
C:\> for /f "delims=-" %i in ('"find /c "TEST" * 2>nul | find /v ": 0""') do @echo %i
FILE1: 11
FILE2: 8
FILE3: 14
Here, I'm using a FOR /F loop to parse the output of my previous command. I'm defining custom-parsing with a delimiter of "-" to get rid of those characters in my output.
Again, we could stop here, and be happy with ourselves. We've got most of what Hal wants, and our output is kinda pretty. Heck, our command is almost typable.
But we must press on. Hal's got totals, and we want them too. We could do this in a script, but that's kinda against our way here, as we strive to do all of our kung fu fighting in single commands. We'll need to add a little totaller routine to our above command, and that's where things are going to get a little messy.
The plan will be to run the component we have above, followed by another command that counts the total number of lines that have TEST in them and displays that total on the screen. We'll have to create a variable called total that we'll track at each iteration through our new counting loop. The result is:
C:\> (for /f "delims=-" %i in ('"find /c "TEST" * 2>nul | find /v ": 0""') do
@echo %i) & set total=0 & (for /f "tokens=3" %a in ('"find /c "TEST" * 2>nul"')
do @set /a total+=%a > nul) & echo. & cmd.exe /v:on /c echo TOTAL: !total!
FILE1: 11
FILE2: 8
FILE3: 14
TOTAL: 33
Although what I'm doing here is probably obvious to everyone except Hal and Paul (yeah, right!), please bear with me for a little explanation. You know, just for Hal and Paul.
I've taken my original command from above and surrounded it in parens (), so that it doesn't interfere with the new totaller component I'm adding. My totaller starts by setting an environment variable called total to zero (set total=0). I then add another component in parens (). These parens are very important, lest the shell get confused and blend my commands together, which would kinda stink as my FOR loops would bleed into each other and havoc would ensue.
Next, I want to get access to the line count output of my find /c command to assign it to a variable I can add to my total. In cmd.exe, if you want to take the output of a command and assign its value to a variable, you can use a FOR /F loop to iterate on the output of the command. I do that here by running FOR /F to iterate over "find /c "TEST" * 2>nul". To tell FOR /F that my command is really a command, I have to wrap it in single quotes (' '). But, because my command has special characters in it (the > in particular), I have to wrap the command in double quotes too (" "). The result is wrapped in single and double quotes (' " " '), a technique I use a lot such as in Episodes #34 and #45. My FOR /F loop is set to tokenize around the third element of output of this command, which will be the line count I'm looking for (default FOR /F parsing occurs on spaces as delimiters, and the output of ----- [filename]: [count] has the count as the third item).
Thus, %a now holds my interim line count of the occurrences of TEST for a given file. I then bump my total variable by that amount (set /a total+=%a) using the set /a command we discussed in "My Shell Does Math", Episode #25. I don't want to display the results of this addition on the output yet, so I throw them away (> nul). When my adding loop is done (note that all important close parens), I then echo a blank line (echo.).
Now for the ugly part. I want to display the value of my total variable. But, as we've discussed in previous episodes, cmd.exe does immediate variable expansion. When you run a command, your environment variables are expanded to their values right away. Thus, if I were to simply use "echo %total%" at the end here, it would display the total value that existed when I started the command, if such a value was even defined. But, we want to see the total value after our loop finishes running. For this, we need to activate delayed environment variable expansion, a trick I used in Episode #12 in a slightly different way.
So, with my total variable set by my loop, followed by an extra carriage return from echo. to make things look pretty, I then invoke another cmd.exe with /v:on, which enables delayed variable expansion. I ask that cmd.exe to run a command for me (/c), which is simply displaying the word TOTAL followed by the value !total!. But, what's with the bangs? Normal variables are expanded using %var%, not !var!. Well, when you use delayed variable expansion, you get access to the variable's value using !var!. The bangs are an artifact of delayed variable expansion.
And, for the most part, we've matched Hal's functionality. Our command reverses the file name and counts from Hal's fu, although we could go the other way if we want with some additional logic. I prefer filename first myself, so that's what we'll go with here.
And, our descent into insanity is pretty much done for now. :)
Replacing Strings in Multiple Files
From: COMMAND LINE KUNG FU: PaulDotCom, Ed Skoudis, Hal Pomeranz, byte_bucket
Hal Starts Off:
Wow, our last several Episodes have been really long! So I thought I’d give everybody a break and just show you a cool little sed idiom that I use all the time:
# sed -i.bak 's/foo/bar/g' *
Here we’re telling sed to replace all instances of the string “foo” with the string “bar” in all files in the current directory. The useful trick is the “-i.bak” option, which causes sed to make an automatic backup copy of each file with a .bak extension before doing the global search and replace.
By the way, you can even do this across an entire directory structure, with a little help from the find and xargs commands:
# find . -type f -print0 | xargs -0 sed -i.bak 's/foo/bar/g'
Of course, you could use other search criteria than just “-type f” if you wanted to be more selective about which files you ran sed against.
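As an illustration of being more selective, here is a sketch that restricts the replacement to *.txt files inside a scratch directory. The directory layout and file names are made up for the demo, and the -print0 / -0 pair is there so that file names containing spaces survive the trip through xargs.

```shell
# Scratch directory with one file we want changed and one we do not
tmp=$(mktemp -d)
mkdir "$tmp/sub"
printf 'foo one foo\n' > "$tmp/a.txt"
printf 'keep foo\n'    > "$tmp/sub/b.log"

# Only *.txt files are handed to sed; each gets a .bak copy first
find "$tmp" -type f -name '*.txt' -print0 | xargs -0 sed -i.bak 's/foo/bar/g'

edited=$(cat "$tmp/a.txt")         # rewritten in place
backup=$(cat "$tmp/a.txt.bak")     # original content preserved
untouched=$(cat "$tmp/sub/b.log")  # not matched by -name, so left alone
printf '%s\n' "$edited" "$backup" "$untouched"
rm -r "$tmp"
```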
Oh dear, I hope this isn’t one of those “easy for Unix, hard for Windows” things again. Ed gets so grumpy when I do that.
Ed jumps in:
You nailed it, Hal, with that characterization. Unfortunately, cmd.exe doesn’t include the ability to do find and replace of strings within lines of a file using a built-in command. We can search for strings using the find command, and even process regex with findstr. But, the replacement part just doesn’t exist there.
Thus, most reasonable people will either rely on a separately installed tool to do this, or use Powershell.
For a separately installed tool, my first approach would be use Cygwin, the free Linux-like environment for Windows, and then just run the sed command Hal uses above. Nice, easy, and sensical.
Alternatively, you could download and install a tool called replace.exe.
Or, there’s another one called Find And Replace Text, which, as you might guess, is called FART for short.
To do this in Powershell efficiently, I asked Tim Medin, our go-to guy for Powershell, to comment.
Tim (our Powershell Go-To Guy) says:
This morning when Ed asked me to do a “quick” write up for Powershell, I thought to myself, “This won’t be too bad…” I was wrong.
By default there are aliases for many of the commands in Powershell, so I’ll show both the long and short version of the commands (yes, even the short command is long relative to sed).
The Long Way
PS C:\> Get-ChildItem -exclude *.bak | Where-Object {$_.Attributes -ne "Directory"} |
ForEach-Object { Copy-Item $_ "$($_).bak"; (Get-Content $_) -replace
"foo","bar" | Set-Content -path $_ }
The Short Way (using built in aliases)
PS C:\> gci -ex *.bak | ? {$_.Attributes -ne "Directory"} | % { cp $_ "$($_).bak";
(gc $_) -replace "foo","bar" | sc -path $_ }
This command is rather long, so let’s go through it piece by piece.
gci -ex *.bak | ? {$_.Attributes -ne "Directory"}
The first portion gets all files that don’t end in .bak. Without this exclusion, it will process file1.txt and the new file1.txt.bak. Processing file1.txt.bak results in file1.txt.bak.bak, but it doesn’t do this endlessly, just twice.
The Where-Object (with an alias of ?) ensures that we only work with files and not directories because Get-Content on a directory throws an error.
ForEach-Object { Copy-Item $_ "$($_).bak"; (Get-Content $_) -replace "foo","bar" |
Set-Content -path $_ }
Once we get the files, not directories, we want, we then act on each file with the ForEach-Object (alias %). For those of you who haven’t yet fallen asleep, I’ll further break down the inner portion of the ForEach-Object:
Copy-Item $_ "$($_).bak"
First, we copy the file to our backup .bak file. We have to use the $() in order to use our variable in a string so we can append .bak.
Finally, we get to the search and replace (and it’s about time, too!).
(Get-Content $_) -replace "foo","bar" | Set-Content -path $_
Get-Content (gc) gets the contents of the file. We wrap it in parentheses so we can act on its output in order to do our replace. The output is then piped to Set-Content (sc) and written back to our file.
We could make this work a little better if we used variables, but then we are more in script-land than shell-land, which probably violates the almighty laws of this blog (OK, we may already be there). For kicks, I’ll show you how we can use variables so you can add it to your big bloated belt of windows-fu.
$a = (gci | ? {$_.Attributes -ne "Directory"}); $a | % { cp $_ "$($_).bak";
(gc $_) -replace "foo","bar" | sc -path $_ }
The difference between our original command and this command is that the $a variable grabs a snapshot of the directory before we copy files, so we won’t operate on the new .bak files.
After all this work we have done the same thing as the mighty sed. Sadly even the power of Powershell is no match for the efficiency of sed.
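For comparison, here is a rough bash sketch of the same backup-then-replace steps. It is not necessarily the sed command referred to above, just one way to mirror what the Powershell pipeline does. The temporary directory, the sample file, and the foo/bar strings are assumptions made up for the illustration.

```shell
# Build a throwaway directory with one sample file and one subdirectory
# (hypothetical names, so nothing real gets modified).
docs=$(mktemp -d)
printf 'foo one\nfoo two foo\n' > "$docs/file1.txt"
mkdir "$docs/subdir"

for f in "$docs"/*; do
    [ -f "$f" ] || continue                 # skip directories, like the Where-Object filter
    case "$f" in *.bak) continue ;; esac    # skip existing backups, like -exclude *.bak
    cp -- "$f" "$f.bak"                     # keep a copy of the original
    sed 's/foo/bar/g' "$f.bak" > "$f"       # rewrite the file from its backup
done

cat "$docs/file1.txt"
```

The glob is expanded once before the loop starts, so the freshly created .bak files are never visited, which is the same effect the $a snapshot gives in the Powershell version.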
Ed closes it out:
Thanks for that, Tim. Nice stuff!
Renaming Files With Regular Expressions
From: COMMAND LINE KUNG FU: PaulDotCom, Ed Skoudis, Hal Pomeranz, byte_bucket
Hal Says:
I admit it, I’m a fan of the CommandLineFu site (hey, it’s a community, not a competition), and like trolling through it occasionally for interesting ideas. This post by vgagliardi shows a cool trick for renaming a bunch of files using regular expressions and the substitution operator in bash. For example, suppose I wanted to convert spaces to underscores in all file names in the directory:
$ for f in *; do mv -- "$f" "${f// /_}"; done
I realize that syntax looks a little crazy. The general form is “${variable/pattern/substitution}”, but in this case we have an extra “/” at the front of “pattern”, which means “replace all instances” rather than only replacing the first instance.
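To see the difference between one slash and two without touching any files, a minimal sketch on a plain variable helps. The file name here is hypothetical.

```shell
# Compare single and global replacement in bash parameter expansion.
name="my summer vacation.txt"   # hypothetical file name

first_only="${name/ /_}"    # one slash: replace only the first space
all_spaces="${name// /_}"   # two slashes: replace every space

printf '%s\n' "$first_only" "$all_spaces"
```

This prints “my_summer vacation.txt” followed by “my_summer_vacation.txt”.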
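By the way, the pattern is not limited to literal characters. Strictly speaking it is a shell glob pattern (the same syntax used for file name matching) rather than a full regular expression, but character classes work as you would expect. For example, here’s a loop to remove all characters from file names except for alphanumeric characters, dot, hyphen, and underscore: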
$ for f in *; do mv -- "$f" "${f//[^-_.A-Za-z0-9]/}"; done
In this case “[^...]” means match any characters not in the specified set, and we’re performing a null substitution.
Did you notice that we’re also using the “--” argument to the “mv” command, just in case one of our file names happens to start with a “-”? These files are typically a huge pain in Unix. What if we wanted to replace the “-” character at the beginning of a file name with an underscore?
$ for f in *; do mv -- "$f" "${f/#-/_}"; done
As you can see, starting the pattern with “#” means “match at the front of the string”. Or we can match at the end of the string with “%”:
$ find docroot -type f -name \*.htm | \
while read f; do mv -- "$f" "${f/%.htm/.html}"; done
Here we’re using “find” to locate all of the *.htm files in our web docroot and then piping the output into a while loop that renames all these files to be *.html files instead.
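The two anchors are easy to try out on a variable before letting them loose on real files. This is only a sketch, and the file name is made up for the purpose.

```shell
f="--draft-notes.htm"        # hypothetical file name with two leading dashes

front="${f/#-/_}"            # '#' anchors at the start: only the first leading '-' changes
back="${f/%.htm/.html}"      # '%' anchors at the end: only the trailing '.htm' changes

printf '%s\n' "$front" "$back"
```

Note that the anchored form replaces a single match, so a name with two leading dashes still has one left afterwards (“_-draft-notes.htm”).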
There are a couple of problems with the method that I’m using here: (1) if multiple files map to the same name, you’ll end up clobbering all but the last instance of that file, and (2) you get errors if your substitution doesn’t actually modify the file name because the “mv” command refuses to rename a file to itself. We can fix both of these problems with a little extra logic in the loop. Let’s return to our first example of converting spaces to underscores:
$ for f in *; do n="${f// /_}"; [ -f "$n" ] || mv -- "$f" "$n"; done
First we assign the new file name to the variable $n. Then we check to see if a file named “$n” exists– the “mv” command after the “||” is only executed if there is no “$n” file.
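Here is a slightly more verbose variation on that loop, run in a scratch directory so it is safe to experiment with. It is a sketch rather than the one true way: it uses “-e” instead of “-f” so that an existing directory also counts as a collision, it skips names that would not change, and the test file names are hypothetical.

```shell
# Work in a throwaway directory so nothing real gets renamed.
workdir=$(mktemp -d)
(
    cd "$workdir" || exit 1
    touch "a b.txt" "a_b.txt" "c d.txt" "plain.txt"   # hypothetical test files

    for f in *; do
        n="${f// /_}"
        [ "$f" = "$n" ] && continue        # name unchanged, so do not call mv at all
        if [ -e "$n" ]; then
            echo "skipping '$f': '$n' already exists" >&2
        else
            mv -- "$f" "$n"
        fi
    done
)
ls "$workdir"
```

After the loop, “c d.txt” has become “c_d.txt”, while “a b.txt” is left alone because “a_b.txt” was already there.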
I admit that I usually use the Perl rename program for renaming large numbers of files, because (a) the syntax is much more terse, and (b) I love Perl. But this program isn’t always available on all the different flavors of Unix that I end up having to work on. So having this functionality built into the shell is a huge win.
Ed Responds:
When I quickly glanced at Hal’s challenge initially, I thought… “Yeah, that’s pretty easy… findstr supports regex, and I’ll use the ren command to rename the files… No prob.”
And then, I started to write the command, and it got horribly ugly really quickly. Hal squealed with delight when I told him how ugly it was… and believe me… you ain’t seen nothing until you’ve seen Hal squeal with delight.
Anyway, to keep this article from getting unreasonably long, I’m going to address Hal’s original command, which replaced the spaces in file names with underscores. Unfortunately, you see, the parsing, iteration, and recursion capabilities within a single command in cmd.exe are really limiting. For parsing strings, we’ve got FOR /F and a handful of substring operations I covered in Episode 12. For running a command multiple times, we’ve got for /L, as I mentioned in Episode 3. For recursion, well, that’s just plain bad news in a single command unless we bend our rules to create a bat file that calls itself.
To start to address Hal’s original challenge, we can use the following command to determine if there are any files that have at least one space in their names in our current directory:
C:\> dir /b "* *"
That’s pretty straightforward to start, with the /b making dir show only the bare form of output, omitting cruft about volume names and sizes. Note that it will only show files that do not have the hidden attribute set. If you want, you can invoke dir with /a to make it show files regardless of their attributes, hidden or otherwise. Now, let’s see what we can do with this building block.
Plan A: Every File Should Have Three Spaces in Its Name, Right?
My original plan was to wrap that command inside a FOR /F loop, iterating over each file using FOR /F functionality to parse it into its constituent elements. I was thinking something like this:
C:\> for /F "tokens=1-4" %i in ('dir /b "* *"') do ren "%i %j %k %l" "%i_%j_%k_%l"
Well, that’s all very nice, but we’ve got a problem… let me show an example of what this beast creates when I run it in a directory with files named “file 1.txt” and “file 2 is here.txt”:
C:\> dir /b
file_1.txt__
file_2_is_here.txt
Ooops… the file_1.txt name has two underscores after it. This option only works if files have exactly three spaces (that is, four space-separated words) in their names. That’s no good.
Plan B: Let’s Just Change the First Space into an Underscore
Well, how about this… We could write a one-liner that assumes a file will have only one space in its name, and convert that one space into an underscore. That’s not too bad:
C:\> for /f "tokens=1,*" %i in ('dir /b "* *"') do ren "%i %j" "%i_%j"
I’m parsing the output of the dir /b command with “tokens=1,*”. With the default delimiters of space and tab, this puts everything before the first space of each line into %i, and everything afterward into the next iterator variable, %j.
Let’s run that with our same file names as before, yielding:
C:\> dir /b
file_1.txt
file_2 is here.txt
Well, we got it right for file_1.txt, because there is only one space. But, we only fixed the first space in file 2 is here.txt. Hmmmm… How could we move closer to our result?
Hit the up arrow a couple times to go back to our Plan B FOR /F loop, and hit enter again. Now, running our dir, we get:
C:\> dir /b
file_1.txt
file_2_is here.txt
Ahh… we nailed our second space. Hit the up arrow again and re-run… and… well, you get the picture. We can take care of one space at a time. Not so bad.
But, who wants to hit the up arrow again and again and again until we get rid of all the spaces? You’d have to re-run my Plan B command N times, where N is the maximum number of spaces inside a file name in the current directory.
Plan C: Make the Shell Re-Run the Command Instead of Doing it Manually
Well, instead of re-running a command a bunch of times, let’s make the shell do our work for us. We’ll just wrap the whole thing in a FOR /L loop to count through integers 1 through 10 (1,1,10) and invoke the FOR /F loop at each iteration through our FOR /L loop:
C:\> for /L %a in (1,1,10) do @for /f "tokens=1,*" %i in ('dir /b "* *"') do ren "%i %j" "%i_%j"
That works, provided that none of the files have more than ten spaces in their name. Ummm… but what if they do? We could raise the number 10 to 20… but that’s kind of a cheap hack, no?
Plan D: Violate the Rules — Make a 3-Line Script
OK… if we had a while construct in cmd.exe, we could simply run my FOR /F loop of Plan B while the dir /b “* *” still returned valid output. But, we don’t have a while command in cmd.exe. If we want to check a condition like that, we only have IF statements. And, if we want to jump around based on the results of IF statements, we need to use GOTOs. And, if we want to use IFs and GOTOs, we can’t dump everything on a single one-line command, but will instead have to create a little bat file.
So, I’m going to have to bend our ground rules for this blog, which require a single command, and instead use a three-line bat file. Here’s a bat file I wrote that converts the spaces in the names of all files in the current directory into underscores:
:begin
for /F "tokens=1,*" %%i in ('dir /b "* *"') do ren "%%i %%j" "%%i_%%j"
if exist "* *" goto begin
There you have it…. kind of an ugly little hack, but it works. Note that I had to change my iterator variables in my FOR loop from %i and %j into %%i and %%j. You have to do that to convert command-lines into bat files in Windows. Also, I’m using an IF statement to test for the existence of “* *”, which would match any file with a space in its name.
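For readers more at home in bash, the control flow of that bat file (fix one space, check whether any remain, go around again) can be sketched as a while loop. This is only an illustration on a single string with a hypothetical name; in real bash you would simply use the “//” form in one step.

```shell
name="file 2 is here.txt"   # hypothetical file name
passes=0

# Keep going while the name still contains a space,
# playing the role of: if exist "* *" goto begin
while [[ "$name" == *" "* ]]; do
    name="${name/ /_}"          # one pass fixes only the first remaining space
    passes=$((passes + 1))
done

printf '%s after %s passes\n' "$name" "$passes"
```

The loop runs once per space, so this name needs three passes, just like hitting the up arrow three times in Plan B.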
A small script in cmd.exe can satisfy Hal’s original challenge. To start addressing his other feats to convert other characters in file names, we could specify options for the FOR /F loop of everything we want to parse out with the syntax “tokens=1,* delims=~!@#$%^&*()+=” and whatever else you wanna take out.
I could drone on and on endlessly here, but I think you get the idea. It ain’t pretty, but it is doable…. Now that should be the cmd.exe mantra.
PS: I too am a fan of the CommandLineFu site. It rocks.
Finding & Locating Files
From: COMMAND LINE KUNG FU: PaulDotCom, Ed Skoudis, Hal Pomeranz, byte_bucket
Paul Writes In:
I’m one of those messy desktop people. There, I said it: I keep a messy desktop with tons of files all over the place (partly due to the fact that when you press Command-Shift-4 in OS X to take a screen grab it puts the file on the desktop). So, it should come as no surprise that I often need help finding files. I don’t know how many of you have actually run the find command in OS X (or even Linux), but it can be slow:
# time find / -name msfconsole
real 14m3.648s
user 0m17.783s
sys 2m29.870s
I actually stopped it at around 15 minutes because I couldn’t wait that long. There are many factors in the performance equation of the above command, such as the overall speed of the system, how busy the system is when you execute the command, and some even say that find is slower if it’s checking across different file system types (“/” would also include mounted USB drives). A quicker way to find files is to use the locate command:
$ locate msfconsole | grep -v .svn
/Users/fanboy/metasploit/framework-2.7/msfconsole
/Users/fanboy/metasploit/framework-3.1/msfconsole
/Users/fanboy/metasploit/framework-3.2-release/msfconsole
This command reads from a database (which is generated on a regular basis) that consists of a listing of files on the system. It’s MUCH faster:
$ time locate msfconsole
real 0m1.205s
user 0m0.298s
sys 0m0.050s
I’m wondering what Ed’s going to do on Windows, unless he’s come up with a way to get an animated ASCII search companion dog. :)
Hal Says:
One thing I will note about the locate command is that it’s going to do substring matching against the full path name, whereas “find … -name …” will do an exact match against the file name. To see the difference, check out the following two commands:
# find / -name vmware
/etc/vmware
/etc/init.d/vmware
/usr/lib/vmware
/usr/lib/vmware/bin/vmware
/usr/bin/vmware
/var/lib/vmware
/var/lock/subsys/vmware
/var/log/vmware
# locate vmware
/etc/vmware
/etc/init.d/vmware
/etc/pam.d/vmware-authd
/etc/rc2.d/K08vmware
/etc/rc2.d/S19vmware
[... 5000+ addtl lines of output not shown ...]
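You can reproduce the difference on a small scale without a locate database. The sketch below builds a tiny hypothetical directory tree and compares an exact “-name” match with a substring match over the full paths, using grep as a stand-in for what locate does.

```shell
# Build a small throwaway tree (hypothetical paths).
tree=$(mktemp -d)
mkdir -p "$tree/etc/vmware" "$tree/etc/pam.d"
touch "$tree/etc/pam.d/vmware-authd" "$tree/etc/vmware/config"

# Exact match on the last path component only.
exact=$(find "$tree" -name vmware | wc -l)

# Substring match anywhere in the path, roughly what locate does.
substring=$(find "$tree" | grep -c vmware)

echo "exact: $exact, substring: $substring"
```

The exact match finds only the vmware directory itself, while the substring match also picks up vmware-authd and everything underneath the vmware directory.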
Also, as Paul notes above, the database used by the locate command is updated regularly via cron. The program that builds the database is updatedb, and you can run this by hand if you want to index and search your current file system image, not the image from last night.
I was curious whether doing a find from the root was faster than running updatedb followed by locate. Note that before running the timing tests below, I did a “find / -name vmware” to force everything into the file cache on my machine. Then I ran:
# time find / -name vmware >/dev/null
real 0m1.223s
user 0m0.512s
sys 0m0.684s
# time updatedb
real 0m0.263s
user 0m0.128s
sys 0m0.132s
# time locate vmware >/dev/null
real 0m0.314s
user 0m0.292s
sys 0m0.016s
It’s interesting to me that updatedb+locate is twice as fast as doing the find. I guess this shouldn’t really be that surprising, since find is going to end up calling stat(2) on every file whereas updatedb just has to collect file names.
Ed Kicks in Some Windows Stuff:
In Windows, the dir command is often used to search for files with a given name. There are a variety of ways to do this. One of the most obvious but less efficient ways involves running dir recursively (/s) and scraping through its results with the find or findstr command to look for what we want. I’ll use the findstr command here, because it gives us more extensibility if we want to match on regex:
C:\> dir /b /s c:\ | findstr /i vmware
There are a couple of things here that may not be intuitive. First off, what’s with the /b? This indicates that we want the bare form of output, which will omit the extra stuff dir adds to a directory listing, including the volume name, number of files in a directory, free bytes, etc. But, when used with the /s option to recurse subdirectories, /b takes on an additional meaning. It tells dir to show full paths to files, which is what we really want to see to know the file’s location. Try running the command without /b, and you’ll see that it doesn’t show what we want. The /b makes it show what we want: the full path to the file so we know its location. Oh, and the /i makes findstr case insensitive.
But, you know, dumping all of the directory and file names on standard out and then scraping through them with findstr is incredibly inefficient. There is a better way, more analogous to the “find / -name” feature Paul and Hal use above:
C:\> dir /b /s c:\*vmware*
This command seems to imply that it will simply look inside of the c:\ directory itself for vmware, doesn’t it? But, it will actually recurse that directory looking for matching names because of the /s. And, when it finds one, it will then display its full path because of the /b. I put *vmware* here to make this look for any file that has the string vmware in its name so that its functionality matches what we had earlier. If you omit the *’s, you’ll only see files and directories whose name exactly matches vmware. This approach is significantly faster than piping things through the findstr command. Also note that it is automatically case insensitive, because, well, that’s the way that dir rolls.
How much faster? I’m going to use Cygwin so I can get the time command for comparison. The $ prompt you see below is from Cygwin running on my XP box:
$ time cmd.exe /c "dir /s /b C:\ | findstr /i vmware > nul"
real 0m10.672s
user 0m0.015s
sys 0m0.015s
Now, let’s try the other approach:
$ time cmd.exe /c "dir /s /b C:\*vmware* > nul"
real 0m6.484s
user 0m0.015s
sys 0m0.031s
It takes roughly 60% of the time doing it this more efficient way. Oh, and note how I’m using the Cygwin time command here. I use time to invoke a cmd.exe with the /c option, which will make cmd.exe run a command for me and then go away when the command is done. Cygwin’s time command will then show me how long the command took. I use time to invoke a cmd.exe /c rather than directly invoking a dir so that I can rely on the dir command built into cmd.exe instead of running the dir command included in Cygwin.
OK… so we have a more efficient way of finding files than simply scraping through standard output of dir. But, what about an analogous construct to the locate command that Hal and Paul talk about above? Well, Windows 2000 and later include the Indexing Service, designed to make searching for files more efficient by creating an index. You can invoke this service at the command line by running:
C:\> sc start cisvc
Windows will then dutifully index your hard drive, making searches faster. What kind of searches? Well, let’s see what it does for our searches using dir:
$ time cmd.exe /c "dir /s /b C:\*vmware* > nul"
real 0m6.312s
user 0m0.015s
sys 0m0.046s
Uh-oh… The Windows indexing service doesn’t help the dir command, whether used this way or in combination with the find command. Sorry, but dir doesn’t consult the index, and instead just looks through the complete file system directory every time. But, the indexing service does improve the performance of the Start–>Search GUI based search. You can control which directories are included in the index via a GUI tool that can be accessed by running:
C:\> ciadv.msc
Also, in that GUI, if you select System–>Query the Catalog, you get a nice GUI form for entering a query that relies on the indexing service. I haven’t found a built-in cmd.exe feature for searching directories faster using the indexing service, but there is an API for writing your own tools in VBS or other languages for querying the index. Microsoft describes that API and the indexing service in more detail here.
Deleting related files
From: COMMAND LINE KUNG FU: PaulDotCom, Ed Skoudis, Hal Pomeranz, byte_bucket
I had been deliberately holding back on this problem because I didn’t want to make things too tough on Ed and that poor excuse for a command shell he’s been saddled with. But since he had the temerity to suggest that Unix wasn’t a “real operating system” back in Episode #11 (who needs to track file creation times anyway?), the gloves have come off.
So today’s problem is as follows: Delete all files whose contents match a given string AND ALSO delete a related, similarly named file in the same directory. For example, you’ve got a lot of spam in your Sendmail /var/spool/mqueue directory and you need to match the spammer’s email address in the qf<queueID> file and then delete both the qf<queueID> file (header and delivery info) and the df<queueID> file (message contents).
Getting the matching file names is just a matter of using “grep -l”, and obtaining the queue ID values from the file names is just a matter of using “cut”:
# grep -l spammer@example.com qf* | cut -c3-
Add a tight loop and you’re done:
# for i in `grep -l spammer@example.com qf* | cut -c3-`; do rm qf$i df$i; done
And, finally, I’ll administer the coup de grace by using xargs instead of a loop:
# grep -l spammer@example.com qf* | cut -c3- | xargs -I'{}' rm qf{} df{}
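If you want to try the pipeline without risking a real mail queue, here is a sketch that runs it in a scratch directory. The queue IDs and file contents are hypothetical, and the directory is created with mktemp rather than being /var/spool/mqueue.

```shell
# Fake mail queue in a throwaway directory (hypothetical queue IDs).
queue=$(mktemp -d)
(
    cd "$queue" || exit 1
    echo "From: spammer@example.com" > qfAAA111
    echo "From: friend@example.org"  > qfBBB222
    echo "spam body" > dfAAA111
    echo "real body" > dfBBB222

    # List matching qf files, strip the two-character prefix to get the
    # queue ID, then remove both files that share that ID.
    grep -l spammer@example.com qf* | cut -c3- | xargs -I'{}' rm qf{} df{}
)
ls "$queue"
```

Only the qf/df pair belonging to the other sender should be left in the listing.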
So, Skodo, think you’re ready to play with the big-time shells?
Ed (aka Skodo) responds:
Hal says he “Didn’t want to make things too tough on Ed…” Well, thank you for your niceties, but easy-to-use and sensical command shells are for wimps. “Big-time shells…” I wonder if we count the number of copies of cmd.exe in the universe and compare it to the number of bash shells, which would come out “big-time”? Still, I do have to confess, cmd.exe is about the most uglified and frustrating shell ever devised by man. But, I can take care of your so-called challenge with the following trivial-to-understand command:
C:\> cmd.exe /v:on /c "for /f %i in ('findstr /m spammer@example.com qf*') do @set stuff=%i & del qf!stuff:~2! & del df!stuff:~2!"
Although an explanation of this really straightforward command probably isn’t necessary (it’s pretty obvious, no?), I’ll go ahead and insert one just for completeness. I’ll start in the middle, work my way through the end, and wrap around to the beginning.
Putting all sarcasm aside, I’m doing a bunch of gyrations in this command to get really flexible string parsing beyond what I can get with normal Windows FOR loops. I start out in the middle by running the findstr command, with the /m option, which makes it find the name of files that contain the string “spammer@example.com” at least one time. I’m looking only through files called qf*. The output of the findstr command will be one qf file name per line. The findstr command will run inside the FOR /F loop because I put it inside of forward single quotes (‘ ‘), with the iterator variable %i taking on the value of each of the lines of the output of findstr.
So far, so good. But, now we get to the fu part here, and I really mean FU. Originally, I considered parsing %i using another FOR /F loop to rip it apart as a string, so I could peel off the qf in front to get the unique part of the file name. However, that won’t work nicely, because FOR /F parsing cannot do substrings. So, I briefly thought about defining the letters q and f as delimiters in my FOR /F so I could parse them off, but the remainder of the file name may have those letters in them as well, which means I would miss some files with my over-exuberant FOR /F q and f delimiters. There must be another way, one that lets us get substrings.
Clearly, we need better parsing of the %i variable. What to do? Well, we can’t apply substring parsing directly to iterator variables of FOR loops, because substring parsing is only available for environment variables. I wish we could just sub-stringify %i, but it doesn’t work. Instead, we can assign its value to an environment variable, which I’ve called “stuff”. Then, we can parse stuff to snip off the first two characters (the q and the f) using !stuff:~2!. I then delete the files referred to with qf!stuff:~2! and df!stuff:~2!.
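For comparison, bash has a similar substring operation on ordinary variables. The sketch below is only an analogy to !stuff:~2!, using a made-up queue file name that deliberately contains extra q and f characters to show they survive.

```shell
stuff="qfn1Bqf42"    # hypothetical queue file name
id="${stuff:2}"      # drop the first two characters, like !stuff:~2! in cmd.exe

printf 'qf%s df%s\n' "$id" "$id"
```

Because the offset is positional, later occurrences of q or f in the name are untouched, which is exactly the problem the delimiter approach could not avoid.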
But, what’s that monstrosity up front with the cmd.exe /v:on /c? Well, cmd.exe does immediate environment variable expansion by default, expanding our stuff variable immediately as the command is invoked. We want delayed expansion, so that stuff can take on different values as our loop iterates. We do that by first invoking a cmd.exe with /v:on to tell it to do delayed environment variable expansion, to execute a command for us (/c), with that command being our FOR loop. All of that nonsense, just to get flexible variable parsing. But, this parsing is pretty useful, especially when combined with FOR /F string parsing. But don’t get me started on that.
So, there you have it. Lots of fun little gems in this one. Thanks for the challenge, Hal. Inspired by your post, I’m now going to install sendmail on a Windows box and write an anti-spam tool using the above command…. NOT!
Special Guest Fu from @jaykul:
@jaykul, a PowerShell master, provided this useful PowerShell command to implement a solution to Hal’s challenge:
#PowerShell> sls spammer@example.com -list -path qf* | rm -path {$_.Path -replace "qf",
"[qd]f"}
@jaykul helpfully notes that sls stands for select string.
Ed comments: It’s amazing how much simpler and more elegant PowerShell is compared to cmd.exe. I only wish we had it 10 years ago, and could rely on it being widely deployed now! Faster, please!
Listing files by inode as a proxy for create time
From: COMMAND LINE KUNG FU: PaulDotCom, Ed Skoudis, Hal Pomeranz, byte_bucket
One of the problems with classic Unix file systems (FFS, UFS, ext[23], etc) is that they don’t track the creation time of files (“ctime” in Unix is the inode change time, not the creation time). However, forensically it’s often very useful to know when a given file was created.
While there’s no way to know the exact creation date of a file from file system metadata, you can use the assigned inode number of the file– because inodes tend to be assigned sequentially– as a proxy to figure out the relative creation dates of files in a directory:
$ ls -li /etc | sort -n
total 4468
1835010 drwxr-xr-x 5 root root 4096 Nov 23 10:04 lvm
1835011 drwxr-xr-x 10 root root 4096 Nov 23 10:04 sysconfig
1835013 drwxr-xr-x 8 root root 4096 Nov 23 10:01 X11
1835014 drwxr-xr-x 2 root root 4096 May 24 2008 rpm
1835018 -rw-r--r-- 1 root root 435 Jul 14 2007 reader.conf
1835019 -rw-r--r-- 1 root root 105 Jul 14 2007 modprobe.conf
...
1837339 -rw-r--r-- 1 root root 2200 Jul 22 2008 passwd
1837348 -rw-r--r-- 1 root root 814 Jul 22 2008 group
1867786 drwxr-xr-x 4 root root 4096 May 24 2008 gimp
1867804 drwxr-xr-x 2 root root 4096 Jul 14 2007 sane.d
1867868 drwxr-xr-x 7 root root 4096 Jul 22 2008 gdm
1867890 drwxr-xr-x 2 root root 4096 Jul 22 2008 setroubleshoot
1867906 drwxr-xr-x 3 root root 4096 Aug 8 2007 apt
1867925 drwxr-xr-x 3 root root 4096 Aug 8 2007 smart
1867929 drwxr-xr-x 5 root root 4096 Dec 11 14:24 raddb
1867954 drwxr-xr-x 10 root root 4096 Dec 15 09:03 vmware
1867972 drwxr-xr-x 2 root root 4096 Aug 8 2007 syslog-ng
1868042 drwxrwsr-x 2 root mailman 4096 Jul 22 2008 mailman
1868075 drwxr-x--- 3 root root 4096 Jul 22 2008 audisp
1900546 drwxr-xr-x 2 root root 4096 Jul 22 2008 purple
1933364 drwxr-xr-x 2 root root 4096 Nov 23 14:08 vmware-vix
2293777 -rw-r--r-- 1 root root 362031 Nov 23 14:04 services
At the top of the output you can see that the inodes are clustered tightly together, indicating these files were probably all created about the same time– typically when the system was first installed. Towards the end of the output, however, you can see other “clusters” of inode numbers corresponding to groups of files that were created around the same time. In this case, these are mostly the configuration directories for software packages I added after the initial OS install.
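Spotting the clusters by eye gets tedious in a big directory, so here is a rough sketch that marks large jumps between consecutive inode numbers. The function name mark_inode_gaps and the threshold of 1000 are assumptions for the illustration, and the sample input is a handful of inode/name pairs taken from the listing above.

```shell
# Read "inode name" lines, sort them numerically, and print a marker
# wherever consecutive inode numbers differ by more than a threshold.
# The default threshold of 1000 is an arbitrary assumption.
mark_inode_gaps() {
    sort -n | awk -v gap="${1:-1000}" '
        NR > 1 && ($1 - prev) > gap { print "--- gap of " ($1 - prev) " inodes ---" }
        { print; prev = $1 }
    '
}

report=$(mark_inode_gaps 1000 <<'EOF'
1835010 lvm
1835011 sysconfig
1837339 passwd
1867786 gimp
1867804 sane.d
2293777 services
EOF
)
printf '%s\n' "$report"
```

On a live system you could feed it real data with something like “ls -i /etc | mark_inode_gaps”, adjusting the threshold to suit the file system.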
Ed Responds:
“…A proxy to figure the relative creation dates of files”? Oh my… If I may indulge in a little trash talk, you’d think that a real operating system would have some better way of tracking file creation times than resorting to inode numbers.
Just to pick an alternative operating system at random off the top of my head, let’s consider… um… Windows. Yeah, Windows.
Oh yes, we have file creation time, which can be displayed using the really obscure dir command.
In all seriousness, by default, the dir command displays file modification date and time. If you want it to display creation time, simply run it with the /tc option. The /t indicates you want to twiddle with the time field (yeah, it stands for “twiddle” ;). The option after it is c for creation date/time, a for last access, or w for last written. For example:
C:\> dir /tc
Lot simpler than Hal’s fu above, and it gets the job done.
Oh, and Hal wanted them sorted. Sadly, we don’t have a numeric sort in Windows, just an alphanumeric one. But, that lament is for another day, because we can sort based on time stamp right within dir, as follows:
C:\> dir /tc /od
The /o indicates we want to provide a sort order, and we’re sorting by date, oldest first. To reverse the order (newest first), use /o-d, with the minus reversing the date sort.





