
Automating File Uploads with SSH and Ruby

Posted by Matthew Bass on May 29, 2007

Isn't it funny how computer skills, just like the computers themselves, evolve at such a rapid pace? Some of us who find ourselves doing hard-core computer programming today got our start with HTML and CGI in the enthralling days of the early web. I happen to be one of those people. If you've also dabbled in the wonderful world of pseudo-coding called "web design" you undoubtedly recognize that, these days, most designers fall into one of two camps. The first camp uses a WYSIWYG editor like Dreamweaver to design and publish a web page. The second camp uses a text editor like Emacs or Vim to code HTML by hand and an FTP client to upload the finished page to a web server for the world to see and, hopefully, appreciate. The first camp sacrifices flexibility for convenience, and the second, convenience for flexibility. Neither way is wrong, but neither way is totally right either.

During my early years of web design I fell into the first camp. More recently I have embraced my Notepad-wielding destiny and joined the ranks of those who prefer to "do it by hand." I've enjoyed the additional flexibility afforded by doing it this way, but the cost in ease of use hasn't been low. Installing a web server locally and editing the files directly was time-consuming and didn't port well, so I usually found myself changing a page, switching over to my FTP client, uploading the file, switching to my browser, and refreshing to view the updated page. Not a quick thing to do, and it got very old after a while. It was a process that practically screamed "AUTOMATE ME!" So last summer, I fired up my favorite text editor and decided to do just that.

I wanted a program that did everything I was doing by hand, but faster and more accurately. I decided to use Ruby to write an automation script. I wanted my code to be short and maintainable since I would probably be adding features to it later on. Ruby (being dynamically typed) makes it simple to write compact code that can be extended with a minimum of fuss. It's a scripting language, but it's also object-oriented. This enabled me to avoid code duplication more elegantly than I could have with a procedural language. Ruby also has a decent open source SFTP library available (Net::SFTP[1]) so I wasn't forced to write my own. (SFTP is a network protocol that enables files to be transferred securely.)

In this article, I'll guide you step-by-step through the process of creating your own version of this program. Complete source code examples will be included, with line-by-line analysis of what the code is doing. I invite you to join in and experience how easily Ruby can automate routine parts of your work day.

Requirements

Our program has one basic requirement: it must connect to a remote SFTP server and upload our files. However, we'd also like it to only upload files which have changed locally, and automatically recurse into subdirectories while checking for files to upload.

The planned flow of the script is:

  1. Establish an SFTP connection with the remote server.
  2. List all files and subdirectories in the local directory.
  3. Compare the timestamps on the files in the local directory to the timestamps on the files in the remote directory and upload only those files that have been changed locally.
  4. Recurse into any local subdirectories and repeat from step two, creating remote subdirectories as needed.

It's clear that steps one through three are easily handled by Ruby's built-in objects and Net::SFTP. Step four is very interesting. While Ruby's Dir class does offer a way to recurse into subdirectories, it's not quite as obvious as we would like. Since Ruby allows for easy extension of the language, why not write our own method? Not only will it be a lot of fun, but we'll also learn how easy it is to extend Ruby.
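As a rough illustration of what "extending Ruby" could look like here, the sketch below reopens the Dir class and adds a recursive walker. The method name each_file_recursive is hypothetical (it is not part of Ruby), and the sample tree is built in a temporary directory purely for demonstration. The final script in this article relies on the standard Find module instead.

```ruby
require 'tmpdir'
require 'fileutils'

# Reopen the built-in Dir class and add a hypothetical helper.
# each_file_recursive is NOT part of Ruby; it is defined here for illustration.
class Dir
  def self.each_file_recursive(path, &block)
    Dir.foreach(path) do |entry|
      next if entry == '.' || entry == '..'   # skip the self and parent entries
      full = File.join(path, entry)
      if File.directory?(full)
        each_file_recursive(full, &block)     # descend into the subdirectory
      else
        yield full                            # hand plain files to the caller
      end
    end
  end
end

# Build a small throwaway tree and walk it, returning relative paths.
def demo_recursive_listing
  Dir.mktmpdir do |root|
    FileUtils.mkdir_p(File.join(root, 'images'))
    FileUtils.touch(File.join(root, 'index.html'))
    FileUtils.touch(File.join(root, 'images', 'go.gif'))

    found = []
    Dir.each_file_recursive(root) { |f| found << f.sub(root + '/', '') }
    found.sort
  end
end

p demo_recursive_listing
```

Reopening a core class like this is convenient in a short script, though in a larger program a standalone method or module would avoid surprising other code that uses Dir.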

Dependencies

Aside from Ruby itself, our only dependencies are the Net::SFTP and Net::SSH[2] libraries. Fortunately for us, both are packaged as Gems[3]. Assuming you have Ruby installed on your local machine and the gem command is on the path, bring up a command prompt and type:

gem install net-ssh --include-dependencies
gem install net-sftp --include-dependencies

Let's Code!

Now we're ready to start writing some code. Let's try connecting to the remote server, listing all files in a given local directory, and closing the connection. We'll do this using the Net::SSH and Net::SFTP interfaces and Ruby's Dir class.

1: require 'net/ssh'
2: require 'net/sftp'
3: Net::SSH.start('server', 'username', 'password') do |ssh|
4:   ssh.sftp.connect do |sftp|
5:     Dir.foreach('.') do |file|
6:       puts file
7:     end
8:   end
9: end

Let's go through this line by line:

  1. Require the Net::SSH library.
  2. Require the Net::SFTP library since we'll be using both.
  3. Establish an SSH session with the given username and password. (Additional arguments such as a proxy server can also be given here. See the API documentation for more information.)
  4. Open an SFTP connection to the remote server.
  5. The Dir class lists all of the files in the current working directory.
  6. Print out each file name.
  7. Exit the file listing loop.
  8. - 9. Close the SFTP and SSH connections.

After executing the script on my system, the following output is produced:

.
..
cgi-bin
etc
logs
public_html
temp

Looking at our original list of requirements, we've managed to finish steps one and two with nine lines of code.

Let's move on to step three, comparing the timestamps of the listed files with the remote timestamps, and uploading only those files which have changed. (For our purposes, we define a file as having changed if the local timestamp is later than the timestamp on the remote server.) Comparing timestamps with Ruby is really easy. In fact, the comparison itself can be made with just one line of code.

Let's take a look at the script now:

1: require 'net/ssh'
2: require 'net/sftp'
3: Net::SSH.start('server', 'username', 'password') do |ssh|
4:   ssh.sftp.connect do |sftp|
5:     Dir.foreach('.') do |file|
6:       next if File.stat(file).directory?
7:       begin
8:         local_file_changed = File.stat(file).mtime > Time.at(sftp.stat(file).mtime)
9:       rescue Net::SFTP::Operations::StatusException
10:         not_uploaded = true
11:       end
12:       if not_uploaded or local_file_changed
13:         puts "#{file} has changed and will be uploaded"
14:         sftp.put_file(file, file)
15:       end
16:     end
17:   end
18: end

Let's go through this line-by-line:

1. - 2. Require Net::SSH and Net::SFTP.
3. - 4. Establish our SSH session and SFTP connection.
5. Loop over files in the current working directory.
6. Since we can't handle recursing into directories yet, check if the current file is really a directory. If so, skip to the next iteration of the loop.
7. - 11. Since the remote file may not exist yet, we need to catch the exception that Net::SFTP will throw when we try to determine its timestamp. We set two flags: one indicates if the local file has changed and needs to be uploaded, the other indicates if the remote file doesn't exist yet.
12. - 13. If the local file has not yet been uploaded or is newer than the remote file, print a line indicating that the file is being uploaded.
14. Upload the local file to the remote server.
15. - 18. Close the if statement, file loop, SFTP connection, and SSH session.
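The decision made on lines 7 through 12 can be tried out without a server. The sketch below is only an approximation of that logic: RemoteFileMissing is a hypothetical stand-in for the exception Net::SFTP raises, and the lambdas stand in for sftp.stat by returning a modification time in epoch seconds.

```ruby
# Hypothetical stand-in for the exception Net::SFTP raises when the
# remote file does not exist; defined here so the sketch runs offline.
class RemoteFileMissing < StandardError; end

# Decide whether a file needs uploading.
# local_mtime - Time of the local file's last modification
# remote_stat - callable returning the remote mtime as epoch seconds,
#               or raising RemoteFileMissing (stands in for sftp.stat)
def needs_upload?(local_mtime, remote_stat)
  local_mtime > Time.at(remote_stat.call)
rescue RemoteFileMissing
  true  # never uploaded, so it always needs to go up
end

now = Time.now
older_remote   = -> { (now - 60).to_i }          # remote copy is a minute older
newer_remote   = -> { (now + 60).to_i }          # remote copy is a minute newer
missing_remote = -> { raise RemoteFileMissing }  # remote copy does not exist

puts needs_upload?(now, older_remote)
puts needs_upload?(now, newer_remote)
puts needs_upload?(now, missing_remote)
```

Folding the two flags into a single method that returns true or false is a design choice made for this sketch; the script above keeps them as separate variables.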

Now we've completed requirement three. We have a script that will log in to a remote server and upload all files which have changed on the local system, but the script will only do this for a single directory. It can't navigate into subdirectories to find additional files to upload. It also can't handle creating missing directories on the remote server. We need to cover both of these situations before declaring our script finished.

Recursion

Let's finish up our script by descending into subdirectories and handling the situation where the directory containing the file being transferred may not exist on the remote server:

1: require 'net/ssh'
2: require 'net/sftp'
3: require 'find'
4:
5: local_path = 'C:\public_html'
6: remote_path = '/usr/jsmith/public_html'
7: file_perm = 0644
8: dir_perm = 0755
9:
10: puts 'Connecting to remote server'
11: Net::SSH.start('server', 'username', 'password') do |ssh|
12:   ssh.sftp.connect do |sftp|
13:     puts 'Checking for files which need updating'
14:     Find.find(local_path) do |file|
15:       next if File.stat(file).directory?
16:       local_file = file  # Find.find already yields the full path
17:       remote_file = remote_path + local_file.sub(local_path, '')
18:
19:       begin
20:         remote_dir = File.dirname(remote_file)
21:         sftp.stat(remote_dir)
22:       rescue Net::SFTP::Operations::StatusException => e
23:         raise unless e.code == 2  # 2 means "no such file"
24:         sftp.mkdir(remote_dir, :permissions => dir_perm)
25:       end
26:
27:       begin
28:         rstat = sftp.stat(remote_file)
29:       rescue Net::SFTP::Operations::StatusException => e
30:         raise unless e.code == 2
31:         sftp.put_file(local_file, remote_file)
32:         sftp.setstat(remote_file, :permissions => file_perm)
33:         next
34:       end
35:
36:       if File.stat(local_file).mtime > Time.at(rstat.mtime)
37:         puts "Copying #{local_file} to #{remote_file}"
38:         sftp.put_file(local_file, remote_file)
39:       end
40:     end
41:
42:
43:     puts 'Disconnecting from remote server'
44:   end
45: end
46:
47: puts 'File transfer complete'

Wow! That's substantially longer than our previous revisions, but this is mostly due to the exception checking which must be done to handle remote directories that may be missing. This is a limitation of the Net::SFTP library. The put_file method throws a nasty exception if we attempt to upload to a remote directory which doesn't exist. It would be ideal for the put_file method to handle this case by automatically creating the missing parts of the file's directory tree. Modifying the method is outside the scope of this article, however, so I leave it to you as an exercise.
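One possible starting point for that exercise is to work out which directories would have to exist before a file can be uploaded, shallowest first. The sketch below is pure path arithmetic and does not call Net::SFTP at all; directory_chain is a hypothetical helper name, and it assumes the target directory really does sit beneath the given root.

```ruby
# Hypothetical helper: list every directory between root and dir,
# shallowest first, so each one can be checked and created in order.
# Assumes dir is located somewhere beneath root.
def directory_chain(dir, root)
  # Strip the root prefix (and the slash after it) to get the relative part.
  relative = dir.sub(/\A#{Regexp.escape(root)}\/?/, '')
  chain = []
  current = root.chomp('/')
  relative.split('/').each do |segment|
    next if segment.empty?          # ignore doubled slashes
    current = "#{current}/#{segment}"
    chain << current
  end
  chain
end

p directory_chain('/usr/jsmith/public_html/images/icons', '/usr/jsmith/public_html')
```

Each entry in the returned list could then be passed, in order, through the same stat-then-mkdir check the script already performs for a single directory.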

Let's go through our new code line-by-line:

1. - 3. Require the libraries the script depends on.
5. - 6. Define variables for the local and remote directories which we will be comparing and uploading across.
7. - 8. Define variables for the default file and directory permissions we will need to assign to files and directories which don't yet exist on the remote server.
10. - 13. Establish an SSH session and SFTP connection to the remote server, printing progress messages along the way.
14. Walk the local directory tree. Find.find yields every file and directory beneath local_path, descending into subdirectories for us.
15. Skip to the next iteration if the current item is a directory and not a file.
16. Set the local_file variable to the path of the local file we're currently visiting.
17. Set the remote_file variable to the file's destination on the remote server, prefixed with the value of remote_path so that the file we're uploading is placed in the correct location and not just in the user's home directory.
19. - 25. This is another nasty bit of code which wouldn't need to be here if Net::SFTP were a little more intelligent about file handling. We need to check if the remote directory we're going to upload to already exists. To do that, we call sftp.stat(..), passing it the name of the directory to check. If stat throws an exception with a code property equal to 2 (the SFTP status for "no such file"), the remote directory doesn't exist, so we create it, assigning it the correct permissions. Any other exception is re-raised.
27. - 34. More nasty code to check if the remote file we're uploading to exists. We don't need this check in order to create the remote file, because it will be created automatically when we upload the local file. We need it so we can set the appropriate permissions on the remote file if it's brand new. If we don't do this, the default UNIX permissions will be used, which may prevent us from uploading to the file later. Once a new file has been uploaded, we skip to the next item.
36. - 39. Finally, if we've made it this far, it means the remote directory and file we're trying to upload to both exist. We compare the modified time of the local file to the modified time of the remote file and, if the local file is newer, we upload it.
40. - 47. End all of the blocks we've opened, closing our SFTP connection and our SSH session on the remote server, and print a final message.
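The path mapping on line 17 is easy to get subtly wrong, so it can help to try it in isolation. The following is a minimal sketch of the same idea; remote_path_for is a hypothetical helper name and the sample paths are made up for the demonstration.

```ruby
# Hypothetical helper mirroring line 17 of the script: swap the local
# root for the remote root, keeping the rest of the path unchanged.
def remote_path_for(local_file, local_root, remote_root)
  unless local_file.start_with?(local_root)
    raise ArgumentError, "#{local_file} is not under #{local_root}"
  end
  relative = local_file[local_root.length..-1]  # e.g. "/images/go.gif"
  remote_root + relative
end

puts remote_path_for('/home/me/site/images/go.gif',
                     '/home/me/site',
                     '/usr/jsmith/public_html')
```

Unlike a bare String#sub, this version refuses paths that don't begin with the local root instead of silently producing an odd remote path.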

This is what the output of the script looks like when it's executed:

Connecting to remote server
Checking for files which need updating
Copying D:/html/index.php to /home/public_html/index.php
Copying D:/html/media.php to /home/public_html/media.php
Copying D:/html/contact.php to /home/public_html/contact.php
Copying D:/html/images/go.gif to /home/public_html/images/go.gif
Copying D:/html/images/stop.gif to /home/public_html/images/stop.gif
Copying D:/html/include/menu.php to /home/public_html/include/menu.php
Disconnecting from remote server
File transfer complete

We're done! We now have a fast, easy way to upload files from a local directory tree of any depth to a remote server. The script is smart enough to create directories that don't exist, and it's also smart enough to only upload files that have actually changed.

Future Enhancements

While the "bare bones" version of the script we've developed is quite useful as is, there are several enhancements that could be made to the script with just a little more work:

  • Handle local deletion of files and directories. Our current script doesn't handle removing files and directories remotely if they have been deleted locally. It might be useful to optionally prompt the user before removing remote files to make sure something important isn't accidentally deleted.
  • Add additional checks to determine if a file has changed. Comparing timestamps is fine and dandy, but why not compare file sizes as well? This would be a useful check to have in place when connecting to servers that may not have an accurate system time set.
  • Log a record of which files were uploaded. I added statements to print to standard out, but why not use a more sophisticated logging mechanism to keep track of which files have been uploaded and when? (This would be important if you decided to schedule the script to run on a daily or weekly basis.)
  • Rewrite the script using Capistrano[4]. Jamis Buck's excellent framework for writing deployment recipes would be a good choice for a more permanent solution that could be reused across projects.
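The second and third enhancements above could be sketched roughly as follows. Nothing here talks to a server: RemoteAttrs is a hypothetical stand-in for whatever attributes a remote stat call returns, changed? and record_upload are made-up helper names, and the log is written to an in-memory StringIO so the example leaves no files behind.

```ruby
require 'logger'
require 'stringio'

# Hypothetical value object standing in for the attributes a remote
# stat call would return (size in bytes, mtime as epoch seconds).
RemoteAttrs = Struct.new(:size, :mtime)

# A file counts as changed if its size differs or it is newer locally.
def changed?(local_size, local_mtime, remote)
  local_size != remote.size || local_mtime > Time.at(remote.mtime)
end

# Record a completed upload; the transfer itself is left out of this sketch.
def record_upload(logger, local_file, remote_file)
  logger.info("uploaded #{local_file} to #{remote_file}")
end

log = StringIO.new          # swap for a file path to keep a permanent record
logger = Logger.new(log)

remote = RemoteAttrs.new(120, 1_000)
if changed?(150, Time.at(900), remote)  # older locally, but the size differs
  record_upload(logger, 'index.php', '/home/public_html/index.php')
end

puts log.string
```

Checking the size alongside the timestamp catches the case where the server clock is ahead of the local one, although two different files of identical size would still slip through.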

Also, while not truly an enhancement, the Net::SSH library does support the use of public key authentication. Fire up PuTTY's Pageant application (see http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html), add your key, and remove your password from the Net::SSH.start statement in our script. You may now upload files without storing your password in plain text and without being forced to enter it every time you connect to the remote server. Brilliant!

Conclusion

Over time, this script has saved me dozens of hours that I would have spent manually twiddling with an FTP GUI, or constantly changing directories via an annoying command-line FTP program. I hope you find the script equally useful in your own work. If you do, I invite you to contact me through my web site (www.matthewbass.com) and let me know. I'd also be interested in hearing from you if you have additional suggestions for enhancements, or if you spotted a good refactoring that could eliminate a line or two of code. Happy Ruby-ing!

Footnotes

1. Net::SFTP is a Ruby API for transferring files securely over SFTP. It is part of Net::SSH.

2. Net::SSH is a Ruby API for accessing resources via a secure shell. http://rubyforge.org/projects/net-ssh/

3. Gem is a Ruby package manager. http://www.rubygems.org

4. Automated application deployment with Ruby. http://manuals.rubyonrails.com/read/book/17


Good Example by Bruce Zuo

Thanks for the good write-up. But if you only care about getting the job done, wouldn't rsync be a good solution?

Re: Good Example by Matthew Bass

Absolutely. But it's more fun to write it in Ruby. :)

Re: Good Example by Alex Popescu

You may be required to have something similar in your app. And your app is required to work on a platform without rsync or at least to not require external dependencies that may be missing from the system. So, you are left to creating something, and this article shows you how to do it.

./alex
--
.w( the_mindstorm )p.
________________________
Alexandru Popescu
Senior Software Eng.
InfoQ TechLead&CoFounder

Re: Good Example by Phi Sanders

FYI to other slack hackers trying to follow along :

I had to add "require 'rubygems'" as the first line of code to get net/ssh and net/sftp to load, presumably because I used the "ports" command on Mac OS X to set up my ruby/gems environment...

ERROR: Error installing gem net-ssh[.gem]: buffer error in Win XP by Raru Panta

I got "ERROR: Error installing gem net-ssh[.gem]: buffer error" while trying to install net-ssh and net-sftp on Windows. I installed Ruby via the latest one-click installer, version 1.8.4.0. After searching on Google I found a comment on Jamis Buck's blog (weblog.jamisbuck.org/2007/5/10/net-ssh-1-1-1) with the solution.
After running the following command
C:\>gem update rubygems-update

and then gem install for net-ssh and net-sftp,
both gems installed successfully.


Thanks for great article

Nasty code by Kris Leech

Shame about the two bits of nasty code otherwise this would make a great introduction to Ruby.
Great article all the same!

There are several serious problems with this script. by David Richards

The first one is that sftp.mkdir(remote_dir, :permissions => dir_perm) is _wrong_; it should be sftp.mkdir(remote_dir, :mode => dir_perm)

This will get by on some platforms, but not on others like Ubuntu Feisty. I'm really not sure why. #setstat is the same way; it should be :mode.

Also, the way in which this recurses will break for many people. If you have a file two directories down from the directory you started recursing from, and the in-between directory has not been created yet, this script will break.

Re: There are several serious problems with this script. by David Richards

Also it's not very scalable because it's calling stat on the remote directory for every single file, and then on each remote file. This is really slow. This is from the net-sftp FAQ:

handle = sftp.opendir("/usr/lib")
items = sftp.readdir(handle)
items.each do |item|
  puts item.filename
  puts item.longname
  p item.attributes # permissions, atime, etc.
end
sftp.close_handle(handle)

So if you built a hash (or something similar) from each directory listing and checked each file against it, rather than calling stat every time, it could scale.

I'm working on a rails backup system that uses this technique and will post code when it gets good enough.

thanks by Richard Finegan

thanks, this article was very helpful for what i was doing

Re: Good Example by janna fierst

This example was great- I've never done anything with ruby before and I was able to modify it for my needs in two days, saving me hours and hours of tedious file uploads and downloads. I had to change a few things though, the biggest one being that sftp.put_file() doesn't seem to be supported anymore and I had to use ssh.sftp.upload() instead. Maybe it's something with the new versions of net::ssh and net::sftp? Thanks for the great intro to ruby!

Improvements by Dominic Rose

Thanks for this. I was learning how to use SSH and this is just what I needed!

Now I have to work on this to make it handle something else:
using a zip archiver with `` commands and zip only the files with a timestamp > X.
So it could be really fast if I disable outputs like "Copying D:/html/images/stop.gif to /home/public_html/images/stop.gif", and it would only send one file before asking the server to unzip it.
Thanks very much. Maybe soon I will only need a keyboard shortcut to synchronise everything I need.


InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.