This post is the fourth in a series of arguments for using relational databases instead of, say, flat files. As I mentioned previously, I'm trying to steer clear from the "normal" reasons one often hears (integrity, transactions, anomaly reduction, yadda, yadda).
Dang near every web application features file uploads. And leaving aside how dang inconvenient it is to manage files through a browser interface -- Can't I just upload the whole folder? Only one file at a time you say? -- there's always the question of where to stash the dang files once their bits reach mother server. It's not a complex design issue, just an annoying one, dangnabbit!
In my own work, I've followed the conventional wisdom: "uploaded files are files after all, so stick 'em with the other files, in the filesystem." All that is then needed is to record each file's path in the webapp's database.
But there were always a few nagging doubts about this method. To begin, what should the uploaded file be named? If it keeps its original name, you can't store it in a directory with other uploads lest there exist another file in that directory with the same name. To work around that, you can create a new directory for each uploaded file to avoid conflicts, naming the directory after some surrogate key value supplied by the database. But creating and deleting all of those directories is a bother; why not just rename the file using that surrogate key value and slip in an HTTP Content-Disposition header to replace the original filename whenever the file is downloaded? Oh, because then the server has this directory with files named 4506, 7912, and 9903, and without a SQL prompt handy, there's no easy way to figure out which ones are dirty-girl JPEGs.
And then there's the whole security bugaboo. Encouraging people to upload random binary strings to your server is like a Bring Your Own Matches shindig at a liquid oxygen factory. Enabling write permissions for the webserver account on an upload directory is as dangerous as it sounds. Any exploitable weakness in any part of that webapp -- or any other webapp running on the server under that account -- and the jig is up. "I just writes 'em, I don't secures 'em," says the webapp developer, hands waving aloft.
The alternative -- storing files as BLOBs in the database -- is pooh-poohed: "inefficient" decry the filesystem boosters, "non-atomic" shout the First Normal Form-ers, and "can't 'check' for porn as easily" wail the OS administrators.
I pooh-poohed along with the rest, and spent hours wrasslin the details of uploaded storage. How many times did I reload the database schema from script but forget to clean out the upload directory?
But now I'm ready to try cramming uploaded BLOBs into the database. Binaries Belong in the Database Too tipped the argument for me. While some of the comments do raise convincing objections to the author's thesis, many of them begin with a hidden "theory suggests..." premise.
I can play that game: recent theory suggests its better to store uploaded files in databases than in the filesystem. Now all I need is a webapp project to prove my theory. Perhaps one of my Database Design students will be so kind.