My data hero, Karen Lopez aka @datachick, is hosting a blog meme for this Friday called “#FailFriday: I was young and didn’t know any better.”
I have made lots of mistakes, but this one still gets to me more than 10 years later.
In 2000 I was working for a small ISV in the library management systems space. We had customers all over the world, including Kuwait. Now most of our customers were librarians, not Oracle DBAs. So, they paid us to manage their systems for them – remotely.
Now, I don’t know if you remember an Internet where ftp’ing megabytes over long distances was a challenge. But in 2000, it definitely was a challenge.
My task at hand was pretty simple:
Upgrade the customer’s 6 Oracle databases from version 7.3.4 to version 8i, remotely over a telnet session, in Kuwait.
I was a cocky 23-year-old at the time. I had a college degree, an entire college class dedicated to database design, and almost a full year of experience under my belt! This new job was very intimidating at first. I was expected to be a DBA, a UNIX systems admin, the Apache and Perl/CGI person, and a jack of all trades for our own product.
I was pretty comfortable with UNIX, as my entire 4 years of college had used Solaris as the primary programming platform for my Computer Science classes. I had picked up Perl pretty quickly as it seemed much easier and more intuitive than C++ and Ada (I never did get Object Oriented programming, which pretty much explains why I’m not a developer), and I was getting more and more comfortable with Oracle. Heck, they had even tasked me to write an operations manual for our Oracle customers.
So when they asked me to perform this upgrade, it was a big deal, but I had done it before several times with other customers.
The process to upgrade these servers went something like this:
- Wake up early or stay up late to FTP the new Oracle RDBMS server software to the Kuwaiti servers
- Export the data – or take a DMP (giggle)
- Shut down the database
- Take a full backup
- Archive to tape
- Install Oracle 8i
- Create new database
- Import the data from the old database to the new one
- Delete the staging software and old database
Now I did all of these steps save the ‘archive to tape’ piece. That was taken care of by the customer as they could actually put the tape physically in the box and run a script. The rest was on me. I had managed to do this successfully for 5 of the 6 servers when I really stuck my foot in it (that’s slang for royally screwed up.)
Hold on Jeff, why would you delete the staging software and old database right away?
Remember in the time before time, when the internet was slow and storage was expensive? Also, this was a library – even though they were in Kuwait, they still had a limited IT budget. There was barely enough room to un-TAR the software for me to even install it, much less leave duplicate copies of the database lying around.
Jeff, one more thing, why didn’t you just upgrade the actual database?
I could have done that. But I wanted to build the things from scratch. Mostly I remember doing that because I thought it was more fun, and I could brag about it later…or I was just more comfortable doing it that way.
The Epic Massive Fail
I had just finished getting the last database upgraded and ready to go on the server. So the only thing left to do was to remove the old files. Here’s an awesome UNIX command that any experienced person has a huge amount of respect for:
rm -rf
And when I say ‘respect’, I mean like how you would respect the power and capabilities of a loaded firearm.
‘rm’ does what it sounds like. It removes or deletes files off the filesystem. The ‘-rf’ part is a pair of flags, or options, for the command. ‘r’ is for ‘recursive’, meaning it will walk the entire directory tree down. ‘f’ is for ‘force’, as in ‘do not prompt me for each and every file that is to be deleted, just delete it all!’
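For anyone who wants to see those two flags in action without risking anything, here is a minimal sketch that only ever touches a throwaway temp directory. The directory and file names (old_db, users.dbf and so on) are made up for illustration; they are not the ones from the Kuwait servers.

```shell
#!/bin/sh
# Sandbox demo: everything happens inside a throwaway temp directory.
sandbox=$(mktemp -d)

# Build a small nested tree (hypothetical names, for illustration only).
mkdir -p "$sandbox/old_db/data"
touch "$sandbox/old_db/control.ctl" "$sandbox/old_db/data/users.dbf"

# Plain rm, without -r, refuses to remove a directory and returns non-zero.
if rm "$sandbox/old_db" 2>/dev/null; then
    plain_rm="removed"
else
    plain_rm="refused"
fi
echo "rm without -r: $plain_rm"

# -r walks the whole tree; -f suppresses prompts and missing-file errors.
rm -rf "$sandbox/old_db"

if [ -e "$sandbox/old_db" ]; then
    tree_state="still there"
else
    tree_state="gone"
fi
echo "after rm -rf: $tree_state"

# Clean up the now-empty sandbox itself.
rmdir "$sandbox"
```

Note that nothing asked for confirmation at any point, which is exactly why the command deserves the respect.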
Are you figuring out what I did wrong?
Yup, I issued this command in the WRONG DIRECTORY. I wiped out all the work I had just done. In the best case scenario this would have meant the system would be down for maybe 8-12 hours instead of 4. We just had to get them to put that tape back in the server so I could restore the backup and start over.
But, I didn’t do the tape backup, they did.
So I sent an email and asked them to do the recovery.
Oh, and I did the walk of shame to my boss and told them what had happened.
Four months later they found the tape and I was able to finish. I had no idea what they did to let folks check out their books and manage their catalog. I doubt they closed the library, but they could have and it would have been mostly my fault.
To this day the first thing I do when entering a UNIX environment is change the prompt to show the full directory path. And the second thing I do is check the directory 5 or 10 times before I even think of issuing that command again.
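Those two habits can be sketched roughly as follows. This is only an illustration, not the exact setup I use: the in_dir helper is a hypothetical name I made up for this post, the staging directory is a temp directory standing in for a real one, and the prompt line assumes a shell (sh, ksh, bash) that expands variables in PS1 each time the prompt is drawn.

```shell
#!/bin/sh
# Habit 1: put the full working directory in the prompt. Single quotes keep
# ${PWD} unexpanded here, so the shell re-evaluates it at every prompt.
PS1='${PWD} $ '

# Habit 2 (hypothetical helper): succeed only if the current directory is
# exactly the expected one, so a destructive command can be chained with &&.
in_dir() {
    [ "$(pwd -P)" = "$(cd "$1" 2>/dev/null && pwd -P)" ]
}

# Stand-in for a staging directory holding files that are safe to delete.
staging=$(mktemp -d)
touch "$staging/installer.tar"

# From the wrong directory the check fails, so rm never runs.
cd / || exit 1
in_dir "$staging" && rm -rf ./installer.tar
if [ -e "$staging/installer.tar" ]; then
    wrong_dir_result="file kept"
else
    wrong_dir_result="file deleted"
fi

# From the right directory the check passes and the file is removed.
cd "$staging" || exit 1
in_dir "$staging" && rm -rf ./installer.tar
if [ -e "$staging/installer.tar" ]; then
    right_dir_result="file kept"
else
    right_dir_result="file deleted"
fi

echo "wrong directory: $wrong_dir_result"
echo "right directory: $right_dir_result"

# Leave the temp directory before removing it.
cd / && rmdir "$staging"
```

The prompt only helps if you actually read it, which is why the second habit matters more than the first.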
10 Comments
Way, way back after I first started with Oracle, around 1989, we had Oracle running on a 4 user Prime mini-computer up in Twin Cities. I was down in Denver and we had to telnet for OS commands and FTP for file transfers, and disk space was always low. We also of course had the old client-server setup over X.25 lines. Several times I had to transfer large files to be imported but there wasn’t enough disk space. So I started looking around trying to find some old files I could delete to make some room. Having just left the old dBase II world, I was surprised to see a bunch of large files with the dBase .dbf file extension, so I wondered why these large dBase files were loaded up on the server. Getting rid of them would certainly give me the room I needed for my FTP session, so I deleted all of the .dbf’s and ‘Voila!’, the FTP session went smoothly. But for some reason the database had suddenly stopped working, so I had to get on the phone and tell the system admin in TC the database wasn’t running (which back in those days on PrimeOS wasn’t unusual). Eventually they got it running again, and things went smoothly again until the next time I needed room for another FTP.
I wound up doing this to them about 3 different times until I had finally taken one of the Oracle Admin courses and discovered that the admin could specify a file extension that wasn’t simply .ora. I still carry that shame to this day, but on the positive side, our backup and restore procedures worked pretty good. :)
Oh man, that’s painful to read. But yeah, kudos to your admins for having a great recovery mechanism, but they probably should have done more digging after the 2nd occurrence to figure out exactly what had happened, yeah?
Hi Jeff
Found this when searching for ‘prevent accidental oracle import’ which I just did ‘coz turns out someone has a default export dumpfile lying around in the default datapump directory … arrggghhh
I know DBAs are supposed to know what they are doing when they run commands like these but I wish the next version of impdp has a default option where it prompts you for a Y or N before running the import :)
the last time I had to actually use a command line was…a half a lifetime ago.
part of me wants to really know unix and its derivatives – the same part that appreciates Oriental gardens, clean white shirts and Gregorian Chants. But though it annoys me, windows does make it less possible to really hurt myself and others…
You’d be surprised how much damage you can do by clicking a button Charles! That’s probably the #1 argument against GUIs – makes it too easy for the inexperienced to mess something up. That’s why I stress the proper amount of security setup so folks can only do what you want them to be able to do.
Thanks for the comment, your Oriental garden comment really cracked me up :)
Tape? What is this “tape” you speak of?
I admit, one time I went to delete the test oracle home on the test box, clicked on wrong window, and deleted the production home on the production box. Funny, didn’t stop people from working, though it did stop new logins.
It ain’t how you screw up, it’s how quick you fix it. 4 months, haha.
Ow…..
Thanks Norm! It turns out my brother did the same thing, so it’s nice to know you have company in the ‘Doh!’ club :)
Jeff, been there done that too, same command. Sigh!
I was told to install SE oracle and remove the mistakenly installed EE versions. Everything was off of /opt/oracle/product. I installed both versions of SE and started to remove the EEs.
cd /opt/oracle/product/9.2.0.6/
rm -Rf *
What could possibly go wrong?
How about not spelling /opt/oracle/product/9.2.0.6/ correctly? “orcale” – oops. I was in /opt/oracle/product (or maybe /opt/oracle) at the time, so the “cd” failed but the “rm -Rf *” did not. I too walked the walk of shame to my boss and in a crowded office full of DBAs, I admitted to being a plank!
Luckily we had backups.
Cheers,
Norm.
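The failure mode Norm describes, a “cd” that fails while the “rm” after it still runs, can be guarded against by chaining the two commands with &&. A minimal sketch, assuming a throwaway temp directory that mimics the layout in his comment; the se_install directory and the file names are hypothetical, and the “orcale” typo is reproduced on purpose:

```shell
#!/bin/sh
# Build a sandbox that mimics <something>/oracle/product/<version>.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/oracle/product/9.2.0.6" "$sandbox/oracle/product/se_install"
touch "$sandbox/oracle/product/9.2.0.6/old.bin" \
      "$sandbox/oracle/product/se_install/new.bin"

# Start one level up, where a stray "rm -rf ./*" would wipe everything.
cd "$sandbox/oracle/product" || exit 1

# Misspelled path: cd fails, and because of && the rm is never executed.
cd "$sandbox/orcale/product/9.2.0.6" 2>/dev/null && rm -rf ./*

if [ -e "$sandbox/oracle/product/se_install/new.bin" ]; then
    typo_result="nothing deleted"
else
    typo_result="wrong directory wiped"
fi
echo "after the typo: $typo_result"

# Correct path: cd succeeds, so rm runs inside the intended directory only.
cd "$sandbox/oracle/product/9.2.0.6" && rm -rf ./*

if [ ! -e "$sandbox/oracle/product/9.2.0.6/old.bin" ] &&
   [ -e "$sandbox/oracle/product/se_install/new.bin" ]; then
    correct_result="only the old version removed"
else
    correct_result="unexpected state"
fi
echo "after the correct cd: $correct_result"

# Leave the sandbox before deleting it.
cd / && rm -rf "$sandbox"
```

Writing the two commands on separate lines, as in the original session, gives the shell no reason to skip the second one when the first fails.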
Been there too.
It’s a rite of passage.
Show me a DBA who has never had to recover from tape and I’ll show you a COWARD. :)