propeller_beanie posts

My Stack Overfloweth

20080808.friday   comments=2   propeller_beanie  

Early this morning (I was up at 4am to watch the Olympic opening ceremonies), I received an invitation to the beta of Stack Overflow, a Q&A site for computer programmers.

Since that time, I’ve already amassed a total of 95 nondenominated points and five “badges” (one, just for filling in my personal info). All that for asking four questions and jotting ten answers.

I’ve been awaiting this site for some time now. Founded by Jeff Atwood of Coding Horror and Joel Spolsky of Joel on Software, Stack Overflow is akin to sites like expertsexchange but without the double-entendre domain name, hideous appearance, and annoying registration hurdles. (Block the “experts” from your Google search results with Greasemonkey and the Google Filter.)

The content seems a little Windows-centric thus far, but I asked a Linux question (and then provided my own answer: more points!) and got some responses from folks who clearly know their way around a bash prompt.

From the looks of the site — and the fact that I was invited — I’d say that it’ll be ready to go public pretty soon. At first, I suspect it’ll be a good place to find answers since the point whores will be out in force trying to reach #1. After that, it’ll depend on the same type of prolific and tireless contributors that make Wikipedia so darned useful.

Can’t read this? Or anything else on the ‘net? Have a ZoneAlarm firewall? Oh snap!

20080710.thursday   comments=nil   propeller_beanie  

Carole’s Windows laptop suddenly refused to surf the web last night, right in the midst of her prolific flurry of blog postings. I started down my normal diagnostic checklist:

  1. Can you view my blog on the spare bedroom server? Yes, so the wireless is hunky-dory.
  2. Can you ping the IP of the WHTV NorthwesTel gateway downtown? Yes, so our cable connection is A-OK.
  3. Can you view the College’s website? No, so something beyond downtown is bolloxed.
  4. Can you ping the IP of one of the College’s servers? Yes, so the route is fine and it’s DNS that is kaput.
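
The checklist boils down to a small decision table. Here is a minimal sketch in bash, assuming each check has already been answered with a plain yes or no. The function name diagnose and the result labels are invented for illustration; they are not part of any real tool.

```shell
# Hypothetical helper: map the four yes/no answers from the checklist
# to the most likely culprit. Arguments, in checklist order:
#   $1 local server page loads     $2 gateway IP answers ping
#   $3 remote site loads by name   $4 remote server IP answers ping
diagnose() {
    local lan="$1" gateway="$2" by_name="$3" by_ip="$4"
    if [ "${lan}" != yes ]; then
        echo "wireless"
    elif [ "${gateway}" != yes ]; then
        echo "cable connection"
    elif [ "${by_name}" = yes ]; then
        echo "nothing obvious"
    elif [ "${by_ip}" = yes ]; then
        # Reachable by number but not by name.
        echo "name resolution"
    else
        echo "upstream network"
    fi
}

diagnose yes yes no yes   # prints: name resolution
```

Note that "name resolution" only says where the symptom shows up on this one machine. It cannot tell a broken name server from something local getting in the way of lookups.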

But my Ubuntu laptop had no trouble surfing the entirety of the webosphere. So how could DNS be the problem, since we both resolve domain names using the same service (WHTV’s name servers, proxied by dnsmasq in the spare bedroom)?

Given that we’re working with Windows here, rebooting seemed a sensible tactic, but to no avail. Other random technical incantations produced no better mojo. I finally hit on the idea of disabling the ZoneAlarm firewall.

Presto-changeo, Alakazaam, Walla Walla Washington: it works!

Visiting ZoneAlarm’s site revealed a largish confirmation of this particular gremlin:

[Image: ZoneAlarm]

Apparently a recent Microsoft Windows update (KB951748) conflicts rather dramatically with ZoneAlarm, effectively shutting down all internet access (via something to do with DNS).

ZoneAlarm’s suggestion is to uninstall the update, which does work, but leaves a sour aftertaste. I may try the free Comodo Pro firewall as a more permanent solution.

Mac and Linux users, perhaps the only ones who can read this given ZoneAlarm’s popularity, can continue to rest easy, although an operating system-independent flaw in DNS will mean patches for everyone.

(I’m not of the generation that can convincingly declare “oh snap”, but I sure do enjoy pretending.)

My name is Dave and I have not run antivirus for 245 days.

20080412.saturday   comments=2   propeller_beanie  

And boy does my computer run like greased light. Forget the “-ning” part: light beams are way speedier than thunderbolts.

Truth be told, I do run anti-virus and anti-spyware scans periodically; I just don’t run always-on real-time antivirus scanners. Disposing of those does wonders for Windows’ boot-up time.

Am I living dangerously? I certainly used to preach the antivirus gospel to my Practical Computer Fluency students. I still strongly advise every protection possible for computers used by kids, music and video traders, online chatters, and all of those friends and relatives of mine enamoured of cheesy greeting card services and chain letters.

But for myself, I simply don’t believe that antivirus software works most of the time. I prefer to rely on my own — admittedly dubious — common sense, virtual machines for suspicious downloads, and a tight firewall. Oh, and most of my speculative web browsing is done from an Ubuntu GNU/Linux laptop. Pretty much virus-immune, that.

I’m also not fond of the makers of antivirus tools. Before going cold turkey, I had bought subscriptions from McAfee, Norton, and Trend Micro over the years. Aside from the constant interruptive harangue of updates, the up-selling renewal processes, and the overreaching litigiousness of at least one of the companies, the profit margins of these corporations seemed more closely linked to fear of infection than effective technology.

The antivirus scanner that I do run every now and again, ClamWin, is free software: free as in freedom, free as in beer, and free as in no corporate bugaboos lousing up the joint. It ain’t pretty, and it ain’t fast — 16 hours to scan 93GB in 390,596 files — but it seems as able as any of the others: no (known) viruses to report.

I’m not going to advise anyone to follow my approach. Remember, computers are my career, so constant monitoring and maintenance is just part of the job. I trade those hours of effort for a few dozen seconds at each startup. Not everyone is able or willing to make that bargain.

Update, April 16

Antivirus maker McAfee has recently taken to blaming open source software for the spread of botnets. I guess their logic is that as open source software grows in popularity, so too do the botnets, and therefore more units of McAfee anti-whatever will be sold.

Relearning Emacs

20080301.saturday   comments=2   propeller_beanie  

The last time I used Emacs regularly, the computers were SPARC pizza-boxes, and every time you logged off, the screen would display how many dollars and cents you owed the central computing office for mainframe activity.

Emacs is a text editor, written in the 70s by the same guy that brought you most of Linux — but not the guy for which Linux is named. Like its thirty-year-old competitor vi, Emacs was designed for keyboard-bound programming folk with muscle memory bulging from every digit. Every command has a succinct keyboard shortcut, often involving intricate chords of the control, alt, and shift keys, together with every other available key, the power button, and an occasional wiggle of the network cable.

For instance, to move down four lines within an open file, insert the word “apple” after the sixth word in the sentence, save, and exit, the command is straightforward:

Ctrl+u 4 Ctrl+n Ctrl+u 6 Alt+f Space apple Ctrl+x Ctrl+s Ctrl+x Ctrl+c

Just what you’d expect compared to vi’s nonsensical alphabet soup:

Esc 4 j 7 w i apple Space Esc : w q Enter

So, right there you can see why I’m keen to jump back into Emacs. And I haven’t even mentioned the customizations that can be conjured via Elisp.

Why shell out for Microsoft Word when Emacs is free and runs on all operating systems? Side by side they’re virtually indistinguishable:

[Image: Word v. Emacs]

Figure 1: Microsoft Word and Emacs. Can you see a difference?

It should only take a couple more weeks before I reliably remember how to open a file, and perhaps another couple to master escaping the minibuffer, but by then — provided my wrists don’t cramp up — I’ll truly be an Emacs wiz.

Sarcasm aside, I actually am retraining myself to use Emacs. I’ve got the O’Reilly book and everything. A powerful editor is important in my line of work and I’m sick of jumping between different and incompatible programs on Windows and Linux (and maybe someday, Mac). Even the wacky keystrokes start making sense after a while, and they beat the pants off reaching for the mouse every ten seconds.

Episode V, in which our hero drops the (Google) bomb.

20080221.thursday   comments=4   propeller_beanie  

Persistent readers of What He Said will have noticed my penchant for referring to the computerized behemoth that runs Yukon College as the “execrable system.”

That I used precisely the same terminology and hyperlink target over and over and over again was not accidental. I was working on a Google bomb: gradually training the search engine to use my link’s text as the official description of the page to which I was linking.

The result?

[Image: Execrable System]

(Try it yourself.)

Didn’t take nearly as long as I would’ve thought. I also made it into the top two on Yahoo! and Windows Live.

It may not last, but I am well and truly chuffed at the moment.

Update, April 14

The execrable system has fallen from the top spot on Google — my posting actually rates higher — but it now tops the Yahoo! and Live search results.

When not to use a relational database: generating HTML (for real this time)

20080218.monday   comments=4   propeller_beanie  

Currently, the Nth post in a series of N databoombasetic posts.

In a recent posting, I supposedly railed against the practice of generating HTML using the declarative SQL language. But if you look carefully, what I actually railed against was imperative-style programming using SQL. In particular, Oracle’s PL/SQL flavour of the language. HTML didn’t really enter into it.

So here’s a taste of the HTML-generating code deep within the execrable system that runs the College (click for the full size view):

[Image: HTML in PL/SQL]

And you thought typing angle brackets seemed a waste of time. All told, there are some 84,699 non-blank lines of this verbosicode making up the website.

The code includes some real logic gems too. Note, for instance, the highlighted line number 1,584. It determines whether the visitor is browsing with Internet Explorer. Later on, there is an ELSIF statement that has the specific code for those using Mozilla-based browsers (oddly, there is no other recognized possibility, nor is there a concluding ELSE statement).

I checked, and the only difference between the two 48-line blocks of code was a single CSS class attribute (visible on line 1,590, above).
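
When two blocks differ by a single value, the usual cure is to compute that value once and emit the shared markup once. A rough sketch of the idea in bash, not the College's actual code: the class names, the markup, and both function names are placeholders made up for illustration.

```shell
# Hypothetical sketch: choose the one value that differs...
css_class_for() {
    case "$1" in
        *MSIE*) echo "ie-menu" ;;        # IE user agents of the era contain "MSIE"
        *)      echo "standard-menu" ;;  # everyone else, Mozilla included
    esac
}

# ...then emit the shared markup exactly once.
render_block() {
    local user_agent="$1"
    printf '<div class="%s">shared markup goes here</div>\n' \
        "$(css_class_for "${user_agent}")"
}

render_block "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
```

The catch-all branch also covers the browsers that the original IF/ELSIF pair leaves out entirely.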

Once again, keep the code out of the database, unless you’re getting paid by the line.

How to convert a website’s content into simple text files.

20080211.monday   comments=1   propeller_beanie  

Every so often I find the need to convert masses of web pages into simple, editable text files. (Who among us doesn’t?) Programmer that I am, I also want to do this with as little manual intervention as possible.

For example, I recently wanted to gather together some of my Yukon College course notes to give to other instructors. The notes were originally written in HTML, but some people might prefer plain, unadulterated text.

Now, there’s text, and then there’s text. There is a bewildering variety of “lightweight” formats or conventions for specifying headings, emphasis, lists, hyperlinks, and so forth. My favourite is a format called Markdown.

To make a heading in Markdown, just underline it with equal signs or hyphens. To make a bullet list, start each point with an asterisk. To italicize, surround the word with underscores. These are all the same sorts of formatting tricks you might key into a quick e-mail.

(You can do the same in MS Word, if you can spare an hour or two to undo some of Word’s more aggressive auto-corrections.)

Of course, you don’t actually see any of the bullets, italics, or hyperlinks. C’mon, it’s just text. Instead, you have the option to (presto-chango) translate Markdown into HTML. Beats typing angle brackets all the livelong day.

But today’s exercise is in the other direction. Here are the steps I take to convert a website into Markdown.

  1. “Rip” the website: copy all of its HTML and image content to your computer. On Windows, I use HTTrack. On Linux, something like wget --convert-links --html-extension --mirror --random-wait --wait 3 http://microsoft.com/ will do (consider an extra hard drive or two to rip that site).
  2. Run Aaron Swartz’s html2text.py Python script to convert each ripped HTML file into the equivalent Markdown.
  3. Rename each Markdown text file to something more meaningful than the name typically assigned by HTTrack or wget. The contents of the <title> element make for a pretty fair filename.

Unfortunately, steps 2 and 3 contain that tedious word “each.” There might be a couple of hundred eaches for one of my course sites. Any time you find yourself doing the same thing over, and over, and over, chances are you can get the computer to do it more quickly and with fewer errors. That’s kinda what they’re good at.

So, I wrote some Linux shell script code to automate both steps. The full convert-html-to-md script is part of my in-progress, yet freely-downloadable, Public Domain scripnix project, and depends on some of the project’s other utilities.

If that’s just too much to contemplate, the following is a quick ‘n dirty approximation of the full script. It doesn’t handle filename collisions, and suffers from an excess of hyphenation, but it gets the job done.

#!/bin/bash
# Usage: convert-html-to-md <path-to-html2text.py> <file>[...]
# Convert the specified HTML files into Markdown text-format equivalents
# in the current working directory. The file extension will be .md.txt.
# Requires the html2text.py Python script by Aaron Swartz to convert
# from HTML to Markdown text [www.aaronsw.com/2002/html2text/].
html2text="${1}"
shift

while [ -n "${1}" ] ; do
    # Use the contents of the title element for the filename. In case
    # the title element spans multiple lines, the entire file is first
    # converted to a single line before the sed pattern is applied. Any
    # "unsafe" characters are then replaced with hyphens to produce a
    # valid filename.
    title=$(tr -d '\n\r' < "${1}" | \
            sed -nre 's/^.*<title>([^<]*)<\/title>.*$/\1\n/ip' | \
            tr "\`~\!@#$%^&*()+={}|[]\\:;\"\'<>?,/ \t" '[-*]')

    # If there's no title, then just use the original filename.
    if [ -z "${title}" ] ; then
        title=$(basename "${1}" .html)
    fi

    # Convert the HTML to Markdown.
    cat "${1}" | python "${html2text}" > "${title}.md.txt"
    shift
done
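
The two shortcomings mentioned above (filename collisions and surplus hyphens) can be patched with a pair of small helpers. This is only a sketch under my own assumptions, not the approach taken by the full scripnix script; tidy_name and unique_name are hypothetical names.

```shell
# Collapse runs of hyphens and trim them from both ends of a name.
tidy_name() {
    printf '%s\n' "$1" | sed -e 's/--*/-/g' -e 's/^-//' -e 's/-$//'
}

# Print the first filename not already present in the current directory:
# name.ext, then name-2.ext, name-3.ext, and so on.
unique_name() {
    local base="$1" ext="$2" n=2
    local candidate="${base}${ext}"
    while [ -e "${candidate}" ]; do
        candidate="${base}-${n}${ext}"
        n=$((n + 1))
    done
    printf '%s\n' "${candidate}"
}

tidy_name "--Course---Notes--Week-1-"   # prints Course-Notes-Week-1
```

Inside the loop, the title would pass through tidy_name before the empty-title check, and the output redirect would target "$(unique_name "${title}" .md.txt)" instead of "${title}.md.txt".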

Your mileage may vary on Mac OS. Without Cygwin, Windows users are better off sticking to their pointee-clickee routine.

Revenge of the Icon

20080127.sunday   comments=5   propeller_beanie  

A past version of my company website featured a page of favourite links, one of which led to Wikipedia. I spruced up each link with a wee favicon: the little 16×16 pixel logo that appears in the browser’s address bar. Wikipedia’s logo is the letter W:

[Image: Wikipedia Favicon]

The links page was eventually retired, but I noticed that the Wikipedia icon file, wikipedia_icon.png, was still very popular. The webserver’s log files showed that it was being requested dozens of times per day, often by web surfers in tea-sipping China. The logs also revealed that the referring page — the one displaying the W — belonged to a web-hosting-slash-search site named Mixcat Interactive, apparently headquartered in orange-juicing Florida.

Sure enough, the search results page on their site was using “my” icon file (it’s actually the intellectual property of Wikipedia, but the file sat on my server) for any result linking to Wikipedia:

[Image: Mixcat Wikipedia Search Results]

That’s my W! Two of ‘em, even.

So, since I wasn’t even using the icon file on my site anymore, I reckoned I would simply replace it with my company logo (yukon dude software), using the same wikipedia_icon.png file name. Mixcat Interactive would thereafter display an oblique advertisement for my services:

[Image: Mixcat Dude Search Results]

I figured one of those orange-juicers would eventually spot my little sock puppet and make the appropriate Fixcat.

Six months later, the sock puppet was more popular than ever. Over the past two days, I counted 431 requests for the cute little guy. At a couple of hundred bytes per request, that doesn’t amount to much bandwidth, but it does clog the logs.
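
Counting those requests by hand gets old quickly. Here is a rough sketch of the tally, assuming the web server writes the common "combined" log format (the logs' actual format isn't stated here) and using count_referrers as a made-up helper name.

```shell
# Tally requests for one path by referring page, busiest referrer first.
# Assumes combined log format, where splitting a line on double quotes
# yields the request in field 2 and the referrer in field 4.
count_referrers() {
    local path="$1" log="$2"
    awk -F'"' -v path="${path}" '
        { split($2, request, " ") }              # e.g. GET /file.png HTTP/1.1
        request[2] == path { count[$4]++ }
        END { for (ref in count) print count[ref], ref }
    ' "${log}" | sort -rn
}

# Usage (the log file name is an assumption):
#   count_referrers /wikipedia_icon.png access.log
```

Each output line is a request count followed by the referring page, so the worst hotlinker floats to the top.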

Maybe an ever-so-subtle change to wikipedia_icon.png would get their attention:

[Image: Mixcat Sucks Search Results]

Feel free to verify whether or not Suxcat Interactive has finally managed to correct the problem.

The moral of the story is that, while it’s very easy to use an image from another site to dress up your own web page — and possibly desirable from a fair-dealing copyright perspective — the originating site retains control of the content of that image.

I wonder how much porn I could cram into that 16×16 icon…

Update, April 14

Looks like Mixcat finally got the message and removed the icon reference. But another dubious search site has taken up the banner in the interim. Fortunately, it seems far less popular than Mixcat, leaving my logs relatively unclogged.

When not to use a relational database: generating HTML

20080125.friday   comments=4   propeller_beanie  

Yet another post in a neglected series of riveting relational databauchery.

This past week I’ve been doing some maintenance work on the execrable system that runs the College (bear with me, I’m working on a Google bomb). This system happens to have a web-based interface that allows staff and students to view final marks, paystubs, class schedules, and so forth.

If you’ve ever dabbled in web applications, you may know of some of the popular development platforms: PHP (which powers Facebook, Digg, What He Said, and the College’s public site), ASP.NET (MySpace, Lego), JSP (Globe and Mail), Ruby on Rails (mostly apps from the folks that created RoR in the first place), and a zillion others.

These technologies query relational databases in order to render the HTML web pages that you see when you visit the site. None of them are built in to the relational database itself, using the SQL query language from within to spit out the web’s angle-bracket-laden content.

But that’s exactly how the College’s self-service site works.

It appears to have been built atop relational-database-maker Oracle’s HTML DB product, which features PL/SQL as its programming language: a bondage-and-discipline language that requires you to first declare that you will later again declare your intention to define a value as being equal to 3.

Unfortunately, declarative query languages like SQL don’t easily jump through the hoops and loops required to emit web pages. For example, a single section of code to extract a list of students based on whether the user is an instructor, an advisor, or both, and whether the user wishes to view students that are instructed, advised, or both, looks something like this.

IF student is instructed and advised THEN
    IF user is instructor and advisor THEN
        30-line SQL query to retrieve students that are both instructed
        and advised by this advisor and instructor
    ELSIF user is advisor THEN
        Same 30-line query, with one change to look for instructed
        students that are only advised.
    ELSE
        Once again, the same 30 lines, this time looking for instructed
        students that are just instructed.
    END IF;
ELSIF student is advised THEN
    IF user is instructor and advisor THEN
        Yet another copy of the 30-line SQL query to retrieve students
        advised by this instructor/advisor.
    ELSIF user is advisor THEN
        You guessed it.
    ELSE
        Hoo boy.
    END IF;
ELSIF student is instructed THEN
    IF user is instructor and advisor THEN
        This is getting tedious.
    ELSIF user is advisor THEN
        And annoying.
    ELSE
        Time for a break, what's new on Reddit?
    END IF;
ELSE
    A secret none-of-the-above option that appears here requires
    another 30 lines.
END IF;

Leaving aside that most of that doesn’t even make sense — I lost an hour to wondering how a student could only be advised and not instructed — it’s just plain hideous. Some 300 lines of code are involved, all but 30 or so seemingly redundant.

This was not an isolated case. Determining whether a user is an instructor or an advisor uses the same pattern. The PL/SQL “logic” that determines whether a user’s account and PIN are valid is a multi-screen riot.

Conclusion: keep the code out of the database.

Update, Jan. 28

Of course, I managed to screw up the example code so that the ELSE clause of each decision would never execute. It’s fixed now, just in case anybody out there was planning on implementing their own twisted instructor/advisor system.

No, seriously. BIG-TIME IT contracting opportunities at the College.

20071218.tuesday   comments=4   propeller_beanie  

Following last week’s surprise management shuffle in the Computing Services department of Yukon College, the little birdie network reports that, as of this coming Thursday, there is zero internal technical support for the College’s Oracle database and the execrable SCT Banner system that runs the institution (everything from payroll to transcripts to bookstore orders).

High-priced contingency operators are being lined up as I type. (You’ll never guess who.)

But the real opportunities will appear as the College attempts to realign its IT department to more closely support the academic divisions. Expertise in information architecture, data warehousing, communications, telepresence, and generally anything related to education that begins with “e” and a hyphen, will stand you in good stead.

Don’t delay, submit your proposal today. As for my modest finder’s fee, I suggest you look no further than the 37″-long Lego Imperial Star Destroyer.