Data-driven decision making: Netflix or Blockbuster?

My wife and I have been Netflix subscribers for years, and in that time we have rented hundreds of movies. We are considering a switch to Blockbuster, but one holdup has been that Blockbuster supposedly stocks only bestsellers, while Netflix has lots of niche and foreign movies that make it more attractive. Then I realized that the selection in the abstract doesn't really matter; what matters is whether the movies we actually want are available. So I wrote a quick Perl script to help answer that question. It was fun, so I thought I'd share my methodology and results.

  1. First, download your Netflix rental history to an HTML file (saved here as RentalActivity.html): https://www.netflix.com/RentalActivity?all=true
  2. Extract the movie titles from the HTML file:
grep "http://www.netflix.com/Movie" RentalActivity.html |
  sed "s|.*Movie/||" |
  sed "s|/.*||" |
  sed "s/_/ /g" > netflix_history
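The pipeline above keeps only one link per line of HTML (the greedy `.*Movie/` skips ahead to the last link on the line) and lists a movie twice if it was rented twice. Here is a Perl sketch that catches every link and drops repeats, assuming the history page links each title as `http://www.netflix.com/Movie/Some_Title/12345`, which is the shape the sed commands imply. The helper name `extract_titles` and the sample links are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull movie titles out of the rental-history HTML.
# Assumes links of the form http://www.netflix.com/Movie/Some_Title/12345,
# as implied by the grep/sed pipeline.
sub extract_titles {
  my ($html) = @_;
  my (@titles, %seen);
  # /g walks every matching link, even when several share one line of HTML
  while ($html =~ m{http://www\.netflix\.com/Movie/([^/"]+)/}g) {
    (my $title = $1) =~ tr/_/ /;                  # underscores back to spaces
    push @titles, $title unless $seen{$title}++;  # keep the first occurrence only
  }
  return @titles;
}

# Made-up sample input, not real Netflix markup
my $sample = '<a href="http://www.netflix.com/Movie/Proof/12345">Proof</a>'
           . '<a href="http://www.netflix.com/Movie/Six_Feet_Under_Season_1/67890">x</a>'
           . '<a href="http://www.netflix.com/Movie/Proof/12345">again</a>';

print "$_\n" for extract_titles($sample);
# Proof
# Six Feet Under Season 1
```

To run it on the real page, slurp the file into one string first (for example `my $html = do { local $/; <STDIN> };`) and print the titles one per line into netflix_history. Dropping repeats means each title gets searched only once.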

Next, go to the Blockbuster search page and figure out its search endpoint. Blockbuster doesn't have an API like Netflix does, so we have to scrape its pages. Since we're looking for a relatively simple answer, this isn't too bad. After some experimentation, I found I could get the answer with this command:

curl -s "http://www.blockbuster.com/search/movie/movies?keyword=The%20Sopranos%20Season%202%20Disc%202" |
  grep "results containing"
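An unencoded space isn't valid in a URL (a browser quietly encodes it for you), so each title has to be percent-encoded before it goes to curl. A minimal Perl sketch, assuming the endpoint accepts a standard percent-encoded `keyword` parameter; `search_url` is a hypothetical helper name. It encodes every byte outside the RFC 3986 unreserved set (letters, digits, `-`, `.`, `_`, `~`).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the Blockbuster search URL for one title,
# percent-encoding every byte outside RFC 3986's unreserved characters.
sub search_url {
  my ($title) = @_;
  (my $keyword = $title) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
  return 'http://www.blockbuster.com/search/movie/movies?keyword=' . $keyword;
}

print search_url('The Sopranos Season 2 Disc 2'), "\n";
# http://www.blockbuster.com/search/movie/movies?keyword=The%20Sopranos%20Season%202%20Disc%202
```

Because quotes, `$` and backticks all get encoded too, the result is also safe to drop inside a double-quoted shell command.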

I worked out the regular expressions and whipped up a quick Perl script that pulls out, for each movie, the number of results and the title of the first result.

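#!/usr/bin/perl
use strict;
use warnings;

my $search = 'http://www.blockbuster.com/search/movie/movies';

# Read one title per line on STDIN; print "title<TAB>count<TAB>first hit".
while (my $title = <STDIN>) {
  chomp $title;

  # Timestamped progress log on STDERR, kept out of the tab-separated output
  print STDERR scalar(localtime) . "   " . $title . "\n";

  # List-form open bypasses the shell; -G plus --data-urlencode makes curl
  # URL-encode the title and append it to the URL as ?keyword=...
  open(my $fh, '-|', 'curl', '-s', '-G', '--data-urlencode', "keyword=$title", $search)
    or die "cannot run curl: $!";
  my $result = do { local $/; <$fh> } // '';
  close $fh;

  # -1 means the result count wasn't found on the page
  my $num = -1;
  if ($result =~ m/(\d+)&nbsp;results containing/) {
    $num = $1;
  }

  # Title of the first hit; /s lets .*? run across line breaks
  my $new_title = '';
  if ($result =~ m|<dt class="titleInfo">.*?<a href="/catalog/movieDetails/\d+" title="(.*?)"|s) {
    $new_title = $1;
  }

  print $title . "\t" . $num . "\t" . $new_title . "\n";

  # Pause between requests to go easy on Blockbuster's servers
  sleep 15;
}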
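The two regular expressions are the fragile part, since they depend on Blockbuster's markup. One way to check them without hitting the site is to wrap them in a sub and run them against a hand-written fragment. The fragment below is made up to fit the patterns, not captured from blockbuster.com, and `parse_results` is a hypothetical name; the second pattern carries the `/s` modifier so the lazy `.*?` can span the line break in the sample.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The script's two patterns, wrapped in a sub so they can be tested offline.
# Returns (result count or -1, first hit's title or '').
sub parse_results {
  my ($html) = @_;
  my $num = -1;
  $num = $1 if $html =~ m/(\d+)&nbsp;results containing/;
  my $first = '';
  $first = $1 if $html =~ m|<dt class="titleInfo">.*?<a href="/catalog/movieDetails/\d+" title="(.*?)"|s;
  return ($num, $first);
}

# Made-up fragment shaped to match the patterns; not real Blockbuster markup.
my $page = qq{<p>2&nbsp;results containing "proof"</p>\n}
         . qq{<dt class="titleInfo">\n  <a href="/catalog/movieDetails/123" title="Proof">Proof</a></dt>\n};

my ($num, $first) = parse_results($page);
print "$num\t$first\n";    # prints 2<TAB>Proof
```

If Blockbuster changes its HTML, a failing check like this shows up immediately instead of as a column of -1s the next morning.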

To avoid getting shut down by any rate limiters, I made only one query every 15 seconds.

When the data came back this morning, I loaded it into Excel and did some manual scrubbing. The automated results weren't always right: occasionally a search returned 0 results even though the movie did exist in Blockbuster's catalog. So I manually ran about 20 or 30 searches on the remaining questionable items, just to make sure everything was accurate.
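Part of that scrubbing can be narrowed down mechanically. Here is a Perl sketch that takes lines in the script's tab-separated output format (title, result count, first hit) and lists the rows worth re-checking by hand: a count of 0, a count of -1 (the page didn't parse), or a first hit whose title doesn't contain the Netflix title. That last test is an assumed heuristic, not part of the original workflow, and it will also flag harmless differences such as "Season 2" vs "Season Two". `needs_review` and the sample rows are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical triage step: given "title<TAB>count<TAB>first hit" lines,
# return the titles that deserve a manual search.
sub needs_review {
  my @flagged;
  for my $row (@_) {
    chomp(my $line = $row);
    my ($title, $num, $hit) = split /\t/, $line, 3;
    $hit //= '';
    # Zero results, an unparsed page (-1), or a first hit that doesn't
    # contain the Netflix title (case-insensitive substring check)
    if ($num <= 0 || index(lc $hit, lc $title) < 0) {
      push @flagged, $title;
    }
  }
  return @flagged;
}

# Made-up rows for illustration only
my @sample = (
  "Movie One\t3\tMovie One\n",
  "Movie Two\t0\t\n",
  "Movie Three\t-1\t\n",
  "Movie Four\t5\tSomething Else\n",
);
print "$_\n" for needs_review(@sample);
# Movie Two
# Movie Three
# Movie Four
```

On the real output, something like `print "$_\n" for needs_review(<STDIN>);` produces the short list to search by hand.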

The net result: only eight of our 327 movies were not available at Blockbuster. Most of those were installments of the Up Series, an old British documentary series dating from the 1960s, so I'm not terribly surprised. The remaining few missing movies were:

Besides those, Blockbuster had them all: every season we'd rented of Freaks and Geeks, Buffy, The Sopranos, Angel, Gilmore Girls, Sex and the City, and Six Feet Under. They also had The Clan of the Cave Bear, The End of Suburbia, "sex, lies, and videotape", Yo Soy Boricua Pa Que Tu Lo Sepas, and UChicago's own Proof.

So in short, I think I’m switching to Blockbuster. Here’s to data-driven decision making.


