Thursday, 11 December 2014

finding common movies and actors

We can have a lot of fun with the IMDb data. This time, given two actors, we find the movies they share. Alternatively, given two movies, we find the actors they have in common.

Let's jump right into the Python:
#!/usr/bin/env python3

import sys

if len(sys.argv) < 3:
  print("\nUsage:")
  print("  ./find_common.py actor1 actor2")
  print("  ./find_common.py movie1 movie2\n")
  sys.exit(1)    # missing arguments, so exit with a non-zero status

one = sys.argv[1]
two = sys.argv[2]

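def file_recall(filename,op,label):
  # find the line "op |label> => |a> + |b> + ..." and return [a, b, ...]
  pattern = op + " |" + label + "> => "
  n = len(pattern)
  with open(filename,'r') as f:
    for line in f:
      if line.startswith(pattern):
        line = line[n:].strip()             # drop the trailing newline first,
        return line[1:-1].split("> + |")    # then the outer "|" and ">"
  return []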

def display(line):
#  return ", ".join(line)
  return "\n".join(line)

def intersection(a,b):
  return list(set(a) & set(b))

imdb_sw = "sw-examples/improved-imdb.sw"    # our imdb data

def print_common_movies(sw_file,one,two):
  actor1 = "actor: " + one
  actor2 = "actor: " + two
  movies1 = file_recall(sw_file,"movies",actor1)
  movies2 = file_recall(sw_file,"movies",actor2)

  # check if we have info on them:
  if len(movies1) == 0 or len(movies2) == 0:
    return

  common_movies = intersection(movies1,movies2)

  print()
  print("common movies for:")
  print(one)
  print(two)
  print("number of common movies:",len(common_movies))
  print("common movies:")
  print(display(common_movies))
  print()

def print_common_actors(sw_file,one,two):
  movie1 = "movie: " + one
  movie2 = "movie: " + two
  actors1 = file_recall(sw_file,"actors",movie1)
  actors2 = file_recall(sw_file,"actors",movie2)

  # check if we have info on them:
  if len(actors1) == 0 or len(actors2) == 0:
    return

  common_actors = intersection(actors1,actors2)

  print()
  print("common actors for:")
  print(one)
  print(two)
  print("number of common actors:",len(common_actors))
  print("common actors:")
  print(display(common_actors))
  print()


print_common_actors(imdb_sw,one,two)
print_common_movies(imdb_sw,one,two)
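To make the parsing step concrete, here is a minimal sketch of how a single line of the sw file is split into labels and then intersected. The helper name parse_rule and the two sample lines are hypothetical, not taken from the real data file; the line shape (op |label> => |a> + |b>) is inferred from the pattern the script builds. It uses the same splitting idea as file_recall, with the trailing newline stripped before the outer delimiters are removed.

```python
def parse_rule(line, op, label):
    """Return the labels on the right-hand side of a matching line, else []."""
    pattern = op + " |" + label + "> => "
    if not line.startswith(pattern):
        return []
    rhs = line[len(pattern):].strip()    # "|a> + |b> + |c>"
    return rhs[1:-1].split("> + |")      # drop outer "|" and ">", then split

# Hypothetical sample lines, for illustration only.
line1 = "movies |actor: A> => |movie: X (2000)> + |movie: Y (2001)> + |movie: Z (2002)>\n"
line2 = "movies |actor: B> => |movie: Y (2001)> + |movie: Z (2002)>\n"

movies1 = parse_rule(line1, "movies", "actor: A")
movies2 = parse_rule(line2, "movies", "actor: B")

# Set intersection loses the file order, so sort for a stable display.
common = sorted(set(movies1) & set(movies2))
print("\n".join(common))
```

Sorting the result is a choice made here for readability; the script above prints the intersection in whatever order the set yields.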
Now some examples:
$ ./find_common.py "Tom Cruise" "Nicole Kidman"

common movies for:
Tom Cruise
Nicole Kidman
number of common movies: 8
common movies:
movie: Eyes Wide Shut (1999)
movie: Days of Thunder (1990)
movie: Der Geist des Geldes (2007)
movie: August (2008)
movie: Boffo! Tinseltown's Bombs and Blockbusters (2006)
movie: Stanley Kubrick: A Life in Pictures (2001)
movie: Far and Away (1992)
movie: The Queen (2006)


$ ./find_common.py "Matt Damon" "Morgan (I) Freeman"

common movies for:
Matt Damon
Morgan (I) Freeman
number of common movies: 3
common movies:
movie: Invictus (2009)
movie: The People Speak (2009)
movie: Magnificent Desolation: Walking on the Moon 3D (2005)

$ ./find_common.py "Bruce Willis" "Tom Cruise"

common movies for:
Bruce Willis
Tom Cruise
number of common movies: 1
common movies:
movie: Boffo! Tinseltown's Bombs and Blockbusters (2006)

$ ./find_common.py "Bruce Willis" "Matt Damon"

common movies for:
Bruce Willis
Matt Damon
number of common movies: 1
common movies:
movie: Ocean's Twelve (2004)

$ ./find_common.py "Brad Pitt" "George Clooney"

common movies for:
Brad Pitt
George Clooney
number of common movies: 7
common movies:
movie: Burn After Reading (2008)
movie: Ocean's Twelve (2004)
movie: Boffo! Tinseltown's Bombs and Blockbusters (2006)
movie: Ocean's Eleven (2001)
movie: Touch of Evil (2011)
movie: Ocean's Thirteen (2007)
movie: Confessions of a Dangerous Mind (2002)

$ ./find_common.py "Matt Damon" "Ben Affleck"

common movies for:
Matt Damon
Ben Affleck
number of common movies: 10
common movies:
movie: Chasing Amy (1997)
movie: Jersey Girl (2004)
movie: Jay and Silent Bob Strike Back (2001)
movie: School Ties (1992)
movie: Glory Daze (1995)
movie: Dogma (1999)
movie: The Third Wheel (2002)
movie: Good Will Hunting (1997)
movie: Unite for Japan (2011)
movie: Field of Dreams (1989)

-- now find the common actors for the Ocean's series of movies:
$ ./find_common.py "Ocean's Eleven (2001)" "Ocean's Twelve (2004)"

common actors for:
Ocean's Eleven (2001)
Ocean's Twelve (2004)
number of common actors: 18
common actors:
actor: David (II) Sontag
actor: Casey Affleck
actor: Julia (I) Roberts
actor: Andy (I) Garcia
actor: George Clooney
actor: Eddie Jemison
actor: Elliott Gould
actor: Larry Sontag
actor: Matt Damon
actor: Carl Reiner
actor: Topher Grace
actor: Brad Pitt
actor: Scott L. Schwartz
actor: Scott Caan
actor: Bernie Mac
actor: Jerry (I) Weintraub
actor: Shaobo Qin
actor: Don Cheadle

$ ./find_common.py "Ocean's Twelve (2004)" "Ocean's Thirteen (2007)"

common actors for:
Ocean's Twelve (2004)
Ocean's Thirteen (2007)
number of common actors: 16
common actors:
actor: Don Cheadle
actor: Casey Affleck
actor: Elliott Gould
actor: Bernie Mac
actor: Matt Damon
actor: Brad Pitt
actor: Eddie Izzard
actor: Eddie Jemison
actor: George Clooney
actor: Scott Caan
actor: Jerry (I) Weintraub
actor: Scott L. Schwartz
actor: Carl Reiner
actor: Shaobo Qin
actor: Andy (I) Garcia
actor: Vincent Cassel
So that was all kind of fun! The one note is that I've been thinking of implementing a Google-like "did you mean" feature, because currently, if you don't get the actor or movie name exactly right, you get no results at all. It shouldn't be too hard to implement something like that.
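One possible way to get a "did you mean" is fuzzy string matching against the names that actually appear in the data. The sketch below uses difflib.get_close_matches from the Python standard library. The helper names known_labels and did_you_mean, the 0.6 cutoff, and the sample lines are assumptions made for illustration; this is not something the script above does.

```python
import difflib

def known_labels(lines, op, prefix):
    """Collect the labels of one kind (e.g. "actor: ") that have an `op` rule.

    `lines` is any iterable of lines, for example an open sw file.
    """
    start = op + " |" + prefix
    labels = []
    for line in lines:
        if line.startswith(start):
            end = line.find("> => ")
            if end != -1:
                labels.append(line[len(start):end])
    return labels

def did_you_mean(name, candidates, n=3):
    """Return up to n candidates that look similar to name, best match first."""
    return difflib.get_close_matches(name, candidates, n=n, cutoff=0.6)

# Hypothetical sample lines in the same shape the script expects.
sample = [
    "movies |actor: Tom Cruise> => |movie: Far and Away (1992)>\n",
    "movies |actor: Nicole Kidman> => |movie: Far and Away (1992)>\n",
    "movies |actor: Matt Damon> => |movie: Invictus (2009)>\n",
]

names = known_labels(sample, "movies", "actor: ")
suggestions = did_you_mean("Tom Cruse", names)
if suggestions:
    print("did you mean:", ", ".join(suggestions))
```

In the script, this could run whenever file_recall returns an empty list, instead of returning silently. Scanning the whole file for candidate names on every miss would be slow for a large data set, so caching the label list is probably worthwhile.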

Update: IMDb does something similar too, which isn't really surprising.

Update: Here are a bunch of other results using the IMDb data:
all-actors-average.txt
all-actors-weighted-average.txt
movie-only-votes-ratings-title.txt
sorted-all-actors-average.txt
sorted-all-actors-weighted-average.txt
sorted-top-1000-actors-average.txt
sorted-top-1000-actors-weighted-average.txt
star-studded-movies.txt
top-1000-actors-average.txt
top-1000-actors-weighted-average.txt
top-1000-actors.txt
top-2500-well-known-actors.txt
votes-ratings-title.txt
well-known-actors.txt
kevin-bacon-numbers.sw
