Lisa Lucky
April 24, 2025

This project is adapted from Kevin Kibe’s Medium article, which details the use of cosine similarity and TF-IDF vectorisation for book recommendations [1]. Using a Disney movies dataset from Kaggle [2], which includes titles, descriptions and directors, I adapted a function definition from the Medium article so that it returns five similar titles for a given input title.

Many Kaggle users working with the same dataset have used TensorFlow and other tools, including TF-IDF vectorisers (search "recommendation" in the Kaggle notebook search bar). By simplifying that earlier work and following the Medium article, I built a recommendation system for the Disney catalogue. Below, I detail my process and explain the code.

To begin, the necessary libraries were imported.

```python
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
```
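Cosine similarity compares two word vectors by the angle between them: the dot product divided by the product of their lengths, so a long description is not favoured just for being long. The sketch below (the sentences are invented, and the `cosine` helper is hypothetical) builds raw word-count vectors with the imported `CountVectorizer`, computes the formula by hand, and checks it against scikit-learn's `cosine_similarity`.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cosine(a, b):
    """Hypothetical helper: cosine similarity = dot(a, b) / (||a|| * ||b||)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented example sentences (not from the dataset)
docs = [
    "a princess sails across the sea",
    "the sea carries a brave princess home",
    "sharks hunt near the surf",
]

# CountVectorizer turns each sentence into a vector of raw word counts
# (its default tokeniser ignores one-letter words such as "a")
counts = CountVectorizer().fit_transform(docs).toarray()

by_hand = cosine(counts[0], counts[1])
by_sklearn = cosine_similarity(counts)[0, 1]
print(by_hand, by_sklearn)  # the two values agree

# Sentences 0 and 2 share only the word "the", yet the score is not zero
print(cosine(counts[0], counts[2]))
```

The non-zero score between two unrelated sentences is the motivation for TF-IDF weighting and stop-word removal: common words like "the" should contribute little or nothing to similarity.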
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1114 | s1115 | TV Show | Origins: The Journey of Humankind | NaN | Mark Monroe, Jason Silva | United States | November 12, 2019 | 2016 | TV-14 | 1 Season | Docuseries, Historical | Hosted by Jason Silva, this eight-part series ... |
| 554 | s555 | Movie | The Olympic Elk | James Algar | Winston Hibler | United States | April 24, 2020 | 1952 | TV-G | 27 min | Animals & Nature, Documentary, Family | Olympic elk trek toward the fertile grazing gr... |
| 153 | s154 | Movie | Shark vs. Surfer | Phil Stebbing | Billy Lloyd | NaN | July 23, 2021 | 2020 | TV-14 | 44 min | Animals & Nature, Documentary | Shark vs. Surfer visits shark-infested surf sp... |
| 222 | s223 | TV Show | Disney Special Agent Oso: Three Healthy Steps ... | NaN | Sean Astin, Meghan Strange, Phill Lewis, Amber... | NaN | May 14, 2021 | 2011 | TV-Y | 1 Season | Action-Adventure, Animation, Kids | Oso and his friends watch a kid as they demons... |
| 96 | s97 | TV Show | Dog: Impossible | NaN | Matt Beisner | United States | September 22, 2021 | 2019 | TV-PG | 2 Seasons | Animals & Nature, Docuseries, Family | Matt Beisner uses unique approaches to modifyi... |
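The table above appears to be a random sample of five rows; the cell that loaded the data is not included in this write-up. A minimal sketch, assuming the Kaggle download is a single CSV (the filename `disney_plus_titles.csv` and the `load_titles` helper are assumptions), would read it with `pd.read_csv` and call `sample`. To stay runnable without the download, it reads a small inline CSV built from two complete rows shown on this page.

```python
import io
import pandas as pd

def load_titles(source) -> pd.DataFrame:
    """Hypothetical helper: read the titles CSV from a path or file-like object."""
    return pd.read_csv(source)

# Real usage (filename assumed): df = load_titles("disney_plus_titles.csv")
# Inline stand-in with a subset of columns, copied from rows shown in this report
csv_text = """show_id,type,title,release_year,description
s3,Movie,Ice Age: A Mammoth Christmas,2011,Sid the Sloth is on Santa's naughty list.
s1450,Movie,Captain Sparky vs. The Flying Saucers,2012,View one of Sparky's favorite home movies.
"""

df_demo = load_titles(io.StringIO(csv_text))

# random_state fixes which rows are drawn, so the sample is reproducible
print(df_demo.sample(2, random_state=0))
```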
```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1450 entries, 0 to 1449
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   show_id       1450 non-null   object
 1   type          1450 non-null   object
 2   title         1450 non-null   object
 3   director      977 non-null    object
 4   cast          1260 non-null   object
 5   country       1231 non-null   object
 6   date_added    1447 non-null   object
 7   release_year  1450 non-null   int64
 8   rating        1447 non-null   object
 9   duration      1450 non-null   object
 10  listed_in     1450 non-null   object
 11  description   1450 non-null   object
dtypes: int64(1), object(11)
memory usage: 136.1+ KB
```
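The `info()` summary shows gaps in several columns: `director`, for example, has 977 non-null values out of 1,450, so 473 are missing. A minimal sketch of counting missing values per column with `isna().sum()`, using a small invented frame (not the real data) so it runs on its own:

```python
import numpy as np
import pandas as pd

# Invented toy frame with gaps in columns that info() also flags
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", np.nan, np.nan, "Y"],
    "country": ["US", "UK", np.nan, "US"],
    "description": ["d1", "d2", "d3", "d4"],
})

missing = toy.isna().sum()             # number of missing values per column
share = (missing / len(toy)).round(2)  # fraction of rows missing per column
print(pd.DataFrame({"missing": missing, "share": share}))
```

Only `description` feeds the recommender, and it has no missing values, so gaps elsewhere do not block vectorisation.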
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | s3 | Movie | Ice Age: A Mammoth Christmas | Karen Disher | Raymond Albert Romano, John Leguizamo, Denis L... | United States | November 26, 2021 | 2011 | TV-G | 23 min | Animation, Comedy, Family | Sid the Sloth is on Santa's naughty list. |
| 5 | s6 | Movie | Becoming Cousteau | Liz Garbus | Jacques Yves Cousteau, Vincent Cassel | United States | November 24, 2021 | 2021 | PG-13 | 94 min | Biographical, Documentary | An inside look at the legendary life of advent... |
| 9 | s10 | Movie | A Muppets Christmas: Letters To Santa | Kirk R. Thatcher | Steve Whitmire, Dave Goelz, Bill Barretta, Eri... | United States | November 19, 2021 | 2008 | G | 45 min | Comedy, Family, Musical | Celebrate the holiday season with all your fav... |
| 12 | s13 | Movie | The Pixar Story | Leslie Iwerks | Stacy Keach, John Lasseter, Brad Bird, John Mu... | United States | November 19, 2021 | 2007 | G | 91 min | Documentary, Family | A groundbreaking company forever changes the f... |
| 19 | s20 | Movie | Enchanted | Kevin Lima | Amy Adams, Patrick Dempsey, James Marsden, Tim... | United States | November 12, 2021 | 2007 | PG | 110 min | Comedy, Family, Fantasy | An animated princess winds up in the real worl... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1445 | s1446 | Movie | X-Men Origins: Wolverine | Gavin Hood | Hugh Jackman, Liev Schreiber, Danny Huston, wi... | United States, United Kingdom | June 4, 2021 | 2009 | PG-13 | 108 min | Action-Adventure, Family, Science Fiction | Wolverine unites with legendary X-Men to fight... |
| 1446 | s1447 | Movie | Night at the Museum: Battle of the Smithsonian | Shawn Levy | Ben Stiller, Amy Adams, Owen Wilson, Hank Azar... | United States, Canada | April 2, 2021 | 2009 | PG | 106 min | Action-Adventure, Comedy, Family | Larry Daley returns to rescue some old friends... |
| 1447 | s1448 | Movie | Eddie the Eagle | Dexter Fletcher | Tom Costello, Jo Hartley, Keith Allen, Dickon ... | United Kingdom, Germany, United States | December 18, 2020 | 2016 | PG-13 | 107 min | Biographical, Comedy, Drama | True story of Eddie Edwards, a British ski-jum... |
| 1448 | s1449 | Movie | Bend It Like Beckham | Gurinder Chadha | Parminder Nagra, Keira Knightley, Jonathan Rhy... | United Kingdom, Germany, United States | September 18, 2020 | 2003 | PG-13 | 112 min | Buddy, Comedy, Coming of Age | Despite the wishes of their traditional famili... |
| 1449 | s1450 | Movie | Captain Sparky vs. The Flying Saucers | Mark Waring | Charlie Tahan | United States | April 1, 2020 | 2012 | TV-G | 2 min | Action-Adventure, Animals & Nature, Animation | View one of Sparky's favorite home movies. |

818 rows × 12 columns
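The step that produced this 818-row table is not shown. Its row labels (2, 5, 9, ..., 1449) show that the original index was kept, which matters later: row positions in a similarity matrix built from a filtered frame no longer line up with those labels. A minimal sketch, assuming the filter was a `dropna()` (an assumption; the write-up does not say), on an invented toy frame, with `reset_index(drop=True)` restoring positional labels:

```python
import numpy as np
import pandas as pd

# Invented toy frame; two rows have a missing director
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "director": ["X", np.nan, "Y", np.nan, "Z"],
})

filtered = toy.dropna()                  # keeps the original labels 0, 2, 4
print(filtered.index.tolist())

clean = filtered.reset_index(drop=True)  # relabel rows 0..n-1 to match positions
print(clean.index.tolist())
```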
```text
[[1.         0.07856161 0.07072431 ... 0.         0.         0.        ]
 [0.07856161 1.         0.09737137 ... 0.         0.         0.        ]
 [0.07072431 0.09737137 1.         ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 1.         0.         0.        ]
 [0.         0.         0.         ... 0.         1.         0.        ]
 [0.         0.         0.         ... 0.         0.         1.        ]]
```
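The cell that built this matrix is not shown, but its shape (square, symmetric, 1s on the diagonal) is what fitting `TfidfVectorizer` on a text column and passing the result to `cosine_similarity` produces. A minimal sketch, assuming English stop words are removed (an assumption), with three invented descriptions standing in for `df['description']`:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Invented stand-in for the description column
descriptions = pd.Series([
    "A princess and a dragon go on an adventure at sea.",
    "A brave princess sails the sea with her friends.",
    "A documentary about sharks and surfers.",
])

# Each description becomes a TF-IDF weighted, L2-normalised word vector
tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(descriptions.fillna(""))

# Pairwise cosine similarity between every pair of descriptions
sim = cosine_similarity(matrix)
print(np.round(sim, 3))

# Rows are already unit length, so the plain dot product gives the same values
assert np.allclose(linear_kernel(matrix), sim)
```

Descriptions 0 and 1 share "princess" and "sea", so their score is positive; description 2 shares no remaining words with either and scores exactly 0, like the many zeros in the printout above.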
```text
0    Join Mickey and the gang as they duck the halls!
1    Santa Claus passes his magic bag to a new St. ...
2            Sid the Sloth is on Santa's naughty list.
3                 This is real life, not just fantasy!
4    A three-part documentary from Peter Jackson ca...
Name: description, dtype: object
```
```python
# Convert the title column to a list so membership can be checked; so far only the
# description column has been used (Pandas.pydata, n.d.):
# https://pandas.pydata.org/docs/reference/api/pandas.Series.to_list.html
give_title = df['title'].tolist()

def recommend(title, cosine_sim=the_matrix):
    """Return the five titles whose descriptions are most similar to `title`."""
    if title not in give_title:
        return "Unable to find title."
    recommended_movies = []
    # Row position of the title (positional, so it matches the matrix rows
    # even if df has a non-default index)
    idx = int(np.flatnonzero(df['title'].to_numpy() == title)[0])
    # Rank every title by its similarity to the chosen one, highest first
    score_series = pd.Series(cosine_sim[idx]).sort_values(ascending=False)
    # Skip the first entry (normally the title itself, similarity 1) and keep the next five
    top_5_indices = list(score_series.iloc[1:6].index)
    for i in top_5_indices:
        recommended_movies.append(df['title'].iloc[i])  # recommend based on title
    return recommended_movies

print(recommend("Moana"))
# Final block fixed and debugged using Google Colab AI assistant (Gemini).
```

```text
['James and the Giant Peach', 'Jack', 'Iron Will', 'Iron Man Armored Adventures', 'Iron Man (Series)']
```
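To check the recommendation logic without the full dataset, here is a self-contained variant on a toy catalogue (titles and descriptions are invented, and `recommend_toy` is a hypothetical helper). It is a sketch under two assumed design choices: it works with row positions throughout, and it removes the query title by position rather than assuming it sorts first. It also returns an empty list for an unknown title instead of a message string.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented toy catalogue; titles and descriptions are not from the dataset
catalogue = pd.DataFrame({
    "title": ["Sea Quest", "Ocean Friends", "Shark Watch", "Space Race", "Star Pilots"],
    "description": [
        "A young sailor crosses the ocean in search of her island home.",
        "Friends sail across the ocean to find a lost island.",
        "A documentary about sharks off the coast.",
        "Astronauts race to reach the moon.",
        "Pilots train to fly to the moon and the stars.",
    ],
})

# TF-IDF vectors of the descriptions, then pairwise cosine similarity
sim = cosine_similarity(
    TfidfVectorizer(stop_words="english").fit_transform(catalogue["description"])
)

def recommend_toy(title, df, sim, n=5):
    """Hypothetical helper: up to n titles most similar to `title`, best first."""
    matches = np.flatnonzero(df["title"].to_numpy() == title)
    if matches.size == 0:
        return []  # unknown title
    pos = matches[0]  # row position, which is also the matrix row
    order = np.argsort(-sim[pos], kind="stable")  # highest similarity first
    order = order[order != pos][:n]  # drop the query title itself
    return df["title"].iloc[order].tolist()

print(recommend_toy("Sea Quest", catalogue, sim, n=2))
```

Sorting with `kind="stable"` makes ties (such as the many zero scores) come out in catalogue order, so results are reproducible; `Series.sort_values` defaults to quicksort, which does not guarantee any particular order among ties.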
Through this project, I learned how to use cosine similarity and TF-IDF vectorisation. As in the SQL project, I used information about Disney movies, this time to build a recommendation system.
Recommendation systems are essential in today’s business world: they help companies better understand their customers and how to attract them [3]. This project demonstrates my ability to follow industry trends and to think from the customer’s perspective. A significant component of Disney’s success is its ability to recommend the right content to customers and keep their attention [4]. Recognising this allows me to apply what I have learned at university to add value to a company.