– What were the difficulties you encountered during the project?
Hardest part was the dataset pre-processing. Since the data is not exactly created for my purpose, I had to change it a lot. After this process, hypothesis testing and learning parts were easy. Because I have manipulated the data just as I wanted. So, I only wrote basic learning code since my data was ready for it.
– If you were given sufficient amount of resources, what additional datasets would you utilize?
I worked on just Action and Drama genres. If my computer would be faster and I had more time for programming, I would analyze all of the genre types. In addition I was using a very large data with 20 million ratings. However I had to use only 100.000 of it because I didn’t have powerful computer and internet.
– Compare the machine learning algorithms you used, in terms of performance and applicability to your dataset.
k-NN was more applicable for my data however I would prefer, for proper datasets, decision tree model since it has is efficient.
– What improvements could have been done in your project?
I would have used more data, this would increase my learning accuracy. I would try more options for learning algorithms.