Creating a Ruby CLI App That Finds You a Random Movie

Stephen McBride
May 11, 2020

For my first Ruby project, I decided to create a CLI application that scrapes a list of movies from a website and outputs the details of a random movie on that list. I also wanted to give the user a way of choosing the type of movie they want, so I set the goal of adding a genre and year filter.

The first website I thought would be a good candidate for this project was IMDb. If you’ve never heard of IMDb, it’s a website that has information on movies, TV shows, and actors. It also has a section where you can get a list of movies for a given genre, which was exactly what I was looking for.

Movie genre options on IMDb

However, there was one detail that stopped me from using this in my project, and that was the number of movies that were shown on a page.

List of movies per genre on IMDb

The results were capped at 50 movies per page. To have a healthy number of movies and lower the chance of getting repeated results, I wanted about 100. There was no way of raising that cap by modifying URL parameters, meaning that the only way of reaching 100 results would be to scrape two pages. This would double the time it takes to get results, which is never a good thing.

After more research, I settled on Rotten Tomatoes, another movie website very similar to IMDb. The main difference between the two is that Rotten Tomatoes had the 100 results per genre that I was looking for. It also had the option of sorting by a specified year, which IMDb didn’t have. To top it all off, those 100 results were the top-rated movies in the given category, meaning that users would only get the best of the best.

Rotten Tomatoes’ top 100 results

While you can tell from the IMDb and Rotten Tomatoes screenshots that the Rotten Tomatoes results show far less information per movie, this is not important. All of the data that I need is collected from each movie’s own page, so all I need from the list is the URL for that page.

For the scraping part of the app, I used a gem called Nokogiri, which makes using CSS selectors to extract content from a page a breeze. I soon realized, however, that CSS selectors can only help you so much when the content you’re scraping has little or nothing to identify it. This was the case with Rotten Tomatoes. Most of the content that I needed from their website was displayed using tables or unordered lists, and those cells and list items usually had only one class associated with them. While most of the time this only meant that I had to be extra careful with my selectors, there was one instance where this caused a pretty big issue.

I noticed that occasionally I would get an empty string for a movie’s runtime, or the app would crash with an error saying that Nokogiri’s CSS selector method couldn’t be called on nil. I eventually realized that the issue was most common with movies released in 2020. Further digging revealed that a majority of the pages for recently released movies were missing information. Most of the time it was the box office amount, but sometimes it was the runtime. This explained the nil error: if an element (the list item containing the runtime) was not found, the lookup returned nil, and calling a method on nil fails. However, this did not explain the empty string I was receiving. It turned out that the missing box office amount I mentioned earlier was the culprit. To get a list item, I couldn’t use class selectors, because all of the list items shared the same class. Instead, since I had the list items in an array, I had to grab them by index. Hard-coding a variable to a list item grabbed by index is a bad idea when those items aren’t guaranteed to be in that order. Example:

Hard coding an array and removing an element
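The same problem can be reproduced without any scraping. The snippet below is a minimal sketch, not the app's actual code: the list contents, the `RUNTIME_INDEX` constant, and the `runtime_line` helper are all made up for illustration. It shows how a hard-coded index silently points at the wrong item once an earlier item goes missing.

```ruby
# Hypothetical list-item texts, in the order a fully populated page shows them.
complete_page = ["PG-13", "Action", "$100M", "181 minutes", "Marvel Studios"]

# The same page without a box office amount: every later item shifts left by one.
incomplete_page = complete_page.reject { |text| text.start_with?("$") }

# Hard-coded position of the runtime on a complete page (an assumption of this sketch).
RUNTIME_INDEX = 3

# Builds the output line by trusting the hard-coded index.
def runtime_line(items)
  "Runtime: #{items[RUNTIME_INDEX]}"
end

puts runtime_line(complete_page)   # Runtime: 181 minutes
puts runtime_line(incomplete_page) # Runtime: Marvel Studios
```

Nothing raises an error in the second call, which is what makes this kind of bug easy to miss.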

As you can see, the app is now outputting “Runtime: Marvel Studios”, which is not correct. I eventually worked around this issue by grabbing the label text that sits next to each list item’s value (which in this case was “Runtime:”) and using that as a makeshift identifier.
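Here is a rough sketch of that label-based lookup, assuming each list item's text looks like "Label: value". The sample strings and the `details_by_label` helper are hypothetical; in the real app the text would come from the scraped list items rather than a hard-coded array.

```ruby
# Hypothetical text of each list item, with the box office entry missing.
items = ["Rating: PG-13", "Genre: Action", "Runtime: 181 minutes", "Studio: Marvel Studios"]

# Builds a label => value hash by splitting each item at its first colon.
def details_by_label(items)
  items.map do |text|
    label, value = text.split(":", 2)
    [label.strip, value.to_s.strip]
  end.to_h
end

details = details_by_label(items)

# Look values up by label, with a fallback for anything the page left out.
runtime    = details.fetch("Runtime", "Unknown")
box_office = details.fetch("Box Office", "Unknown")

puts "Runtime: #{runtime}"
puts "Box Office: #{box_office}"
```

Because each value is found by its label, a missing item only affects its own field, and the fallback avoids calling methods on nil.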

I’m sure there are more edge cases similar to this waiting to be discovered, but for now, I’ve tested my app many times and can say with confidence that there is a 99% chance that it won’t break. Besides, checking over 4000 movie pages is unnecessary and a waste of time.

Once I got that nailed down, the rest of my project was fairly straightforward and I had no other major issues. I used the scraped data to create a movie class and a category class, and created the CLI to handle user input.
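To give a feel for that structure, here is a minimal sketch of what a movie class and a category class could look like. The attribute names, the `random_movie` method, and the sample data are assumptions for illustration and may not match the classes in the repository.

```ruby
# Hypothetical shape of a movie; the real class may hold different details.
class Movie
  attr_reader :title, :year, :runtime

  def initialize(title:, year:, runtime: nil)
    @title = title
    @year = year
    @runtime = runtime
  end

  # One-line description, with a fallback when the runtime is missing.
  def summary
    "#{title} (#{year}), Runtime: #{runtime || 'Unknown'}"
  end
end

# Hypothetical category that owns a list of movies.
class Category
  attr_reader :name, :movies

  def initialize(name, movies = [])
    @name = name
    @movies = movies
  end

  # Picks one movie at random, optionally limited to a release year.
  # Returns nil when no movie matches.
  def random_movie(year: nil)
    pool = year ? movies.select { |movie| movie.year == year } : movies
    pool.sample
  end
end

action = Category.new("Action", [
  Movie.new(title: "Example Movie A", year: 2019, runtime: "120 minutes"),
  Movie.new(title: "Example Movie B", year: 2020)
])

pick = action.random_movie(year: 2020)
puts pick.summary
```

`Array#sample` does the random selection here; filtering by year in memory is a simplification for this sketch.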

If you’d like to take a look at the finished product, here is the GitHub link: https://github.com/smcbride1/random-movie-picker
