One of my favorite book series is The Dark Tower, by Stephen King. Forget about the movie; it doesn’t do the books any justice. The series is a collection of eight books and one short story, more than four thousand pages in all, telling the epic tale of the gunslinger and his group of friends, his ka-tet, as they move towards the Dark Tower while dealing with the infamous man in black.

From Wikipedia:

The Dark Tower is a series of eight books and one short story written by American author Stephen King. Incorporating themes from multiple genres, including dark fantasy, science fantasy, horror, and Western, it describes a “gunslinger” and his quest toward a tower, the nature of which is both physical and metaphorical. The series, and its use of the Dark Tower, expands upon Stephen King’s multiverse and in doing so, links together many of his other novels.

Besides that, we will also take a look at the poem that inspired Stephen King to write this series, “Childe Roland to the Dark Tower Came” by Robert Browning.

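| Order | Title | Pages | Words | Release Year |
|-------|-------|-------|-------|--------------|
| 0 | Childe Roland to the Dark Tower Came | 6 | 1,761 | 1855 |
| 0.5 | The Little Sisters of Eluria | 66 | 23,434 | 1998 |
| 1 | The Gunslinger | 224 | 55,376 | 1982 |
| 2 | The Drawing of the Three | 400 | 125,948 | 1987 |
| 3 | The Waste Lands | 512 | 173,489 | 1991 |
| 4 | Wizard and Glass | 787 | 254,691 | 1997 |
| 4.5 | The Wind through the Keyhole | 336 | 91,857 | 2012 |
| 5 | Wolves of the Calla | 714 | 242,776 | 2003 |
| 6 | Song of Susannah | 432 | 118,221 | 2004 |
| 7 | The Dark Tower | 845 | 272,273 | 2004 |
| | The Dark Tower series | 4,316 | 1,358,065 | |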

Word clouds

To get some insights from the series, we will generate graphs showing the most frequent words in each book. One special kind of visualization that can help us is the word cloud. Again, from Wikipedia:

A tag cloud (word cloud or wordle or weighted list in visual design) is a novelty visual representation of text data, typically used to depict keyword metadata (tags) on websites, or to visualize free form text. Tags are usually single words, and the importance of each tag is shown with font size or color.

What does that mean? A word cloud is a collection, or cluster, of words depicted in different sizes. The bigger and bolder the word appears, the more often it’s mentioned within a given text and the more important it is.


Data set

We want to visualize the frequency of the most common words in this series of books. We will generate a graph for each book and another for the whole body of text. Our input data is composed of ten text files, one for each title.
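As a rough sketch of how the ten files could be read in, the snippet below loads every text file from a folder and also joins them into one string for the series-wide graph. The folder layout, the `.txt` extension, the UTF-8 encoding and the function names are assumptions made for illustration, not details taken from the notebook.

```python
from pathlib import Path


def load_corpus(folder):
    """Read every .txt file in `folder` into a {file stem: text} dictionary."""
    corpus = {}
    # Sorting keeps the titles in a predictable order (by file name).
    for path in sorted(Path(folder).glob("*.txt")):
        corpus[path.stem] = path.read_text(encoding="utf-8")
    return corpus


def whole_series(corpus):
    """Join all texts into one string for the series-wide word cloud."""
    return "\n".join(corpus.values())
```

With a hypothetical folder named `books`, `load_corpus("books")` would give one entry per title, and `whole_series(...)` the combined text.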

We won’t be doing any exploratory analysis, but we have some guesses about which words will turn out to be the most relevant. For the poem, we don’t have any prior knowledge. The first book is mainly a solo story, in which the gunslinger follows the man in black and meets the boy Jake. In the other books, gunslinger and Roland are the main ways the main character is referred to. His companions are Jake, Eddie, Susannah and their pet Oy. This group is moving towards the Dark Tower. The short story and some of the books are recollections of Roland’s memories, so other names will pop up.


Creating word clouds

There are Python libraries designed to create these visualizations, so the process of building them is very straightforward. We will walk through the steps for the first text and then present the images for every book.

The input we need is a single string containing every word of the text. That requires some data cleaning: we remove everything that isn’t alphanumeric, which means all the punctuation and special characters such as ‘\n’, which denotes a new line. We also turn everything to lowercase.

Original text
We need to go from this…
Text without punctuation and special characters
to this and then…
Text without punctuation and special characters, all lowercase
to our clean input text.
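The two cleaning steps can be sketched with Python’s standard `re` module. This is a minimal version that assumes plain ASCII text; the function name `clean_text` is made up for this example, and the notebook may do the cleaning differently.

```python
import re


def clean_text(raw):
    """Return the words of `raw` in lowercase, separated by single spaces."""
    # Replace every run of characters that is not a letter or digit
    # (punctuation, newlines, tabs, ...) with a single space.
    no_punctuation = re.sub(r"[^A-Za-z0-9]+", " ", raw)
    # Trim the ends and lowercase everything.
    return no_punctuation.strip().lower()


sample = "The man in black fled across the desert,\nand the gunslinger followed."
print(clean_text(sample))
# the man in black fled across the desert and the gunslinger followed
```

One side effect of this simple rule is that apostrophes are dropped too, so a word like "don't" is split into "don" and "t".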

With this long string containing all the words, we can create our first word cloud, for the poem ‘Childe Roland to the Dark Tower Came’.
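A word cloud is driven by word counts, so a minimal sketch only needs to count the words and hand the counts to a drawing library. The example below assumes the third-party `wordcloud` package is the library in use (the text does not name it); the function names, image size and background color are illustrative choices.

```python
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs in a cleaned, space-separated string."""
    return Counter(text.split())


def render_word_cloud(frequencies, output_path):
    """Draw the cloud with the `wordcloud` package (assumed to be installed)."""
    # Imported here so the counting step works even without the package.
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(output_path)


frequencies = word_frequencies("the gunslinger followed the man in black")
print(frequencies.most_common(1))  # [('the', 2)]
```

Note that `generate_from_frequencies` draws exactly the counts it is given, so very common words such as "the" stay in the picture unless they are removed first. Passing raw text to `WordCloud.generate` instead lets the package apply its default English stopword list.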

Word cloud for ‘Childe Roland to the Dark Tower Came’.
The main words are one, now, came, and set. Nothing much can be learned from this.

Let’s see what the other texts show!

Word cloud for ‘The Little Sisters of Eluria’
The most relevant word now is roland, which makes a lot of sense. Said, back, one, hand and ye are the next ones. We can also see that gunslinger is important.
Word cloud for ‘The Gunslinger’
Gunslinger is what most people call Roland when he meets them while following the man in black. In this book, Roland meets Jake, the boy who will accompany him until the end.
Word cloud for ‘The Drawing of the Three’
The new characters are starting to show. Eddie, Roland and gunslinger are the most relevant words.
Word cloud for ‘The Waste Lands’
Now we have our whole group: Roland, Jake, Eddie and Susannah (smaller). As they talk to each other during their journey, their names become more frequent in the text.
Word cloud for ‘Wizard and Glass’
In this book, Roland is telling his back story, so we see less of his companions. We see Susan, Jonas and Cuthbert, figures from his youth.
Word cloud for ‘The Wind through the Keyhole’
Roland tells the tale of Tim Stoutheart to his friends. No words stand out.
Word cloud for ‘Wolves of the Calla’
Again we have our major characters, but now Father Callahan shows up.
Word cloud for ‘Song of Susannah’
We see the name of the demon Mia, who is responsible for the possession of Susannah.
Word cloud for ‘The Dark Tower’
We come to the last book, and we see most of the same words being repeated, book after book.
Word cloud for the whole series

Filtering

We have some words that show up in most of the graphs but don’t bring any insight into the story. One, back, said, now, time, hand, thought and looked will be added to a list of words that aren’t considered for the visualization. Words on such a list are known as stopwords; typical examples are ‘the’ and ‘and’, which are used too frequently to be of any help to the graph.

Let’s also remove the names of the four main characters: Roland, Jake, Eddie and Susannah. Most of the time they are talking amongst themselves, so their names are too frequent.
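The filtering described here amounts to skipping listed words while counting. The sketch below uses the words named in this section; the short `COMMON_STOPWORDS` set is only a stand-in for a full English stopword list, and all the names are made up for the example.

```python
from collections import Counter

# Words singled out in the text as uninformative, plus the four main characters.
CUSTOM_STOPWORDS = {
    "one", "back", "said", "now", "time", "hand", "thought", "looked",
    "roland", "jake", "eddie", "susannah",
}
# A tiny stand-in for a general English stopword list (an assumption).
COMMON_STOPWORDS = {"the", "and", "a", "of", "to", "in"}


def filtered_frequencies(text, stopwords):
    """Count words in a cleaned string, skipping any that are in `stopwords`."""
    return Counter(word for word in text.split() if word not in stopwords)


text = "roland said the tower and the rose and the beam"
counts = filtered_frequencies(text, CUSTOM_STOPWORDS | COMMON_STOPWORDS)
print(counts.most_common())
```

If the `wordcloud` package is the library in use, its `WordCloud` constructor also accepts a `stopwords` set, which it applies when generating a cloud from raw text.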

Filtered word cloud for the whole series

We could keep adding words to the list, but there is no end to that. Let’s take a look at the first book again.

The first sentence of this book is: “The man in black fled across the desert, and the gunslinger followed.” This visualization is better, drawing attention to the gunslinger, the man in black and the boy.

We can improve it a little further by adding a mask to the graph.

With a mask, we can bring an image from the story into our graph.
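A mask is an array with the same shape as the picture, marking where words may be drawn. The sketch below builds a simple circular mask by hand so the idea is visible without an image file; it assumes the `wordcloud` package, where pure white (255) entries are masked out and everything else is free to draw on. The function names are hypothetical.

```python
def circle_mask(size):
    """Build a size x size mask: 0 inside a centred circle, 255 (white) outside."""
    centre = (size - 1) / 2
    radius = size / 2
    return [
        [0 if (x - centre) ** 2 + (y - centre) ** 2 <= radius ** 2 else 255
         for x in range(size)]
        for y in range(size)
    ]


def render_masked_cloud(frequencies, mask_rows, output_path):
    """Draw a word cloud confined to the non-white area of the mask."""
    # Third-party imports are kept local; both packages are assumed installed.
    import numpy as np
    from wordcloud import WordCloud

    mask = np.array(mask_rows, dtype=np.uint8)
    cloud = WordCloud(background_color="white", mask=mask)
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(output_path)
```

To use a picture from the story instead, the mask could be loaded with Pillow and NumPy, for example `np.array(Image.open("tower.png"))`, where `tower.png` is a hypothetical file with a white background around the shape.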

Conclusion

This is a data visualization exercise. There are eight books and one short story following the journey of a group of four people towards the Dark Tower. It is a small set of main characters, so we should remove their names to be able to get some new insights.


Final conclusion: we can present the distribution of relevant words for the Dark Tower Series in a pleasing and insightful visualization.


Jupyter Notebook available at GitHub. Interactive panel at Tableau Public.