Python: How to count the number of times each character appears in a string


Recently, I needed to count the number of times each character appears in a specific sentence. As I was learning Python at the time, this looked like a perfect opportunity to use it to solve a real problem, and that is the focus of this article. The first solution uses a dictionary; the second one uses the collections module.

Let's deal with the string itself first. Before counting its characters, we will modify it a bit to remove any unwanted characters and make the count case-insensitive.

my_text = "Winter is coming!".lower()
my_clean_text = my_text.replace(" ","").replace("!","")    

Examining the lines above:

  • Line 1:

    my_text = "Winter is coming!".lower()

    Here we set the sentence (or whatever string) we want to analyze. In our case, we want to count how many times each letter appears, regardless of whether it is lowercase or uppercase. We achieve this by calling the lower() method on the string, which returns a lowercase copy of the whole text. If you want uppercase and lowercase letters to be counted separately, remove the lower() call.

  • Line 2:

    my_clean_text = my_text.replace(" ","").replace("!","")

    You might want to remove unwanted characters from the string first. If you want to count the frequency of every character in the string, simply remove this line. In our case, we remove the spaces and the ! character with the str.replace() method, replacing each of them with an empty string.
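The replace() calls above only remove the space and the ! character. If your text may contain other punctuation or digits, one option (not the approach used in the rest of this article) is to keep only alphabetic characters with str.isalpha(). A minimal sketch, where the helper name clean_text is hypothetical:

```python
def clean_text(text):
    """Hypothetical helper: lowercase the text and keep only letters."""
    # str.isalpha() is True for letters, including non-ASCII letters such as 'é'
    return "".join(ch for ch in text.lower() if ch.isalpha())


print(clean_text("Winter is coming!"))  # winteriscoming
print(clean_text("Hello, World? 42"))   # helloworld
```

This also drops digits; use ch.isalnum() instead of ch.isalpha() if you want to count them.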

Now that we have cleaned up the text, let's focus on the algorithm itself. The first solution uses a dictionary and doesn't require importing any extra module.

Solution 1 - using a dictionary

A dictionary is a collection of key-value pairs (since Python 3.7, dictionaries also preserve insertion order). Each value is accessed using its key, and since keys are unique, we can use them to store each unique character in our string. For example, the count of the character 'a' would be stored in my_dict['a']. By iterating through the string, we use each character as a key of our dictionary and increase its value by 1.

The whole code looks like this:

my_text = "Winter is coming!".lower()
my_clean_text = my_text.replace(" ","").replace("!","")    
print("Original text: {}".format(my_text))
print("Cleaned text: {}".format(my_clean_text))

#calculate number of times the characters appear in a string
my_dict={}
for i in my_clean_text:
    my_dict[i]=my_dict.get(i,0)+1

print("\nFrequency of characters:")
#print the characters in descending order from most frequent to least frequent
for j in sorted(my_dict, key=my_dict.get, reverse=True):
    print(j+":",my_dict[j],"time(s)")

The code above has two main sections. The first one calculates the frequency of each character in the text; the second one displays each character and its frequency, from highest to lowest.

Counting the number of times each character appears in a string

Let's begin by examining the code for the first section.

  • Line 7:

    my_dict={}
    

    We first create an empty dictionary.

  • Line 8:

    for i in my_clean_text:
    

    In this line, we iterate through each character in the string.

  • Line 9:

        my_dict[i]=my_dict.get(i,0)+1
    

    Inside the loop, the variable i holds one character from the string, and we want to use it as a dictionary key and increase its value by 1. The problem is that if the key doesn't exist yet, accessing it raises a KeyError, so we can't simply write my_dict[i]=my_dict[i]+1.

    To solve this, we use the dictionary's built-in get() method. Its first parameter is the key whose value we want; get() returns the value of that key or, if the key doesn't exist, the value of the second parameter, which in our case is 0. We then add 1 to that value and store the result in my_dict[i].

    Note: The second parameter of get() is optional, but it defaults to None, and since None + 1 raises TypeError: unsupported operand type(s) for +: 'NoneType' and 'int', we set it to 0.
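The three cases described for line 9 can be seen side by side in a small sketch: plain indexing on a missing key, get() without a default, and get() with a default of 0. The variable name counts is just for illustration.

```python
counts = {}

# Plain indexing fails the first time a character is seen
try:
    counts["w"] = counts["w"] + 1
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'w'

# get() without a default returns None, and None + 1 raises TypeError
try:
    counts["w"] = counts.get("w") + 1
except TypeError as err:
    print("TypeError:", err)

# get() with a default of 0 works for both new and existing keys
for ch in "winter":
    counts[ch] = counts.get(ch, 0) + 1
print(counts)  # {'w': 1, 'i': 1, 'n': 1, 't': 1, 'e': 1, 'r': 1}
```

Both failed attempts raise before the assignment happens, so counts is still empty when the loop starts.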

We end up with a dictionary in which each unique character in the string is stored as a key, and its value is the number of times that character appears in the string.
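To see what that dictionary contains for our example, here is a sketch of the same loop wrapped in a function (the name count_chars is hypothetical), applied to the cleaned, lowercase text. Because dictionaries keep insertion order in Python 3.7+, the keys appear in the order the characters are first seen.

```python
def count_chars(text):
    """Hypothetical helper: map each character in text to its number of occurrences."""
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    return counts


my_dict = count_chars("winteriscoming")
print(my_dict)
# {'w': 1, 'i': 3, 'n': 2, 't': 1, 'e': 1, 'r': 1, 's': 1, 'c': 1, 'o': 1, 'm': 1, 'g': 1}
```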

Printing the characters in descending order

Next, we sort the characters from most frequent to least frequent using Python's built-in sorted() function.

  • Line 13:

    for j in sorted(my_dict, key=my_dict.get, reverse=True):
    

    The sorted() function takes an iterable and returns a new sorted list, leaving the original sequence or collection unchanged. The for loop then iterates through this returned list.

    The function parameters are as follows:

    • sequence / collection we want to sort (required)

      This required parameter can be any iterable object (list, tuple, dictionary, ...). In our case, it is the my_dict dictionary; iterating over a dictionary yields its keys, so sorted() returns a list of keys.

    • key (optional, as keyword argument key=)

      The key parameter controls what each element is compared by. Without it, the dictionary's keys would be sorted by the keys themselves (here, alphabetically), but we want them sorted by their values. The key parameter expects a function, and this is where the get() method comes in handy: with key=my_dict.get, each key is sorted according to its value.

    • reverse (optional, as keyword argument reverse=)

      By default, sorted() returns the elements in ascending order (lowest to highest). With reverse=True, we sort them in descending order (highest to lowest).

  • Line 14:

        print(j+":",my_dict[j],"time(s)")  
    

    Here, we display the result of the sorted() call from line 13, which is a list of the dictionary's keys. For each key, we print the key followed by its value, which in our case are a unique character from the string and the number of times it appears.
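The effect of each sorted() parameter is easiest to see on a small, hypothetical dictionary. The sketch also shows that sorted() is stable (even with reverse=True), so keys with equal counts stay in their original insertion order, and an alternative that sorts the (key, value) pairs from items() instead of the keys.

```python
my_dict = {"w": 1, "i": 3, "n": 2, "t": 1}

# Sorting a dict sorts its keys
print(sorted(my_dict))                                 # ['i', 'n', 't', 'w']

# key=my_dict.get compares keys by their values; reverse=True puts the largest first
print(sorted(my_dict, key=my_dict.get, reverse=True))  # ['i', 'n', 'w', 't']

# Sorting items() returns (key, value) pairs instead of just keys
by_count = sorted(my_dict.items(), key=lambda kv: kv[1], reverse=True)
print(by_count)  # [('i', 3), ('n', 2), ('w', 1), ('t', 1)]
```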


Next, we are going to accomplish the same task without handling the dictionary, get(), and sorted() ourselves. This time, we will use the collections module instead.

Solution 2 - using Counter in collections module

For the 2nd solution, the complete code looks like this:

from collections import Counter

my_text = "Winter is coming!"
my_clean_text = my_text.replace(" ","").replace("!","")
print("Original text: {}".format(my_text))
print("Cleaned text: {}".format(my_clean_text))

#calculate number of times the characters appear in a string
list_of_chars = list(my_clean_text)
cnt = Counter(list_of_chars)
common = cnt.most_common()

print("\nFrequency of characters:")
#print the characters in descending order from most frequent to least frequent 
for c in common:
    print(c[0]+":",c[1],"time(s)")

Let's go through the key lines.

  • Line 1:

    from collections import Counter
    

    First, we import the Counter class from the collections module. This module is part of the Python standard library and is installed with Python, so there is no need to run pip install.

As with the first solution, this one has two main sections. The first section counts the characters, and the second one displays them from most to least frequent.

Counting the number of times each character appears in a string

  • Line 9:

    list_of_chars = list(my_clean_text)

    We store each character of the text string as an element of a list using Python's list() function. This step is optional: Counter accepts any iterable, so Counter(my_clean_text) gives the same result.

  • Line 10:

    cnt = Counter(list_of_chars)

    We then pass this list of characters to the Counter constructor from the collections module. The Counter class counts how many times each unique element appears. This is how the official Python documentation describes Counter:

    "A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The Counter class is similar to bags or multisets in other languages."

    Under the hood, Counter is a dict subclass that counts elements much like our first solution; it just hides the low-level details from us.
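A short sketch of how a Counter behaves compared with the plain dictionary from the first solution: it can be built directly from a string, a missing key returns 0 instead of raising KeyError, and since Counter is a dict subclass it compares equal to a dictionary holding the same counts. The variable name manual is just for illustration.

```python
from collections import Counter

# Counter accepts any iterable, so a string works without calling list() first
cnt = Counter("Winteriscoming")
print(cnt["i"])  # 3
print(cnt["z"])  # 0 (a missing key returns 0 instead of raising KeyError)

# The same counts built by hand, as in the first solution
manual = {}
for ch in "Winteriscoming":
    manual[ch] = manual.get(ch, 0) + 1

print(isinstance(cnt, dict))  # True
print(cnt == manual)          # True
```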

  • Line 11:

    common = cnt.most_common()

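    Finally, we call the most_common() method of the Counter object, which returns a list of (element, count) tuples ordered from most common to least common. Since Python 3.7, elements with equal counts keep the order in which they were first encountered.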
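most_common() also takes an optional argument n that limits the result to the n most common elements. A minimal sketch on the cleaned text from the example (the tie order shown assumes Python 3.7 or later):

```python
from collections import Counter

cnt = Counter("Winteriscoming")

# With no argument, every (element, count) pair is returned, most frequent first
print(cnt.most_common()[:3])  # [('i', 3), ('n', 2), ('W', 1)]

# With n, only the n most common pairs are returned
print(cnt.most_common(2))     # [('i', 3), ('n', 2)]
```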

Printing the characters in descending order

Displaying the results is very similar to the first solution, except that we don't need sorted() at all, because most_common() on line 11 already returned the characters sorted from most to least frequent.

  • Line 15:

    for c in common:

    Here we iterate through common, a list of tuples that each contain a character and its count. For example, the character 'i', which appears 3 times in the string, is stored as the tuple ('i', 3).

  • Line 16:

        print(c[0]+":",c[1],"time(s)") 
    

    This line displays the character and its count, accessing the tuple elements with c[0] and c[1].
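Tuple unpacking is an equivalent and arguably more readable way to write this loop, since each element of common has exactly two items. A sketch using an f-string (Python 3.6+) that builds lines in the same "i: 3 time(s)" format as the print() call above:

```python
from collections import Counter

common = Counter("Winteriscoming").most_common()

# Unpack each (character, count) tuple instead of indexing with c[0] and c[1]
lines = [f"{char}: {count} time(s)" for char, count in common]
for line in lines:
    print(line)
```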

Output of the script

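Both solutions count the characters the same way, but Solution 2 does not call lower(), so it prints exactly the following. Solution 1 prints the same lines except that the original text, the cleaned text, and the 'W' appear in lowercase: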

Original text: Winter is coming!
Cleaned text: Winteriscoming

Frequency of characters:
i: 3 time(s)
n: 2 time(s)
W: 1 time(s)
t: 1 time(s)
e: 1 time(s)
r: 1 time(s)
s: 1 time(s)
c: 1 time(s)
o: 1 time(s)
m: 1 time(s)
g: 1 time(s)
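Because only Solution 1 lowercases the text, the printed output of the two scripts differs in letter case. A quick sketch that runs both approaches on the same lowercase, cleaned text shows that they produce identical (character, count) pairs in the same order:

```python
from collections import Counter

text = "Winter is coming!"
clean = text.lower().replace(" ", "").replace("!", "")

# Solution 1: a dictionary filled with get(), then sorted by count
my_dict = {}
for ch in clean:
    my_dict[ch] = my_dict.get(ch, 0) + 1
solution_1 = sorted(my_dict.items(), key=lambda kv: kv[1], reverse=True)

# Solution 2: Counter.most_common()
solution_2 = Counter(clean).most_common()

print(solution_1 == solution_2)  # True
```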

Conclusion

In this article, we counted the number of times each character occurs in a string using two solutions. The first solution used a dictionary, while the second one imported the Counter class from the collections module and used its most_common() method.

I hope you have found this article useful. If you have another interesting solution to the same problem that you would like to share, let me know and I might add it here.
