Kodeclik Blog
Finding common characters across Python strings
Here are three different methods to find the characters that two Python strings have in common.
Method 1: Use Set Intersection Operations
This method converts both strings into sets and uses the intersection operator (&) to find common characters.
def common_characters(str1, str2):
    return set(str1) & set(str2)

string1 = 'hello'
string2 = 'world'
result = common_characters(string1, string2)
print(f'Common characters: {result}')
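It's a concise and efficient approach that relies on Python's built-in set operations: building each set and intersecting them takes roughly linear time in the lengths of the strings. The result is a set of unique characters that appear in both strings, so duplicates are eliminated automatically. Because sets are unordered, the characters may print in a different order from run to run.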
The output is:
Common characters: {'o', 'l'}
Because we are using sets, repeated characters are ignored: each string contributes only its unique characters to the intersection, so the two “l”s in “hello” count only once.
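The set approach also extends to more than two strings. The sketch below is a minimal illustration, assuming a hypothetical helper name `common_characters_all` and a list containing at least one string. It relies on `set.intersection`, which accepts any number of iterables (plain strings included).

```python
def common_characters_all(strings):
    # Split off the first string; the rest are compared against it
    # (raises ValueError if the list is empty)
    first, *rest = strings
    # Keep only the unique characters of the first string that occur in every other string
    return set(first).intersection(*rest)

print(common_characters_all(['hello', 'world', 'cold']))  # {'l', 'o'} in some order
```

With a single string in the list, the function simply returns that string's unique characters.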
Method 2: Use a List and a Loop
This is a more traditional approach, arguably less Pythonic.
def common_characters_2(str1, str2):
    common_chars = []
    for char in str1:
        # Keep the character if str2 contains it and we haven't recorded it yet
        if char in str2 and char not in common_chars:
            common_chars.append(char)
    return common_chars

string1 = 'apple'
string2 = 'grape'
result = common_characters_2(string1, string2)
print(f'Common characters: {result}')
Here we use a traditional loop to iterate through the first string and check whether each character exists in the second string. The program stores common characters in a list and skips any character already in that list, so each one appears only once. Unlike the set method, the result keeps the order in which characters first appear in str1. While not as concise as the set method, it's more explicit and easier to understand for beginners.
The output here will be:
Common characters: ['a', 'p', 'e']
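Both `char in str2` and `char not in common_chars` scan a sequence on every iteration, which can be slow for long strings. Here is a minimal sketch of an alternative that keeps the same order and output; the name `common_characters_ordered` is hypothetical. It builds a set from str2 once for fast lookups and uses `dict.fromkeys`, which keeps the first occurrence of each key in insertion order, to drop duplicates.

```python
def common_characters_ordered(str1, str2):
    # Build a set once so each membership test is O(1) on average
    lookup = set(str2)
    # dict.fromkeys keeps only the first occurrence of each character,
    # preserving the order in which characters appear in str1
    return list(dict.fromkeys(ch for ch in str1 if ch in lookup))

print(common_characters_ordered('apple', 'grape'))  # ['a', 'p', 'e']
```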
Method 3: Use a List Comprehension
This method uses a list comprehension to create a string of common characters. It preserves the order of characters as they appear in the first string, and it repeats a character as many times as it occurs in the first string, even if the second string contains it only once.
Consider the program below:
def common_characters_3(str1, str2):
    return ''.join([char for char in str1 if char in str2])

string1 = 'sleep'
string2 = 'leap'
result = common_characters_3(string1, string2)
print(f'Common characters: {result}')
For each character in str1, we check whether it is present in str2 and, if so, keep it. As a result, a character that is repeated in str1 is repeated in the output even if it appears only once in str2.
The join() method then concatenates the kept characters into a single string. This approach is particularly useful when you need to maintain the original order of characters or when duplicates are important for your use case.
The output will be:
Common characters: leep
As you can see, there are two “e”s in the result, even though “leap” contains only one.
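If you want duplicates to count only as often as they occur in both strings, one option is to track how many copies of each character the second string can still supply. The sketch below is an illustration under that assumption, with the hypothetical name `common_characters_counted`; it uses `collections.Counter`, which returns 0 for characters it has not seen.

```python
from collections import Counter

def common_characters_counted(str1, str2):
    # How many unused copies of each character str2 still has
    available = Counter(str2)
    result = []
    for ch in str1:
        # Keep ch only while str2 still has an unused copy of it
        if available[ch] > 0:
            result.append(ch)
            available[ch] -= 1
    return ''.join(result)

print(common_characters_counted('sleep', 'leap'))   # lep
print(common_characters_counted('sleep', 'sheep'))  # seep
```

Here “sleep” and “leap” share only one “e”, so only one is kept, while “sheep” supplies two.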
Finding common characters that appear in the same position across two strings
Now suppose we want to find common characters across two strings, but we want to be strict about position: a character counts only if it appears at the same index in both strings.
Here is an approach:
def find_common_chars_with_position(string_1, string_2):
    # Get the minimum length to avoid index out of range
    min_length = min(len(string_1), len(string_2))
    # Store matches with their positions
    matches = []
    # Compare characters at same indices
    for i in range(min_length):
        if string_1[i] == string_2[i]:
            matches.append((i, string_1[i]))
    return matches

# Example usage
string_1 = "Battle of Ideas"
string_2 = "Brilliant Minds Unite!"
results = find_common_chars_with_position(string_1, string_2)
for index, char in results:
    print(f"index {index}: '{char}'")
This approach is efficient because it makes a single pass through the strings, comparing characters directly at matching indices.
First, we find the minimum length of the two strings to avoid index-out-of-range errors, which is crucial when comparing strings of different lengths. Then a for loop compares the characters at each index in both strings. Because only same-position matches count, this is well suited to comparing aligned DNA sequences or finding matching patterns in strings where position matters.
The output will be:
index 0: 'B'
index 4: 'l'
index 9: ' '
index 14: 's'
As you can see, four characters, including a space, are common to “Battle of Ideas” and “Brilliant Minds Unite!” at the same positions.
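The same comparison can be written more compactly with the built-in `zip()`, which pairs up characters index by index and stops at the end of the shorter string, so no explicit `min()` is needed. This is a minimal sketch of that alternative; the function name `find_common_chars_zip` is hypothetical.

```python
def find_common_chars_zip(string_1, string_2):
    # zip() yields (char_from_1, char_from_2) pairs and stops at the shorter string;
    # enumerate() supplies the index of each pair
    return [(i, a) for i, (a, b) in enumerate(zip(string_1, string_2)) if a == b]

for index, char in find_common_chars_zip("Battle of Ideas", "Brilliant Minds Unite!"):
    print(f"index {index}: '{char}'")
```

It produces the same four matches as the loop-based version.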
Finding common characters across multiple strings
It is now easy to generalize the above program to find characters that match at the same position across any number of strings. Here is how that would work:
def find_common_chars_multi(strings):
    # Get the minimum length among all strings
    min_length = min(len(s) for s in strings)
    # Store matches with their positions
    matches = {}
    # Compare characters at the same indices in all strings
    for i in range(min_length):
        # Get the character from the first string as reference
        char = strings[0][i]
        # Check if this character is present in all other strings at the same index
        if all(s[i] == char for s in strings[1:]):
            matches[i] = char
    return matches

# Example with 4 strings
string_1 = 'Brilliant Minds'
string_2 = 'Brilliant Ideas'
string_3 = 'Brilliant Lines'
string_4 = 'Brilliant Finds'
result = find_common_chars_multi([string_1, string_2, string_3, string_4])
This implementation is more versatile than the previous two-string version because it can handle any number of input strings. Similar to the previous program, this function works by first finding the minimum length among all strings to avoid index out of range errors. Then, it uses the first string as a reference and checks if each character at a given position matches across all other strings. The result is returned as a dictionary where the keys are the positions and the values are the matching characters. In the example above, we can see that all four strings share the word "Brilliant" and the space after it, plus the final 's', which is reflected in the output dictionary.
The result in our case will be:
{0: 'B',
1: 'r',
2: 'i',
3: 'l',
4: 'l',
5: 'i',
6: 'a',
7: 'n',
8: 't',
9: ' ',
14: 's'}
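As with the two-string case, `zip()` offers a compact alternative: `zip(*strings)` yields one tuple per index (for example `('B', 'B', 'B', 'B')`) and stops at the shortest string, and a position is common when every character in its tuple is the same. This is a minimal sketch with the hypothetical name `find_common_chars_multi_zip`, not a replacement for the version above.

```python
def find_common_chars_multi_zip(strings):
    # Each "column" holds the characters found at one index in every string
    return {
        i: column[0]
        for i, column in enumerate(zip(*strings))
        # All characters in the column are identical exactly when the set has one element
        if len(set(column)) == 1
    }

words = ['Brilliant Minds', 'Brilliant Ideas', 'Brilliant Lines', 'Brilliant Finds']
print(find_common_chars_multi_zip(words))
```

One design difference: for an empty list this version returns an empty dictionary, whereas `min()` in the loop-based version raises a ValueError.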
Returning common characters as a string
To make the output more readable, we can return a string instead of a dictionary: common characters stay in their positions, and every other position is filled with a placeholder character such as “_”. Here is a program for that purpose:
def find_common_chars_string(strings):
    # Get the minimum length among all strings
    min_length = min(len(s) for s in strings)
    # Initialize a result list with '_' for all positions
    result = ['_'] * min_length
    # Compare characters at the same indices in all strings
    for i in range(min_length):
        # Get the character from the first string as reference
        char = strings[0][i]
        # Check if this character is present in all other strings at the same index
        if all(s[i] == char for s in strings[1:]):
            result[i] = char
    # Join the result list into a string
    return ''.join(result)

# Example 1: Similar strings
string_1 = 'Brilliant Minds'
string_2 = 'Brilliant Ideas'
string_3 = 'Brilliant Lines'
string_4 = 'Brilliant Finds'
result1 = find_common_chars_string([string_1, string_2, string_3, string_4])
print(result1)

# Example 2: Different strings
string_1 = 'Academy of Sciences'
string_2 = 'Academy of Arts'
string_3 = 'Academy of Business'
string_4 = 'Academy of Technology'
result2 = find_common_chars_string([string_1, string_2, string_3, string_4])
print(result2)
This version of the code initializes a list with underscores for all positions up to the minimum length of the input strings. When it finds a character that matches across all strings at the same position, it replaces the underscore with that character. The final result is joined into a string, making it easy to see which positions have matching characters and which don't.
The output for the above examples will be:
Brilliant ____s
Academy of ____
In both examples, the underscores mark positions where the characters don't match across all strings. Note that the output is only as long as the shortest input: in the second example, 'Academy of Arts' has 15 characters, so the extra characters in the longer strings are never compared.
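If you would rather see the full length of the longest string, `itertools.zip_longest` can pad the shorter strings so every position is compared, with missing positions treated as non-matching. The sketch below assumes that behavior is what you want; the function name `find_common_chars_string_padded` and its `fill` parameter are hypothetical.

```python
from itertools import zip_longest

def find_common_chars_string_padded(strings, fill='_'):
    # zip_longest pads shorter strings with None, so columns span the longest string
    columns = zip_longest(*strings, fillvalue=None)
    # A position is common only if no string is missing it and all characters agree
    return ''.join(
        col[0] if None not in col and len(set(col)) == 1 else fill
        for col in columns
    )

schools = ['Academy of Sciences', 'Academy of Arts', 'Academy of Business', 'Academy of Technology']
print(find_common_chars_string_padded(schools))
```

For the second example this prints “Academy of ” followed by ten underscores, one for each position up to the length of 'Academy of Technology'.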
Applications of finding common characters
There are several practical applications of finding common characters in Python strings. As mentioned earlier, DNA sequence comparison is one: we look for matching nucleotides across multiple DNA sequences to identify common genetic patterns. Finding shared subsequences is essentially looking for “conservation”, and conserved regions can be used to help define a multiple sequence alignment.
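As a small illustration, the position-matching idea from earlier can report which columns of a set of DNA sequences are conserved. This sketch assumes the sequences are already aligned (so equal indices correspond to each other); the sequences are made up for the example and `conserved_positions` is a hypothetical name.

```python
def conserved_positions(sequences):
    # Assumes the sequences are already aligned: index i means the same site in each one
    return [i for i, column in enumerate(zip(*sequences)) if len(set(column)) == 1]

aligned = ["ACGTAC", "ACGTTC", "ACCTAC"]  # made-up example sequences
positions = conserved_positions(aligned)
print(positions)                          # [0, 1, 3, 5]
print(len(positions) / len(aligned[0]))   # fraction of positions that are conserved
```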
A second application is plagiarism detection, where identifying common character patterns between documents can help flag potentially copied content.
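One common way to compare character patterns is to break each text into overlapping character n-grams and measure how many they share, using the same set intersection as Method 1. The sketch below uses Jaccard similarity (shared n-grams divided by all distinct n-grams) as one possible score; the function names, the default `n=3`, and the lowercasing step are assumptions for illustration, not a complete plagiarism detector.

```python
def char_ngrams(text, n=3):
    # All overlapping substrings of length n; lowercasing makes 'The' and 'the' match
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def ngram_similarity(doc_a, doc_b, n=3):
    a, b = char_ngrams(doc_a, n), char_ngrams(doc_b, n)
    union = a | b
    # Jaccard similarity; texts too short to form any n-gram score 0.0
    return len(a & b) / len(union) if union else 0.0

print(ngram_similarity("abcde", "abcxy"))              # 0.2
print(ngram_similarity("The cat sat", "the cat sat"))  # 1.0
```

Scores near 1.0 suggest heavy overlap; where to set a threshold depends on the documents being compared.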
A third application is in data deduplication, where finding common elements in strings is used to remove redundant information.