Kodeclik Blog
Checking for valid emails using Python
You will encounter several situations where you are given a string and need to determine whether it holds a valid email address. For instance, check out the sidebar signup form on this page and try entering something that is not an email address where it asks for one. See what happens!
To check if a string is a valid email, there are at least two approaches you can take: the DIY approach or using an external library. We will look at each next.
Method 1: Use Regular Expressions
In this approach, we use the “re” (or regular expression) module in Python.
import re
def validate_email(email):
    # Basic shape: local-part @ domain . top-level-domain
    pattern = r'^[\w\.-]+@[a-zA-Z\d-]+\.[a-zA-Z]{2,}$'
    if re.match(pattern, email):
        return True
    return False
The above code defines a function validate_email that uses a regular expression to check whether an email address follows a basic valid format. The key line in the program is the pattern r'^[\w\.-]+@[a-zA-Z\d-]+\.[a-zA-Z]{2,}$', which looks complicated but captures exactly what we are trying to do.
It breaks down as follows: [\w\.-]+ matches one or more word characters (letters, numbers, or underscores), dots, or hyphens before the @ symbol; [a-zA-Z\d-]+ matches one or more letters, numbers, or hyphens for the domain name; \. matches a literal dot; and [a-zA-Z]{2,} ensures the top-level domain has at least two letters. The function returns True if the email matches this pattern and False otherwise. While this validation is useful for basic checks, it doesn't verify that the address actually exists or can receive mail. For instance, the address “doesitexist@kodeclik.com” will pass even though no such mailbox exists.
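If the one-line pattern is hard to read, one option is to spell it out with Python's re.VERBOSE flag, which lets you put whitespace and comments inside the pattern. The sketch below is the same pattern as above, just annotated piece by piece; the names EMAIL_PATTERN and validate_email_verbose are made up for this illustration.

```python
import re

# Same pattern as in the article, split across lines with re.VERBOSE
# so that every piece can carry its own comment.
EMAIL_PATTERN = re.compile(r"""
    ^[\w\.-]+        # before the @: word characters, dots, or hyphens
    @                # the @ separator
    [a-zA-Z\d-]+     # domain name: letters, digits, or hyphens
    \.               # a literal dot
    [a-zA-Z]{2,}$    # top-level domain: at least two letters
""", re.VERBOSE)

def validate_email_verbose(email):
    # Hypothetical helper name; behaves like validate_email above.
    return bool(EMAIL_PATTERN.match(email))

print(validate_email_verbose("hello@kodeclik.com"))  # True
print(validate_email_verbose("kodeclik@kodeclik"))   # False
```

Both versions should accept and reject the same strings, so which one you use is mostly a matter of readability.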
Let us try the validate_email function out with a few examples. First, let's update it so the output message is more user-friendly.
import re
def validate_email(email):
    pattern = r'^[\w\.-]+@[a-zA-Z\d-]+\.[a-zA-Z]{2,}$'
    # Return a readable message instead of a bare True/False
    if re.match(pattern, email):
        return email + " looks valid."
    return email + " does not look right to me."
print(validate_email("hello@kodeclik.com"))
print(validate_email("doesitexist@kodeclik.com"))
print(validate_email("kodeclik.com"))
print(validate_email("kodeclik@"))
print(validate_email("kodeclik@kodeclik"))
print(validate_email("kodeclik@kodeclik.python"))
The output will be:
hello@kodeclik.com looks valid.
doesitexist@kodeclik.com looks valid.
kodeclik.com does not look right to me.
kodeclik@ does not look right to me.
kodeclik@kodeclik does not look right to me.
kodeclik@kodeclik.python looks valid.
The first two addresses (and their outputs) should be self-explanatory. The input “kodeclik.com” is flagged as invalid because it doesn’t contain an “@” sign. Similarly, “kodeclik@” does not contain anything after the @ sign. “kodeclik@kodeclik” is also not valid because you need a “dot” with something to the left and right of it. Finally, “kodeclik@kodeclik.python” is flagged as valid because, even though “python” may not be a top-level domain (TLD), the string has the right structure for an email address.
In summary, the above program checks that the username contains only letters, numbers, underscores, dots, and hyphens; that there is a single @ symbol; that the domain name consists of letters, numbers, and hyphens; and finally that the TLD has two or more letters.
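It is worth knowing where a simple pattern like this falls short. The sketch below probes a few edge cases with the same pattern; the helper names looks_valid and looks_valid_strict and the sample addresses are made up for illustration. It also shows re.fullmatch as one possible way to reject a trailing newline, which re.match combined with $ lets through.

```python
import re

PATTERN = r'^[\w\.-]+@[a-zA-Z\d-]+\.[a-zA-Z]{2,}$'

def looks_valid(email):
    # Same check as validate_email in the article, returning a bool.
    return re.match(PATTERN, email) is not None

def looks_valid_strict(email):
    # fullmatch requires the whole string to match, so a trailing
    # newline is no longer accepted.
    return re.fullmatch(PATTERN, email) is not None

# The domain part allows only one dot, so subdomains are rejected.
print(looks_valid("user@mail.kodeclik.com"))        # False
# Consecutive dots in the username slip through.
print(looks_valid("first..last@kodeclik.com"))      # True
# With re.match, $ also matches just before a final newline.
print(looks_valid("hello@kodeclik.com\n"))          # True
print(looks_valid_strict("hello@kodeclik.com\n"))   # False
```

Gaps like these are a good reason to treat the regular expression as a quick first check rather than a complete validator.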
Method 2: Use the email-validator library
A second, more robust, approach is to use the “email-validator” library. This not only outsources the hard task of validating emails but also provides useful messages when the validation fails.
Consider:
from email_validator import validate_email, EmailNotValidError
def check_email(email):
    try:
        # Validates the syntax and, via DNS, whether the domain can receive mail
        email_info = validate_email(email, check_deliverability=True)
        return True, email_info.normalized
    except EmailNotValidError as e:
        # The exception message explains why validation failed
        return False, str(e)
print(check_email("hello@kodeclik.com"))
print(check_email("doesitexist@kodeclik.com"))
print(check_email("kodeclik.com"))
print(check_email("kodeclik@"))
print(check_email("kodeclik@kodeclik"))
print(check_email("kodeclik@kodeclik.python"))
Note that you must first install the library using a command like “pip install email-validator”. The package is named email-validator on PyPI, while the module you import is email_validator.
The check_email function takes an email address as input and returns a tuple containing two elements: a boolean value (True/False) indicating if the email is valid, and either the normalized form of the email (if valid) or an error message (if invalid). When check_deliverability=True (which is also the library's default), it performs additional checks, including DNS lookups, to verify that the domain can receive emails.
The output is:
(True, 'hello@kodeclik.com')
(True, 'doesitexist@kodeclik.com')
(False, 'An email address must have an @-sign.')
(False, 'There must be something after the @-sign.')
(False, 'The part after the @-sign is not valid. It should have a period.')
(False, 'The domain name kodeclik.python does not exist.')
Note that the program is able to figure out that a domain name does not exist (presumably by querying the DNS), but it cannot tell whether the address “doesitexist@kodeclik.com” exists, because that is known only to the individual domain, i.e., kodeclik.com.
This approach is preferred because it validates email syntax according to RFC standards, checks for domain deliverability, supports internationalized email addresses, and provides detailed error messages when something goes wrong.
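The DNS lookup behind check_deliverability needs network access, which may be unwanted in unit tests or offline scripts. A minimal sketch of a syntax-only variant, assuming the email-validator library is installed, is shown below. The helper name check_email_syntax is made up for this example, and the import guard is only there so the snippet fails with a clear message if the library is missing.

```python
try:
    from email_validator import validate_email, EmailNotValidError
except ImportError:
    # Library not installed; check_email_syntax below will say so.
    validate_email = None

def check_email_syntax(email):
    """Hypothetical helper: validate the syntax only, with no DNS query."""
    if validate_email is None:
        raise RuntimeError("Run 'pip install email-validator' first.")
    try:
        # check_deliverability=False skips the DNS-based domain check
        info = validate_email(email, check_deliverability=False)
        return True, info.normalized
    except EmailNotValidError as e:
        return False, str(e)
```

With this variant, an address such as “kodeclik@kodeclik.python” would be expected to pass, since its syntax is fine and the domain is never looked up. That is the trade-off for skipping the network call.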
If you liked this blog post, check out how to check if a string in Python is in camelCase format!