Kodeclik Blog
Splitting a Text File by Lines in Python
In this blogpost we show you how to split a text file by lines in your Python program. Suppose you have a text file containing months and seasons, like so:
January Winter
February Winter
March Spring
April Spring
May Spring
June Summer
July Summer
August Summer
September Fall
October Fall
November Fall
December Winter
Let us name this file “seasons.txt”. Now we show you how to read this text file and split it into lines inside your Python program for ease of processing.
Method 1: Use a for loop to read each line
The easiest way is to open the file with the open() function and loop over the resulting file object with a for loop. Each iteration gives you the next line of the file, which you can then print, one at a time.
Consider the program:
with open("seasons.txt", 'r') as seasonslist:
    for x in seasonslist:
        print(x)
This produces the output:
January Winter

February Winter

March Spring

April Spring

May Spring

June Summer

July Summer

August Summer

September Fall

October Fall

November Fall

December Winter
Note that there is an extra blank line after each line. This is because each line read from the file still ends with its own newline character, and the print() function adds another newline after each call. Within the for loop you can do any special processing you like on each line x.
For instance, you can remove the trailing newline (along with any other leading or trailing whitespace) using strip():
with open("seasons.txt", 'r') as seasonslist:
    for x in seasonslist:
        print(x.strip())
The output is:
January Winter
February Winter
March Spring
April Spring
May Spring
June Summer
July Summer
August Summer
September Fall
October Fall
November Fall
December Winter
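A minimal sketch of the newline behaviour described above. repr() makes the trailing '\n' on each line visible, print(x, end="") suppresses print's own newline instead of removing the file's, and rstrip("\n") removes only the line break whereas strip() also removes leading spaces. We assume io.StringIO as a stand-in for seasons.txt so the snippet runs on its own; the indented second line is hypothetical, added only to show the difference.

```python
import io

# io.StringIO stands in for seasons.txt (an assumption so the snippet runs
# on its own); iterating it yields lines just like an open text file.
# The leading spaces on the second line are hypothetical.
sample = io.StringIO("January Winter\n  February Winter\n")
lines = list(sample)

for x in lines:
    print(repr(x))    # repr() shows the '\n' each line still carries

for x in lines:
    print(x, end="")  # end="" stops print() from adding a second newline

stripped = [x.strip() for x in lines]           # also drops the leading spaces
newline_only = [x.rstrip("\n") for x in lines]  # keeps the leading spaces
print(stripped)
print(newline_only)
```

For the seasons file, which has no extra spaces, strip() and rstrip("\n") give the same result; the difference only matters when leading or trailing whitespace is meaningful.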
Method 2: Use a list comprehension
A second way to split a text file into lines is to use a list comprehension. Consider the program:
with open("seasons.txt", 'r') as seasonslist:
    all_lines = [s.strip() for s in seasonslist]
print(all_lines)
This will output:
['January Winter', 'February Winter', 'March Spring', 'April Spring',
'May Spring', 'June Summer', 'July Summer', 'August Summer',
'September Fall', 'October Fall', 'November Fall', 'December Winter']
If you prefer to print one line per row, you can loop over the list, like so:
with open("seasons.txt", 'r') as seasonslist:
    all_lines = [s.strip() for s in seasonslist]
for x in all_lines:
    print(x)
This will output as before:
January Winter
February Winter
March Spring
April Spring
May Spring
June Summer
July Summer
August Summer
September Fall
October Fall
November Fall
December Winter
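For a file small enough to hold in memory, reading the whole contents with read() and then calling splitlines() produces the same list as the list comprehension in one step. A sketch comparing the two, again assuming io.StringIO in place of seasons.txt so it runs without the file:

```python
import io

text = "January Winter\nFebruary Winter\nMarch Spring\n"

# List comprehension, as in Method 2
with io.StringIO(text) as seasonslist:
    all_lines = [s.strip() for s in seasonslist]

# Alternative: read everything, then split it at line boundaries
with io.StringIO(text) as seasonslist:
    via_splitlines = seasonslist.read().splitlines()

print(all_lines == via_splitlines)  # True for this sample
```

The two agree here because no line has leading or trailing spaces; splitlines() removes only the line breaks, while strip() would also remove surrounding whitespace.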
Method 3: Use split() or splitlines()
The split() method splits a string into a list of fields. Thus the third method for splitting our text file looks like this:
with open("seasons.txt", 'r') as seasonslist:
    for x in seasonslist:
        print(x.split())
We read each line into the variable “x” as before, and then use split() to convert each line into a list of words. The output is:
['January', 'Winter']
['February', 'Winter']
['March', 'Spring']
['April', 'Spring']
['May', 'Spring']
['June', 'Summer']
['July', 'Summer']
['August', 'Summer']
['September', 'Fall']
['October', 'Fall']
['November', 'Fall']
['December', 'Winter']
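Because each line splits into exactly two fields, the pieces can be unpacked directly, for example into a dictionary mapping each month to its season. A sketch assuming every line has exactly two whitespace-separated words (a line with more or fewer would make the unpacking raise ValueError), with io.StringIO standing in for seasons.txt:

```python
import io

# Stand-in for seasons.txt (assumption, so the snippet is self-contained)
sample = io.StringIO("January Winter\nFebruary Winter\nMarch Spring\n")

season_of = {}
for x in sample:
    month, season = x.split()  # unpack the two fields of each line
    season_of[month] = season

print(season_of["March"])  # Spring
```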
The split() method takes an optional argument: the separator used to split each line. When it is omitted, as above, the line is split on runs of whitespace, and leading and trailing whitespace (including the newline) is discarded. If the fields are instead separated by a pipe (“|”) character, pass that character as the argument, as in x.split("|").
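With an explicit separator, split() no longer discards whitespace, so the trailing newline stays attached to the last field unless you remove it first. A sketch using a hypothetical pipe-delimited line:

```python
line = "January|Winter\n"  # hypothetical pipe-separated line

raw_fields = line.split("|")           # ['January', 'Winter\n']
fields = line.rstrip("\n").split("|")  # ['January', 'Winter']
no_sep = line.split()                  # ['January|Winter'] (no whitespace to split on)

print(raw_fields)
print(fields)
print(no_sep)
```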
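The splitlines() method, by contrast, splits a string at line boundaries (such as "\n" or "\r\n") rather than at whitespace. It behaves much like split("\n"), except in edge cases: it returns an empty list for an empty string, and a terminal line break does not produce an extra empty string. Its only (optional) argument, keepends, controls whether the line breaks are kept in the result. Since each x already holds a single line, the code: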
with open("seasons.txt", 'r') as seasonslist:
    for x in seasonslist:
        print(x.splitlines())
outputs:
['January Winter']
['February Winter']
['March Spring']
['April Spring']
['May Spring']
['June Summer']
['July Summer']
['August Summer']
['September Fall']
['October Fall']
['November Fall']
['December Winter']
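The edge cases that separate splitlines() from split("\n") show up when the method is applied to a whole block of text rather than one line at a time. A small comparison on a two-line sample string:

```python
text = "January Winter\nFebruary Winter\n"

by_split = text.split("\n")                 # ['January Winter', 'February Winter', '']
by_splitlines = text.splitlines()           # ['January Winter', 'February Winter']
with_ends = text.splitlines(keepends=True)  # ['January Winter\n', 'February Winter\n']

empty_split = "".split("\n")        # ['']
empty_splitlines = "".splitlines()  # []

print(by_split, by_splitlines, with_ends, empty_split, empty_splitlines, sep="\n")
```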
Method 4: Use a generator
The final method we will cover is considered a “lazy” method and is suitable for reading very large files. It splits a text file line by line but doesn’t do it all at once. Instead it provides lines one by one, on demand.
First we write a function called “lazy_read”:
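def lazy_read(file):
    # The with block closes the file once the generator is exhausted
    with open(file, 'r') as fp:
        while True:
            line = fp.readline()
            if not line:  # readline() returns '' at end of file
                break
            yield line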
Note that this function uses the “yield” statement instead of return, which makes it a generator. Each yield hands one line back to the caller and suspends the function’s execution; when the next line is requested, the function resumes right after the yield with its state intact and reads the following line. In this manner, only one line needs to be held in memory at a time, so you do not get overwhelmed by large files and can process them line by line.
Now here is how we use this function:
seasonslist = lazy_read('seasons.txt')
for x in seasonslist:
    print(x.strip())
The output is as before:
January Winter
February Winter
March Spring
April Spring
May Spring
June Summer
July Summer
August Summer
September Fall
October Fall
November Fall
December Winter
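To see the suspension and resumption that yield provides, you can request lines one at a time with next() instead of a for loop. A sketch using a hypothetical variant, lazy_read_from, that takes an already-open file object so io.StringIO can stand in for seasons.txt:

```python
import io

def lazy_read_from(fp):
    """Hypothetical variant of lazy_read that takes an open file object."""
    while True:
        line = fp.readline()
        if not line:  # readline() returns '' at end of file
            break
        yield line

gen = lazy_read_from(io.StringIO("January Winter\nFebruary Winter\n"))

first = next(gen)   # runs the function body up to the first yield
second = next(gen)  # resumes after that yield and reads the next line
print(first.strip(), "/", second.strip())

# Once the lines run out, next() would raise StopIteration;
# supplying a default value returns it instead.
done = next(gen, None)
print(done)  # None
```

A for loop does exactly this behind the scenes, stopping quietly when the generator raises StopIteration.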
You have learnt four different ways to read a text file and split it into lines in Python. Which one is your favorite?