Kodeclik Blog
Fetching a row from a CSV file using Python
Assume you have CSV (comma-separated value) file that looks like this (lets call it “data.csv”):
id,name,phone_number,address
1,Alice,123-456-7890,123 Maple St
2,Bob,234-567-8901,234 Oak St
3,Charlie,345-678-9012,345 Pine St
4,David,456-789-0123,456 Cedar St
5,Eve,567-890-1234,567 Birch St
Note that the file contains a header listing what each column refers to followed by 5 rows of sample data.
Let us now look at ways to fetch a specific row from the file. For instance, let us try to retrieve row 3 and we should obtain the row that describes “Charlie”. (note that we are not counting the header as part of the row numbering.) There are at least six different ways to accomplish our objective in Python!
Method 1: Fetching a specific row from a CSV file by directly reading the file
The most straightforward approach is to simply open the file and read it line by line:
def get_row_file_object(file_path, row_number):
with open(file_path, 'r') as file:
next(file) # Skip header
for i, line in enumerate(file):
if i == row_number - 1:
return line.strip().split(',')
return None
print(get_row_file_object('data.csv',3))
(In running this code, make sure the path to data.csv is properly specified in your line invoking “get_row_file_object”). The get_row_file_object function is designed to retrieve a specific row from a CSV file. It takes two parameters: the file path and the desired row number. When called, it opens the specified file in read mode using a context manager (with statement) which ensures proper file handling. The function first skips the header row using next(file).
Then, it uses enumerate to iterate through the remaining lines, creating pairs of index numbers and line contents. The enumeration starts at 0, which is why the function checks for row_number - 1 to match the user's input (since users typically think of rows starting at 1, not 0). When it finds the matching row, it processes the line by removing any whitespace or newline characters with strip() and converts it into a list using split(',') which separates the line at each comma.
If the requested row is found, it returns this processed list; if not (for example, if the row number is larger than the file length), it returns None. In the example usage print(get_row_file_object('data.csv',3)), the function will return the third row after the header as a list of values.
The output will be (as expected):
['3', 'Charlie', '345-678-9012', '345 Pine St']
Method 2: Fetching a specific row from a CSV file using the CSV module
This approach uses a specific Python module called “csv”.
import csv
def fetch_row_from_csv(file_path, row_number):
with open(file_path) as csvfile:
reader = csv.reader(csvfile)
header = next(reader) # Skip header
for idx, row in enumerate(reader):
if idx == row_number - 1: # Convert to 0-based indexing
return row
return None # Return None if row_number is out of range
print(fetch_row_from_csv('data.csv',3))
Just like the first example, the above program accepts two parameters: the path to the CSV file and the desired row number. When executed, it opens the specified file and creates a CSV reader object that properly handles CSV formatting, including cases where fields might contain commas or quotes.
Again, the function first skips the header row using next(reader), then uses enumerate to iterate through the remaining rows while keeping track of their position. Since enumerate starts counting from 0 but users typically think of row numbers starting from 1, the function adjusts for this by checking if the current index equals row_number - 1.
When it finds the matching row, it returns it as a list where each element represents a field from that CSV row. If the requested row number exceeds the available rows in the file, the function returns None. When called with print(fetch_row_from_csv('data.csv',3)), it will return the third row of data (excluding the header) from the file 'data.csv'.
The output is again:
['3', 'Charlie', '345-678-9012', '345 Pine St']
Method 3: Fetching a specific row from a CSV file using pandas
The pandas dataframe library provides a very natural way to achieve our objective:
import pandas as pd
# Read specific row by index
df = pd.read_csv('data.csv')
specific_row = df.iloc[2] # Get third row (0-based indexing)
print(specific_row)
The program starts by importing pandas with the conventional alias pd. It then reads the entire CSV file 'data.csv' into a DataFrame object using pd.read_csv(), which automatically handles the CSV formatting and creates a structured table-like object where data can be easily accessed.
The iloc (integer location) accessor is used to fetch a specific row by its index position - in this case, iloc retrieves the third row since pandas uses 0-based indexing. Unlike the previous methods that return a simple list, this approach returns a pandas Series object (a one-dimensional array with labeled indices) containing the row's data, and it preserves the column names as labels for each value.
When printed, it displays the row's data along with the corresponding column names, making it more readable and informative than raw CSV data.
The output will be:
id 3
name Charlie
phone_number 345-678-9012
address 345 Pine St
Name: 2, dtype: object
Note that the output is not printed as a list like in the first two approaches. It is a specific row of a data frame printed in the format pandas dataframes are printed.
Method 4: Fetching a specific row from a CSV file using the linecache module
The linecache module is memory efficient as it doesn't read the entire file:
import linecache
def get_csv_row(file_path, row_number):
# Add 1 to account for header
return linecache.getline(file_path, row_number + 1).strip().split(',')
print(get_csv_row('data.csv',3))
Python's linecache module is specifically designed for reading individual lines from files without loading the entire file into memory. Here, we use linecache.getline() to directly fetch the specified line from the file, adding 1 to the requested row number to account for the header row (since linecache uses 1-based indexing).
The retrieved line is then processed using strip() to remove any whitespace or newline characters, and split(',') to convert the CSV line into a list by splitting it at each comma. This method is particularly efficient for large files when you only need to access specific rows occasionally, as it doesn't require reading the entire file into memory.
The output will be:
['3', 'Charlie', '345-678-9012', '345 Pine St']
Method 5: Fetching a specific row from a CSV file using itertools.islice()
In this approach, we use the Python itertools.islice module, which is designed for working with iterators and sequences.
from itertools import islice
def get_row_itertools(file_path, row_number):
with open(file_path, 'r') as file:
next(file) # Skip header
row = next(islice(file, row_number - 1, row_number), None)
return row.strip().split(',') if row else None
print(get_row_itertools('data.csv',3))
When called, the function opens the file in read mode, skips the header using next(file), as in previous programs, and then employs islice to efficiently iterate through the file to reach the desired row.
The islice function takes three arguments: the file object, the starting position (row_number - 1), and the stopping position (row_number), effectively creating a slice of the file that contains only the desired row. The next() function with a default value of None is used to get the single row from this slice. If the row exists, it's processed by removing whitespace with strip() and converted to a list using split(','); if the row doesn't exist (i.e., the row number is out of range), it returns None.
The output will be:
['3', 'Charlie', '345-678-9012', '345 Pine St']
Method 6: Fetching a specific row from a CSV file using a generator function
The below is another memory-efficient approach:
def row_generator(file_path):
with open(file_path, 'r') as file:
next(file) # Skip header
for line in file:
yield line.strip().split(',')
# Usage
def get_specific_row(file_path, row_number):
gen = row_generator(file_path)
for i, row in enumerate(gen):
if i == row_number - 1:
return row
return None
print(get_specific_row('data.csv',3))
This code implements a generator-based approach to reading CSV files using two functions working together. The row_generator function is a generator that opens the file, skips the header row, and yields each subsequent row as a list of values using yield, which allows for memory-efficient processing as it reads only one line at a time.
The get_specific_row function uses this generator by creating a generator object and then using enumerate to count through the rows until it finds the desired row number (accounting for 0-based indexing by subtracting 1 from the row number). When the matching row is found, it returns that row as a list; if the requested row number exceeds the file length, it returns None.
The output will be:
['3', 'Charlie', '345-678-9012', '345 Pine St']
This generator-based approach is particularly memory efficient for large files because it doesn't load the entire file into memory at once, instead processing rows one at a time as needed. The generator pattern also makes the code reusable, as the same generator could be used for other CSV processing tasks beyond just finding a specific row.
Listing rows with specific conditions
Sometimes you wish to list rows from a CSV file under specific conditions rather than a specific row number. For these types of problems, it is more advantageous to use pandas.
Consider the below program that illustrates various types of conditions you can use to filter your rows:
import pandas as pd
# Read the CSV file
df = pd.read_csv('data.csv')
# Multiple filtering examples
# Greater than condition
filtered_rows = df[df['id'] > 2]
# Exact match
name_filter = df[df['name'] == 'Charlie']
# String starts with
phone_filter = df[df['phone_number'].str.startswith('345')]
# Multiple conditions
multiple_conditions = df[(df['id'] > 1) & (df['id'] < 4)]
# More complex filtering examples
contains_filter = df[df['address'].str.contains('Oak')]
in_list_filter = df[df['name'].isin(['Alice', 'Bob'])]
not_null_filter = df[df['phone_number'].notnull()]
print("Rows with ID > 2:")
print(filtered_rows)
print("\nRows with name 'Charlie':")
print(name_filter)
print("\nRows with phone number beginning in 345:")
print(phone_filter)
print("\nRows with multiple conditions:")
print(multiple_conditions)
print("\nRows containing Oak:")
print(contains_filter)
print("\nRows containing pre-approved people:")
print(in_list_filter)
print("\nRows containing non-null phone numbers:")
print(not_null_filter)
The code demonstrates various filtering methods including numerical comparisons, string matching, combining conditions using & (and) or | (or), list membership testing with isin(), and null value checking. The output will be:
Rows with ID > 2:
id name phone_number address
2 3 Charlie 345-678-9012 345 Pine St
3 4 David 456-789-0123 456 Cedar St
4 5 Eve 567-890-1234 567 Birch St
Rows with name 'Charlie':
id name phone_number address
2 3 Charlie 345-678-9012 345 Pine St
Rows with phone number beginning in 345:
id name phone_number address
2 3 Charlie 345-678-9012 345 Pine St
Rows with multiple conditions:
id name phone_number address
1 2 Bob 234-567-8901 234 Oak St
2 3 Charlie 345-678-9012 345 Pine St
Rows containing Oak:
id name phone_number address
1 2 Bob 234-567-8901 234 Oak St
Rows containing pre-approved people:
id name phone_number address
0 1 Alice 123-456-7890 123 Maple St
1 2 Bob 234-567-8901 234 Oak St
Rows containing non-null phone numbers:
id name phone_number address
0 1 Alice 123-456-7890 123 Maple St
1 2 Bob 234-567-8901 234 Oak St
2 3 Charlie 345-678-9012 345 Pine St
3 4 David 456-789-0123 456 Cedar St
4 5 Eve 567-890-1234 567 Birch St
Enjoy this blogpost? Want to learn Python with us? Sign up for 1:1 or small group classes.