Kodeclik Logo

Our Programs

Courses

Learn More

Schedule

Kodeclik Blog

Fetching a row from a CSV file using Python

Assume you have CSV (comma-separated value) file that looks like this (lets call it “data.csv”):

id,name,phone_number,address
1,Alice,123-456-7890,123 Maple St
2,Bob,234-567-8901,234 Oak St
3,Charlie,345-678-9012,345 Pine St
4,David,456-789-0123,456 Cedar St
5,Eve,567-890-1234,567 Birch St

Note that the file contains a header listing what each column refers to followed by 5 rows of sample data.

Let us now look at ways to fetch a specific row from the file. For instance, let us try to retrieve row 3 and we should obtain the row that describes “Charlie”. (note that we are not counting the header as part of the row numbering.) There are at least six different ways to accomplish our objective in Python!

How to get a row from CSV file in Python

Method 1: Fetching a specific row from a CSV file by directly reading the file

The most straightforward approach is to simply open the file and read it line by line:

def get_row_file_object(file_path, row_number):
  with open(file_path, 'r') as file:
      next(file)  # Skip header
      for i, line in enumerate(file):
          if i == row_number - 1:
              return line.strip().split(',')
  return None


print(get_row_file_object('data.csv',3))

(In running this code, make sure the path to data.csv is properly specified in your line invoking “get_row_file_object”). The get_row_file_object function is designed to retrieve a specific row from a CSV file. It takes two parameters: the file path and the desired row number. When called, it opens the specified file in read mode using a context manager (with statement) which ensures proper file handling. The function first skips the header row using next(file).

Then, it uses enumerate to iterate through the remaining lines, creating pairs of index numbers and line contents. The enumeration starts at 0, which is why the function checks for row_number - 1 to match the user's input (since users typically think of rows starting at 1, not 0). When it finds the matching row, it processes the line by removing any whitespace or newline characters with strip() and converts it into a list using split(',') which separates the line at each comma.

If the requested row is found, it returns this processed list; if not (for example, if the row number is larger than the file length), it returns None. In the example usage print(get_row_file_object('data.csv',3)), the function will return the third row after the header as a list of values.

The output will be (as expected):

['3', 'Charlie', '345-678-9012', '345 Pine St']

Method 2: Fetching a specific row from a CSV file using the CSV module

This approach uses a specific Python module called “csv”.

import csv

def fetch_row_from_csv(file_path, row_number):
  with open(file_path) as csvfile:
      reader = csv.reader(csvfile)
      header = next(reader)  # Skip header
      for idx, row in enumerate(reader):
          if idx == row_number - 1:  # Convert to 0-based indexing
              return row
  return None  # Return None if row_number is out of range


print(fetch_row_from_csv('data.csv',3))

Just like the first example, the above program accepts two parameters: the path to the CSV file and the desired row number. When executed, it opens the specified file and creates a CSV reader object that properly handles CSV formatting, including cases where fields might contain commas or quotes.

Again, the function first skips the header row using next(reader), then uses enumerate to iterate through the remaining rows while keeping track of their position. Since enumerate starts counting from 0 but users typically think of row numbers starting from 1, the function adjusts for this by checking if the current index equals row_number - 1.

When it finds the matching row, it returns it as a list where each element represents a field from that CSV row. If the requested row number exceeds the available rows in the file, the function returns None. When called with print(fetch_row_from_csv('data.csv',3)), it will return the third row of data (excluding the header) from the file 'data.csv'.

The output is again:

['3', 'Charlie', '345-678-9012', '345 Pine St']

Method 3: Fetching a specific row from a CSV file using pandas

The pandas dataframe library provides a very natural way to achieve our objective:

import pandas as pd

# Read specific row by index
df = pd.read_csv('data.csv')
specific_row = df.iloc[2]  # Get third row (0-based indexing)
print(specific_row)

The program starts by importing pandas with the conventional alias pd. It then reads the entire CSV file 'data.csv' into a DataFrame object using pd.read_csv(), which automatically handles the CSV formatting and creates a structured table-like object where data can be easily accessed.

The iloc (integer location) accessor is used to fetch a specific row by its index position - in this case, iloc retrieves the third row since pandas uses 0-based indexing. Unlike the previous methods that return a simple list, this approach returns a pandas Series object (a one-dimensional array with labeled indices) containing the row's data, and it preserves the column names as labels for each value.

When printed, it displays the row's data along with the corresponding column names, making it more readable and informative than raw CSV data.

The output will be:

id                         3
name                 Charlie
phone_number    345-678-9012
address          345 Pine St
Name: 2, dtype: object

Note that the output is not printed as a list like in the first two approaches. It is a specific row of a data frame printed in the format pandas dataframes are printed.

Method 4: Fetching a specific row from a CSV file using the linecache module

The linecache module is memory efficient as it doesn't read the entire file:

import linecache

def get_csv_row(file_path, row_number):
    # Add 1 to account for header
    return linecache.getline(file_path, row_number + 1).strip().split(',')

print(get_csv_row('data.csv',3))

Python's linecache module is specifically designed for reading individual lines from files without loading the entire file into memory. Here, we use linecache.getline() to directly fetch the specified line from the file, adding 1 to the requested row number to account for the header row (since linecache uses 1-based indexing).

The retrieved line is then processed using strip() to remove any whitespace or newline characters, and split(',') to convert the CSV line into a list by splitting it at each comma. This method is particularly efficient for large files when you only need to access specific rows occasionally, as it doesn't require reading the entire file into memory.

The output will be:

['3', 'Charlie', '345-678-9012', '345 Pine St']

Method 5: Fetching a specific row from a CSV file using itertools.islice()

In this approach, we use the Python itertools.islice module, which is designed for working with iterators and sequences.

from itertools import islice

def get_row_itertools(file_path, row_number):
    with open(file_path, 'r') as file:
        next(file)  # Skip header
        row = next(islice(file, row_number - 1, row_number), None)
        return row.strip().split(',') if row else None


print(get_row_itertools('data.csv',3))

When called, the function opens the file in read mode, skips the header using next(file), as in previous programs, and then employs islice to efficiently iterate through the file to reach the desired row.

The islice function takes three arguments: the file object, the starting position (row_number - 1), and the stopping position (row_number), effectively creating a slice of the file that contains only the desired row. The next() function with a default value of None is used to get the single row from this slice. If the row exists, it's processed by removing whitespace with strip() and converted to a list using split(','); if the row doesn't exist (i.e., the row number is out of range), it returns None.

The output will be:

['3', 'Charlie', '345-678-9012', '345 Pine St']

Method 6: Fetching a specific row from a CSV file using a generator function

The below is another memory-efficient approach:

def row_generator(file_path):
  with open(file_path, 'r') as file:
      next(file)  # Skip header
      for line in file:
          yield line.strip().split(',')

# Usage
def get_specific_row(file_path, row_number):
  gen = row_generator(file_path)
  for i, row in enumerate(gen):
      if i == row_number - 1:
          return row
  return None

print(get_specific_row('data.csv',3))

This code implements a generator-based approach to reading CSV files using two functions working together. The row_generator function is a generator that opens the file, skips the header row, and yields each subsequent row as a list of values using yield, which allows for memory-efficient processing as it reads only one line at a time.

The get_specific_row function uses this generator by creating a generator object and then using enumerate to count through the rows until it finds the desired row number (accounting for 0-based indexing by subtracting 1 from the row number). When the matching row is found, it returns that row as a list; if the requested row number exceeds the file length, it returns None.

The output will be:

['3', 'Charlie', '345-678-9012', '345 Pine St']

This generator-based approach is particularly memory efficient for large files because it doesn't load the entire file into memory at once, instead processing rows one at a time as needed. The generator pattern also makes the code reusable, as the same generator could be used for other CSV processing tasks beyond just finding a specific row.

Listing rows with specific conditions

Sometimes you wish to list rows from a CSV file under specific conditions rather than a specific row number. For these types of problems, it is more advantageous to use pandas.

Consider the below program that illustrates various types of conditions you can use to filter your rows:

import pandas as pd

# Read the CSV file
df = pd.read_csv('data.csv')

# Multiple filtering examples

# Greater than condition
filtered_rows = df[df['id'] > 2]                          

# Exact match
name_filter = df[df['name'] == 'Charlie']                 

# String starts with
phone_filter = df[df['phone_number'].str.startswith('345')] 

# Multiple conditions
multiple_conditions = df[(df['id'] > 1) & (df['id'] < 4)]    

# More complex filtering examples
contains_filter = df[df['address'].str.contains('Oak')]  
in_list_filter = df[df['name'].isin(['Alice', 'Bob'])]  
not_null_filter = df[df['phone_number'].notnull()]   

print("Rows with ID > 2:")
print(filtered_rows)

print("\nRows with name 'Charlie':")
print(name_filter)

print("\nRows with phone number beginning in 345:")
print(phone_filter)

print("\nRows with multiple conditions:")
print(multiple_conditions)

print("\nRows containing Oak:")
print(contains_filter)

print("\nRows containing pre-approved people:")
print(in_list_filter)

print("\nRows containing non-null phone numbers:")
print(not_null_filter)

The code demonstrates various filtering methods including numerical comparisons, string matching, combining conditions using & (and) or | (or), list membership testing with isin(), and null value checking. The output will be:

Rows with ID > 2:
   id     name  phone_number       address
2   3  Charlie  345-678-9012   345 Pine St
3   4    David  456-789-0123  456 Cedar St
4   5      Eve  567-890-1234  567 Birch St

Rows with name 'Charlie':
   id     name  phone_number      address
2   3  Charlie  345-678-9012  345 Pine St

Rows with phone number beginning in 345:
   id     name  phone_number      address
2   3  Charlie  345-678-9012  345 Pine St

Rows with multiple conditions:
   id     name  phone_number      address
1   2      Bob  234-567-8901   234 Oak St
2   3  Charlie  345-678-9012  345 Pine St

Rows containing Oak:
   id name  phone_number     address
1   2  Bob  234-567-8901  234 Oak St

Rows containing pre-approved people:
   id   name  phone_number       address
0   1  Alice  123-456-7890  123 Maple St
1   2    Bob  234-567-8901    234 Oak St

Rows containing non-null phone numbers:
   id     name  phone_number       address
0   1    Alice  123-456-7890  123 Maple St
1   2      Bob  234-567-8901    234 Oak St
2   3  Charlie  345-678-9012   345 Pine St
3   4    David  456-789-0123  456 Cedar St
4   5      Eve  567-890-1234  567 Birch St

Enjoy this blogpost? Want to learn Python with us? Sign up for 1:1 or small group classes.

Kodeclik sidebar newsletter

Join our mailing list

Subscribe to get updates about our classes, camps, coupons, and more.

About

Kodeclik is an online coding academy for kids and teens to learn real world programming. Kids are introduced to coding in a fun and exciting way and are challeged to higher levels with engaging, high quality content.

Copyright @ Kodeclik 2024. All rights reserved.