Kodeclik Blog
How to find the column names of a Pandas dataframe
Recall that a Pandas dataframe is like a table or spreadsheet: it is made up of rows and columns that hold data about some subject. Each column typically represents one attribute (or variable), and each row holds one observation, with a value for every variable.
For instance, here is a simple Pandas dataframe where the rows denote months and columns denote various facets of months:
import pandas as pd
months = pd.DataFrame(
    {
        'name': ["Jan", "Feb", "Mar", "Apr", "May"],
        'number': [1, 2, 3, 4, 5],
        'days': [31, 28, 31, 30, 31]
    })
print(months)
The output will be:
  name  number  days
0  Jan       1    31
1  Feb       2    28
2  Mar       3    31
3  Apr       4    30
4  May       5    31
Note that the months (rows) are numbered from 0 to 4.
To obtain a list of column names of this (or any) Pandas dataframe, there are three possible approaches.
First, we can use list(dataframe). Second, we can use dataframe.columns.values.tolist(). Finally, we can use list(dataframe.columns.values). Let us try each of these methods in turn.
Method 1: list(dataframe)
The first approach is the simplest: pass the dataframe to the list constructor. This works because iterating over a dataframe yields its column labels. Here’s how it works with our example above:
import pandas as pd
months = pd.DataFrame(
    {
        'name': ["Jan", "Feb", "Mar", "Apr", "May"],
        'number': [1, 2, 3, 4, 5],
        'days': [31, 28, 31, 30, 31]
    })
print(list(months))
The output will be:
['name', 'number', 'days']
Note that the output is a list containing the three column names of our “months” dataframe.
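To see why Method 1 works, here is a minimal sketch that loops over the same “months” dataframe by hand. The variable name names is only for illustration. Iterating over a dataframe visits its column labels, not its rows, which is exactly what list() relies on:

```python
import pandas as pd

months = pd.DataFrame(
    {
        'name': ["Jan", "Feb", "Mar", "Apr", "May"],
        'number': [1, 2, 3, 4, 5],
        'days': [31, 28, 31, 30, 31]
    })

# Iterating over a dataframe visits its column labels, not its rows
names = []
for column in months:
    names.append(column)

print(names)                  # ['name', 'number', 'days']
print(names == list(months))  # True
```

In other words, list(months) is just a shorter way of writing this loop.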
Method 2: dataframe.columns.values.tolist()
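In the second approach, we take the dataframe's columns attribute (a Pandas Index holding the column labels), get its underlying array of labels with values, and finally convert that array into a Python list using the tolist() method. Here is how that works: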
import pandas as pd
months = pd.DataFrame(
    {
        'name': ["Jan", "Feb", "Mar", "Apr", "May"],
        'number': [1, 2, 3, 4, 5],
        'days': [31, 28, 31, 30, 31]
    })
print(months.columns.values.tolist())
The output is once again:
['name', 'number', 'days']
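It can help to break this chain into its individual steps. The sketch below does that with the same “months” dataframe; the intermediate variable names (columns, values, names) are only for illustration, and the exact array type returned by values may differ between Pandas versions:

```python
import pandas as pd

months = pd.DataFrame(
    {
        'name': ["Jan", "Feb", "Mar", "Apr", "May"],
        'number': [1, 2, 3, 4, 5],
        'days': [31, 28, 31, 30, 31]
    })

columns = months.columns   # a Pandas Index holding the column labels
values = columns.values    # the underlying array of labels (often a NumPy array)
names = values.tolist()    # a plain Python list

print(isinstance(columns, pd.Index))  # True
print(isinstance(names, list))        # True
print(names)                          # ['name', 'number', 'days']
```

Only the last step produces a regular Python list; the first two give Pandas or array objects.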
Method 3: list(dataframe.columns.values)
The third approach is a minor variant of the second approach in that instead of using the tolist() method to convert the final answer into a list, we use the list() constructor:
import pandas as pd
months = pd.DataFrame(
    {
        'name': ["Jan", "Feb", "Mar", "Apr", "May"],
        'number': [1, 2, 3, 4, 5],
        'days': [31, 28, 31, 30, 31]
    })
print(list(months.columns.values))
The output is still the same:
['name', 'number', 'days']
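As a quick sanity check, the three methods can be compared against each other directly. This is a small sketch using the same “months” dataframe; the names method_1, method_2 and method_3 are only for illustration:

```python
import pandas as pd

months = pd.DataFrame(
    {
        'name': ["Jan", "Feb", "Mar", "Apr", "May"],
        'number': [1, 2, 3, 4, 5],
        'days': [31, 28, 31, 30, 31]
    })

method_1 = list(months)
method_2 = months.columns.values.tolist()
method_3 = list(months.columns.values)

# All three approaches produce the same list of column names
print(method_1 == method_2 == method_3)  # True
```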
In summary, these are three different ways to extract the column names of a Pandas dataframe into a list. Any one of them will serve your purpose; which one is your favorite?
If you liked this blog post, check out our blog post on Pandas daterange. Also learn how to do cross-tabs in Python Pandas.
Interested in more things Python? Check out our post on Python queues. Also see our blog post on Python's enumerate() capability. If you like Python+math content, see our blog post on Magic Squares. Finally, master the Python print function!
Want to learn Python with us? Sign up for 1:1 or small group classes.