Kodeclik Blog
How to export a Python Dataframe into an SQL file
A Pandas dataframe is a 2-dimensional data structure, similar to a table with rows and columns. It is a fundamental object in Pandas, a popular Python library for data manipulation and analysis.
Dataframes can be created from various data structures such as dictionaries, lists, or arrays, and they provide a convenient way to work with structured data.
For instance, here is a simple Pandas dataframe where the rows denote months and columns denote various facets of months.
import pandas as pd

months = pd.DataFrame({
    'name': ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
             "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
    'number': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'days': [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
})
print(months)
The output will be:
   name  number  days
0   Jan       1    31
1   Feb       2    28
2   Mar       3    31
3   Apr       4    30
4   May       5    31
5   Jun       6    30
6   Jul       7    31
7   Aug       8    31
8   Sep       9    30
9   Oct      10    31
10  Nov      11    30
11  Dec      12    31
The rows of the dataframe are numbered from 0 as you can see from the output. Now let us try to convert this dataframe to SQL. How do we go about it?
In our solution, we will use the pandas library (of course) and sqlalchemy to interact with a SQLite database. Below is the code:
import pandas as pd
from sqlalchemy import create_engine

months = pd.DataFrame({
    'name': ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
             "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
    'number': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'days': [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
})

# Write the dataframe to a SQLite database file as table 'month_days'
db_file = 'months.db'
db_engine = create_engine(f'sqlite:///{db_file}')
months.to_sql('month_days', db_engine, index=False, if_exists='replace')

# Read the table back to verify the export worked
query = 'SELECT * FROM month_days'
result = pd.read_sql(query, db_engine)
print(result)
Notice that the script starts by importing the pandas library as pd and the create_engine function from the sqlalchemy module. SQLAlchemy is a SQL toolkit and Object-Relational Mapping (ORM) library for Python.
After we create the dataframe, we create a string db_file that specifies the name of the SQLite database file. Then, we use create_engine to create a connection to the SQLite database specified by db_file.
The DataFrame months is then written to the SQLite database using the to_sql method. The table name is set to 'month_days', and the if_exists parameter is set to 'replace', which means that if the table already exists, it will be replaced with the new data. The index parameter is set to False, indicating that the DataFrame's index (row labels) will not be written to the database as a separate column.
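To see the difference the if_exists parameter makes, here is a small sketch (using a hypothetical in-memory SQLite database and a shortened two-row dataframe, so nothing touches months.db). 'replace' drops and recreates the table, while 'append' keeps the existing rows and adds the dataframe's rows to them:

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory database just for this illustration
engine = create_engine('sqlite:///:memory:')

df = pd.DataFrame({'name': ['Jan', 'Feb'], 'days': [31, 28]})

# 'replace': drop the table (if it exists) and recreate it from the dataframe
df.to_sql('demo', engine, index=False, if_exists='replace')

# 'append': keep the existing table and add the dataframe's rows to it
df.to_sql('demo', engine, index=False, if_exists='append')

# The table now holds the two original rows plus the two appended rows
print(len(pd.read_sql('SELECT * FROM demo', engine)))  # 4
```

A third option, if_exists='fail' (the default), raises an error when the table already exists, which is a useful safety net when you do not intend to overwrite anything.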
The script then defines a SQL query that selects all records from the 'month_days' table. This query is executed against the database using the read_sql function, which returns the result as a new DataFrame named result. Finally, we print this DataFrame and obtain:
   name  number  days
0   Jan       1    31
1   Feb       2    28
2   Mar       3    31
3   Apr       4    30
4   May       5    31
5   Jun       6    30
6   Jul       7    31
7   Aug       8    31
8   Sep       9    30
9   Oct      10    31
10  Nov      11    30
11  Dec      12    31
We see the exact same dataframe printed, but now via the SQL query result. In fact, the match is so exact that you might wonder whether the round trip through SQL actually happened at all.
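One way to check, independent of any query over the data, is to ask SQLAlchemy's inspector which tables and columns the database actually contains. Here is a minimal sketch (self-contained, so it rebuilds a one-row version of the table in an in-memory database rather than reusing months.db):

```python
import pandas as pd
from sqlalchemy import create_engine, inspect

# In-memory stand-in for months.db
engine = create_engine('sqlite:///:memory:')
months = pd.DataFrame({'name': ['Jan'], 'number': [1], 'days': [31]})
months.to_sql('month_days', engine, index=False, if_exists='replace')

# Ask the database itself what it contains
inspector = inspect(engine)
print(inspector.get_table_names())                                # ['month_days']
print([c['name'] for c in inspector.get_columns('month_days')])   # ['name', 'number', 'days']
```

If 'month_days' shows up with the expected columns, the table really was created on the SQL side.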
Let us update the query part of the program to:
query = 'SELECT * FROM month_days where days < 30'
Now, after exporting the dataframe (which contained 12 rows), we query only for the rows (months) where the number of days is less than 30. The output will now be:
  name  number  days
0  Feb       2    28
Now you can be convinced that the translation from Pandas to SQL indeed has happened.
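If the threshold comes from a variable (say, user input), it is safer to bind it as a query parameter than to splice it into the SQL string. A sketch of this, again using a self-contained in-memory database with a hypothetical three-month dataframe:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# In-memory stand-in for months.db
engine = create_engine('sqlite:///:memory:')
months = pd.DataFrame({
    'name': ['Jan', 'Feb', 'Mar'],
    'number': [1, 2, 3],
    'days': [31, 28, 31]
})
months.to_sql('month_days', engine, index=False, if_exists='replace')

# Bind the threshold as a named parameter instead of formatting it into the string
query = text('SELECT * FROM month_days WHERE days < :limit')
result = pd.read_sql(query, engine, params={'limit': 30})
print(result)  # only Feb qualifies
```

Parameter binding lets the database driver handle quoting and escaping, which avoids both syntax mistakes and SQL injection.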
Did you like this blogpost? If so, you will enjoy learning more about Pandas and dataframes, e.g., how to print the first 10 rows of a Pandas dataframe.
The underlying theme in this blogpost, of mapping between dataframes and SQL tables occurs quite frequently in real life. In general, it is important to note that while Pandas dataframes and SQL tables both handle tabular data, they are distinct in their use cases, capabilities, and underlying technology.
Dataframes are part of the Pandas library in Python and offer a flexible and user-friendly approach to data analysis, while SQL tables are part of a database system and are optimized for data storage, retrieval, and transactional operations. Despite these differences, this blogpost has shown how you can work with them together. This interoperability allows data analysts and scientists to leverage the strengths of both Pandas and SQL in their workflows.
If you would like to explore more such transformations, check out other Kodeclik blogposts, such as how to convert a Python list into a numpy array.
Want to learn Python with us? Sign up for 1:1 or small group classes.