How to Use to_sql in Pandas (With Examples)


Often you may want to write the records stored in a pandas DataFrame to a SQL database.

The benefit of doing this is that you can store the records from multiple DataFrames in a single database and then query the records from whichever DataFrame you would like in a single location.

The easiest way to write records from a DataFrame to a SQL database is to use the pandas to_sql() function, which uses the following basic syntax:

df.to_sql(name, con, schema=None, if_exists=’fail’, …)

where:

  • name: Name to give to SQL table
  • con: The engine or connection to the database
  • schema: A specific table schema to use
  • if_exists: The action to perform if the table already exists in the database

Note that the default value for the if_exists argument is fail.

This means that if a table already exists with the DataFrame you’re attempting to use, you will receive an error.

If you would like to instead overwrite the existing table in the database, then you can specify if_exists=’replace’ to replace the existing table.

The following example shows how to use the to_sql() function to write records from a pandas DataFrame to a SQL database in practice.

Example: How to Use to_sql() in Pandas

Suppose we create the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                   'points': [18, 22, 19, 14, 14, 11, 40, 32],
                   'rebounds': [5, 7, 7, 9, 12, 9, 5, 17],
                   'minutes': [2.1, 4, 5.8, 9, 9.2, 3.5, 4.3, 15.4]})

#view DataFrame
print(df)

  team  points  rebounds  minutes
0    A      18         5      2.1
1    A      22         7      4.0
2    A      19         7      5.8
3    B      14         9      9.0
4    B      14        12      9.2
5    B      11         9      3.5
6    C      40         5      4.3
7    C      32        17     15.4

The DataFrame contains four columns with data about various basketball players including their team, their points scored, their total rebounds, and their average minutes played per game.

We can see that the DataFrame has 8 total records (or “rows”). Suppose that we would like to write each of these records to a SQL database.

For this example, we can create an in-memory SQLite database by using the create_engine function from the sqlalchemy library.

We can use the following syntax to do so:

#create in-memory SQLite database
from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=False)

Note that no output will be generated at this point.

To then write the records from our DataFrame named df to this database, we can use the following syntax:

#write records from DataFrame to SQL database
df.to_sql(name='basketball_data', con=engine)

8

The to_sql() function simply returns a value of 8, which indicates that 8 records from our DataFrame have been written to the SQL database.

Note that we chose to give the DataFrame a table name of basketball_data within the SQL database.

Note that if a table named basketball_data already existed in the database then we would receive an error.

If we would like to replace the existing table with our DataFrame, we could specify if_exists=’replace’ as follows:

#write records from DataFrame to SQL database
df.to_sql(name='basketball_data', con=engine, if_exists='replace')

8

The to_sql() function returns a value of 8, which tells us that 8 records have been written to the database and the existing basketball_data table has been replaced with the records from our DataFrame.

Note: You can find the complete documentation for the pandas to_sql() function here.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: How to Query Column Name with Space
Pandas: How to Use isin() with query() Method
Pandas: How to Query Column Name with Space

Featured Posts

Leave a Reply

Your email address will not be published. Required fields are marked *