How to Create a Pandas DataFrame from a NumPy Array


Occasionally you may want to create a pandas DataFrame from a NumPy array. Fortunately, this is easy to do by passing the array to the pd.DataFrame() constructor:

import numpy as np
import pandas as pd

#create NumPy array
data = np.array([[1, 7, 6, 5, 6], [4, 4, 4, 3, 1]])

#convert NumPy array to pandas DataFrame
df = pd.DataFrame(data=data)

This tutorial provides an example of how to create a pandas DataFrame from a NumPy array in practice.

Create Pandas DataFrame from a NumPy Array

Suppose we have the following NumPy array:

import numpy as np

#create NumPy array
data = np.array([[1, 7, 6, 5, 6], [4, 4, 4, 3, 1]])

#print class of NumPy array
print(type(data))

<class 'numpy.ndarray'>

We can use the following syntax to create a pandas DataFrame from the array:

import pandas as pd

#convert NumPy array to pandas DataFrame
df = pd.DataFrame(data=data)

#print DataFrame
print(df)

   0  1  2  3  4
0  1  7  6  5  6
1  4  4  4  3  1

#print class of DataFrame
print(type(df)) 

<class 'pandas.core.frame.DataFrame'>
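
One detail worth checking when converting: depending on the pandas version and settings, the new DataFrame may or may not share memory with the original array. The following is a minimal sketch, assuming you want the DataFrame to be fully independent of the array, that passes copy=True to the constructor (the name df_copy is just an illustrative choice):

```python
import numpy as np
import pandas as pd

#create NumPy array
data = np.array([[1, 7, 6, 5, 6], [4, 4, 4, 3, 1]])

#convert to DataFrame and explicitly request a copy of the values
df_copy = pd.DataFrame(data=data, copy=True)

#modify the original array after the conversion
data[0, 0] = 100

#the DataFrame still holds the original value in row 0, column 0
print(df_copy.iloc[0, 0])
```

Because the values were copied, later changes to the array do not affect the DataFrame.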

Manually Specify Row & Column Names

We can specify row names with the index argument and column names with the columns argument:

#convert NumPy array to pandas DataFrame and specify rows & columns
df = pd.DataFrame(data=data, index=["r1", "r2"], columns=["A", "B", "C", "D", "E"])

#print the DataFrame
print(df)

    A  B  C  D  E
r1  1  7  6  5  6
r2  4  4  4  3  1
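
Once rows and columns have names, values can be looked up by label instead of by position. The following is a small sketch using the same array and labels as above; the variable names value and col_d are only illustrative:

```python
import numpy as np
import pandas as pd

#create NumPy array
data = np.array([[1, 7, 6, 5, 6], [4, 4, 4, 3, 1]])

#convert to DataFrame with named rows and columns
df = pd.DataFrame(data=data, index=["r1", "r2"], columns=["A", "B", "C", "D", "E"])

#select a single value by row label and column label
value = df.loc["r1", "B"]
print(value)

#select an entire column by name
col_d = df["D"]
print(col_d)
```

Here df.loc["r1", "B"] returns 7, the value in row r1 and column B of the DataFrame shown above.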

Automatically Specify Row & Column Names

If the NumPy array is quite large, it may not be reasonable to manually specify each row and column name. In this case, we can generate the row and column names programmatically, for example with a list comprehension.

The following code shows how to do so:

#create NumPy array with 100 values
data = np.arange(0,100,1).reshape(20,5)

#print NumPy array
print(data)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]
 [25 26 27 28 29]
 [30 31 32 33 34]
 [35 36 37 38 39]
 [40 41 42 43 44]
 [45 46 47 48 49]
 [50 51 52 53 54]
 [55 56 57 58 59]
 [60 61 62 63 64]
 [65 66 67 68 69]
 [70 71 72 73 74]
 [75 76 77 78 79]
 [80 81 82 83 84]
 [85 86 87 88 89]
 [90 91 92 93 94]
 [95 96 97 98 99]]

#convert to pandas DataFrame and automatically generate row and column names
df = pd.DataFrame(data=data,
                  index=range(data.shape[0]),
                  columns=['col'+str(i) for i in range(data.shape[1])])

#print DataFrame 
print(df)

    col0  col1  col2  col3  col4
0      0     1     2     3     4
1      5     6     7     8     9
2     10    11    12    13    14
3     15    16    17    18    19
4     20    21    22    23    24
5     25    26    27    28    29
6     30    31    32    33    34
7     35    36    37    38    39
8     40    41    42    43    44
9     45    46    47    48    49
10    50    51    52    53    54
11    55    56    57    58    59
12    60    61    62    63    64
13    65    66    67    68    69
14    70    71    72    73    74
15    75    76    77    78    79
16    80    81    82    83    84
17    85    86    87    88    89
18    90    91    92    93    94
19    95    96    97    98    99

We can quickly confirm the class of the DataFrame along with the shape:

#print class of DataFrame
print(type(df))

<class 'pandas.core.frame.DataFrame'>

#print number of rows and columns of DataFrame
print(df.shape)

(20, 5)
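
As an alternative to building the column names by hand, the default integer column labels can be prefixed after the DataFrame is created. The following is a sketch of that approach using the add_prefix() method; it is one possible option, not the only way to do this:

```python
import numpy as np
import pandas as pd

#create NumPy array with 100 values
data = np.arange(0, 100, 1).reshape(20, 5)

#convert to DataFrame with default labels, then prefix each column label with 'col'
df = pd.DataFrame(data=data).add_prefix('col')

#print column names
print(list(df.columns))

#print number of rows and columns of DataFrame
print(df.shape)
```

This produces the same column names as before (col0 through col4), and the row index keeps its default values of 0 through 19.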

Additional Resources

How to Add a Numpy Array to a Pandas DataFrame
How to Drop the Index Column in Pandas
Pandas: Select Rows Where Value Appears in Any Column
