PySpark Guides

This page lists every PySpark tutorial available on Statology.

PySpark: How to Find Unique Values in a Column
PySpark: How to Select Rows by Index in DataFrame
PySpark: How to Select Columns by Index in DataFrame
PySpark: How to Select Rows Based on Column Values
PySpark: How to Keep Certain Columns
PySpark: How to Select Multiple Columns
PySpark: How to Do a Left Join
PySpark: How to Do a Left Join on Multiple Columns
PySpark: How to Do a Right Join
PySpark: How to Do an Anti-Join
PySpark: How to Do an Outer Join
PySpark: How to Do an Inner Join
PySpark: How to Join on Different Column Names
PySpark: How to Print One Column of DataFrame
PySpark: How to Check if Column Contains String
PySpark: How to Check if Column Exists in DataFrame
PySpark: How to Check if DataFrame is Empty
PySpark: How to Check Data Type of Columns in DataFrame
PySpark: How to Drop Multiple Columns from DataFrame
PySpark: How to Drop Duplicate Rows from DataFrame
PySpark: How to Read CSV File into DataFrame
PySpark: How to Select Distinct Rows
PySpark: How to Select Columns with Alias
PySpark: How to Select Top N Rows in DataFrame
PySpark: How to Select All Columns Except Specific Ones
PySpark: How to Use a Case Statement
PySpark: How to Convert Date to String
PySpark: How to Convert String to Date
PySpark: How to Convert String to Timestamp
PySpark: How to Convert Timestamp to Date
PySpark: How to Convert String to Integer
PySpark: How to Convert Integer to String
PySpark: How to Convert RDD to DataFrame
PySpark: How to Convert Column to Lowercase
PySpark: How to Convert Column to Uppercase
PySpark: How to Use “Is Not Null”
PySpark: How to Use “IS NOT IN”
PySpark: How to Use “OR” Operator
PySpark: How to Use “AND” Operator
PySpark: How to Use “Not Equal” Operator
PySpark: How to Use Case-Insensitive “Contains”
PySpark: How to Filter for “Not Contains”
PySpark: How to Filter Using “Contains”
PySpark: How to Filter for Rows that Contain One of Multiple Values
PySpark: How to Filter Rows Based on Values in a List
PySpark: How to Filter Rows Using LIKE Operator
PySpark: How to Filter Rows Using NOT LIKE Operator
PySpark: How to Filter by Date Range
PySpark: How to Filter by Boolean Column
PySpark: How to Use fillna() with Specific Columns
PySpark: How to Use fillna() with Another Column
PySpark: How to Add New Column with Constant Value
PySpark: How to Add Column from Another DataFrame
PySpark: How to Add Multiple Columns to DataFrame
PySpark: How to Add New Rows to DataFrame
PySpark: How to Add Days to a Date Column
PySpark: How to Add Months to a Date Column
PySpark: How to Add Years to a Date Column
PySpark: How to Create Date Column from Year, Month and Day
PySpark: How to Sum Multiple Columns
PySpark: How to Calculate the Sum of a Column
PySpark: How to Calculate Sum by Group
PySpark: How to Sum Column Based on a Condition
PySpark: How to Calculate a Cumulative Sum
PySpark: How to Count Distinct Values
PySpark: How to Count by Group
PySpark: How to Count Null Values
PySpark: How to Use Alias After Groupby Count
PySpark: How to Count Number of Occurrences
PySpark: How to Replace Multiple Values in One Column
PySpark: How to Replace String in Column
PySpark: How to Replace Zero with Null
PySpark: How to Conditionally Replace Value in Column
PySpark: How to Calculate the Mean of a Column
PySpark: How to Calculate Mean of Multiple Columns
PySpark: How to Calculate the Mean by Group
PySpark: How to Calculate a Rolling Mean
PySpark: How to Calculate the Median of a Column
PySpark: How to Calculate the Median by Group
PySpark: How to Calculate the Max Value of a Column
PySpark: How to Calculate Max Value Across Columns
PySpark: How to Calculate the Max by Group
PySpark: How to Calculate the Minimum Value of a Column
PySpark: How to Calculate Minimum Value Across Columns
PySpark: How to Calculate the Minimum by Group
PySpark: How to Calculate the Mode of a Column
PySpark: How to Calculate Percentiles
PySpark: How to Calculate Quartiles
PySpark: How to Find Duplicates in DataFrame
PySpark: How to Count Number of Duplicate Rows in DataFrame
PySpark: How to Create New DataFrame from Existing DataFrame
PySpark: How to Create Empty DataFrame with Column Names
PySpark: How to Create DataFrame from List
PySpark: How to Calculate a Difference Between Two Dates
PySpark: How to Use Equivalent of Pandas value_counts()
PySpark: How to Drop Rows that Contain a Specific Value
PySpark: How to Calculate Standard Deviation
PySpark: How to Create a Pivot Table
PySpark: How to Unpivot a DataFrame
PySpark: How to Sort Pivot Table by Values in Column
PySpark: How to Calculate Difference Between Two Times
PySpark: How to Count Values in Column with Condition
PySpark: How to Add New Column with Row Numbers
PySpark: How to Reorder Columns
PySpark: How to Remove Specific Characters from Strings
PySpark: How to Remove Special Characters from Column
PySpark: How to Remove Spaces from Column Names
PySpark: How to Remove Leading Zeros in Column
PySpark: How to Extract Substring from Column
PySpark: How to Drop First Column in DataFrame
PySpark: How to Rename Columns
PySpark: How to Concatenate Columns
PySpark: How to Vertically Concatenate DataFrames
PySpark: How to Exclude Columns
PySpark: How to Use groupBy on Multiple Columns
PySpark: How to Use groupBy with Count Distinct
PySpark: How to Select First Row of Each Group
PySpark: How to Get Last Row from DataFrame
PySpark: How to Use Groupby Agg on Multiple Columns
PySpark: How to Create a Duplicate Column
PySpark: How to Group by Date
PySpark: How to Group by Week
PySpark: How to Group by Month
PySpark: How to Group by Year
PySpark: How to Create Boolean Column Based on Condition
PySpark: How to Convert Column from Boolean to Integer
PySpark: How to Multiply Two Columns
PySpark: How to Select Only Numeric Columns
PySpark: How to Order by Multiple Columns
PySpark: How to Check if Value Exists in Column
PySpark: How to Select Columns Containing a Specific String
PySpark: How to Find Minimum Date
PySpark: How to Find Max Date
PySpark: How to Calculate Conditional Mean
PySpark: How to Combine Rows with Same Column Values
PySpark: How to Convert Epoch to Datetime
PySpark: How to Split String Column into Multiple Columns
PySpark: How to Split String in Column and Get Last Item
PySpark: How to Reshape DataFrame from Long to Wide
PySpark: How to Use Case-Insensitive rlike
PySpark: How to Extract Year from Date
PySpark: How to Extract Quarter from Date
PySpark: How to Extract Month from Date
PySpark: How to Calculate the Difference Between Rows
PySpark: How to Add String to Each Value in Column
PySpark: Get Rows Which Are Not in Another DataFrame
PySpark: How to Compare Dates
PySpark: How to Compare Strings
PySpark: How to Calculate Percentage of Total with groupBy
PySpark: How to Create a Correlation Matrix
PySpark: How to Calculate Correlation Between Two Columns
PySpark: How to Extract Hour from Timestamp
PySpark: How to Extract Minutes from Timestamp
PySpark: How to Use Groupby and Concatenate Strings
PySpark: How to Show Full Column Content
PySpark: How to Calculate Lag by Group
PySpark: How to Add Time to Datetime
PySpark: How to Replicate Rows in DataFrame
PySpark: How to Fill Null Values with Mean
PySpark: How to Fill Null Values with Median
PySpark: How to Coalesce Values from Multiple Columns into One
PySpark: How to Select Row with Max Value in Each Group
PySpark: How to Round Column Values to 2 Decimal Places
PySpark: How to Round Date to First Day of Week
PySpark: How to Round Date to First Day of Month
PySpark: How to Use withColumn() with IF ELSE
PySpark: How to Use cast() with Multiple Columns
PySpark: How to Convert DataFrame to Pandas
PySpark: How to Use partitionBy() with Multiple Columns
PySpark: How to Use Window.orderBy() Descending
PySpark: How to Explode Array into Rows
PySpark: How to Union DataFrames with Different Columns
PySpark: How to Perform Union and Return Distinct Rows
PySpark: How to Select Random Sample of Rows
PySpark: How to Create New Column with Random Numbers
PySpark: How to Create Column If It Doesn’t Exist
PySpark: How to Calculate Sum of Each Row in DataFrame
PySpark: How to Perform Linear Regression
PySpark: How to Perform Data Binning
PySpark: How to Use When with AND Condition
PySpark: How to Use When with OR Condition
PySpark: How to Update Column Values Based on Condition
PySpark: How to Add a Count Column to DataFrame
PySpark: A Formula for “Group By Having”
PySpark: How to Split Data into Training & Test Sets
PySpark: How to Drop Rows Based on Multiple Conditions
PySpark: How to Find Day of the Week
PySpark: How to Calculate Summary Statistics
PySpark: How to Create a Crosstab