pyspark.pandas.DataFrame.spark.frame
- spark.frame(index_col=None)
- Return the current DataFrame as a Spark DataFrame. DataFrame.spark.frame() is an alias of DataFrame.to_spark().
- Parameters
- index_col: str or list of str, optional, default: None
- Name(s) of the column(s) used in the Spark DataFrame to hold the pandas-on-Spark index. Pass a list of names for a MultiIndex, one name per index level. The index name in pandas-on-Spark is ignored. By default (None), the index is dropped.
 
- See also
- DataFrame.to_spark
- DataFrame.pandas_api
- DataFrame.spark.frame
- Examples

By default, this method loses the index, as shown below.

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
>>> df.to_spark().show()
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  4|  7|
|  2|  5|  8|
|  3|  6|  9|
+---+---+---+

>>> df = ps.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
>>> df.spark.frame().show()
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  4|  7|
|  2|  5|  8|
|  3|  6|  9|
+---+---+---+

If index_col is set, the index is kept as a column with the specified name.

>>> df.to_spark(index_col="index").show()
+-----+---+---+---+
|index|  a|  b|  c|
+-----+---+---+---+
|    0|  1|  4|  7|
|    1|  2|  5|  8|
|    2|  3|  6|  9|
+-----+---+---+---+

Keeping an index column is useful when you want to call some Spark APIs and then convert the result back to a pandas-on-Spark DataFrame without creating a default index; generating a default index can degrade performance.

>>> spark_df = df.to_spark(index_col="index")
>>> spark_df = spark_df.filter("a == 2")
>>> spark_df.pandas_api(index_col="index")
       a  b  c
index
1      2  5  8

For a MultiIndex, pass a list of names to index_col.

>>> new_df = df.set_index("a", append=True)
>>> new_spark_df = new_df.to_spark(index_col=["index_1", "index_2"])
>>> new_spark_df.show()
+-------+-------+---+---+
|index_1|index_2|  b|  c|
+-------+-------+---+---+
|      0|      1|  4|  7|
|      1|      2|  5|  8|
|      2|      3|  6|  9|
+-------+-------+---+---+

The result can be converted back to a pandas-on-Spark DataFrame.

>>> new_spark_df.pandas_api(
...     index_col=["index_1", "index_2"])
                 b  c
index_1 index_2
0       1        4  7
1       2        5  8
2       3        6  9