pyspark.sql.datasource.DataSource
class pyspark.sql.datasource.DataSource(options)
A base class for data sources.

This class represents a custom data source that can be read from, written to, or both. It provides methods that create the readers and writers for the source. A subclass must implement at least one of DataSource.reader() or DataSource.writer() to make the data source readable, writable, or both.

After implementing this interface, you can load data from your data source with spark.read.format(...).load() and save data with df.write.format(...).save().

Methods

- name(): Returns a string representing the format name of this data source.
- reader(schema): Returns a DataSourceReader instance for reading data.
- schema(): Returns the schema of the data source.
- simpleStreamReader(schema): Returns a SimpleDataSourceStreamReader instance for reading data.
- streamReader(schema): Returns a DataSourceStreamReader instance for reading streaming data.
- streamWriter(schema, overwrite): Returns a DataSourceStreamWriter instance for writing data into a streaming sink.
- writer(schema, overwrite): Returns a DataSourceWriter instance for writing data.