pyspark.pandas.CategoricalIndex
- class pyspark.pandas.CategoricalIndex(data=None, categories=None, ordered=None, dtype=None, copy=False, name=None)
- Index based on an underlying Categorical.
- CategoricalIndex can only take on a limited, and usually fixed, number of possible values (categories). Also, it might have an order, but numerical operations (additions, divisions, …) are not possible.
- Parameters
- data : array-like (1-dimensional)
- The values of the categorical. If categories are given, values not in categories will be replaced with NaN.
- categories : index-like, optional
- The categories for the categorical. Items need to be unique. If the categories are not given here (and also not in dtype), they will be inferred from the data.
- ordered : bool, optional
- Whether or not this categorical is treated as an ordered categorical. If not given here or in dtype, the resulting categorical will be unordered.
- dtype : CategoricalDtype or "category", optional
- If CategoricalDtype, cannot be used together with categories or ordered.
- copy : bool, default False
- Make a copy of input ndarray.
- name : object, optional
- Name to be stored in the index.
 
- See also
- Index
- The base pandas-on-Spark Index type. 
- Examples
  >>> import pandas as pd
  >>> import pyspark.pandas as ps
  >>> ps.CategoricalIndex(["a", "b", "c", "a", "b", "c"])
  CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'], categories=['a', 'b', 'c'], ordered=False, dtype='category')
- CategoricalIndex can also be instantiated from a Categorical:
  >>> c = pd.Categorical(["a", "b", "c", "a", "b", "c"])
  >>> ps.CategoricalIndex(c)
  CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'], categories=['a', 'b', 'c'], ordered=False, dtype='category')
- Ordered CategoricalIndex can have a min and max value.
  >>> ci = ps.CategoricalIndex(
  ...     ["a", "b", "c", "a", "b", "c"], ordered=True, categories=["c", "b", "a"]
  ... )
  >>> ci
  CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'], categories=['c', 'b', 'a'], ordered=True, dtype='category')
- From a Series:
  >>> s = ps.Series(["a", "b", "c", "a", "b", "c"], index=[10, 20, 30, 40, 50, 60])
  >>> ps.CategoricalIndex(s)
  CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'], categories=['a', 'b', 'c'], ordered=False, dtype='category')
- From an Index:
  >>> idx = ps.Index(["a", "b", "c", "a", "b", "c"])
  >>> ps.CategoricalIndex(idx)
  CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'], categories=['a', 'b', 'c'], ordered=False, dtype='category')
- Methods
- add_categories(new_categories): Add new categories.
- all(*args, **kwargs): Return whether all elements are True.
- any([axis]): Return whether any element is True.
- append(other): Append a collection of Index objects together.
- argmax(): Return a maximum argument indexer.
- argmin(): Return a minimum argument indexer.
- as_ordered(): Set the Categorical to be ordered.
- as_unordered(): Set the Categorical to be unordered.
- asof(label): Return the label from the index, or, if not present, the previous one.
- astype(dtype): Cast a pandas-on-Spark object to the specified dtype.
- copy([name, deep]): Make a copy of this object.
- delete(loc): Make a new Index with the passed location(s) deleted.
- difference(other[, sort]): Return a new Index with elements from the index that are not in other.
- drop(labels): Make a new Index with the passed list of labels deleted.
- drop_duplicates([keep]): Return Index with duplicate values removed.
- droplevel(level): Return index with requested level(s) removed.
- dropna([how]): Return Index or MultiIndex without NA/NaN values.
- equals(other): Determine if two Index objects contain the same elements.
- factorize([sort, use_na_sentinel]): Encode the object as an enumerated type or categorical variable.
- fillna(value): Fill NA/NaN values with the specified value.
- get_level_values(level): Return Index if a valid level is given.
- holds_integer(): Whether the type is an integer type.
- identical(other): Similar to equals, but check that other comparable attributes are also equal.
- insert(loc, item): Make a new Index, inserting the new item at the given location.
- intersection(other): Form the intersection of two Index objects.
- is_boolean(): Return if the current index type is a boolean type.
- is_categorical(): Return if the current index type is a categorical type.
- is_floating(): Return if the current index type is a floating type.
- is_integer(): Return if the current index type is an integer type.
- is_interval(): Return if the current index type is an interval type.
- is_numeric(): Return if the current index type is a numeric type.
- is_object(): Return if the current index type is an object type.
- isin(values): Check whether values are contained in Series or Index.
- isna(): Detect missing values.
- isnull(): Detect missing values.
- item(): Return the first element of the underlying data as a Python scalar.
- map(mapper): Map values using input correspondence (a dict, Series, or function).
- max(): Return the maximum value of the Index.
- min(): Return the minimum value of the Index.
- notna(): Detect existing (non-missing) values.
- notnull(): Detect existing (non-missing) values.
- nunique([dropna, approx, rsd]): Return number of unique elements in the object.
- remove_categories(removals): Remove the specified categories.
- remove_unused_categories(): Remove categories which are not used.
- rename(name[, inplace]): Alter Index or MultiIndex name.
- rename_categories(new_categories): Rename categories.
- reorder_categories(new_categories[, ordered]): Reorder categories as specified in new_categories.
- repeat(repeats): Repeat elements of an Index/MultiIndex.
- set_categories(new_categories[, ordered, rename]): Set the categories to the specified new_categories.
- set_names(names[, level, inplace]): Set Index or MultiIndex name.
- shift([periods, fill_value]): Shift Series/Index by desired number of periods.
- sort(*args, **kwargs): Use sort_values instead.
- sort_values([return_indexer, ascending]): Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
- symmetric_difference(other[, result_name, sort]): Compute the symmetric difference of two Index objects.
- take(indices): Return the elements in the given positional indices along an axis.
- to_frame([index, name]): Create a DataFrame with a column containing the Index.
- to_list(): Return a list of the values.
- to_numpy([dtype, copy]): A NumPy ndarray representing the values in this Index or MultiIndex.
- to_pandas(): Return a pandas Index.
- to_series([name]): Create a Series with both index and values equal to the index keys; useful with map for returning an indexer based on an index.
- tolist(): Return a list of the values.
- transpose(): Return the transpose. For an Index, this is the index itself.
- union(other[, sort]): Form the union of two Index objects.
- unique([level]): Return unique values in the index.
- value_counts([normalize, sort, ascending, ...]): Return a Series containing counts of unique values.
- view(): This is defined as a copy with the same identity.
- Attributes
- T: Return the transpose. For an Index, this is the index itself.
- categories: The categories of this categorical.
- codes: The category codes of this categorical.
- dtype: Return the dtype object of the underlying data.
- empty: Return True if the current object is empty.
- has_duplicates: If index has duplicates, return True, otherwise False.
- hasnans: Return True if it has any missing values.
- inferred_type: Return a string of the type inferred from the values.
- is_monotonic_decreasing: Return True if the values in the object are monotonically decreasing.
- is_monotonic_increasing: Return True if the values in the object are monotonically increasing.
- is_unique: Return if the index has unique values.
- name: Return name of the Index.
- names: Return names of the Index.
- ndim: Return an int representing the number of array dimensions.
- nlevels: Number of levels in Index & MultiIndex.
- ordered: Whether the categories have an ordered relationship.
- shape: Return a tuple of the shape of the underlying data.
- size: Return an int representing the number of elements in this object.
- values: Return an array representing the data in the Index.