Link: Python
Select
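When using .iloc[] (integer position-based, positions start at 0) and .loc[] (label-based, uses the index labels, and includes the end label of a slice), be careful: a selection may be either a view or a copy of the original data, which matters when you assign to it. See the documentation section Returning a view versus a copy.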
titanic[["age", "sex"]]
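A minimal sketch of the difference between position-based and label-based selection. The toy frame and its index labels (10, 20, 30) are assumptions made for illustration, chosen so that positions and labels cannot be confused.

```python
import pandas as pd

# Hypothetical toy data; the index labels are deliberately not 0, 1, 2
df = pd.DataFrame(
    {"age": [22, 38, 26], "sex": ["male", "female", "female"]},
    index=[10, 20, 30],
)

first_by_position = df.iloc[0]   # row at position 0
first_by_label = df.loc[10]      # row whose index label is 10

pos_slice = df.iloc[0:2]         # positions 0 and 1 (end excluded)
label_slice = df.loc[10:20]      # labels 10 and 20 (end included)

# Take an explicit copy when the selection should be modified on its own
subset = df[["age", "sex"]].copy()
subset.loc[10, "age"] = 99       # df itself is left unchanged
```

Calling .copy() explicitly is one way to avoid the view-versus-copy ambiguity when the subset will be modified.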
Filter
df.query("width > 3 & length < 2")
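For comparison, the same filter can be written as a boolean mask. This is only a sketch: the toy data and the variable name threshold are assumptions, not part of the original notes.

```python
import pandas as pd

# Hypothetical toy data with the two columns used in the filter
df = pd.DataFrame({"width": [2, 4, 5], "length": [1, 1, 3]})

# Filter with a query string
by_query = df.query("width > 3 & length < 2")

# Equivalent boolean mask; each condition needs its own parentheses
by_mask = df[(df["width"] > 3) & (df["length"] < 2)]

# Inside query(), Python variables are referenced with @
threshold = 3
by_query_var = df.query("width > @threshold and length < 2")
```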
Rename
# Rename: (inplace = True to overwrite current dataframe)
dataframe.rename(columns = {'Oldname': 'Newname'}, inplace = True)
Case when:
dataframe.loc[dataframe['Species'] == 'setosa', "Species"] = 0
dataframe.loc[dataframe['Species'] == 'versicolor', "Species"] = 1
dataframe.loc[dataframe['Species'] == 'virginica', "Species"] = 2
Dplyr:
dataframe <- dataframe %>%
  mutate(Species = case_when(Species == 'setosa' ~ 0,
                             Species == 'versicolor' ~ 1,
                             Species == 'virginica' ~ 2))
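In pandas, the three .loc assignments can also be expressed as a single lookup with Series.map(), which is closer in spirit to case_when. A minimal sketch, assuming a toy Species column and a hypothetical dictionary named codes:

```python
import pandas as pd

# Hypothetical toy data
dataframe = pd.DataFrame(
    {"Species": ["setosa", "versicolor", "virginica", "setosa"]}
)

# Hypothetical lookup table: old value -> new value
codes = {"setosa": 0, "versicolor": 1, "virginica": 2}

# Values that are missing from the dictionary would become NaN
dataframe["Species"] = dataframe["Species"].map(codes)
```

Unlike repeated .loc assignments, this gives the column a numeric dtype in one step, provided every value has an entry in the dictionary.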
Distinct
dataframe.Species.unique()
Output: #array(['setosa', 'versicolor', 'virginica'], dtype=object)
Check length of dataframe
Dplyr:
nrow(dataframe)
Pandas:
len(dataframe)
Group_by and count:
dataframe.value_counts('Species')
# Alternatively, you can also use the .groupby() method followed by size()
dataframe.groupby(['Species']).size()
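The two approaches return the same counts but order them differently. A small sketch on assumed toy data; the column name n simply mirrors the default output of dplyr's count().

```python
import pandas as pd

# Hypothetical toy data
dataframe = pd.DataFrame(
    {"Species": ["virginica", "setosa", "virginica",
                 "versicolor", "virginica", "setosa"]}
)

# value_counts sorts by count, largest first
counts_sorted = dataframe.value_counts("Species")

# groupby(...).size() sorts by the group key instead
counts_by_key = dataframe.groupby("Species").size()

# Turn the result into a regular dataframe with a named count column
counts_df = counts_by_key.reset_index(name="n")
```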
Mutate
Mutate based on current columns:
dataframe["New_feature"] = dataframe["Petal_width"] * dataframe["Petal_length"] / 2
Delete
dataframe.drop("New_feature", axis=1, inplace=True)
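If you would rather not modify the dataframe in place, both steps can return a new dataframe instead, which is closer to how mutate() and select(-col) behave in dplyr. A sketch on assumed toy data:

```python
import pandas as pd

# Hypothetical toy data
dataframe = pd.DataFrame(
    {"Petal_width": [0.2, 1.3], "Petal_length": [1.4, 4.0]}
)

# assign() returns a new dataframe with the extra column
with_feature = dataframe.assign(
    New_feature=lambda d: d["Petal_width"] * d["Petal_length"] / 2
)

# drop(columns=...) also returns a new dataframe; the input is untouched
without_feature = with_feature.drop(columns="New_feature")
```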
Sort
Sort by ascending/descending:
Dplyr:
dataframe %>% arrange(desc(Petal_width))
Pandas:
dataframe.sort_values('Petal_width', ascending=False)
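Sorting by several columns with mixed directions works by passing lists, similar to arrange(Species, desc(Petal_width)) in dplyr. A sketch on assumed toy data; note that sort_values returns a new dataframe unless the result is assigned back.

```python
import pandas as pd

# Hypothetical toy data
dataframe = pd.DataFrame(
    {"Species": ["setosa", "virginica", "setosa"],
     "Petal_width": [0.2, 2.5, 0.4]}
)

# Species ascending, then Petal_width descending within each species
sorted_df = dataframe.sort_values(
    ["Species", "Petal_width"], ascending=[True, False]
).reset_index(drop=True)  # renumber the rows after sorting
```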
Relocate column position
Change order of columns (relocate):
Method 1
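# pop() removes the column first; inserting a column name that already exists raises a ValueError
df.insert(0, 'mean', df.pop('mean'))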
Method 2
df = df[["C", "A", "B"]]
Method 3
cols = df.columns.tolist() # get colnames
cols = cols[-1:] + cols[:-1] # moved the last element to the first position
df = df[cols]
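A variation that moves named columns to the front and keeps the rest in their current order, roughly what relocate() does in dplyr. This is a sketch; the toy data and the list name front are assumptions.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "mean": [2.0, 3.0]})

# Columns to move to the front (hypothetical choice)
front = ["mean"]

# Keep all remaining columns in their existing order
df = df[front + [c for c in df.columns if c not in front]]
```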