
Select

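.iloc[] selects by integer position (starting at 0, slice end excluded). .loc[] selects by index label (slice end included). Be careful when assigning through these selections: depending on how the indexing is chained, pandas may return a view or a copy, so the original DataFrame may or may not be modified. See the pandas documentation section "Returning a view versus a copy".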

titanic[["age", "sex"]]
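A minimal sketch of the position-versus-label difference. It uses a small hypothetical DataFrame with string index labels; the column names mirror the titanic example, but the data is made up. It also shows the single-`.loc` assignment pattern, which avoids the view-versus-copy ambiguity of chained indexing.

```python
import pandas as pd

# Hypothetical data; the string index makes positions and labels differ
df = pd.DataFrame(
    {"age": [22, 38, 26, 35], "sex": ["male", "female", "female", "male"]},
    index=["a", "b", "c", "d"],
)

# .iloc: integer positions, end of slice excluded
by_position = df.iloc[0:2]

# .loc: index labels, end of slice included
by_label = df.loc["a":"c"]

# A list of column names selects those columns
subset = df[["age", "sex"]]

# Set values with one .loc call (row mask + column label) rather than
# chained indexing like df[df["age"] > 30]["age"] = 30, which may write to a copy
df.loc[df["age"] > 30, "age"] = 30

print(by_position.index.tolist())  # ['a', 'b']
print(by_label.index.tolist())     # ['a', 'b', 'c']
```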

Filter

df.query("width > 3 & length < 2")
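A small sketch, assuming a hypothetical DataFrame with `width` and `length` columns, showing that the query string matches an ordinary boolean mask. In the mask version each comparison needs parentheses because Python's `&` binds tighter than `>`; inside `query()` it does not. Local variables can be referenced in the query string with `@`.

```python
import pandas as pd

# Hypothetical data with the column names used in the query above
df = pd.DataFrame({"width": [1, 4, 5, 6], "length": [1, 1, 3, 0.5]})

# String expression: comparisons bind tighter than & inside query()
via_query = df.query("width > 3 & length < 2")

# Equivalent boolean mask: parentheses are required around each condition
via_mask = df[(df["width"] > 3) & (df["length"] < 2)]

# @ refers to a Python variable in the calling scope
threshold = 3
via_variable = df.query("width > @threshold & length < 2")

print(via_query.index.tolist())  # [1, 3]
```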

Rename

# Rename columns (inplace=True modifies the existing dataframe instead of returning a new one)
dataframe.rename(columns={'Oldname': 'Newname'}, inplace=True)
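Without `inplace=True`, `rename()` returns a new DataFrame and leaves the original untouched. By default, keys that do not match an existing column are silently ignored; `errors="raise"` turns a typo into a `KeyError`. A short sketch on a hypothetical two-column DataFrame:

```python
import pandas as pd

# Hypothetical data reusing the placeholder names from the example above
df = pd.DataFrame({"Oldname": [1, 2], "Other": [3, 4]})

# Returns a renamed copy; df keeps its original column names
renamed = df.rename(columns={"Oldname": "Newname"})

# A misspelled key is ignored by default; errors="raise" makes it fail loudly
typo_caught = False
try:
    df.rename(columns={"Oldnmae": "Newname"}, errors="raise")
except KeyError:
    typo_caught = True
```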

Case when:

Pandas:

dataframe.loc[dataframe['Species'] == 'setosa', "Species"] = 0
dataframe.loc[dataframe['Species'] == 'versicolor', "Species"] = 1
dataframe.loc[dataframe['Species'] == 'virginica', "Species"] = 2

Dplyr:

dataframe <- dataframe %>%
  mutate(Species = case_when(Species == 'setosa' ~ 0,
                             Species == 'versicolor' ~ 1,
                             Species == 'virginica' ~ 2))
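Two other pandas ways to express the same recoding, sketched on a hypothetical `Species` column. `Series.map()` with a dictionary is a plain lookup (unmatched values become NaN). `numpy.select()` evaluates the conditions in order and falls back to `default`, which is closer in spirit to `case_when()`; the `-1` default here is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"Species": ["setosa", "virginica", "versicolor", "setosa"]})

# Option 1: dictionary lookup; values missing from the dict become NaN
codes = {"setosa": 0, "versicolor": 1, "virginica": 2}
df["code_map"] = df["Species"].map(codes)

# Option 2: first matching condition wins, like case_when();
# rows matching no condition get the default value
conditions = [
    df["Species"] == "setosa",
    df["Species"] == "versicolor",
    df["Species"] == "virginica",
]
df["code_select"] = np.select(conditions, [0, 1, 2], default=-1)
```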

Distinct

dataframe.Species.unique()
Output: array(['setosa', 'versicolor', 'virginica'], dtype=object)
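`unique()` returns the distinct values of one column. A sketch on hypothetical data of two related calls: `nunique()` counts the distinct values, and `drop_duplicates()` keeps distinct rows across all columns, which is the closer match to dplyr's `distinct()`.

```python
import pandas as pd

# Hypothetical data; rows 0 and 1 are identical
df = pd.DataFrame({
    "Species": ["setosa", "setosa", "virginica", "setosa"],
    "Petal_width": [0.2, 0.2, 2.1, 0.4],
})

# Distinct values of one column, in order of first appearance
species = df["Species"].unique()

# Number of distinct values in that column
n_species = df["Species"].nunique()

# Distinct rows over all columns
distinct_rows = df.drop_duplicates()
```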

Check number of rows of a dataframe

R (base):

nrow(dataframe)

Pandas:

len(dataframe)
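Besides `len()`, the `shape` attribute returns rows and columns together (like R's `dim()`), and `size` is the total number of cells. A sketch on a hypothetical 3 x 2 DataFrame:

```python
import pandas as pd

# Hypothetical 3-row, 2-column DataFrame
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows = len(df)                # number of rows, like nrow()
n_rows_alt, n_cols = df.shape   # (rows, columns), like dim()
n_cells = df.size               # rows * columns
```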

Group_by and count:

dataframe.value_counts('Species')

# Alternatively, use .groupby() followed by .size(); the result is sorted by group label instead of by count
dataframe.groupby(['Species']).size()
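A sketch on hypothetical data showing the difference in ordering, plus `reset_index(name=...)` to turn the grouped counts into a regular DataFrame with one row per group, similar to the output of dplyr's `count()`. The column name `n` is chosen here to mirror dplyr.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame(
    {"Species": ["virginica", "setosa", "virginica", "versicolor", "virginica"]}
)

# value_counts: most frequent value first
by_count = df["Species"].value_counts()

# groupby().size(): sorted by group key; reset_index gives a DataFrame
by_key = df.groupby("Species").size().reset_index(name="n")
```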

Mutate

Mutate based on current columns:

dataframe["New_feature"] = dataframe["Petal_width"] * dataframe["Petal_length"] / 2
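`DataFrame.assign()` is the closer analogue of `mutate()`: it returns a new DataFrame instead of modifying the original, so it can be chained. When a value is a function, it receives the intermediate DataFrame, so a later column can use one created earlier in the same call. A sketch on hypothetical data; `Double_feature` is an illustrative name only.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"Petal_width": [0.2, 1.5], "Petal_length": [1.4, 4.5]})

# Each lambda gets the DataFrame built so far, so Double_feature can use New_feature
out = df.assign(
    New_feature=lambda d: d["Petal_width"] * d["Petal_length"] / 2,
    Double_feature=lambda d: d["New_feature"] * 2,
)
```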

Delete

dataframe.drop("New_feature", axis=1, inplace=True)
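The `columns=` keyword is an alternative to `axis=1` that reads more clearly and accepts several names at once. `errors="ignore"` skips names that are not present instead of raising a `KeyError`. A sketch on a hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"keep": [1, 2], "New_feature": [3, 4], "tmp": [5, 6]})

# Same as df.drop(["New_feature", "tmp"], axis=1); returns a new DataFrame
trimmed = df.drop(columns=["New_feature", "tmp"])

# A missing column name is skipped instead of raising KeyError
same = trimmed.drop(columns=["not_there"], errors="ignore")
```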

Sort

Sort by ascending/descending:

Dplyr:

dataframe %>% arrange(desc(Petal_width))

Pandas:

dataframe.sort_values('Petal_width', ascending=False)
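Sorting by several columns with mixed directions, the equivalent of `arrange(Species, desc(Petal_width))`, takes a list of columns and a matching list of `ascending` flags. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "Species": ["b", "a", "b", "a"],
    "Petal_width": [0.5, 1.0, 2.0, 0.3],
})

# Species ascending, then Petal_width descending within each species
sorted_df = df.sort_values(["Species", "Petal_width"], ascending=[True, False])
```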

Relocate column position

Change order of columns (relocate):

Method 1

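# pop() removes 'mean' and returns it, so it can be re-inserted at position 0
# (inserting df['mean'] directly raises a ValueError because the column already exists)
df.insert(0, 'mean', df.pop('mean'))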

Method 2

df = df[["C", "A", "B"]]

Method 3

cols = df.columns.tolist()  # get colnames
cols = cols[-1:] + cols[:-1]  # move the last column to the first position
df = df[cols]
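Method 3 can be generalized into a small reusable function that moves any named column to any position. `move_column` below is a hypothetical helper written for this note, not a pandas API; it builds the new column order and returns a reordered copy, leaving the input unchanged.

```python
import pandas as pd


def move_column(df: pd.DataFrame, col: str, pos: int = 0) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with `col` at position `pos`."""
    cols = [c for c in df.columns if c != col]  # every other column, in order
    cols.insert(pos, col)                       # put the target column at pos
    return df[cols]


# Hypothetical data
df = pd.DataFrame({"A": [1], "B": [2], "C": [3]})
moved = move_column(df, "C")          # C to the front
moved_end = move_column(df, "A", 2)   # A to the end
```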