Python Cheatsheet



If you have trouble remembering the exact syntax of something no matter how many times you’ve used it, you are not alone.

There is a community-driven programming cheatsheet where you can look up the common usage of a function and get a quick example to refresh your memory.

Introducing the ultimate programming cheatsheet, cheat.sh (Official Site).

      _                _         _    __
  ___| |__   ___  __ _| |_   ___| |__ \ \      The only cheat sheet you need
 / __| '_ \ / _ \/ _` | __| / __| '_ \ \ \     Unified access to the best
| (__| | | |  __/ (_| | |_ _\__ \ | | |/ /     community driven documentation
 \___|_| |_|\___|\__,_|\__(_)___/_| |_/_/      repositories of the world

Why Cheatsheet

Some Examples

A) Python group by lambda


#  The apply method itself passes each "group" of the groupby object as
#  the first argument to the function. So it knows to associate 'Weight'
#  and 'Quantity' with `a` and `b` based on position (i.e. they are the
#  2nd and 3rd arguments if you count the first "group" argument).

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,11,(10,3)), columns = ['num1','num2','num3'])
df['category'] = ['a','a','a','b','b','b','b','c','c','c']
df = df[['category','num1','num2','num3']]

  category  num1  num2  num3
0        a     2     5     2
1        a     5     5     2
2        a     7     3     4
3        b    10     9     1
4        b     4     7     6
5        b     0     5     2
6        b     7     7     5
7        c     2     2     1
8        c     4     3     2
9        c     1     4     6

gb = df.groupby('category')

#  implicit argument is each "group" or in this case each category

gb.apply(lambda grp: grp.sum())

#  The "grp" is the first argument to the lambda function.
#  Notice I don't have to specify anything for it, as it is
#  automatically taken to be each group of the groupby object

         category  num1  num2  num3
category
a             aaa    14    13     8
b            bbbb    21    28    14
c             ccc     7     9     9

#  So apply goes through each of these and performs a sum operation

gb.groups

{'a': Int64Index([0, 1, 2], dtype='int64'), 'b': Int64Index([3, 4, 5, 6], dtype='int64'), 'c': Int64Index([7, 8, 9], dtype='int64')}

print('1st GROUP:\n', df.loc[gb.groups['a']])
1st GROUP:
  category  num1  num2  num3
0        a     2     5     2
1        a     5     5     2
2        a     7     3     4

print('SUM of 1st group:\n', df.loc[gb.groups['a']].sum())

SUM of 1st group:
category    aaa
num1         14
num2         13
num3          8
dtype: object

#  Notice how this is the same as the first row of our previous operation
#  So apply is _implicitly_ passing each group to the function argument
#  as the first argument.
#  From the [docs](
#  docs/stable/generated/pandas.core.groupby.GroupBy.apply.html)
#  > GroupBy.apply(func, *args, **kwargs)
#  >
#  > args, kwargs : tuple and dict
#  >> Optional positional and keyword arguments to pass to func
#  Additional args passed in `*args` get passed _after_ the implicit
#  group argument.
#  So, using your code:

gb.apply(lambda df,a,b: sum(df[a] * df[b]), 'num1', 'num2')

a     56
b    167
c     20
dtype: int64

#  here 'num1' and 'num2' are being passed as _additional_ arguments to
#  each call of the lambda function
#  So apply goes through each of these and performs your lambda operation

# copy and paste your lambda function
fun = lambda df,a,b: sum(df[a] * df[b])

gb.groups

{'a': Int64Index([0, 1, 2], dtype='int64'), 'b': Int64Index([3, 4, 5, 6], dtype='int64'), 'c': Int64Index([7, 8, 9], dtype='int64')}

print('1st GROUP:\n', df.loc[gb.groups['a']])

1st GROUP:
  category  num1  num2  num3
0        a     2     5     2
1        a     5     5     2
2        a     7     3     4

print('Output of 1st group for function "fun":\n',
      fun(df.loc[gb.groups['a']], 'num1','num2'))

Output of 1st group for function "fun":
 56

#  [RSHAP] [so/q/47551251] [cc by-sa 3.0]
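
If the argument passing in example A still feels abstract, here is a minimal plain-Python sketch of the same idea. It is not how pandas implements `GroupBy.apply`; `group_rows`, `apply_groups` and `weighted_sum` are hypothetical helpers written for illustration, and the rows are copied from the example table above.

```python
from collections import defaultdict

# Rows copied from the example DataFrame above: (category, num1, num2, num3).
ROWS = [
    ("a", 2, 5, 2), ("a", 5, 5, 2), ("a", 7, 3, 4),
    ("b", 10, 9, 1), ("b", 4, 7, 6), ("b", 0, 5, 2), ("b", 7, 7, 5),
    ("c", 2, 2, 1), ("c", 4, 3, 2), ("c", 1, 4, 6),
]
COLUMNS = ("category", "num1", "num2", "num3")


def group_rows(rows, key):
    """Split rows into {key value: list of row dicts}, keeping row order."""
    groups = defaultdict(list)
    for row in rows:
        record = dict(zip(COLUMNS, row))
        groups[record[key]].append(record)
    return dict(groups)


def apply_groups(groups, func, *args, **kwargs):
    """Hypothetical stand-in for GroupBy.apply (an assumption, not pandas code):
    each group is passed as the first argument, extra arguments follow it."""
    return {name: func(group, *args, **kwargs) for name, group in groups.items()}


def weighted_sum(group, a, b):
    """Sum of column a times column b within one group."""
    return sum(row[a] * row[b] for row in group)


groups = group_rows(ROWS, "category")
result = apply_groups(groups, weighted_sum, "num1", "num2")
print(result)  # {'a': 56, 'b': 167, 'c': 20}
```

The totals match the `gb.apply(...)` output in the example, which is the point: `'num1'` and `'num2'` simply arrive after the group on every call.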

B) R ggplot scatter


# question_id: 7714677
# One way to deal with this is with alpha blending, which makes each
# point slightly transparent, so regions with more points plotted on
# them appear darker.
# This is easy to do in `ggplot2`:

library(ggplot2)

df <- data.frame(x = rnorm(5000),y=rnorm(5000))
ggplot(df,aes(x=x,y=y)) + geom_point(alpha = 0.3)

# Another convenient way to deal with this (and probably more
# appropriate for the number of points you have) is hexagonal binning:

ggplot(df,aes(x=x,y=y)) + stat_binhex()

# And there is also regular old rectangular binning (image omitted),
# which is more like your traditional heatmap:

ggplot(df,aes(x=x,y=y)) + geom_bin2d()

# [joran] [so/q/7714677] [cc by-sa 3.0]

C) PySpark dataframe filter


 * Pyspark: Filter dataframe based on multiple conditions
 * Your logic condition is wrong. IIUC, what you want is:

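import pyspark.sql.functions as f

# df is the DataFrame from the question. The first filter() call is not
# shown in this excerpt; the second one applies the corrected condition:
df.filter(
        ((f.col('col1') != f.col('col3')) |
         (f.col('col2') != f.col('col4')) & (f.col('col1') == f.col('col3')))
    ).show()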

 * I broke the filter() step into 2 calls for readability, but you could
 * equivalently do it in one line.
 * Output:

+----+----+----+----+---+
|col1|col2|col3|col4|  d|
+----+----+----+----+---+
|   A|  xx|   D|  vv|  4|
|   A|   x|   A|  xx|  3|
|   E| xxx|   B|  vv|  3|
|   F|xxxx|   F| vvv|  4|
|   G| xxx|   G|  xx|  4|
+----+----+----+----+---+

/* [pault] [so/q/49301373] [cc by-sa 3.0] */

My Workflow


Quick demo to create a dummy python dataframe.
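
As a rough sketch of what such a lookup could look like from Python, the snippet below builds a cheat.sh query URL from a language and a plain-English question. It assumes the service's `/<language>/<words+joined+by+plus>` URL scheme and its `?T` option for text without colour codes; `build_cheat_url` and `fetch_cheat` are hypothetical helper names, and the curl-style User-Agent is an assumption to request plain-text output.

```python
from urllib.parse import quote_plus
from urllib.request import Request, urlopen

BASE_URL = "https://cheat.sh"  # assumed base address of the service


def build_cheat_url(language, question, plain_text=True):
    """Build a query URL such as https://cheat.sh/python/group+by+lambda?T."""
    query = quote_plus(question.strip())  # spaces become '+'
    url = f"{BASE_URL}/{quote_plus(language)}/{query}"
    # '?T' is assumed to ask for text without ANSI colour sequences.
    return url + "?T" if plain_text else url


def fetch_cheat(language, question, timeout=10):
    """Download the answer as text. Needs network access; the curl-like
    User-Agent is an assumption, not a documented requirement."""
    request = Request(build_cheat_url(language, question),
                      headers={"User-Agent": "curl/8.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


print(build_cheat_url("python", "create dummy dataframe"))
```

Calling `fetch_cheat("python", "create dummy dataframe")` would then return the kind of commented snippet shown in the examples above, provided the machine is online.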

Final Thoughts

Hopefully you find it useful too.

Happy Coding!

Recommended Readings

What is Docstrings

Code is more often read than written. Do yourself a favour and write some good code.

18 Useful Pandas Functions for Data Science

Here is my list of useful pandas functions for your day to day tasks.
