Exercise 1 hints

Below are some tips for working on Exercise 1.

Checking the type of an object

The isinstance(actual_object, variable_type) function is a Python built-in function that tells you whether actual_object is of the type variable_type. Basic types include:

  • str: String
  • int: Integer number
  • float: Floating point numbers
  • list: Lists of things, written with square brackets [ ]
  • dict: Dictionaries, Python's versatile data structure, similar to associative lists, where you address values via named fields (see Python recap lecture)

An example:

In [1]: a_string_var = "I am a string"

In [2]: an_int_var = 42

In [3]: a_float_var = 3.5

In [4]: a_boolean_true_false_var = True

In [5]: is_string = isinstance(a_string_var, str)

In [6]: print(is_string)
True

In [7]: is_int = isinstance(an_int_var, int)

In [8]: print(is_int)
True

In [9]: is_float = isinstance(a_float_var, float)

In [10]: print(is_float)
True

In [11]: true_or_false = isinstance(a_float_var, str)

In [12]: print(true_or_false)
False
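The same check works for the list and dict types listed above, and isinstance also accepts a tuple of types. The following is a minimal sketch; the variable names and values are made up for illustration. Note that bool is a subclass of int in Python, so True also passes an int check.

```python
# Hypothetical example values, not part of the exercise data
a_list_var = [1, 2, 3]
a_dict_var = {"name": "Tokyo", "population": 38001018}

is_list = isinstance(a_list_var, list)
is_dict = isinstance(a_dict_var, dict)

# A tuple of types asks "is it any of these types?"
is_number = isinstance(3.5, (int, float))

# bool is a subclass of int, so this check is True as well
bool_is_int = isinstance(True, int)

print(is_list, is_dict, is_number, bool_is_int)
```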

Control flow for checks with if and else

If you want to make a “left” or “right” decision, you can use Python’s if/else construct. For that you check whether a condition (or a fact) is true or false. If it’s true, only the first block runs; if it’s false, only the else block runs.

initial_demo_output = 0

if 3 > 2:
    print("3 is larger than 2")
    initial_demo_output = 3
else:
    print("3 not larger than 2")
    initial_demo_output = 2

# can you guess the final value of ``initial_demo_output``?
print(initial_demo_output)
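The condition does not have to be a fixed comparison; it can be any expression that is true or false, such as an isinstance check. Here is a small sketch that combines the two ideas. The function name double_if_number is hypothetical and only serves as an illustration.

```python
def double_if_number(value):
    """Double the value if it is a number, otherwise return None."""
    if isinstance(value, (int, float)):
        # Only runs when the condition is true
        result = value * 2
    else:
        # Only runs when the condition is false
        result = None
    return result

print(double_if_number(21))
print(double_if_number("I am a string"))
```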

For more details and practice, see Python recap lecture

Reading a CSV file into Pandas

With Python you can read data from almost any kind of input file (CSV, TXT, etc.). The widely used library Pandas can easily read a file with tabular data and present it to us as a so-called DataFrame:

In [13]: import pandas as pd

# make sure you have the correct path to your working file, ideally in the same folder
In [14]: df = pd.read_csv('source/_static/data/L1/global-city-population-estimates.csv', sep=';', encoding='latin1')

In [15]: pd.set_option('display.max_columns', 20)

In [16]: print(df.head(5))
  Country or area Urban Agglomeration   Latitude   Longitude  Population_2015  \
0           Japan               Tokyo  35.689500  139.691710         38001018   
1           India               Delhi  28.666670   77.216670         25703168   
2           China            Shanghai  31.220000  121.460000         23740778   
3          Brazil           São Paulo -23.550000  -46.640000         21066245   
4           India     Mumbai (Bombay)  19.073975   72.880838         21042538   

   Unnamed: 5  
0         NaN  
1         NaN  
2         NaN  
3         NaN  
4         NaN  
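The empty "Unnamed: 5" column in the output above typically appears when every line of the file ends with a trailing separator. This is an assumption about the data file, not something confirmed here. The sketch below reproduces the effect with a hypothetical three-column miniature of the file, read from memory, and drops columns that contain only missing values.

```python
import io

import pandas as pd

# Hypothetical miniature of the CSV file; the trailing ';' on each line is an assumption
csv_text = (
    "Country or area;Urban Agglomeration;Population_2015;\n"
    "Japan;Tokyo;38001018;\n"
    "India;Delhi;25703168;\n"
)

# io.StringIO lets read_csv treat the text like a file
df_small = pd.read_csv(io.StringIO(csv_text), sep=";")
print(df_small.columns.tolist())

# Drop every column that contains only missing values (NaN)
df_clean = df_small.dropna(axis=1, how="all")
print(df_clean.columns.tolist())
```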

Applying a function to every row of a Pandas dataframe

# We define a function that takes a row object coming from Pandas. The individual fields of a row are addressed by their column name.
In [17]: def increase_by_factor_2(row):
   ....:     field_value = row['Population_2015']
   ....:     calc_value = field_value * 2
   ....:     return calc_value
   ....: 

# Go through every row and calculate the value for a new column ``Population_doubled`` by **apply**ing the function from above (axis=1 passes one row at a time to the function)
In [18]: df['Population_doubled'] = df.apply(increase_by_factor_2, axis=1)

In [19]: print(df.head(5))
  Country or area Urban Agglomeration   Latitude   Longitude  Population_2015  \
0           Japan               Tokyo  35.689500  139.691710         38001018   
1           India               Delhi  28.666670   77.216670         25703168   
2           China            Shanghai  31.220000  121.460000         23740778   
3          Brazil           São Paulo -23.550000  -46.640000         21066245   
4           India     Mumbai (Bombay)  19.073975   72.880838         21042538   

   Unnamed: 5  Population_doubled  
0         NaN            76002036  
1         NaN            51406336  
2         NaN            47481556  
3         NaN            42132490  
4         NaN            42085076  
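For a simple calculation like this one, arithmetic on the whole column gives the same values without a row function; apply is most useful when the logic per row is more involved. The following sketch uses a hypothetical small DataFrame built from the first three rows shown above, with illustrative column names doubled_apply and doubled_column.

```python
import pandas as pd

# Hypothetical small table, using values from the first rows of the output above
df_demo = pd.DataFrame({
    "Urban Agglomeration": ["Tokyo", "Delhi", "Shanghai"],
    "Population_2015": [38001018, 25703168, 23740778],
})

def increase_by_factor_2(row):
    # Each row arrives as one object; fields are addressed by column name
    return row["Population_2015"] * 2

# Row by row with apply
df_demo["doubled_apply"] = df_demo.apply(increase_by_factor_2, axis=1)

# The whole column at once
df_demo["doubled_column"] = df_demo["Population_2015"] * 2

# Both approaches produce the same values
print(df_demo["doubled_apply"].tolist() == df_demo["doubled_column"].tolist())
```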

Tricky directory path names in Windows

In the lesson and the Exercise 1 hints on reading a CSV file with Pandas, a few students got a very cryptic error message, something like “decoding of sequence UCXXXXX not possible”. The error occurs in the line with pd.read_csv, most likely because you used the complete path:

df = pd.read_csv('c:\users\alex\geopython\L1\global-city-population-estimates.csv', sep=';', encoding='latin1')

Windows uses backslashes ‘\’ as folder separators. However, in Python strings (as in many programming languages) a backslash starts an escape sequence, so in a path such as ‘c:\users’ the ‘\u’ is misread as the start of a Unicode escape. Therefore in Python we put an ‘r’ for ‘raw’ in front of the quotes of the string with the path name to the file, like so:

df = pd.read_csv(r'c:\users\alex\geopython\L1\global-city-population-estimates.csv', sep=';', encoding='latin1')

You could also just omit the long path and use only the filename. For that, the file has to be saved in the same folder as your Jupyter Notebook (*.ipynb).
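Besides raw strings, two other spellings of the same path avoid the escaping problem: forward slashes, which Python also accepts for Windows paths, and doubled backslashes. The short sketch below compares them using the example path from above; PureWindowsPath from the standard library is used only to show that the spellings refer to the same location, and the variable names are illustrative.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept exactly as typed
raw_path = r'c:\users\alex\geopython\L1\global-city-population-estimates.csv'

# Forward slashes need no escaping at all
slash_path = 'c:/users/alex/geopython/L1/global-city-population-estimates.csv'

# Doubling each backslash also works in a normal string
escaped_path = 'c:\\users\\alex\\geopython\\L1\\global-city-population-estimates.csv'

# The raw and the escaped spelling are the very same string
print(raw_path == escaped_path)

# As Windows paths, the backslash and forward-slash spellings are equal too
print(PureWindowsPath(raw_path) == PureWindowsPath(slash_path))
```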