Exercise 1 hints¶
Add below some tips for working on Exercise 1 if needed.
Check of what type an object is¶
The isinstance(actual_object, variable_type)
function is a Python builtin function that will answer your question,
if the actual_object
is of the variable typpe variable_type
. Basic types are for example:
- str: String
- int: Integer number
- float: Floating point numbers
- list: List of thing aka [ ] brackety things
- dict: Dictionaries, Python versatile data structures, based on associative lists and objects, where you address via named fields (see Python recap lecture )
An example:
In [1]: a_string_var = "I am a string"
In [2]: an_int_var = 42
In [3]: a_float_var = 3.5
In [4]: a_boolean_true_false_var = True
In [5]: is_string = isinstance(a_string_var, str)
In [6]: print(is_string)
True
In [7]: is_int = isinstance(an_int_var, int)
In [8]: print(is_int)
True
In [9]: is_float = isinstance(a_float_var, float)
In [10]: print(is_float)
True
In [11]: true_or_false = isinstance(a_float_var, str)
In [12]: print(true_or_false)
False
Control flow for checks with if
and else
¶
If you want to make a “left” or “right” decision, you can use Python’s if then construct. For that you need to check a condition (or a fact) if it’s true or false. If it’s true, go only through the first block, if it’s false, go only through the else block.
initial_demo_output = 0
if 3 > 2:
print("3 is larger than 2")
initial_demo_output = 3
else:
print("3 not larger than 2")
initial_demo_output = 2
# guess the final value of ``initial_demo_output`` ?
print(initial_demo_output)
For more details and practice, see Python recap lecture
Reading a CSV file into Pandas¶
With Python it is basically possible to read data from any kind of input datafile (such as csv-, txt-, etc). The widely used library Pandas can easily read a file with tabular data and present it to us as a so called dataframe:
In [13]: import pandas as pd
# make sure you have the correct path to your working file, ideally in the same folder
In [14]: df = pd.read_csv('source/_static/data/L1/global-city-population-estimates.csv', sep=';', encoding='latin1')
In [15]: pd.set_option('max_columns',20)
In [16]: print(df.head(5))
Country or area Urban Agglomeration Latitude Longitude Population_2015 \
0 Japan Tokyo 35.689500 139.691710 38001018
1 India Delhi 28.666670 77.216670 25703168
2 China Shanghai 31.220000 121.460000 23740778
3 Brazil S?o Paulo -23.550000 -46.640000 21066245
4 India Mumbai (Bombay) 19.073975 72.880838 21042538
Unnamed: 5
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
Applying a function to every row of a Pandas dataframe¶
# we make a function, that takes a row object coming from Pandas. The single fields per row are addressed by their column name.
In [17]: def increase_by_factor_2(row):
....: field_value = row['Population_2015']
....: calc_value = field_value * 2
....: return calc_value
....:
# Go through every row, and calculate the value for a new column ``Population_doubled``, by **apply**ing the function from above (downwards row by row -> axis=1)
In [18]: df['Population_doubled'] = df.apply(increase_by_factor_2, axis=1)
In [19]: print(df.head(5))
Country or area Urban Agglomeration Latitude Longitude Population_2015 \
0 Japan Tokyo 35.689500 139.691710 38001018
1 India Delhi 28.666670 77.216670 25703168
2 China Shanghai 31.220000 121.460000 23740778
3 Brazil S?o Paulo -23.550000 -46.640000 21066245
4 India Mumbai (Bombay) 19.073975 72.880838 21042538
Unnamed: 5 Population_doubled
0 NaN 76002036
1 NaN 51406336
2 NaN 47481556
3 NaN 42132490
4 NaN 42085076
Tricky directory path names in windows¶
In the lesson and exercise 1 hints for reading a CSV file in Pandas, a few students got a very cryptic error message, something about “decoding of sequence UCXXXXX not possible”. The error occurs in the line with ‘pd.read_csv’ and you likely have used the complete path:
df = pd.read_csv('c:\users\alex\geopython\L1\global-city-population-estimates.csv', sep=';', encoding='latin1')
Windows uses backslashes ‘’ as folder separators. However, using backslashes can cause problems in String variables in programming languages. Therefore in Python we put an ‘r’ for ‘raw’ in front of the quotes for the String with the path name to the file, like so:
df = pd.read_csv(r'c:\users\alex\geopython\L1\global-city-population-estimates.csv', sep=';', encoding='latin1')
You could also just omit the long path and use only the filename. For that the file should also be saved where you Jupyter Notebook *.ipynb is located.