Exercise 2 hints

Hints regarding the storms csv dataset

  • Do not edit the data by hand (e.g. in Excel); do all processing in code

  • Reuse your “create_lineGeom” function from Exercise 1

  • Be defensive: either you get a valid LineString for a storm, or you leave that storm’s movement out of the new movements GeoDataFrame

  • Reuse your length calculation from Exercise 1; because the data is now in a projected coordinate system with metric units, the lengths are already meaningful
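
One way to read the “be defensive” hint is sketched below. It assumes Shapely is installed. The name safe_line is a hypothetical helper, not the create_lineGeom function from Exercise 1, and the rule "at least two coordinate pairs, otherwise None" is an assumption about what counts as a usable storm track.

```python
from shapely.geometry import LineString


def safe_line(coords):
    """Return a LineString for coords, or None if no usable line can be built.

    safe_line is a hypothetical helper, for illustration only.
    """
    # A line needs at least two coordinate pairs.
    if len(coords) < 2:
        return None
    try:
        line = LineString(coords)
    except (TypeError, ValueError):
        # Malformed input, e.g. non-numeric coordinates.
        return None
    # Reject empty or invalid geometries.
    if line.is_empty or not line.is_valid:
        return None
    return line


# Made-up sample tracks: storm_b has only one observation.
tracks = {
    "storm_a": [(0.0, 0.0), (3.0, 4.0)],
    "storm_b": [(1.0, 1.0)],
}
lines = {name: safe_line(coords) for name, coords in tracks.items()}
# Keep only the storms for which a line could be built.
valid = {name: geom for name, geom in lines.items() if geom is not None}
```

Only the entries in valid would then go into the new movements GeoDataFrame.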

Converting Pandas DataFrame into a GeoDataFrame

Quite often you have read data, e.g. from a text file, into a Pandas DataFrame that has latitude and longitude columns representing the location of each record.

  • Let’s continue with the previous example and consider that we have a column where we have stored the shapely geometries:

>>> print(data)
    value  lat  lon     geometry
0      0    2    4  POINT (4 2)
1      5    1    6  POINT (6 1)
2      2    6    1  POINT (1 6)
3      6    6    3  POINT (3 6)
4      5    5    1  POINT (1 5)
  • Notice that now our data is still a Pandas DataFrame, not a GeoDataFrame:

>>> type(data)
pandas.core.frame.DataFrame

We need to convert the DataFrame into a GeoDataFrame so that we can, for example, save it as a Shapefile. This is done by passing the DataFrame to the GeoDataFrame constructor. We have to tell it which column contains the geometry information (by default a column called ‘geometry’ is expected), and optionally we can also set the coordinate reference system when creating the GeoDataFrame:

# Convert the DataFrame into a GeoDataFrame, explicitly naming the "geometry" column of the pandas DataFrame as the per-feature geometry
geo = gpd.GeoDataFrame(data, geometry='geometry', crs=from_epsg(4326))

>>> type(geo)
geopandas.geodataframe.GeoDataFrame

>>> geo.crs
{'init': 'epsg:4326', 'no_defs': True}

Now we have converted the Pandas DataFrame into a proper GeoDataFrame that we can, for instance, export to a Shapefile.
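
If the DataFrame only has lat and lon columns and no geometry column yet, a minimal sketch of the conversion could look like the following. It assumes a reasonably recent GeoPandas, where points_from_xy is available and the CRS can be passed as an "EPSG:4326" string instead of using from_epsg. The sample values are the ones printed in the table above.

```python
import geopandas as gpd
import pandas as pd

data = pd.DataFrame({
    "value": [0, 5, 2, 6, 5],
    "lat": [2, 1, 6, 6, 5],
    "lon": [4, 6, 1, 3, 1],
})

# x is the longitude, y is the latitude
geometry = gpd.points_from_xy(data["lon"], data["lat"])
geo = gpd.GeoDataFrame(data, geometry=geometry, crs="EPSG:4326")

# First record: lon=4, lat=2
first_point = geo.geometry.iloc[0]
```

Note the argument order: points_from_xy takes x (longitude) first, which is the reverse of the usual "lat, lon" way of speaking.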

Different ways to join two lists

  • side note: len() tells you how many elements a list contains

my_length = len(list_1)
  • via a DataFrame built column-wise:

# dataframe from dict { 'column_name': list_of_data ... }
# if you have several lists, ideally they should be of same length
dfp = pd.DataFrame( {'xcoords': list_1, 'ycoords': list_2} )

def make_pair(row):
    return (row['xcoords'], row['ycoords'])

dfp['coord_pairs'] = dfp.apply(make_pair, axis=1)
dfp['coord_pairs'].tolist()
  • manually iterating over the list positions:

list_length = len(list_1)
coordpairs = []
# the loop variable must be the same name that is used for indexing below
for i in range(0, list_length):
    coordpairs.append((list_1[i], list_2[i]))
  • the built-in Python zip function (imagine a zipper):

# zip returns a lazy iterator: zipped is waiting to be iterated over and is not yet a list itself
zipped = zip(list_1, list_2)
# list() builds a Python list out of anything that can be iterated over
coord_list = list(zipped)
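
Two properties of zip are worth knowing when pairing coordinates, and the short sketch below illustrates them with made-up lists: the result stops at the shorter input, and a zip object can only be consumed once.

```python
list_1 = [4, 6, 1]
list_2 = [2, 1, 6, 9]  # deliberately one element longer

# zip stops at the end of the shorter list; the extra 9 is dropped silently
coord_list = list(zip(list_1, list_2))

# a zip object is an iterator: once consumed, it yields nothing more
zipped = zip(list_1, list_2)
first_pass = list(zipped)
second_pass = list(zipped)

# a length check up front makes mismatched inputs visible
same_length = len(list_1) == len(list_2)
```

This is why the side note about len() matters: checking that both lists have the same length before zipping avoids losing coordinates without noticing.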

Sorting and adding rows: “advanced” function usage on the dataframes

  • use the sort_values method to sort the rows by timestamp

  • Here we want to sort the rows of the whole table, so use axis=0 (NOT axis=1 as with apply). Since axis=0 is the default, you can simply omit the axis keyword.

  • There is no need to convert the text-based timestamp into a date type: the timestamp is ISO-formatted (year first, then month, etc.), so sorting it as a string already gives chronological order

  • To add/append new rows to our new, empty dataframe, here are two examples; in both, you ideally collect the new rows in a separate list first:

# version 1:
# append row by row, gives you more control based on how you stored the intermediate new rows in your list (e.g. as tuple or [] pair)
for idx in range(0, len(new_rows)):
    newdata = newdata.append({'Serial_Num': new_rows[idx][0], 'geometry': new_rows[idx][1]}, ignore_index=True)
# version 2:
# directly create a temporary dataframe and use collected rows-list;
# the rows-list needs to be a "list of lists", where each "sublists" consists of the entries for each row
temp_df = pd.DataFrame(new_rows, columns=['Serial_Num','geometry'])
# and then "just" append the temp dataframe onto the other dataframe
newdata = newdata.append(temp_df, sort=False)
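
Both versions above rely on DataFrame.append, which was deprecated in pandas 1.4 and removed in pandas 2.0. On newer pandas versions, a sketch of the same steps with sort_values and pd.concat could look like this. The column name ISO_time and the sample rows are assumptions for illustration, and plain strings stand in for the geometries.

```python
import pandas as pd

# Hypothetical storm observations with ISO-formatted text timestamps
obs = pd.DataFrame({
    "Serial_Num": ["A1", "A1", "A1"],
    "ISO_time": [
        "2013-09-02 06:00:00",
        "2013-09-01 18:00:00",
        "2013-09-02 00:00:00",
    ],
})

# Sorting the strings directly gives chronological order for ISO timestamps
obs_sorted = obs.sort_values("ISO_time")

# Collect the new rows in a plain list first ...
new_rows = [["A1", "line for A1"], ["B2", "line for B2"]]
temp_df = pd.DataFrame(new_rows, columns=["Serial_Num", "geometry"])

# ... then combine once with pd.concat, which takes the place of append
newdata = pd.DataFrame({"Serial_Num": ["Z9"], "geometry": ["line for Z9"]})
newdata = pd.concat([newdata, temp_df], ignore_index=True, sort=False)
```

Building one temporary DataFrame and concatenating once is also faster than appending row by row, because every append or concat call copies the data.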

Some Do’s and Don’ts for Python coding

  • DON’T

def function(xyz):
    # both of these hand the error text back as if it were the result
    return "Error: LineString or Polygon geometries required!"
    # or
    return("Error! Please insert a list of Shapely Points or coordinate tuples!")

# length now holds an error string instead of a number
length = function(xyz)
print(length)
  • DO

def function():
    # ...
    print("Error: LineString or Polygon geometries required!")
    # no return statement or return explicitly None
    return None
  • also…

# don't wrap the return value in parentheses
return (polygon)
# do
return polygon
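
Why returning None is easier to work with than returning an error string can be seen in a small sketch. Here line_length is a hypothetical stand-in for the length function from Exercise 1, and it works on plain coordinate tuples rather than Shapely geometries.

```python
import math


def line_length(coords):
    """Return the length of the path through coords, or None for bad input."""
    if not isinstance(coords, list) or len(coords) < 2:
        print("Error: a list of at least two coordinate tuples is required!")
        return None
    # Sum the distances between consecutive coordinate pairs.
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


good = line_length([(0, 0), (3, 4)])
bad = line_length("not a list")

# The caller can test for None explicitly instead of inspecting a string
total = sum(v for v in (good, bad) if v is not None)
```

Had the function returned the error text instead, the caller would have needed to compare strings to tell a result from a failure.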
  • Don’t confuse tuples with lists

# tuple
PList = (point1, point2)

# list
PList = [point1, point2]
  • For readability, put no space before the opening parenthesis, but do put a space after each comma. All variants work :-) this is purely a matter of style

# no
Point (x, y)
# no
Point(x,y)
# yes, ideal
Point(x, y)
  • variable and function names should start with a lower-case letter

# no
Point1 = createPointGeom(1.5,3.2)
# no
point1 = CreatePointGeom(1.5,3.2)

# yes
point1 = createPointGeom(1.5, 3.2)
point1 = create_point_geom(1.5, 3.2)
  • built-in names such as object: using them as variable or parameter names works, but it’s dangerous and might be misleading

def getCentroid(object):
    return object.centroid
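
What the danger looks like in practice can be sketched with another built-in name, list. The function names pair_up_bad and pair_up_good are made up for illustration.

```python
def pair_up_bad(list):
    # Inside this function the name "list" refers to the argument,
    # so the built-in list() constructor is no longer reachable.
    return list(zip(list, list))


def pair_up_good(values):
    # A neutral parameter name leaves the built-in untouched.
    return list(zip(values, values))


try:
    pair_up_bad([1, 2])
    shadowing_failed = False
except TypeError:
    # Calling the argument (a list instance) raises TypeError.
    shadowing_failed = True

good_result = pair_up_good([1, 2])
```

For the centroid example, a parameter name such as geom would avoid the clash with the built-in object.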