Month: February 2021

FLEXPART 10 into a Docker container

For a recent project, I wanted to include the Lagrangian particle dispersion model FLEXPART (more on that) into an easy-to-deploy project. The analysis part of that project runs on python, but uses some nifty geography libraries, which in turn require C and Fortran libraries. So, I already put this part into a docker container. FLEXPART provides the actual input for the analysis part, so it seemed logical to also include that part.

Not too long ago version 10 came available, also including netCDF output. However, with added dependencies, compiling the source became a bit more complicated. Let’s have a look at the Dockerfile:

FROM ubuntu:18.04


ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get update && apt-get install -y \
  language-pack-en openssh-server vim software-properties-common \
  build-essential make gcc g++ zlib1g-dev git python3 python3-dev python3-pip \
  pandoc python3-setuptools imagemagick\
  gfortran autoconf libtool automake flex bison cmake git-core \
  libjpeg8-dev libfreetype6-dev libhdf5-serial-dev \
  libeccodes0 libeccodes-data libeccodes-dev \
  libnetcdf-c++4 libnetcdf-c++4-dev libnetcdff-dev \
  binutils  python3-numpy python3-mysqldb \
  python3-scipy python3-sphinx libedit-dev unzip curl wget
  
  
# replaced 'libpng12-dev' by libpng-dev
RUN add-apt-repository ppa:ubuntugis/ppa \
  && apt-get update \
  && apt-get install -y libatlas-base-dev libpng-dev \
     libproj-dev libgdal-dev gdal-bin  

RUN add-apt-repository 'deb http://security.ubuntu.com/ubuntu xenial-security main' \
  && apt-get update \
  && apt-get install -y libjasper1 libjasper-dev libeccodes-tools libeccodes-dev

#ENV HTTP https://confluence.ecmwf.int/download/attachments/45757960
#ENV ECCODES eccodes-2.9.2-Source

#
# Download, modify and compile flexpart 10
#
RUN mkdir flex_src && cd flex_src \
  && wget https://www.flexpart.eu/downloads/66 \
  && tar -xvf 66 \
  && rm 66 \
  && cd flexpart_v10.4_3d7eebf/src \
  && cp makefile makefile_local \
  && sed -i '74 a INCPATH1 = /usr/include\nINCPATH2 = /usr/include\nLIBPATH1 = /usr/lib\n F90 = gfortran' makefile_local \
  && sed -i 's/LIBS = -lgrib_api_f90 -lgrib_api -lm -ljasper $(NCOPT)/LIBS = -leccodes_f90 -leccodes -lm -ljasper $(NCOPT)/' makefile_local \
  && make -f makefile_local

ENV PATH /flex_src/flexpart_v10.4_3d7eebf/src/:$PATH

# pip install some more python libraries

The interesting part happens in L34-42. The source is downloaded and unpacked. Then the makefile  needs to be modified to find all libraries (which were installed with apt-get before). Sed is of great use here. First a line is added to the makefile_local and another line is searched and replaces. Finally make is executed to compile FLEXPART.

Now the container can be ran and FLEXPART be executed from within. At least for my application with a couple of thousand particles being simulated, performance is sufficient and a small penalty is outweigh by the ease of deployment.

Further reading:

Blue Planet

How much of the Earth’s surface is covered by water? – 2/3 elementary school knowledge. But how to confirm? And what fraction of the southern hemisphere mid-latitudes is covered by ocean?

We need some data first. For a quick shot, the MODIS land cover type classification should be sufficient. MCD12C1 provides global coverage at 0.05° in the HDF4 format. It is a resampled and  stitched together version of the 500m MCD12Q1 version. The user guide provides a nice overview.

Now we need to do some calculations. The python jupyter notebook is on github. The recipe is rather simple: Load the HDF4 dataset, select the IGBP[1] classification, quick plot for visualization, get the projection and the pixel sizes right and finally do some conditional sums.

The projections and the pixel sizes are a crucial point. The dataset is in the MODIS climate modeling grid, which is a geographic lat-lon projection. The pixel area gets smaller towards the poles, which we have to keep in mind, when calculating the size of the per pixel.

Looking at the final numbers, the fraction of water is 71.6%, which is sufficiently close to available estimates. Some discrepancy is expected, as we do not include ice shelfs, sea ice, tides, etc and the underlying resolution is ‘only’ 5km. Barren surfaces cover around 4.0% of the Earth (13.9% of the land) and ice sheets, including permanent snow cover 2.9% (10.2% of the land).

When splitting up the hemispheres, in the Northern Hemisphere water covers 61.4%, barren 7.5% and ice less than 1%. Of all the land, barren surfaces cover 19.4%. In the Southern Hemisphere, water makes up 81.8% of the surfaces, with barren less than 0.5% and ice 4.8%.

Finally the mid-latitudes. As a rather crude definition, the latitudes between 30° and 70° are used. In the northern mid-latitudes water covers 48.0% (5.7% barren and 0.7% ice). Barren surfaces make up 11.0% of the land. The southern mid-latitudes are overwhelmingly covered with water (93.9%, barren <0.2%, ice 1.4%).

[1] International Geosphere-Biosphere Programme: http://www.igbp.net/

micro-refactoring: long bool statements

“Hell is other people’s code” – while trying to reprocess some data on our server, a colleagues script did not produce the desired output. The problem was, that the location attribute was missing in the input netCDF files [1]. Which leads us to the code, that checked the attributes:

#next the formatting I'm supposed to check it against:

(key_for_range_dimension,key_for_time_dimension,key_for_azimuth_variable,key_for_elevation_variable,key_for_range_variable,key_for_unix_time_variable,key_for_doppler_velocity_variable,key_for_doppler_velocity_error_variable,key_for_nyquist_velocity_variable)=variable_name_key_setting_function(location_we_want,system_we_want,location_info,formatting_translator_dictionary,verbose)
       
#this is a logical statement that includes everything, using brackets to break up inline statements and make it easier to see
formatting_is_correct=(
               'location' in that_files_attributes
               ) and (
                              'system' in that_files_attributes
                              ) and (
                                            key_for_range_dimension in that_files_dimensions
                                            ) and (
                                                           key_for_range_variable in that_files_variables
                                                           ) and (
                                                                         key_for_time_dimension in that_files_dimensions
                                                                          ) and (
                                                                                        key_for_unix_time_variable in that_files_variables
                                                                                        ) and (
                                                                                                        key_for_azimuth_variable in that_files_variables
                                                                                                        ) and (
                                                                                                                       key_for_elevation_variable in that_files_variables
                                                                                                                       ) and (
                                                                                                                                     key_for_doppler_velocity_variable in that_files_variables
                                                                                                                                     )

print("Check format keys:")
print(('location' in that_files_attributes),
    ('system' in that_files_attributes),(key_for_range_dimension in that_files_dimensions),
    (key_for_range_variable in that_files_variables),(key_for_time_dimension in that_files_dimensions),
    (key_for_unix_time_variable in that_files_variables),
    (key_for_azimuth_variable in that_files_variables),
    (key_for_elevation_variable in that_files_variables),
    (key_for_doppler_velocity_variable in that_files_variables))

[You might want to expand the code snippet to fully appreciate the >380 column lines]

At least for me, that looked like a classical broken window. The final print statement is already a owned to the debugging efforts. Two points, that I consider problematic:

    1. insanely long lines (not the only occurrence in that script but a prominent one)
    2. the long and nested boolean statement

Both make it hard to understand, debug and modify the code. Let’s have see, what we quickly can do about it.

var_name_key_setting = variable_name_key_setting_function(
               location_we_want, system_we_want, location_info,
               formatting_translator_dictionary, verbose)
# unpack the function return
(key_for_range_dimension, key_for_time_dimension, 
 key_for_azimuth_variable, key_for_elevation_variable, 
 key_for_range_variable, key_for_unix_time_variable, 
 key_for_doppler_velocity_variable, 
 key_for_doppler_velocity_error_variable,
 key_for_nyquist_velocity_variable) = var_name_key_setting
       
formatting_is_correct_list = [
    'location' in that_files_attributes, 
    'location in attributes',
    'system' in that_files_attributes, 
    'system in attributes',
    key_for_range_dimension in that_files_dimensions, 
    'key range dim in files_dimensions',
    key_for_range_variable in that_files_variables,
    'key range var in files_variables',
    key_for_time_dimension in that_files_dimensions,
    'key time dim in files_dimensions',
    key_for_unix_time_variable in that_files_variables,
    'key unix_time_var in files_variables',
    key_for_azimuth_variable in that_files_variables,
    'key azimuth var in files_variables',
    key_for_elevation_variable in that_files_variables,
    'key elevation var in files_variables',
    key_for_doppler_velocity_variable in that_files_variables,
    'key doppler_vel_var in files_variables',
       ]
formatting_is_correct = all(formatting_is_correct_list[::2])

print("Check format keys:", formatting_is_correct)
if not formatting_is_correct:
       [print('  ', e[0], e[1]) for e in \
            zip(formatting_is_correct_list[0:-1:2], 
                formatting_is_correct_list[1::2])]

Still not perfect, but a bit more explicit and easier to read. In fact the [::2] could also be omitted, as a string also evaluates to True in all().

Maybe it’s time to setup a compulsory linter or at least make PEP 8 and the Google Python Style Guide required reading.

[1] actually required a second pair of eyes to figure out…