How to download and map Argo profiles for an ocean using Matlab ?

posted Jun 1, 2012, 6:04 AM by G. Maze ‎(admin)‎   [ updated Jun 6, 2012, 5:47 AM ]

This tutorial will describe how to download and plot Argo profiles (and a density map) for a given ocean.
This tutorial is for people who have a very small experience with Matlab.
This tutorial assumes that you want to plot Argo profiles for one of the given oceanic basin available on GDAC ftp servers under the geo folder: Atlantic, Pacific and Indian oceans.

Step 0: Setting-up your working environnement

Ok, open a terminal window, move to your favorite working folder and create a new folder, let's say 'argo_work':
mkdir argo_work
cd argo_work

Step 1: Download the Argo netcdf files (c-shell script)

We're going to use the wget program and for later reuse we're going to put all this step into a single c-shell script. The script is as follow:
#!/bin/csh
# Download all Argo netcdf profiles for a given ocean

# Define here which GDAC to use:
set gdac = 'ftp.ifremer.fr/ifremer/argo' # Use the French GDAC ftp server
#set gdac = 'usgodae.org/pub/outgoing/argo' # You can also use the US GDAC ftp server

# Define here which ocean to load:
set basin = 'indian_ocean'

# Define here where to put the data:
set folder = 'data'

# Download files:
wget -c -x -P./$folder -r -N ftp://$gdac'/geo/'$basin

# Done 
exit 0

The script starts by defining some parameters to indicate where to get the data and for which ocean, here the Indian Ocean. Note that this script will load ALL the data !
Copy/paste the code into a text editor, save the script as 'download_this' in the 'argo_work' folder, make it executable and launch it:
chmod ugo+x download_this
./download_this
All the data will go under the sub-folder named 'data'. 
The folder structure will depend on the GDAC you used, for the Ifremer one you'll get something like:
    data/ftp.ifremer.fr/ifremer/argo/geo/indian_ocean/<YEAR>/<MONTH>/<YEAR><MONTH><DAY>_prof.nc
and for the US one:
    data/usgodae.org/pub/outgoing/argo/geo/indian_ocean/<YEAR>/<MONTH>/<YEAR><MONTH><DAY>_prof.nc
where <YEAR>, <MONTH> and <DAY> are the list of available years and months while the end file is an Argo multiprofile netcdf with all profiles for a given day

Step 2: Read Argo profiles coordinates (matlab function)

Ok, so now that we have the raw material under the 'data' folder, we're going to move to Matlab scripting to do all the rest of the work.
Let's create a matlab function which is going to read all profiles coordinates (latitude,longitude,time).
To keep a clean folder architecture, in the 'argo_work' folder, creates a sub-folder where we'll put all Matlab scripts:
mkdir matlab
cd matlab
Now let's write the Matlab function. 
To do this, we create a file named like the function we want to set-up. Let's then create a file named 'read_data.m' and open it in your favorite text editor.

In the file, we need to tell Matlab that we're going to create a function. The way to do this is to embed all the code into:
function varargout = read_data(varargin)
    % All the function code goes here !
end% end function

We first need to declare variables to set a couple of informations:
%- DECLARE LOCAL VARIABLES

% Define where is the data folder relative to the Matlab script:
data_folder = fullfile('..','data'); 

% Define which ftp server we used to download the data.
% Un-comment by removing the % sign the correct line:
ftp_server = 'ifremer'; 
%ftp_server = 'us';

% Define which oceanic basin we downloaded the data for:
basin = 'indian_ocean'; % In this case the Indian ocean

% And now adjust the folder names down to basin folders:
switch ftp_server 
    case 'ifremer', data_folder = fullfile(data_folder,'ftp.ifremer.fr','ifremer','argo','geo',basin);
    case 'us',      data_folder = fullfile(data_folder,'usgodae.org','pub','outgoing','argo','geo',basin);
end% switch

Ok, with this piece of code, we know where are the YEAR/MONTH folders. Note that we used the Matlab function fullfile to create a correct path string whatever the platform we're working on. This allows to get rid of / or \ issues ...

Now we're facing multiple choices in the design of the code to scan the years/months folders. We can either:
  • look for any folder (whatever their names) or 
  • declare a list of years and months to look for in the data folder. 
I prefer the second approach for two reasons: 
  1. it's an implicit method to ensure that we're going to look and analyse data coming from the ftp server (its structure is preserved) and
  2. it provides an easy handling method if at some points we only want to load data for a given month or year.
But this choice is up to you. It is always a compromise between flexibility and security design. 

To implement the second choice we then need to declare a list of years and months we want to list the netcdf for:
% Define the list of years to scan:
year_list = [2010:2012];
% Define the list of months to scan:
month_list = [1:12];

Now that we have declared where to start the scan and some kind of iteration process, we can start to loop:
%- FILE SCANNING LOOP:

for iy = 1 : length(year_list)
    for im = 1 : length(month_list)
        % Define folder to be scanned:
        scan_folder = fullfile(data_folder,sprintf('%0.4d',year_list(iy)),sprintf('%0.2d',month_list(im)));
    end%for im
end%for iy

At this point, these two loops (over years and months) don't do anything, we'll add codes step by step.

In the scan_folder declaration we used the Matlab function sprintf in order to control the way years and months are written as string in the folder path. This is important for month's name, because for month numbers lower than 10 (January to September), the string must begins with a 0 like: '02' for February and not only '2'.

For each scan_folder, we can now use the Matlab function dir to list files present in the folder. If we find a file corresponding to an Argo Netcdf multiprofile, then we'll read the coordinates.
We scan the content of the folder with the Matlab function dir and store the result in the variable raw_file_list. This variable is a Matlab structure. A structure is not like double or cell arrays, it can contains a lot of different variable types which can be retrieved or set using an object oriented syntax. In our case, the structure returned by the function dir is a structure array (which means it's a list of elements with similar content pattern) with some useful properties like: name, dates, bytes, isdir, ... (see Matlab help for more details). We are going to use the name property to check if it's a Netcdf file we can expect in this folder:

%- FILE SCANNING LOOP:
for iy = 1 : length(year_list)
    for im = 1 : length(month_list)
            % Define folder to be scanned:
            scan_folder = fullfile(data_folder,sprintf('%0.4d',year_list(iy)),sprintf('%0.2d',month_list(im)));                        
        disp(scan_folder); % Print on screen the folder we're scanning.
            
            % List correct files in the folder:
       raw_file_list = dir(scan_folder);         
       for ifile = 1 : length(raw_file_list)

                % Is this file looking like an Argo Netcdf multiprofile ?
                read_this = false;

                % Perform tests:
if length(raw_file_list(ifile).name) == 16
if raw_file_list(ifile).name(end-2:end) == '.nc'
if raw_file_list(ifile).name(1:4) == sprintf('%0.4d',year_list(iy))
if raw_file_list(ifile).name(5:6) == sprintf('%0.2d',month_list(im))
read_this = true;
end%if
end%if
end%if
end%if

                % Read profiles coordinates out of this file:
                switch read_this
                    case false
                        % Nothing to do then
                    case true
                        % Let's read coordinates:
                end%swtich
        
       end%for ifile
        
   end%for im
end%for iy

Once the list of files in scan_folder is retrieved, we loop through all files in order to perform a couple of tests to verify the Netcdf file can be expected. 
We initiated a variable called read_this to false. We'll set it to true for a correct file. We could implement the file name test list in many different form. Here we used a very simple serie of embedded 'if' statements, we test the following:
  1. file name is 16 characters long (it must be ! otherwise it couldn't be of the form: <YEAR><MONTH><DAY>_prof.nc
  2. file extension correspond to the Argo Netcdf extension '.nc'
  3. the first 4 characters correspond to the folder year
  4. the 5th and 6th characters correspond the the folder month.
If these 4 tests are passed successfully, the variable read_this  is set to true because it looks good. 

Inside the ifile loop, if the read_this variable is set to true, we then need to open the Netcdf file and get profiles coordinates out of it. The piece of code is as follow:
% Let's read coordinates:
% Open the Netcdf file:
ncid = netcdf.open(fullfile(scan_folder,raw_file_list(ifile).name),'NC_NOWRITE');

% Read profiles latitude:
varid    = netcdf.inqVarID(ncid,'LATITUDE');
LATITUDE = netcdf.getVar(ncid,varid,'double');

% Read profiles longitude:
varid    = netcdf.inqVarID(ncid,'LONGITUDE');
LONGITUDE = netcdf.getVar(ncid,varid,'double');

% Read profiles time:
% Get the time reference:
varid = netcdf.inqVarID(ncid,'REFERENCE_DATE_TIME'); 
REFERENCE_DATE_TIME = netcdf.getVar(ncid,varid)'; 
ref = datenum(str2num(REFERENCE_DATE_TIME(1:4)),...
str2num(REFERENCE_DATE_TIME(5:6)),...
str2num(REFERENCE_DATE_TIME(7:8)),...
str2num(REFERENCE_DATE_TIME(9:10)),...
str2num(REFERENCE_DATE_TIME(11:12)),...
str2num(REFERENCE_DATE_TIME(13:14)));
% then the relative time axis:
varid = netcdf.inqVarID(ncid,'JULD'); 
JULD  = netcdf.getVar(ncid,varid);
% and finally the absolute time axis:
TIME  = ref + JULD;

% Close the Netcdf file:
netcdf.close(ncid);

% Add those profiles coordinates to the function output variables:
X = [X; LONGITUDE];
Y = [Y; LATITUDE];
T = [T; TIME];

This syntax implies that you're using a recent Matlab version, one with a built-in Netcdf api. First we open the file which is then referred to as the ncid variable pointer. Reading latitude and longitude is fairly easy as you can see. For time it's a little bit more complicated because we have to recompute the asbolute time axis from a reference and relative time axis. I won't go into details here, check out the Argo documentation for that.
In the end, we close the netcdf file (you must do this, otherwise Matlab we'll keep 'connection' open to files and at some point it will blow up your session if too many files are open) and finally concat the new profiles coordinates (LATITUDE,LONGITUDE,TIME) to variables used to output the results out of the function (X,Y,T). Those variables needs to be initiated before the time looping like:
% Initiate coordinates arrays:
X = [];
Y = [];
T = [];
%- FILE SCANNING LOOP:
    ...
We're only left with function output set up, just before the end function tag, we can insert:
% Output
varargout(1) = {X};
varargout(2) = {Y};
varargout(3) = {T};
This says that if one is requesting for outputs, we'll serve X then Y then T.
And that's it, save the file 'read_data' and you're all set. 

Oh wait ! What if we want to load profiles for a given year, or month ? We don't want to edit this function every time, we prefer to use parameters when calling it !
How can we do that ? Well, my favorite method is to insert a block of code after the 'DECLARE LOCAL VARIABLES' section of the function which is going to read variables from the function arguments and possibly overwrite some values. This can simply be achieved with:
%- LOAD USER PARAMETERS:
if nargin > 0
if mod(nargin,2) ~= 0
error('Please provide options as: ''OPTIONS NAME'',VALUE pair(s)');
end% if 
for in = 1 : 2 : nargin
eval(sprintf('%s = varargin{in+1};',varargin{in}));
end% for in
clear in
end% if
This syntax usage is simple. For instance, by default we're loading all months of any year because the variable month_list is set to 1:12. With the previous peace of code, we can call the function from the Matlab prompt (or another script) like:
>> [X,Y,T] = read_data('month_list',[12 1 2]);
and the variable month_list will be overwritten so that in this case only Dec/Jan/Feb months are loaded.
The final read_data.m file can be downloaded at the end the tutorial.

Step 3: Plot locations and compute a density map (matlab script)

Let's move on to the fun part: plotting ! Now we're going to write all the code in a Matlab file which is not a function. This means that all variables declared in the script are going to be available in the Matlab workspace. 
Requirements:
  • We are going to use the m_map Matlab toolbox to plot map easily (download the m_map package here). So first you need to install the toolbox (see their webpage).
  • The profile density map will be computed using the very nice function bin2mat created by A. Stevens and distributed on the Matlab Central file exchange platform. So next you need to put the bin2mat.m function file in your matlab folder.
First create a file named 'plot_data.m' and open it in your favorite text editor. Let's load the data (profiles coordinates) for the year 2010, then enter the line:
[X,Y,T] = read_data('year_list',2010);
You may want to start the file by a 'clear' statement so that the workspace is clear of any variables. You really don't want to enter a 'clear all' to also delete global variables.
If you now look at your workspace you have the following variables available:
>> whos 
  Name          Size             Bytes  Class     Attributes

  T         27385x1             219080  double              
  X         27385x1             219080  double              
  Y         27385x1             219080  double  
At the time of writing this tutorial (I just downloaded profiles for the Indian Ocean), we can see that 27,385 profiles were sampled in 2010 for this ocean. And this is the smallest ocean ! so be careful if you try to load profiles for multiple years or a bigger ocean, numbers can get very large and your computer may don't like it.

Number of profiles per week

First we're going to plot a very simple diagram showing the number of profiles per week (this time frame is appropriate because we loaded a single year of data).
This simplest way to do this is to use the Matlab function hist and ask the function to compute the number of profile for 52 bins (hence 52 weeks in a year). 
[nw tw] = hist(T,52);
figure;
plot(tw,nw,'.-');
grid on, box on,datetick('x')
xlabel('Time');ylabel('# of profiles');title('Number of Argo profiles per week in the Indian Ocean for 2010');
We then open a new figure, plot the number of profiles per week (nw) using the time axis returned by hist (tw), format the grid, the time axis and put some labels and a title. You should have something like:

There are about 520 profiles sampled every week in the Indian Ocean, at least in 2010. Very impressive network !

Map of profiles locations

We're going to plot a map with all the profiles locations now. We first need to initiate a projection using the m_map function m_proj:
m_proj('equid','lon',[min(X) max(X)]+[-1 1],'lat',[min(Y) max(Y)]+[-1 1]/2);
This line set up an equidistant projection system centered over a rectangular area specified by the options 'lon' and 'lat'. Note that here we're using the range of coordinates of profiles to fit the map correctly. If you're curious you can now open a new figure, type in m_grid;m_coast; at the prompt, and here we go: a map of the Indian Ocean ! The m_map toolbox is smart enough so that to plot a coastline and the grid you don't have to worry about anything as long as you're not customizing the look.
Note that the m_proj function defines global variables and needs to be called only once, you don't have to call it every time you plot a map.

How do we plot profiles locations ? Well, simply using the m_plot function:
figure;
hp = m_plot(X,Y,'+');
m_coast;m_grid;
title('Argo profiles for 2010');
And that's it.
Wait, wait, this standard figure is really ugly, we want to customize it so that profiles markers are not so big, etc ... Let's try again:
figure
hp = m_plot(X,Y,'+');
set(hp,'markersize',2,'color',[1 .5 0]);
m_coast('patch',[1 1 1]/3);
m_grid('box','fancy','xtick',[-180:30:180],'ytick',[-90:10:90]);
title('Argo profiles locations for 2010','fontweight','bold');
These are the figures I get from the two previous plots:

See how you can customize the map by looking at all the options for m_coast and m_grid. The profiles marker design is up to you as well ...

Map of profiles density

Ok, plotting profiles locations with a marker is very simple but as soon as the number of profiles increases maps become totally useless. One method out of this visualization issue is to compute and plot what we call a profile density map: ie the number of profiles for a given area, usually a rectangular cell of a couple of degrees wide.
To do such a thing we need to define a grid like for instance:
% Define a 1x1 degree grid:
dxi = 1; dyi = 1; % Grid size in longitude and latitude
xi = min(X)-1:dxi:max(X)+1; % Define grid point longitude
yi = min(Y)-.5:dyi:max(Y)+0.5; % Define grid point latitude
[Xi,Yi] = meshgrid(xi,yi); % Create the meshed grid.
And now here comes the magic ! Using the bin2mat function computing a profile density map is as simple as:
dens = bin2mat(X,Y,ones(size(X)),Xi,Yi,@sum);
The 1st/2nd arguments of bin2mat are the longitude and latitude axis of profiles, the 3rd argument is the quantity we want to bin (here we take 1 because we simply want to compute the number of points), 4th/5th arguments are the grid points coordinates and the 6th argument is the function handle you want to use. Here this command literrally means sum 1 each time you get a value in X/Y falling into a grid cell defined by Xi/Yi. 
And now we can plot the density map using m_pcolor:
figure
m_pcolor(xi,yi,dens);
m_coast('patch',[1 1 1]/3);
m_grid('box','fancy','xtick',[-180:30:180],'ytick',[-90:10:90]);
title('Argo profiles density map for 2010','fontweight','bold');
But this figure isn't showing much because one area (the Arabian Sea) has a very large number of profiles compared to the rest of the basin. So we may want to tune the color axis to highlight those poor regions as well. We can do it using the caxis Matlab function which set the range of the color axis, meaning if we use:
caxis([1 10]);
then all grid points with values higher or equal to 10 will have the same color (the last one from the colormap) while all grid points with values smaller or equal to 1 will also have the same color (the first one from the colormap). In this case, I think it would be a good practice to set-up a colorbar where the highest value color (more than 10) is significantly different than the one just smaller to clearly indicate that the color axis is saturated and that some informations are not visible in the map. If we use the standard 'jet' colormap, the last color is red. So we can change it to black and keep a nice look:
cmap = jet(64);
cmap(end,:) = [0 0 0]; % Replace the last color (red) by black
colormap(cmap);
Below are the two figures before and after the color axis and color map adjustment:
Profiles density (before) Profiles density (after)
Using a density grid of 1x1 degree may not the best choice here. Let's try the map with a 3x3 degree grid. The only thing to change in the previous code is the dxi/dyi line:
dxi = 3; dyi = 3; % Grid size in longitude and latitude
and the color axis because we're going to have 9 times the number of initial profiles per grid cell. So now the density map looks like: 

Conclusion

That's it for today. We've learn how to download profiles from a GDAC ftp server, to read those data from Matlab and plot some stuff about them. Many improvements could be added, like a test on the profiles coordinates quality or a more flexible method to list files to get the data from.
I hope you found this tutorial helpful. Feel free to post a comment if you have questions or issues...


ċ
download_this
(0k)
G. Maze (admin),
Jun 1, 2012, 6:04 AM
ċ
plot_data.m
(2k)
G. Maze (admin),
Jun 1, 2012, 6:04 AM
ċ
read_data.m
(4k)
G. Maze (admin),
Jun 1, 2012, 6:04 AM
Comments