Version 33 (modified by jgipsl, 11 years ago)
ORCHIDEE Fortran style guide
Introduction
This is a collaborative working document that outlines standard working procedures and coding style for ORCHIDEE. To comment, log into the wiki, edit the page source using the discussion markup, and add your initials after your comment. Comments will be reviewed and merged into the main text periodically.
Example:
Sample text
> I don't understand this at all (BB)
>> This needs further clarification (JB)
which will then appear in the document as:
Sample text
I don't understand this at all (BB)
This needs further clarification (JB)
NOTE: Fortran is becoming a much more object-oriented language with the 2003 and 2008 standards. Josefine and I (Matt) agree that it would be nice to use some of these features where applicable. There will be situations where they are advantageous, though, as with the rules, we have no desire to introduce new things just for the sake of it. The same can be said of structures (derived types), i.e. grouping together variables which are genuinely related. It is therefore something to keep in mind when facing a new problem, and we should not be afraid to do it. Personally, I feel that we are doing a disservice to the younger members of the community by not exposing them to a more modern language, though I also understand the difficulty of introducing such features into an existing code base.
JP: This is indeed a good idea, but it has to be treated with care. I would be in favour of putting all the structural information (grid, time, run-time options, I/O flags and parameters, domain decomposition, ...) into structures, but strongly against doing this for the physical variables of the model. Using these structures and passing them as arguments would keep the argument lists manageable, and would allow a strategy of passing information between routines only through arguments (rather than, as today, through a mix of arguments and USE statements, which is very confusing). The feasibility does not worry me.
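The kind of structure JP describes could look like the minimal sketch below. It assumes a hypothetical derived type grid_type (the module, type and component names, and the kind parameter r_std, are illustrative and not ORCHIDEE's actual definitions) that bundles structural information, while the physical variable flux is still passed as its own argument.

```fortran
MODULE grid_info_demo
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: r_std, grid_type, grid_init, area_weighted_sum

  !! Working precision (assumed; ORCHIDEE defines its own r_std)
  INTEGER, PARAMETER :: r_std = SELECTED_REAL_KIND(12)

  !! Structural information only: no physical variables in here
  TYPE grid_type
    INTEGER                  :: npts = 0          !! Number of points (-)
    REAL(r_std)              :: dt = 0.0_r_std    !! Time step (s)
    REAL(r_std), ALLOCATABLE :: area(:)           !! Area of each point (m2)
  END TYPE grid_type

CONTAINS

  !! Fills the structure once, e.g. during initialisation
  SUBROUTINE grid_init(grid, npts, dt, area)
    TYPE(grid_type), INTENT(out)             :: grid
    INTEGER, INTENT(in)                      :: npts
    REAL(r_std), INTENT(in)                  :: dt
    REAL(r_std), DIMENSION(npts), INTENT(in) :: area

    grid%npts = npts
    grid%dt   = dt
    ALLOCATE(grid%area(npts))
    grid%area(:) = area(:)
  END SUBROUTINE grid_init

  !! The grid travels as one argument; the physical variable (flux)
  !! is still passed explicitly
  FUNCTION area_weighted_sum(grid, flux) RESULT(total)
    TYPE(grid_type), INTENT(in)                   :: grid
    REAL(r_std), DIMENSION(grid%npts), INTENT(in) :: flux
    REAL(r_std)                                   :: total

    total = SUM(grid%area(:) * flux(:))
  END FUNCTION area_weighted_sum

END MODULE grid_info_demo
```

Adding a new structural field (for instance a domain-decomposition flag) then changes grid_type only, not every argument list that passes the grid around.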
Interfaces
Existing structure of, and interactions between, modules and subroutines, and how to improve them
(1) Arguments per line: For function/subroutine calls, there should be no more than five arguments per line.
CALL subroutine(arg1, arg2, arg3, arg4, arg5, &
                arg6, arg7, ... )
The reason is that subroutine arguments are not always strictly checked by the compiler (for example, in calls to external procedures without an explicit interface), so when hunting for bugs it is helpful to be able to check quickly that all the arguments are in the right place.
MJM: Thanks to Lionel Guez, I've run some additional tests. This is not standard behavior in gfortran, so the times when this occurred must have been special circumstances and not the norm. Therefore, this rule is probably not required.
JP: Good idea, but please keep the comments between the lines, and try to give a standard logic to the order of arguments: 1) time information, 2) grid information, 3) physical input variables, 4) physical IN/OUT variables, 5) OUT variables, 6) I/O information (this could move up!).
JG: I'm not in favor of having comment lines in the subroutine argument list. The comments should be on the declaration line of each variable. I agree with ordering the arguments, or at least: INTENT(IN) first, then INTENT(INOUT), and INTENT(OUT) last.
JG: I think the most important thing is to have the same number of arguments per line in the SUBROUTINE statement as in the CALL. It is not so important to always have five, but to be consistent on both sides. It is also nice to align the arguments vertically, as in the example above.
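A minimal sketch of these argument-list conventions, using the hypothetical routines demo_update and demo_driver (the names, units and kind parameter r_std are illustrative): arguments ordered INTENT(in), INTENT(inout), INTENT(out), one declaration line per argument with its comment, and the CALL broken at the same place as the SUBROUTINE statement. Because both routines are module procedures, the compiler sees an explicit interface and can check the number, type, kind and rank of the actual arguments at each CALL.

```fortran
MODULE interface_demo
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: r_std, demo_update, demo_driver

  !! Working precision (assumed; ORCHIDEE defines its own r_std)
  INTEGER, PARAMETER :: r_std = SELECTED_REAL_KIND(12)

CONTAINS

  !! Adds the short wave energy of one time step to a running total
  !! (hypothetical example). Order: INTENT(in), INTENT(inout), INTENT(out).
  SUBROUTINE demo_update(kjpindex, dt, swdown, &
                         energy,   energy_tot)
    !! 0.1 Input variables
    INTEGER, INTENT(in)                             :: kjpindex !! (-)
    REAL(r_std), INTENT(in)                         :: dt       !! (s)
    REAL(r_std), DIMENSION(kjpindex), INTENT(in)    :: swdown   !! (W m-2)
    !! 0.2 Modified variables
    REAL(r_std), DIMENSION(kjpindex), INTENT(inout) :: energy   !! (J m-2)
    !! 0.3 Output variables
    REAL(r_std), INTENT(out)                        :: energy_tot !! (J m-2)

    energy(:)  = energy(:) + (swdown(:) * dt)
    energy_tot = SUM(energy(:))
  END SUBROUTINE demo_update

  !! Caller: same argument order and the same line break as the
  !! SUBROUTINE statement of demo_update, aligned vertically
  SUBROUTINE demo_driver(npts, dt, swdown, &
                         energy, energy_tot)
    INTEGER, INTENT(in)                         :: npts
    REAL(r_std), INTENT(in)                     :: dt
    REAL(r_std), DIMENSION(npts), INTENT(in)    :: swdown
    REAL(r_std), DIMENSION(npts), INTENT(inout) :: energy
    REAL(r_std), INTENT(out)                    :: energy_tot

    CALL demo_update(npts, dt, swdown, &
                     energy, energy_tot)
  END SUBROUTINE demo_driver

END MODULE interface_demo
```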
Clarity
Layout of code for clarity to the reader, a reminder about the commenting style, and ensuring compatibility with the documentation generator (Doxygen)
(1) Sequence of variable declaration: Related to the first point under Interfaces, the dummy arguments of a subroutine should be declared in the same order as they appear in the argument list.
SUBROUTINE subroutine(arg1, arg2, arg3, arg4, arg5, &
                      arg6, arg7, ... )
  !
  !! 0. Variable and parameter declaration
  !
  !
  !! 0.1 Input variables
  !
  INTEGER(i_std), INTENT(in)                    :: arg1 !! Domain size (unitless)
  REAL(r_std), INTENT(in)                       :: arg2 !! Time step (s)
  REAL(r_std), DIMENSION(kjpindex), INTENT(in)  :: arg3 !! Downwelling short wave flux
  !
  !! 0.2 Output variables
  !
  INTEGER(i_std), INTENT(out)                   :: arg4 !! Domain size (unitless)
  REAL(r_std), INTENT(out)                      :: arg5 !! Time step (s)
  REAL(r_std), DIMENSION(kjpindex), INTENT(out) :: arg6 !! Downwelling short wave flux
  !
  !! 0.3 Modified variables
  !
  INTEGER(i_std), INTENT(inout)                 :: arg7 !! Domain size (unitless)
JP: Always use the INTENT and DIMENSION attributes in the declarations. This helps readability and allows the compiler to do consistency checks.
(2) Comment at end of loop: For single loops and nested loops (a loop within a loop) longer than about ten lines, it is helpful to repeat the loop control as a comment next to the END DO statement, as follows:
eta_3_surf = 0.0d0
DO j = 1, nlevels
   DO k = j, 1, -1
      jfactor = jfactor * (1.0d0 - jomega(k))
      ... ten more lines of code ...
   END DO ! k = j, 1, -1
   eta_3_surf = eta_3_surf &
     + (jomega_surf * jomega(j) * jfactor * sbsigma * temp_leaf_pres(j)**4.0d0)
END DO ! j = 1, nlevels
MJM: Lionel has suggested labelling the loops instead. This would also be fine.
JP: Loop labels remind me too much of FORTRAN 4 and GOTO statements :-)
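If "labelling" here means Fortran 90 construct names (an assumption), these are unrelated to the numeric statement labels used with GOTO: a construct name cannot be a GOTO target, the compiler checks that the name on END DO matches the one on DO, and EXIT or CYCLE can refer to it. A minimal sketch with a hypothetical function weighted_product_sum (the inner loop counts down, which needs an explicit stride of -1):

```fortran
MODULE loop_label_demo
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: r_std, weighted_product_sum

  !! Working precision (assumed)
  INTEGER, PARAMETER :: r_std = SELECTED_REAL_KIND(12)

CONTAINS

  !! For each level j, multiplies (1 - omega(k)) for k = j down to 1 and
  !! accumulates omega(j) times that product (hypothetical example)
  FUNCTION weighted_product_sum(nlevels, omega) RESULT(total)
    INTEGER, INTENT(in)                         :: nlevels
    REAL(r_std), DIMENSION(nlevels), INTENT(in) :: omega
    REAL(r_std)                                 :: total
    REAL(r_std) :: factor
    INTEGER     :: j, k

    total = 0.0_r_std
    level_loop: DO j = 1, nlevels
      factor = 1.0_r_std
      ! Counting down requires the explicit stride -1
      product_loop: DO k = j, 1, -1
        factor = factor * (1.0_r_std - omega(k))
      END DO product_loop
      total = total + (omega(j) * factor)
    END DO level_loop
  END FUNCTION weighted_product_sum

END MODULE loop_label_demo
```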
(3) Equations: Use brackets to improve readability (even though multiplication and division are evaluated before addition and subtraction, it is easier for the reader to scan an equation if this is made explicit). Also, if an equation runs over several lines, try to break the expression after a closing bracket or at an addition/subtraction.
e.g. a = (b * i) + (c / n) is easier to read than a = b * i + c / n
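As a hypothetical illustration of rule (3), the function below brackets each term of a net-radiation expression and breaks the long line before a + or - sign. The function, its arguments and the choice of formula are illustrative only, not taken from ORCHIDEE.

```fortran
MODULE equation_demo
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: r_std, net_radiation

  !! Working precision (assumed)
  INTEGER, PARAMETER :: r_std = SELECTED_REAL_KIND(12)
  !! Stefan-Boltzmann constant (W m-2 K-4)
  REAL(r_std), PARAMETER :: sigma = 5.670374419E-8_r_std

CONTAINS

  !! Net radiation at the surface (hypothetical example): each term is
  !! bracketed and the expression is broken before a + or - sign
  FUNCTION net_radiation(albedo, emis, swdown, lwdown, tsurf) RESULT(rnet)
    REAL(r_std), INTENT(in) :: albedo, emis   !! (-)
    REAL(r_std), INTENT(in) :: swdown, lwdown !! (W m-2)
    REAL(r_std), INTENT(in) :: tsurf          !! (K)
    REAL(r_std)             :: rnet           !! (W m-2)

    rnet = ((1.0_r_std - albedo) * swdown) &
         + (emis * lwdown)                 &
         - (emis * sigma * (tsurf**4))
  END FUNCTION net_radiation

END MODULE equation_demo
```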
(4) Line length: Although the maximum line length in Fortran 90 free-form source is 132 characters, keep your code under 80 characters per line. This preserves the formatting for those who work with small terminal windows and when producing a printout.
NOTE: If you are an Emacs user, loading the column-marker.el file will highlight column 80 so you know where to end the line.
(5) Use of spaces: Always indent the code within conditional statements and loops, but don't use tabs, as the formatting will not be preserved across platforms.
NOTE: The Emacs indent function works well for this, since it indents with spaces (even if you use the tab key).
Variable definitions
Choosing where and when to define particular variables; portability between compilers; allocation/de-allocation of arrays etc.
(1) Names of counters: Limit them to four characters. If the variable being looped over begins with "n", replace the "n" by an "i" for the counter name. For example, nvm -> ivm, npts -> ipts.
SL: nelements -> ielem, nparts -> ipart, ncirc -> icirc, nleafage -> ilage (the more logical ileaf is already used)
Some variables have the same meaning but are named differently throughout the code, e.g. kjpindex/npts.
JP: I am not sure that replacing the "n" by an "i" helps readability. I always use the "n" as a shortcut for "number". So if you see nlayer as an index, it means the loop is over the number of layers.
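A short sketch of the counter convention in (1), using hypothetical arrays veget and lai dimensioned (npts, nvm): npts gives the counter ipts and nvm gives ivm. The inner loop runs over the first index, which matches Fortran's column-major array storage.

```fortran
MODULE counter_demo
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: r_std, total_over_pfts

  !! Working precision (assumed)
  INTEGER, PARAMETER :: r_std = SELECTED_REAL_KIND(12)

CONTAINS

  !! Sums veget * lai over all points and vegetation types
  !! (hypothetical example). Counters: npts -> ipts, nvm -> ivm.
  FUNCTION total_over_pfts(npts, nvm, veget, lai) RESULT(total)
    INTEGER, INTENT(in)                          :: npts, nvm
    REAL(r_std), DIMENSION(npts,nvm), INTENT(in) :: veget, lai
    REAL(r_std)                                  :: total
    INTEGER :: ipts, ivm

    total = 0.0_r_std
    DO ivm = 1, nvm
      ! Inner loop over the first (fastest-varying) index
      DO ipts = 1, npts
        total = total + (veget(ipts,ivm) * lai(ipts,ivm))
      END DO
    END DO
  END FUNCTION total_over_pfts

END MODULE counter_demo
```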
Debugging and speed optimisation
Guidelines for making loops more efficient and eliminating dead code
(1) 'bavard' (chatterbox!): an externalised parameter that controls which WRITE statements in the code are executed, for monitoring and debugging. It is proposed that the trunk code use a uniform set of levels, so that the size of the output text files can be matched to the task in hand.
For example:
IF bavard == 0 then there is no output
IF bavard >= 1 then the parameters used are reported
IF bavard >= 2 then entering and leaving subroutines are reported
IF bavard >= 3 then the input parameters of major subroutines are reported
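One possible way to implement these levels, sketched under the assumption that bavard is a plain module variable (in the model it is an externalised parameter read at run time). The module name bavard_demo, the routines is_verbose and talk, and the level constants are hypothetical.

```fortran
MODULE bavard_demo
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: bavard, is_verbose, talk

  !! Verbosity level. In the model it would be read from the run-time
  !! configuration; here it is a plain module variable (assumption).
  INTEGER, SAVE :: bavard = 0

  !! Levels from the list above (hypothetical names)
  INTEGER, PARAMETER, PUBLIC :: bavard_params = 1 !! Parameters used
  INTEGER, PARAMETER, PUBLIC :: bavard_enter  = 2 !! Entering/leaving
  INTEGER, PARAMETER, PUBLIC :: bavard_inputs = 3 !! Inputs of major routines

CONTAINS

  !! True if a message of this level should be written. Messages have
  !! level >= 1, so bavard = 0 means no output at all.
  LOGICAL FUNCTION is_verbose(level)
    INTEGER, INTENT(in) :: level
    is_verbose = (level >= 1) .AND. (bavard >= level)
  END FUNCTION is_verbose

  !! Writes msg to standard output if bavard is high enough
  SUBROUTINE talk(level, msg)
    INTEGER, INTENT(in)          :: level
    CHARACTER(len=*), INTENT(in) :: msg
    IF (is_verbose(level)) THEN
      WRITE(*,'(A)') TRIM(msg)
    ELSE
      ! Verbosity too low: nothing to write
    ENDIF
  END SUBROUTINE talk

END MODULE bavard_demo
```

A routine would then report its entry with, for example, CALL talk(bavard_enter, 'Entering my_routine'), so the decision about what to print lives in one place.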
(2) Don't forget the ELSE: If you are using an IF...ELSEIF...ENDIF construct, always include an ELSE branch at the end to catch any situation not covered by the other cases. Do this even if the ELSE branch does nothing, so that other people know that nothing needs to be done in some cases. Too many bugs arise because an IF branch is not triggered due to something the programmer didn't think of. This is especially problematic when the programmer thinks, "This value will always be in this range, so I don't have to consider other possibilities"... and then one day things change.
IF (...) THEN
   ! do something
   blah
ELSEIF (...) THEN
   ! do something else
   blah blah
ELSE
   ! do something, or not, but at least you should be aware of the possibility
ENDIF
MJM: If nothing is done in the ELSE branch, it could contain a comment instead. As long as it is clear that the programmer has considered the possibility that none of the other branches is triggered, it's fine by me.
JP: Yes, an IF statement should always have an ELSE, even if only for a comment saying we should never get there. Even better is a call to ipslerr to stop the code and say that we should not have got there.
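A minimal sketch of the ELSE rule with a hypothetical function soil_factor (the soil types and factor values are made up). The ELSE branch stops the run instead of silently returning an undefined result. In ORCHIDEE that branch would call ipslerr, as JP suggests; this sketch uses the standard ERROR STOP statement (Fortran 2008) so that it compiles on its own.

```fortran
MODULE else_demo
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: r_std, soil_factor

  !! Working precision (assumed)
  INTEGER, PARAMETER :: r_std = SELECTED_REAL_KIND(12)

CONTAINS

  !! Returns a factor for a soil type index (hypothetical values)
  FUNCTION soil_factor(isoil) RESULT(factor)
    INTEGER, INTENT(in) :: isoil
    REAL(r_std)         :: factor

    IF (isoil == 1) THEN
      factor = 0.5_r_std
    ELSEIF (isoil == 2) THEN
      factor = 1.0_r_std
    ELSE
      ! We should never get here: stop rather than continue with an
      ! undefined result (in ORCHIDEE: a call to ipslerr)
      ERROR STOP 'soil_factor: unexpected soil type'
    ENDIF
  END FUNCTION soil_factor

END MODULE else_demo
```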
(3) Don't stick to one compiler: No compiler will catch all your bugs. Always check with several compilers, with all their error-checking flags enabled. For example, I first compile locally with "gfortran -c -cpp -O0 -pg -g -Wall -ffpe-trap=invalid,zero -fbacktrace -fcheck=all -fbounds-check -pedantic". Then I compile on asterix with "ifort -c -cpp -g -O0 -debug -fpe0 -ftrapuv -traceback". I'm hoping to do it on Curie soon, too, since the NAG compiler, which is good at error checking, is available there. Finally, running the code through valgrind ("valgrind -v --track-origins=yes ../../../bin/orchidee_ol") will report uses of uninitialised memory (apart from some extreme cases) more thoroughly than a compiler. This is very slow, but a useful check. There are some errors in the NetCDF calling routines that I haven't tracked down yet, but all the DOFOCO code has been cleaned.
Code structure
Do we add more folders for parts of the code which do not fit neatly into the existing folders (e.g., effective LAI), or do we put everything in one folder and just impose naming conventions on all file names (e.g., stomate_*f90, sechiba*f90, which is already partly done)?
JP: This is one of the parts where I recall the discussion we had with Marie-Alice to define our starting convention at the end of the 1990s. Module naming convention: short names, and all the routines inside have names of the form modulename_tasks_executed. So in stomate.f90 you can only find subroutines of the form stomate_*.
JP: Modules need to be self-sufficient. They have their own prognostic variables, for which they manage the allocation, restart, ... Prognostic variables are not to be exchanged with other modules (i.e. they are private to the module); otherwise you cannot change one module without affecting the others.
JP: Modules have at least 3 subroutines: 1) module_main: manages the actions to be taken and the calling sequence; 2) module_init: initialises the module (configuration, restart, allocation); 3) module_clear: deallocates the internal memory. Only module_main and module_clear can be public.
JP: The first call to module_main triggers only the module_init phase and performs no computation, as not all the input variables are guaranteed to be valid. Only the second call to module_main starts the calculations.
JP: This structure seems to pose problems for the automatic differentiation of the code, so it needs some modification. Also, we used "USE model" to give infrastructural information to the module (grid, I/O, flags, ...), and this seemed to encourage other programmers to add USE statements for physical information. As a result, information is exchanged between the "physical modules" behind the back of the subroutine calls. This is not healthy at all!
JP: A rigid module structure forces us to think, ahead of development, about the tasks reserved to each module and what information needs to be exchanged with other modules. This might sound like a pain, as it makes "small fixes" difficult, but in my mind it contributes to the clarity and modularity of the code. Planning the model structure before development is a very healthy exercise, and we need to encourage it. We now have the Groupe de Projet to do this.
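The module layout described in JP's comments (module_main, module_init and module_clear, private prognostic variables, and a first call that only initialises) can be sketched for a hypothetical module named demo. The prognostic variable store and the arithmetic are placeholders, and restart handling is reduced to a comment.

```fortran
MODULE demo
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: r_std, demo_main, demo_clear

  !! Working precision (assumed)
  INTEGER, PARAMETER :: r_std = SELECTED_REAL_KIND(12)

  !! Private prognostic variable: only this module allocates it,
  !! updates it and (in a real model) reads/writes it in the restart
  REAL(r_std), ALLOCATABLE, SAVE :: store(:)
  LOGICAL, SAVE :: l_first = .TRUE.

CONTAINS

  !! Manages the calling sequence; the first call only initialises
  SUBROUTINE demo_main(kjpindex, input, output)
    INTEGER, INTENT(in)                           :: kjpindex
    REAL(r_std), DIMENSION(kjpindex), INTENT(in)  :: input
    REAL(r_std), DIMENSION(kjpindex), INTENT(out) :: output

    IF (l_first) THEN
      CALL demo_init(kjpindex)
      l_first = .FALSE.
      ! No computation yet: inputs are not guaranteed to be valid
      output(:) = 0.0_r_std
    ELSE
      store(:)  = store(:) + input(:)
      output(:) = store(:)
    ENDIF
  END SUBROUTINE demo_main

  !! Configuration, restart reading and allocation (private)
  SUBROUTINE demo_init(kjpindex)
    INTEGER, INTENT(in) :: kjpindex
    ALLOCATE(store(kjpindex))
    ! A real module would read the restart file here
    store(:) = 0.0_r_std
  END SUBROUTINE demo_init

  !! Releases the internal memory
  SUBROUTINE demo_clear()
    IF (ALLOCATED(store)) THEN
      DEALLOCATE(store)
    ELSE
      ! Nothing allocated: demo_main was never called
    ENDIF
    l_first = .TRUE.
  END SUBROUTINE demo_clear

END MODULE demo
```

Because store is PRIVATE, another module can only influence it through the arguments of demo_main, which is the property JP asks for.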
Points to Discuss (as recommended by Philippe)
(1) Use of structures to group related variables
(2) Long argument lists, structures, or "shared modules" to exchange variables between routines?
MJM: I like long argument lists or structures, since then it is easier to see what is modified by the routine.
JP: I would be in favour of using structures for the infrastructure variables and thus shortening the argument list. USE statements should be there to provide tools to the module (interpolation routines, I/O routines, ...). No variable should be exchanged through a USE statement.
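A sketch of the separation JP proposes, with hypothetical module names: interp_tools only provides a routine, physics_demo imports it through USE ... ONLY, and every piece of data reaches physics_demo_temp through its argument list rather than through a shared module variable.

```fortran
!! Tool module: provides routines only, no shared model variables
MODULE interp_tools
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: r_std, interp_linear

  !! Working precision (assumed)
  INTEGER, PARAMETER :: r_std = SELECTED_REAL_KIND(12)

CONTAINS

  !! Linear interpolation between (x0, y0) and (x1, y1) at x; x1 /= x0
  PURE FUNCTION interp_linear(x0, y0, x1, y1, x) RESULT(y)
    REAL(r_std), INTENT(in) :: x0, y0, x1, y1, x
    REAL(r_std)             :: y
    y = y0 + ((y1 - y0) * ((x - x0) / (x1 - x0)))
  END FUNCTION interp_linear

END MODULE interp_tools

!! Physics module: USE brings in the tool; all data arrive as arguments
MODULE physics_demo
  USE interp_tools, ONLY : r_std, interp_linear
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: physics_demo_temp

CONTAINS

  !! Interpolates a temperature between two forcing times (hypothetical)
  SUBROUTINE physics_demo_temp(t_prev, t_next, dt_forcing, dt_since, temp)
    REAL(r_std), INTENT(in)  :: t_prev, t_next !! Temperatures (K)
    REAL(r_std), INTENT(in)  :: dt_forcing     !! Forcing interval (s)
    REAL(r_std), INTENT(in)  :: dt_since       !! Time since t_prev (s)
    REAL(r_std), INTENT(out) :: temp           !! Interpolated value (K)

    temp = interp_linear(0.0_r_std, t_prev, dt_forcing, t_next, dt_since)
  END SUBROUTINE physics_demo_temp

END MODULE physics_demo
```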
Signatories
If you think that the policies outlined are good, feel free to leave your initials and the date. This is dangerous because it indicates that you are willing to receive critiquing emails from other devs if some of your code isn't up to the policies above. Not only will you receive them, but you will also refrain from sending snarky emails in response!
MJM, May 02, 2013
FM, May 02, 2013
JR, May 02, 2013