.. _typo:

====
Typo
====

Typos may be hard to find across all source files (code, documentation, tools).

:command:`hunspell` helps to detect those mistakes.
:command:`pylint` can also be used for Python sources.

Both rely on dictionaries providing lists of words for natural languages.

Here is the flow:

.. blockdiag:: typo_blockdiag.dot

Missing words in dictionaries by categories
===========================================

System dictionaries lack some scientific words (e.g. barocline),
reserved words of computer languages, acronyms (e.g. IGCMG) and names of code variables.

Those missing words can be listed in specific files.

In the directory :file:`docs/manual/for_typo/`, there are :file:`*.txt` files, each containing the words of one specific category.

For example, missing scientific words can be added to :file:`jargon.txt`.

There is also a file :file:`type.aff` which will be used by :command:`hunspell`.

Build one supplemental dictionary
+++++++++++++++++++++++++++++++++

Supplemental dictionaries can be merged into a single one, which is then added to the list of dictionaries used for a spelling check via :command:`hunspell`.

They must be encoded in UTF-8.

.. code-block:: bash

   # Collect all category word lists
   listf=$(find ${PROJECT}/docs/manual/for_typo -name "*.txt")
   nontypo=${PROJECT_LOG}/nontypo
   nontypo_uniq=${PROJECT_LOG}/nontypo_uniq
   rm -f ${nontypo} ${nontypo_uniq}
   # Concatenate them into one file
   for onefile in ${listf}
   do
      cat ${onefile} >> ${nontypo}
   done
   # Remove duplicates, then sort without regard to case
   sort -u ${nontypo} | sort --ignore-case > ${nontypo_uniq}

The list ``${nontypo_uniq}`` can also be used to check for typos in documentation files and source code.

First turn the list into a :file:`.dic` file which can be used by :command:`hunspell`, i.e. add the number of lines at the top:

.. code-block:: bash

   nontypo_uniq_dic=${PROJECT_LOG}/nontypo_uniq.dic
   # A .dic file starts with the number of entries
   linecount=$(wc -l < ${nontypo_uniq})
   sed "1i ${linecount}" ${nontypo_uniq} > ${nontypo_uniq_dic}

The associated :file:`nontypo_uniq.aff` file already exists in :file:`${PROJECT}/docs/manual/for_typo`:

.. code-block:: bash

   ln -s ${PROJECT}/docs/manual/for_typo/nontypo_uniq.aff ${PROJECT_LOG}/nontypo_uniq.aff

Now we have :file:`${PROJECT_LOG}/nontypo_uniq.dic` and :file:`${PROJECT_LOG}/nontypo_uniq.aff`, both usable by :command:`hunspell`.

Check typo in files
===================

.. todo::

   Find out why the ``-p`` option of hunspell does not work.
   For now, the command has to be executed from ``${PROJECT_LOG}``, where the
   :file:`.dic` and :file:`.aff` files are located.

Check typo in wiki pages
++++++++++++++++++++++++

Until :ref:`tracwiki_migration` is completed, we have to check for typos in the HTML files produced by Trac on the forge.

By convention, it has been decided that the names of all |igcmg_doc| pages start with ``Doc``.

To get the URIs of all wiki ``Doc*`` pages listed on ``http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/TitleIndex`` [#tracoops]_:

.. code-block:: bash

   excluded_uri=DocYgraphvizLibigcmprod
   list_uri=$(xsltproc --novalid \
      ${PROJECT}/docs/manual/for_tracwiki/titleindex.xsl http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/TitleIndex | \
      grep "/igcmg_doc/wiki/Doc" | grep -v ${excluded_uri} | sort -u | \
      sed -e "s@^@http://forge.ipsl.jussieu.fr/@")

.. [#tracoops] ``DocYgraphvizLibigcmprod`` is excluded because Trac detected an internal error ... some graphviz trac plugins issue

To download those URIs locally:

.. code-block:: bash

   dirhtml=${PROJECT_LOG}
   rm -f ${dirhtml}/Doc*
   for uri in ${list_uri}
   do
      wget -P ${dirhtml} ${uri}
   done

We can now check for typos in the HTML files:

.. code-block:: bash

   # hunspell is run from the directory holding the .dic and .aff files
   cd ${PROJECT_LOG}
   listf=$(find ${dirhtml} -name "Doc*")
   hunspell_out=${PROJECT_LOG}/hunspell_out
   hunspell_out_uniq=${PROJECT_LOG}/hunspell_out_uniq
   rm -f ${hunspell_out} ${hunspell_out_uniq}
   for onefile in ${listf}
   do
      # -l lists only the misspelled words
      LC_ALL=C;hunspell -d en_US,nontypo_uniq --check-url -i utf-8 -l < ${onefile} >> ${hunspell_out}
   done
   sort -u ${hunspell_out} | sort --ignore-case > ${hunspell_out_uniq}

.. warning::

   Side effect of ``LC_ALL=C``: how can the change of ``LC_ALL`` after execution be avoided?

:file:`${hunspell_out_uniq}` contains:

- typos to be fixed in wiki pages
- false positives to be added to one of the :file:`*.txt` files in :file:`docs/manual/for_typo/`
- false positives to be ignored because they are too hard to add (encoding issue for Greek words, etc.)

.. warning::

   The following command prints only the lines present in both
   ``${hunspell_out_uniq}`` and ``${nontypo_uniq}``.

.. code-block:: bash

   comm --nocheck-order -12 ${hunspell_out_uniq} ${nontypo_uniq}

If the output is not empty, the supplemental dictionary has not been used by :command:`hunspell`.

.. todo:: give some ideas (LC, ?)

To find one of the misspelled words in the downloaded HTML pages:

.. code-block:: bash

   w=amonch # take a real one from ${hunspell_out_uniq}
   find ${dirhtml} -name "Doc*" -exec grep -Hi ${w} {} \;

.. note::

   It is also possible to find misspelled words via the search facility of the
   Trac interface, but results may differ (case sensitivity, Trac plugins).

.. warning::

   Corrections have to be made via the wiki interface of the forge.
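
Regarding the ``LC_ALL`` side effect raised in the warning above, one possible approach is to prefix the command with the assignment instead of separating the two with ``;``. This is given as a minimal sketch and has not been tested with :command:`hunspell` itself. In a POSIX shell, such a prefix applies only to the environment of that single command. In the sketch below, ``sh -c`` is a stand-in for the :command:`hunspell` call and the ``lc_all_*`` variable names are hypothetical.

.. code-block:: bash

   # Remember the caller's setting to show that it is left untouched
   lc_all_before=${LC_ALL-unset}

   # The assignment prefix applies to this single command only;
   # sh -c is a stand-in for hunspell (assumption, for illustration)
   lc_all_inside=$(LC_ALL=C sh -c 'printf "%s" "${LC_ALL}"')

   lc_all_after=${LC_ALL-unset}

   echo "inside the command: ${lc_all_inside}"
   echo "before: ${lc_all_before} after: ${lc_all_after}"

Under this assumption, the call in the loop would be written ``LC_ALL=C hunspell ...`` (without ``;``), and ``LC_ALL`` would keep its previous value once the loop is finished.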