.. _typo:
====
Typo
====
Typos can be hard to find across all source files (code, documentation, tools).
:command:`hunspell` helps to detect those mistakes.
:command:`pylint` can also be used for Python sources.
Both rely on dictionaries that provide word lists for natural languages.
Here is the flow:
.. blockdiag:: typo_blockdiag.dot
Missing words in dictionaries by categories
===========================================
System dictionaries lack some scientific words (e.g. barocline),
reserved words of programming languages, acronyms (e.g. IGCMG) and
code variable names.
These missing words can be listed in dedicated files.
In the directory :file:`docs/manual/for_typo/`, there are
:file:`*.txt` files, each containing the words of one category.
For example, missing scientific words can be added to :file:`jargon.txt`.
There is also a file :file:`type.aff` which will be used by
:command:`hunspell`.
Build one supplemental dictionary
+++++++++++++++++++++++++++++++++
Supplemental dictionaries can be merged into a single one, which is then added
to the list of dictionaries used by :command:`hunspell` for a spelling check.
They must be encoded in UTF-8.

.. code-block:: bash

   # Collect every category word list
   listf=$(find ${PROJECT}/docs/manual/for_typo -name "*.txt")
   nontypo=${PROJECT_LOG}/nontypo
   nontypo_uniq=${PROJECT_LOG}/nontypo_uniq
   rm -f ${nontypo} ${nontypo_uniq}
   # Concatenate all lists into one file
   for onefile in ${listf}
   do
       cat ${onefile} >> ${nontypo}
   done
   # Remove duplicates, then order the words ignoring case
   sort -u ${nontypo} | sort --ignore-case > ${nontypo_uniq}

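
Since the word lists must be encoded in UTF-8, it may help to check them before
merging. The following is only a sketch: the helper name ``is_utf8`` and the
demonstration files are hypothetical, and it assumes that :command:`iconv` is
available and returns a non-zero status on invalid input.

.. code-block:: bash

   # Hypothetical helper: succeeds only if the file converts cleanly from UTF-8
   is_utf8() {
       iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
   }

   # Demonstration on two throwaway files (not project files)
   demo_dir=$(mktemp -d)
   printf 'barocline\nIGCMG\n' > "${demo_dir}/good.txt"
   printf 'caf\351\n' > "${demo_dir}/bad.txt"   # lone Latin-1 byte, invalid UTF-8

   for onefile in "${demo_dir}"/*.txt
   do
       if is_utf8 "${onefile}"
       then
           echo "UTF-8 ok:  ${onefile}"
       else
           echo "not UTF-8: ${onefile}"
       fi
   done

In the real workflow the loop would run over the files found in
:file:`docs/manual/for_typo/` instead of the demonstration directory.
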
The list :file:`${nontypo_uniq}` can also be used to check for typos in
documentation files and source code.
First, convert the word list into a :file:`.dic` file
usable by :command:`hunspell`, i.e. add the number of lines
as the first line:
.. code-block:: bash
nontypo_uniq_dic=${PROJECT_LOG}/nontypo_uniq.dic
linecount=$(wc -l < ${nontypo_uniq})
sed "1i ${linecount}" ${nontypo_uniq} > ${nontypo_uniq_dic}
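
The first line of a :file:`.dic` file is expected to hold the number of entries
that follow. As a sanity check, the header can be compared with the actual
number of words. This is a minimal sketch on a throwaway word list; the file
names are hypothetical, and the header is prepended with ``echo`` and ``cat``
as a portable stand-in for the ``sed`` command above.

.. code-block:: bash

   demo_dir=$(mktemp -d)
   words=${demo_dir}/words
   dic=${demo_dir}/words.dic
   printf 'IGCMG\nbarocline\n' > "${words}"

   # Prepend the entry count ($(( )) strips the padding some wc versions add)
   linecount=$(( $(wc -l < "${words}") ))
   { echo "${linecount}"; cat "${words}"; } > "${dic}"

   # Compare the header with the number of remaining lines
   header=$(head -n 1 "${dic}")
   entries=$(( $(tail -n +2 "${dic}" | wc -l) ))
   if [ "${header}" -eq "${entries}" ]
   then
       echo "header matches: ${header} entries"
   else
       echo "header ${header} does not match ${entries} entries" >&2
   fi
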
The associated :file:`nontypo_uniq.aff` file already exists in
:file:`${PROJECT}/docs/manual/for_typo`:
.. code-block:: bash
ln -s ${PROJECT}/docs/manual/for_typo/nontypo_uniq.aff ${PROJECT_LOG}/nontypo_uniq.aff
Now we have :file:`${PROJECT_LOG}/nontypo_uniq.dic` and
:file:`${PROJECT_LOG}/nontypo_uniq.aff`, both usable by :command:`hunspell`.
Check typo in files
===================
.. todo::
   Find out why the ``-p`` option of :command:`hunspell` does not work.
   For now, the command has to be executed from :file:`${PROJECT_LOG}`,
   where the :file:`.dic` and :file:`.aff` files are located.
Check typo in wiki pages
++++++++++++++++++++++++
Until :ref:`tracwiki_migration` is completed, we have to check for typos in the
HTML files produced by Trac on the |igcmg_doc| wiki.
By convention, all |igcmg_doc| page names start with ``Doc``.
To get the URIs of all ``Doc*`` wiki pages [#tracoops]_:

.. code-block:: bash

   excluded_uri=DocYgraphvizLibigcmprod
   # Extract the page links from the title index, keep the Doc* pages,
   # drop the excluded page and turn the paths into full URIs
   list_uri=$(xsltproc --novalid \
       ${PROJECT}/docs/manual/for_tracwiki/titleindex.xsl http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/TitleIndex | \
       grep "/igcmg_doc/wiki/Doc" | grep -v ${excluded_uri} | sort -u | \
       sed -e "s@^@http://forge.ipsl.jussieu.fr/@")

.. [#tracoops] DocYgraphvizLibigcmprod is excluded because Trac reports an internal error on that page (some issue with the Graphviz Trac plugins).

To download those URIs locally:
.. code-block:: bash
dirhtml=${PROJECT_LOG}
rm -f ${dirhtml}/Doc*
for uri in ${list_uri}
do
wget -P ${dirhtml} ${uri}
done
We can now check for typos in the HTML files:

.. code-block:: bash

   # Run from the directory holding nontypo_uniq.dic and nontypo_uniq.aff
   cd ${PROJECT_LOG}
   listf=$(find ${dirhtml} -name "Doc*")
   hunspell_out=${PROJECT_LOG}/hunspell_out
   hunspell_out_uniq=${PROJECT_LOG}/hunspell_out_uniq
   rm -f ${hunspell_out} ${hunspell_out_uniq}
   for onefile in ${listf}
   do
       # LC_ALL=C as a command prefix applies to this hunspell call only
       LC_ALL=C hunspell -d en_US,nontypo_uniq --check-url -i utf-8 -l < ${onefile} >> ${hunspell_out}
   done
   sort -u ${hunspell_out} | sort --ignore-case > ${hunspell_out_uniq}

.. note::

   ``LC_ALL=C`` is written as a prefix of the :command:`hunspell` command
   (no semicolon), so that it applies to that invocation only and does not
   change the locale of the calling shell.

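
The difference between ``LC_ALL=C;`` followed by a command and ``LC_ALL=C``
used as a command prefix can be illustrated with a small experiment. In this
sketch, ``sh -c`` is only a stand-in for :command:`hunspell`, and the variable
names ``before``, ``inside`` and ``after`` are hypothetical.

.. code-block:: bash

   # Remember the current setting ("unset" if LC_ALL is not defined)
   before=${LC_ALL-unset}

   # Prefix form: LC_ALL=C is placed in the environment of this one command
   inside=$(LC_ALL=C sh -c 'printf "%s" "${LC_ALL}"')

   # The calling shell still has its original setting
   after=${LC_ALL-unset}

   echo "inside the command: ${inside}"
   echo "before: ${before}, after: ${after}"

With the prefix form, the values of ``before`` and ``after`` are identical,
so no locale change leaks into the rest of the session.
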
:file:`${hunspell_out_uniq}` contains:

- typos to be fixed in the wiki pages
- false positives to be added to a file in :file:`docs/manual/for_typo/`
- false positives to be ignored because they are too hard to add (encoding
  issues for Greek words, etc.)

.. warning::

   The following command prints only the lines present in both
   :file:`${hunspell_out_uniq}` and :file:`${nontypo_uniq}`.

   .. code-block:: bash

      comm --nocheck-order -12 ${hunspell_out_uniq} ${nontypo_uniq}

   If the output is not empty, the supplemental dictionary has not been used
   by :command:`hunspell`.

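
The behaviour of ``comm -12`` can be tried on two small lists. This is a sketch
with made-up file names and contents; both lists are sorted with ``LC_ALL=C``
so that :command:`comm` sees them in the order it expects.

.. code-block:: bash

   demo_dir=$(mktemp -d)
   # Words reported by the spelling check (hypothetical sample)
   printf 'amonch\nbarocline\n' | LC_ALL=C sort > "${demo_dir}/flagged"
   # Words of the supplemental dictionary (hypothetical sample)
   printf 'IGCMG\nbarocline\n' | LC_ALL=C sort > "${demo_dir}/known"

   # -12 suppresses the lines unique to either file, leaving the common ones
   common=$(LC_ALL=C comm -12 "${demo_dir}/flagged" "${demo_dir}/known")

   if [ -n "${common}" ]
   then
       echo "known words still reported:"
       echo "${common}"
   fi

Here ``barocline`` appears in both lists, which is the situation that indicates
the supplemental dictionary was not taken into account.
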
.. todo::
give some ideas (LC, ?)
To locate one of the misspellings in the downloaded HTML pages:
.. code-block:: bash
w=amonch # take a real one from ${hunspell_out_uniq}
find ${dirhtml} -name "Doc*" -exec grep -Hi ${w} {} \;
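
To handle several misspellings at once, the search can be looped over a word
list. The sketch below runs on throwaway files with made-up contents; it
assumes a :command:`grep` that supports ``-w`` (whole-word match), which avoids
hits inside longer words.

.. code-block:: bash

   demo_dir=$(mktemp -d)
   printf 'the amonch mean\n' > "${demo_dir}/DocA"
   printf 'the monthly mean\n' > "${demo_dir}/DocB"
   printf 'amonch\n' > "${demo_dir}/misspelled"

   # For each word, list the files containing it as a whole word, ignoring case
   while IFS= read -r w
   do
       echo "== ${w}"
       grep -liw -- "${w}" "${demo_dir}"/Doc* || true
   done < "${demo_dir}/misspelled" > "${demo_dir}/report"

   cat "${demo_dir}/report"

In the real workflow, the word list would be :file:`${hunspell_out_uniq}` and
the files would be the ``Doc*`` pages in :file:`${dirhtml}`.
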
.. note::
   It is also possible to find misspellings via the search facility
   of the Trac interface, but results may differ (case sensitivity,
   Trac plugins).
.. warning::
   Corrections have to be made via the wiki interface of the forge.