source: trunk/docs/manual/source/developers/guides/typo.rst @ 9

Last change on this file since 9 was 9, checked in by pinsard, 9 years ago

add some supplemental dictionaries; more human readable result

Typo

Typos can be hard to find across all source files (code, documentation, tools).

:command:`hunspell` helps to detect those mistakes.

:command:`pylint` can also be used for Python source.

They rely on dictionaries that provide word lists for natural languages.

Here is the workflow:

.. blockdiag:: typo_blockdiag.dot

Missing words in dictionaries by categories

System dictionaries lack some scientific words (e.g. barocline), reserved words of computer languages, acronyms (e.g. IGCMG) and code variable names.

Those missing words can be listed in specific files.

In the directory :file:`docs/manual/for_typo/`, there are :file:`*.txt` files containing words for a specific category.

For example, missing scientific words can be added to :file:`jargon.txt`.

There is also a file :file:`type.aff` which will be used by :command:`hunspell`.

Build one supplemental dictionary

Supplemental dictionaries can be merged into one and added to the list of dictionaries used in a spell check via :command:`hunspell`.

They must be encoded in UTF-8.

listf=$(find ${PROJECT}/docs/manual/for_typo -name "*.txt")
nontypo=${PROJECT_LOG}/nontypo
nontypo_uniq=${PROJECT_LOG}/nontypo_uniq
rm -f ${nontypo}  ${nontypo_uniq}
for onefile in ${listf}
do
   cat ${onefile} >> ${nontypo}
done
sort -u ${nontypo} | sort --ignore-case > ${nontypo_uniq}
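
As a minimal sketch of what this merge does, the following runs the same concatenate, de-duplicate and sort steps on two small sample files in a temporary directory. The file name :file:`acronyms.txt`, the word Arakawa and the one-word-per-line layout are assumptions made for illustration; LC_ALL=C is only set here to make the ordering reproducible.

```shell
#!/bin/sh
# Sandbox standing in for ${PROJECT}/docs/manual/for_typo (hypothetical content).
workdir=$(mktemp -d)
printf 'barocline\nArakawa\n' > "${workdir}/jargon.txt"
printf 'IGCMG\nbarocline\n'   > "${workdir}/acronyms.txt"

# Concatenate every category file, drop exact duplicates,
# then order the result without regard to case.
cat "${workdir}"/*.txt | LC_ALL=C sort -u | LC_ALL=C sort --ignore-case \
    > "${workdir}/nontypo_uniq"

cat "${workdir}/nontypo_uniq"
```

The word barocline appears in both sample files but only once in the merged list.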

The list ${nontypo_uniq} can also be used to check for typos in documentation files and source code.

First, turn the word list into a :file:`.dic` file usable by :command:`hunspell`, i.e. add the number of lines at the top:

nontypo_uniq_dic=${PROJECT_LOG}/nontypo_uniq.dic
linecount=$(wc -l < ${nontypo_uniq})
sed "1i ${linecount}" ${nontypo_uniq} > ${nontypo_uniq_dic}
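
The one-line form of the sed ``i`` command used here is a GNU sed extension. Where GNU sed is not available, the same file can be produced with a command group, as in the sketch below. The sample words and the temporary directory are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical word list standing in for ${PROJECT_LOG}/nontypo_uniq.
workdir=$(mktemp -d)
nontypo_uniq="${workdir}/nontypo_uniq"
nontypo_uniq_dic="${workdir}/nontypo_uniq.dic"
printf 'Arakawa\nbarocline\nIGCMG\n' > "${nontypo_uniq}"

# Count the entries; tr strips the padding some wc implementations add.
linecount=$(wc -l < "${nontypo_uniq}" | tr -d ' ')

# Write the count first, then the words, so the count sits at the top.
{
    echo "${linecount}"
    cat "${nontypo_uniq}"
} > "${nontypo_uniq_dic}"

head -n 2 "${nontypo_uniq_dic}"
```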

The associated :file:`nontypo_uniq.aff` file already exists in :file:`${PROJECT}/docs/manual/for_typo`:

ln -s ${PROJECT}/docs/manual/for_typo/nontypo_uniq.aff ${PROJECT_LOG}/nontypo_uniq.aff

Now we have :file:`${PROJECT_LOG}/nontypo_uniq.dic` and :file:`${PROJECT_LOG}/nontypo_uniq.aff` usable for :command:`hunspell`.

Check typo in files

.. todo::
   find out why the -p option of hunspell does not work. For now, the command has to be
   executed from ${PROJECT_LOG}, where the .dic and .aff files are located.

Check typo in wiki pages

Until :ref:`tracwiki_migration` is completed, we have to check typos in the HTML files produced by Trac at http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/

By convention, all |igcmg_doc| page names start with Doc.

To get the URIs of all wiki Doc* pages of http://forge.ipsl.jussieu.fr/igcmg_doc/ [1]:

excluded_uri=DocYgraphvizLibigcmprod
list_uri=$(xsltproc --novalid \
${PROJECT}/docs/manual/for_tracwiki/titleindex.xsl http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/TitleIndex | \
grep "/igcmg_doc/wiki/Doc" | grep -v ${excluded_uri} | sort -u | \
sed -e "s@^@http://forge.ipsl.jussieu.fr/@")
[1] We exclude DocYgraphvizLibigcmprod because Trac reports an internal error on that page (some graphviz Trac plugin issue).
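
To see what the filtering stages of this pipeline do without contacting the forge, the sketch below feeds them a few hand-written lines. It assumes that the stylesheet prints one page path per line; the page names DocExampleOne and DocExampleTwo are invented for the example.

```shell
#!/bin/sh
excluded_uri=DocYgraphvizLibigcmprod
# Hypothetical stand-in for the output of xsltproc on TitleIndex.
sample='/igcmg_doc/wiki/DocExampleTwo
/igcmg_doc/wiki/WikiStart
/igcmg_doc/wiki/DocYgraphvizLibigcmprod
/igcmg_doc/wiki/DocExampleOne
/igcmg_doc/wiki/DocExampleTwo'

# Keep Doc* pages, drop the excluded one, remove duplicates, then prefix
# each path with the forge host name. The sample paths start with "/",
# so the prefix used in this sketch has no trailing slash.
list_uri=$(printf '%s\n' "${sample}" | \
    grep "/igcmg_doc/wiki/Doc" | grep -v "${excluded_uri}" | sort -u | \
    sed -e "s@^@http://forge.ipsl.jussieu.fr@")

printf '%s\n' "${list_uri}"
```

Only the two invented Doc* pages survive: WikiStart does not match the Doc prefix, the excluded page is dropped, and the duplicate is collapsed by sort -u.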

To download those URIs locally:

dirhtml=${PROJECT_LOG}
rm -f ${dirhtml}/Doc*
for uri in ${list_uri}
do
    wget -P ${dirhtml} ${uri}
done

We can now check typos in the HTML files:

cd ${PROJECT_LOG}
listf=$(find ${dirhtml} -name "Doc*")
hunspell_out=${PROJECT_LOG}/hunspell_out
hunspell_out_uniq=${PROJECT_LOG}/hunspell_out_uniq
rm -f ${hunspell_out}  ${hunspell_out_uniq}
for onefile in ${listf}
do
   LC_ALL=C hunspell -d en_US,nontypo_uniq --check-url -i utf-8 -l < ${onefile} >> ${hunspell_out}
done
sort -u ${hunspell_out} | sort --ignore-case > ${hunspell_out_uniq}

Warning

LC_ALL=C is written as a prefix of the :command:`hunspell` command, without a semicolon, so that the locale change applies to that command only and does not persist in the shell after execution.
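
The difference between setting LC_ALL as a command prefix and setting it as a separate statement can be checked with a minimal sketch like the following. The variable names inside, after_prefix and after_semicolon are illustrative only.

```shell
#!/bin/sh
unset LC_ALL

# Prefix form: the assignment is passed to that one command only.
inside=$(LC_ALL=C sh -c 'printf "%s" "${LC_ALL}"')
after_prefix=${LC_ALL:-unset}

# Semicolon form: the assignment is a separate statement and stays set
# in the current shell for everything that follows.
LC_ALL=C; true
after_semicolon=${LC_ALL:-unset}

echo "inside=${inside} after_prefix=${after_prefix} after_semicolon=${after_semicolon}"
```

With the prefix form the calling shell is left untouched, which is the behaviour wanted when only the spell check should run under the C locale.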

:file:`${hunspell_out_uniq}` contains:

  • typos to be fixed in wiki pages

  • false positives to be added to a file in :file:`docs/manual/for_typo/`

  • false positives to be ignored because they are too hard to add (encoding issue for Greek words, etc.)

Warning

The following command prints only the lines present in both ${hunspell_out_uniq} and ${nontypo_uniq}.

comm --nocheck-order -12 ${hunspell_out_uniq} ${nontypo_uniq}

If the output is not empty, the supplemental dictionary has not been used by :command:`hunspell`.
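
comm expects both inputs to be sorted in the same collation order, whereas the two lists were ordered with sort --ignore-case. A possibly more robust variant, sketched below, re-sorts both lists under a single locale before comparing them. The sample contents and the names of the :file:`*.sorted` files are assumptions made for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Hypothetical contents of ${hunspell_out_uniq} and ${nontypo_uniq}.
printf 'amonch\nbarocline\n'  > "${workdir}/hunspell_out_uniq"
printf 'Arakawa\nbarocline\n' > "${workdir}/nontypo_uniq"

# Re-sort both lists with the same collation so that comm can be trusted.
LC_ALL=C sort "${workdir}/hunspell_out_uniq" > "${workdir}/reported.sorted"
LC_ALL=C sort "${workdir}/nontypo_uniq"      > "${workdir}/known.sorted"

# -12 suppresses the lines unique to either file, leaving the common ones.
common=$(LC_ALL=C comm -12 "${workdir}/reported.sorted" "${workdir}/known.sorted")
echo "${common}"
```

In this sample the common word is barocline: it is listed in the supplemental dictionary but was still reported, which is the situation to investigate.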

.. todo::
   give some ideas (LC, ?)

To locate one of the misspellings in the downloaded HTML pages:

w=amonch # take a real one from ${hunspell_out_uniq}
find ${dirhtml} -name "Doc*" -exec grep -Hi ${w} {} \;

Note

It is also possible to find misspellings via the search facility of the Trac interface, but results may differ (case sensitivity, Trac plugins).

Warning

Corrections have to be made via the wiki interface of the forge.
