Proofing Tool GUI

V3.0 — ??.???.2017
© 2013-2017 Marco A.G.Pinto and Community Contributors.
Freely distributable and modifiable under the
Apache License v2.0.
SEMI-FINISHED MANUAL — REQUIRES A FULL REVISION WHEN I HAVE THE TIME!
LAST UPDATE: 2017-06-06


Index
1 — Introduction
2 — Copyright & DISCLAIMER
3 — Contacts
4 — Thanks
5 — How it works
  5.1a — Using UTF-8
  5.1b — EOL Windows VS Linux
  5.1c — Packing the files into Extensions
  5.1d — Shortcut keys
  5.2 — Dictionary
      5.2.1 — Creating a Dictionary
      5.2.2 — Editing a Dictionary
      5.2.3 — How Suffixes/Prefixes work
      5.2.4 — What is position and rule
      5.2.5 — Menus
  5.3 — Thesaurus
      5.3.1 — Creating a Thesaurus
      5.3.2 — Editing a Thesaurus
      5.3.3 — Menus
  5.4 — Hyphenation
      5.4.1 — Creating a Hyphenation
      5.4.2 — Editing a Hyphenation
      5.4.3 — Menus
  5.5 — Autocorrect
      5.5.1 — Creating an Autocorrect
      5.5.2 — Editing an Autocorrect
      5.5.3 — Menus
6 — History



1 — Introduction
An open-source tool coded in PureBasic for editing the Dictionary/Thesaurus/Hyphenation/Autocorrect files of OpenOffice/LibreOffice, Firefox, Thunderbird and SeaMonkey, provided they are in UTF-8 format.

This program was originally developed to easily edit the synonyms of OpenOffice and LibreOffice.

I had this idea after asking the people in charge of the pt_PT project, at Minho University in Portugal, how I could suggest synonyms, since only suggested words for the Portuguese speller were being added.

I was told that they didn't know how to add synonyms, since the person in charge of that part had left the project long ago (2006).

Later, I wanted to make it compatible with Firefox and Thunderbird, after it became possible to edit dictionaries. I hoped that someone would use it with Thunderbird to fix the en_GB speller, which was full of typos and missing words. Since no one volunteered, I took on this task myself.

This is where my idea came from: to develop something easy to use, since I had tried some official tools for these tasks and could not understand them, not even how to use them.

My tool is so intuitive that even a child can use it.

On 25.Aug.2013 I released a "forked" en_GB speller V2.00. The speller has been made available to OpenOffice, LibreOffice, Firefox, Thunderbird and SeaMonkey. So far, I have added 29'109 words (as of V2.49).


2 — Copyright & DISCLAIMER
This program is copyrighted to Marco A.G.Pinto and Community Contributors.

It is freely distributable and modifiable under the Apache License v2.0.


3 — Contacts
(coder)

S.Mail:
Marco A.G.Pinto
Apartado 3083
2746-501 Queluz
(Portugal)

E.Mail:
marcoagpinto@mail.telepac.pt


4 — Thanks
Some special thanks go to:
Groups/Organisations:
 — Apache Community;
 — LanguageTool Community;
 — LibreOffice Community;
 — Mozilla Community;
 — PureBasic Community.

Persons:
 — Alberto Simões (Minho University);
 — Alexandro Colorado (Apache OpenOffice);
 — Andrea Pescetti (Apache OpenOffice);
 — Andreas Mantke (LibreOffice);
 — Andrew Ferguson (PureBasic);
 — António Manuel Dias (former pt_PT maintainer);
 — Áron Budea (LibreOffice);
 — Ashley Scott (PureBasic);
 — Bernd Krüger-Knauber (PureBasic);
 — Chris Saxon (PureBasic);
 — Daniel Naber (LanguageTool);
 — Dennis Roczek (LibreOffice);
 — Filiep Spyckerelle (European Parliament);
 — Frédéric Laboureur (PureBasic);
 — Gervase Markham (Mozilla);
 — Guy Waterval (Apache OpenOffice);
 — Heinz Urban (PureBasic);
 — Ian Neal (Mozilla);
 — Jonathan Kew (Mozilla);
 — José Almeida (Minho University);
 — Kevin Scannell (Mozilla);
 — Martin Srebotnjak (LanguageTool);
 — Matthias Mailänder (LanguageTool);
 — Mauro Trevisan (LibreOffice);
 — Pedro Marques (IADE — Creative University);
 — Peter Chamberlin (Mozilla);
 — Ricardo Palomares Martínez (Apache OpenOffice);
 — Shantanu Oak (LibreOffice);
 — srod (PureBasic);
 — Stuart Swales (Apache OpenOffice);
 — Thomas Schulz (PureBasic);
 — Tiago Santos (LibreOffice)(LanguageTool).


5 — How it works
5.1a — Using UTF-8
This tool was made to work with UTF-8 encoding.

A good trick to convert the old encoding formats to UTF-8 is to use, for example, the Notepad++ editor for Windows.

Simply open the files with it and change the encoding to UTF-8 using the menu: Encoding → Convert to UTF-8 without BOM, so that accented characters display correctly.

Then, use the Save As option, select "Normal text file (*.txt)" and it is done.

Please don't forget to replace by hand, in the header of the files, the keyword naming the old encoding with the new one.

The headers with the character encoding are inside the files. See for example Version 2.4 (01/09/2007) of the Italian files:
— The Dictionary (.DIC + .AFF):
The .DIC has no keyword.

The .AFF has the following keyword:
SET ISO8859-15 → Replace with SET UTF-8

— The Thesaurus (.DAT):
It has in the first line:
ISO8859-15 → Replace with UTF-8
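
The same conversion can also be scripted. The following is a minimal Python sketch and not part of the tool: the function name convert_to_utf8 and its parameters are made up for illustration, and it assumes the old encoding is ISO8859-15 as in the Italian example above.

```python
from pathlib import Path

def convert_to_utf8(path, old_codec="iso-8859-15", old_keyword="ISO8859-15"):
    """Re-encode one file as UTF-8 without BOM and update its encoding keyword."""
    path = Path(path)
    # Decode the raw bytes with the old encoding; line endings are left untouched.
    text = path.read_bytes().decode(old_codec)
    lines = text.split("\n")
    for i, line in enumerate(lines):
        # Header is "SET ISO8859-15" in an .AFF or a bare "ISO8859-15" in a .DAT.
        if line.strip() in (old_keyword, "SET " + old_keyword):
            lines[i] = line.replace(old_keyword, "UTF-8")
            break  # only the first matching header line is changed
    path.write_bytes("\n".join(lines).encode("utf-8"))
```

A .DIC file has no keyword, so for it the function only re-encodes the contents. Work on a copy of the files, since the sketch overwrites the file it is given.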


5.1b — EOL Windows VS Linux
I have run some tests saving in Windows and in Linux, and the files saved in Windows are bigger than those saved in Linux.

I believe this happens because the End of Line characters differ between Windows and Linux.

I have opened both files to compare them, and both have the same number of lines with the same words.

I believe this means that both work, unless someone finds otherwise.
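
The size difference can be checked with a few lines of Python. This is only an illustrative sketch (the function names are invented here): Windows ends each line with two bytes, CR+LF, while Linux uses a single LF, so the Windows copy grows by one byte per line.

```python
def count_line_endings(data: bytes):
    """Return (windows_endings, linux_endings) found in the raw file contents."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf
    return crlf, lf_only

def to_linux_endings(data: bytes) -> bytes:
    """Turn every CR+LF pair into a single LF."""
    return data.replace(b"\r\n", b"\n")

# Two made-up two-line files with the same words.
windows_copy = b"party/S\r\nboy/S\r\n"
linux_copy = b"party/S\nboy/S\n"

# The Windows copy is larger by exactly one byte per line.
print(len(windows_copy) - len(linux_copy))
# After normalising the line endings both copies are identical.
print(to_linux_endings(windows_copy) == linux_copy)
```

With the two sample byte strings this prints 2 and True, which agrees with the observation that both files hold the same lines and words.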



5.1c — Packing the files into Extensions
To create extensions you will have to use another tool, which I am not yet familiar with.

You should use the SORT button before you consider your Dictionary/Thesaurus/Hyphenation ready to be packed into an extension.

Making extensions for Mozilla seems easier than for OpenOffice/LibreOffice, whose extensions are more complex because they can hold multiple languages in one archive.


5.1d — Shortcut keys
TAB SWITCH RIGHT — CTRL+TAB
TAB SWITCH LEFT — SHIFT+CTRL+TAB
OPEN — CTRL+O
SAVE — CTRL+S
SAVE AS — SHIFT+CTRL+S
FIND — CTRL+F
ADD — CTRL+A
GOTO — CTRL+G
DELETE — DEL
EXIT A WINDOW & ABORT OPEN/SAVE/SAVE AS — <ESC>
EXIT — CTRL+Q



5.2 — Dictionary
5.2.1 — Creating a Dictionary
If you have a Dictionary in memory, use PURGE to delete all entries.

To create a Dictionary from scratch, just press the ADD button to add words.

Use EDIT or double-click to change information regarding the words.

Use DELETE or <DEL> to remove entries.

The Dictionary consists of two UTF-8 files with the extensions .DIC and .AFF.

Although the tool reads the .AFF file, I have not yet read the documentation on how that file works. This means that creating a Dictionary from scratch requires some prior knowledge.

Remember to SAVE/SAVE AS now and then to be safe.


5.2.2 — Editing a Dictionary
First, download the extension for the language you intend to use from the official pages.

You should have an .OXT or .XPI file, which you rename to .ZIP in order to extract its contents to your hard disk.

Press OPEN and select the .DIC file of the Dictionary; the tool will also open the associated .AFF file.

Now just ADD/EDIT/DELETE the current entries.

Remember to SAVE/SAVE AS now and then to be safe.
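
The extraction step can also be done without renaming anything, since the rename-to-.ZIP trick works precisely because .OXT and .XPI files are ZIP archives. A small Python sketch, where extract_extension and the folder names are hypothetical:

```python
import zipfile
from pathlib import Path

def extract_extension(archive_path, target_dir):
    """Unpack an .OXT/.XPI archive and list the proofing files found inside."""
    target = Path(target_dir)
    # zipfile looks at the file contents, not at the extension.
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    wanted = {".dic", ".aff", ".dat"}
    # Search subfolders too, because archives may keep the files in a subfolder.
    return sorted(p for p in target.rglob("*") if p.suffix.lower() in wanted)
```

For example, extract_extension("dict-en.oxt", "unpacked") would return the paths of the .DIC, .AFF and .DAT files it unpacked, ready to be selected with OPEN.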


5.2.3 — How Suffixes/Prefixes work
A short explanation of how suffixes/prefixes work, based on an e-mail written by Ricardo Palomares Martínez:

While editing dictionaries, you can add one or more identifiers after a word, separated from it by a "/". For example, the en_GB .AFF uses the identifier "S" to create the plural:
party/S

The identifier is then looked up in the .AFF file, which contains:
SFX S Y 9
SFX S y ies [^aeiou]y
SFX S 0 s [aeiou]y
SFX S 0 es [sxz]
SFX S 0 es [cs]h
SFX S 0 s [^cs]h
SFX S 0 s [ae]u
SFX S 0 x [ae]u
SFX S 0 s [^ae]u
SFX S 0 s [^hsuxyz]

SFX S Y 9
SFX → It is a suffix (PFX would mean a prefix).
S   → The suffix identifier.
Y   → Y for YES. It means the rule can be combined with other prefixes and suffixes.
       If N, the rule can't be applied together with other affixes the word might have.
9   → The number of rule lines that follow this header.

SFX S y ies [^aeiou]y
SFX       → It is a suffix (PFX would mean a prefix).
S         → It is the suffix/prefix identifier.
y         → For a suffix, it is the letter(s) to be removed from the end of the word.
             For a prefix, from the beginning of the word.
ies       → For a suffix, it is the letter(s) to be added at the end of the word.
             For a prefix, at the beginning of the word.
[^aeiou]y → Condition in regexp notation. Here, the rule is applied to words ending with
             a "y" whose second-to-last letter is NOT a, e, i, o or u.
             Yes, the ^ means that the letters mustn't match.

So, party/S would produce: parties

And boy/S would produce: boys. It triggers the following rule, whose 0 says that no letters are removed, only added. The rule applies to words ending with a "y"; since there is no ^, the second letter from the right must be a, e, i, o or u.
SFX S 0 s [aeiou]y

Also note that if words have capitalised letters, the Hunspell in the software being used will only accept them capitalised exactly as in the .DIC (otherwise it flags a typo).
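
To make the mechanism concrete, here is a minimal Python sketch of how the nine "S" rules listed above could be applied to a word. It is an illustration only, not the code used by the tool or by Hunspell: the names SFX_S and apply_suffix_rules are invented, and it assumes the condition can be treated as an ordinary regular expression anchored at the end of the word.

```python
import re

# The nine rule lines of "SFX S Y 9": (letters to remove, letters to add, condition).
# "0" in the first field means that nothing is removed.
SFX_S = [
    ("y", "ies", "[^aeiou]y"),
    ("0", "s", "[aeiou]y"),
    ("0", "es", "[sxz]"),
    ("0", "es", "[cs]h"),
    ("0", "s", "[^cs]h"),
    ("0", "s", "[ae]u"),
    ("0", "x", "[ae]u"),
    ("0", "s", "[^ae]u"),
    ("0", "s", "[^hsuxyz]"),
]

def apply_suffix_rules(word, rules):
    """Return every form produced by the rules whose condition matches the word ending."""
    forms = []
    for strip, add, condition in rules:
        # The condition describes the END of the word, hence the "$" anchor.
        if not re.search(condition + "$", word):
            continue
        stem = word
        if strip != "0":
            if not word.endswith(strip):
                continue
            stem = word[: -len(strip)]
        forms.append(stem + add)
    return forms

print(apply_suffix_rules("party", SFX_S))  # ['parties']
print(apply_suffix_rules("boy", SFX_S))    # ['boys']
```

Each word is tested against all nine lines, and only the lines whose condition matches contribute a form, which is why party and boy end up with different plurals.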


5.2.4 — What is position and rule
The derived words ListIconGadget has the fields "Position" and "Rule".

"Position" is the character position of the first line (header) of each rule used. For example:
SFX S Y 9
(It is a suffix with identifier "S", cross-use "Yes" and 9 rules in it)

Inside the dictionary editor there is also a "Rule" column with the number of the rule line after the header. Double-clicking a ListIconGadget line jumps to the header; then you just have to scroll a few lines down to the rule number.

Please note that the editor gadget in the add/edit word window holds a "clean" version of the .AFF, with repeated spaces removed, so that finding the codes is faster (fewer characters to process).
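
As a rough illustration of these two fields, the Python sketch below cleans an .AFF text and then reports the header position and the requested rule line. Everything in it is an assumption made for the example: the function names, the zero-based position, and the idea that the position is counted in the cleaned text; the tool itself may count differently.

```python
import re

def clean_aff(aff_text):
    """Collapse repeated spaces/tabs in every line, like the "clean" .AFF copy."""
    return "\n".join(re.sub(r"[ \t]+", " ", line).strip()
                     for line in aff_text.splitlines())

def locate_rule(aff_text, kind, identifier, rule_number):
    """Return (position of the header, text of rule line rule_number) or None."""
    lines = clean_aff(aff_text).split("\n")
    position = 0  # assumed: zero-based character offset in the cleaned text
    for index, line in enumerate(lines):
        fields = line.split(" ")
        # A header has four fields: SFX/PFX, identifier, Y/N and the rule count.
        is_header = (len(fields) == 4 and fields[0] == kind
                     and fields[1] == identifier
                     and fields[2] in ("Y", "N") and fields[3].isdigit())
        if is_header:
            if not 1 <= rule_number <= int(fields[3]):
                raise ValueError("rule number outside the header's count")
            return position, lines[index + rule_number]
        position += len(line) + 1  # +1 for the line break
    return None
```

Calling locate_rule(aff_text, "SFX", "S", 2) would give the offset of the "SFX S Y 9" header and the second rule line below it, which mirrors jumping to the header and scrolling down to the rule number.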


5.3 — Thesaurus
5.3.1 — Creating a Thesaurus

If you have a Thesaurus in memory, use PURGE to delete all entries.

To create a Thesaurus from scratch, just press the ADD button to add synonyms.

Use EDIT or double-click to change information regarding the synonyms.

Use DELETE or <DEL> to remove entries.

The Thesaurus is a single UTF-8 file with the extension .DAT.

Remember to SAVE/SAVE AS now and then to be safe.


5.3.2 — Editing a Thesaurus
First, download the extension for the language you intend to use from the official pages.

You should have an .OXT file, which you rename to .ZIP in order to extract its contents to your hard disk.

Press OPEN and select the .DAT file of the Thesaurus.

Now just ADD/EDIT/DELETE the current entries.

Remember to SAVE/SAVE AS now and then to be safe.


In build 82 (14.Aug.2015) I improved the Thesaurus part. It is now possible to use DEL to delete synonyms, and there is a new "Thesaurus Tools" menu. Its most important option is "Combine", which combines all meanings but only works with simple lines. For example:
x|2
a
b
would generate:
a|2
x
b
and:
b|2
a
x
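
The "Combine" example above can be reproduced with a short Python sketch. It is not the tool's own code, and the function name combine is invented here; it simply swaps the headword with each simple meaning in turn.

```python
def combine(entry_lines):
    """Build the combined entries for one simple entry such as ["x|2", "a", "b"]."""
    # The headword is everything before the last "|" of the first line.
    word = entry_lines[0].rsplit("|", 1)[0]
    meanings = entry_lines[1:]
    new_entries = []
    for index, meaning in enumerate(meanings):
        # The meaning becomes the new headword and the old headword takes its place.
        swapped = list(meanings)
        swapped[index] = word
        new_entries.append([f"{meaning}|{len(swapped)}"] + swapped)
    return new_entries

for entry in combine(["x|2", "a", "b"]):
    print("\n".join(entry))
```

Run on the entry x|2 with the meanings a and b, this prints the a|2 and b|2 entries shown above.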



PTG now creates .idx files for the Thesaurus.





5.3.3 — Menus
5.3.x — Unduplicate simple meanings
What is a "duplicate" meaning?

It is a meaning line that is repeated within the same entry, for example:
apple|3
one
two
one

Unduplicating removes the second "one", so the entry becomes:
apple|2
one
two

It checks line by line and not column by column, so the following entry would not be changed:
apple|1
-|one|two|one
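
A minimal Python sketch of this line-by-line behaviour, assuming an entry is given as a list of text lines (the function name unduplicate is hypothetical and this is not the tool's implementation):

```python
def unduplicate(entry_lines):
    """Drop repeated meaning lines from one entry and update the count in its header."""
    word = entry_lines[0].rsplit("|", 1)[0]
    seen = set()
    kept = []
    for meaning in entry_lines[1:]:
        # Whole lines are compared, never the "|"-separated columns inside a line.
        if meaning not in seen:
            seen.add(meaning)
            kept.append(meaning)
    return [f"{word}|{len(kept)}"] + kept

print(unduplicate(["apple|3", "one", "two", "one"]))  # ['apple|2', 'one', 'two']
print(unduplicate(["apple|1", "-|one|two|one"]))      # ['apple|1', '-|one|two|one']
```

The second call returns the entry unchanged because the repeated "one" sits inside a single line.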


"Sort simple meanings" also works line by line on the Thesaurus meanings.

5.4 — Hyphenation
5.4.1 — Creating a Hyphenation
Not working yet.


5.4.2 — Editing a Hyphenation
Not working yet.


5.5 — Autocorrect
5.5.1 — Creating an Autocorrect

5.5.2 — Editing an Autocorrect
First, get the DocumentList.xml for the language you intend to use from the official AOO/LO pages.

The autocorrect files in AOO/LO are stored in:
$instdir/share/autocorr/acor_*.dat
These are actually zipped files containing the XML files.

Rename the .DAT files to .ZIP and extract the contents.

Using Notepad++ or another tool, make sure that DocumentList.xml is in UTF-8 and uses the following structure (you can copy/paste the first line over the existing XML entry):
<?xml version="1.0" encoding="UTF-8" ?> <block-list:block-list xmlns:block-list="http://openoffice.org/2001/block-list">
<block-list:block block-list:abbreviated-name="incorrect1" block-list:name="correct1"/>
<block-list:block block-list:abbreviated-name="incorrect2" block-list:name="correct2"/>
<block-list:block block-list:abbreviated-name="incorrect3" block-list:name="correct3"/>
<block-list:block block-list:abbreviated-name="incorrect4" block-list:name="correct4"/>
etc. (use lines like the previous)
</block-list:block-list>

Keep in mind that each line must hold only one incorrect/correct pair. The .XML I edited had all the text on one single line, so use Notepad++ to insert a line break after each entry.
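
If you prefer a script to Notepad++, the line breaks can be inserted with a regular expression. The Python sketch below is only an assumption about what is sufficient (the function name is invented): it keeps the XML declaration and the opening block-list tag together on the first line, as in the structure above, and starts a new line before every block entry and before the closing tag.

```python
import re

def one_pair_per_line(xml_text):
    """Put every <block-list:block .../> entry and the closing tag on its own line."""
    # "<block-list:block" followed by whitespace is an entry; the opening
    # "<block-list:block-list" tag does not match and stays on the first line.
    pattern = r"\s*(<block-list:block\s|</block-list:block-list>)"
    return re.sub(pattern, r"\n\1", xml_text.strip()) + "\n"
```

Running it on a file that already has one pair per line leaves the text unchanged, so it is safe to apply more than once.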



Press OPEN and select the DocumentList.xml file.

Now just ADD/EDIT/DELETE the current entries.

Remember to SAVE/SAVE AS now and then to be safe.


6 — History

V3.0 — ??.???.2017
Compiled with PureBasic 5.XX.

— The manual has been rewritten;

— The GUI has been redesigned:
  1) Now supports two resolutions: 1024x600 and 1280x600;
  2) New menus and a modern menu look for Windows;
  3) Several new options and gadgets;
  4) GTK3 support;
  5) Linux now has an icon when running.


— It uses dynamic arrays, which make all load/save operations ultrafast;

— Shortcut keys;

— Enhanced pop-up menu which can be used on the ListIconGadgets items with options for smart/faster use;

— Dictionary Editor now supports:
  1) Pop-up menu to copy the word of the selected line to the clipboard;
  2) Taboo warning if the NOSUGGEST flag is used;
  3) It is now possible to have custom "AFF Aid" files with 16x16 PNG flags;
  4) Replaced the ListIconGadget field "Position" with "Code Position";
  5) Support for FLAG NUM and LONG (with recursivity tested in twofold);
  6) Improved: If a code isn't found in the .aff it no longer exits the decoding function;
  7) Major speed gain in the .AFF optimising code (gl_ES);
  8) It now accepts "\/" as an escaped "/" in dictionary words;
  9) It now combines PREFIXES against PRIMARY+SUFFIXES.

— Thesaurus:
  1) Sorting synonyms naturally by replacing | with chr(9) before sorting and restoring | afterwards;
  2) Updating the number of meanings while editing synonyms now supports Mac OS line endings;
  3) Saving the Thesaurus now creates an .idx file.


— Invalid characters, such as spaces, entered while inserting data turn the gadget's background red;

— Added a "Tools" menu for the preferences:
  1) Prefs now load and save in a file named "ptg3.prefs", which allows a dynamic number of lines to be selected.
     This makes it compatible with all OSes.

— It saves Dictionary + Thesaurus + Autocorrect with #LF$ instead of #CRLF$ for Linux mode;

— Cleaned the code;

— Sped up several operations;

— Better UTF-8 warnings.