Tutorial-e
rusconv v.3.11 tutorial.
For historical reasons our country does not have a single standard encoding, so sometimes it is impossible to read the content of a file. In these cases you should use a special program that converts encodings. We hope you choose rusconv.
In the past the most popular encoding was the alternative one (codepage 866), which was used in MS-DOS. Now the leader is the Windows encoding (codepage 1251). But the KOI-8 encoding is still important: it was very useful in the early stages of the Russian Internet. Very rarely you can find text written on a Macintosh. There were a lot of other encodings, but they have died out. Text written in one encoding cannot be read as if it were in another. To avoid this, some people write texts in latinica (transliteracija, volapjuk): Russian text spelled with Latin letters. Moreover, different operating systems mark the end of a line differently. DOS and Windows use two chars, UNIX uses one. So text written in UNIX looks like one long line in DOS and Windows.
All these troubles can be solved by rusconv.
Content:
Printing help.
Converting file from one encoding to another.
Changing type of lines.
Abbreviations of flags for most often tasks.
File overwriting.
Converting to several encodings and specifying own extensions for files.
Converting several files simultaneously.
Specifying output directory.
Using long file names and network files.
Other flags.
How to recognize file encoding.
Using rusconv in command scripts.
Printing help.
Rusconv is a program with a lot of flags. If you forget some of them, you can get help from rusconv itself. To do this simply run rusconv without any arguments or give it the flag '-h'.
Examples:
DOS:
C:\UTIL>RUSCONV
C:\UTIL>rusconv /h
UNIX:
$rusconv -h
$rusconv
Using rusconv in Windows is more difficult than in other operating systems because rusconv is a command-line program. It is recommended to use a file manager like Norton Commander (we recommend Windows Commander); then using rusconv in Windows becomes simpler. Alternatively, you can run the program from the "Start" menu: choose the item "Run...", type the full path to rusconv (better, use the "Browse..." button), add flags and files, and press the "OK" button.
Like any other UNIX utility, the UNIX version is not verbose. It prints only a short list of flags. To get more help try
$man rusconv
The best way is to use the HTML documentation.
Converting file from one encoding to another.
Suppose you work in Windows, have found an old DOS program, and want to recall how to run it. But instead of the documentation, "Notepad" shows unreadable text. To read it you should first convert it from the DOS encoding to the Windows one:
C:\GAMES\WARCRAFT>rusconv -alt +win read.me
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\read.me -> .\read.win: ok.
1 file(s) converted.
After the conversion, a file with the same name but a different extension is created in the current directory (usually the directory from which you run rusconv; here, c:\games\warcraft). The extension shows which encoding the new file is in.
Here is the list of extensions for encodings:
.alt - alternative encoding, used in DOS
.koi - KOI-8 encoding, used in UNIX
.lat - latinica, Russian text spelled with Latin letters
.mac - Macintosh encoding
.win - Windows encoding
To specify the encoding to convert from, use one of the flags
-alt, -koi, -mac or -win.
Rusconv can't convert from latinica. To specify the target encoding, use one or more of the flags
+alt, +koi, +lat, +mac or +win.
Like any other UNIX utility, the UNIX version of rusconv is not verbose. By default it prints only warnings and error messages. To print all messages use the flag '-v'.
In the next example we create the file 'test.file', which contains the text "Проверка флага '-v'." (testing the '-v' flag). This file will be in KOI-8 encoding (the encoding of the terminal in this example). First we convert this file to the Windows encoding without the flag '-v': the file 'test.win' is created, but rusconv prints no message. Then we convert the file to latinica using the flag '-v'. To finish the example, we check the content of the file 'test.lat'.
$echo "Проверка флага '-v'." >test.file
$rusconv -koi +win test.file
$rusconv -v -koi +lat test.file
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
./test.file -> ./test.lat: ok.
1 file(s) converted.
$cat test.lat
Proverka flaga '-v'.
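What such a conversion does to the bytes of a file can be sketched in Python. This is only an illustration, not rusconv's implementation: the helper name `recode` is hypothetical, and the pairing of rusconv's encoding names with the Python codecs cp866, koi8_r, mac_cyrillic and cp1251 is an assumption made for this example.

```python
# Assumed mapping from rusconv's encoding names to Python codec names.
CODECS = {
    "alt": "cp866",         # alternative (DOS) encoding
    "koi": "koi8_r",        # KOI-8
    "mac": "mac_cyrillic",  # Macintosh
    "win": "cp1251",        # Windows
}


def recode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source encoding and encode them in the target one."""
    text = data.decode(CODECS[source])
    return text.encode(CODECS[target])


koi_bytes = "Проверка флага '-v'.".encode("koi8_r")
win_bytes = recode(koi_bytes, "koi", "win")

# The same letters are stored as different bytes in each encoding,
# which is why an unconverted file looks unreadable.
print(koi_bytes[:8].hex(" "))
print(win_bytes[:8].hex(" "))
print(win_bytes.decode("cp1251"))
```

The ASCII part of the text ('-v' and the punctuation) is stored identically in all four encodings; only the Russian letters change.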
Changing type of lines.
Even if a text is written in latinica, there is no guarantee that it can be read on every operating system. In DOS and Windows the end of a line is coded by two chars (carriage return and line feed), in UNIX by one (line feed).
Suppose you work in DOS or Windows and have downloaded from the Internet a text file created in UNIX. Notepad shows this text as one long line with funny chars where the line breaks should be. To bring the text back to normal, use the command
C:\NEW>rusconv -cr2crlf readme.txt
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\readme.txt -> .\readme.crlf: ok.
1 file(s) converted.
The result of the conversion is placed in a file with the same name and the extension .crlf (UNIX and Windows) or .crl (DOS). In this example it is 'readme.crlf'.
If you are in UNIX, the wrong line format leads to other problems: text editors show an extra char at the end of each line, and some programs cannot be compiled. To solve this, use the flag '-crlf2cr':
$rusconv -crlf2cr -v files.bbs
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
./files.bbs -> ./files.cr: ok.
1 file(s) converted.
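The two line-end conversions can be sketched in Python as plain byte replacements. This is a sketch under assumptions: the function names are hypothetical, and whether rusconv itself protects lines that already end in two chars is not stated in this tutorial. Note that, despite the flag names, the single char used by UNIX is the line feed.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """UNIX to DOS/Windows: make every line end with CR LF."""
    # Normalise first so that existing CR LF pairs are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def crlf_to_lf(data: bytes) -> bytes:
    """DOS/Windows to UNIX: make every line end with a single LF."""
    return data.replace(b"\r\n", b"\n")


unix_text = b"line one\nline two\n"
dos_text = lf_to_crlf(unix_text)
print(dos_text)
print(crlf_to_lf(dos_text) == unix_text)
```

Because only the line-end bytes are touched, this step is independent of the encoding conversion and can be combined with it in either order.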
Abbreviations of flags for most often tasks.
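Every operating system has its own way of storing Russian text files.
DOS:
- Encoding - alternative (cp866).
- End of line - two chars.
Windows:
- Encoding - Windows (cp1251).
- End of line - two chars.
UNIX:
- Encoding - KOI-8.
- End of line - one char.
For a correct conversion you should take into account both the encoding and the type of line ends. Here is the minimal set of flags for each case: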
From DOS to UNIX : -alt +koi -crlf2cr
From UNIX to DOS : -koi +alt -cr2crlf
From windows to UNIX : -win +koi -crlf2cr
From UNIX to windows : -koi +win -cr2crlf
From DOS to windows : -alt +win
From windows to DOS : -win +alt
Probably the most frequent tasks are converting text from UNIX to DOS, from UNIX to Windows, and back. Converting between DOS and Windows styles is usually unnecessary: for Windows texts you can use Notepad, and for DOS texts you can use old DOS file managers like Norton Commander.
It is not a good idea to type these sets of flags every time. So you can use abbreviations:
-dos2unix - the same as '-alt +koi -crlf2cr'
-unix2dos - the same as '-koi +alt -cr2crlf'
-win2unix - the same as '-win +koi -crlf2cr'
-unix2win - the same as '-koi +win -cr2crlf'
These abbreviations are useful, but they are still rather long. So you can shorten them:
-d2u - the same as '-dos2unix'
-u2d - the same as '-unix2dos'
-w2u - the same as '-win2unix'
-u2w - the same as '-unix2win'
rusconv -w2u index.html
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\index.html -> .\index.koi: ok.
1 file(s) converted.
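How the abbreviations relate to the full flag sets can be written down as two lookup tables. The Python sketch below only restates the lists from this section; the names `SHORTCUTS`, `ALIASES` and `expand` are hypothetical and say nothing about how rusconv parses its command line internally.

```python
# Long abbreviation -> the flags it stands for, as listed in this section.
SHORTCUTS = {
    "-dos2unix": ["-alt", "+koi", "-crlf2cr"],
    "-unix2dos": ["-koi", "+alt", "-cr2crlf"],
    "-win2unix": ["-win", "+koi", "-crlf2cr"],
    "-unix2win": ["-koi", "+win", "-cr2crlf"],
}

# Short forms of the same abbreviations.
ALIASES = {
    "-d2u": "-dos2unix",
    "-u2d": "-unix2dos",
    "-w2u": "-win2unix",
    "-u2w": "-unix2win",
}


def expand(args):
    """Replace every abbreviation in args with the flags it abbreviates."""
    result = []
    for arg in args:
        arg = ALIASES.get(arg, arg)
        result.extend(SHORTCUTS.get(arg, [arg]))
    return result


print(expand(["-w2u", "index.html"]))
```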
File overwriting.
When converting, rusconv creates new files. But sometimes you don't need them, or you only want to change the encoding of the given file. In this case use the flag '-o'. Rusconv then first creates a temporary file for the results of the recoding and afterwards moves this temporary file into the place of the source. If any error occurs, the source file stays unchanged and the temporary file contains the text converted before the error.
Example:
D:\HTML>rusconv -o -w2u index.html
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\index.html -> D:\HTML\rcA290.TMP -> .\index.html: ok.
1 file(s) converted.
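The "write to a temporary file, then move it over the source" technique can be sketched in Python as follows. This is a minimal sketch, not rusconv's code: `convert_in_place` and `win_to_koi` are hypothetical names, and the '.TMP' suffix merely imitates the temporary file name in the output above.

```python
import os
import tempfile


def convert_in_place(path, convert):
    """Replace the content of `path` with convert(old_bytes) via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        data = src.read()
    # The temporary file is created next to the source so that the final
    # move stays on the same disk.
    fd, tmp_path = tempfile.mkstemp(suffix=".TMP", dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        # If convert() fails here, the source file has not been touched
        # and the temporary file is left behind.
        tmp.write(convert(data))
    # The source is replaced only after the whole result has been written.
    os.replace(tmp_path, path)


def win_to_koi(data):
    """Example converter: Windows encoding to KOI-8 (codec names assumed)."""
    return data.decode("cp1251").encode("koi8_r")
```

A call such as `convert_in_place("index.html", win_to_koi)` then changes the encoding of the file without leaving a second copy behind.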
Converting to several encodings and specifying own extensions for files.
Sometimes, especially when you create a web site, a file should be converted to several encodings. For example, you write HTML pages in DOS, but your homepage is published in the Windows and KOI-8 encodings. You can run rusconv twice, but it is better to do it like this:
C:\HTML>rusconv -alt +koi +win index-pre.html
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\index-pre.html -> .\index-pre.koi, .\index-pre.win: ok.
1 file(s) converted.
Results are placed in files with the same names and with default extensions:
.alt - for alternative encoding
.koi - for KOI-8 encoding
.lat - for latinica
.mac - for Macintosh encoding
.win - for Windows encoding
Often the default extensions are not convenient. Then you can define your own extensions. To do this use the commands:
-aext extension - for alternative encoding
-kext extension - for KOI-8 encoding
-lext extension - for latinica
-mext extension - for Macintosh encoding
-wext extension - for Windows encoding
For example, you have typed the Russian alphabet in the Windows encoding and want to see how it looks in all the other encodings. Moreover, you want the results to be in text files. No problem:
E:\EX>dir
folder E:\EX
.              <FOLDER>     24.10.98  15:27 .
..             <FOLDER>     24.10.98  15:27 ..
ALPHABET TXT            66  02.10.98  13:15 alphabet.txt
E:\EX>rusconv -win +alt -aext alt.txt +koi -kext koi.txt +lat -lext lat.txt +mac -mext mac.txt +win -wext win.txt alphabet.txt
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\alphabet.txt -> .\alphabet.alt.txt, .\alphabet.koi.txt, .\alphabet.mac.txt, .\alphabet.lat.txt, .\alphabet.win.txt: ok.
1 file(s) converted.
E:\EX>dir
folder E:\EX
.              <FOLDER>     24.10.98  15:27 .
..             <FOLDER>     24.10.98  15:27 ..
ALPHAB~1 TXT            66  24.10.98  16:24 alphabet.alt.txt
ALPHAB~2 TXT            66  24.10.98  16:24 alphabet.koi.txt
ALPHAB~3 TXT            66  24.10.98  16:24 alphabet.mac.txt
ALPHAB~4 TXT            82  24.10.98  16:24 alphabet.lat.txt
ALPHAB~5 TXT            66  24.10.98  16:24 alphabet.win.txt
ALPHABET TXT            66  02.10.98  13:15 alphabet.txt
To change an extension you can use one of the commands '-aext', '-kext', '-lext', '-mext' or '-wext'. But if you convert to only one encoding, it is better to use the command
-ext extension
Depending on target encoding this command is interpreted as one of commands '-aext extension', '-kext extension', '-lext extension', '-mext extension' or '-wext extension'.
Using command '-ext' you can also redefine default extensions '.cr' and '.crlf' when you change only type of end of lines:
E:\EX>rusconv -cr2crlf unixtext
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\unixtext -> .\unixtext.crlf: ok.
1 file(s) converted.
E:\EX>rusconv -cr2crlf -ext txt unixtext
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\unixtext -> .\unixtext.txt: ok.
1 file(s) converted.
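Judging only by the examples in this tutorial, the name of a result file is the source name with its last extension swapped for the chosen one. The Python sketch below reproduces that rule; `output_name` is a hypothetical helper, and how rusconv treats less usual names is not stated here.

```python
import os


def output_name(source, extension):
    """Swap the last extension of `source` for `extension`."""
    stem, _old_extension = os.path.splitext(os.path.basename(source))
    return stem + "." + extension


for name, ext in [("read.me", "win"),
                  ("alphabet.txt", "alt.txt"),
                  ("unixtext", "txt")]:
    print(name, "->", output_name(name, ext))
```

An extension that itself contains a dot, such as 'alt.txt', simply becomes a double extension, which is how the alphabet example keeps its results as text files.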
Converting several files simultaneously.
When you want to convert a group of files from one encoding to another, it is convenient to convert the whole group at once. To do this, just write all the file names after the flags. You can use metachars (wildcards); rusconv will find all matching files. In the UNIX version use metachars with caution; see "Specifying output directory" for more information.
C:\HTML>rusconv -alt +koi +win -kext koi.html -wext win.html *.txt
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\rusconv.txt -> .\rusconv.koi.html, .\rusconv.win.html: ok.
.\readme.txt -> .\readme.koi.html, .\readme.win.html: ok.
.\index.txt -> .\index.koi.html, .\index.win.html: ok.
3 file(s) converted.
Specifying output directory.
By default the result files are created in the current directory, which is usually the directory from which you run rusconv. But if the last argument is a directory name, the files are created in that directory.
In UNIX use metachars with caution. There the shell, not the program, expands metachars, and the program receives a ready list of arguments. So there is no guarantee that the last argument is not a directory. Therefore do not forget to specify the output directory:
Content of current directory:
$ls -l
-rwxr-xr-x 1 w_re w_re 21394 Oct 25 02:27 file1.html
-rwxr-xr-x 1 w_re w_re 21394 Oct 25 02:27 file2.html
drwxr-xr-x 2 w_re w_re 1024 Oct 25 02:27 res
Possible mistake:
$rusconv -v -w2u *
// After expansion:
// rusconv -v -w2u file1.html file2.html res
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
./file1.html -> res/file1.koi: ok.
./file2.html -> res/file2.koi: ok.
2 file(s) converted.
To create files in current directory:
$rusconv -v -w2u * .
// After expansion:
// rusconv -v -w2u file1.html file2.html res .
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
warning: 'res' is a directory, skipping.
./file1.html -> ./file1.koi: ok.
./file2.html -> ./file2.koi: ok.
2 file(s) converted.
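The "last argument is a directory" rule, and why an explicit '.' at the end is safer, can be sketched in Python. This is an illustration under assumptions, not rusconv's code: `split_output_dir` is a hypothetical helper, and the directory test is passed in as a parameter so that the example runs without touching the disk.

```python
import os


def split_output_dir(args, is_dir=os.path.isdir):
    """Return (files, output_dir) for an already expanded argument list."""
    if args and is_dir(args[-1]):
        files, out_dir = list(args[:-1]), args[-1]
    else:
        files, out_dir = list(args), "."
    # Directories left among the inputs are skipped, like 'res' in the
    # second run above.
    files = [name for name in files if not is_dir(name)]
    return files, out_dir


def fake_is_dir(name):
    """Pretend that 'res' and '.' are directories, as in the listing above."""
    return name in ("res", ".")


print(split_output_dir(["file1.html", "file2.html", "res"], fake_is_dir))
print(split_output_dir(["file1.html", "file2.html", "res", "."], fake_is_dir))
```

In the first call 'res' silently becomes the output directory; in the second the trailing '.' takes that role and 'res' is only skipped.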
Using long file names and network files.
Version 3.0 of rusconv was released for DOS and UNIX. Version 3.11 adds a release for Windows that can use long file names. Because it uses new operating system functions, this version cannot run on a computer without Windows 95/98. But now you can use long file names and network files.
To convert a file with a space in its name, use quotes:
C:\HTML>rusconv -win +alt "long file name.txt"
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\long file name.txt -> .\long file name.alt: ok.
1 file(s) converted.
Most file managers for Windows, like Norton Commander, add the file name to the command line when you press Ctrl+Enter. If the file name contains spaces, they automatically surround it with quotes. If yours does not, change your file manager. We recommend Windows Commander.
Working in a local Windows network you can (if you have the rights) convert files on other computers without mapping a drive. To do this use universal file names (\\server\resource\file):
rusconv -w2u -ext html \\comp\c\html\*.html "\\comp\c\html\koi version"
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
\\comp\c\html\tutorial.html -> \\comp\c\html\koi version\tutorial.html: ok.
\\comp\c\html\index.html -> \\comp\c\html\koi version\index.html: ok.
\\comp\c\html\errors.html -> \\comp\c\html\koi version\errors.html: ok.
3 file(s) converted.
Other flags.
Now it is time to talk about the flags '--', '-s', '-v', '-close' and '-noclose'. They are usually used in command scripts.
-- end of flags
Rusconv scans the command line from left to right. The first argument that is not a flag starts the list of files. Rusconv treats an argument as a flag if its first char is '-' or '+' (or '/' in the DOS and Windows versions). Sometimes you need to stop flag parsing. To do this use the chars '--'. Everything after them is the file list.
Suppose a file with the name '-file.txt' should be converted from the Windows encoding to the KOI-8 encoding:
Error:
E:\EX>rusconv -win +koi -file.txt
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
error: unrecognized flag '-file.txt'.
try 'rusconv -h' or read the manual for help.
Success:
rusconv -win +koi -- -file.txt
** rusconv -- convertor of Russian codepages, v.3.11. **
(c)w_re -- Oleg A. Paraschenko http://beta.math.spbu.ru/~prof/w_re/
.\-file.txt -> .\-file.koi: ok.
1 file(s) converted.
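The left-to-right scan with '--' as the end-of-flags marker can be sketched in Python. This is a sketch under stated assumptions rather than rusconv's parser: `split_flags` and `VALUE_FLAGS` are hypothetical names, and the only flags treated as taking a value are the extension commands described earlier in this tutorial.

```python
# Flags that are followed by an extension (from the extensions section).
VALUE_FLAGS = {"-ext", "-aext", "-kext", "-lext", "-mext", "-wext"}


def split_flags(args, flag_chars="-+"):
    """Split a command line into (flags, files); use "-+/" for DOS rules."""
    flags = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "--":
            # Explicit end of flags: everything after it is a file name.
            return flags, args[i + 1:]
        if not arg or arg[0] not in flag_chars:
            # The first argument that is not a flag starts the file list.
            return flags, args[i:]
        flags.append(arg)
        if arg in VALUE_FLAGS and i + 1 < len(args):
            i += 1
            flags.append(args[i])  # the extension that belongs to the flag
        i += 1
    return flags, []


print(split_flags(["-win", "+koi", "-file.txt"]))
print(split_flags(["-win", "+koi", "--", "-file.txt"]))
```

Without '--' the name '-file.txt' ends up among the flags, which is what leads to the "unrecognized flag" error; with '--' it is the only entry in the file list.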
-s silent mode, no messages are printed
-v verbose mode, all messages are printed
The flag '-s' suppresses the printing of messages. The flag '-v', on the contrary, makes rusconv talkative. If you specify both flags '-s' and '-v', an error message is printed. The DOS and Windows versions are talkative by default. The UNIX version by default prints only warnings and error messages.
-close close rusconv's window after the program has finished
-noclose do not close the window
The flags '-close' and '-noclose' are used only in the Windows version; the DOS and UNIX versions ignore them. Windows runs rusconv in a separate window, which is closed when the program finishes. To let the user see the report, rusconv by default waits for a key press after all files are converted ('-noclose'). This behaviour can be changed with the flag '-close': with it rusconv exits as soon as all files are converted. If you specify both flags '-close' and '-noclose', an error message is printed.
How to recognize file encoding.
To recognize the encoding of a file use the program whatrus, which is distributed with rusconv.
C:\UTIL>whatrus \\comp\c\html\index.html
WIN detected.
You can't specify several file names. If you don't need the message and only want the return code, use the flag '-s'.
Windows runs whatrus in a separate window, which is closed when the program finishes. To keep the window open and let the user see the result, whatrus waits for a key press after recognition. To avoid this use the flag '-s'.
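How whatrus recognizes an encoding is not described in this tutorial. Purely to illustrate why such recognition is possible at all, here is a naive Python heuristic: decode the bytes with each candidate encoding and prefer the result that looks most like ordinary Russian text, which is mostly lowercase letters. Everything in the sketch (the names, the scoring rule, the Python codec names) is an assumption; only the numeric codes are the whatrus return codes listed in this tutorial.

```python
# (codec, whatrus return code) pairs; the codec names are assumptions.
CANDIDATES = [("cp866", 11), ("koi8_r", 12), ("cp1251", 13), ("mac_cyrillic", 14)]


def score(text):
    """Lowercase Russian letters count double, uppercase ones once."""
    lower = sum("а" <= ch <= "я" for ch in text)
    upper = sum("А" <= ch <= "Я" for ch in text)
    return 2 * lower + upper


def guess_encoding(data):
    """Return (codec, code) of the best-scoring candidate, or (None, 0)."""
    best, best_score = (None, 0), 0
    for codec, code in CANDIDATES:
        current = score(data.decode(codec, errors="replace"))
        if current > best_score:
            best, best_score = (codec, code), current
    return best


sample = "Проверка флага".encode("koi8_r")
print(guess_encoding(sample))
```

A heuristic this simple is easily fooled by short or unusual texts (all capitals, for instance), so it should not be read as a description of whatrus itself.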
Using rusconv in command scripts.
If you are going to use rusconv in command scripts, consider this advice.
- Write chars '--' after flags.
These chars stop flag parsing and prevent an error if your script gets a file name that could be interpreted as a rusconv flag.
- Specify output directory.
This is very important when you are working in UNIX because the user can use metachars. In UNIX the shell expands metachars and the program receives a ready list of arguments, so there is no guarantee that the last argument is not a directory.
- In windows version of rusconv use flag '-close'.
To keep its window on the desktop and let the user see the report, rusconv waits for a key press after all files are converted. In command scripts you probably don't need such behaviour. So use the flag '-close' and rusconv will finish immediately after converting.
- In windows version of whatrus use flag '-s'.
In the Windows version of whatrus the flag '-s' does the same as the flag '-close' in rusconv.
- Use return codes of rusconv and whatrus.
Using the return codes of rusconv and whatrus makes your script more intelligent.
rusconv:
- Number of converted files.
whatrus:
- 255 - error occurred
- 0 - encoding not recognized
- 11 - alternative encoding
- 12 - KOI-8 encoding
- 13 - Windows encoding
- 14 - Macintosh encoding
Here is an example of a command script. It takes any file and converts it to the file index.html in the Windows encoding.
windows version, makeindex.bat:
@ECHO OFF
REM Copy source file to file with name 'index'.
ECHO COPY %1 index
copy %1 index
IF EXIST index GOTO TAKEENC
ECHO copy failed
EXIT
REM Guess encoding
:TAKEENC
ECHO WHATRUS -s %1
whatrus -s %1
REM Branching starts from big numbers because
REM 'IF ERRORLEVEL N' really means
REM 'IF ERRORLEVEL >= N'.
IF ERRORLEVEL 255 GOTO WRERR
IF ERRORLEVEL 14 GOTO MACENC
IF ERRORLEVEL 13 GOTO WINENC
IF ERRORLEVEL 12 GOTO KOIENC
IF ERRORLEVEL 11 GOTO ALTENC
ECHO encoding not recognized
EXIT
:WRERR
ECHO whatrus failed
EXIT
REM convert file 'index' to 'index.html'.
:ALTENC
ECHO RUSCONV -close -alt +win -ext html index
rusconv -close -alt +win -ext html index
EXIT
:KOIENC
ECHO RUSCONV -close -koi +win -ext html index
rusconv -close -koi +win -ext html index
EXIT
:MACENC
ECHO RUSCONV -close -mac +win -ext html index
rusconv -close -mac +win -ext html index
EXIT
:WINENC
ECHO RUSCONV -close -win +win -ext html index
rusconv -close -win +win -ext html index
EXIT
UNIX version for bash, makeindex.sh:
# Copy source file to file with name 'index'.
rm -f index
cp "$1" index
if [ ! -f index ]
then
  echo copy failed
  exit 1
fi
# Guess encoding and convert file to 'index.html'
whatrus "$1"
case $? in
  255) echo error executing whatrus;;
  0) echo "can't detect encoding";;
  11) rusconv -alt +win -ext html index;;
  12) rusconv -koi +win -ext html index;;
  13) rusconv -win +win -ext html index;;
  14) rusconv -mac +win -ext html index;;
esac
Good luck with your work!
tutorial-e.html
Document created by Oleg A. Paraschenko
Last changes - 15 November 1998