Thursday, 14 June 2018

windows 7 - Converting ascii Russian to Russian?


I have a text document that is supposed to be written in Russian, but it seems to be ASCII instead:


Óñòàíîâêà:
1)Çàïóñêàåì QuidamStudioSetup3.15.exe
2)Ïðè çàïðîñå ñåðèéíîãî íîìåðà ââîäèì

How can I convert this to readable Unicode Russian characters?



Answer



It is neither "ASCII" nor "ASCII Russian".


Before Unicode became widespread, most computer systems used the ISO-8859 character encodings, a family of 15 published parts (numbered up to 16; part 12 was never published), each for a different region or script (Central European, Cyrillic, Greek...). Windows had its own 'code pages', some of them very similar but with extra glyphs in otherwise-unused ranges. All these character encodings are 8-bit: the first half (0-127) is ASCII in every one of them, and they only differ in the second half (128-255).
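
To make the "second half" point concrete, here is a minimal Python sketch that decodes the same two bytes under three encodings. The byte values and the helper name `decode_all` are chosen purely for illustration; the codec names are the ones Python's standard library uses.

```python
# Codecs to compare: Windows Cyrillic, Windows Western European, ISO Cyrillic.
CODECS = ("cp1251", "cp1252", "iso8859_5")

def decode_all(data):
    """Return the text that the same bytes produce under each codec."""
    return {codec: data.decode(codec) for codec in CODECS}

high = bytes([0xD3])  # a byte from the upper half (128-255)
low = b"Q"            # a byte from the ASCII half (0-127)

for codec, text in decode_all(high).items():
    print(codec, text)
# cp1251 У
# cp1252 Ó
# iso8859_5 г

print(decode_all(low))  # 'Q' under every codec
```

The high byte turns into three different letters, while the ASCII byte reads the same everywhere, which is why file names and digits in a garbled document usually survive intact.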


The problem with these encodings is that it's next to impossible for a program to determine which encoding was used to save a file, unless it was specified explicitly (such as in HTML pages; however, plain text files have no such metadata tags). Read the Wikipedia article on Mojibake for a more detailed description.


In your example, the document was saved using Windows-1251 (Cyrillic), but your program reads it as if it were Windows-1252 (Western European), which has very different characters in the same positions. To the computer, it looks perfectly okay – it doesn't understand languages or scripts. (There are programs which do statistical analysis in order to determine the correct encoding, though – some web browsers have such a function.)
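
If the garbled text is all you have (for example, pasted from somewhere), the wrong decoding can often be undone by reversing it. The following is a minimal sketch, assuming the text was read as Windows-1252 and was really Windows-1251; the function name `fix_mojibake` is made up for this example.

```python
def fix_mojibake(text, wrong="cp1252", right="cp1251"):
    """Undo a wrong decode: recover the raw bytes, then decode them properly."""
    # Encoding with the *wrong* codec gives back the original bytes,
    # which can then be decoded with the codec the file was saved in.
    return text.encode(wrong).decode(right)

garbled = "Óñòàíîâêà:"
print(fix_mojibake(garbled))  # Установка:
```

This only works when every garbled character can be mapped back to a byte. Python's `cp1252` codec leaves a few byte values undefined, so text containing those would raise `UnicodeEncodeError` and needs a different approach.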


There are several ways you could convert such text to Unicode:




  • Use an online encoding conversion tool.




  • Use your web browser:




    1. Drag the .txt file into the browser.




    2. From View → Character Encoding (or Firefox → Web Developer → Character Encoding, or Wrench → Tools → Encoding), pick the correct original encoding: "Cyrillic (Windows-1251)" in your case.






  • Use the Notepad2 text editor:




    1. Open the file.




    2. From File → Encoding → Recode..., choose the right original encoding.






  • Use GNU iconv, with Windows binaries either from GnuWin32 or Gettext for Win32.


    iconv -f cp1251 -t utf-8 < myfile.txt > myfile.fixed.txt

    Windows Notepad will correctly read UTF-8 and UTF-16 encoded text.
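
If Python happens to be installed, the same conversion as the iconv command can be sketched in a few lines. This is only an illustration: the function name `convert_to_utf8` is hypothetical, and the file names are the ones from the iconv example.

```python
from pathlib import Path

def convert_to_utf8(src, dst, src_encoding="cp1251"):
    """Read src as src_encoding and write the same text to dst as UTF-8."""
    # Work on raw bytes so that line endings are passed through unchanged.
    raw = Path(src).read_bytes()
    Path(dst).write_bytes(raw.decode(src_encoding).encode("utf-8"))

# Example call, equivalent to the iconv command:
# convert_to_utf8("myfile.txt", "myfile.fixed.txt")
```

The original file is left untouched, so a wrong guess at the source encoding can simply be retried with another `src_encoding`.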



