Windows-1251 character set

Michael Ju. Tokarev mjt at tls.msk.ru
Tue Mar 7 10:50:22 GMT 2000


Aargh!  You have hit a very common problem: interoperability between
operating systems when national characters are involved.  The only accurate
solution I know of is to define some standard for international characters
(as ASCII once was), for example Unicode, and to set up _all_ programs
(ftp, browsers, archivers etc) so that they use that standard 8-(...

But for your particular issue.
There are some routines in charcnv.c (you have found them already)
that initialize the charset conversion tables.  This should probably be
done as in charset.c (using external codepage files), but that requires
some thinking (again, maybe Unicode can help here).
In the current situation, though, you should define a new mapping table to
translate characters coming from the client to the unix-side ones and back.
I am sure you know the mapping between 866 and 1251 (OemToAnsiBuff() and
AnsiToOemBuff() do it, if your locale in Windows is set up properly).
So defining such a table is easy enough.
Look at update_map() in charcnv.c.  It expects a string of character
pairs.  The first character of each pair is unix-side (strictly speaking,
server-side) and the second is client-side.  Remember that windoze will
expect the 866 charset from the server, so you should convert it to 1251
on the way to unix, and back from 1251 to 866 on the way out.  For example,
the Russian letter A has code 0xc0 in 1251 and 0x80 in 866.
So here is your first pair: 0xc0 <=> 0x80.
So, init_windows_1251() should look like this:

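static void init_windows_1251(int codepage) {
  setupmaps();
  /* Each pair is: server-side (1251) byte, then client-side (866) byte.
     Note the escape is \xc0, not \0xc0 (that would be a NUL followed
     by the text "xc0"). */
  update_map("\xc0\x80\xc1\x81\xc2\x82" /* ..... and so on for all
                                           Russian chars... */);
}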
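
If typing the whole string by hand is too tedious, it can be generated.
Below is a minimal sketch, assuming the standard layouts of the two code
pages (1251: capitals at 0xc0-0xdf, small letters at 0xe0-0xff, IO/io at
0xa8/0xb8; 866: capitals at 0x80-0x9f, small letters at 0xa0-0xaf and
0xe0-0xef, IO/io at 0xf0/0xf1).  build_1251_866_pairs() and
print_update_map_call() are made-up helper names, not part of Samba; they
only produce the literal to paste into init_windows_1251().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not in Samba): fill `out` with (cp1251, cp866)
 * byte pairs for the 66 Russian letters and NUL-terminate it.
 * `out` must hold at least 2 * 66 + 1 bytes.  Returns the pair count. */
static size_t build_1251_866_pairs(unsigned char *out)
{
    size_t n = 0;
    int i;

    assert(out != NULL);

    /* Capital letters: 1251 0xc0-0xdf <=> 866 0x80-0x9f */
    for (i = 0; i < 32; i++) {
        out[n++] = (unsigned char)(0xc0 + i);
        out[n++] = (unsigned char)(0x80 + i);
    }
    /* First 16 small letters: 1251 0xe0-0xef <=> 866 0xa0-0xaf */
    for (i = 0; i < 16; i++) {
        out[n++] = (unsigned char)(0xe0 + i);
        out[n++] = (unsigned char)(0xa0 + i);
    }
    /* Last 16 small letters: 1251 0xf0-0xff <=> 866 0xe0-0xef */
    for (i = 0; i < 16; i++) {
        out[n++] = (unsigned char)(0xf0 + i);
        out[n++] = (unsigned char)(0xe0 + i);
    }
    /* Capital IO and small io sit outside the main ranges */
    out[n++] = 0xa8; out[n++] = 0xf0;
    out[n++] = 0xb8; out[n++] = 0xf1;

    out[n] = '\0';
    return n / 2;
}

/* Hypothetical helper: print the pairs as an update_map() call, one
 * adjacent string literal per pair so each \x escape stays unambiguous. */
static void print_update_map_call(FILE *fp, const unsigned char *pairs,
                                  size_t count)
{
    size_t i;

    fprintf(fp, "update_map(\n");
    for (i = 0; i < count; i++)
        fprintf(fp, "    \"\\x%02x\\x%02x\"\n",
                (unsigned int)pairs[2 * i],
                (unsigned int)pairs[2 * i + 1]);
    fprintf(fp, ");\n");
}
```

Call build_1251_866_pairs() on a buffer of 2 * 66 + 1 bytes, then pass the
buffer and the returned count to print_update_map_call(stdout, ...).  Only
the letters are covered here; pseudographics and other symbols would need
pairs of their own.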

Note that you _should_ add "client code page = 866" to your [global] section.
Or, better (?), add "if (codepage == 866) {" and "}" around the update_map() call...

And sorry if this does not work -- maybe the order should be the opposite
(i.e. \x80\xc0 instead of \xc0\x80) :)
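
The order question can be checked offline before touching the server.  The
sketch below is a simplified stand-in, not Samba's code: model_setupmaps(),
model_update_map() and the two 256-entry tables are only my assumption of
what a pair string does (first byte server-side, second byte client-side,
as described above).  If the real update_map() turns out to work the other
way round, swap the two bytes in every pair.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model, NOT Samba's charcnv.c: two byte-to-byte tables. */
static unsigned char to_client[256];  /* server (1251) -> client (866) */
static unsigned char to_server[256];  /* client (866) -> server (1251) */

/* Start from identity maps, so unmapped bytes pass through unchanged. */
static void model_setupmaps(void)
{
    int i;

    for (i = 0; i < 256; i++) {
        to_client[i] = (unsigned char)i;
        to_server[i] = (unsigned char)i;
    }
}

/* Apply a NUL-terminated string of (server byte, client byte) pairs. */
static void model_update_map(const char *str)
{
    const unsigned char *p = (const unsigned char *)str;

    assert(str != NULL);
    for (; p[0] != '\0' && p[1] != '\0'; p += 2) {
        to_client[p[0]] = p[1];
        to_server[p[1]] = p[0];
    }
}

/* Return 1 if every byte named in the pair string survives a round trip
 * through both tables, 0 otherwise. */
static int model_roundtrip_ok(const char *str)
{
    const unsigned char *p = (const unsigned char *)str;

    for (; p[0] != '\0' && p[1] != '\0'; p += 2) {
        if (to_server[to_client[p[0]]] != p[0])
            return 0;
        if (to_client[to_server[p[1]]] != p[1])
            return 0;
    }
    return 1;
}
```

With the pair 0xc0 <=> 0x80 loaded, to_client[0xc0] should give 0x80 and
to_server[0x80] should give 0xc0; if a filename created from Windows shows
up on unix with 866 bytes instead of 1251 ones, the pairs are reversed.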

And do not forget to add a call to your function in interpret_character_set(),
like this:
    ...
    } else if (strequal (str, "koi8-r")) {
        init_koi8_r();
    } else if (strequal (str, "windows-1251")) { /* official mime name */
        init_windows_1251(codepage);
    ...
and add "character set = windows-1251" to smb.conf (the value must match
the string tested in interpret_character_set() above).

Regards,
  Michael.

Alexander Javoronkov wrote:
> 
> > > I've got Win'98 with Russian (windows-1251) locale & samba-2.0.6 with
> > > "client code page = 866".
> > > I want to store russian filenames on my Samba server in windows-1251
> > > character set.
> > Do you _really_ need this?
> 
> I think I really need this.
> 
> > As Sergei Makarov already pointed, there is no problem with "default"
> > settings for russian in samba.  You will get koi8-r filenames in unix,
> > and this is also pretty standard on unix systems, there are already
> > exists such things as locales etc on unix to support koi8 charset.
> 
> Right now I've got no problems storing cp866 filenames on ext2fs.
> 
> > If you will store your files in 1251 charset, you will need to setup
> > at least locales on unix, and find and load 1251 font to view filenames
> > on unix.
> 
> My box is located in such a gloomy and dark place where no man has been ;) I
> don't need to get to console 'cause I'm working under Win/ssh with 1251
> charset.
> 
> > Just one example where it _is_ matter what charset used in
> > unix (from windows's view) is if you want to create archives (e.g. .zip)
> > on unix and extract/view them on windows or vice versa.  But in that case,
> > you should use cp866, not 1251 codepage on unix side, as windows do
> > on it's filesystem ("oem" charset).  Or, alternatively, use patched
> > archivers for this, but again, in this case it is irrelevant -- what
> > codepage uses unix...
> 
> Ok, I'll try to explain my problem. Right now I'm storing russian filenames in
> cp866 on Linux. I've got big .mp3 archive with russian filenames - btw, check
> it out - ftp://ftp.vm.ru/pub/music/russian/
> Since they're in cp866, the only way to access it is via old-style ftp.exe
> from Windows/dos. My LAN clients are accessing archive just fine via Samba -
> no problems here.
> My goal is: Windows clients using CuteFTP, Netscape and stuff should access
> those files and store them named properly. I've tried to make it through
> Russian Apache recoding mechanism - http://www.vm.ru/music/ - now everyone can
> see russian filenames, but: 1) they're accessed via http - I don't wanna; and
> 2) when user clicks with the right button/save as - dialog appears with
> escaped russian characters - again, I don't wanna. I want to get my FTP
> accessible with windows-1251 filenames.

