View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000625 | Subversion for OS/2 & eCS | Bug | public | 2014-09-25 09:18 | 2021-10-19 21:01 |
Reporter | dmik | Assigned To | |||
Priority | normal | Severity | major | Reproducibility | always |
Status | new | Resolution | open | ||
Summary | 0000625: Unicode support | ||||
Description | It seems that both Subversion 1.6 and Subversion 1.7 have problems with characters outside the current 8-bit OS/2 code page. For instance, the repository http://svn.netlabs.org/repos/java/branches/vendor/sourceforge/icedtea-web contains a few files whose names include extended Latin characters (letters with diacritics). I use the Russian locale with the CP866 code page, which lacks these characters, and this is where the problems start. I need to attach screenshots to the ticket, and since Mantis doesn't let you attach more than one screenshot at a time, I will put the steps to reproduce in separate comments. | ||||
Tags | No tags attached. | ||||
Attached Files | |||||
|
With Subversion 1.6, the checkout itself works (it doesn't abort), but any further operation on the affected files (update, change and commit, etc.) fails. After running `svn co http://svn.netlabs.org/repos/java/branches/vendor/sourceforge/icedtea-web/current@430` followed by `svn stat`, I get the result shown in `svn16.png`. That is, for the file named `encodingTests?Š??ŽÝÁÍÉ?É??ÝÚ?ÍÓÁŠ?Ž??`, each accented letter shows up as the corresponding plain Latin letter followed by a character with code 0x7F (the "house" symbol in the screenshot). You can also see that svn doesn't track these files correctly after the checkout: they are marked BOTH as missing and as new (untracked). |
|
Apparently, Mantis itself has big problems with Unicode. The `encodingTests` file name should look like the one in the screenshot `svn_proper.png`. |
|
With Subversion 1.7, after the same checkout as above, the picture is a bit different (see `svn17.png`). In particular, the names of the "missing" and "new" files don't match; in the file system I see the files created under the "new" names. In other words, svn 1.6 writes the accented letters to the file system as base letters plus "house" marks (and `svn stat` still treats them as different from the names stored in the index, even though the index uses "house" marks as well), while svn 1.7 uses "house" marks in the index and various graphical characters in the file system. |
|
I see only one solution to this problem: svn should refuse to check out repositories containing file names that it can't create in the local character set. Any other solution is unacceptable, since it would be too dangerous and error-prone. A failure in character set conversion is easy to detect, so implementing this shouldn't take much work. |
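A minimal sketch of that check, written in Python for brevity rather than in Subversion's own C code. It assumes each repository path is available as a Unicode string and that the local code page is CP866 (the reporter's setup). Each path is encoded into the code page with strict error handling, and the checkout is refused if any path fails. The names `is_representable` and `check_checkout_paths` are hypothetical.

```python
def is_representable(name: str, codepage: str = "cp866") -> bool:
    """Return True if every character of `name` exists in `codepage`."""
    try:
        # Strict encoding raises UnicodeEncodeError on any unmappable character.
        name.encode(codepage, errors="strict")
    except UnicodeEncodeError:
        return False
    return True


def check_checkout_paths(paths, codepage: str = "cp866") -> None:
    """Hypothetical pre-checkout guard: refuse if any path can't be created locally."""
    bad = [p for p in paths if not is_representable(p, codepage)]
    if bad:
        raise ValueError(
            f"cannot check out: {len(bad)} path(s) not representable in {codepage}: {bad}"
        )


if __name__ == "__main__":
    print(is_representable("README.txt"))        # ASCII fits in CP866
    print(is_representable("Привет.txt"))        # Cyrillic fits in CP866
    print(is_representable("encodingTestsŠŽé"))  # extended Latin does not
```

Running the guard before any file is written means a checkout either succeeds completely or fails up front, instead of leaving a working copy with entries that are both missing and untracked.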
|
This seems to describe a similar situation on MacOS from the SVN 1.5.x timeframe: http://subversion.tigris.org/issues/show_bug.cgi?id=2464 |
|
JFYI, there is also a somewhat similar problem in git: http://stackoverflow.com/questions/5581857/git-and-the-umlaut-problem-on-mac-os-x. BTW, there is another solution besides refusing to work with such repositories at all: we could escape non-representable characters as printable ASCII using one of the well-known algorithms (e.g. the one used when passing URLs between systems on the Internet). This would allow full support for all file operations (adding, renaming, deleting) on any Unicode character within any 8-bit code page. The only drawback is that users will not see the actual characters for those outside their code page. For URLs, this is called "percent-encoding": http://en.wikipedia.org/wiki/Percent-encoding. We could just use that algorithm. It can operate on UTF-8 strings (which, if I get it right, is what the SVN/Git server stores and transfers over the wire), so there is not much to implement: just one mangle/demangle function pair. We still need to find the right place for it, though, and the existing MacOS solution may help with that. BTW, we should not percent-encode characters that CAN be represented in the current 8-bit code page, so that the user's native characters still look native. This makes the working copy non-portable (you won't be able to zip it and move it to another machine with a different code page), but that doesn't make things any worse, since SVN working copies are already non-portable on OS/2 due to the nature of 8-bit code pages. In any case, non-portability is not a big issue at all; nobody should normally do that. |
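A minimal sketch of the proposed mangle/demangle pair, again in Python for brevity and again assuming CP866 as the local code page. Characters the code page can represent are kept as-is; every other character is replaced by its UTF-8 bytes in `%XX` form, as in URL percent-encoding. One assumption goes beyond the note: a literal `%` is escaped too (as `%25`), because otherwise a file name that already contains something like `%C3%A9` could not be told apart from an escaped one when demangling. The function names are hypothetical.

```python
from urllib.parse import unquote


def _fits(ch: str, codepage: str) -> bool:
    """True if the single character `ch` exists in the local code page."""
    try:
        ch.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


def mangle(name: str, codepage: str = "cp866") -> str:
    """Repository name (Unicode) -> local file name, escaping only what won't fit."""
    out = []
    for ch in name:
        if ch != "%" and _fits(ch, codepage):
            out.append(ch)  # native characters stay native
        else:
            # Percent-encode each UTF-8 byte of the character, as in URLs.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def demangle(name: str) -> str:
    """Local file name -> repository name: decode %XX sequences as UTF-8."""
    return unquote(name, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    for original in ["readme.txt", "Привет.txt", "café.txt", "100%.txt"]:
        local = mangle(original)
        print(f"{original!r} -> {local!r}")
        assert demangle(local) == original  # the mapping must round-trip
```

Here `café.txt` becomes `caf%C3%A9.txt` on disk, while `Привет.txt` is left untouched because CP866 can hold it. That is the trade-off described above: native names stay readable, and the working copy's file names depend on the local code page.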
|
This issue appears to still exist in Subversion 1.14. After `svn co https://svn.code.sf.net/p/md5deep/code/trunk .`, running `svn st -q` reports:

- `~ sample-hashes/hashkeeper/sample-hashkeeper`
- `~ tests/testfiles/symlinktest/dir1/badlink.txt`
- `~ tests/testfiles/symlinktest/dir1/dir1`
- `~ tests/testfiles/symlinktest/dir1/dir2/dir1`
- `~ tests/testfiles/symlinktest/dir1/dir2/dir2`
- `~ tests/testfiles/symlinktest/dir1/dir2/dir3/dir1`
- `~ tests/testfiles/symlinktest/dir1/dir2/dir3/dir2`
- `~ tests/testfiles/symlinktest/dir1/dir2/dir3/dir3`
- `~ tests/testfiles/symlinktest/dir1/dir3`
- `! tests/testfiles/unicode_circled_bullet_.txt`
- `! tests/testfiles/unicode_snowman_.txt`

The two missing (`!`) entries are the files with Unicode names. The obstructions (`~`) are probably an issue with symlinked directories. |
Date Modified | Username | Field | Change |
---|---|---|---|
2014-09-25 09:18 | dmik | New Issue | |
2014-09-25 09:18 | dmik | File Added: svn16.png | |
2014-09-25 09:22 | dmik | Note Added: 0002837 | |
2014-09-25 09:24 | dmik | Note Added: 0002838 | |
2014-09-25 09:24 | dmik | File Added: svn_proper.png | |
2014-09-25 09:28 | dmik | File Added: svn17.png | |
2014-09-25 09:34 | dmik | Note Added: 0002839 | |
2014-09-25 09:36 | dmik | Note Added: 0002840 | |
2014-10-03 05:23 | psmedley | Note Added: 0002843 | |
2014-10-03 09:42 | dmik | Note Added: 0002844 | |
2014-10-03 10:19 | dmik | Note Edited: 0002844 | |
2014-10-03 10:22 | dmik | Note Edited: 0002844 | |
2021-10-19 21:01 | Steven Levine | Note Added: 0003899 |