
Unicode filenames in ZIP format

UTF-8 is a standard character encoding for Unicode, the character set developed to represent text in the world's many languages.

Because Unicode and UTF-8 were developed in the early 1990s, the original ZIP format (created in the late 1980s) did not support UTF-8. As ZIP became the standard archive format and Unicode support became necessary, several methods were introduced for storing UTF-8 filenames in ZIP files.

Bandizip supports two of these methods: one stores the filenames themselves in UTF-8, and the other stores the original filenames in MBCS (the system's multibyte code page) while adding a UTF-8 copy of each filename in an extra header field.
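To see why the choice of encoding matters, the short Python sketch below prints the bytes of one Korean filename in UTF-8 and in code page 949, the MBCS code page used by Korean Windows. The filename is an arbitrary example; the point is only that the two encodings produce different bytes, so a reader must know which one was used.

```python
# Compare how one filename is stored as UTF-8 versus as MBCS
# (code page 949, the legacy multibyte code page of Korean Windows).
# The filename is an arbitrary example.
name = "한국어.txt"

utf8_bytes = name.encode("utf-8")  # 3 bytes per Hangul syllable
mbcs_bytes = name.encode("cp949")  # 2 bytes per Hangul syllable

print("UTF-8:", utf8_bytes.hex(" "), f"({len(utf8_bytes)} bytes)")
print("CP949:", mbcs_bytes.hex(" "), f"({len(mbcs_bytes)} bytes)")
```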



Use Unicode filenames in Zip files (UTF-8)

With this option, Bandizip converts filenames to UTF-8 before storing them in the ZIP file. This is the standard filename storage method defined by APPNOTE (the ZIP format specification), but some archivers fail to recognize such ZIP files or mishandle them, resulting in broken filenames.
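APPNOTE marks a UTF-8 filename by setting bit 11 (0x800, the "language encoding flag") of the entry's general purpose bit flag. The sketch below uses Python's standard `zipfile` module, which follows this convention, to show the flag on a written archive. It illustrates the specification, not Bandizip's own implementation; note that Python sets the flag only for names that are not plain ASCII, and the helper names `make_zip` and `utf8_flags` are hypothetical.

```python
import io
import zipfile

# Bit 11 of the general purpose bit flag (APPNOTE "language encoding flag"):
# when set, the entry's filename is stored as UTF-8.
UTF8_FLAG = 0x800

def make_zip(names):
    """Write an in-memory ZIP with one small entry per name (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"hello")
    return buf.getvalue()

def utf8_flags(data):
    """Map each entry name to whether its UTF-8 flag is set in the central directory."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG) for info in zf.infolist()}

archive = make_zip(["readme.txt", "한국어.txt"])
for name, is_utf8 in utf8_flags(archive).items():
    print(f"{name}: UTF-8 flag {'set' if is_utf8 else 'not set'}")
```

Run as-is, this should report the flag as set only for the Korean filename, since the ASCII name is identical in every common encoding.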

Store Unicode filenames in an extra header field of Zip files (UTF-8)

With this option, Bandizip stores the original filenames in MBCS and adds a UTF-8 copy of each filename in an extra header field. APPNOTE also defines this method, as the “Info-ZIP Unicode Path Extra Field.” Because the UTF-8 filenames are stored in this extra field, the resulting archive is tens of bytes larger than one created with the previous option. However, because the original filenames remain in MBCS, such archives are safer and more compatible with other archivers.
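APPNOTE lays out the Info-ZIP Unicode Path Extra Field as header ID 0x7075, a 2-byte data size, a 1-byte version (currently 1), a CRC-32 of the filename stored in the header, and then the UTF-8 name. The Python sketch below builds and parses such a block under that layout. The function names are hypothetical, and using code page 949 for the MBCS name is an assumption for the example. The CRC check lets a reader ignore a stale UTF-8 name if another tool later renamed the entry without updating the extra field.

```python
import struct
import zlib

UNICODE_PATH_TAG = 0x7075  # Info-ZIP Unicode Path Extra Field ("up")

def build_unicode_path_extra(header_name: bytes, unicode_name: str) -> bytes:
    """Build a 0x7075 block for a header whose name field holds header_name."""
    utf8 = unicode_name.encode("utf-8")
    # Data = Version (1 byte, currently 1) + NameCRC32 (4 bytes) + UTF-8 name
    data = struct.pack("<BI", 1, zlib.crc32(header_name)) + utf8
    return struct.pack("<HH", UNICODE_PATH_TAG, len(data)) + data

def read_unicode_path(extra: bytes, header_name: bytes):
    """Return the UTF-8 name from a 0x7075 block, or None if absent or stale."""
    pos = 0
    while pos + 4 <= len(extra):
        tag, size = struct.unpack_from("<HH", extra, pos)
        body = extra[pos + 4 : pos + 4 + size]
        if tag == UNICODE_PATH_TAG and len(body) >= 5:
            version, name_crc = struct.unpack_from("<BI", body, 0)
            # The CRC must match the name actually stored in the header;
            # otherwise the entry was renamed and the Unicode name is stale.
            if version == 1 and name_crc == zlib.crc32(header_name):
                return body[5:].decode("utf-8")
        pos += 4 + size
    return None

# Assumed example: the MBCS name as Korean Windows (code page 949) would store it.
mbcs_name = "한국어.txt".encode("cp949")
extra = build_unicode_path_extra(mbcs_name, "한국어.txt")

print(read_unicode_path(extra, mbcs_name))       # 한국어.txt
print(read_unicode_path(extra, b"renamed.txt"))  # None (CRC mismatch)
print(len(extra))                                # 22
```

For this 13-byte UTF-8 name the block is 22 bytes, which matches the "tens of bytes" of overhead described above; the exact total depends on the filename lengths and on whether the block is written to both the local and central directory headers.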

Because most archivers (such as 7-Zip, WinRAR, and WinZip) support this field, this option prevents filenames from breaking when the archive is opened on an OS with a different system language.

The picture below shows the difference between using this option and not using it when a ZIP file created on a Korean OS is opened on a Japanese OS.
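The effect in that comparison can be reproduced in a few lines of Python. This is a simplified sketch, not how any particular archiver is implemented: the hypothetical `display_name` function prefers a UTF-8 name from the extra field when one is available, and otherwise decodes the MBCS name with the local system code page (949 for Korean Windows and 932 for Japanese Windows are the assumed code pages here).

```python
def display_name(header_name: bytes, system_codepage: str, unicode_name=None) -> str:
    """Pick the name an archiver would show for one ZIP entry (simplified sketch)."""
    if unicode_name is not None:
        # A valid Unicode Path Extra Field wins: UTF-8 is code-page independent.
        return unicode_name
    # Otherwise the MBCS name can only be decoded with the local system code page.
    return header_name.decode(system_codepage, errors="replace")

korean_name = "한국어.txt"
stored = korean_name.encode("cp949")  # written on Korean Windows (code page 949)

# Opened on Japanese Windows (code page 932):
print(display_name(stored, "cp932"))               # garbled: cp932 has no Hangul
print(display_name(stored, "cp932", korean_name))  # 한국어.txt
```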



Use Unicode filenames in tar/tgz files (UTF-8)



TAR and TGZ are archive formats used mainly on Unix, which typically uses UTF-8 for filenames. With this option, TAR/TGZ files can be extracted on Unix without filename issues.
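A TAR header stores the entry name as raw bytes in the first 100-byte field of each 512-byte header block, so the encoding is whatever the writer chose. The Python sketch below uses the standard `tarfile` module with an explicit `encoding="utf-8"` to write a Korean filename and then inspects the raw name field. It illustrates the idea rather than Bandizip's implementation; a TGZ file is the same TAR data compressed with gzip (mode `"w:gz"`), so its names are stored the same way. `make_tar` is a hypothetical helper.

```python
import io
import tarfile

def make_tar(name: str, payload: bytes) -> bytes:
    """Write an in-memory TAR with one file whose name is encoded as UTF-8."""
    buf = io.BytesIO()
    # GNU format with an explicit UTF-8 encoding: the name is written as UTF-8 bytes.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                      encoding="utf-8") as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

data = make_tar("한국어.txt", b"hello")

# The first 100 bytes of the first header block hold the NUL-padded name field.
raw_name = data[:100].rstrip(b"\0")
print(raw_name.hex(" "))

# Reading back with the same encoding recovers the original name.
with tarfile.open(fileobj=io.BytesIO(data), encoding="utf-8") as tf:
    names = tf.getnames()
print(names)
```

A reader that assumes a different encoding would decode the same bytes into different characters, which is the situation the note below describes for some Windows applications.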

NOTE: Some Windows applications may not correctly recognize UTF-8 filenames in TAR/TGZ files.