Changes

Portable Filenames

9,916 bytes removed, 02:33, 8 October 2011
empty page redirect to where mentioned
In order to be able to move our media files from one computer to another, it is critical that the names of our files can be understood by the different file systems and encodings they meet. This article looks for a set of characters that meets all these criteria. It is originally based on content from the Wikipedia online encyclopedia, especially the articles [http://en.wikipedia.org/wiki/Filename Filenames], [http://en.wikipedia.org/wiki/Comparison_of_file_systems Comparison of file systems] and [http://en.wikipedia.org/wiki/Ascii ASCII character encoding]. Please add other references to improve this article.

== File system issues ==
For genealogy purposes you will need to decide how many different situations you want your files to be usable in. To see what file systems you have on your computer !!NEEDS TO BE WRITTEN!!

If you sometimes use a USB key, remember that USB keys typically use the FAT32 file system, which does not accept the same file names as (for example) Ubuntu Linux.

This page assumes you want to support the situations listed below and don't use more than 255 characters in the file names.

* Windows accessing hard drives with the
** NTFS file system
** FAT32 file system
* Linux accessing hard drives with the
** NTFS file system
** FAT32 file system (VFAT)
** EXT2 and EXT3 file systems

It should be noted that FAT12 and FAT16 use [http://en.wikipedia.org/wiki/8.3_filename 8.3 filenames], limiting the useful file name length to 8 characters (plus a 3-character extension).

== Reserved characters and words ==
(Copied directly from [http://en.wikipedia.org/wiki/Filename Filename] 20th July 2008, 16:14 CET)

Many operating systems prohibit control characters from appearing in file names. For example, DOS and early Windows systems require files to follow the 8.3 filename convention. Unix-like systems are an exception, as the only control character forbidden in file names is the null character, as that's the end-of-string indicator in C.
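
The length and control-character rules above can be sketched in a few lines of Python. This is only an illustration, not part of the copied Wikipedia text: the function names are hypothetical, the 255 limit is measured in UTF-8 bytes as a conservative assumption, and the 8.3 check looks at lengths only.

```python
# Hypothetical helpers; names and limits are assumptions for illustration.
MAX_NAME_LENGTH = 255  # limit assumed by this page


def has_control_characters(name):
    """Return True if name contains an ASCII control character (0-31 or 127)."""
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in name)


def fits_length_limit(name, limit=MAX_NAME_LENGTH):
    """Return True if the UTF-8 encoded name is at most `limit` bytes long.

    Counting bytes is an assumption: it is the stricter reading of
    "255 characters" when a name contains non-ASCII characters.
    """
    return len(name.encode("utf-8")) <= limit


def fits_8_3(name):
    """Return True if name has 1-8 characters, then optionally a dot and up to 3.

    Only lengths are checked here, not the 8.3 character restrictions.
    """
    base, sep, ext = name.rpartition(".")
    if not sep:  # no dot at all: the whole name is the base
        base, ext = name, ""
    return 1 <= len(base) <= 8 and len(ext) <= 3 and "." not in base
```

For example, <code>fits_8_3("photo.jpg")</code> is true, while <code>fits_8_3("grandmother.jpeg")</code> is false because both parts are too long.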
Trivially, Unix also excludes the path separator / from appearing in filenames.

Some operating systems prohibit some particular characters from appearing in file names:

{| border="1"
|-
! Character
! Name
! Reason
|-
| /
| slash
| used as a path name component separator in Unix-like, Windows, and Amiga systems. (The MS-DOS command.com shell would consume it as a switch character, but Windows itself ''always'' accepts it as a separator [http://www.thescripts.com/forum/thread23123.html])
|-
| \
| backslash
| Also used as a path name component separator in MS-DOS, OS/2 and Windows (there is no difference between slash and backslash); allowed in Unix filenames, see Note 1
|-
| ?
| question mark
| used as a wildcard in Unix, Windows and AmigaOS; marks a single character. Allowed in Unix filenames, see Note 1
|-
| %
| percent sign
| used as a wildcard in RT-11; marks a single character.
|-
| *
| asterisk
| used as a wildcard in Unix, MS-DOS, RT-11, VMS and Windows. Marks any sequence of characters (Unix, Windows, later versions of MS-DOS) or any sequence of characters in either the basename or extension (thus "*.*" in early versions of MS-DOS means "all files"). Allowed in Unix filenames, see Note 1
|-
| :
| colon
| used to determine the mount point / drive on Windows; used to determine the virtual device or physical device such as a drive on AmigaOS, RT-11 and VMS; used as a pathname separator in classic Mac OS. Doubled after a name on VMS, indicates the DECnet nodename (equivalent to a NetBIOS (Windows networking) hostname preceded by "\\".)
|-
| <nowiki>|</nowiki>
| vertical bar
| designates software pipelining in Unix and Windows; allowed in Unix filenames, see Note 1
|-
| "
| quotation mark
| used to mark beginning and end of filenames containing spaces in Windows, see Note 1<!-- allowed in Unix -->
|-
| <
| less than
| used to redirect input, allowed in Unix filenames, see Note 1
|-
| >
| greater than
| used to redirect output, allowed in Unix filenames, see Note 1
|-
| .
| full stop/period
| allowed but the last occurrence will be interpreted to be the extension separator in VMS, MS-DOS and Windows. In other OSes, usually considered as part of the filename, and more than one full stop may be allowed.
|}

Note 1: Most Unix shells require certain characters such as spaces, <, >, |, \, and sometimes :, (, ), &, ;, as well as wildcards such as ? and *, to be quoted or escaped:
<blockquote>five\ and\ six\<seven (example of escaping)<br />'five and six<seven' or "five and six<seven" (examples of quoting)</blockquote>

In Windows the space and the period are not allowed as the final character of a filename. The period is allowed as the first character, but certain Windows applications, such as Windows Explorer, forbid creating or renaming such files (despite this convention being used in Unix-like systems to describe hidden files and directories). Among workarounds are using different explorer applications or saving a file from an application with the desired name.

Some file systems on a given operating system (especially file systems originally implemented on other operating systems), and particular applications on that operating system, may apply further restrictions and interpretations. See comparison of file systems for more details on restrictions imposed by particular file systems.

In Unix-like systems, MS-DOS, and Windows, the file names "." and ".."
have special meanings (current and parent directory respectively).

In addition, in Windows and DOS, some words might also be reserved and cannot be used as filenames.<ref name="win"/> For example, the DOS device files:
* CON, PRN, AUX, CLOCK$, NUL
* COM0, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
* LPT0, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9.

Operating systems that have these restrictions cause incompatibilities with some other filesystems. For example, Windows will fail to handle, or raise error reports for, these legal UNIX filenames: aux.c, q"uote"s.txt, or NUL.txt.

If you put your files onto any other file system you risk losing part of the file's name, the capitalisation of the name, or characters that are not accepted in a file name. All these problems can make it hard to find and recover your files.

== Encoding Issues ==
Please read the Wikipedia article

Many of us use computers with Unicode as the default character encoding, which supports about 100,000 characters<ref>[http://en.wikipedia.org/wiki/Unicode ref]</ref>. It is now standard for Linux systems{{fact}}. But many systems still in use default to ASCII encoding, which supports only 128 characters<ref>[http://en.wikipedia.org/wiki/Ascii ref]</ref>, not all of which can be used in texts.

The first clear choice then is to use characters which are in the ASCII character set. If we use Unicode characters we can easily use something which won't be understood by an operating system using ASCII. For example, Åse (a girl's name in Danish) simply cannot be represented in ASCII, because Å is not an ASCII character.

=== Safe characters ===
Keeping our sights firmly on the [http://en.wikipedia.org/wiki/Ascii ASCII character set], specifically the ASCII printable characters (and skipping the control characters), we get the list below.

{| border="1"
|-
! Glyph
! Name
! Safe character?
! Remarks
|-
| <nowiki> </nowiki>
| space
| {{yes}}
| Not reserved
|-
| !
| exclamation mark
| {{yes}}
| Not reserved
|-
| "
| quotation mark
| {{no}}
| used to mark beginning and end of filenames containing spaces in Windows
|-
| #
| number sign
| {{yes}}
| Not reserved
|-
| $
| dollar sign
| {{yes}}
| Not reserved. Used to start variables in many programming languages
|-
| %
| percent sign
| {{no}}
| used as a wildcard in RT-11; marks a single character
|-
| &
| ampersand
| {{no}}
|
|}

The remaining printable ASCII characters have not yet been classified:

{| border="1"
|-
! Binary !! Oct !! Dec !! Hex !! Glyph
|-
| 010 0111 || 047 || 39 || 27 || '
|-
| 010 1000 || 050 || 40 || 28 || (
|-
| 010 1001 || 051 || 41 || 29 || )
|-
| 010 1010 || 052 || 42 || 2A || *
|-
| 010 1011 || 053 || 43 || 2B || +
|-
| 010 1100 || 054 || 44 || 2C || ,
|-
| 010 1101 || 055 || 45 || 2D || -
|-
| 010 1110 || 056 || 46 || 2E || .
|-
| 010 1111 || 057 || 47 || 2F || /
|-
| 011 0000 || 060 || 48 || 30 || 0
|-
| 011 0001 || 061 || 49 || 31 || 1
|-
| 011 0010 || 062 || 50 || 32 || 2
|-
| 011 0011 || 063 || 51 || 33 || 3
|-
| 011 0100 || 064 || 52 || 34 || 4
|-
| 011 0101 || 065 || 53 || 35 || 5
|-
| 011 0110 || 066 || 54 || 36 || 6
|-
| 011 0111 || 067 || 55 || 37 || 7
|-
| 011 1000 || 070 || 56 || 38 || 8
|-
| 011 1001 || 071 || 57 || 39 || 9
|-
| 011 1010 || 072 || 58 || 3A || :
|-
| 011 1011 || 073 || 59 || 3B || ;
|-
| 011 1100 || 074 || 60 || 3C || <
|-
| 011 1101 || 075 || 61 || 3D || =
|-
| 011 1110 || 076 || 62 || 3E || >
|-
| 011 1111 || 077 || 63 || 3F || ?
|-
| 100 0000 || 100 || 64 || 40 || @
|-
| 100 0001 || 101 || 65 || 41 || A
|-
| 100 0010 || 102 || 66 || 42 || B
|-
| 100 0011 || 103 || 67 || 43 || C
|-
| 100 0100 || 104 || 68 || 44 || D
|-
| 100 0101 || 105 || 69 || 45 || E
|-
| 100 0110 || 106 || 70 || 46 || F
|-
| 100 0111 || 107 || 71 || 47 || G
|-
| 100 1000 || 110 || 72 || 48 || H
|-
| 100 1001 || 111 || 73 || 49 || I
|-
| 100 1010 || 112 || 74 || 4A || J
|-
| 100 1011 || 113 || 75 || 4B || K
|-
| 100 1100 || 114 || 76 || 4C || L
|-
| 100 1101 || 115 || 77 || 4D || M
|-
| 100 1110 || 116 || 78 || 4E || N
|-
| 100 1111 || 117 || 79 || 4F || O
|-
| 101 0000 || 120 || 80 || 50 || P
|-
| 101 0001 || 121 || 81 || 51 || Q
|-
| 101 0010 || 122 || 82 || 52 || R
|-
| 101 0011 || 123 || 83 || 53 || S
|-
| 101 0100 || 124 || 84 || 54 || T
|-
| 101 0101 || 125 || 85 || 55 || U
|-
| 101 0110 || 126 || 86 || 56 || V
|-
| 101 0111 || 127 || 87 || 57 || W
|-
| 101 1000 || 130 || 88 || 58 || X
|-
| 101 1001 || 131 || 89 || 59 || Y
|-
| 101 1010 || 132 || 90 || 5A || Z
|-
| 101 1011 || 133 || 91 || 5B || [
|-
| 101 1100 || 134 || 92 || 5C || \
|-
| 101 1101 || 135 || 93 || 5D || ]
|-
| 101 1110 || 136 || 94 || 5E || ^
|-
| 101 1111 || 137 || 95 || 5F || _
|-
| 110 0000 || 140 || 96 || 60 || `
|-
| 110 0001 || 141 || 97 || 61 || a
|-
| 110 0010 || 142 || 98 || 62 || b
|-
| 110 0011 || 143 || 99 || 63 || c
|-
| 110 0100 || 144 || 100 || 64 || d
|-
| 110 0101 || 145 || 101 || 65 || e
|-
| 110 0110 || 146 || 102 || 66 || f
|-
| 110 0111 || 147 || 103 || 67 || g
|-
| 110 1000 || 150 || 104 || 68 || h
|-
| 110 1001 || 151 || 105 || 69 || i
|-
| 110 1010 || 152 || 106 || 6A || j
|-
| 110 1011 || 153 || 107 || 6B || k
|-
| 110 1100 || 154 || 108 || 6C || l
|-
| 110 1101 || 155 || 109 || 6D || m
|-
| 110 1110 || 156 || 110 || 6E || n
|-
| 110 1111 || 157 || 111 || 6F || o
|-
| 111 0000 || 160 || 112 || 70 || p
|-
| 111 0001 || 161 || 113 || 71 || q
|-
| 111 0010 || 162 || 114 || 72 || r
|-
| 111 0011 || 163 || 115 || 73 || s
|-
| 111 0100 || 164 || 116 || 74 || t
|-
| 111 0101 || 165 || 117 || 75 || u
|-
| 111 0110 || 166 || 118 || 76 || v
|-
| 111 0111 || 167 || 119 || 77 || w
|-
| 111 1000 || 170 || 120 || 78 || x
|-
| 111 1001 || 171 || 121 || 79 || y
|-
| 111 1010 || 172 || 122 || 7A || z
|-
| 111 1011 || 173 || 123 || 7B || {
|-
| 111 1100 || 174 || 124 || 7C || <nowiki>|</nowiki>
|-
| 111 1101 || 175 || 125 || 7D || }
|-
| 111 1110 || 176 || 126 || 7E || ~
|}
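
As a rough sketch of how the rules on this page could be applied to a single file name, the Python function below reports why a name may not be portable. The function and constant names are hypothetical, the character set is taken from the reserved characters table, the device names from the DOS device list, and treating every non-ASCII character as a problem is the conservative choice argued for under Encoding Issues. It is an illustration under those assumptions, not a complete validator.

```python
# Hypothetical checker; all names here are assumptions for illustration.

# Characters from the reserved characters table on this page.
RESERVED_CHARACTERS = set('/\\?%*:|"<>')

# DOS device names listed on this page (compared case-insensitively).
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "CLOCK$", "NUL"}
    | {"COM%d" % i for i in range(10)}
    | {"LPT%d" % i for i in range(10)}
)


def portability_problems(name):
    """Return a list of reasons why `name` may not be portable (empty if none)."""
    problems = []
    if name in (".", ".."):
        problems.append("reserved directory name")
    bad = sorted(set(name) & RESERVED_CHARACTERS)
    if bad:
        problems.append("reserved characters: " + " ".join(bad))
    if name.endswith((" ", ".")):
        problems.append("ends with a space or period")
    # The part before the first period decides whether a device name is hit,
    # which is why aux.c and NUL.txt are rejected as well.
    stem = name.split(".", 1)[0].upper()
    if stem in RESERVED_NAMES:
        problems.append("reserved device name: " + stem)
    if any(ord(ch) > 127 for ch in name):
        problems.append("contains non-ASCII characters")
    return problems


if __name__ == "__main__":
    for candidate in ["photo_1950.jpg", "aux.c", 'q"uote"s.txt', "NUL.txt"]:
        print(candidate, portability_problems(candidate))
```

Returning a list of reasons rather than a single yes/no makes it easier to decide how a rejected name should be renamed.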
