Portable Filenames

From Gramps
Revision as of 14:13, 20 July 2008

In order to be able to move our media files from one computer to another, it is important that their names can be understood by the different operating systems, file systems and character encodings they encounter.

The Wikipedia article Filename has a good explanation of the relevant issues. This page is largely an extract from that article.

For genealogy purposes you will need to decide how many different situations you want your files to be usable in. To see what file systems you have on your computer !!NEEDS TO BE WRITTEN!! If you sometimes use a USB key, remember that USB keys typically use the FAT32 file system, which does not accept the same file names as, for example, the ext3 file system used by Ubuntu Linux.

This page assumes you want to support the situations listed below and don't use more than 255 characters in the file names.

  • Windows accessing hard drives with the
    • NTFS file system
    • FAT32 file system
  • Linux accessing hard drives with the
    • NTFS file system
    • FAT32 file system (VFAT)
    • EXT2 and EXT3 file systems

It should be noted that FAT12 and FAT16, when used without long file name (VFAT) support, only store 8.3 filenames. This limits a file name to 8 characters plus an extension of up to 3 characters.
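As a rough illustration of the 8.3 limit, the hypothetical helper below tests whether a name fits the 8.3 form. It is written in Python, the language Gramps is written in. It is only a sketch and assumes a deliberately conservative character set of letters, digits, underscore and hyphen. Real DOS accepts a few more punctuation characters.

```python
import re

# Conservative 8.3 pattern (an assumption of this sketch): 1-8 characters,
# optionally followed by a period and 1-3 more characters, drawn from
# letters, digits, underscore and hyphen only.
EIGHT_THREE = re.compile(r"[A-Z0-9_-]{1,8}(\.[A-Z0-9_-]{1,3})?")


def fits_8_3(name: str) -> bool:
    """Return True if name fits the 8.3 form (DOS stores names in upper case)."""
    return EIGHT_THREE.fullmatch(name.upper()) is not None


print(fits_8_3("smith.jpg"))         # True
print(fits_8_3("smith-family.jpg"))  # False: base name is longer than 8
print(fits_8_3("photo.jpeg"))        # False: extension is longer than 3
```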

Reserved characters and words

(Copied directly from the Wikipedia article Filename.)

Many operating systems prohibit control characters from appearing in file names. DOS and early Windows systems additionally require files to follow the 8.3 filename convention. Unix-like systems are an exception: the only control character forbidden in file names is the null character, because it marks the end of a string in C. Trivially, Unix also excludes the path separator / from appearing in file names.
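Here is a minimal sketch of the Unix rule just described. It assumes that `is_valid_unix_name` is a hypothetical helper that checks a single path component rather than a full path, and it ignores length limits.

```python
def is_valid_unix_name(name: str) -> bool:
    """Return True if name is usable as one file name component on Unix.

    Only the null character and the path separator "/" are forbidden;
    the empty string is not a file name either.
    """
    return name != "" and "\0" not in name and "/" not in name


print(is_valid_unix_name('q"uote"s.txt'))        # True
print(is_valid_unix_name("five and six<seven"))  # True
print(is_valid_unix_name("1850/smith.jpg"))      # False: contains "/"
```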

Some operating systems prohibit particular characters from appearing in file names:

  • / (slash): used as a path name component separator in Unix-like, Windows, and Amiga systems. (The MS-DOS command.com shell would consume it as a switch character, but Windows itself always accepts it as a separator [1].)
  • \ (backslash): also used as a path name component separator in MS-DOS, OS/2 and Windows, where there is no difference between slash and backslash; allowed in Unix filenames, see Note 1.
  • ? (question mark): used as a wildcard in Unix, Windows and AmigaOS, where it matches a single character; allowed in Unix filenames, see Note 1.
  • % (percent sign): used as a wildcard in RT-11, where it matches a single character.
  • * (asterisk): used as a wildcard in Unix, MS-DOS, RT-11, VMS and Windows. It matches any sequence of characters (Unix, Windows, later versions of MS-DOS) or any sequence of characters in either the basename or the extension (so "*.*" in early versions of MS-DOS means "all files"); allowed in Unix filenames, see Note 1.
  • : (colon): used to specify the drive or mount point on Windows; used to specify the virtual or physical device, such as a drive, on AmigaOS, RT-11 and VMS; used as a path name separator in classic Mac OS. Doubled after a name on VMS, it indicates the DECnet node name (equivalent to a NetBIOS (Windows networking) host name preceded by "\\").
  • | (vertical bar): creates a software pipeline in Unix and Windows shells; allowed in Unix filenames, see Note 1.
  • " (quotation mark): used to mark the beginning and end of filenames containing spaces in Windows; allowed in Unix filenames, see Note 1.
  • < (less than): used to redirect input; allowed in Unix filenames, see Note 1.
  • > (greater than): used to redirect output; allowed in Unix filenames, see Note 1.
  • . (period): allowed, but the last occurrence is interpreted as the extension separator in VMS, MS-DOS and Windows. In other operating systems it is usually considered part of the filename, and more than one period may be allowed.
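The sketch below checks a name against Windows. It uses the set of characters that MSDN's "Naming a file" page lists as reserved, which are < > : " / \ | ? * plus the control characters 0 to 31. It does not use every character listed above: Windows accepts the percent sign and the period, for instance. The function name `forbidden_on_windows` is hypothetical.

```python
# Characters reserved by Windows according to MSDN "Naming a file":
# < > : " / \ | ? * and the control characters with codes 0-31.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*') | {chr(code) for code in range(32)}


def forbidden_on_windows(name: str) -> set:
    """Return the characters in name that Windows would reject."""
    return {char for char in name if char in WINDOWS_FORBIDDEN}


print(forbidden_on_windows('q"uote"s.txt'))        # {'"'}
print(forbidden_on_windows("five and six<seven"))  # {'<'}
print(forbidden_on_windows("smith_1850.jpg"))      # set()
```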

Note 1: Most Unix shells require certain characters such as spaces, <, >, |, \, and sometimes :, (, ), &, ;, as well as wildcards such as ? and *, to be quoted or escaped:

five\ and\ six\<seven (example of escaping)
'five and six<seven' or "five and six<seven" (examples of quoting)
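In Python scripts, the standard library function `shlex.quote` produces the quoted form for POSIX shells. It is not intended for the Windows command prompt. Here is a short sketch using the example name above:

```python
import shlex

name = "five and six<seven"

# shlex.quote wraps the name in single quotes because it contains characters
# (here spaces and "<") that a POSIX shell would otherwise interpret.
quoted = shlex.quote(name)
print(quoted)                # 'five and six<seven'
print("ls -l -- " + quoted)  # a command line that is safe to pass to sh

# Names made only of "safe" characters are returned unchanged.
print(shlex.quote("smith.jpg"))  # smith.jpg
```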

In Windows, the space and the period are not allowed as the final character of a filename. The period is allowed as the first character, but certain Windows applications, such as Windows Explorer, forbid creating or renaming such files, even though Unix-like systems use this convention to mark hidden files and directories. Workarounds include using a different file manager or saving the file under the desired name from within an application. (Source: "Naming a file", msdn.microsoft.com (MSDN), filename restrictions on Windows.)
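Here is a sketch of the trailing-character rule, using a hypothetical helper name. It only flags a trailing space or period. A leading period is accepted, since Windows itself allows it even if Windows Explorer may not.

```python
def bad_windows_ending(name: str) -> bool:
    """Return True if Windows would refuse name because of its last character."""
    # Windows does not allow a file name to end in a space or a period.
    return name.endswith((" ", "."))


print(bad_windows_ending("notes."))   # True
print(bad_windows_ending("notes "))   # True
print(bad_windows_ending(".hidden"))  # False: a leading period is allowed
```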

Some file systems on a given operating system (especially file systems originally implemented on other operating systems), and particular applications on that operating system, may apply further restrictions and interpretations. See the Wikipedia article Comparison of file systems for more details on the restrictions imposed by particular file systems.

In Unix-like systems, MS-DOS, and Windows, the file names "." and ".." have special meanings (current and parent directory respectively).
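You can see the special meaning of "." and ".." with Python's `posixpath` and `ntpath` modules. These modules apply Unix-style and Windows-style path rules on any platform. Their `normpath` function removes "." components and lets ".." cancel the directory before it. The folder names are made up for the example.

```python
import ntpath
import posixpath

# "." means "this directory" and ".." means "the parent directory", so
# normpath can drop "." and let ".." cancel the directory before it.
print(posixpath.normpath("photos/./1850/../smith.jpg"))   # photos/smith.jpg
print(ntpath.normpath("photos\\.\\1850\\..\\smith.jpg"))  # photos\smith.jpg

# ntpath also accepts "/" as a separator and converts it to "\".
print(ntpath.normpath("photos/1850/../smith.jpg"))        # photos\smith.jpg
```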

In addition, Windows and DOS reserve some names that cannot be used as filenames (see "Naming a file" on MSDN). Examples include the DOS device file names:

CON, PRN, AUX, CLOCK$, NUL
COM0, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
LPT0, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9.

Operating systems with these restrictions are incompatible with some other file systems. For example, Windows will fail to handle, or raise errors for, the following filenames, which are legal on Unix: aux.c, q"uote"s.txt and NUL.txt.
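Here is a minimal sketch of a reserved-name check built from the list above. In line with the aux.c and NUL.txt examples, it assumes that Windows compares only the part of the name before the first period and ignores case. `is_reserved_name` is a hypothetical helper.

```python
# Reserved DOS/Windows device names from the list above.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "CLOCK$", "NUL"}
    | {f"COM{i}" for i in range(10)}
    | {f"LPT{i}" for i in range(10)}
)


def is_reserved_name(name: str) -> bool:
    """Return True if name uses a reserved device name, with or without extension."""
    # Assumption of this sketch: only the part before the first period matters
    # and case is ignored, which is why aux.c and NUL.txt are rejected.
    base = name.split(".", 1)[0]
    return base.upper() in RESERVED_NAMES


print(is_reserved_name("aux.c"))     # True
print(is_reserved_name("NUL.txt"))   # True
print(is_reserved_name("file.aux"))  # False
```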

If you copy your files to a file system other than those listed above, you risk losing part of a file's name, its capitalisation, or characters that are not accepted in file names. All of these problems can make it hard to find and recover your files.
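FAT32, and NTFS as used by Windows, do not distinguish upper and lower case. As a result, two files whose names differ only in capitalisation cannot both be kept in one folder there. The hypothetical helper below uses Python's `str.casefold` to group names that would clash. It is a sketch, and the case-folding only approximates the file systems' own comparison.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that differ only in upper/lower case."""
    groups = defaultdict(list)
    for name in names:
        # casefold() approximates the case-insensitive comparison these
        # file systems make; it is not an exact model of FAT or NTFS rules.
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_collisions(["Smith.jpg", "smith.JPG", "Jones.jpg"]))
# [['Smith.jpg', 'smith.JPG']]
```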

!! UNFINISHED !!