Difference between revisions of "Portable Filenames"

From Gramps
{{languages|Portable_Filenames}}
#redirect [[Meaningful filenames]]
 
 
 
 
 
 
In order to be able to move our media files from one computer to another it is critical that the names of our files can be understood by the different file systems and encodings they meet.
 
 
 
To find a set of characters that meets all these criteria, this article draws on content from Wikipedia, especially the articles [http://en.wikipedia.org/wiki/Filename Filenames], [http://en.wikipedia.org/wiki/Comparison_of_file_systems Comparison of file systems] and [http://en.wikipedia.org/wiki/Ascii ASCII character encoding]. Please add other references to improve this article.
 
 
 
== File system issues ==
 
For genealogy purposes you will need to decide how many different situations you want your files to be usable in. To see what file systems you have on your computer !!NEEDS TO BE WRITTEN!! If you sometimes use a USB key you should remember that they typically use the FAT32 file system, which does not accept the same file names as (for example) Ubuntu Linux.
 
 
 
This page assumes you want to support the situations listed below and don't use more than 255 characters in the file names.
 
 
 
* Windows accessing hard drives with the
 
** NTFS file system
 
** FAT32 file system
 
* Linux accessing hard drives with the
 
** NTFS file system
 
** FAT32 file system (VFAT)
 
** EXT2 and EXT3 file systems
 
 
 
It should be noted that FAT12 and FAT16 use [http://en.wikipedia.org/wiki/8.3_filename 8.3 filenames], limiting the useful file name length to 8 characters (and three for the extension) before the introduction of Long Filenames ([http://en.wikipedia.org/wiki/Long_filename LFN]) in 1994.
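
As a rough illustration of the 8.3 length limit, the following Python sketch tests whether a name fits the pattern of up to 8 characters plus an optional extension of up to 3 characters. The function name <code>fits_8_3</code> is hypothetical, and the character class (letters, digits, underscore, hyphen) is a deliberately conservative assumption, not the full set that FAT accepts.

```python
import re

# 1-8 characters for the base name, then an optional extension of 1-3
# characters. The allowed characters are an assumption made for this
# sketch (letters, digits, underscore, hyphen), not the complete FAT set.
_EIGHT_DOT_THREE = re.compile(r"[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9_-]{1,3})?")


def fits_8_3(filename):
    """Return True if filename fits the 8.3 length pattern."""
    return _EIGHT_DOT_THREE.fullmatch(filename) is not None
```

With this sketch, <code>fits_8_3("PHOTO01.JPG")</code> is true, while <code>fits_8_3("grandmother.jpeg")</code> is false because both the base name and the extension are too long.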
 
 
 
=== Reserved characters and words ===
 
(Copied directly from [http://en.wikipedia.org/wiki/Filename Filename] 20th July 2008, 16:14 CET)
 
 
 
Many operating systems prohibit control characters from appearing in file names. For example, DOS and early Windows systems require files to follow the 8.3 filename convention. Unix-like systems are an exception, as the only control character forbidden in file names is the null character, as that's the end-of-string indicator in C. Trivially, Unix also excludes the path separator / from appearing in filenames.
 
 
 
Some operating systems prohibit some particular characters from appearing in file names:
 
 
 
{| border="1"
 
|-
 
! Character
 
! Name
 
! Reason
 
|-
 
| /
 
| slash
 
| used as a path name component separator in Unix-like, Windows, and Amiga systems. (The MS-DOS command.com shell would consume it as a switch character, but Windows itself ''always'' accepts it as a separator [http://www.thescripts.com/forum/thread23123.html])
 
|-
 
| \
 
| backslash
 
| Also used as a path name component separator in MS-DOS, OS/2 and Windows (there is no difference between slash and backslash); allowed in Unix filenames, see Note 1
 
|-
 
| ?
 
| question mark
 
| used as a wildcard in Unix, Windows and AmigaOS; marks a single character.  Allowed in Unix filenames, see Note 1
 
|-
 
| %
 
| percent sign
 
| used as a wildcard in RT-11; marks a single character.
 
|-
 
| *
 
| asterisk
 
| used as a wildcard in Unix, MS-DOS, RT-11, VMS and Windows. Marks any sequence of characters (Unix, Windows, later versions of MS-DOS) or any sequence of characters in either the basename or extension (thus "*.*" in early versions of MS-DOS means "all files"). Allowed in Unix filenames, see Note 1
 
|-
 
| :
 
| colon
 
| used to determine the mount point / drive on Windows; used to determine the virtual device or physical device such as a drive on AmigaOS, RT-11 and VMS; used as a pathname separator in classic Mac OS. Doubled after a name on VMS, indicates the DECnet nodename (equivalent to a NetBIOS (Windows networking) hostname preceded by "\\".)
 
|-
 
| <nowiki>|</nowiki>
 
| vertical bar
 
| designates a pipe (passing the output of one command to another) in Unix and Windows shells; allowed in Unix filenames, see Note 1
 
|-
 
| "
 
| quotation mark
 
| used to mark beginning and end of filenames containing spaces in Windows, see Note 1<!-- allowed in Unix -->
 
|-
 
| <
 
| less than
 
| used to redirect input, allowed in Unix filenames, see Note 1
 
|-
 
| >
 
| greater than
 
| used to redirect output, allowed in Unix filenames, see Note 1
 
|-
 
| .
 
| full stop/period
 
| allowed but the last occurrence will be interpreted to be the extension separator in VMS, MS-DOS and Windows. In other OSes, usually considered as part of the filename, and more than one full stop may be allowed.
 
|}
 
 
 
In Windows the space and the period are not allowed as the final character of a filename. The period is allowed as the first character, but certain Windows applications, such as Windows Explorer, forbid creating or renaming such files (despite this convention being used in Unix-like systems to describe hidden files and directories). Among workarounds are using different explorer applications or saving a file from an application with the desired name.
 
 
 
Some file systems on a given operating system (especially file systems originally implemented on other operating systems), and particular applications on that operating system, may apply further restrictions and interpretations. See comparison of file systems for more details on restrictions imposed by particular file systems.
 
 
 
In Unix-like systems, MS-DOS, and Windows, the file names "." and ".." have special meanings (current and parent directory respectively).
 
 
 
In addition, in Windows and DOS, some words are reserved and cannot be used as filenames.<ref name="win"/> For example, the DOS device file names:
 
CON, PRN, AUX, CLOCK$, NUL
 
COM0, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
 
LPT0, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9.([http://msdn.microsoft.com/en-us/library/aa365247.aspx ref])
 
Operating systems that have these restrictions cause incompatibilities with some other filesystems. For example, Windows will fail to handle, or raise error reports for, these legal UNIX filenames: aux.c, q"uote"s.txt, or NUL.txt.
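
The restrictions described in this section can be collected into a small check. The Python sketch below is only an illustration under stated assumptions: the function name <code>filename_problems</code> and the wording of its messages are hypothetical, the character list is taken from the table above, the device names from the list above, and a name is treated as reserved if the part before its first period matches a device name (which is why <code>aux.c</code> and <code>NUL.txt</code> are flagged). It checks a single file name, not a full path.

```python
from typing import List

# Characters listed in the "Reserved characters" table above.
RESERVED_CHARS = set('/\\?%*:|"<>')

# DOS device names listed above; comparison is case-insensitive.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "CLOCK$", "NUL"}
    | {"COM%d" % i for i in range(10)}
    | {"LPT%d" % i for i in range(10)}
)


def filename_problems(name: str) -> List[str]:
    """Return a list of portability problems found in a single file name."""
    problems = []
    if name in (".", ".."):
        problems.append("name has a special meaning (current/parent directory)")
    bad = sorted(set(name) & RESERVED_CHARS)
    if bad:
        problems.append("reserved characters: " + " ".join(bad))
    if any(ord(c) < 32 for c in name):
        problems.append("contains control characters")
    if name.endswith((" ", ".")):
        problems.append("ends with a space or a period")
    # Assumption: the part before the first period decides whether the
    # name collides with a device name, e.g. "aux.c" -> "AUX".
    stem = name.split(".", 1)[0].upper()
    if stem in RESERVED_NAMES:
        problems.append("reserved device name: " + stem)
    if len(name) > 255:
        problems.append("longer than 255 characters")
    return problems
```

An empty list means that no problem from this section was detected; it does not guarantee that every file system will accept the name.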
 
 
 
If you put your files onto any other file system you risk losing part of the file's name, the capitalisation of the name, and any characters not accepted in a file name. All these problems can make it hard to find and recover your files.
 
 
 
== Encoding Issues ==
 
Please read the Wikipedia article
 
Many of us use computers with Unicode as the default character encoding, which supports about 100,000 characters<ref>[http://en.wikipedia.org/wiki/Unicode ref]</ref>. It is now standard for Linux systems{{fact}}. However, many systems still in use default to ASCII encoding, which supports only 128 characters<ref>[http://en.wikipedia.org/wiki/Ascii ref]</ref>, not all of which can be used in text.
 
 
 
The first clear choice, then, is to use only characters which are in the ASCII character set. If we use other Unicode characters we can easily produce a name which won't be understood by an operating system using ASCII. For example, Åse (a girl's name in Danish) cannot be written in ASCII, because the letter Å is not part of the ASCII character set.
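
A quick way to test a name against this rule is to try encoding it as ASCII. The following Python sketch shows the idea; the function names <code>is_ascii</code> and <code>non_ascii_chars</code> are hypothetical helpers chosen for this example.

```python
def is_ascii(name):
    """Return True if every character of name is in the ASCII set."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def non_ascii_chars(name):
    """Return the characters of name that fall outside ASCII, in order."""
    return [c for c in name if ord(c) > 127]
```

For the example above, <code>is_ascii("Åse.jpg")</code> is false and <code>non_ascii_chars("Åse.jpg")</code> reports the single offending letter Å, while <code>is_ascii("Ase.jpg")</code> is true.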
 
 
 
=== Safe characters ===
 
 
 
Keeping our sights firmly on the [http://en.wikipedia.org/wiki/Ascii ASCII character set], specifically the ASCII printable characters (and skipping the [http://en.wikipedia.org/wiki/Control_character control characters]), we get the list below of available characters.
 
 
 
{| border="1"
 
|-
 
! Glyph
 
! Name
 
! Safe character?
 
! Remarks
 
|-
 
| a-z
 
| English letters ''a'' through to ''z''
 
| {{yes}}
 
| Not reserved[http://en.wikipedia.org/wiki/Filename ref]
 
|-
 
| A-Z
 
| English letters ''A'' through to ''Z''
 
| {{yes}}
 
| Not reserved[http://en.wikipedia.org/wiki/Filename ref]
 
|-
 
| 0-9
 
| Digits ''0'' through to ''9''
 
| {{yes}}
 
| Not reserved[http://en.wikipedia.org/wiki/Filename ref]
 
|-
 
|
 
| space
 
| {{maybe}}
 
| In Windows the space and the period are not allowed as the final character of a filename. Not reserved[http://en.wikipedia.org/wiki/Filename ref]
 
|-
 
| !
 
| exclamation mark
 
| {{yes}}
 
| Not reserved[http://en.wikipedia.org/wiki/Filename ref]
 
|-
 
| "
 
| quotation mark
 
| {{no}}
 
| Used to mark beginning and end of filenames containing spaces in Windows([http://msdn.microsoft.com/en-us/library/aa365247.aspx ref])
 
|-
 
| #
 
| number sign
 
| {{yes}}
 
| Not reserved[http://en.wikipedia.org/wiki/Filename ref]
 
|-
 
| $
 
| dollar sign
 
| {{yes}}
 
| Not reserved[http://en.wikipedia.org/wiki/Filename ref]. Used to start variables in many programming languages
 
|-
 
| %
 
| percent sign
 
| {{no}}
 
| used as a wildcard in RT-11; marks a single character[http://en.wikipedia.org/wiki/Filename ref]
 
|-
 
| &
 
| Ampersand
 
| {{yes}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename ref])
 
|-
 
| '
 
| Apostrophe
 
| {{yes}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename ref])
 
|-
 
| ( and )
 
| Parentheses
 
| {{yes}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename ref])
 
|-
 
| *
 
| Asterisk
 
| {{no}}
 
| Used as a wildcard in Unix, MS-DOS, RT-11, VMS and Windows. Marks any sequence of characters (Unix, Windows, later versions of MS-DOS) or any sequence of characters in either the basename or extension (thus "*.*" in early versions of MS-DOS means "all files") ([http://msdn.microsoft.com/en-us/library/aa365247.aspx Win ref]). Allowed in Unix filenames ([http://en.wikipedia.org/wiki/Filename ref]).
 
|-
 
| +
 
| Plus
 
| {{yes}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename ref])
 
|-
 
| ,
 
| Comma
 
| {{yes}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename ref])
 
|-
 
| -
 
| Hyphen
 
| {{yes}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename ref])
 
|-
 
| .
 
| Period / Full stop
 
| {{maybe}}
 
| Not reserved. The last occurrence will be interpreted to be the extension separator in VMS, MS-DOS and Windows. Do not end a file or directory name with a trailing space or a period. Although the underlying file system may support such names, the Windows operating system does not ([http://msdn.microsoft.com/en-us/library/aa365247.aspx Win ref]).
 
|-
 
| / and \
 
| Slash and Backslash
 
| {{no}}
 
| Slash is used as a path name component separator in Unix-like, Windows, and Amiga systems. Backslash is also used as a path name component separator in MS-DOS, OS/2 and Windows (there is no difference between slash and backslash) ([http://msdn.microsoft.com/en-us/library/aa365247.aspx Win ref], [http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words Unix ref])
 
|-
 
| :
 
| Colon
 
| {{no}}
 
| Reserved character on Windows([http://msdn.microsoft.com/en-us/library/aa365247.aspx ref]).
 
|-
 
| ;
 
| Semicolon
 
| {{yes}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename ref])
 
|-
 
| <
 
| Less than sign
 
| {{no}}
 
| Used to redirect input([http://msdn.microsoft.com/en-us/library/aa365247.aspx ref])
 
|-
 
| =
 
| Equals sign
 
| {{yes}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words ref])
 
|-
 
| >
 
| Greater than sign
 
| {{no}}
 
| Used to redirect output([http://msdn.microsoft.com/en-us/library/aa365247.aspx ref])
 
|-
 
| ?
 
| Question mark
 
| {{no}}
 
| Used as a wildcard in Unix, Windows and AmigaOS; marks a single character([http://msdn.microsoft.com/en-us/library/aa365247.aspx Win ref]). Allowed in Unix filenames([http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words ref])
 
|-
 
| @
 
| At sign
 
| {{yes}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words ref])
 
|-
 
| [ and ]
 
| square brackets or box brackets
 
| {{yes}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words ref])
 
|-
 
| ^
 
| Caret
 
| {{yes}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words ref])
 
|-
 
| _
 
| Underscore
 
| {{yes}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words ref])
 
|-
 
| `
 
| Grave accent
 
| {{no}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words ref]), but outside the US often replaced by the local currency symbol. Many older UK computers, such as the ZX Spectrum and BBC Micro, have the £ symbol in its place.([http://en.wikipedia.org/wiki/Grave_accent#Computer-related ref])
 
|-
 
| { and }
 
| Curly brackets
 
| {{yes}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words ref])
 
|-
 
| &#124;
 
| Vertical bar
 
| {{no}}
 
| Designates a pipe (passing the output of one command to another) in Unix and Windows shells([http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words Unix ref], [http://msdn.microsoft.com/en-us/library/aa365247.aspx Win ref])
 
|-
 
| ~
 
| Tilde
 
| {{yes}}
 
| Not reserved([http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words ref])
 
|-
 
|}
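
The table can also be turned into a simple clean-up routine. The Python sketch below is an illustration, not a tested Gramps feature: the name <code>sanitize</code> and the choice of underscore as the replacement are assumptions. It keeps letters, digits and the punctuation whose remarks say "Not reserved", allows space and period inside a name but strips them from the end, and replaces everything else. The colon is treated as unsafe because its remark says it is reserved on Windows. Reserved device names such as CON or NUL are not handled here.

```python
import string

# Punctuation described as "Not reserved" in the table above. The colon
# is left out because it is a reserved character on Windows, and the
# grave accent is left out because the table marks it as not safe.
SAFE_PUNCTUATION = "!#$&'()+,-;=@[]^_{}~"

# Space and period are "maybe": fine inside a name, not at the end.
MAYBE_CHARS = " ."

ALLOWED = set(string.ascii_letters + string.digits + SAFE_PUNCTUATION + MAYBE_CHARS)


def sanitize(name, replacement="_"):
    """Replace characters outside the safe set and trim trailing ' ' and '.'."""
    cleaned = "".join(c if c in ALLOWED else replacement for c in name)
    cleaned = cleaned.rstrip(" .")
    # Never return an empty name.
    return cleaned or replacement
```

For example, <code>sanitize('q"uote"s.txt')</code> gives <code>q_uote_s.txt</code>. Replacing characters loses information, so it is worth reviewing the result before renaming real media files.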
 
 
 
== Test files ==
 
Here are some filenames which GRAMPS users have tested to make sure they are okay.
 
 
 
{| border="1"
 
|-
 
! Filename
 
! Operating system
 
! File system
 
! Description
 
! User Name
 
|-
 
| ~test.txt
 
| Windows Vista
 
| NTFS
 
| Simply created on a Vista machine's desktop, not moved around.
 
| [[User:Duncan|Duncan]]
 
|-
 
|}
 

Revision as of 02:33, 8 October 2011