I made reference to the concept of textual identity in my last post, but didn’t go into a lot of detail. In this post, I’d like to describe the broader concept of assembly identity to provide folks with further insight as to what I was going on about. The following text is from a spec that I wrote; it should apply equally to v1.0, v1.1, v2.0 and future .NET Framework versions.
Assembly Identity
The assembly identity is the name of an assembly. The filename, path, file hash or other characteristics are not part of the identity. The identity is used in two different ways: (1) to define the name of an assembly, and (2) to reference an assembly by name. These are sometimes referred to as “def” and “ref”. Both are assembly identities. In the case of a reference, the identity is used during binding to determine if and where an assembly is available.
Identity Composition
The assembly identity is composed of several distinct attributes that detail different characteristics about an assembly. Each attribute is used in binding if it is provided. The allowed attributes follow:
· Simple name
o Format: string
o Description: The name is the simple name of the assembly. It is essentially the name without all the other attributes.
o Note: The name should always be the same as the filename minus the extension
· Version
o Format: Four 16-bit integers separated by “.”
o Description: A four-part version number (Major.Minor.Build.Revision)
o Note: Each of the 16-bit integers overflows at 65536 and underflows at -1
· PublicKeyToken or PublicKey
o Format: An 8-byte or variable-length (48- to 2048-byte) string, respectively, or “neutral” or “null”
o Description: The public key token or key specifies the cryptographic signature of an assembly, guaranteeing that the assembly is from a particular publisher or set of assemblies (with that same token or key)
o Note: The token is almost always provided instead of the much longer key
· Culture
o Format: string or “neutral” or “null”
o Description: An arbitrary string that represents a culture installed on the system
· ProcessorArchitecture
o Format: “MSIL” or “X86” or “X64” or “IA64”
o Description: The processor architecture (PA) attribute specifies the requirement of a particular platform to execute a particular assembly. “MSIL” is an agnostic PA, as “MSIL” assemblies are allowed to be executed on any processor. All other PA options are “bit-specific” and must be run on a specific platform.
o Note: PA doesn’t relate directly to processor or CPU. For example, X86 assemblies can be executed on X64 machines using the WoW64 infrastructure.
· Retargetable
o Format: “yes” or “no”
o Description: The retargetable attribute specifies that an assembly can be retargeted to another assembly, meaning a reference to another assembly can be retargeted to this one.
o Note: The retargeting mechanism is more complicated than described here, and is not at all a common scenario
Note: In all cases, the attribute names and enumeration values are matched case-insensitively during binding.
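To make the "used in binding if it is provided" rule a bit more concrete, here is a minimal sketch in Python (chosen only for brevity; nothing in it is a .NET API). The `AssemblyIdentity` type and the `ref_matches_def` function are hypothetical names, and the sketch assumes that a reference matches a definition when every attribute the reference provides equals the definition's value, ignoring case. The real binder does considerably more than this, none of which is modeled here.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class AssemblyIdentity:
    """Hypothetical container for the identity attributes.

    None means "attribute not provided".
    """
    name: str
    version: Optional[str] = None
    culture: Optional[str] = None
    public_key_token: Optional[str] = None
    processor_architecture: Optional[str] = None
    retargetable: Optional[str] = None


def ref_matches_def(ref: AssemblyIdentity, definition: AssemblyIdentity) -> bool:
    """Return True if every attribute provided by the ref matches the def.

    Assumption for illustration: matching is a plain case-insensitive
    string comparison, and attributes the ref leaves out are ignored.
    """
    for f in fields(ref):
        wanted = getattr(ref, f.name)
        if wanted is None:
            continue  # not provided, so not used for matching
        actual = getattr(definition, f.name)
        if actual is None or wanted.lower() != actual.lower():
            return False
    return True


# "MyLib" and the token value are made-up examples.
definition = AssemblyIdentity("MyLib", "1.0.0.0", "neutral", "0123456789abcdef", "MSIL")
print(ref_matches_def(AssemblyIdentity("mylib", version="1.0.0.0"), definition))  # True
```

A reference that names only "MyLib" would also match in this sketch, while one asking for a different version would not.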
Textual Identity
The assembly identity can be specified in a string format, most often referred to as a “textual identity”. This form of the assembly identity is used by many APIs within the .NET Framework. There are also APIs that parse the textual identity string and return the identity back as a class, removing the need for developers to parse or create a textual identity string. More on that class later.
Textual Identity Specification
The specification of this format follows:
simple_name (“,” attribute “=” value)+
The textual identity starts with the simple name, and then a set of attributes and their values. Each attribute starts with a “,” and then its name, followed by a “=” and then a quoted or non-quoted value for that attribute. Whitespace can occur pretty much anywhere within the string, except within the attribute names, which are essentially tokens.
The allowed attributes, mapping directly to the components described in the previous section, are listed below:
- Simple name → has no attribute name, requiring only its value
- Version → “Version”
- Culture → “Culture”
- Public key or token → “PublicKey” or “PublicKeyToken”
- Processor architecture → “ProcessorArchitecture”
- Retargetable → “Retargetable”
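As a rough illustration of the format, the following Python sketch splits a textual identity into its simple name and attributes. It is not the parser the .NET Framework uses, and `parse_textual_identity` and `KNOWN_ATTRIBUTES` are hypothetical names. It assumes straight double quotes around quoted values, and it deliberately does not handle backslash escapes or commas inside quoted values.

```python
# Canonical attribute names, keyed by their lowercase form so that
# attribute names can be matched case-insensitively.
KNOWN_ATTRIBUTES = {
    "version": "Version",
    "culture": "Culture",
    "publickey": "PublicKey",
    "publickeytoken": "PublicKeyToken",
    "processorarchitecture": "ProcessorArchitecture",
    "retargetable": "Retargetable",
}


def parse_textual_identity(text):
    """Parse 'simple_name, attr=value, ...' into a dict (simplified sketch)."""
    parts = [part.strip() for part in text.split(",")]
    simple_name = parts[0]
    if not simple_name:
        raise ValueError("missing simple name")

    identity = {"Name": simple_name}
    for part in parts[1:]:
        attribute, separator, value = part.partition("=")
        if not separator:
            raise ValueError("expected attribute=value, got %r" % part)
        key = KNOWN_ATTRIBUTES.get(attribute.strip().lower())
        if key is None:
            raise ValueError("unknown attribute %r" % attribute.strip())
        value = value.strip()
        # Values may be quoted or non-quoted; drop one surrounding pair of quotes.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        identity[key] = value
    return identity


# "MyLib" and the token value are made-up examples.
print(parse_textual_identity(
    'MyLib, Version=1.2.3.4, culture = "neutral", PublicKeyToken=0123456789abcdef'
))
```

The simple name is stored under a "Name" key purely for convenience in this sketch; in the textual form it has no attribute name.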
The following characters can be escaped as part of the identity, with the escape character “\”:
Textual Identity Error Cases
The textual identity parser will error in the following cases:
- The textual identity doesn’t match the correct general format: simple_name (“,” attribute “=” value)+
- The textual identity includes attributes that are unknown (e.g. foo=bar)
- There is a value that doesn’t match a member of a given enumeration (e.g. “powerpc” for ProcessorArchitecture)
- Any part of the version number underflows or overflows its 16-bit integer
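The value checks in the last two cases can be sketched as follows. This is illustrative Python rather than the Framework's own validation; `parse_version` and `check_enum` are hypothetical helpers, and the sketch assumes a version always has exactly four parts, each in the range 0 to 65535, as described in the attribute list earlier.

```python
# Allowed enumeration values, stored in lowercase for case-insensitive matching.
ENUM_VALUES = {
    "ProcessorArchitecture": {"msil", "x86", "x64", "ia64"},
    "Retargetable": {"yes", "no"},
}


def parse_version(text):
    """Parse 'Major.Minor.Build.Revision' into a tuple of four ints."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four version parts, got %d" % len(parts))
    numbers = []
    for part in parts:
        number = int(part)  # raises ValueError if the part is not an integer
        if number < 0:
            raise ValueError("version part %d underflows" % number)
        if number > 0xFFFF:
            raise ValueError("version part %d overflows 16 bits" % number)
        numbers.append(number)
    return tuple(numbers)


def check_enum(attribute, value):
    """Raise ValueError if value is not a member of the attribute's enumeration."""
    allowed = ENUM_VALUES[attribute]
    if value.lower() not in allowed:
        raise ValueError("%r is not a valid %s" % (value, attribute))
    return value


print(parse_version("1.2.3.4"))                     # (1, 2, 3, 4)
print(check_enum("ProcessorArchitecture", "msil"))  # msil
```

With these helpers, a version such as "65536.0.0.0" or "1.-1.0.0" and an architecture such as "powerpc" are all rejected with a `ValueError`.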
AssemblyName Class
The .NET Framework includes a class called System.Reflection.AssemblyName. An AssemblyName takes the textual identity in its constructor, parses the string and then provides access to each attribute of the identity via handy properties. The constructor throws if the provided string is invalid according to the specification provided above.
The term “assembly name” is sometimes used interchangeably with “assembly identity”. It is generally best to think of “assembly name” as the AssemblyName class, and “assembly identity” as the broader concept of the multi-part name of an assembly as described earlier.
Strong versus Partial versus Weak Names
Up until this point, the assembly identity has been described as if all the parts always have to be there. That is not the case. There are essentially three categories of names, depending on how many of the identity attributes are specified.
Strong name
· Simple name
· Version
· Culture
· PublicKey or PublicKeyToken
Weak name
· Simple name
Partial name
· A weak name, or
· Additional attributes beyond a weak name, but not enough to be considered a strong name
There are two other attributes: ProcessorArchitecture and Retargetable. ProcessorArchitecture is always optional, and simply makes a strong name even stronger. Retargetable is not commonly used.
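The three categories can be expressed as a small classification function. This is a hedged Python sketch, not anything the Framework exposes: `classify_name` is a hypothetical helper that takes a dictionary of the attributes that were specified (with the simple name under a "Name" key, which is a convention of this sketch) and only checks which attributes are present, not whether their values are meaningful. Since a weak name also counts as a partial name, the function reports the most specific category.

```python
def classify_name(identity):
    """Classify an identity as 'strong', 'weak' or 'partial'.

    identity maps attribute names to values; a missing key or a None
    value means the attribute was not specified.
    """
    present = {key.lower() for key, value in identity.items() if value is not None}
    if "name" not in present:
        raise ValueError("an identity always needs a simple name")

    has_key = "publickey" in present or "publickeytoken" in present
    if {"version", "culture"} <= present and has_key:
        # ProcessorArchitecture and Retargetable are optional extras.
        return "strong"
    if present == {"name"}:
        return "weak"
    return "partial"


# "MyLib" and the token value are made-up examples.
print(classify_name({"Name": "MyLib"}))                        # weak
print(classify_name({"Name": "MyLib", "Version": "1.0.0.0"}))  # partial
print(classify_name({
    "Name": "MyLib",
    "Version": "1.0.0.0",
    "Culture": "neutral",
    "PublicKeyToken": "0123456789abcdef",
}))                                                            # strong
```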