IPv6 Hop-by-Hop & Destination Option #56

Open
wants to merge 14 commits into base: master

rsivakolundu
Contributor

IPv6 Hop-by-Hop and Destination Option for INT Header and INT Metadata transport

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+

Nxt Hdr: 8-bit selector. Identifies the type of header immediately following the
Hop-by-Hop or Desitnation Options header. Uses the same values as the IPv4 Protocol
Contributor

Typo: Desitnation -> Destination

Contributor Author

Fixed

Hop-by-Hop or Desitnation Options header. Uses the same values as the IPv4 Protocol
Field [IANA-PN]

HDR Ext Len: 8-bit unsigned integer. Length of the Hop-by-Hop or Desitnation
Contributor

Same typo: Desitnation -> Destination

Contributor Author

Fixed

| Payload + Padding (L4/ESP/….) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Contributor

We need to add some text to clarify the following regarding INT within IPv6:

  1. What is the use-case for INT as IPv6 destination option? INT has its own destination header type, meant for the INT sink. In general, INT sink and IPv6 destination may not be the same.
  2. How does an INT switch handle the padding following INT data? Each hop inserts Hop ML worth of metadata, which is a multiple of 4 bytes but not necessarily a multiple of 8 bytes. What happens if each hop inserts an odd multiple of 4 bytes (say 20B)? In that case, hop 1 needs 4 bytes of padding. Hop 2 can remove the padding inserted by hop 1 and still comply. Hop 3 has to insert 4 bytes of padding again. Or do we say that each hop adds 4B of padding if Hop ML is an odd multiple of 4B? That would be wasteful.
  3. Regardless of what we do for Reporting physical and logical port ID in telemetry metadata #2, when HopML is an odd multiple of 4, each hop needs to push metadata at the top of the stack and do some manipulation (add/remove padding) at end of the stack. So INT behavior is different here. In other encapsulations, each hop simply inserts at the head of the metadata stack.
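
The alternating padding in point 2 can be sketched in a few lines of Python. This is only an illustration, not behavior defined by the spec: `push_hop_metadata` is a hypothetical helper, and it assumes that everything in the options header ahead of the metadata stack already adds up to a multiple of 8 octets.

```python
def push_hop_metadata(stack: bytes, hop_md: bytes) -> tuple[bytes, int]:
    """Push one hop's metadata at the head of the stack.

    Returns the new stack and the trailing padding (in octets) needed to
    keep the header a multiple of 8 octets. Assumption: all octets ahead
    of the metadata stack already add up to a multiple of 8.
    """
    if len(hop_md) % 4 != 0:
        raise ValueError("per-hop metadata must be a multiple of 4 octets")
    new_stack = hop_md + stack          # newest hop goes first
    trailing_pad = -len(new_stack) % 8  # 0 or 4 for 4-octet-aligned data
    return new_stack, trailing_pad


stack = b""
pads = []
for hop in range(1, 5):
    # 20 octets per hop: an odd multiple of 4, as in the example above
    stack, pad = push_hop_metadata(stack, bytes([hop]) * 20)
    pads.append(pad)

print(pads)  # [4, 0, 4, 0]
```

Under that assumption, odd-numbered hops have to add 4 octets at the tail and even-numbered hops have to strip them, in addition to pushing at the head.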

Contributor Author

Let us discuss the padding issue in person.

Geneve options to be defined for INT Headers.

### INT over IPv6
Contributor

Should we add INT over IPv6 after the other three encaps? Especially because the text in the paragraph refers to scenarios where "INT over VXLAN or Geneve is not helpful".

Contributor Author

I followed what was done earlier. TCP/UDP was listed first and it referenced encaps. I just stuck to that. I am fine with changing the order.

I may be woefully out of date on IPv6 extension header behavior, but regarding the option '"INT over IPv6" - INT Headers are carried in the IPv6 packets as Hop-by-Hop option.', I had thought that switches in practice have to punt packets with an IPv6 Hop-by-Hop extension header to the slow path, e.g. software forwarding on a general purpose CPU.

I did a quick search and found that RFC 7045 (published Dec 2013) says this in Section 2.2 "Hop-by-Hop Options":

The IPv6 Hop-by-Hop Options header SHOULD be processed by
intermediate forwarding nodes as described in [RFC2460]. However, it
is to be expected that high-performance routers will either ignore it
or assign packets containing it to a slow processing path. Designers
planning to use a hop-by-hop option need to be aware of this likely
behaviour.

Is there really a desire to put INT data into a header that will likely result in slow path processing in the network?

Contributor

jklr commented Oct 3, 2018

@rsivakolundu @mhira1 @mickeyspiegel We can merge this change first, create a v1.x cut, advance INT.mdk to version 2, and then merge the two other changes. One thing we still need to discuss is where to merge the other future transports (SRv6, GRE): INT v1.x, 2.x, or both?

. . |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Variable Opt Data (INT Data) | Padding | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+
Contributor

This alignment is not correct. This needs to be either:

  • Option Type and Opt Data Len are in this position, with no Reserved (MBZ) field and no padding, and the option has an alignment constraint of 4n+2.
  • The option has an alignment constraint of 4n, so there are two bytes of padding after HDR Ext Len, then Option Type and Opt Data Len starting at byte 4, then two bytes of Reserved (MBZ), then INT data.
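
A rough Python sketch of the two layouts, for checking the arithmetic. The helper names are hypothetical; the only assumptions are that the options header starts with the 2 octets of Nxt Hdr and HDR Ext Len, and that an alignment of xn+y means the Option Type sits at a multiple of x octets from the start of the header, plus y.

```python
BASE_HEADER_LEN = 2  # Nxt Hdr + HDR Ext Len octets


def pad_before_option(offset: int, x: int, y: int) -> int:
    """Padding octets needed so an option at `offset` satisfies xn+y."""
    return (y - offset) % x


def int_data_offset(x: int, y: int, reserved_len: int) -> int:
    """Offset of the INT data from the start of the options header."""
    option_start = BASE_HEADER_LEN + pad_before_option(BASE_HEADER_LEN, x, y)
    return option_start + 2 + reserved_len  # +2: Option Type, Opt Data Len


# First bullet: 4n+2 alignment, no Reserved (MBZ) field
print(pad_before_option(2, 4, 2), int_data_offset(4, 2, 0))  # 0 4
# Second bullet: 4n alignment, 2 octets of Reserved (MBZ)
print(pad_before_option(2, 4, 0), int_data_offset(4, 0, 2))  # 2 8
```

Either way the INT data ends up on a 4-octet boundary, which the layout in the figure does not achieve.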

Contributor Author

IOAM did it this way. I am okay with your suggestion. I will close with IOAM folks on a separate thread.

Contributor

I think IOAM did my second bullet. They did not explicitly mention the 4n alignment constraint, but they should have. That is the only way that what they defined works.

I need to propose that they ask for only one code point and replace Reserved (MBZ) with 1 byte reserved and 1 byte IOAM Type.

Option Type: 8-bit identifier of the type of option.

001xxxxxx 8-bit identifier of the type of option. xxxxxx=TBD_IANA_INT_HOP_BY_HOP_OPTION_IPV6.
001xxxxxx 8-bit identifier of the type of option. xxxxxx=TBD_IANA_INT_DESTINATION_OPTION_IPV6.
Contributor

Looking at the IANA registry, there are a total of 32 code points of which 17 have already been allocated. The registration procedure is IESG Approval, IETF Review or Standards Action. IOAM is asking for 4 code points, which seems unlikely. The chances for INT to get any code points are not high.

Contributor Author

I agree.

Contributor

I see two options:

  1. Go with the single experimental use hop-by-hop-options codepoint (0b11110), and split the "Reserved (MBZ)" field into 8 bits reserved followed by 8 bits of INT type.
  2. Wait for IOAM to get a codepoint and use that. Split the "Reserved (MBZ)" field into 8 bits reserved followed by 8 bits of IOAM type. Assign relatively high IOAM type codepoints for INT hop-by-hop option and INT destination option.

Contributor

I see another problem with the corresponding IETF IOAM IPv6 draft. The text says that "a router MUST drop packets which contain extension headers carrying IOAM data-fields", to "ensure that the IOAM data does not unintentionally get forwarded outside the IOAM domain." However, they asked for an Option Type codepoint starting with "00", which means when the option type is unrecognized, "skip over this option and continue processing the header". If the text is correct, then they should ask for any of the other codepoint prefixes "01" (discard the packet), "10" (discard and send ICMP parameter problem, code 2, back to the packet's source address), or "11" (discard and send ICMP only if the packet's destination address was not a multicast address).
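
For reference, the prefix semantics can be checked with a small decoder. This is a sketch with hypothetical names, not code from any draft. It relies on the IPv6 rule that the two highest-order bits of the Option Type select the action for an unrecognized option, and it additionally assumes the third-highest bit flags whether the option data may change en route.

```python
UNRECOGNIZED_ACTION = {
    0b00: "skip over this option and continue processing the header",
    0b01: "discard the packet",
    0b10: "discard and send ICMP Parameter Problem, Code 2",
    0b11: "discard and send ICMP only if destination is not multicast",
}


def decode_option_type(option_type: int) -> tuple[str, bool, int]:
    """Split an 8-bit Option Type into (action, may_change, low 5 bits)."""
    if not 0 <= option_type <= 0xFF:
        raise ValueError("Option Type is an 8-bit value")
    action = UNRECOGNIZED_ACTION[option_type >> 6]  # top two bits
    may_change = bool(option_type & 0x20)           # third-highest bit
    low_bits = option_type & 0x1F                   # remaining five bits
    return action, may_change, low_bits


action, may_change, low_bits = decode_option_type(0b00111110)
print(action)                # skip over this option and continue processing the header
print(may_change, low_bits)  # True 30
```

A codepoint requested with a "00" prefix therefore decodes to "skip", which is at odds with text that requires routers to drop such packets.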

Contributor Author

I will close the loop with the IETF and address this comment.

Contributor

Whatever we do, two codepoints will not fly. At a minimum we would have to go with TBD_IANA_INT_OPTION_IPV6 (not distinguishing between INT hop-by-hop and INT destination), which would later get resolved to either the experimental hop-by-hop options codepoint or whatever IOAM has been assigned. If we go with IOAM then the INT Type values might need to be shifted to avoid conflicts.

I also wonder if we should use xxx or yyy for the first 3 bits as well given the other open issue I stated above.

Variable Opt Data: INT Header and Metadata, multiple of four octets in length.

Padding: 16-bit pad. Needed to ensure that the variable length of the complete
Hop-by-Hop or Destination Options Header is an integer multiple of 8 octets long.
Contributor

I don't think the padding is added explicitly. If there are multiple options in the same extension header, then padding is only inserted to meet the next option's alignment constraint. The padding to 8 bytes is done at the end, after all options have been added to the extension header.

What I suggested for IOAM is that the Hop ML must be even so that padding is not required. However, I did not consider that the INT header is included. For v1 and with the second padding option described above, if Hop ML is even no padding would be required. For v2 due to the odd number of octets before the per hop data, this would not work and padding would have to be added in some cases.

Contributor

For v2, it looks like we have 4 octets shim header and 12 octets fixed part of INT header before the variable metadata. There are two options with respect to padding:

  1. Go with 4n alignment. When INT is the only option present in the hop-by-hop options, then the "Padding|(MBZ)" above would be inserted in front of this option, and if Hop ML is even then there would be an additional 4 octets of padding after this option. If Hop ML is odd, then the additional 4 octets of padding after this option would only be present at even hops, so each hop either strips the incoming 4 octets of padding after this option, or adds 4 octets of padding after this option. If there are other hop-by-hop options present, padding may differ both before and after this option.
  2. Go with 8n alignment and require that Hop ML must be even. When INT is the only option present in the hop-by-hop options, then the "Padding|(MBZ)" in front of this option would be 6 octets rather than 2 octets. No padding would be added after this option. If there are other hop-by-hop options present, then the padding may differ both before and after this option, but it would not need to be adjusted at each hop.

IOAM has a similar issue, but the header size is different so the 4n alignment implications are different. When IOAM incremental trace is the only option present in the hop-by-hop options, and if hop ML is even then there would be no padding after this option. 8n alignment would not work since the total size of the fixed headers is not a multiple of 8 octets.
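
The two v2 padding options can be compared numerically. The sketch below is illustrative only: it takes the 4-octet shim and 12-octet fixed INT header sizes from this comment, assumes Hop ML is counted in 4-octet words, and assumes INT is the only option in the hop-by-hop options header. The function names are made up for the example.

```python
NEXT_HDR_AND_LEN = 2  # Nxt Hdr + HDR Ext Len
SHIM_LEN = 4          # shim header, per the comment above
INT_FIXED_LEN = 12    # fixed part of the v2 INT header


def trailing_pad(front_pad: int, hop_ml: int, hops: int) -> int:
    """Padding octets needed after the option once `hops` hops have pushed."""
    used = (NEXT_HDR_AND_LEN + front_pad + SHIM_LEN + INT_FIXED_LEN
            + 4 * hop_ml * hops)  # Hop ML assumed to be in 4-octet words
    return -used % 8


def pads_per_hop(front_pad: int, hop_ml: int, max_hops: int = 4) -> list[int]:
    """Trailing padding after hop 1, 2, ..., max_hops."""
    return [trailing_pad(front_pad, hop_ml, n) for n in range(1, max_hops + 1)]


print(pads_per_hop(2, 2))  # option 1, even Hop ML: [4, 4, 4, 4]
print(pads_per_hop(2, 5))  # option 1, odd Hop ML:  [0, 4, 0, 4]
print(pads_per_hop(6, 2))  # option 2, even Hop ML: [0, 0, 0, 0]
```

Only the odd Hop ML case under 4n alignment makes the trailing padding change from hop to hop. The 8n option avoids it by requiring an even Hop ML.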

Contributor Author

rsivakolundu commented Oct 5, 2018 via email

Field [IANA-PN]

HDR Ext Len: 8-bit unsigned integer. Length of the Hop-by-Hop or Destination
Options header in 8-octet units, not including the first 8 octets.
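
As a quick check of the HDR Ext Len definition above, a minimal Python sketch (the helper names are hypothetical):

```python
def hdr_ext_len(total_octets: int) -> int:
    """HDR Ext Len for an options header of `total_octets` octets."""
    if total_octets < 8 or total_octets % 8 != 0:
        raise ValueError("options header must be a non-zero multiple of 8 octets")
    return total_octets // 8 - 1  # 8-octet units, not counting the first 8


def header_octets(ext_len: int) -> int:
    """Inverse: total header length in octets for a given HDR Ext Len."""
    return (ext_len + 1) * 8


print(hdr_ext_len(8), hdr_ext_len(24))  # 0 2
print(header_octets(2))                 # 24
```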
Contributor

Above it only says Hop-by-Hop Options, not Destination Options. We can go either way, but it should be consistent.

Contributor Author

Done.

Contributor

mickeyspiegel left a comment

In addition to the specific comments, we need to figure out what to say about padding before we finalize.

* Reserved (MBZ): 16 bit field, must be filled with zeroes upon transmission and ignored upon
reception.

* Type: This field indicates the type of INT Metadata Header and Metadata following.
Contributor

nit: The figure above labels the field as "INT TYPE". We should make this consistent, I guess with "INT Type"?

option, in octets.

Reserved (MBZ): 16 bit field, must be filled with zeroes upon transmission and ignored upon reception.
* Reserved (MBZ): 16 bit field, must be filled with zeroes upon transmission and ignored upon
Contributor

8 bit field

5 participants