INTERIM_MEETING_REPORT_

Reported by Robin Iddon/AXON Networks and Jeanne Haney/Bay Networks

Minutes of the Remote LAN Monitoring Working Group (RMONMIB)

An interim meeting of the RMONMIB Working Group was held in Santa Clara, CA on 15-17 May. The meeting was sponsored by cisco Systems.

Agenda

   o Protocol directory
   o Protocol distribution
   o Address mapping
   o Network layer host/matrix
   o Seven-layer host/matrix
   o Relative offset filtering
   o Time filter
   o Probe capabilities
   o Generic control table issues
      - dropped packet counter
      - lastActivationTime
      - lastDeleteTime elimination
      - tableSizeRequested/Granted
   o Seven-layer topN/history
   o RMON1 additions
   o User history
   o Probe config MIB
   o Dynamic protocol discovery
   o Channel as dataSource

The following notes are intended to provide an overview of the issues discussed at the meeting. Refer to the upcoming draft for detailed changes.

Protocol Directory

Issues involving the protocol identifier format were discussed. Concerns over OID tree data explosion led to a new ID format using an OID to represent the protocol layering and an octet string to represent the attributes or parameters associated with each protocol layer.

OID tree data explosion issues:

   (1)  Reference document explodes.
   (2)  protocolDirectoryTable grows by same factor.
   (3)  Agent code potentially grows.
   (4a) The number of stats/host/matrix rows may grow also.
   (4b) The number of filter entries grows also.

Agreed to add some kind of protocolDirType to indicate whether or not this node could be (user) extended.

Adopted proposal:

   protocolDirEntry
      protocolDirID       -- OID { ip.udp.tftp }
      protocolDirOptions  -- OCTET STRING, 8-bit per sub ID
      INDEX { protocolDirID, protocolDirOptions }

There must be exactly one 8-bit option per sub ID in protocolDirID. The intent is that all protocol defs have exactly one option -- those that need none use zero; those that need one or more must define how to combine their values into a single 8-bit value.
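The pairing rule in the adopted proposal (one 8-bit option for each sub ID in protocolDirID) can be illustrated with a minimal sketch. The class name, field names, and sub ID values below are hypothetical and chosen only for illustration; nothing here comes from the draft itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProtocolDirIndex:
    """Hypothetical model of the { protocolDirID, protocolDirOptions } index."""

    dir_id: Tuple[int, ...]  # one sub ID per protocol layer, e.g. ip.udp.tftp
    options: bytes           # one 8-bit option per sub ID; zero means "none"

    def __post_init__(self) -> None:
        if not self.dir_id:
            raise ValueError("protocolDirID must name at least one layer")
        if len(self.options) != len(self.dir_id):
            raise ValueError("need exactly one option octet per sub ID")


# Placeholder sub ID values for illustration, not assigned numbers.
IP, UDP, TFTP = 1, 2, 3

# ip.udp.tftp where no layer needs an option, so every octet is zero.
tftp_over_udp = ProtocolDirIndex((IP, UDP, TFTP), bytes([0, 0, 0]))
```

A layer that needs several parameters would still occupy a single octet here; how those parameters are packed into it is left to the protocol definition, as the notes say.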
For WANs in particular, there is real concern that there is no way to handle the multitude of link layer encapsulations. Previously we hoped to allow vendors to insert their own subtrees; we can still do the same thing provided we identify in advance the places where it will occur and provide for a vendor-extension bit.

Agreed to remove protocolDirParentID pending further discussion.

Agreed to add an ``unknown network layer protocol enumeration'' which handles all cases where absolutely nothing could be determined about a packet (except its MAC addresses and length).

Protocol Distribution

A proposal came up to add size distribution to this table. Discussion over the granularity of the buckets led to a proposal to use three buckets: < media-min, >= media-min, and > media-max. Agreement could not be reached and size distribution was dropped from consideration.

Proposal to use protocolDirIndex (aka local integer) in the protocolDistTable INDEX { protocolDistControlIndex, protocolDirIndex }. Agreed that if there was no other use for the protocolDirIndex then this table will revert to its original use of protocolDirID.

Discussion of fragmentation and whether we are interested in monitoring higher layer fragmentation (i.e., whether we want to try to provide counters which instrument fragmentation at all layers) -- generally the group appears not to be interested in directly counting fragmentation at any layer.

Address Mapping

There was much discussion of whether to include the addressMapIfIndex in the INDEX (and hence to differentiate rows on different interfaces that are otherwise the same). There was discussion on how much effort it is for the NMS to utilize this table. Possible problems include:

   o How does the NMS map RMON1 host addresses through this table without uploading the entire table?
   o Whether the NMS uses the random access capability.
   o Should ifIndex be replaced by an OID to allow it to point to a repeater port?
Data source still tells you which network the data came from. Include controlIndex instead of ifIndex as (a) ifIndex is being replaced and (b) we do not want to keep port histories (which would happen if a device moved from one port to another and the OID was part of the INDEX).

Agreed to INDEX { protocol, address, controlIndex }

Agreed to incorporate portOID into addressMapEntry -- the intent is to point at the point of origin of this device (best guess of agent).

After much discussion about NMS control of agent resource utilization it was agreed that the protocolDirectory should contain a set of flags to control usage of this protocol. At a minimum this should control whether or not a protocol is used in maintaining the address mapping (hence it appears in this section of the agenda). Ideally we would also have a few more flags to enable usage in the protocolDistribution and the host/matrix tables.

Network Layer Host/Matrix

Discussion of using an enumerated value vs. protocol dir index led to further discussion of protocol directory `counting' issues and the need to control which protocols are counted in which tables:

   o One idea is to turn on/off a protocol via the protocol dir table. This means you collect the same protocols for all interfaces and all application tables. This seems very restrictive.
   o The second idea is to define the protocol channel, which defines a set of protocols that a control entry points to, to determine which protocols it is collecting. The control tables would still have a separate data source value (i.e., not tied to the protocol channel, so a protocol channel can be shared across several control tables). This serves two purposes. It allows the NMS to give the agent help in conserving its resources. It also makes the tables smaller to retrieve so it helps the NMS.
   o The final choice was to turn the protocol on/off on a per application basis (Net Map, Matrix, Host, etc.). You cannot control it on a per interface basis.
     You cannot control it on a per control table basis. This is the one that most people voted for.

The counters within the nlHostTable were discussed:

   o nlHostOutErrors discussion -- agreed object removed.
   o nlHostOutMACNUCastPkts agreed to replace nlHostOutBroadcastPkts, nlHostOutMulticastPkts.
   o nlHostOutFragmentPkts agreed not to implement this class of counter.

nlHostEntry creation was discussed. Certainly do not insert on MAC error packets; do insert on new source address. There was some discussion on whether or not to insert on destination address. It was finally agreed to insert on good source and destination addresses but that the agent may need to use an improved aging technique to eliminate the host destination addresses generated by programs which ping sequential addresses in an attempt to discover which hosts exist.

Agreed to drop hlMatrix[SD|DS]Errors.

Agreed to keep both DS and SD tables (despite there being good reasons not to). It was deemed (a) too complex to dismiss the NMS's inability to easily know of some classes of uni-directional conversations and (b) that the overheads on the agent are not severe enough to make the pain of pushing this through worth doing.

Agreed to not do subnet aggregation because there was no standardizable proposal and no one volunteered to do one.

Seven-Layer Host/Matrix

Three models were discussed based on nl/sl host tables:

   1. Merge them
   2. Keep them separate but closely related so that the agent can be efficient
   3. Keep them totally independent

Long discussion over the product class<->mib group mapping followed. Eventually the group came to a vote on:

   1. Single control table causing a nlHostTable and slHostTable to be constructed (related solution 1' recognizes that within the single control table entry will be parameters specific to the nl and sl tables, e.g., rm2HostControlNlMaxDesired and rm2HostControlSlMaxDesired).
   2. Merge both tables (voted out 1 for merge, 16 against).
   3.
      Split control tables but slHostControlTable depends on an instance of nlHostControlTable. Notice that this provides the same function as 1'.
   4. No sharing of data, hence duplicate memory requirements! (Deleted.)

Proposal 1' was accepted over 3. Steve will add a straw proposal for the combined sl/nlHostControlTable in the next draft.

slHostEntry will contain only inPkts/outPkts and inOctets/outOctets. slHostEntry will not contain slHostAddress; instead INDEX will reference nlHostAddress, and words will be added to ensure that for each slHostEntry there must be an nlHostEntry with the same address and hence deleting an nlHostEntry will cause deletion of the associated slHostEntries. Misconfiguring the protocolDirectory such that slHost function is enabled for a protocol but nlHost function is not enabled for its network layer protocol causes no data to be collected in either table for this protocol (because there are no nlHostEntries to relate slHostEntries to).

Proposal adopted:

   INDEX { controlIndex, protDirIndex(addrType), nlHostAddress, protDirIndex(protocolType) }

and that the slHostTable contain neither an address nor a MACNUCastPkts counter.

A proposal was adopted to include a bit/enum in the protocolDirectory to indicate whether or not a network layer address is available for this protocolDirectoryEntry (it would not make sense to set this bit for ip.udp, for instance, but it could be set for both the ip entry and the ip.udp.appleTalk entry; an agent would set the bit if it supports the protocol as a network layer protocol and not if it supports it only as an application protocol). Ideally we would incorporate this into the nodeType object. This is not something to be placed in the parameters object because it can only relate to the final protocol of the OID, not all of them.
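The relationship between slHostEntry and nlHostEntry described above (no sl row without a matching nl row, and deleting an nl row removes its sl rows) can be sketched as follows. This is a minimal illustration under assumptions: the class, method, and key names are hypothetical, and only outbound packet and octet counts are tracked to keep it short.

```python
from collections import defaultdict


class HostTables:
    """Hypothetical agent-side store for nlHostTable and slHostTable rows."""

    def __init__(self):
        # (controlIndex, addrType, nlHostAddress) -> counters
        self.nl = {}
        # same key -> {protocolType: counters}
        self.sl = defaultdict(dict)

    def count_nl(self, ctrl, addr_type, addr, octets):
        entry = self.nl.setdefault((ctrl, addr_type, addr),
                                   {"out_pkts": 0, "out_octets": 0})
        entry["out_pkts"] += 1
        entry["out_octets"] += octets

    def count_sl(self, ctrl, addr_type, addr, proto, octets):
        key = (ctrl, addr_type, addr)
        if key not in self.nl:
            # No nlHostEntry to relate to, so nothing is collected.
            return False
        entry = self.sl[key].setdefault(proto,
                                        {"out_pkts": 0, "out_octets": 0})
        entry["out_pkts"] += 1
        entry["out_octets"] += octets
        return True

    def delete_nl(self, ctrl, addr_type, addr):
        key = (ctrl, addr_type, addr)
        self.nl.pop(key, None)
        # Deleting the nl row also deletes every associated sl row.
        self.sl.pop(key, None)
```

Keying the sl rows by the nl key mirrors the adopted INDEX, where the protocol type is appended after the network layer address rather than carrying a separate slHostAddress.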
Proposal for slMatrix is:

   INDEX { controlIndex, protDirIndex(addrType), sa, da, protDir(protType) }

Agreed to let Steve apply results of the nl/sl host table discussions to the matrix and so avoid long discussions over basically the same subject.

Agreed to move forward to the topN/history on host/matrix tables out of order because we want to discuss it in the context of the host/matrix tables.

Discussion of data table columns:

   o Issue of error counters. What does it include? Why count L2 errors by protocol? Errors can propagate up to this table. It is too hard to make it meaningful to count network layer errors. Therefore we will leave it out.
   o Bcast and mcast? Could there be permutations of bcast/mcast at the L2 level and bcast/mcast at the L3 level? Is a broadcast to MAC addresses with a multicast IP address counted as bcast or mcast? Robin believes that the impact on the net is the fact that it is bcast, i.e., everyone received and processed it. We decided that we are merging the bcast and mcast counts into one counter. We are still counting L2 counters with an NLHostOutNUcastPkts (not unicast). Get rid of Broadcast, Multicast, and Errors.
   o Robin proposes an OutFragment counter that only bumps up when fragments are detected from a particular SA. Most people abstained, so it is a closed issue. Fragments are not counted.
   o We discussed not adding entries to the Host Table based on DA, so that the table does not get filled up with erroneous addresses from MIB sweeps, etc. On the other hand there are L3 broadcast addresses in video multicast addresses that will never appear in the source. Maybe we can use a different aging algorithm so entries without out pkts get deleted sooner. But then would you be deleting these interesting mcast and bcast pkts as frequently as these bogus sweep addresses?
   o Good packets for this table are defined as good MAC packets.
   o Drop the Matrix error counters, do not add the bcast counter; they can get them from the host table.
     Remove nlMatrixSDAddressType.

Discussion of encapsulated network layers (e.g., IP in IP):

   o The problem of NL protocols being wrapped in other NL protocols causes some problems in how to count the pkt and what the NL address is. How do you record both NL addresses? Do you consider the encapsulated protocol to be an application? There is no place to save the encapsulated NL address.
   o Steve proposes an address structure that encodes what the protocol is so that we can model both NL protocols and NL protocols encapsulated in other NL protocols. Should we try to solve this problem? (Vote: 8-2-5.) Now the NL tables could have entries that count a pkt twice, since the NL table accounts for all NL protocols, not just the NL usage at this particular probe in the network. Not all probes need to implement this, but all NMSs need to be aware of this anomaly. I.e., if you take all the entries for a particular NL Host, they could total up to more than 100% of the Net utilization for that Host. How does this affect the protocol distribution table? There would be a protocol directory entry for AppleTalk with IP and it would be counted in the prot distribution.
   o The upshot of the vote is to handle protocols that may be encapsulated within other protocols, and how you might represent the addressing. Can we change the network address mapping table to record this information that we have learned from encapsulated NL protocols?

Add pDir Index as last index to the slHostTable (and slMatrix)

   NL -- address object nonUnicasts,
   SL -- pDirIndex on end

Add a bit/boolean to pDirTable that defines whether addresses are recognized for that protocol.

Relative Offset Filtering

There was a lot of discussion of various filtering related topics. In the end it was agreed to treat the channels as data source issue elsewhere. Agreed to pursue filterLogicTable and mod to filterChannelIndex 0..65535. Robin to write up proposal (15 for, 0 against, 2 abstain).
Time Filter

After an example and some discussion it was agreed to implement time filter as proposed (15 for, 0 against, 1 abstain). It was also agreed that the timeMark goes in between the control index and the rest of the index.

Probe Capabilities

We discussed probe classes and the nl/sl split. We finally closed with nl/sl remain different tables (7 for, 3 against, 3 abstain).

Next we voted on whether or not any kind of capabilities object was needed; in favour (11 for, 1 against, 2 abstain).

Next we discussed per-interface vs. per-device capabilities. First vote on scalar only (per-device) (7 for, 2 against, 4 abstain). Scalar adopted.

Generic Control Table Issues

A) Dropped Packet Counter

There was a lot of discussion about how the counters work and what they are (and are not) intended to do. In the end it was agreed that these counters are not intended to enable the agent to do statistical sampling/scaling. Indeed the notion of scaled data in the RMON2 tables is explicitly precluded (the group cannot define a scaling algorithm that is universally appropriate). Finally there was debate over whether statistical sampling and scaling were really the only solution to the 10x media speed increases, and while there was no agreement the discussion polarized between those that felt that the current agent technology would enable 100MBit and those that did not.

It was agreed that there would be one droppedFrame counter per control entry by default but that for some groups/functions we may decide to use a scalar should that prove more appropriate.

It was agreed that the [ether|tokenRingP]StatsDropEvents would continue to exist in RMON2 agents and that its semantics would be unchanged.

The following rules define how the fooDropFrame counter (from the fooControlEntry) relates to the [ether|tokenRingP]StatsDropEvents counter and [ether|tokenRingP]StatsPkts counter for the same interface:

   1.
      For each time the agent recognizes that one or more packets have been missed without it knowing exactly how many were missed it must increment the dropEvents counter for that interface. This is the only time that the dropEvents counter is incremented.
   2. Whenever the agent chooses not to update a table/data collection function based on the contents of a packet which it knows was present on the network it must increment the droppedFrames counter for that table/function.
   3. For all packets which are not lost in (1) above or dropped in (2) above the agent must update tables/data collection functions accurately.

Two results of applying these rules are:

   1. The sum of all packet counters in a table or data collection function (e.g., the hostOutPkt counter) plus the associated droppedFrame counter should be exactly equal to the sum of the [ether|tokenRingP]StatsPkts and [ether|tokenRingP]DroppedFrames counters for the same data source. Of course this assumes that there are enough resources in the agent such that the table is not being LRU'd.
   2. For all agents where the dropEvent counter is zero the sum of the droppedFrame and Pkt counters in a given table or function on the same interface should be exactly equal to the number of packets that there were on the network.

It was agreed that there should be strong recommendations for RMON2 agents to utilize the droppedFrame counters as a means of accurately reporting the number of frames missed and that if at all possible the dropEvents counters should never be incremented -- in this way an NMS can use the data with much higher confidence.

B) lastActivationTime

Proposal to have this object set to sysUpTime at the point in time this control row's status transitioned from not active to active. This lets the NMS notice that another NMS restarted data collection (without picking a new control index) and so deltas will be invalid.
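The check an NMS might make with lastActivationTime can be sketched as below. This is only an illustration under assumptions: the function name and the sample layout (a dict holding the row's lastActivationTime and one packet counter) are hypothetical, and the counter is assumed to be 32 bits wide.

```python
COUNTER32_MODULUS = 2 ** 32  # assumes a 32-bit wrapping counter


def usable_delta(prev, curr):
    """Return the counter delta between two polls, or None if invalid.

    prev and curr are hypothetical poll samples: dicts holding the control
    row's lastActivationTime ("activated") and one data counter ("pkts").
    """
    if prev["activated"] != curr["activated"]:
        # The row went from not active to active between the polls, so the
        # earlier sample is not a valid baseline for this collection.
        return None
    # Modular subtraction tolerates a single counter wrap between polls.
    return (curr["pkts"] - prev["pkts"]) % COUNTER32_MODULUS
```

When the function returns None the NMS would simply discard the earlier sample and treat the current poll as a new baseline.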
lastActivationTime also gives an indication of the age of the table (but may not be used to rate the first ever poll -- the data counters still do not have to start from zero and so you do not know the delta over the interval).

Agreed to adopt proposal (13 for, 3 abstain, 0 against). Notice that we will decide later which tables and functions to apply this to.

C) lastDeleteTime Elimination

Discussion -- it was agreed that lastDeleteTime was easy to implement, but it was also agreed that it was designed specifically for creationOrder, which no longer exists.

Proposal is to replace tableSize and lastDeleteTime with insertCount and deleteCount (where insertCount - deleteCount == tableSize). Agreed unanimously to adopt.

D) tableSizeRequested/Granted

Proposal to implement a maxDesired (i.e., a ceiling) per controlEntry. 0 implies consume as much memory as is required/available. > 0 instructs the agent to create at most this many data table entries associated with this control entry -- once this ceiling is reached the agent should delete old resources (associated with this control entry) in order to create new rows.

Agreed to adopt proposal (16 for, 0 against, 1 abstain).

Notice that we later had a discussion which suggested a valid use of zero would be for the new hostTable where the control entry creates both nlHostTable and slHostTable; a user who did not want an slHostTable on an interface might use 0 to indicate that. Perhaps we should use -1 to imply unlimited rather than zero.

Seven-Layer topN/history

Agreed to do any kind of topN in addition to the RMON1 stuff (8 for, 0 against, 7 abstain).

Agreed to do slMatrixTopN (7 for, 0 against, 0 abstain).

Marginally agreed to do nlMatrixTopN (5 for, 1 against, 5 abstain).

Agreed to not do slHostTopN and nlHostTopN (1 for, 5 against, 7 abstain and 0 for, 4 against, 7 abstain respectively).

Agreed not to support TopN by protocol (1 for, 10 against, 4 abstain).
A real proposal bringing together all the best ideas of how to do TopN on the nl/sl matrix tables is needed -- Steve, Matt and Shay to get together on producing this proposal.

RMON1 Additions

   1. netUtilization

      Etherstats gives you the number of octets seen. Robin proposes that we provide a count of the number of bits and include the interpacket gap and the preamble. This gives you a better approximation of utilization. Bytes seem like a better unit to use, since the counter will then not wrap as readily. It still is the same way another analyzer would calculate utilization. We still run the risk that RMON gets compared with these analyzers and is not identical. So the question is, is the etherStatsOctets value a good enough approximation to get utilization or do we want to provide a new object that counts more of the overhead? People seem to favor just sticking with the original counter and obtaining an approximation to utilization for thresholding via Alarms.

      The group voted to use the octets approximation and not add any new bandwidth utilization indicators.

   2. filterDescr

      Proposal withdrawn without opposition.

   3. [filter changes]

      Robin to make proposal on the list based on what was discussed at the meeting (i.e., the filterLogicTable with m:1 relation reversed).

   4.
      Control table additions

      The group considered four additions and how they applied to each control table:

         (a) insert, delete counters
         (b) maxDesired
         (c) activationTime
         (d) droppedFrames

      EtherStatsTable+TokenRingPStats+TokenRingMLStats
         activationTime, droppedFrames
      HistoryControlTable
         Nothing
      EtherHistoryTable+TokenRingPHistoryTable+TokenRingMLHistoryTable
         droppedFrames
      AlarmTable
         Nothing
      HostControlTable
         maxDesired, activationTime and droppedFrames (maxDesired needs note in implementors guide, apparently)
      HostTable/HostTimeTable
         Nothing
      HostTopNControlTable
         Nothing
      HostTopNTable
         Nothing
      MatrixControlTable
         Same as hostControlTable
      MatrixSD/DSTable
         Nothing
      FilterTable/ChannelTable/BufferTable
         Nothing
      EventControlEntry + LogTable
         Nothing
      RingStationControlTable
         activationTime, droppedFrames
      SourceRoutingStatsControlTable
         activationTime, droppedFrames

   5. Storage type

      Steve to propose an object which is per-control row and indicates what NVRAM processing an agent has performed on that row (ROM, will-write, wont-write, written). (7 for, 0 against, 4 abstain).

   6. Alarms enhancements

      Make it robust when monitored OID disappears. It was agreed that Steve would produce a draft based on an alarmValueStatus object which defines whether the agent managed to get the value last interval, an alarmValueUnavailable event/trap, and an alarmUnavailableEventPollThreshold (i.e., the number of unavailable intervals before generating the event).

   7. WAN status bits

      It was agreed that bit6 will be supported in the pktStatus bitmask as the packet direction bit. Further study of bit7 (other physical errors) will be done, but this bit needs to be more clearly defined before it can be adopted.

User History

Get rid of objectsGranted. BucketsRequested cannot be changed after row goes valid. Otherwise it stands as is.

Probe Config MIB

In this section OK means that we accepted to do it -- there were no votes as such, just a call for objections.

   o probeID, probeFirmwareRev, probeHardwareRev OK.
     Discussion on converting probeDateAndTime from ASCII into v2 DateAndTime TC (except we will do our own TC which is length 0 or 8 or 11 to allow optionality) OK.
   o probeResetControl OK.
   o probeDownloadFile, etc. (5 for, 0 against, 7 abstain) OK.
   o serialConfigTable: Agreed to keep serialIP and serialSubnet.
   o The agent might need to implement the two tables that contain line speed and flow control objects, but we will try to get away without doing it; should it be rejected elsewhere we will have to adopt usage of the appropriate serial MIBs instead (charPortTable and portTable).
   o Modify serialConfigProtocol slip(1), ppp(2), other(3).
   o Take all modem string DEFVALs and make them comments instead.
   o Rename serialTrapTimeout to serialDialoutTimeout OK.
   o netConfigIpAddress/netConfigSubnetMask OK. Remove netConfigIfSpeed and netConfigIfRingNumber.
   o trapDestIndex, trapDestCommunity, trapDestIpAddress, trapDestOwner, trapDestStatus OK (8 for, 0 against, 2 abstain).
   o serialConnect: index, dest ip, connection type (direct, modem, switch, switch-and-modem), dial/connect strings, owner, status. OK.

Dynamic Protocol Discovery

Populate the prot directory with what is sensible at startup. Then the agent could add some protocols that it discovers existing on the net. The assumption is that the agent was capable of decoding those protocols all along, but there is such a large set of them and they may never appear on the net. It is OK that these added protocols grow more than a single level, as originally thought. It is up to the probe whether to turn it on or not for collection. How do we document these in the protocol document and provide options for these fields? Further discussion on the mailing list is needed.

Channel as dataSource

There was a vote on how many people violently object to using channel as data source (2). Those that wanted the standard to be changed to mandate that an agent must allow channel as data source (3).
Those that want to leave the standard as is (and accept that there will continue to be proprietary extensions) and that the behaviour of any other kind of data source value is undefined (11).

We voted to modify the text so that it states that ifIndex is the only recognized dataSource that all should support, but that other values are not illegal -- just considered out of scope.