Friday, September 26, 2008

Modeling Wireless Links for Transport Protocols

Wireless links face issues such as variable bandwidth, packet corruption, and variable latency. These hurt TCP performance because TCP assumes packet losses are due to congestion. Link-layer mechanisms such as FEC and ARQ have been proposed to improve performance, but they have drawbacks of their own, such as large link-level queues.

This paper proposes models that should be used for evaluating transport protocols over wireless links, including cellular, WLAN, and satellite links. The authors argue that the traffic model can have a significant effect on simulation results: if we do not model all the types of traffic present in the network, the evaluation of a protocol will be incomplete. Transport protocols are generally evaluated on throughput, delay, fairness, and dynamics. For wireless networks it also makes sense to measure how much useful data got through relative to all the data delivered, which the authors term "goodput".

Models should be realistic, should let the experimenter explore the relevant parameter space, should not be overly realistic (tied too closely to one specific link), and should be available for others to use. The paper gives very good examples for each of these requirements.

Modeling of wireless links includes:
  1. Error losses and corruption should be modeled by dropping packets with a per-packet, per-bit, or time-based loss probability (a minimal sketch appears after this list). Losses due to data corruption should be modeled at the end of the link, while handover losses should be modeled before the link.
  2. Delay variation can be modeled by suspending the data transmission on a simulated link.
  3. Packet reordering requires either building a queue and swapping packets or delaying one packet for a given time.
  4. On-demand resource allocation can be modeled by introducing additional delay when a packet arrives at a queue that has been empty for longer than some threshold.
  5. Bandwidth variation can be modeled by having periods of low and high bandwidth.
  6. Asymmetry is modeled by having different settings for uplink and downlink.
  7. Queue management
    • WLAN/Cellular links: use drop-tail buffer with configurable maximum size in packets
    • Satellite links: use RED
  8. Effects of mobility should be modeled differently depending on whether the underlying handover mechanisms themselves or only the effects of handover need to be evaluated.
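
As a rough illustration of point 1 (and of the delay variation in point 2), here is a minimal Python sketch of a lossy, variable-delay link model. It is not the paper's simulator; the class name, parameters, and default values are all illustrative assumptions.

```python
import random

class LossyLink:
    """Toy wireless-link model (a sketch, not the paper's simulator):
    drops packets with a per-packet corruption probability and adds a
    variable extra delay, e.g. to mimic link-layer retransmissions."""

    def __init__(self, base_delay_ms=100.0, loss_prob=0.01, max_extra_delay_ms=400.0):
        self.base_delay_ms = base_delay_ms            # propagation + transmission delay
        self.loss_prob = loss_prob                    # per-packet corruption probability
        self.max_extra_delay_ms = max_extra_delay_ms  # bound on delay spikes

    def transmit(self, packet):
        """Return (delivered, delay_ms) for one packet."""
        if random.random() < self.loss_prob:
            return False, None                        # corrupted: dropped at the end of the link
        extra = random.uniform(0.0, self.max_extra_delay_ms)  # delay variation
        return True, self.base_delay_ms + extra

# Example: estimate the delivered fraction over 10,000 packets.
link = LossyLink()
delivered = sum(1 for _ in range(10000) if link.transmit("pkt")[0])
print("delivered fraction:", delivered / 10000)
```
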
At the end, the authors discuss how bit errors on wireless links should be handled at the link level, while reordering of packets should be handled at the transport level. Transport protocols can be made robust to delay variations, and link-level mechanisms can be changed to hide delay variations from the transport layer.

The paper was interesting to read because it explained the different environments wireless communication can operate in and how models should take them into account.

The paper says that having wireless links at both ends of a connection is currently uncommon. Is this still true today? I am also not sure what the authors mean: that the entire path is wireless, or that both communicating machines use wireless access links?

A Comparison of Mechanisms for Improving TCP Performance over Wireless Links

Summary:
TCP does not provide good throughput over wireless networks because it treats every packet loss as congestion and responds by invoking its congestion control and avoidance algorithms. Wireless networks, however, lose packets due to high bit error rates, intermittent connectivity, and so on. Several schemes have been proposed to improve TCP performance over wireless links:

  1. End-to-End protocol - In this scheme the losses are handled by the sender. There are two techniques that are typically used:
    • SACK provides selective acknowledgement to allow the sender to recover from multiple packet losses in a window without invoking timeout.
    • Use ELN (explicit loss notification) to distinguish loss from congestion.
  2. Link-layer protocols - These try to hide link-layer losses from the sender by using local retransmissions (ARQ) and forward error correction (FEC). The TCP sender may still see the effect of losses because the lossy link is part of the end-to-end connection. Some protocols combine link-layer recovery with knowledge of TCP.
  3. Split-connection protocols - These split the end-to-end connection into two at the base station: one connection between the sender and the base station, and a separate reliable connection (possibly using SACK/NACK) between the base station and the mobile host.
  4. Snoop protocol - This places a snoop agent at the base station that monitors every packet of the TCP connection and caches packets that have not yet been acknowledged by the receiver. A small number of duplicate acknowledgements or a local timeout causes the snoop agent to retransmit the packet from its cache and suppress the duplicate acknowledgements (see the sketch after this list).
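
To make the snoop idea concrete, here is a minimal Python sketch of a snoop-style agent. The class and method names are hypothetical and the logic is heavily simplified (no timers, simplistic sequence handling); it only illustrates caching unacked data, local retransmission, and duplicate-ACK suppression.

```python
class SnoopAgent:
    """Sketch of a snoop-style agent sitting at the base station."""

    def __init__(self):
        self.cache = {}        # seq -> packet not yet acked by the mobile host
        self.dup_acks = 0

    def on_data_from_sender(self, seq, packet):
        self.cache[seq] = packet           # cache unacknowledged data
        return packet                      # forward over the wireless link

    def on_ack_from_receiver(self, ack_seq, last_ack):
        if ack_seq > last_ack:
            # New ack: clean the cache and forward the ack to the sender.
            for seq in [s for s in self.cache if s <= ack_seq]:
                del self.cache[seq]
            self.dup_acks = 0
            return ("forward_ack", None)
        # Duplicate ack: retransmit locally from the cache and suppress it,
        # so the TCP sender never sees it and does not shrink its window.
        self.dup_acks += 1
        missing = ack_seq + 1
        if missing in self.cache:
            return ("suppress_and_retransmit", self.cache[missing])
        return ("forward_ack", None)       # nothing cached; let the sender recover
```
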
Impact and Critique:
The authors evaluate several schemes belonging to these categories. They find that a reliable link-layer protocol with some knowledge of TCP performs very well: ordinary link-layer protocols interact badly with TCP fast retransmissions, whereas TCP-aware link-layer protocols suppress the excess duplicate acknowledgements and thereby improve performance. End-to-end protocols that use both ELN and SACK outperform plain end-to-end protocols. Split-connection protocols improve performance, but they remain worse than TCP-aware link-layer protocols.

I found the paper interesting to read because it discusses what issues TCP faces in case of wireless networks. TCP is designed keeping in mind wired networks that generally have packet loss due to congestion. Comparison of different wireless protocols and how they perform in wireless networks was very insightful. Since I did not know much about wireless protocols this paper helped me understand different schemes that have been proposed and also the issues they face.

It would have been nice to discuss the pros and cons of each protocol in more detail. I could not understand how the ELN scheme distinguishes losses caused by congestion from losses caused by corruption. It also seems to me that a TCP-aware link-layer protocol violates layering, since it mixes the link layer with the transport layer. Finally, it would have been insightful to show how these protocols perform under congestion.

Thursday, September 25, 2008

MACAW: A Media Access Protocol for Wireless LAN's

Summary:
The authors in this paper propose MACAW that modifies MACA. Techniques that have been proposed for wireless communication include:
  • CSMA (Carrier Sense Multiple Access) senses the carrier before sending data. It suffers from the "hidden node" and "exposed node" problems.
  • MACA (Multiple Access Collision Avoidance) uses RTS-CTS-Data along with exponential backoff to select retransmission time.
MACAW proposes four changes:
  1. Fairness:
    • The binary exponential backoff (BEB) algorithm does not provide fairness: when two terminals both send at channel capacity, the one that happens to have the lower backoff counter can keep winning the channel. MACAW proposes backoff copying, where every terminal in range adopts the backoff value it hears in a packet.
    • To keep the backoff from oscillating wildly, the authors propose the MILD (multiplicative increase, linear decrease) adjustment; a small sketch appears after this list.
    • Fairness over stream instead of per station is achieved by implementing separate queues for each stream.
  2. RTS-CTS-DS-DATA-ACK modifications:
    • RTS-CTS-DATA-ACK exchange provides error checking at link level.
    • Send DS before DATA so that all nearby terminals wait until ACK is heard. This provides efficient synchronization of information about contention periods.
    • RRTS solves a contention problem caused by lack of synchronization. For example, consider an A-to-B stream and a D-to-C stream where B and C are close to each other. D's RTS to C may go unanswered because the A-to-B stream transfers large data bursts with only short gaps before B's next CTS, so C is almost always deferring. To solve this, C can send an RRTS to D during the next contention period (in response to the RTS it could not answer) to initiate the exchange.
  3. Location-dependent congestion:
    • Since congestion varies per region, the backoff needs to be maintained per sender-receiver pair rather than per station. This is achieved by having the backoff scheme copy both the sender's and the receiver's backoff counters.
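
A tiny sketch of the MILD backoff adjustment and backoff copying described above; the bounds are illustrative assumptions, not necessarily MACAW's exact values.

```python
class MILDBackoff:
    """Sketch of MACAW-style MILD backoff: multiply by 1.5 on collision,
    decrement by 1 on success, clipped to [BO_MIN, BO_MAX]."""

    BO_MIN, BO_MAX = 2, 64     # illustrative bounds

    def __init__(self):
        self.backoff = self.BO_MIN

    def on_collision(self):
        self.backoff = min(int(self.backoff * 1.5), self.BO_MAX)

    def on_success(self):
        self.backoff = max(self.backoff - 1, self.BO_MIN)

    def on_overheard_packet(self, advertised_backoff):
        # Backoff copying: all stations in range adopt the same value,
        # which spreads congestion information and improves fairness.
        self.backoff = advertised_backoff
```
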
Background:
It is helpful to read about 802.11 MAC protocol before reading this paper.

Impact and Critique:
The paper gives a nice, brief overview of prior protocols and their drawbacks. The per-stream fairness implementation, which keeps a separate queue for each stream, reminded me of the concept of VOQs.

MACAW addresses many issues, such as fairness and better congestion handling, but issues like NACKs and multicast are either not addressed or lack a good solution. While reading the paper I felt that a few of the modifications were made to handle very specific traffic patterns and may have made the protocol quite complicated. Moreover, the experiments are run on a simulator, so the scenarios may not match real traffic. The MILD scheme increases the backoff interval by a factor of 1.5 and decreases it by 1, but there is no discussion of how these parameters were chosen. Do they need to change with the traffic pattern?

Monday, September 22, 2008

Scaling Internet Routers Using Optics

After reading the previous paper I had raised the concern that a centralized scheduler can become a bottleneck, and the authors of this paper argue that removing the central scheduler makes the router more scalable. They propose that passive optics, which consume essentially no power, can be used to place the fabric of a 100Tb/s router in a single rack without losing throughput guarantees.

The load-balanced architecture adds an extra load-balancing stage that spreads non-uniform traffic so that the second stage sees uniform arrivals. With uniform arrivals, a fixed equal-rate switch with virtual output queues guarantees 100% throughput. The load-balanced switch does not need a central scheduler, but it does require an NxN switch fabric that is reconfigured for each packet transfer. To get rid of reconfiguration, the switch fabric can be replaced with a fixed mesh of optical channels: each fixed equal-rate switch becomes NxN fixed channels. If the traffic matrix is not known, the mesh must be uniform. Wavelength division multiplexing (WDM) reduces the number of fibers by having each linecard multiplex N WDM channels onto one fiber. FOFF (Full Ordered Frames First) is used so that packets leave the switch in order. Because linecards can fail, traffic must still be scattered uniformly over the linecards that remain; the solution is to increase the number of paths M between the local switches. A rough sketch of the load-balancing idea follows.
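
As a very rough sketch of the load-balancing idea (not the paper's design: N, the data structures, and the service discipline are illustrative, and FOFF/reordering are ignored), the first stage spreads arriving packets round-robin over intermediate linecards regardless of destination, and the second stage delivers them from per-output queues.

```python
from collections import deque

N = 4  # number of linecards, kept small for illustration

# VOQs held at the intermediate (middle) stage: voq[mid][out]
voq = [[deque() for _ in range(N)] for _ in range(N)]
spread_ptr = [0] * N          # per-input round-robin pointer for stage 1

def stage1_arrival(input_port, output_port, packet):
    """Spread each arriving packet to the next intermediate linecard,
    regardless of its destination, so the middle stage sees uniform load."""
    mid = spread_ptr[input_port]
    spread_ptr[input_port] = (mid + 1) % N
    voq[mid][output_port].append(packet)

def stage2_service():
    """Each intermediate linecard forwards at most one packet per output
    per time slot over its fixed equal-rate channel to that output."""
    delivered = []
    for mid in range(N):
        for out in range(N):
            if voq[mid][out]:
                delivered.append((out, voq[mid][out].popleft()))
    return delivered
```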

The architecture proposed in the paper thus removes the centralized scheduler and uses a switch fabric that never needs to be reconfigured.

The paper was interesting to read but for me it was difficult to follow all the details given in the paper. I feel that I have not learned much out of this paper and there are lots of things that I have missed. Probably after tomorrow's lecture this paper will be more clear to me.

A Fast Switched Backplane for a Gigabit Switched Router

The paper discusses the need for switched backplanes that can transfer multiple packets simultaneously. Routers must perform the datapath functions (such as the forwarding decision) on every datagram, so improving router performance requires improving the performance of these datapath functions. The general trend is to implement datapath functions in hardware, add parallelism, and replace shared buses.

The author proposes using a crossbar switch in which multiple line cards can communicate with each other simultaneously, which greatly improves performance. He argues that very high speed shared backplanes are impractical because bandwidth is shared (leading to congestion) and because the transfer capacity of a multiport shared bus is limited by electrical loading.

A crossbar switch gives high performance because the connections from the line cards to the central switch are simple point-to-point links that can operate at high speed, and because it supports multiple transactions simultaneously, increasing aggregate bandwidth. Fixed-length cells are simple to pass across a switched backplane and are chosen for better performance. Even though crossbar switches are non-blocking, they experience head-of-line (HOL), input, and output blocking. HOL blocking is addressed by virtual output queueing, and prioritization mechanisms address the delay introduced by input and output blocking.

A crossbar scheduling algorithm should provide high throughput, be starvation free, be fast, and be simple to implement. The paper discusses the iSLIP algorithm used to schedule the crossbar; a rough sketch of one iteration is below. Multicast traffic can be handled by closing multiple crosspoints simultaneously, performing cell replication within the fabric.
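
Here is a simplified, single-iteration sketch of an iSLIP-style request-grant-accept match in Python. It is only meant to illustrate the round-robin pointers and their update rule; the real algorithm runs multiple iterations and updates pointers only on the first one.

```python
def islip_iteration(requests, grant_ptr, accept_ptr):
    """One request-grant-accept pass of an iSLIP-style scheduler (sketch).
    requests[i] is the set of outputs input i has cells for; grant_ptr and
    accept_ptr are per-output and per-input round-robin pointers."""
    n = len(requests)

    # Grant phase: each output picks the requesting input nearest its pointer.
    grants = {}  # output -> input
    for out in range(n):
        requesting = [i for i in range(n) if out in requests[i]]
        if requesting:
            grants[out] = min(requesting, key=lambda i: (i - grant_ptr[out]) % n)

    # Accept phase: each input picks the granting output nearest its pointer.
    matches = []
    for inp in range(n):
        granting = [o for o, i in grants.items() if i == inp]
        if granting:
            out = min(granting, key=lambda o: (o - accept_ptr[inp]) % n)
            matches.append((inp, out))
            # Pointer update (one beyond the matched port) desynchronizes the
            # round-robin pointers and avoids starvation.
            grant_ptr[out] = (inp + 1) % n
            accept_ptr[inp] = (out + 1) % n
    return matches

# Example: 3 inputs, input 0 wants outputs {0, 1}, input 1 wants {1}, input 2 wants {0}.
print(islip_iteration([{0, 1}, {1}, {0}], [0, 0, 0], [0, 0, 0]))
```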

It was insightful to see how the hardware of a router is designed to achieve better performance; until now I had never thought about the hardware techniques used to improve routers. The paper is written very nicely. I assumed that because it is mostly about hardware I would not understand it well, but it gives a very clear picture of which decisions were made and why. The paper should definitely be kept in the syllabus.

It seems that the centralized scheduler can become a bottleneck in case of high-speed traffic and cause delay.

Wednesday, September 17, 2008

Supporting Real-Time Applications in an Integrated Services Packet Network: Architecture and Mechanism

This paper targets real-time applications. The authors propose that real-time traffic needs either guaranteed service or predicted service (in which applications adapt themselves to the delays observed in the network). The architecture discussed handles both types of traffic.

Real-time applications are sensitive to data delivery delay: they need delay information to set their playback point, data must arrive before its playback point, and only a certain fraction of packet losses can be tolerated. Real-time applications should receive service as if they were the only users of the link and should not be affected by other traffic. The service commitment made to an end-user depends on the traffic commitment made by that end-user.

For guaranteed traffic the paper proposes token bucket filters to police the traffic rate (a small sketch follows) and WFQ for scheduling. For predicted service the authors propose FIFO+. To make the two work together, guaranteed flows are assigned their own WFQ flows, while datagram and predicted-service traffic is assigned to a pseudo flow 0, which is divided into priority classes, each running FIFO+.
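
A minimal token bucket sketch; the parameter names and units are my own assumptions, not the paper's notation.

```python
class TokenBucket:
    """Sketch of a token bucket filter with fill rate `rate` and depth `depth`,
    as used to police a guaranteed-service flow's traffic commitment."""

    def __init__(self, rate, depth):
        self.rate = rate            # token fill rate (bytes per second)
        self.depth = depth          # bucket size (bytes of allowed burst)
        self.tokens = depth
        self.last = 0.0

    def conforms(self, packet_len, now):
        # Refill tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_len <= self.tokens:
            self.tokens -= packet_len
            return True             # packet is within the flow's traffic commitment
        return False                # non-conforming; could be delayed or demoted
```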

The paper puts forward the need and requirement for servicing real-time application traffic. The discussion of predicted service shows that not all of the real-time traffic requires guaranteed service. One of the main weaknesses of the paper is the lack of concrete evaluation. Also, it has not been discussed how deployment will be handled. Techniques that propose changes to the Internet should always keep in mind the ease of deployment and the incentive for people to adopt the scheme.

Fundamental Design Issues for the Future Internet

With the evolution of the Internet the requirements for service have changed dramatically. Moreover, there are different types of traffic that flow through the Internet. This raises the need to revisit the design and architecture of Internet and evaluate its performance. The author in the paper discusses some of these issues and also suggests the changes that can make Internet work in the future.

The author notes that the Internet was designed around a single class of "best-effort" service with no admission control and no delivery guarantees. Most applications can handle packet losses or variable delays, but multimedia applications are not tolerant of these problems. To make real-time applications work well we can change either the applications or the network implementation; both approaches have pros and cons that are discussed at length in the paper.

The goal is for the network to deliver as much value as possible to its users. To make the network more efficient, it is more plausible to extend the service model than to simply add bandwidth or build separate networks for different types of traffic. Extending the service model raises the question of which service a flow should get and who decides. The decision can be made implicitly by the network based on packet information, or an application can explicitly request a service. Explicit requests must be paired with incentives that encourage users to choose the correct service, and they require applications to be aware of the choices, so any change to the service model means updating all the applications.

Admission control is used to keep the network from being overloaded. Over-provisioning the Internet for special traffic (real-time applications) so that no admission control is needed is not cost effective. The paper shows that traditional applications worked efficiently with no admission control, but with real-time applications some flows must be turned away to keep overall efficiency high.

The paper was very interesting to read because it put forward how the design of the Internet needs to be changed for special traffic. The design decisions change as Internet evolves. It will be nice to discuss whether some of the changes discussed in the paper have been incorporated in the Internet today. Also, with security coming into play we have to also take into consideration malicious users. We also have traffic from different types of network like wireless networks and sensor networks. These networks require different types of algorithms for providing efficiency.

Saturday, September 13, 2008

Random Early Detection Gateways for Congestion Avoidance

Most congestion control and fairness techniques used at gateways do not work well: they require large queues for connections with large delay-bandwidth products, cause global synchronization, offer no isolation against unresponsive flows, and are biased against bursty traffic.

The RED algorithm proposed by the authors addresses these issues. Congestion avoidance is provided by controlling the average queue size, which is compared against two thresholds: below the minimum threshold no packet is marked, above the maximum threshold every packet is marked, and in between packets are marked with a probability that grows with the average queue size.

The queue size determines the degree of burstiness the gateway allows. The average queue size is computed with an exponentially weighted moving average, which helps absorb bursts of packets. RED avoids global synchronization by randomizing which packets are marked and marking at as low a rate as possible, and it approximates fairness by marking each connection's packets in proportion to that connection's share of the bandwidth (a small sketch of the marking decision follows).
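
A small sketch of RED's marking decision, combining the EWMA of the queue size with the linear marking probability between the thresholds. The parameter values are illustrative defaults, and details such as the count-since-last-mark correction are omitted.

```python
import random

class REDQueue:
    """Sketch of RED's marking decision (gentle mode and the packet-count
    correction are omitted for brevity)."""

    def __init__(self, min_th=5, max_th=15, w_q=0.002, max_p=0.02):
        self.min_th, self.max_th = min_th, max_th   # thresholds in packets
        self.w_q = w_q                              # EWMA weight
        self.max_p = max_p                          # marking probability at max_th
        self.avg = 0.0

    def on_arrival(self, queue_len):
        """Return True if the arriving packet should be marked/dropped."""
        # The exponentially weighted moving average absorbs short bursts.
        self.avg = (1 - self.w_q) * self.avg + self.w_q * queue_len
        if self.avg < self.min_th:
            return False
        if self.avg >= self.max_th:
            return True
        # Between the thresholds, the probability grows linearly with avg.
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return random.random() < p
```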

I feel that the most challenging aspect of using RED is setting all of its parameters correctly. I am curious whether RED is implemented in routers today, and if it is, whether operators actually enable it. It would be interesting to see how RED performs today with such varied traffic.




Friday, September 12, 2008

Congestion Control for High Bandwidth-Delay Product Networks

XCP was designed with the increasing use of high-bandwidth, high-latency networks in mind. The authors argue that conventional TCP and its congestion control schemes falter on links with high bandwidth and large delays. Moreover, with the growth of the Internet we now have a mix of networks with widely varying bandwidths and delays, so the transport protocol must remain stable in extreme scenarios while still handling congestion. TCP also biases against flows with large round trip times, and short flows get stuck in the slow-start phase.

XCP proposes to solve these problems. The authors generalize explicit congestion notification (ECN): instead of a single bit, routers inform senders of the degree of congestion at the bottleneck, since a binary signal cannot convey enough information. The sending rate of a source is adjusted based on RTT, fairness is made independent of the number of flows, and utilization control is decoupled from fairness control.

Each XCP packet carries a congestion header containing H_cwnd, H_rtt, and H_feedback. The sender fills in H_cwnd and H_rtt; routers along the path update H_feedback using an efficiency controller (EC) and a fairness controller (FC), where the EC uses MIMD and the FC uses AIMD. The receiver copies H_feedback into the ACK, and the sender updates its cwnd based on the feedback (a small sketch of the sender side follows).
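
A sketch of the sender side, assuming the header fields named in the summary; the constants and the initial H_feedback value are illustrative, and the router-side EC/FC computations are not shown.

```python
MSS = 1500  # bytes; illustrative segment size

class XcpSender:
    """Sketch of an XCP-style sender that applies explicit per-packet feedback."""

    def __init__(self, cwnd=MSS, rtt=0.1):
        self.cwnd = cwnd
        self.rtt = rtt

    def build_header(self):
        # The sender fills in its current cwnd and rtt estimate. H_feedback
        # starts as the sender's desired increase; routers along the path may
        # reduce it (possibly making it negative) via their EC/FC computations.
        return {"H_cwnd": self.cwnd, "H_rtt": self.rtt, "H_feedback": MSS}

    def on_ack(self, header):
        # The receiver echoes H_feedback in the ACK; the sender applies it
        # directly instead of probing with AIMD the way TCP does.
        self.cwnd = max(self.cwnd + header["H_feedback"], MSS)
```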

This paper was insightful for understanding the requirements of high-speed networks, how the current implementation of TCP fails to meet them, and what needs to be done. With the Internet evolving at such a fast pace we always need to keep in mind how current protocols scale to today's requirements. I would recommend keeping this paper in the syllabus.

XCP outperforms TCP because it sends complete information of the degree of congestion instead of sending a bit signaling congestion. Decoupling FC and EC enables utilizing spare bandwidth and it also considers the delay in feedback while updating CWND. It is scalable because this does not require per-flow state to be stored at the routers.

One thing that seemed odd is that XCP depends on the feedback echoed in ACK packets. The authors say packet losses are rare, but if a router is malicious, or an ACK is lost or modified for some other reason, XCP can become unstable. XCP also relies on the sender setting H_cwnd and H_rtt according to the rules; an ill-behaved user could fool XCP into working incorrectly.

Wednesday, September 10, 2008

Core-Stateless Fair Queueing: Achieving Approximately Fair Bandwidth Allocations in High Speed Networks

The authors in the paper argue that fair allocation mechanisms require complex operations to be implemented at the router level. These operations include maintenance of state per flow and performing operations on a per flow basis which are not suited for high speed network routers. The authors propose a technique called core-stateless fair queueing where the edge routers maintain per flow state and insert labels into the packets. These labels convey the estimated flow rate and are used by core routers. The core routers perform FIFO scheduling with probabilistic dropping of packets based on the packet labels.

Edge routers compute each flow's arrival rate using exponential averaging: the estimate is updated on every packet from the inter-arrival time, the packet length, and the previous estimate. A core router estimates the fair rate from the aggregate arrival rate and the aggregate accepted rate, both updated on each packet arrival (a small sketch of the edge estimator and the core drop decision follows). When a router is congested, packet labels are no longer correct and are rewritten as the minimum of the old label and the fair rate. The paper also discusses weighted CSFQ, which supports flows with different weights, and presents simulations comparing CSFQ with FIFO, RED, FRED, and DRR.
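
A sketch of the two pieces in Python, in the spirit of the description above (the averaging constant and function names are my assumptions; the real algorithm also estimates the fair rate, which is taken as given here).

```python
import math
import random

K = 0.1  # averaging time constant in seconds (illustrative)

def update_flow_rate(prev_rate, pkt_len, inter_arrival):
    """Edge router: exponential averaging of one flow's arrival rate.
    Assumes inter_arrival > 0."""
    w = math.exp(-inter_arrival / K)
    return (1 - w) * (pkt_len / inter_arrival) + w * prev_rate

def core_forward(label_rate, fair_rate):
    """Core router: drop probabilistically so a flow labelled above the fair
    rate is throttled down to roughly the fair rate; relabel survivors.
    Assumes label_rate > 0. Returns the new label, or None if dropped."""
    drop_prob = max(0.0, 1.0 - fair_rate / label_rate)
    if random.random() < drop_prob:
        return None                      # packet dropped, no per-flow state needed
    return min(label_rate, fair_rate)    # label carried downstream
```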

The basis of this research is reducing the complexity of the operations routers must perform to achieve fairness. I am curious about current practice: do routers use CSFQ-style mechanisms, or is DRR used these days? Also, are there scenarios where CSFQ's estimators do not work well, for example when many flows come and go simultaneously and their lifetimes are very short? The simulations show that CSFQ copes well with TCP burstiness, but how long does it take to stabilize when many new connections start at once?

I would recommend having this paper in the syllabus because it ties very well with the other reading for fair queuing lecture.

Tuesday, September 9, 2008

Analysis and Simulation of a Fair Queueing Algorithm

This paper discusses queueing algorithms implemented at the gateway and how fair queueing aids congestion control. Queueing algorithms allocate bandwidth, promptness, and buffer space. The most common algorithm, FCFS, determines the allocation purely by the order of packet arrival. Nagle's FQ algorithm instead keeps a separate queue per flow and services the queues in round-robin fashion. The authors propose a modified version of Nagle's approach and discuss how it resolves some of the drawbacks of Nagle's algorithm.

Authors argue that bandwidth and buffer allocation per user changes based on what constitutes a user. A user can be a source, a receiver, process or a source-destination pair. The authors propose allocation on the basis of source-destination pairs because this fits well with respect to security and efficiency.

Nagle's approach does not allocate bandwidth fairly when packet sizes vary. Sending packets bit by bit would solve this but has too much overhead, so the authors emulate it: they service next the packet that would finish transmission first under the bit-by-bit algorithm (a small sketch of this finish-time bookkeeping follows). This allocates bandwidth fairly among competing flows; no flow can be ahead of another by more than the maximum packet size. Promptness is addressed by giving lower delay to packets from flows using less than their fair share of the bandwidth. The authors also briefly describe several source flow control algorithms and present simulations combining them with fair queueing and FCFS, comparing throughput, RTT, number of retransmissions, and dropped packets.
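
A minimal sketch of the finish-time emulation in Python. The virtual time (round number) computation here is deliberately simplified, so this only illustrates "serve the packet with the smallest finish tag" rather than being a faithful FQ implementation.

```python
import heapq

class FairQueue:
    """Sketch of the finish-time emulation of bit-by-bit round robin."""

    def __init__(self):
        self.finish = {}        # flow -> finish tag of its last enqueued packet
        self.virtual_time = 0.0
        self.heap = []          # (finish_tag, seq, flow, packet)
        self.seq = 0            # tie-breaker for the heap

    def enqueue(self, flow, packet, size):
        # F = max(F_prev_for_flow, current virtual time) + packet size
        f = max(self.finish.get(flow, 0.0), self.virtual_time) + size
        self.finish[flow] = f
        heapq.heappush(self.heap, (f, self.seq, flow, packet))
        self.seq += 1

    def dequeue(self):
        if not self.heap:
            return None
        f, _, flow, packet = heapq.heappop(self.heap)
        self.virtual_time = f   # simplified: advance virtual time to this finish tag
        return flow, packet
```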

The paper was interesting to read, and the simulations helped show how FCFS and FQ perform when coupled with different flow control algorithms, source types, and network settings. The FQ algorithm proposed here requires complex per-flow operations at the router; the authors acknowledge this and give their reasoning. The CSFQ algorithm proposed by Stoica et al. tries to address that cost. It would be nice to discuss the pros and cons of both approaches and what is actually used these days.

Saturday, September 6, 2008

Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks

The authors observe that congestion control and avoidance algorithms in computer networks fall into the general category of dynamic resource management problems. They analyze algorithms using metrics such as efficiency, fairness, distributedness, convergence time, and size of oscillations. The analysis focuses on the increase/decrease algorithms used with the binary feedback scheme, in which the network sets a bit in each packet to signal whether the resource is overloaded or underloaded.

The network performance graph in the paper shows how throughput behaves as load increases. The point at which packets start getting lost is called the cliff, while the knee is the point beyond which throughput increases only slightly while response time increases sharply. Congestion avoidance schemes try to keep the network operating near the knee, well to the left of the cliff.

To analyze the algorithms, the authors plot system state transitions in a two-dimensional space representing two users; the graphs help the argument a great deal. The efficiency line contains the allocations whose total equals the maximum load, while the fairness line is the 45-degree line of equal usage; the optimal point is their intersection. The trajectory of additive increase / multiplicative decrease converges to this optimal point. The authors then prove that the decrease must be multiplicative for the allocation to converge to fairness, and that additive increase with multiplicative decrease moves the system toward fairness and efficiency on each feedback cycle (a tiny simulation illustrating this convergence follows).
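
A tiny simulation of the AIMD dynamics for two users sharing one resource; the constants are arbitrary, but whatever the starting allocations, both users end up near the fair share.

```python
def aimd_trajectory(x1=0.1, x2=0.9, capacity=1.0, a=0.05, b=0.5, steps=200):
    """Two users doing additive increase (+a) / multiplicative decrease (*b)
    driven by binary overload feedback. Constants are illustrative."""
    for _ in range(steps):
        if x1 + x2 > capacity:          # feedback bit set: overloaded
            x1, x2 = x1 * b, x2 * b     # multiplicative decrease
        else:                           # underloaded
            x1, x2 = x1 + a, x2 + a     # additive increase
    return x1, x2

print(aimd_trajectory())   # both allocations end up close to capacity/2
```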

I liked this paper because it explains the mathematical reasoning behind the algorithms used in computer networks. Reading such papers helps in finding the proofs behind the algorithms used in the field of networking. It is nice to have this paper in the syllabus.

The one thing I did not like is that the analysis largely assumes away delayed feedback and asynchronous operation; the authors discuss these issues only briefly, and settings have changed significantly since the paper was written. We now have different types of traffic sharing the same link. Does this complicate estimating congestion per link? Maybe the feedback signal should be interpreted differently for different types of flows. Many questions like this come up given how networks are designed and used today.


Congestion Avoidance and Control

This is a seminal paper on congestion control for TCP. The authors observe that the initial window-based TCP implementations can misbehave under congestion. The algorithms proposed in the paper use the principle of "packet conservation" to achieve network stability: when a connection is running stably with a full window of data in transit, no new packet should be put into the network until an old packet leaves.

Three problems that violate packet conservation: (as stated in the paper)
  1. The connection does not get equilibrium
  2. A sender injects a new packet before an old packet has exited
  3. The equilibrium cannot be reached because of resource limits along the path

Problem 1 is addressed by the slow start algorithm, which is used to start the self-clocking behavior: a self-clocked sender generates packets at the rate at which acks return. Because it is difficult to start a self-clocked system, slow start gradually increases the amount of data in transit: the per-connection variable cwnd is set to 1 on a start or restart and is incremented by 1 for each ack received.

Problem 2 is addressed by improving the retransmit timer. To survive heavy load, the timer must account for the variation of the round trip time as well as its mean; the variation estimate eliminates unnecessary retransmissions (a small sketch is below). Exponential backoff after a retransmit also helps stabilize an unstable system.
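
A sketch of a mean-plus-deviation retransmit timer in Python. The gains (1/8, 1/4) and the 4x deviation multiplier are commonly used values rather than necessarily the paper's exact constants.

```python
class RtoEstimator:
    """Sketch of a retransmit-timeout estimator based on the smoothed RTT
    and its mean deviation."""

    def __init__(self):
        self.srtt = None     # smoothed RTT estimate
        self.rttvar = 0.0    # smoothed mean deviation of the RTT

    def update(self, measured_rtt):
        """Feed one RTT measurement; return the new retransmit timeout."""
        if self.srtt is None:
            self.srtt, self.rttvar = measured_rtt, measured_rtt / 2.0
        else:
            err = measured_rtt - self.srtt
            self.srtt += 0.125 * err                        # gain 1/8
            self.rttvar += 0.25 * (abs(err) - self.rttvar)  # gain 1/4
        return self.srtt + 4 * self.rttvar                  # retransmit timeout
```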

To address problem 3, the authors argue that if the timers are working correctly, a lost packet most likely signals congestion (losses due to physical damage are rare). For congestion avoidance an additive-increase, multiplicative-decrease algorithm is proposed (sketched below together with slow start). At the end the authors discuss why congestion control at the gateway is also required for fair sharing of the bandwidth.
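
A sketch of the window dynamics (slow start plus additive increase, with multiplicative decrease on loss), roughly Tahoe-style; the functions, units (segments), and constants are illustrative.

```python
def on_ack(cwnd, ssthresh, mss=1.0):
    """Window growth per ACK: exponential during slow start, roughly one
    segment per RTT during congestion avoidance."""
    if cwnd < ssthresh:
        return cwnd + mss                 # slow start: +1 segment per ACK
    return cwnd + mss * mss / cwnd        # congestion avoidance: ~+1 segment per RTT

def on_loss(cwnd, mss=1.0):
    """Loss treated as congestion: halve the threshold and restart from
    one segment. Returns (new cwnd, new ssthresh)."""
    return mss, max(cwnd / 2.0, 2.0 * mss)
```
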

Relevant background for this paper is a good understanding of TCP and how it works. Most of the algorithms discussed are still used today and have had a great impact on the Internet. As the Internet grew, congestion control and avoidance was one of the main problems that had to be addressed, and this early paper proposes how to solve the major concerns. It also shows how problems in computer networks had actually been well studied in the context of control theory and queueing theory.

Another aspect that comes into picture after reading this paper is about security. These algorithms rely on the fact that the receiver is not misbehaving. There is a paper by Savage et al. that shows how a misbehaving receiver can make the sender go fast without losing end-to-end reliability.

This paper should be kept in the syllabus because the algorithms proposed lay the platform for the congestion control algorithms used these days.

Wednesday, September 3, 2008

On Inferring Autonomous System Relationships in the Internet

Understanding AS relationships in the Internet can be useful to provide a deep understanding of traffic flow and also aiding ASes to make policy changes for improvement. Unfortunately these relationships are not known. The paper proposes generating an augmented AS graph that will represent the relation between ASes as customer-provider, provider-customer, peering and sibling relationship. Gao suggests that these relationships can be inferred from BGP routing table entries.

The author says that export policies reflect the kind of relationship between ASes (export policies are explained in the previous post on interdomain routing). Based on the export policies the author defines a selective export rule and shows that if export policies adhere to it, the AS path of a routing table entry is valley-free: after traversing a provider-to-customer or peer-to-peer edge, the AS path cannot traverse a customer-to-provider or another peer-to-peer edge. Routing table entry patterns are classified as downhill paths, uphill paths, maximal uphill paths, and maximal downhill paths.

The heuristic algorithms proposed in the paper rely on the assumption that a provider is typically larger than its customers and that the size of an AS is proportional to its degree in the AS graph. The top provider of an AS path is the AS with the highest degree among all ASes on the path (a small sketch of the basic heuristic follows).
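
A sketch of the basic degree-based heuristic as I understand it (function and variable names are mine; the paper's refinements for peering, sibling, and misconfiguration handling are omitted).

```python
def annotate_path(as_path, degree):
    """Infer edge relationships along one AS path. `as_path` is a list of
    ASNs, `degree` maps ASN -> degree in the AS graph. The highest-degree AS
    is taken as the top provider; edges before it are inferred as
    customer->provider and edges after it as provider->customer."""
    top = max(range(len(as_path)), key=lambda i: degree[as_path[i]])
    edges = {}
    for i in range(len(as_path) - 1):
        a, b = as_path[i], as_path[i + 1]
        if i < top:
            edges[(a, b)] = "customer->provider"   # uphill portion of the path
        else:
            edges[(a, b)] = "provider->customer"   # downhill portion of the path
    return edges

# Example with made-up ASNs and degrees.
print(annotate_path([64501, 64502, 64503], {64501: 3, 64502: 120, 64503: 8}))
```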

The background needed for this paper requires a good understanding of BGP and the export and import policies. The Interdomain Internet Routing lecture reading is perfect to read before reading this paper.

The results provided in the paper are very promising with an accuracy of almost 99.1%. I was not very happy with the way the paper was written. It was little difficult to follow the proofs. Moreover I am curious to see how this method worked on other data. I have a feeling that the refined algorithm that handles misconfiguration by setting a value L may cherry pick a value.

It would be nice to discuss some of the algorithms designed after this paper, since it was the first in this area of research, and also how this research has benefited ASes, customers, or providers in the real world. Overall the paper was an interesting read and is nice to have in the syllabus because it ties in very well with BGP routing policies.

Interdomain Internet Routing

This lecture reading describes how communication between different ASes works in the Internet. It begins by contrasting the real Internet routing infrastructure with the general image people have of it. The infrastructure involves a large number of commercial entities (ISPs) that provide Internet service, and cooperation is required between these entities to provide global connectivity. ISPs are categorized as Tier-1 (global scope), Tier-2 (regional scope), and Tier-3 (a small number of localized customers).

The routing architecture uses the concept of ASes instead of ISPs. ASes are managed by a single ISP though there can be multiple ASes in an ISP. Border Gateway Protocol (BGP) is the protocol that is used between ASes (by their border routers) to share reachability information for achieving global connectivity. Within an AS Interior Gateway Protocol (IGP) is used. IGP includes protocols like OSPF, IS-IS and RIP. BGP is designed to be scalable while IGP concentrates on optimizing path cost.

There are two kinds of AS-AS connection types: transit (provider-customer, which involves a financial agreement) and peering (which generally does not). Rules for exchanging routing information:
  • Exporting routes:
    • An AS advertises its customers' routes to everyone (customers, providers, and peers).
    • ASes generally do not export routes learned from providers, except to their own customers.
    • Peer ASes share their transit customers' routes and internal ISP routes with their peers.
  • Importing routes:
    • Customer routes are preferred over peer routes, and peer routes are preferred over provider routes. The LOCAL PREF attribute is used to implement this priority.
BGP was designed with three goals in mind: scalability, policy, and cooperation. BGP does not optimize path cost; instead it uses a set of attributes to make routing decisions: Next Hop, AS Path, Local Preference, and the Multi-Exit Discriminator (MED). The AS Path attribute lists the ASes that a route advertisement has passed through. MED is used to compare two or more routes from the same neighboring AS and is generally used to enforce cold-potato routing. A small sketch of the import preference is below.
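
A small sketch of the import preference in Python. The LOCAL_PREF values are a common convention, not something mandated by BGP or the lecture notes, and the route representation is my own.

```python
# Customer routes beat peer routes, which beat provider routes; ties are
# then broken by shorter AS path (a simplification of the full decision process).
LOCAL_PREF = {"customer": 200, "peer": 150, "provider": 100}

def best_route(routes):
    """routes: list of dicts with 'neighbor_type' and 'as_path' (list of ASNs)."""
    return max(routes,
               key=lambda r: (LOCAL_PREF[r["neighbor_type"]], -len(r["as_path"])))

print(best_route([
    {"neighbor_type": "provider", "as_path": [64496, 64500]},
    {"neighbor_type": "customer", "as_path": [64512, 64496, 64500]},
]))  # the customer-learned route wins despite its longer AS path
```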

BGP has two flavors:
  1. eBGP - this runs between BGP routers in different ASes.
  2. iBGP - this runs between BGP routers within an AS. The iBGP topology is a full mesh, which can have scalability issues; two ways to solve this are route reflectors and confederations.
There have been security issues with BGP, such as DoS attacks and problems caused by misconfiguration. Limitations include the lack of authentication of route attributes, which allows tampering with route information. Also, from the reading it wasn't clear what export and import policies apply between two ASes that have the same provider.

It would be nice to have a small discussion about the security issues and how people have addressed them. Another thing that confused me in the beginning was the difference between iBGP and IGP; it was not very clear when IGP is used and when iBGP is used.

I really enjoyed reading it because before this I did not know about the internals of BGP and how routing between commercial entities took place. In fact I really liked the way the lecture explained how selection of routes is directly tied to the economic model. Hot-potato routing and route export and import policies were interesting to read. This reading is a must for the class.

Monday, September 1, 2008

The Design Philosophy of the DARPA Internet Protocols

This paper discusses how the TCP/IP protocol design decisions were made. The main goal was to interconnect existing networks and new networks as they emerged, and store-and-forward packet switching was used to meet this requirement. The resulting structure is a network of networks, communicating via packet switching through gateways that perform store-and-forward packet forwarding.

Secondary goals included survivability, support for different types of service, accommodating a variety of networks, and management of resources. The paper introduces the concept of fate-sharing for survivability: the endpoints are responsible for storing the state of their own communication. Initially it was thought that TCP would be general enough to support any type of service, but the large number of varied service types led to the split into TCP and IP: TCP provides a reliable, sequenced data stream, while IP is the basic building block for all types of service. UDP was created to provide a datagram service. Flow control and acknowledgement in TCP were defined in terms of byte numbers.

The Internet architecture had to support a variety of networks, which is achieved by making very few assumptions about them: a network need only be able to carry a packet or datagram and have some suitable form of addressing. Some drawbacks of the Internet are the lack of tools for resource management, the high cost of attaching a host, poor implementations of the mechanisms, and lack of accountability.

The current design of the Internet was influenced by many factors. Packet switching was chosen because most applications at the time fit this scheme well and most of the existing networks were packet-switched, although circuit-switching properties have since found their way into the architecture (e.g., MPLS). The stateless nature of the network has raised accountability issues such as IP spoofing. It would be nice to discuss what design decisions would be made if the Internet were designed today.

Reading the paper was really informative. Quite a few of us have read about the protocols and mechanisms used in the Internet but did not know what led to their design; papers like this help in understanding the reasoning behind why things are the way they are. I would highly recommend having this paper in the syllabus.

End-To-End Arguments In System Design

The paper argues for placing functions at a higher level rather than providing them at low levels in a distributed system. The argument is supported by a detailed example of transferring a file between two machines. Things that can go wrong during the transfer range from file corruption due to errors in copying or buffering, to transient hardware faults in the processor or memory, to loss or corruption of data in packets. One might address these at the lower level with duplicate copies, timeouts, and retries at each step, but even if the lower layers provide such correction, the end system still has to perform an end-to-end check to guard against corruption during file reading or copying and against host crashes. Having the lower layer provide these checks is therefore redundant and mostly overhead.

The argument is supported with other examples as well: delivery guarantees, secure transmission of data, duplicate message suppression, FIFO message delivery, transaction management, and encryption. The end-to-end argument is weaker for applications that require reliable delivery over a very lossy link: there it is better to add reliability checks at the lower layer than to rely only on the application layer, since application-layer recovery would require retransmitting the complete message and is far more expensive.

How far the end-to-end argument applies varies with the requirements of the application. It lays a good foundation for layered architectures and is one of the main design principles behind the Transmission Control Protocol. Although the argument goes against the general instinct of pushing everything into the lower layers, the supporting points are relevant and make sense.

I feel that even though the argument and examples in the paper are relevant, putting everything at the higher layer can also be a drawback: it raises security concerns, since programs at the application layer can be tampered with and may not run as expected. This is one of the issues that TCP faces. The paper "TCP Congestion Control with a Misbehaving Receiver" by Savage et al. shows how clients can misbehave and what should be done to address these problems.

The paper should be kept in the syllabus because it puts forward the issues one should think about before making a design decision. It also helps a person in understanding the decision made behind the TCP/IP protocol layered architecture. I also liked the paper presentation. The argument proposed by the author is supported by giving examples.