The Session Initiation Protocol (SIP) is an application-layer control
protocol that can establish, modify and terminate multimedia sessions or
calls. These multimedia sessions include multimedia conferences, distance
learning, Internet telephony and similar applications. SIP can invite both
persons and "robots", such as a media storage service. SIP can invite parties
to both unicast and multicast sessions; the initiator does not necessarily
have to be a member of the session to which it is inviting. Media and participants
can be added to an existing session. SIP can be used to initiate sessions
as well as invite members to sessions that have been advertised and established
by other means. Sessions can be advertised using multicast protocols such
as SAP, electronic mail, news groups, web pages or directories (LDAP),
among others.
SIP has its origins in late 1996 as a component of the “Mbone” set of
utilities and protocols. The Mbone, or multicast backbone, was an experimental
multicast network overlaid on top of the public Internet. It was used
for distribution of multimedia content, including talks and seminars, broadcasts
of space shuttle launches, and IETF meetings. One of its essential components
was a mechanism for inviting users to listen in on an ongoing or future
multimedia session on the Internet: in essence, a session initiation protocol.
Since its approval in early 1999 as an official standard, the Session
Initiation Protocol has gained tremendous market acceptance for signaling
communications services on the Internet.
Despite its historical strengths, SIP saw relatively slow progress
throughout 1996 and 1997. That's about when interest in Internet telephony
began to take off. People began to see SIP as a technology that would also
work for VoIP, not just Mbone sessions. The result was an intensified effort
towards completing the specification in late 1998, and completion by the
end of the year. It received official approval as an RFC (Request for Comments,
the official term for an IETF standard) in February 1999, and an RFC number,
2543, was issued in March.
From there, industry acceptance of SIP grew exponentially. Its scalability,
extensibility, and - most important - flexibility appealed to service providers
and vendors who had needs that a vertically integrated protocol, such as
H.323, could not address. Among service providers MCI (particularly MCI's
Henry Sinnreich, regarded as the “Pope” of SIP) led the evangelical charge.
Throughout 1999 and into 2000, it saw adoption by most major vendors, and
announcements of networks by service providers. Interoperability bake-offs
were held throughout 1999, attendance doubling at each successive event.
Tremendous success was achieved in interoperability among vendors. Other
standards bodies began to look at SIP as well, including ITU and ETSI TIPHON,
IMTC, Softswitch Consortium, and JAIN.
As an Mbone tool (and as a product of the IETF), SIP was designed with
certain assumptions in mind. First was scalability:
Since users could reside anywhere on the Internet, the protocol needed
to work wide-area from day one. Users could be invited to lots of sessions,
so the protocol needed to scale in both directions. A second assumption
was component reuse: Rather than inventing new protocol tools, those already
developed within the IETF would be used. That included things like MIME,
URLs, and SDP (already used for other protocols, such as SAP). This resulted
in a protocol that integrated well with other IP applications (such as
web and e-mail).
Interoperability was another key goal, although not one specific to
SIP. Interoperability is at the heart of IETF's process and operation,
as a forum attended by implementers and operational experts who actually
build and deploy the technologies they design. To these practical-minded
standardizers, the KISS (Keep It Simple Stupid) principle was the best
way to help ensure correctness and interoperability.
Once the user to be called has been located, SIP can perform its second main function - delivering a description of the session that the user is being invited to. As mentioned, SIP itself does not know about the details of the session. What SIP does do is convey information about the protocol used to describe the session. SIP does this through the use of multipurpose internet mail extensions (MIME), widely used in web and e-mail services to describe content (HTML, audio, video, etc.). The most common protocol used to describe sessions is the session description protocol (SDP), described in RFC2327. SIP can also be used to negotiate a common format for describing sessions, so that other things besides SDP can be used.
Once the user has been located and the session description delivered, SIP is used to convey the response to the session initiation (accept, reject, etc.). If accepted, the session is now active. SIP can be used to modify the session as well. Doing so is easy: the originator simply re-initiates the session, sending the same message as the original, but with a new session description. For this reason, modification of sessions (which includes things like adding and removing audio streams, adding video, changing codecs, hold and mute) is easily supported with SIP, so long as the session description protocol can support it (SDP supports all of the above).
Finally, SIP can be used to terminate the session (i.e., hang up).
SIP is designed as part of the overall IETF multimedia data and control
architecture. This multimedia data and control architecture is currently
incorporating protocols such as
SIP is based on the request-response paradigm. To initiate a session,
the caller (known as the User Agent Client, or UAC) sends a request
(called an INVITE), addressed to the person the caller wants to talk to.
In SIP, addresses are URLs. SIP defines a URL format that is very similar
to the popular mailto URL. If the user's e-mail address is
jdrosen@dynamicsoft.com, their SIP URL would be
sip:jdrosen@dynamicsoft.com. The INVITE is not sent directly
to the called party, but rather to an entity known as a proxy server. The
proxy server is responsible for routing and delivering messages to the
called party. The called party then sends a response, accepting or rejecting
the invitation, which is forwarded back through the same set of proxies,
in reverse order.
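As a rough illustration of the addressing and request format just described, the sketch below derives a SIP URL from an e-mail style address and assembles a minimal INVITE. The helper names (`sip_url_from_email`, `build_invite`), the caller host `here.com` and the Call-ID value are assumptions made for this example; a real user agent would add further headers and a session description.

```python
def sip_url_from_email(address: str) -> str:
    """Turn user@host into sip:user@host (hypothetical helper)."""
    user, _, host = address.partition("@")
    if not user or not host:
        raise ValueError(f"not a user@host address: {address!r}")
    return f"sip:{user}@{host}"


def build_invite(caller: str, callee: str, call_id: str, via_host: str) -> str:
    """Assemble a bare-bones INVITE with an empty body."""
    lines = [
        f"INVITE {sip_url_from_email(callee)} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host}:5060",
        f"From: <{sip_url_from_email(caller)}>",
        f"To: <{sip_url_from_email(callee)}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
    ]
    # Like HTTP, SIP ends each header line with CRLF and the header
    # section with an empty line.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_invite("jdrosen@dynamicsoft.com", "joe@columbia.edu",
                       "12345600@here.com", "here.com")
print(request)
```

The request would be handed to the caller's proxy rather than sent to the callee directly.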
A proxy can receive a single INVITE request, and send out more than
one INVITE request to different addresses. This feature, aptly called “forking,”
allows a session initiation attempt to reach multiple locations, in the
hopes of finding the desired user at one of them. A close analogy is the
home phone line service, where all phones in the home ring at once.
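The forking idea can be sketched in a few lines. This is a simplified illustration, not how any particular proxy is implemented: `send_invite` is a hypothetical callback that returns the final status code of one branch, the target addresses are invented, and the branches are tried one after another, whereas a real proxy may ring them all at once and cancel the losers.

```python
from typing import Callable, Dict, List, Optional


def fork_invite(targets: List[str],
                send_invite: Callable[[str], int]) -> Optional[str]:
    """Send an INVITE to every target; return the first that answers 2xx."""
    for target in targets:
        status = send_invite(target)
        if 200 <= status < 300:      # any 2xx response means "accepted"
            return target
    return None                      # nobody accepted the call


# Simulated answers: the desk phone is busy (486), the lab phone accepts.
answers: Dict[str, int] = {
    "sip:joe@desk.foo.com": 486,
    "sip:joe@lab.foo.com": 200,
    "sip:joe@home.foo.com": 480,
}
winner = fork_invite(list(answers), answers.__getitem__)
print(winner)  # sip:joe@lab.foo.com
```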
More detailed description of the request-response scheme
This section explains the basic protocol functionality and operation. Callers and callees are identified by SIP addresses, described in Section SIP-URL. When making a SIP call, a caller first locates the appropriate server (Section SERVER) and then sends a SIP request (Section Type of Operation). The most common SIP operation is the invitation. Instead of directly reaching the intended callee, a SIP request may be redirected or may trigger a chain of new SIP requests by proxies. Users can register their location(s) with SIP servers.
Assume the caller (jdrosen@dynamicsoft.com) wishes to place a call
to joe@columbia.edu. Jdrosen sends his SIP INVITE message to the proxy
for dynamicsoft.com (Step 1). This proxy then forwards the request out
to Columbia, where it reaches the Columbia.edu server (Step 2).
This server is actually not a proxy, but a similar device called a
redirect server. Instead of forwarding calls, a redirect server asks the
requestor to contact the next server directly. The Columbia.edu server
looks up Joe in its database, and determines that today, Joe is on sabbatical
to foo.com. It therefore sends a special response, called a redirect, to
the dynamicsoft.com proxy, instructing it to instead try joe@foo.com (Step
3).
The dynamicsoft proxy then acts on this response, which means it directly
tries to contact joe@foo.com. So, it sends the INVITE to the foo.com server
(Step 4). This server consults its database (Step 5), and learns (Step
6) that Joe is actually in sales. So, it constructs a new URL,
joe@sales.foo.com, and sends the INVITE to the sales.foo.com proxy (Step 7).
The proxy for the sales department then needs to forward the INVITE
to the PC where Joe is currently sitting. To find out which PC Joe is
currently using, SIP defines another request, called REGISTER, which is
used to inform a proxy of an address binding. In this case, when Joe turned
on his SIP client on his PC, the client would register the binding sip:joe@sales.foo.com
to sip:joe@mypc.sales.foo.com. This would allow the proxy to know that
Joe is actually at mypc, a specific host on the network. The bindings registered
through SIP are periodically refreshed, so that if the PC crashes, the
binding is eventually removed.
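A minimal sketch of such a registration database follows, assuming a simple in-memory table keyed by the user's public address at the sales.foo.com proxy. The class name `Registrar`, its methods and the 3600-second lifetime are illustrative choices, not part of SIP itself. The point is only that a binding which is not refreshed eventually disappears.

```python
import time
from typing import Dict, Optional, Tuple


class Registrar:
    """Toy location database: public address -> (contact, expiry time)."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Tuple[str, float]] = {}

    def register(self, address: str, contact: str, expires: float,
                 now: Optional[float] = None) -> None:
        """Store or refresh a binding that lapses after `expires` seconds."""
        now = time.time() if now is None else now
        self._bindings[address] = (contact, now + expires)

    def lookup(self, address: str, now: Optional[float] = None) -> Optional[str]:
        """Return the current contact, dropping the binding if it has lapsed."""
        now = time.time() if now is None else now
        entry = self._bindings.get(address)
        if entry is None:
            return None
        contact, expiry = entry
        if now >= expiry:
            del self._bindings[address]
            return None
        return contact


registrar = Registrar()
registrar.register("sip:joe@sales.foo.com", "sip:joe@mypc.sales.foo.com",
                   expires=3600, now=0)
print(registrar.lookup("sip:joe@sales.foo.com", now=10))    # still bound
print(registrar.lookup("sip:joe@sales.foo.com", now=4000))  # lapsed: None
```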
The sales.foo.com proxy consults this registration database, and forwards
the INVITE to joe@mypc.sales.foo.com (Step 8). This INVITE then reaches
Joe at his PC. Joe can then respond to it (thus the request-response model).
SIP provides many responses, and these include acceptance, rejection, redirection,
busy, and so on. The response is forwarded back through the proxies to
the original caller (Steps 9, 10, 11 and 12). An acknowledgement is sent (another
type of request, called ACK) in Step 13, and the session is established.
Media can then flow (Step 14).
A SIP URL follows the guidelines of RFC 2396 and has the syntax shown here. It is described using Augmented Backus-Naur Form. Note that reserved characters have to be escaped and that the set of characters reserved within any given URI component is defined by that component. In general, a character is reserved if the semantics of the URI change when the character is replaced with its escaped US-ASCII encoding.
SIP URLs are used within SIP messages to indicate the originator (From), current destination (Request-URI) and final recipient (To) of a SIP request, and to specify redirection addresses (Contact). A SIP URL can also be embedded in web pages or other hyperlinks to indicate that a particular user or service can be called via SIP. When used as a hyperlink, the SIP URL indicates the use of the INVITE method. The SIP URL scheme is defined to allow setting SIP request-header fields and the SIP message-body. This corresponds to the use of mailto: URLs. It makes it possible, for example, to specify the subject, urgency or media types of calls initiated through a web page or as part of an email message.
Some examples for use and default values of URL components for SIP headers:
sip:j.doe@big.com
sip:j.doe:secret@big.com;transport=tcp
sip:j.doe@big.com?subject=project
sip:+1-212-555-1212:1234@gateway.com;user=phone
sip:1212@gateway.com
sip:alice@10.1.2.3
sip:alice@example.com
sip:alice%40example.com@gateway.com
sip:alice@registrar.com;method=REGISTER
SIP URLs are case-insensitive, so that for example the two URLs sip:j.doe@example.com
and SIP:J.Doe@Example.com are equivalent. All URL parameters are included
when comparing SIP URLs for equality. The Request-URI is a SIP URL or a
general URI. It indicates the user or service to which this request is
being addressed. Unlike the To field, the Request-URI MAY be re-written
by proxies.
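To make the URL components in the examples above concrete, here is a small parsing sketch. It is an assumption-laden simplification rather than an implementation of the full ABNF grammar: it only splits a URL into user, password, host, port, parameters and headers, it does not handle IPv6 literals, and the names `SipUrl` and `parse_sip_url` are invented for this example.

```python
from typing import Dict, NamedTuple, Optional
from urllib.parse import unquote


class SipUrl(NamedTuple):
    user: Optional[str]
    password: Optional[str]
    host: str
    port: Optional[int]
    params: Dict[str, str]
    headers: Dict[str, str]


def parse_sip_url(url: str) -> SipUrl:
    """Split sip:user[:password]@host[:port][;params][?headers]."""
    scheme, _, rest = url.partition(":")
    if scheme.lower() != "sip":
        raise ValueError(f"not a SIP URL: {url!r}")
    rest, _, header_part = rest.partition("?")
    userinfo, at, hostpart = rest.rpartition("@")
    user = password = None
    if at:
        user, _, pw = userinfo.partition(":")
        user = unquote(user)                 # e.g. alice%40example.com
        password = unquote(pw) if pw else None
    hostport, *param_items = hostpart.split(";")
    host, _, port = hostport.partition(":")
    # partition("=")[::2] keeps (name, value) and drops the "=" itself.
    params = dict(item.partition("=")[::2] for item in param_items if item)
    headers = dict(item.partition("=")[::2]
                   for item in header_part.split("&") if item)
    return SipUrl(user, password, host.lower(),
                  int(port) if port else None, params, headers)


for example in ("sip:j.doe:secret@big.com;transport=tcp",
                "sip:alice%40example.com@gateway.com"):
    print(parse_sip_url(example))
```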
In Figure A, Caller A completes a call to User B using two proxies: Proxy 1 and Proxy 2. The initial INVITE (F1) does not contain the Authorization credentials that Proxy 1 requires, so a 407 Proxy Authorization Required response is sent containing the challenge information. A new INVITE (F4) is then sent containing the correct credentials and the call proceeds. The call terminates when User B disconnects by initiating a BYE message.
F1 INVITE A -> Proxy 1
The call begins, as always, with an INVITE message that contains information on caller and called party as well as the session description request (2nd part).
INVITE sip:UserB@ss1.wcom.com SIP/2.0
Via: SIP/2.0/UDP here.com:5060
From: BigGuy
To: LittleGuy
Call-ID: 12345600@here.com
CSeq: 1 INVITE
Contact: BigGuy
Content-Type: application/sdp
Content-Length: 147
v=0
o=UserA 2890844526 2890844526 IN IP4 here.com
s=Session SDP
c=IN IP4 100.101.102.103
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
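The session description in F1 is what tells the other side where to send media. As a hedged sketch (the function name `media_destinations` is an assumption, and only the `c=` and `m=` lines are interpreted), the following pulls the address and port of each media stream out of the body shown above:

```python
from typing import Dict, List

SDP_BODY = """v=0
o=UserA 2890844526 2890844526 IN IP4 here.com
s=Session SDP
c=IN IP4 100.101.102.103
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
"""


def media_destinations(sdp: str) -> List[Dict[str, object]]:
    """Return one dict per m= line with the address and port to send to."""
    session_addr = None
    streams: List[Dict[str, object]] = []
    for line in sdp.splitlines():
        kind, _, value = line.partition("=")
        if kind == "c":
            addr = value.split()[2]            # "IN IP4 <address>"
            if streams:
                streams[-1]["address"] = addr  # media-level override
            else:
                session_addr = addr            # session-level default
        elif kind == "m":
            media, port, proto, *formats = value.split()
            streams.append({"media": media, "address": session_addr,
                            "port": int(port), "proto": proto,
                            "formats": formats})
    return streams


print(media_destinations(SDP_BODY))
```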
F2 407 Proxy Authorization Required Proxy 1 -> User A
SIP always works in a request-response mode, and in this example Proxy 1 challenges Caller A for authentication.
SIP/2.0 407 Proxy Authorization Required
Via: SIP/2.0/UDP here.com:5060
From: BigGuy
To: LittleGuy
Call-ID: 12345600@here.com
CSeq: 1 INVITE
Proxy-Authenticate: Digest realm="MCI WorldCom SIP",
domain="wcom.com", nonce="wf84f1ceczx41ae6cbe5aea9c8e88d359",
opaque="", stale="FALSE", algorithm="MD5"
Content-Length: 0
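To answer such a challenge, the caller computes a hash over its credentials and the nonce and returns it in a Proxy-Authorization header. The sketch below assumes the basic HTTP Digest calculation with MD5 and no quality-of-protection options; the password `"secret"` and the function names are purely hypothetical, so the printed value will not match the response strings shown in these call flows.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hex-encoded MD5 of a string, as used by Digest authentication."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """response = MD5(MD5(user:realm:password) : nonce : MD5(method:uri))."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


response = digest_response(
    username="UserA",
    realm="MCI WorldCom SIP",
    password="secret",                      # hypothetical shared secret
    method="INVITE",
    uri="sip:ss1.wcom.com",
    nonce="wf84f1ceczx41ae6cbe5aea9c8e88d359",
)
print(f'Proxy-Authorization: Digest username="UserA", response="{response}"')
```

Because the nonce is chosen by the proxy, a captured response cannot simply be replayed against a later challenge.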
As we move further down the call flow, the actual voice call begins, using the Real-time Transport Protocol (RTP) to move the voice stream.
F17 ACK Proxy 2 -> B
ACK sip:UserB@there.com SIP/2.0
Via: SIP/2.0/UDP ss2.wcom.com:5060
Via: SIP/2.0/UDP ss1.wcom.com:5060
Via: SIP/2.0/UDP here.com:5060
From: BigGuy
To: LittleGuy ;tag=314159
Call-ID: 12345601@here.com
CSeq: 1 ACK
Content-Length: 0
Calls are then terminated with a BYE request to the caller.
F18 BYE User B -> Proxy 2
BYE sip:UserA@ss2.wcom.com SIP/2.0
Via: SIP/2.0/UDP there.com:5060
Route: ,
From: LittleGuy ;tag=314159
To: BigGuy
Call-ID: 12345601@here.com
CSeq: 1 BYE
Content-Length: 0
In this scenario User B wants calls forwarded to another destination if the original line is busy. It is assumed that the proxy knows where to forward the call.
F1 INVITE A -> Proxy
INVITE sip:UserB@ss1.wcom.com SIP/2.0
Via: SIP/2.0/UDP here.com:5060
From: BigGuy
To: LittleGuy
Call-ID: 12345600@here.com
CSeq: 1 INVITE
Contact: BigGuy
Proxy-Authorization: DIGEST username="UserA", realm="MCI WorldCom SIP",
nonce="9137d175c20a0d6eadd7be1c863302ae", opaque="",
uri="sip:ss1.wcom.com",
response="cf25aad811c806bde46a369220158cec"
Content-Type: application/sdp
Content-Length: ...
v=0
o=UserA 2890844526 2890844526 IN IP4 client.here.com
s=Session SDP
c=IN IP4 100.101.102.103
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
F4 486 Busy Here B1 -> Proxy
User B's phone responds with a busy message (486).
SIP/2.0 486 Busy Here
Via: SIP/2.0/UDP ss1.wcom.com:5060;branch=1
Via: SIP/2.0/UDP here.com:5060
From: BigGuy
To: LittleGuy ;tag=123456
Call-ID: 12345600@here.com
CSeq: 1 INVITE
Content-Length: 0
F6 INVITE Proxy -> B2
The call is then forwarded to a new location at "THERE.COM." Since the call is going to a new location, a new INVITE and session description are sent.
INVITE sip:UserB2@there.com SIP/2.0
Via: SIP/2.0/UDP ss1.wcom.com:5060;branch=2
Via: SIP/2.0/UDP here.com:5060
Record-Route:
From: BigGuy
To: LittleGuy
Call-ID: 12345600@here.com
CSeq: 1 INVITE
Contact: BigGuy
Content-Type: application/sdp
Content-Length: ...
v=0
o=UserA 2890844526 2890844526 IN IP4 client.here.com
s=Session SDP
c=IN IP4 100.101.102.103
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
Here User A attempts to call User B, who does not answer. The call is forwarded to User B's mailbox, and the voicemail system plays a message for a ring-no-answer. The call flow then moves from User A to the voicemail system.
F1
INVITE sip:UserB@wcom.com SIP/2.0
Via: SIP/2.0/UDP here.com:5060
From: TheBigGuy
To: TheLittleGuy
Call-Id: 12345600@here.com
CSeq: 1 INVITE
Contact: TheBigGuy
Proxy-Authorization: Digest username="UserA",
realm="MCI WorldCom SIP",
nonce="ea9c8e88df84f1cec4341ae6cbe5a359", opaque="",
uri="sip:UserB@wcom.com", response=<calculated hash goes here>
Content-Type: application/sdp
Content-Length:
v=0
o=UserA 2890844526 2890844526 IN IP4 client.here.com
s=Session SDP
c=IN IP4 100.101.102.103
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
Here B1 rings for, let's say, nine seconds. (This duration is a configurable parameter in the Proxy Server.) The Proxy then sends a CANCEL request and proceeds down its internal list of options, eventually selecting a voicemail URL for "forward no answer."
F5 SIP/2.0 180 Ringing
Via: SIP/2.0/UDP here.com:5060
From: TheBigGuy
To: TheLittleGuy ;tag=3145678
Call-Id: 12345600@here.com
CSeq: 1 INVITE
Content-Length: 0
Here the voicemail system signals with a 200 OK that it is ready to set up an RTP session and record the message. The Contact field indicates where the file is being stored.
F9 SIP/2.0 200 OK
Via: SIP/2.0/UDP wcom.com:5060;branch=2
Via: SIP/2.0/UDP here.com:5060
Record-Route:
From: TheBigGuy
To: TheLittleGuy ;tag=123456
Call-Id: 12345600@here.com
CSeq: 1 INVITE
Contact: TheLittleGuyVoiceMail
Content-Type: application/sdp
Content-Length:
v=0
o=UserB 2890844527 2890844527 IN IP4 vm.wcom.com
s=Session SDP
c=IN IP4 110.111.112.114
t=0 0
m=audio 3456 RTP/AVP 0
a=rtpmap:0 PCMU/8000
Finally, User A hangs up on the voicemail system, although the VM system could have disconnected the call on its own.
F13 BYE sip:UserB@wcom.com SIP/2.0
Via: SIP/2.0/UDP here.com:5060
Route:
From: TheBigGuy
To: TheLittleGuy ;tag=123456
Call-Id: 12345600@here.com
CSeq: 2 BYE
Content-Length: 0
Layers:
SIP transparently supports name mapping and redirection.
Transport mode: text-based, described using ABNF.
Possible lower layers: UDP, TCP, ATM AAL5, IPX, frame relay, X.25.
The 1500 bytes accommodates encapsulation within the "typical" ethernet
MTU without IP fragmentation. The next lower common MTU values are 1006
bytes for SLIP and 296 for low-delay PPP (RFC 1191). Thus, another reasonable
value would be a message size of 950 bytes, to accommodate packet headers
within the SLIP MTU without fragmentation.
There are numerous differences between SIP and H.323. The first is scope;
H.323 specifies a complete, vertically integrated system. Not much
room is left for flexibility or different architectures. SIP, on the other
hand, is a single component. It works with RTP, for example, but does not
mandate it. SIP systems can be composed into a variety of architectures,
and numerous protocols and additional systems can be plugged in at the
discretion of the service provider. SIP can be considered a building block,
whereas H.323 is a specific system.
H.323 | Developed by the ITU. Version 1 was standardized in 1996. Its focus was multimedia communications services for LANs without QoS. H.323 v1 was not targeted at IP specifically, but at any type of packet LAN. Version 2 was released in 1998. |
SIP | Developed in the IETF, with origins in late 1996 as a component of the “Mbone” set of utilities and protocols. The Mbone focused on distribution of multimedia content, including talks and seminars, broadcasts of space shuttle launches, and IETF meetings; SIP provided the mechanism for inviting users to listen in on an ongoing or future multimedia session. |
H.323 | A complete, vertically integrated suite of protocols and an architecture for delivering multimedia conferencing applications. It includes signaling, registration, admission control, security, interworking requirements with H.320, H.321, and other ITU conferencing systems, inter-domain data exchange, transport, and codecs. It defines several entities, including terminals (end systems, like PCs), gateways, multipoint conferencing units, and gatekeepers. A gatekeeper is similar to a SIP proxy, in that it plays the role of a signaling relay. |
SIP | A single component; it works with, e.g., RTP but does not mandate it. SIP systems can be composed into a variety of architectures, and numerous protocols and additional systems can be plugged in at the discretion of the service provider. |
H.323 | Designed as a LAN protocol; numerous enhancements (such as FastStart) were added to make it behave as a wide-area protocol. |
SIP | Designed as a wide-area protocol; no enhancements needed. |
H.323 | Borrows its call-signaling component from existing work done in the ITU, namely the Q.931 protocol used for user-to-network signaling in ISDN, which gives it a telephony-centric flavor. |
SIP | Borrows many of its concepts from HTTP, which gives it a web flavor and allows it to integrate with web, e-mail, and other existing IP applications. It follows the KISS (Keep It Simple Stupid) principle, which makes it easier to implement and interoperate. |
H.323 | Extensible by adding non-standard elements identified by vendor ID, and through version changes; each version is backward compatible but takes up more room than its predecessor. |
SIP | Can be extended in numerous ways, including adding headers, new methods, new bodies, and parameters to existing headers. |
H.323 | H.245 contains powerful mechanisms for conference control in distributed multiparty conferences (e.g., granting or denying speaking privileges). |
SIP | This kind of control is possible within a SIP-established conference, but it is not addressed by SIP itself; there are currently no standalone standard protocols that can do this. |
SIP transparently supports name mapping and redirection services, allowing
the implementation of ISDN and Intelligent Network telephony subscriber
services. The phone identifier is to be used when connecting to a telephony
gateway. Even without this parameter, recipients of SIP URLs MAY interpret
the pre-@ part as a phone number if local restrictions on the name space
for user name allow it.
Services:
SIP transparently supports name mapping and redirection services, allowing the implementation of ISDN and Intelligent Network telephony subscriber services. These facilities also enable personal mobility. In the parlance of telecommunications intelligent network services, this is defined as: "Personal mobility is the ability of end users to originate and receive calls and access subscribed telecommunication
Internet telephony began on the premise that it was cheaper than normal
phone calling. Users were willing to tolerate degraded quality or reduced
function for lower cost. However, the cost differentials are rapidly disappearing.
To continue to exist, Internet telephony must find another reason to be.
The answer is services.
Some of the most exciting applications have already found killer status
on the Internet, though not (yet) in the form of multimedia services. Now
think of integrating multimedia communications, such as voice, with web,
e-mail, buddy lists, instant messaging, and online games. Whole new sets
of features, services, and applications become conceivable.
SIP is ideally suited here. Its use of URLs, its support for MIME and carriage of arbitrary content (SIP can carry images, MP3s, even Java applets), and its usage of e-mail routing mechanisms, means that it can integrate well with these other applications. For example, it is just as easy to redirect a user to another phone as it is to redirect a user to a web page.
Scalability:
SIP uses the Internet model for scalability - fast and simple in the
core, smarter with less volume in the periphery. To accomplish this, SIP
defines several types of proxy servers. “Call-stateful” proxies generally
live at the edge of the network. These proxies track call state, and can
provide rich sets of services based on this knowledge. Closer to the core,
“transaction-stateful” (also known as just “stateful”) proxies track requests
and responses, but have no knowledge of session or call state. Once a session
invitation is accepted, the proxy forgets about it. When the session termination
arrives, the proxy forwards it without needing to know about the session.
Finally, “stateless” proxies exist in the core. These proxies receive
requests, like INVITE, forward them, and immediately forget. The SIP protocol
provides facilities to ensure that the response can be correctly routed
back to the caller. Stateless proxies are very fast, but can provide few
services. Call-stateful proxies are not as fast, but they live at the periphery,
where call volumes are lower.
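The facility that lets a stateless proxy forget a request and still route the response is the Via header, as seen in the call-flow listings. The following sketch models a message as a plain list of lines, which is a simplification, and the two function names are assumptions made for this example. Each proxy pushes its own Via onto the request and pops it off the response on the way back.

```python
from typing import List


def forward_request(message: List[str], proxy_host: str) -> List[str]:
    """Stateless forwarding: push this proxy's Via on top and forget."""
    request_line, *rest = message
    return [request_line, f"Via: SIP/2.0/UDP {proxy_host}:5060", *rest]


def forward_response(message: List[str], proxy_host: str) -> List[str]:
    """Pop our own Via; the next Via names the hop to send the response to."""
    status_line, top_via, *rest = message
    if proxy_host not in top_via:
        raise ValueError("topmost Via does not belong to this proxy")
    return [status_line, *rest]


request = ["INVITE sip:UserB@there.com SIP/2.0",
           "Via: SIP/2.0/UDP here.com:5060"]
for proxy in ("ss1.wcom.com", "ss2.wcom.com"):
    request = forward_request(request, proxy)

# The callee copies the Via headers of the request into its response.
response = ["SIP/2.0 200 OK"] + [h for h in request if h.startswith("Via:")]
for proxy in ("ss2.wcom.com", "ss1.wcom.com"):
    response = forward_response(response, proxy)
print(response)  # only the caller's own Via is left
```

All routing information travels inside the message, so neither proxy has to remember the INVITE after forwarding it.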
Extensibility:
History has taught Internet engineers that protocols get extended and
used in ways they never intended (e-mail and web are both excellent examples
of this). So, they've learned to design in support for extensibility from
the outset. SIP has numerous mechanisms to support extensions. It does
not require everyone to implement the extensions. Facilities are provided
that allow two parties to determine the common set of capabilities, so
that a session initiation can always be completed, no matter what.
Flexibility:
SIP is not a complete system for Internet telephony. It does not dictate
architecture, usage patterns, or deployment scenario. It does not mandate
how many servers there are, how they are connected, or where they reside.
This leaves operators tremendous flexibility in how the protocol is used
and deployed. One way to think of it is that SIP is a LEGO block; operators
can piece together a complete solution by obtaining other LEGO blocks,
and putting them together in the way that they see fit.
Multimedia:
Besides the traditional call-forwarding, follow-me, and do-not-disturb,
SIP has the potential for enabling a whole new class of services that integrate
multimedia with web, e-mail, instant messaging, and “presence” (meant here
as, “are you currently online?”). The value that the Internet brings to
Internet telephony is the suite of existing applications that can be merged
with voice and video communications. As an example, at the end of a call,
a user can transfer the other party to a web page instead of another phone.
This transfer would end the call, and cause the other party's web browser
to jump to the new page. In essence, the value of VoIP and SIP comes not
from integration at the network layer (i.e., run your voice services on
top of your data network), but at the services layer (i.e., combine your
voice services with your data services).
Emerging issues in the Internet could ruin the promise of SIP (as well
as H.323) over the long term. The problem is the increasing shortage of
IPv4 addresses and the growing use of network address translators (NATs).
There are similar issues when running SIP and H.323 through firewalls.
NATs break many protocols that act as establishment mechanisms for
other protocols, such as SIP. NATs provide a boundary between the private
IP addressing of a network and the public Internet. They are most often
used if an enterprise is unable to secure access to a sufficient block
of IP addresses from their ISP, or if the enterprise wants the presumed luxury
of being able to switch ISPs without having to renumber their network.
SIP, fundamentally, is a control channel for establishing other sessions
(namely, the media sessions). These kinds of protocols (of which FTP and
H.323 are other examples) cause problems for NATs, since the addresses
for the established sessions are in the body of the application layer messages,
as we see in the session description protocol examples shown in the sidebar,
“SIP Call Flow Examples.”
When used with SDP, SIP messages carry the IP addresses and ports to be used for the media sessions. There may be multiple media sessions within a particular SIP call. Since SDP carries IP addresses and not host names, the external caller user agent will send media to an IP address that is not globally routable. It is only a valid IP address within the private network.
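A quick way to see the problem is to test the connection address in a session description against the private address ranges. This is only an illustrative check, assuming unicast IPv4 `c=` lines; the function name and both sample addresses are assumptions chosen for the example.

```python
import ipaddress


def media_address_is_private(sdp: str) -> bool:
    """True if any c= line names an address from a private range."""
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            # Fields are "c=IN", "IP4" and the address itself.
            addr = ipaddress.ip_address(line.split()[2])
            if addr.is_private:
                return True
    return False


behind_nat = "v=0\nc=IN IP4 10.1.2.3\nm=audio 49170 RTP/AVP 0\n"
public = "v=0\nc=IN IP4 128.59.16.20\nm=audio 49170 RTP/AVP 0\n"
print(media_address_is_private(behind_nat))  # True
print(media_address_is_private(public))      # False
```

A NAT rewrites the IP header of the SIP packet but not this address inside the body, which is why the far end ends up sending media to an unreachable destination.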
A nearly identical problem exists for firewalls. When a user inside
the firewall sends media to an address outside the firewall, it will be
dropped by the firewall unless a rule is established to allow it to pass.
Since the media is sent on dynamic ports to dynamic addresses, these rules
must be dynamically installed through application-aware devices, such as
proxies.
Developing services, of course, requires APIs. What kind of APIs are used to program services delivered by SIP? There has been significant activity in this area, resulting in numerous new interfaces, each with its own distinct set of strengths and weaknesses.
The first API that surfaced is the call processing language (CPL).
CPL is not actually an API, but rather an XML-based scripting language
for describing call services. It is not a complete programming language,
either. It has primitives for making decisions based on call properties,
such as time-of-day, caller, called party, and priority, and then taking
actions, such as forwarding calls, rejecting calls, redirecting calls,
and sending e-mail. CPL is engineered for end-user service creation.
A server can easily parse and validate a CPL, guarding against malicious
behavior. The running time and resource requirements of a CPL can also
be computed automatically from the CPL. An interpreter for CPL is very
lightweight, allowing CPL services to execute very quickly. For these reasons,
it is possible for an end user to write a CPL (typically with some kind
of GUI tool), upload it to the network, and have it instantly verified
and instantiated in real time.
At the opposite end of the spectrum in SIP is CGI (the common gateway interface). Many web designers are familiar with HTTP CGI; it's an interface that allows people to generate dynamic web content using Perl, Tcl, or any other programming language of choice. Since HTTP and SIP are so similar, it was recognized that an almost identical interface could be used for SIP. The result is SIP CGI, which is roughly 90% equivalent to HTTP CGI. Like HTTP CGI, SIP CGI passes message parameters through environment variables to a script that runs in a separate process. The process sends instructions back to the server through its standard output file descriptor. The benefit of SIP CGI is that it makes development of SIP services work much like the creation of dynamic web content. In fact, for SIP services that contain substantial web components, development will closely mirror web-only services. The importance of leveraging web tools for voice service creation is that a much larger class of developers becomes available.
CGI has substantially more flexibility than CPL (CGI doesn't even mandate
a particular programming language), but is much more risky to execute.
Furthermore, because of its usage of separate processes, SIP CGI doesn't
scale as well as CPL. Somewhere in the middle are SIP Servlets. HTTP Servlets
are in wide use for developing dynamic web content. Servlets are very similar
to the CGI concept. However, instead of using a separate process, messages
are passed to a class that runs within a JVM (Java Virtual Machine) inside
of the server. As a result, Servlets are restricted to Java, but suffer
less overhead than SIP CGI. Use of a JVM for executing servlets means that
the Java “sandbox” concept can be applied to protect the server from the
script. Like SIP CGI, SIP Servlets closely mirror the operation of HTTP
Servlets; they simply enhance the interface to support the wider array
of functions a proxy can execute, as compared to an HTTP origin server.
The IETF audio-video transport group started to develop RTP in 1993. The
aim of the protocol was to provide services required by interactive multimedia
conferences, such as play-out synchronization, demultiplexing, media identification
and active party identification. However, multimedia conferencing is not the
only application that can benefit from RTP; storage of continuous data,
interactive media distribution, distributed simulation, and control
applications can also use it.
RTP consists of a data part and a control part; the latter is called RTCP. RTP will often be integrated into the application rather than implemented as a separate protocol layer. In applications, RTP is typically run on top of UDP to make use of its port numbers and checksums. The RTP framework is relatively "loose", allowing modifications and tailoring depending on the application. Additionally, a complete specification for a particular application will require a payload format and a profile specification. The payload format defines how a particular payload is to be carried in RTP. A profile specification defines how a set of payload type codes is mapped onto payload formats.
RTP session setup consists of defining a pair of destination transport addresses: one IP address and a pair of UDP ports, one for RTP and another for RTCP. In the case of a multicast conference, the IP address is a class D multicast address. In a multimedia session, each medium is carried in a separate RTP session with its own RTCP packets reporting only the quality of that session. Usually, additional media are allocated additional port pairs and only one multicast address is used for the conference.
RTP has important properties of a transport protocol: it runs on end
systems and it provides demultiplexing. It differs from transport protocols
like TCP in that it (currently) does not offer any form of reliability
or protocol-defined flow/congestion control. However, it provides the
necessary hooks for adding reliability, where appropriate, and flow/congestion
control; this approach is known as application-level framing. Because RTP
relies on lower layers to transfer the data, it is not "real time" in itself,
but it provides functionality suited to carrying real-time content, e.g.,
a timestamp and control mechanisms for synchronizing different streams
with timing properties.
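To make the timestamp and demultiplexing fields concrete, the following minimal sketch packs and parses the 12-byte fixed RTP header. The layout follows the RTP specification (version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, SSRC). The function names are hypothetical, and header extensions and CSRC lists are deliberately ignored.

```python
import struct

RTP_VERSION = 2
# Two flag bytes, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC,
# all in network byte order: 12 bytes in total.
_FIXED_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build a fixed RTP header with no padding, extension or CSRC entries."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return _FIXED_HEADER.pack(byte0, byte1, sequence & 0xFFFF,
                              timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet):
    """Decode the fixed header found at the start of an RTP packet."""
    if len(packet) < _FIXED_HEADER.size:
        raise ValueError("packet shorter than the fixed RTP header")
    byte0, byte1, sequence, timestamp, ssrc = _FIXED_HEADER.unpack_from(packet)
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    return {
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,   # identifies the media encoding
        "sequence": sequence,           # lets the receiver detect loss and reordering
        "timestamp": timestamp,         # drives play-out synchronization
        "ssrc": ssrc,                   # identifies the sending source
    }
```

A receiver would typically use the payload type and SSRC to demultiplex streams, and the sequence number and timestamp to reorder packets and schedule play-out.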
Perhaps the most vexing problem in voice-over-IP, in general, has been the issue of quality of service. The delay in conversations that many VoIP users encounter is caused by the jitter and latency of packet delivery within the Internet itself. It's useful to review some of the basic principles of the Internet to understand what can be done about the problem, what the IETF's response has been, and how it impacts SIP.
Currently, the Internet offers a single service, traditionally referred
to as “best effort.” In other words, all packets are created equal. There
is no difference to the Internet whether a packet is e-mail, FTP, or the
download of a web page. If the Internet gets very busy, packets get dropped
or delayed.
Unfortunately, the human ear is extremely sensitive to latency in the
delivery of sound. The human ear can detect delays of 200 milliseconds
or greater in voice conversations.
SIP itself does not get involved in reservation of network resources
or admission control. This is because SIP messages may not even run over
the same networks that the voice packets traverse. The complete independence
of the SIP path and the voice path enables ASPs to provide voice services
without providing network connectivity. This is an extremely important
advantage of the SIP architecture. Given this, SIP relies on other protocols
and techniques in order to provide quality of service.
SIP requests and responses can contain sensitive information about the communication patterns and communication content of individuals. The SIP message body MAY also contain encryption keys for the session itself. SIP supports three complementary forms of encryption to protect privacy:
End-to-end encryption of the SIP message body and certain sensitive
header fields;
hop-by-hop encryption to prevent eavesdropping that tracks who is calling
whom;
hop-by-hop encryption of Via fields to hide the route a request has
taken.
Not all of the SIP request or response can be encrypted end-to-end because header fields such as To and Via need to be visible to proxies so that the SIP request can be routed correctly. Hop-by-hop encryption encrypts the entire SIP request or response on the wire so that packet sniffers or other eavesdroppers cannot see who is calling whom. Hop-by-hop encryption can also encrypt requests and responses that have been end-to-end encrypted. Note that proxies can still see who is calling whom, and this information is also deducible by performing a network traffic analysis, so this provides a very limited but still worthwhile degree of protection.
SIP Via fields are used to route a response back along the path taken by the request and to prevent infinite request loops. However, they can also reveal useful information to an attacker.
End-to-end encryption relies on keys shared by the two user agents involved in the request. Typically, the message is sent encrypted with the public key of the recipient, so that only that recipient can read the message. All implementations should support PGP-based encryption and may implement other schemes.
A SIP request (or response) is end-to-end encrypted by splitting the
message to be sent into a part to be encrypted and a short header that
will remain in the clear. Some parts of the SIP message, namely the request
line, the response line and certain header fields need to be read and returned
by proxies and thus MUST NOT be encrypted end-to-end. Possibly sensitive
information that needs to be made available as plaintext includes the destination
address (To) and the forwarding path (Via) of the call. The Authorization
header field must remain in the clear if it contains a digital signature
as the signature is generated after encryption, but MAY be encrypted if
it contains "basic" or "digest" authentication. The From header field should
normally remain in the clear, but MAY be encrypted if required, in which
case some proxies MAY return a 401 (Unauthorized) status if they require
a From field.
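The split described above can be sketched as follows. This is not a complete implementation: the set of header fields kept in the clear is an assumption based on the fields named in the text plus those visible in the PGP example in this document, the function name is hypothetical, and the special handling of Authorization is omitted. The actual encryption step (for example with PGP) is left out as well.

```python
# Assumption: fields that proxies need to read, so they are never encrypted.
ALWAYS_CLEAR = {"via", "to", "call-id", "content-length", "encryption"}


def split_for_encryption(headers, body, encrypt_from=False):
    """Split a message into clear header fields and the text to be encrypted.

    headers is a list of (name, value) pairs. From normally stays in the
    clear, but may be encrypted on request.
    """
    clear, private = [], []
    for name, value in headers:
        key = name.lower()
        stays_clear = key in ALWAYS_CLEAR or (key == "from" and not encrypt_from)
        (clear if stays_clear else private).append((name, value))
    # Assumed layout of the part to encrypt: private header fields,
    # an empty line, then the message body.
    plaintext = "".join(f"{n}: {v}\r\n" for n, v in private) + "\r\n" + body
    return clear, plaintext
```

After encrypting the returned plaintext, the sender would place the ciphertext in the message body and set Content-Length to the ciphertext's length.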
Privacy of SIP Responses
SIP requests can be sent securely using end-to-end encryption and authentication to a called user agent that sends an insecure response. This is allowed by the SIP security model, but is not a good idea. However, unless the correct behavior is explicit, it would not always be possible for the called user agent to infer what a reasonable behavior was. Thus when end-to-end encryption is used by the request originator, the encryption key to be used for the response should be specified in the request. If this were not done, it might be possible for the called user agent to incorrectly infer an appropriate key to use in the response. Thus, to prevent key-guessing becoming an acceptable strategy, we specify that a called user agent receiving a request that does not specify a key to be used for the response should send that response unencrypted.
Any SIP header fields that were encrypted in a request should also be
encrypted in an encrypted response. Contact response fields MAY be encrypted
if the information they contain is sensitive, or MAY be left in the clear
to permit proxies more scope for localized searches.
Known Security Problems
With either TCP or UDP, a denial of service attack exists by a rogue proxy sending 6xx responses. Although a client SHOULD choose to ignore such responses if it requested authentication, a proxy cannot do so. It is obliged to forward the 6xx response back to the client. The client can then ignore the response, but if it repeats the request it will probably reach the same rogue proxy again, and the process will repeat.
The example shows a message encrypted with ASCII-armored PGP that was
generated by applying "pgp -ea" to the payload to be encrypted.
INVITE sip:watson@boston.bell-telephone.com SIP/2.0
Via: SIP/2.0/UDP 169.130.12.5
From: <sip:a.g.bell@bell-telephone.com>
To: T. A. Watson <sip:watson@bell-telephone.com>
Call-ID: 187602141351@worcester.bell-telephone.com
Content-Length: 885
Encryption: PGP version=2.6.2,encoding=ascii

hQEMAxkp5GPd+j5xAQf/ZDIfGD/PDOM1wayvwdQAKgGgjmZWe+MTy9NEX8O25Redh0/pyrd/+DV5C2BYs7yzSOSXaj1
Redirect Server
User Agent Server
Proxy Server
Proxying Requests
Proxying Responses
Stateless Proxy: Proxying Responses
Stateful Proxy: Receiving Requests
Stateful Proxy: Receiving ACKs
Stateful Proxy: Receiving Responses
Stateless, Non-Forking Proxy
Forking Proxy
http://www.cs.columbia.edu/sip/
http://www.sipforum.org/
ISO 10646 formally defines a 31-bit character set. However, of this
huge code space, so far characters have been assigned only to the first
65534 positions (0x0000 to 0xFFFD). This 16-bit subset of UCS is called
the Basic Multilingual Plane (BMP) or Plane 0. The characters that are
expected to be encoded outside the 16-bit BMP all belong to rather exotic
scripts (e.g., Hieroglyphics) that are only used by specialists for historic
and scientific purposes. Current plans suggest that there will never be
characters assigned outside the 21-bit code space from 0x000000 to 0x10FFFF,
which covers a bit over one million potential future characters. The ISO
10646-1 standard was first published in 1993 and defines the architecture
of the character set and the content of the BMP. A second part ISO 10646-2
which defines characters encoded outside the BMP is under preparation,
but it might take a few years until it is finished. New characters are
still being added to the BMP on a continuous basis, but the existing characters
will not be changed any more and are stable.
SIP-message = Request | Response
Request = Request-Line *( general-header | request-header | entity-header ) CRLF [ message-body ]
Method = "INVITE" | "ACK" | "OPTIONS" | "BYE" | "CANCEL" | "REGISTER"
Request-Line = Method SP Request-URI SP SIP-Version CRLF
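As an illustration of the Request-Line rule, the following minimal sketch splits a request line into its three components and checks the method against the six methods listed in the grammar. The function name is hypothetical, and the Request-URI is not validated here.

```python
METHODS = {"INVITE", "ACK", "OPTIONS", "BYE", "CANCEL", "REGISTER"}


def parse_request_line(line):
    """Parse 'Method SP Request-URI SP SIP-Version CRLF' into a 3-tuple."""
    if not line.endswith("\r\n"):
        raise ValueError("Request-Line must end with CRLF")
    parts = line[:-2].split(" ")
    if len(parts) != 3:
        raise ValueError("expected exactly three space-separated fields")
    method, request_uri, version = parts
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    if not version.startswith("SIP/"):
        raise ValueError(f"not a SIP version: {version}")
    return method, request_uri, version


if __name__ == "__main__":
    print(parse_request_line(
        "INVITE sip:watson@boston.bell-telephone.com SIP/2.0\r\n"))
```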
The INVITE method indicates that the user or service is being invited to participate in a session. The message body contains a description of the session to which the callee is being invited. For two-party calls, the caller indicates the type of media it is able to receive and possibly the media it is willing to send as well as their parameters such as network destination. A success response must indicate in its message body which media the callee wishes to receive and may indicate the media the callee is going to send.
The ACK request confirms that the client has received a final response to an INVITE request. (ACK is used only with INVITE requests.) 2xx responses are acknowledged by client user agents, all other final responses by the first proxy or client user agent to receive the response. The Via is always initialized to the host that originates the ACK request, i.e., the client user agent after a 2xx response or the first proxy to receive a non-2xx final response. The ACK request is forwarded as the corresponding INVITE request, based on its Request-URI.
The ACK request MAY contain a message body with the final session description to be used by the callee. If the ACK message body is empty, the callee uses the session description in the INVITE request.
The OPTIONS method queries a server as to its capabilities. A server that believes it can contact the user, such as a user agent where the user is logged in and has been recently active, may respond to this request with a capability set. A called user agent MAY return a status reflecting how it would have responded to an invitation, e.g., 600 (Busy). Such a server SHOULD return an Allow header field indicating the methods that it supports. Proxy and redirect servers simply forward the request without indicating their capabilities.
The user agent client uses BYE to indicate to the server that it wishes to release the call. A BYE request is forwarded like an INVITE request and may be issued by either caller or callee. A party to a call should issue a BYE request before releasing a call ("hanging up"). A party receiving a BYE request must cease transmitting media streams specifically directed at the party issuing the BYE request.
The CANCEL request cancels a pending request with the same Call-ID, To, From and CSeq (sequence number only) header field values, but does not affect a completed request. (A request is considered completed if the server has returned a final status response.)
A user agent client or proxy client may issue a CANCEL request at any time. A proxy, in particular, may choose to send a CANCEL to destinations that have not yet returned a final response after it has received a 2xx or 6xx response for one or more of the parallel-search requests. A proxy that receives a CANCEL request forwards the request to all destinations with pending requests.
The Call-ID, To, the numeric part of CSeq and From headers in the CANCEL request are identical to those in the original request. This allows a CANCEL request to be matched with the request it cancels. However, to allow the client to distinguish responses to the CANCEL from those to the original request, the CSeq Method component is set to CANCEL. The Via header field is initialized to the proxy issuing the CANCEL request. (Thus, responses to this CANCEL request only reach the issuing proxy.)
Once a user agent server has received a CANCEL, it must not issue a 2xx response for the cancelled original request.
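The matching rules for CANCEL can be sketched as follows. Header fields are modeled as a plain dictionary, which is a simplification, and the function names are hypothetical. The sketch shows that Call-ID, To, From and the numeric part of CSeq are copied, while the CSeq method and the Via field are replaced.

```python
def build_cancel(original, issuer_via):
    """Derive the header fields of a CANCEL for a pending request.

    original must contain 'Call-ID', 'To', 'From' and 'CSeq'
    (for example '4711 INVITE'); issuer_via identifies the canceller.
    """
    sequence_number = original["CSeq"].split()[0]
    return {
        "Via": issuer_via,                     # responses return only to the issuer
        "Call-ID": original["Call-ID"],
        "To": original["To"],
        "From": original["From"],
        "CSeq": f"{sequence_number} CANCEL",   # same number, method set to CANCEL
    }


def cancels(cancel, request):
    """Return True if the CANCEL refers to the given request."""
    same_fields = all(cancel[h] == request[h] for h in ("Call-ID", "To", "From"))
    same_number = cancel["CSeq"].split()[0] == request["CSeq"].split()[0]
    return same_fields and same_number
```

A server would run a check like `cancels()` against its pending requests and, on a match, refrain from sending a 2xx response for the cancelled request.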
A client uses the REGISTER method to register the address listed in the To header field with a SIP server.
A user agent MAY register with a local server on startup by sending a REGISTER request to the well-known "all SIP servers" multicast address "sip.mcast.net" (224.0.1.75). This request SHOULD be scoped to ensure it is not forwarded beyond the boundaries of the administrative system. This MAY be done with either TTL or administrative scopes [25], depending on what is implemented in the network. SIP user agents MAY listen to that address and use it to become aware of the location of other local users [20]; however, they do not respond to the request. A user agent MAY also be configured with the address of a registrar server to which it sends a REGISTER request upon startup.
Requests are processed in the order received. Clients SHOULD avoid sending a new registration (as opposed to a retransmission) until they have received the response from the server for the previous one.
The meaning of the REGISTER request-header fields is defined as follows. We define "address-of-record" as the SIP address by which the registry knows the registrand, typically of the form "user@domain" rather than "user@host". In third-party registration, the entity issuing the request is different from the entity being registered.
To: The To header field contains the address-of-record whose registration is to be created or updated.
From: The From header field contains the address-of-record of the person responsible for the registration. For first-party registration, it is identical to the To header field value.
After receiving and interpreting a request message, the recipient responds with a SIP response message. The response message format is shown below:
Response = Status-Line *( general-header | response-header | entity-header ) CRLF [ message-body ]
Status-Line = SIP-version SP Status-Code SP Reason-Phrase CRLF
Status-Code = Informational | Success | Redirection | Client-Error | Server-Error | Global-Failure | extension-code
extension-code = 3DIGIT
Reason-Phrase = *<TEXT-UTF8, excluding CR, LF>
1xx: Informational -- request received, continuing to process the request;
2xx: Success -- the action was successfully received, understood, and accepted;
3xx: Redirection -- further action needs to be taken in order to complete the request;
4xx: Client Error -- the request contains bad syntax or cannot be fulfilled at this server;
5xx: Server Error -- the server failed to fulfill an apparently valid request;
6xx: Global Failure -- the request cannot be fulfilled at any server.
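Because the first digit of a status code determines its class, an application can classify any response, including codes it does not recognize. The following minimal sketch (hypothetical function names) also shows the fallback SIP asks for: an unrecognized code is treated like the x00 code of its class.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client-Error",
    5: "Server-Error",
    6: "Global-Failure",
}


def status_class(code):
    """Return the class name for a three-digit SIP status code."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP status code: {code}")
    return STATUS_CLASSES[code // 100]


def effective_code(code, known_codes):
    """Map an unrecognized code to the x00 code of its class."""
    status_class(code)  # validates the range
    return code if code in known_codes else (code // 100) * 100


if __name__ == "__main__":
    print(status_class(180), effective_code(499, {200, 400, 404}))
```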
SIP response codes are extensible. SIP applications are not required to understand the meaning of all registered response codes, though such understanding is obviously desirable. However, applications must understand the class of any response code, as indicated by the first digit, and treat any unrecognized response as being equivalent to the x00 response code of that class, with the exception that an unrecognized response must not be cached.
SIP-MESSAGE
Both Request and Response messages use the generic-message format of RFC 822 for transferring entities (the body of the message). Both types of messages consist of a start-line, one or more header fields (also known as "headers"), an empty line (i.e., a line with nothing preceding the carriage-return line-feed (CRLF)) indicating the end of the header fields, and an optional message-body.
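The generic message structure just described (start-line, header fields, empty line, optional body) can be separated with a few lines of Python. This is a simplified sketch with a hypothetical function name: it folds continuation lines into the preceding header field but does not interpret individual header fields or honor Content-Length.

```python
def parse_message(raw):
    """Split a SIP message into (start_line, headers, body).

    headers is a list of (name, value) pairs in their original order.
    """
    head, separator, body = raw.partition("\r\n\r\n")
    if not separator:
        raise ValueError("missing empty line after the header fields")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and headers:
            # A line starting with whitespace continues the previous field.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
            continue
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"malformed header field: {line!r}")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body
```

The returned start-line can then be examined to decide whether the message is a request or a response.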
generic-message = start-line *message-header CRLF [ message-body ]
start-line = Request-Line | Status-Line
message-header = ( general-header | request-header | response-header | entity-header )
general-header = Accept | Accept-Encoding | Accept-Language | Call-ID | Contact | CSeq | Date | Encryption | Expires | From | Record-Route | Timestamp | To | Via
entity-header = Content-Encoding | Content-Length | Content-Type
request-header = Authorization | Contact | Hide | Max-Forwards | Organization | Priority | Proxy-Authorization | Proxy-Require | Route | Require | Response-Key | Subject | User-Agent
response-header = Allow | Proxy-Authenticate | Retry-After | Server | Unsupported | Warning | WWW-Authenticate
SIP-URL
A SIP URL follows the guidelines of RFC 2396 and has the syntax shown below. It is described using Augmented Backus-Naur Form. Note that reserved characters have to be escaped and that the "set of characters reserved within any given URI component is defined by that component. In general, a character is reserved if the semantics of the URI changes if the character is replaced with its escaped US-ASCII encoding".
SIP-URL = "sip:" [ userinfo "@" ] hostport url-parameters [ headers ]
userinfo = user [ ":" password ]
user = *( unreserved | escaped | "&" | "=" | "+" | "$" | "," )
password = *( unreserved | escaped | "&" | "=" | "+" | "$" | "," )
hostport = host [ ":" port ]
host = hostname | IPv4address
hostname = *( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
port = *digit
url-parameters = *( ";" url-parameter )
url-parameter = transport-param | user-param | method-param | ttl-param | maddr-param | other-param
ttl-param = "ttl=" ttl
ttl = 1*3DIGIT ; 0 to 255
transport-param = "transport=" ( "udp" | "tcp" )
maddr-param = "maddr=" host
user-param = "user=" ( "phone" | "ip" )
method-param = "method=" Method
tag-param = "tag=" UUID
UUID = 1*( hex | "-" )
other-param = ( token | ( token "=" ( token | quoted-string )))
headers = "?" header *( "&" header )
header = hname "=" hvalue
hname = 1*uric
hvalue = *uric
uric = reserved | unreserved | escaped
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
digits = 1*DIGIT
telephone-subscriber = global-phone-number | local-phone-number
global-phone-number = "+" 1*phonedigit [isdn-subaddress] [post-dial]
local-phone-number = 1*(phonedigit | dtmf-digit | pause-character) [isdn-subaddress] [post-dial]
isdn-subaddress = ";isub=" 1*phonedigit
post-dial = ";postd=" 1*(phonedigit | dtmf-digit | pause-character)
phonedigit = DIGIT | visual-separator
visual-separator = "-" | "."
pause-character = one-second-pause | wait-for-dial-tone
one-second-pause = "p"
wait-for-dial-tone = "w"
dtmf-digit = "*" | "#" | "A" | "B" | "C" | "D"
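To show how the components of the grammar fit together, the following sketch decomposes a SIP URL with a regular expression. It is a deliberately simplified approximation of the rules above, not a validating parser: escaped characters are not decoded, the character classes are looser than the grammar, and the function name and example addresses are hypothetical.

```python
import re

_SIP_URL = re.compile(
    r"^sip:"
    r"(?:(?P<user>[^:@;?]*)(?::(?P<password>[^@;?]*))?@)?"   # [ userinfo "@" ]
    r"(?P<host>[A-Za-z0-9.-]+)"                              # hostname or IPv4 address
    r"(?::(?P<port>\d*))?"                                   # [ ":" port ]
    r"(?P<params>(?:;[^;?]*)*)"                              # url-parameters
    r"(?:\?(?P<headers>.*))?$"                               # [ headers ]
)


def _pairs(text, separator):
    """Split 'a=b<sep>c' into {'a': 'b', 'c': ''}, ignoring empty items."""
    result = {}
    for item in text.split(separator):
        if item:
            name, _, value = item.partition("=")
            result[name] = value
    return result


def parse_sip_url(url):
    """Break a SIP URL into user, password, host, port, params and headers."""
    match = _SIP_URL.match(url)
    if match is None:
        raise ValueError(f"not a SIP URL: {url!r}")
    port = match.group("port")
    return {
        "user": match.group("user"),
        "password": match.group("password"),
        "host": match.group("host"),
        "port": int(port) if port else None,
        "params": _pairs(match.group("params"), ";"),
        "headers": _pairs(match.group("headers") or "", "&"),
    }
```

For example, `parse_sip_url("sip:watson@boston.bell-telephone.com")` yields the user `watson` and the host `boston.bell-telephone.com`, with no port, parameters or headers.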