Using Wireshark SIP Analysis for VoIP scenarios

Troubleshooting VoIP issues can be difficult. When something goes wrong, the lack of direct visibility into what SIP and RTP packets are doing on the network can be intimidating to network and voice engineers. However, Wireshark SIP analysis turns ordinary engineers into superheroes, allowing them to see deep into the network and determine exactly what is happening.

In a series of previous articles, we took a look at several methods that can be used to capture voice packets on a network. We also introduced the Wireshark packet sniffing software, providing you with a solid foundation for understanding how captured packets are obtained and stored.

In this article, we’ll get our hands dirty by examining a real voice packet capture from a production network. We’ll go through the whole scenario, delving deeply into the details of the voice packets being exchanged. Let’s get started!

Wireshark SIP Analysis Scenario Details

The scenario examined here involves an X-Lite SIP client (now known as Bria Solo Free) running on a computer with extension 3XX and IP address 192.168.1.61. The client registers with a SIP server on the Internet at X.Y.Z.23. A second phone, with extension 4XX and IP address X.Y.Z.183, registers with the same SIP server. Details of the intervening network infrastructure, such as the NAT router behind which the X-Lite client operates, are omitted because they are irrelevant to the exercise. The following diagram depicts the scenario.

Note that the addresses you see are taken from the Wireshark capture that is used in this article. Because this capture is from a real VoIP production network, for obvious reasons, the first three octets of the IP addresses have been obscured in the screenshots, and therefore, will be referred to as X.Y.Z.23 and X.Y.Z.183 respectively in the text. The 192.168.1.61 address has not been obscured since this is a private IP address and cannot be reached from the Internet. The extension numbers have been similarly obscured.

Sample Wireshark SIP Analysis Capture Characteristics

The sample .pcap file that we will be using for this example has the following characteristics:

Number of captured packets: 10993
Capture duration in seconds: 115
File size in MB: 2.56

The capture was performed on the computer running the X-lite software.

The “interesting” traffic that exists in this capture is a telephone call made from extension 3XX to extension 4XX. However, during the time period of the capture, additional traffic was recorded that we don’t want to view including some ICMP, ARP and TLS packets, each of which corresponds to various other applications and utilities running on the computer. We’ll have to find a way to filter those out in order to view only the packets pertaining to the call in question.

Packet Viewing Exercises

In order to view portions of the captured traffic, we’ll take a look at various filtering commands that we can apply in order to isolate the packets we want. We will also look into specific types of packets to glean important information from each.

Filtering the SIP Control Packets

In this exercise, we will look at how to isolate the SIP control packets of the conversation. In the display filter field, we’ll use the sip keyword (display filter protocol names are written in lowercase) together with the IP addresses of the X-Lite computer and the SIP server involved in the conversation. These addresses are 192.168.1.61 and X.Y.Z.23 respectively. The filter that will be used is:

sip && ip.addr == 192.168.1.61 && ip.addr == X.Y.Z.23

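This shows all packets that use the SIP protocol and are exchanged between 192.168.1.61 and X.Y.Z.23. The ip.addr field matches either the source or the destination address, so requiring both addresses limits the output to traffic between these two hosts. The results can be seen below.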
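
As a rough sketch, a filter like this can also be assembled programmatically, for instance to reuse it with tshark, Wireshark's command-line companion. The function name build_display_filter and the file name capture.pcap are placeholders introduced for illustration and are not part of the original exercise. The obscured server address is left as X.Y.Z.23 and would need to be replaced with the real one.

```python
def build_display_filter(protocol, *addresses):
    """Join a protocol keyword and host addresses into a Wireshark display filter."""
    parts = [protocol] + [f"ip.addr == {addr}" for addr in addresses]
    return " && ".join(parts)


# X.Y.Z.23 stands in for the obscured server address; substitute the real one.
sip_filter = build_display_filter("sip", "192.168.1.61", "X.Y.Z.23")
print(sip_filter)

# Hypothetical tshark invocation: -r reads a capture file, -Y applies a display filter.
tshark_args = ["tshark", "-r", "capture.pcap", "-Y", sip_filter]
print(" ".join(tshark_args))
```

Building the filter from parts makes it easy to swap the protocol keyword or the addresses without retyping the whole expression.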

Notice the following:

Out of the more than 10,000 packets captured, only 20 match our criteria. This is typical, as the vast majority of exchanged packets are those whose payload is the voice itself, and those use the Real-time Transport Protocol (RTP). SIP does not carry voice; it carries only the control information needed to set up, control and tear down calls.
The yellow highlighted area shows that the first packet is a SIP INVITE packet. This initiates the telephone call. The invite also indicates the number that was called. In this case it is 4XX.
The yellow highlighted area also indicates the port being used; this is the number that comes after the colon “:”. In this case, the port number is 15060. Notice that this is not the default SIP port which is 5060. Changing the default is typically good practice to avoid attacks targeting default port values for SIP.
Some packets are designated as Requests and others as Status in the Info column. Requests are SIP messages that initiate a specific function, while Status messages, more properly referred to as Responses, answer Requests and indicate their status. Typical Requests include INVITE, ACK, NOTIFY, and BYE. Typical Responses include Trying, Ringing, OK, and Bad Request.
In the Protocol column, some messages are shown as SIP and others as SIP/SDP. Wireshark displays SIP/SDP when a SIP message carries a Session Description Protocol (SDP) body, as the INVITE requests here do. SDP is a companion protocol of SIP used for describing multimedia communication sessions for the purposes of session announcement, session invitation, and parameter negotiation. More about this protocol will be described in the following sections.
The SIP messages that have been filtered here are exchanged exclusively between the X-Lite client and the SIP server. SIP messages may also be sent between the VoIP end devices involved in the communication as well.
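
The distinction between Requests and Status messages is visible in the first line of every SIP message: a request starts with the method name, while a response starts with the protocol version followed by a numeric code. The following is a minimal sketch of that rule. The function name and the sample lines are made up for illustration and are not taken from the capture.

```python
def classify_sip_start_line(line):
    """Return ('Request', method) or ('Status', 'code reason') for a SIP start line."""
    line = line.strip()
    if line.startswith("SIP/2.0 "):
        # Responses look like: SIP/2.0 <code> <reason phrase>
        _, code, reason = line.split(" ", 2)
        return ("Status", f"{code} {reason}")
    # Requests look like: <METHOD> <request-URI> SIP/2.0
    method = line.split(" ", 1)[0]
    return ("Request", method)


# Illustrative start lines; the URI is a placeholder, not a value from the capture.
samples = [
    "INVITE sip:4XX@example.invalid SIP/2.0",
    "SIP/2.0 100 Trying",
    "SIP/2.0 407 Proxy Authentication Required",
    "SIP/2.0 180 Ringing",
]
for sample in samples:
    print(classify_sip_start_line(sample))
```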

Analyzing the List of Packets

From the output above, you can also follow the SIP procedures that take place when the call is initiated. The following table lists the steps taken by SIP to set up and begin the call. All of the information in the table has been obtained from the above packet capture. Note that all of these messages are sent within a time span of less than three seconds. Notice also that the call was answered a little over two seconds after it started ringing.

Pkt. No.	Time (s)	Type	Command	Description
2	1.17	Request	INVITE	The initiating of the call that is made to 4XX from 3XX
3	1.80	Status	Trying	The SIP server responds stating that it is attempting the connection
4	1.80	Status	Proxy Authentication Required	The SIP server responds and states it requires authentication
5	1.80	Request	ACK	The initiator responds acknowledging the requirement of authentication
6	1.81	Request	INVITE	The initiator of the call resends an INVITE with authentication information
11	1.84	Status	Trying	The SIP server responds stating that it is attempting the connection
12	2.00	Status	Ringing	The remote device is ringing
13	4.16	Status	OK	The SIP server sends a message stating that the call has been answered
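
The timing observations can be checked with simple arithmetic on the timestamps in the table above. This is only a sketch; the dictionary keys are labels chosen here for readability.

```python
# Timestamps in seconds, copied from the call setup table above.
events = {
    "first INVITE": 1.17,
    "Ringing": 2.00,
    "OK": 4.16,
}

# Time from the first INVITE until the call is answered.
setup_time = round(events["OK"] - events["first INVITE"], 2)

# Time the remote phone rang before it was answered.
ring_to_answer = round(events["OK"] - events["Ringing"], 2)

print(f"Setup time: {setup_time} s")
print(f"Ring to answer: {ring_to_answer} s")
```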

The output also shows that twice during the call, once at around 49 seconds and once at about 94 seconds, there were additional exchanges between the X-Lite client and the SIP server. The table below explains the first of these exchanges.

Pkt. No.	Time (s)	Type	Command	Description
4687	49.15	Request	INVITE	Initiation of a new event sent from SIP server to extension 3XX
4695	49.23	Status	Trying	Acknowledgement from extension 3XX that the command has been received
4696	49.24	Status	OK	Response from extension 3XX stating the event was successfully completed
4698	49.27	Request	ACK	Response from SIP server acknowledging the response

INVITE commands that are sent within the duration of a call that is already in progress, like the ones above, indicate that changes to the call are being made. These events include functions such as call hold, call transfer, call park, or other telephony features.

Finally, the call ends with the SIP server sending a BYE request to tear down the call, and the X-lite client responds with an OK confirming it. This can be seen in the yellow highlighted area in the screenshot below.

Examining the Contents of SIP Packets

In this exercise, we’ll look at a single packet from those captured in the screenshots above. Specifically, we’ll study packet number 6. This is the INVITE that the X-Lite SIP client re-sent after the server requested proxy authentication. The following screenshot shows this packet selected and shows its contents in the second pane.

Notice that the second pane now has five entries, the last of which is the Session Initiation Protocol or SIP. By expanding this entry and various subentries, we can view more details about the packet.

Under the Session Initiation Protocol item, there are several sections. The highlighted areas are described:

Via SIP/2.0/UDP – This indicates the version of the SIP protocol that is being used, namely 2.0, and the underlying Transport Layer protocol that is carrying the SIP packet, which in this case is UDP.
To and From – These sections detail the information of the caller and the called parties including extension numbers, IP addresses, Transport Layer port numbers as well as caller ID information which includes the name of the caller. The name can be seen in the line that begins with SIP Display Info but is obscured in the screenshot.
Proxy-Authorization – This section is included in this SIP control packet because the original INVITE sent to the SIP server (packet number 2) received a response of Proxy Authentication Required (packet 4). It includes the username, which is the extension number, and three parameters involved in the authentication process, namely the Nonce Value, the Digest Authentication Response and the MD5 Algorithm. The original INVITE (packet 2) did not contain any proxy authorization information. This can be confirmed by comparing the screenshot above with the following screenshot of the SIP section of packet 2. Notice that the Proxy-Authorization section is missing.
User-Agent – This information includes specifics of the device that initiated the INVITE request. It is stated here that the device is an X-Lite client version 5.0.1.
Message Body – The Message Body item only exists in SIP packets that are also using SDP. In this section, information about the communication session being established is included and will be described below.
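
To make the Proxy-Authorization parameters more concrete, here is a minimal sketch of how an MD5 digest response is commonly computed from the nonce sent by the server. It assumes the basic form of HTTP Digest authentication without the optional qop parameter; whether the server in this capture uses qop is not known. Every value below (realm, password, URI, nonce) is invented for illustration, and the function names are not from any library.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of a string as lowercase hexadecimal."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute a digest response, assuming no qop parameter is in use."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identifies the user
    ha2 = md5_hex(f"{method}:{uri}")                 # identifies the request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # ties both to the server nonce


# All values are hypothetical placeholders, not taken from the capture.
response = digest_response(
    username="3XX",
    realm="example.invalid",
    password="not-a-real-password",
    method="INVITE",
    uri="sip:4XX@example.invalid",
    nonce="abc123",
)
print(response)
```

Because the nonce is part of the hash input, the password itself never crosses the network and a response captured for one nonce cannot simply be replayed against a different one.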

The following image shows the detail found under the Message Body item.

Note the following information that can be found under this entry:

The creator of the session is indicated, including the IP address, as well as the name of the session. In this case, the name of the session adopts the name of the actual client.
Under the Media Description subentry, there are details dealing with the following issues:
- The type of media, which is audio.
- The port being used for the media stream itself. This is not the port used by the SIP control protocol; it is the port used by the RTP stream carrying the voice packets.
- The Media Protocol being used, that is the protocol carrying the actual voice packets. In this case, it is RTP using Audio Video Profile (AVP).
- The Media format indicating the codec that will be used by the communication. There are several formats being listed here in order of priority. During the negotiation with the remote endpoint, the first commonly supported codec is used.
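
The Media Description fields listed above correspond to a single line in the SDP body that begins with m=. As a sketch, such a line can be split into its parts as follows. The sample line, including the port 50000 and the format list, is invented for illustration and is not the one in the capture. The two codec names are the standard static RTP payload type assignments for G.711.

```python
# Static RTP payload types for G.711; other numbers need an rtpmap attribute to resolve.
STATIC_PAYLOAD_TYPES = {0: "PCMU (G.711 mu-law)", 8: "PCMA (G.711 A-law)"}


def parse_media_line(line):
    """Split an SDP media description line into media, port, protocol and formats."""
    if not line.startswith("m="):
        raise ValueError("not an SDP media description line")
    media, port, protocol, *formats = line[2:].split()
    return {
        "media": media,
        "port": int(port),
        "protocol": protocol,
        "formats": [int(fmt) for fmt in formats],
    }


# Hypothetical media line; formats are listed in order of preference.
description = parse_media_line("m=audio 50000 RTP/AVP 0 8 101")
codec_names = [STATIC_PAYLOAD_TYPES.get(pt, "dynamic or unlisted") for pt in description["formats"]]
print(description)
print(codec_names)
```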

There is additional information there that you can further research using your favorite search engine as well as the links at the end of this document.

Filtering RTP Packets

RTP is the protocol used to transport the actual voice packets between devices. In this exercise, we will be looking at how to isolate these RTP packets. The RTP packets are exchanged between the two endpoints directly and do not traverse the SIP server. For this reason, we will use the following filter:

rtp && ip.addr == 192.168.1.61

This results in showing all packets that are using the RTP protocol and have a source or destination IP address of 192.168.1.61. In other words, it will show all RTP packets sent and received by the X-Lite client. The results can be seen below.

Notice the following:

The number of packets that resulted from this filtering is in the thousands. This is to be expected since a voice conversation is composed of tens or even hundreds of packets per second.
Notice that the first stream of packets flows from the source 192.168.1.61 to the destination X.Y.Z.183. Halfway through the above list of packets, some packets begin to move in the opposite direction, from X.Y.Z.183 to 192.168.1.61. This is the point where the person on the remote device begins to speak and voice packets are consequently sent in the other direction. By scrolling through this list, it can be determined when each caller was speaking by the number of voice packets sent in a specific direction for specific durations of the call.
Each packet has some extra information in the Info column that indicates the codec being used, which in this case is G.711.
Notice that the size of each individual voice packet is 214 bytes. This uniformity is to be expected because voice requires a steady stream of information rather than the bursty behavior more common in data transmissions. The uniformity in size promotes more timely delivery of the packets, resulting in better voice quality.
The packets are also much smaller than the typical Ethernet MTU of 1500 bytes. This is typical of voice packets and helps maintain a steady flow of data.
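
One plausible way to account for the 214 bytes is to add the protocol headers to the G.711 payload. The sketch below assumes a 20 ms packetization interval, an IPv4 header without options, and an Ethernet header counted without the trailing frame check sequence. These are common defaults, not values confirmed by the capture.

```python
SAMPLE_RATE_HZ = 8000     # G.711 samples per second
BYTES_PER_SAMPLE = 1      # G.711 encodes each sample in 8 bits
PACKET_INTERVAL_MS = 20   # assumed packetization interval

# Voice payload carried in each packet.
payload = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * PACKET_INTERVAL_MS // 1000

RTP_HEADER = 12       # fixed RTP header, no extensions
UDP_HEADER = 8
IPV4_HEADER = 20      # no IP options
ETHERNET_HEADER = 14  # without the frame check sequence

frame_size = payload + RTP_HEADER + UDP_HEADER + IPV4_HEADER + ETHERNET_HEADER
packets_per_second = 1000 // PACKET_INTERVAL_MS

print(f"Payload: {payload} bytes")
print(f"Frame size: {frame_size} bytes")
print(f"Packets per second, per direction: {packets_per_second}")
```

Under these assumptions, a call of roughly 100 seconds would produce about 5,000 packets in each direction, which is consistent with a capture of this size.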

By clicking on a specific packet, you can view more detail about the RTP protocol as shown below.

Note the following:

The Real-Time Transport Protocol entry is an additional entry that is running on top of the UDP Transport protocol.
In the RTP entry, some of the information that you can obtain includes:
- Version of the RTP protocol being used
- The Payload type which includes the codec being used
- The Sequence number which aids in the reconstruction of the voice once the packets are received
- The Timestamp, which indicates the sampling instant of the specific piece of voice contained within the packet
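
The fields listed above sit in the first 12 bytes of every RTP packet, the fixed header. The following sketch decodes them from raw bytes. The sample bytes are constructed here for illustration and do not come from the capture, and the function name is hypothetical.

```python
import struct


def parse_rtp_header(data):
    """Decode the 12-byte fixed RTP header into its main fields."""
    if len(data) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": first >> 6,           # top two bits of the first byte
        "marker": bool(second >> 7),     # top bit of the second byte
        "payload_type": second & 0x7F,   # lower seven bits identify the codec
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,                    # identifies the stream source
    }


# Made-up header: version 2, payload type 0 (G.711 mu-law), sequence 1, timestamp 160.
sample = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x12345678)
header = parse_rtp_header(sample)
print(header)
```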

Conclusion

Having gone through the various types of SIP and RTP packets and having viewed them through your x-ray goggles (a.k.a. Wireshark SIP analysis), hopefully you’ve gained a deeper understanding of the anatomy of VoIP packets and flows, and how Wireshark can be used to identify and troubleshoot specific VoIP problems.

But that’s not all Wireshark has up its sleeves. There are some very powerful voice-specific features available that will allow you to not only view specific packets, but to view them in the context of the voice conversations themselves. For those that didn’t get it, that’s a spoiler for the next article to come.