Online references:
[Introduction], [Client-Server Architecture], [Summary on Socket Functions], [Socket Functions], [Examples], [Measurements], [Socket States]
A problem in communication is how to identify interlocutors. In the case of
phones we have telephone numbers, for mail we have addresses. For communicating
between sockets we [usually, since within a single computer we could use
file names] identify an interlocutor with a pair:
IP address and port. [In reality there is a third component, the protocol,
but that will not be relevant to us since the protocol used will always be obvious in
our cases.]
This represents the address or
name of the interlocutor.
IP addresses (things like 155.247.207.190)
are 32 bit unsigned integers (155, 247, 207, 190 are the bytes, with 155 the most significant
byte - the network representation of integers, called big-endian).
We only consider Version 4 IP Addresses.
An IP address consists of two parts, one identifying a network (in the case of
155.247.207.190, the network is 155.247), the other
identifying a computer within that network (in our case 207.190).
An IP address can be written in a number of formats.
IP addresses are more easily remembered as host names
(things like snowhite.cis.temple.edu). [You may find out
about IP to host conversions with the nslookup command and by looking in the
/etc/hosts file.] [IP addresses, to be exact, identify the Network Interface
Card between a computer and a network, and a computer might have a number of
such cards connecting to a number of networks. But for brevity we will not
worry about this distinction.]
A special IP is used to refer to the local host, 127.0.0.1, the loopback
localhost. The IP address, 0.0.0.0, is called INADDR_ANY and is tied
to all the IP addresses of this machine (used during bootstrap of this system).
Another special IP, 255.255.255.255, is used for broadcast to all hosts
on the local network of the current machine. And the host id consisting
of all 1s is used to broadcast to all the computers of a LAN.
Ports are 16 bit unsigned integers. The first 1024 port numbers (0 to 1023)
are reserved for things like http, 80. These ports are called
well-known ports.
You can look in the files /etc/services
and /etc/inetd.conf
to see standard uses (ftp, telnet, finger, ..) of these ports.
From 1024 up things are not too well established. Certainly from
49152 to 65535 the ports are private and can be dynamically allocated
(ephemeral ports). The interval 1024 to 49151 consists
of registered ports. For reasons I am not sure of it is recommended
that the ports that you personally select be in the range 5000 to 49151.
The port 0 is used as a wild card, to request the kernel to find a port for us; we
do not care which.
An address, host+port, can be used for multiplexing more than one communication channel. So one server can communicate simultaneously with more than one client. Each communication channel on the server will have its own socket bound to the same address. In other words, each connection on the internet is identified by a socket pair: [client IP, client port] + [server IP, server port], plus the protocol being used (say, TCP or UDP).
    #include <sys/types.h>
    #include <sys/socket.h>
    int socket(int domain, int type, int protocol)
domain is either AF_UNIX, AF_INET, or AF_OSI, or .. AF_UNIX is the Unix domain, it is used for communication within a single computer system. [AF_LOCAL is the Posix name for AF_UNIX. In the AF_UNIX domain the address of a socket is the name of a file.] AF_INET is for communication on the internet to IP addresses. We will only use AF_INET.
type is either SOCK_STREAM (TCP, connection oriented, reliable), or SOCK_DGRAM (UDP, datagram, unreliable), or SOCK_RAW (IP level).
protocol specifies the protocol used. It is usually 0 to say we want to use the default protocol for the chosen domain and type. We always use 0.
It returns, if successful, a socket descriptor which is an int. It returns -1 in case of failure.
Here is a typical call to socket:
    if ((sd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
        perror("socket");
        exit(1);
    }
    struct in_addr {
        u_long s_addr;
    };
    struct sockaddr_in {
        u_short sin_family;      /* protocol identifier; usually AF_INET */
        u_short sin_port;        /* port number. 0 means let kernel choose */
        struct in_addr sin_addr; /* the IP address. INADDR_ANY refers to */
                                 /* the IP addresses of the current host. */
                                 /* It is considered a wildcard IP address. */
        char sin_zero[8];        /* Unused, always zero */
    };
In order to use struct sockaddr_in you need to include in your program
    #include <netinet/in.h>
The following structure sockaddr is more generic than but compatible with sockaddr_in (both are 16 bytes starting with the same field).
    struct sockaddr {
        u_short sa_family;
        char sa_data[14];
    };
In the Unix domain we have a different address, sockaddr_un, which is also compatible with sockaddr. In order to use sockaddr_un you need to include in your program
    #include <sys/un.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    int bind(int sd, const struct sockaddr *addr, int addrlen)
sd: File descriptor of local socket, as created by the socket function.
addr: Pointer to protocol address structure of this socket.
addrlen: Length in bytes of structure referenced by addr.
It returns an integer, the return code (0=success, -1=failure).
Bind is used to specify for a socket the protocol port number where it will wait for messages. Here is a typical call to bind:
    struct sockaddr_in name;
    .....
    bzero((char *) &name, sizeof(name));      /* zeroes out sizeof(name) characters */
    name.sin_family = AF_INET;                /* use internet domain */
    name.sin_port = htons(0);                 /* ask kernel to provide a port */
    name.sin_addr.s_addr = htonl(INADDR_ANY); /* use all IPs of host */
    if (bind(sd, (struct sockaddr *)&name, sizeof(name)) < 0) {
        perror("bind");
        exit(1);
    }
A call to bind is optional on the client side, required on the server side. After a socket is bound we can retrieve its address structure, given the socket file descriptor (the int) by using the function getsockname.
We need to understand the reasons for the calls to htons and htonl. Numbers on different machines may be represented differently (big-endian machines and little-endian machines). In a little endian machine the low order byte of an integer appears at the lower address; in a big endian machine instead the low order byte appears at the higher address. For example if c[2] is a byte array holding the 16 bit integer 0x0102, in a little endian machine c[0] contains 2 and c[1] contains 1. Network order, the order in which numbers are sent on the internet, is big-endian. Sun-Sparc machines are big endian. i-386 PC and Digital Alpha are little endian. A simple program can show whether your machine is big endian or little endian. We need to make sure that the right representation is used on each machine. We use functions to convert from host to network form before transmission (htons for short integers, and htonl for long integers), and from network to host form after reception (ntohs for short integers, and ntohl for long integers).
The function bzero zeroes out a buffer of specified length. It is one of a group of functions for dealing with arrays of bytes. bcopy copies a specified number of bytes from a source to a target buffer. bcmp compares a specified number of bytes of two byte buffers. Alternatively one can use memcpy, memset, ..
    #include <sys/types.h>
    #include <sys/socket.h>
    int connect(int sd, const struct sockaddr *addr, int addrlen)
sd: file descriptor of local socket
addr: pointer to protocol address of other socket
addrlen: length in bytes of address structure
It returns an integer (0=success, -1=failure).
Here is a typical call to connect:
    #define SERV_NAME ...  /* say, "snowhite.cis.temple.edu" */
    #define SERV_PORT ...  /* say, 8001 */
    struct sockaddr_in servaddr;
    struct hostent *hp;    /* Here we store information about host */
    int sd;                /* File descriptor for socket */
    .......
    /* initialize servaddr */
    bzero((char *)&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERV_PORT);
    hp = gethostbyname(SERV_NAME);
    if (hp == 0) {
        fprintf(stderr, "failure to get address of %s\n", SERV_NAME);
        exit(1);
    }
    bcopy(hp->h_addr_list[0], (caddr_t)&servaddr.sin_addr, hp->h_length);
    if (connect(sd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0) {
        perror("connect");
        exit(1);
    }
The function gethostbyname is described below.
    struct hostent {
        char *h_name;        /* official name of host */
        char **h_aliases;    /* null terminated list of aliases */
        int h_addrtype;      /* host address type */
        int h_length;        /* length of address structure */
        char **h_addr_list;  /* null terminated list of addresses */
                             /* from name server */
    #define h_addr h_addr_list[0] /* address, for backward compatibility */
    };
In this structure h_addr_list[0] is the first IP address associated with the host. In order to use this structure you must include in your program:
    #include <netdb.h>
The function prototype is
    struct hostent *gethostbyname(const char *hostname);
Other functions help us find out things about hosts, services, protocols, networks: getpeername, gethostbyaddr, getprotobyname, getprotobynumber, getprotoent, getservbyname, getservbyport, getservent, getnetbyname, getnetbynumber, getnetent.
    int listen(int fd, int qlen)
fd: file descriptor of a socket that has already been bound
qlen: specifies the maximum number of connection requests that can wait to be processed by the server while the server is busy servicing another connection request.
It returns an integer (0=success, -1=failure).
Here is a typical call to listen:
    if (listen(sd, 5) < 0) {
        perror("listen");
        exit(1);
    }
    #include <sys/types.h>
    #include <sys/socket.h>
    int accept(int fd, struct sockaddr *addressp, int *addrlen)
fd is an int, the file descriptor of the socket the server was listening on [in fact it is called the listening socket], i.e. on which the server has successfully completed socket, bind, and listen.
addressp points to an address. It will be filled with address of the calling client. We can use this address to determine the IP address and port of the client.
addrlen points to an integer. Before the call it must contain the size of the buffer that addressp points to; after the call it will contain the actual length of the address structure of the client.
It returns an integer representing a new socket (-1 in case of failure). It is the socket that the server will use from now on to communicate with the client that requested connection [in fact it is called the connected socket]. Different calls to accept will result in different connected sockets.
Here is a typical call to accept:
    struct sockaddr_in client_addr;
    int ssd, csd, length;
    ...........
    length = sizeof(client_addr);   /* size of the address buffer */
    if ((csd = accept(ssd, (struct sockaddr *)&client_addr, &length)) < 0) {
        perror("accept");
        exit(1);
    }
    /* here we give the new socket to a thread or a process that will */
    /* handle communication with this client. */
Successive calls to accept on the same listening socket return different connected sockets. These connected sockets are multiplexed on the same port of the server by the TCP software. [This software uses the quartet Client-IP-Address.Client-Port.Server-IP-Address.Server-Port (plus the protocol) to identify the various simultaneous connections.]
    #include <sys/types.h>
    #include <sys/socket.h>
    int sendto(int sd, char *buff, int len, int flags, struct sockaddr *addressp, int addrlen)
sd, socket file descriptor
buff, address of buffer with the information to be sent
len, size of the message
flags, usually 0; could be used for priority messages, etc.
addressp, address of process we are sending message to
addrlen, length of the address structure
It returns number of characters sent. It is -1 in case of failure.
The flags we can use with sendto include MSG_OOB (send out-of-band, i.e. priority, data) and MSG_DONTROUTE (bypass routing and send only to directly connected hosts).
    #include <sys/types.h>
    #include <sys/socket.h>
    int recvfrom(int sd, char *buff, int len, int flags, struct sockaddr *addressp, int *addrlen)
sd, socket file descriptor
buff, address of buffer where message will be stored
len, size of buffer
flags, usually 0; used for priority messages, peeking etc.
addressp, buffer that will receive address of process that sent message
addrlen, points to an integer that contains the size of the addressp structure; after the call it contains the actual size of the address
It returns number of characters received. It is -1 in case of failure.
The flags we can use with recvfrom include MSG_OOB (receive out-of-band data) and MSG_PEEK (look at the incoming message without removing it from the queue).
    int shutdown(int sd, int action)
sd is a socket descriptor
action is (0 = close for reads) (1 = close for writes) (2 = close for both reads and writes)
It returns an integer (0=success, -1=failure)
    int getsockname(int sd, struct sockaddr *addrp, int *addrlen)
sd is the socket descriptor of a bound socket.
addrp points to a buffer. After the call it will have the address associated to the socket.
addrlen gives the size of the buffer. After the call gives size of address.
It returns an integer (0=success, -1=failure)
    int getpeername(int sd, struct sockaddr *addrp, int *addrlen)
sd is the socket descriptor of a connected socket i.e. of a socket returned by accept.
addrp points to a sockaddr buffer. After the call it will have the address associated to peer of socket. It is the same structure and information as it is available in the client address structure after the accept call.
addrlen gives the size of the buffer. After the call gives size of address.
It returns an integer (0=success, -1=failure)
Here is an example of use of getpeername:
    struct sockaddr_in name;
    int namelen = sizeof(name);
    .......
    if (getpeername(sd, (struct sockaddr *)&name, &namelen) < 0) {
        perror("getpeername");
        exit(1);
    }
    printf("Connection from %s\n", inet_ntoa(name.sin_addr));
We see here a new function inet_ntoa: it translates an internet integer address into a dot formatted character string such as 155.247.71.60. It requires the include files:
    #include <netinet/in.h>
    #include <arpa/inet.h>
The inverse of inet_ntoa is inet_addr which, given an IP address as a string returns its value as an unsigned network ordered integer:
    #include <netinet/in.h>
    #include <arpa/inet.h>
    unsigned int inet_addr(char *IP_address_string);
Another inverse of inet_ntoa is inet_aton which is not available in many Unix systems.
    #include <sys/types.h>
    #include <sys/socket.h>
    int setsockopt(int socket,          /* the socket created by a socket call */
                   int level,           /* use SOL_SOCKET or, better, see man page */
                   int option_name,     /* use SO_REUSEADDR, SO_LINGER .. */
                   char *option_value,  /* address of an integer set to 1 to enable an option, 0 to disable */
                   size_t option_length); /* size of option_value buffer */
    /* Returns 0 in case of success, -1 otherwise */
The function getsockopt is used to retrieve the value of options associated to a socket. Here is an example of use of setsockopt.
    option_value = 1;   /* int option_value; */
    if (setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, (char *)&option_value, sizeof(option_value)) < 0) {
        perror("setsockopt");
        exit(1);
    }
One may see the impact of this statement, and the impact of SO_REUSEADDR, by first writing a server without this statement. Then running the server using a port, say 5194, and a client, then executing at the unix prompt
    netstat -a | grep 5194
This will display how 5194 is being used. The use will not change for a few seconds even if client and server are killed. If setsockopt is used instead the use will change and a new bind on 5194 will be accepted immediately. The reason is explained somewhat in the states section where we see the TIME_WAIT state in which the socket remains at the end of the close.
For a detailed treatment of socket options, see Stevens. Here is a program by Stevens to determine the current values of socket options and here is the corresponding output on my computer (digital unix).
Example 2 (Datagram communication): a client and a server. In a loop, the client sends the current time to the server, waits for the reply, prints it out, and sleeps for a while. The server receives messages from clients and prints them out. It replies with its own current time. It also prints out information identifying the IP address and port of the client. No provision is made to cope with the unreliability of the communication channel.
Example 3 (TCP): Similar to Example 2, using TCP and runnable on both Unix and NT (from D.Comer: Computer Networks and Internets)
client, and
server
Here are other simple TCP client
and server (an "echoserver").
Here is a multithreaded echoserver.
And here, modified from Stevens, Network Programming, Vol. 1, a "daytime"
server and client.
Example 4: (Datagram communication) a client and a server. The client is invoked with three parameters: the name of a user, of a host, and a port. It sends the user name to the server and prints out the response. The server when it receives a user name checks if the user is currently logged on the host. It replies with an appropriate response. No provision is made to cope with the unreliability of the communication channel.
Example 5: (Datagram communication): a client and a server. A client in a loop sends a message to the server and waits with timeout for reply. The server receives messages and gives them to threads to respond to.
Example 6: A TCP concurrent server that forks child processes to handle client connections.
Example 7: Threaded Server from the Threads Primer book.
Example 8: A program for testing TCP servers. It is an abridged, slightly modified version of the ab.c program that comes with the apache server distribution. This program establishes a number of concurrent connections to a TCP server (usually an HTTP server) and measures latency, response time, data rate, etc.
Example 9: Another TCP concurrent server. Now, using a technique seen in the Apache server, each child process executes an accept statement on the listening socket. You can use the program in Example 8 to test this server (you will need a "index.html" file in the directory where the server is being run).
For more examples, and a wonderful book on Network Programming, read
R.Stevens: "Unix Network Programming", Prentice-Hall, 1998. Here is the
code
presented in that book.
You may download the tarred file of this code from
ftp://ftp.kohala.com/pub/rstevens/unpv12e.tar.gz
You might look in particular to
Buffer Size | Average | Standard Deviation
(KB)        | (Mbps)  | (Mbps)
============================================
       1024 |   0.046 |  0.000
       2048 |   3.058 |  2.835
       4096 |  40.313 |  1.229
       8192 |  56.271 |  0.812
      16384 |  60.934 |  7.706
      32768 |  69.078 |  1.269
      65536 |  71.621 |  1.554
     131072 |  66.575 |  8.319
     262144 |  65.037 |  5.481
     524288 |  65.426 |  0.809
Buffer Size | Average | Standard Deviation
(KB)        | (Mbps)  | (Mbps)
============================================
       1024 |   0.045 |  0.000
       2048 |   2.157 |  0.181
       4096 |   3.365 |  0.168
       8192 |   4.988 |  0.222
      16384 |   6.643 |  0.344
      32768 |   6.747 |  0.182
      65536 |   6.550 |  0.478
     131072 |   6.875 |  0.266
     262144 |   6.876 |  0.370
     524288 |   6.745 |  0.271
Some observations:
Transfer Size | Data Rate | Data Rate
(Bytes)       | Average   | Standard Deviation
              | (Mbps)    | (Mbps)
==================================================
            4 |    68.327 |  0.963
            8 |   120.254 |  1.952
           16 |   218.575 |  1.873
           32 |   313.049 |  5.371
           64 |   435.295 |  1.331
          128 |   503.564 |  1.422
          256 |   531.316 | 10.307
          512 |   541.813 |  2.403
         1024 |   545.021 |  1.241
             | time to | latency | response | data
             | connect |         | time     | rate
===========================================================
multiserver  |       4 |     120 |      180 |  150
-----------------------------------------------------------
multiserver1 |       6 |      26 |       42 |  630
-----------------------------------------------------------
where times are in milliseconds and data rate in kilobits per second.
Testing an Apache webserver, we found a similar behavior, with TimeToConnect=10, Latency=99, ResponseTime=108, and DataRate = 98.
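[Figure: TCP state transition diagram. Each transition is labeled with the event that causes it and the segment sent in response, for example "appl: active open, send: SYN" or "receive: ACK, send: nothing".]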
You can recognise in this state diagram the handshakes taking place when
a connection is started and when it is terminated.
The transition CLOSED -> SYN_SENT -> ESTABLISHED takes place on the client
as a result of the successful completion of the connect operation
(active open).
The transition CLOSED -> LISTEN takes place in the server upon successful
completion of socket+bind+listen (passive open).
The transition LISTEN -> SYN_RECVD -> ESTABLISHED takes place
in the server upon successful completion of accept.
The transitions in the dotted boxes named "active close" and "passive close"
take place upon successful completion
of close operations.
The TIME_WAIT state is required to make sure that all packets between the client and server have either been delivered or totally lost. A client will remain in the TIME_WAIT state for up to 2*MSL, where MSL is the Maximum Segment Lifetime, the maximum time a segment may remain alive in the network, say, going around because of a routing anomaly. MSL is up to one minute, so a client may remain in TIME_WAIT for up to 2 minutes.
Also from Stevens is the following diagram that displays a typical interaction between a client and a server. Notice in particular the 3-way handshake when the connection is established and the 4-way handshake when the connection is terminated. [mss stands for Maximum Segment Size.]
In the above diagram, the client, immediately after the write operation,
should block on a read operation which will complete when the data reply
arrives from the server.
The above diagram provides a second reason for the presence of the
TIME_WAIT state: assume that the last ack from the active closer to the
passive closer is lost. Then the passive closer upon timeout
will resend the FIN N message. If the active closer is in the TIME_WAIT
state it can resend the ack. If state TIME_WAIT did not exist and the
active closer went directly to the CLOSED state, then upon receiving
the second FIN it would think it is an error and send an RST to the
passive closer which, in absence of the final ack, would be unable to close the
connection cleanly.
You may use the netstat command to determine the sockets currently used in the system, the protocol they use, the local host+port, the remote host+port, and the socket state. For example in my machine netstat printed out (truncated):
Active Internet connections
Proto Recv-Q Send-Q  Local Address   Foreign Address         (state)
tcp        0      0  joda.80         sp185047.sbm.tem.2129   ESTABLISHED
tcp        0      0  joda.80         sp185047.sbm.tem.2128   TIME_WAIT
tcp        0      0  joda.80         sp185047.sbm.tem.2127   TIME_WAIT
tcp        0      0  joda.80         sp185047.sbm.tem.2126   TIME_WAIT
tcp        0      0  joda.80         sp185047.sbm.tem.2123   ESTABLISHED
tcp        0      0  joda.80         sp185047.sbm.tem.2122   TIME_WAIT
tcp        0      0  joda.80         sp185047.sbm.tem.2121   TIME_WAIT
tcp        0      0  joda.80         ww-to06.proxy.ao.1280   TIME_WAIT
tcp        0      0  joda.pop3       bamboo.1690             TIME_WAIT
tcp        0      0  joda.80         dhcp40-98.netman.1631   FIN_WAIT_2
tcp        0      0  joda.80         hd1-220.hil.comp.1132   FIN_WAIT_2
tcp        0      0  joda.80         200.5.111.75.1205       FIN_WAIT_2
tcp        0      0  joda.80         hd1-220.hil.comp.1129   FIN_WAIT_2
tcp        0      0  joda.2197       wilma.flair.temp.6000   ESTABLISHED
tcp        0      0  joda.imap       cc16262-a.wlgrv1.1922   ESTABLISHED
tcp        0      0  joda.80         207.205.89.226.38539    FIN_WAIT_2
tcp        0      0  joda.80         207.86.17.198.3866      FIN_WAIT_2
tcp        0      0  joda.2105       joda.2104               CLOSE_WAIT
tcp        0      0  joda.telnet     bamboo.1445             ESTABLISHED
tcp        0      0  joda.telnet     mrgrump.2056            ESTABLISHED
tcp        0      0  joda.telnet     wilma.flair.temp.33079  ESTABLISHED
tcp        0      0  joda.1547       joda.80                 CLOSE_WAIT
tcp        0      0  joda.1538       www.Sun.COM.80          CLOSE_WAIT
tcp        0      0  joda.1533       wilma.flair.temp.6000   ESTABLISHED
tcp        0      0  joda.telnet     warhog.2316             ESTABLISHED
tcp        0      0  joda.1482       wilma.flair.temp.6000   ESTABLISHED
tcp        0      0  joda.telnet     wilma.flair.temp.33077  ESTABLISHED
tcp        0      0  localhost.1032  *.*                     LISTEN
tcp        0      0  localhost.1031  *.*                     LISTEN
tcp        0      0  localhost.1030  *.*                     LISTEN
ingargio@joda.cis.temple.edu