Chapter: Network Programming and Management : Elementary TCP Sockets

Byte Manipulation Functions


There are two groups of functions that operate on multibyte fields without interpreting the data and without assuming that the data is a null-terminated C string. We need these functions when dealing with socket address structures, because fields such as IP addresses can contain bytes of 0 but are not C character strings. The functions whose names begin with str (for string), defined by including the <string.h> header, deal with null-terminated C character strings.

The first group of functions, whose names begin with b (for byte), are from 4.3BSD. The second group of functions, whose names begin with mem (for memory), are from the ANSI C library.

The Berkeley-derived functions are shown first.

#include <strings.h>

void bzero(void *dest, size_t nbytes);

void bcopy(const void *src, void *dest, size_t nbytes);

int bcmp(const void *ptr1, const void *ptr2, size_t nbytes);

The const qualifier on src, ptr1, and ptr2 indicates that the function does not modify what these pointers point to. That is, the memory pointed to by a const pointer is read but not modified by the function.

bzero() sets the specified number of bytes to 0 in the destination. This function is often used to initialize a socket address structure to 0. bcopy() moves the specified number of bytes from the source to the destination. bcmp() compares two arbitrary byte strings. The return value is zero if the two byte strings are identical; otherwise it is nonzero.

The ANSI C functions are as follows:

#include <string.h>

void *memset(void *dest, int c, size_t len);

void *memcpy(void *dest, const void *src, size_t nbytes);

int memcmp(const void *ptr1, const void *ptr2, size_t nbytes);

memcmp returns 0 if equal, <0 or >0 if unequal.

memset() sets the specified number of bytes in the destination to the value in c. memcpy() is similar to bcopy(), but the order of the two pointer arguments is swapped. bcopy() correctly handles overlapping fields, while the behaviour of memcpy() is undefined if the source and destination overlap; the memmove() function can be used when the fields overlap. memcmp() compares two arbitrary byte strings and returns 0 if they are identical. If not, the return value is either greater than 0 or less than 0, depending on whether the first unequal byte pointed to by ptr1 is greater than or less than the corresponding byte pointed to by ptr2.
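The following is a minimal sketch of how the mem functions are typically applied to a socket address structure. The helper names init_addr, same_addr, and shift_left are hypothetical and exist only for this illustration; htons is assumed here to store the port in network byte order.

```c
#include <string.h>      /* memset, memcmp, memmove */
#include <sys/socket.h>  /* AF_INET */
#include <netinet/in.h>  /* struct sockaddr_in, htons */

/* Zero a socket address structure, then fill in the family and port.
   init_addr is a hypothetical helper used only in this sketch. */
void init_addr(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof(*addr));   /* same effect as bzero(addr, sizeof(*addr)) */
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
}

/* Return 1 if the two structures hold identical bytes, 0 otherwise. */
int same_addr(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}

/* Shift the first len bytes of buf left by one position. The source and
   destination overlap, so memmove is used; memcpy would be undefined here. */
void shift_left(char *buf, size_t len)
{
    if (len > 1)
        memmove(buf, buf + 1, len - 1);
}
```

Because the structure is zeroed before any field is set, a byte-wise comparison with memcmp is meaningful: two structures initialized the same way compare equal, including any padding bytes.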

Address conversion functions: There are two groups of address conversion functions that convert Internet addresses between ASCII strings (the readable form) and network byte ordered binary values.

inet_aton(), inet_addr(), and inet_ntoa() convert an IPv4 address between a dotted-decimal string (e.g. 206.62.226.33) and its 32-bit network byte ordered binary value.

#include <arpa/inet.h>

int inet_aton(const char *strptr, struct in_addr *addrptr);

The first of these, inet_aton(), converts the C character string pointed to by strptr into its 32-bit network byte ordered binary value, which is stored through the pointer addrptr. It returns 1 if successful and 0 otherwise.

in_addr_t inet_addr (const char * strptr);

inet_addr() does the same conversion, returning the 32-bit network byte ordered binary value as the return value. The problem with this function is that all 2^32 possible binary values (0.0.0.0 through 255.255.255.255) are valid IP addresses, yet the function returns the constant INADDR_NONE (typically 32 one-bits) on an error. This means the dotted-decimal string 255.255.255.255 cannot be handled by this function, since its binary value looks like a failure.

This function is deprecated, and new code should use inet_aton instead.

The inet_ntoa() function converts a 32-bit network byte ordered binary IPv4 address into its corresponding dotted-decimal string. The string pointed to by the return value of the function resides in static memory, so it is overwritten by the next call. This function takes a structure as its argument, not a pointer to a structure. (This is rare.)
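As a small illustration of the points above, the sketch below converts a dotted-decimal string to binary with inet_aton and back to text with inet_ntoa. The helper name roundtrip_ipv4 is hypothetical. Because inet_ntoa returns a pointer into static memory, the result is copied into the caller's buffer right away.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Parse a dotted-decimal string and format it back into out.
   Returns 0 on success, -1 if text is not a valid IPv4 address or out is
   too small. roundtrip_ipv4 is a hypothetical helper for this sketch. */
int roundtrip_ipv4(const char *text, char *out, size_t outlen)
{
    struct in_addr addr;

    if (inet_aton(text, &addr) == 0)   /* 0 means the string was invalid */
        return -1;

    /* inet_ntoa takes the structure by value and returns a pointer to a
       static buffer, so copy the string before any later call. */
    const char *s = inet_ntoa(addr);
    if (strlen(s) + 1 > outlen)
        return -1;
    strcpy(out, s);
    return 0;
}
```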

inet_pton ( ) and inet_ntop( ) functions:

These two functions are new with IPv6 and work with both IPv4 and IPv6 addresses.

The letters p and n stand for presentation and numeric. The presentation format for an address is often an ASCII string, and the numeric format is the binary value that goes into a socket address structure.

# include <arpa/inet.h>

int inet_pton (int family, const char *strptr, void *addrptr);

const char *inet_ntop(int family, const void *addrptr, char *strptr, socklen_t len);

The family argument for both functions is either AF_INET or AF_INET6. If family is not supported, both functions fail with errno set to EAFNOSUPPORT: inet_pton returns -1 and inet_ntop returns a null pointer.

The first function tries to convert the string pointed to by strptr, storing the binary result through the pointer addrptr. If successful, the return value is 1. If the input string is not a valid presentation format for the specified family, 0 is returned.

inet_ntop() does the reverse conversion, from numeric (addrptr) to presentation (strptr). The len argument is the size of the destination, to prevent the function from overflowing the caller's buffer. To help specify this size, the following two definitions are provided by including the <netinet/in.h> header:

#define INET_ADDRSTRLEN 16

#define INET6_ADDRSTRLEN 46

If len is too small to hold the resulting presentation format, including the terminating null, a null pointer is returned and errno is set to ENOSPC.

The strptr argument to inet_ntop cannot be a null pointer. The caller must allocate memory for the destination and specify its size. On success this pointer is the return value of the function.

This is summarized in the following figure.

System calls used with sockets:

Socket calls are the functions that provide access to the underlying functionality, together with utility routines that help the programmer. A socket can be used by a client or by a server, for stream (TCP) or datagram (UDP) communication with a specific endpoint address.

The following figure shows a timeline of the typical scenario that takes place between a client and a server.

First the server is started; then, sometime later, a client is started that connects to the server. The client sends a request to the server, the server processes the request, and the server sends back a reply to the client. This continues until the client closes its end of the connection, which sends an end-of-file notification to the server. The server then closes its end of the connection and either terminates or waits for a new connection.

socket function:

#include <sys/socket.h>

int socket(int family, int type, int protocol);

Returns a non-negative descriptor if OK, -1 on error.

The arguments specify the protocol family and the type of service needed (stream or datagram). The protocol argument is set to 0 except for raw sockets.

Not all combinations of socket family and type are valid. The following figure shows the valid combinations.
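A minimal sketch of creating a socket is shown below. The wrapper open_socket is a hypothetical name; it passes 0 as the protocol so the system picks the default for the given family and type.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Create a socket of the given family and type.
   Returns a non-negative descriptor on success or -1 on error.
   open_socket is a hypothetical wrapper used only in this sketch. */
int open_socket(int family, int type)
{
    int sockfd = socket(family, type, 0);   /* 0: default protocol for family/type */
    if (sockfd < 0)
        perror("socket");                   /* report why the call failed */
    return sockfd;
}
```

With this wrapper, open_socket(AF_INET, SOCK_STREAM) would yield a TCP socket and open_socket(AF_INET, SOCK_DGRAM) a UDP socket.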

connect Function : The connect function is used by a TCP client to establish an active connection with a remote server. The arguments allow the client to specify the remote endpoint, which includes the remote machine's IP address and protocol port number.

#include <sys/socket.h>

int connect(int sockfd, const struct sockaddr *servaddr, socklen_t addrlen);

Returns 0 if OK, -1 on error.

sockfd is the socket descriptor that was returned by the socket function. The second and third arguments are a pointer to a socket address structure and its size.

In the case of a TCP socket, the connect() function initiates TCP's three-way handshake. The function returns only when the connection is established or an error occurs. The possible errors are:

1. If the client TCP receives no response to its SYN segment, ETIMEDOUT is returned. The SYN is retransmitted while the client waits (for example, 6 seconds after the first SYN and again 24 seconds later), and if no response is received after a total of 75 seconds, the error is returned.

2. If the server's response to the client's SYN is an RST (reset), this indicates that no process is waiting for connections on the server at the specified port. This is a hard error, and ECONNREFUSED is returned to the client as soon as the RST is received. An RST is generated (a) when a SYN arrives for a port that has no listening server, (b) when TCP wants to abort an existing connection, and (c) when TCP receives a segment for a connection that does not exist.

3. If the client's SYN elicits an ICMP destination unreachable message from some intermediate router, this is considered a soft error. The client kernel saves the message but keeps sending SYNs for the same 75-second period. If no response is received, the saved ICMP error is returned to the process as either EHOSTUNREACH or ENETUNREACH.

In terms of the TCP state transition diagram, connect() moves from the CLOSED state to the SYN_SENT state and then on success to the ESTABLISHED state. If the connect fails, the socket is no longer usable and must be closed.
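The client side described above can be sketched as follows. tcp_connect is a hypothetical helper name, and the use of inet_pton to fill in the server address is an assumption of this sketch. Note that the socket is closed when connect fails, since it cannot be reused.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Connect to ip:port over TCP. Returns a connected descriptor, or -1 with
   errno describing the failure. tcp_connect is a hypothetical helper. */
int tcp_connect(const char *ip, unsigned short port)
{
    struct sockaddr_in servaddr;

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &servaddr.sin_addr) != 1) {
        errno = EINVAL;            /* not a valid dotted-decimal address */
        return -1;
    }

    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        return -1;

    if (connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0) {
        int saved = errno;         /* ETIMEDOUT, ECONNREFUSED, EHOSTUNREACH, ... */
        close(sockfd);             /* a socket whose connect failed must be closed */
        errno = saved;
        return -1;
    }
    return sockfd;
}
```

A caller can inspect errno after a -1 return to distinguish the timeout, refused, and unreachable cases listed above.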

bind Function: When a socket is created, it does not have any notion of endpoint addresses. An application calls bind to specify the local endpoint address for a socket. That is, the bind function assigns a local port and address to a socket.

#include <sys/socket.h>

int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen);

The second argument is a pointer to a protocol-specific address, and the third argument is the size of this address structure. Servers bind their well-known port when they start. (A TCP client normally does not bind an IP address to its socket.)
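A minimal sketch of binding a local address is given below. The helper bind_tcp is a hypothetical name. Using the wildcard address INADDR_ANY, and treating port 0 as a request for a kernel-chosen port, are assumptions of this sketch that the text above does not cover.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Create a TCP socket and bind it to the wildcard address and given port.
   Port 0 lets the kernel choose a free port. Returns the descriptor or -1.
   bind_tcp is a hypothetical helper used only in this sketch. */
int bind_tcp(unsigned short port)
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        return -1;

    struct sockaddr_in myaddr;
    memset(&myaddr, 0, sizeof(myaddr));
    myaddr.sin_family = AF_INET;
    myaddr.sin_addr.s_addr = htonl(INADDR_ANY);   /* any local interface */
    myaddr.sin_port = htons(port);

    if (bind(sockfd, (struct sockaddr *)&myaddr, sizeof(myaddr)) < 0) {
        close(sockfd);
        return -1;
    }
    return sockfd;
}
```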

listen Function:

The listen function is called only by a TCP server, and it performs the following two actions.

The listen function converts an unconnected socket into a passive socket, indicating that the kernel should accept incoming connection requests directed to this socket. In terms of the TCP state transition diagram, the call to listen moves the socket from the CLOSED state to the LISTEN state.

The second argument to this function specifies the maximum number of connections that the kernel should queue for this socket.

#include <sys/socket.h>

int listen(int sockfd, int backlog);

Returns 0 if OK, -1 on error.

This function is normally called after both the socket and bind functions and must be called before calling the accept function.

The kernel maintains two queues for a listening socket, and the backlog applies to the sum of the entries on these two queues. They are:

An incomplete connection queue, which contains an entry for each SYN that has arrived from a client for which the server is awaiting completion of the TCP three-way handshake. These sockets are in the SYN_RCVD state.

A completed connection queue, which contains an entry for each client with whom the TCP three-way handshake has completed. These sockets are in the ESTABLISHED state.

Following figure depicts these two queues for a given listening socket.

The two queues maintained by TCP for a listening socket.

When a SYN arrives from a client, TCP creates a new entry on the incomplete queue and then responds with the second segment of the three-way handshake: the server's SYN with an ACK of the client's SYN. This entry remains on the incomplete queue until the third segment of the three-way handshake arrives (the client's ACK of the server's SYN) or the entry times out. If the three-way handshake completes normally, the entry moves from the incomplete queue to the completed queue. When the process calls accept, the first entry on the completed queue is returned to the process or, if the queue is empty, the process is put to sleep until an entry is placed onto the completed queue. If the queues are full when a client SYN arrives, TCP ignores the arriving SYN; it does not send an RST. This is because the condition is considered temporary, and the client TCP will retransmit its SYN in the hope of finding room on the queue.
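The usual socket, bind, listen order on the server side can be sketched as follows. tcp_listen is a hypothetical helper name, and binding to the loopback address is a choice made only to keep the sketch local to one machine.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Create a passive (listening) TCP socket on the loopback address.
   backlog is passed straight through to listen(). Returns the listening
   descriptor or -1. tcp_listen is a hypothetical helper for this sketch. */
int tcp_listen(unsigned short port, int backlog)
{
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0)
        return -1;

    struct sockaddr_in servaddr;
    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    servaddr.sin_port = htons(port);

    /* bind must succeed before listen moves the socket to LISTEN */
    if (bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0 ||
        listen(listenfd, backlog) < 0) {
        close(listenfd);
        return -1;
    }
    return listenfd;
}
```

After this call returns, the kernel completes handshakes and queues connections on its own; the process only sees them once it calls accept.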

accept Function : accept is called by a TCP server to return the next completed connection from the front of the completed connection queue. If the completed connection queue is empty, the process is put to sleep.

#include <sys/socket.h>

int accept(int sockfd, struct sockaddr *cliaddr, socklen_t *addrlen);

Returns a non-negative descriptor if OK, -1 on error.

The cliaddr and addrlen arguments are used to return the protocol address of the connected peer process (the client). addrlen is a value-result argument: before the call, we set the integer value pointed to by addrlen to the size of the socket address structure pointed to by cliaddr, and on return this integer value contains the actual number of bytes stored by the kernel in the socket address structure. If accept is successful, its return value is a brand new descriptor that was automatically created by the kernel. This new descriptor refers to the TCP connection with the client. When discussing accept, we call the first argument to accept the listening socket, and we call the return value from accept the connected socket.
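The value-result use of addrlen can be sketched as below. accept_one is a hypothetical helper name, and converting the peer address to text with inet_ntop is an addition made only so the result is easy to inspect.

```c
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Accept one connection on listenfd and store the client's IPv4 address
   as text in peer. Returns the connected descriptor or -1.
   accept_one is a hypothetical helper used only in this sketch. */
int accept_one(int listenfd, char *peer, socklen_t peerlen)
{
    struct sockaddr_in cliaddr;
    socklen_t addrlen = sizeof(cliaddr);   /* value: size of our structure */

    int connfd = accept(listenfd, (struct sockaddr *)&cliaddr, &addrlen);
    if (connfd < 0)
        return -1;

    /* result: addrlen now holds the number of bytes the kernel stored */
    if (inet_ntop(AF_INET, &cliaddr.sin_addr, peer, peerlen) == NULL) {
        close(connfd);
        return -1;
    }
    return connfd;   /* the connected socket; listenfd keeps listening */
}
```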

fork function:

fork is the function that enables Unix to create a new process.

#include <unistd.h>

pid_t fork(void);

Returns 0 in the child, the process ID of the child in the parent, -1 on error.

There are two typical uses of the fork function:

1. A process makes a copy of itself so that one copy can handle one operation while the other copy does another task. This is the normal way of working in network servers.

2. A process wants to execute another program. Since the only way to create a new process is by calling fork, the process first calls fork to make a copy of itself, and then one of the copies (typically the child process) calls an exec function to replace itself with the new program. This is typical for programs such as shells.

Although fork is called once, it returns twice. It returns once in the calling process (called the parent) with a return value that is the process ID of the newly created process (the child). It also returns once in the child, with a return value of 0. Hence the return value tells the process whether it is the parent or the child.

The reason fork returns 0 in the child, instead of the parent's process ID, is that a child has only one parent and can always obtain the parent's process ID by calling getppid. A parent, on the other hand, can have any number of children, and there is no way to obtain the process IDs of its children. If the parent wants to keep track of the process IDs of all its children, it must record the return values from fork.
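The two return values of fork can be seen in the sketch below. run_child is a hypothetical helper name, and waitpid is used here, as an assumption beyond the text above, so the parent can collect the child's exit status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with the given code. The parent
   records the child's process ID from fork's return value and waits for it.
   Returns the child's exit status, or -1 on error.
   run_child is a hypothetical helper used only in this sketch. */
int run_child(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;          /* fork failed: no child was created */

    if (pid == 0)
        _exit(code);        /* child: fork returned 0 */

    /* parent: pid is the child's process ID */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```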

exec function :

The only way in which an executable program file on disk is executed by Unix is for an existing process to call one of the six exec functions. exec replaces the current process image with the new program file, and this new program normally starts at the main function. The process ID does not change. We refer to the process that calls exec as the calling process and to the newly executed program as the new program.

The differences in the six exec functions are:

a. Whether the program file to execute is specified by a filename or a pathname,

b. Whether the arguments to the new program are listed one by one or referenced through an array of pointers, and

c. Whether the environment of the calling process is passed to the new program or whether a new environment is specified.

#include <unistd.h>

int execl(const char *pathname, const char *arg0, ... /* (char *) 0 */);

int execv(const char *pathname, char *const argv[]);

int execle(const char *pathname, const char *arg0, ... /* (char *) 0, char *const envp[] */);

int execve(const char *pathname, char *const argv[], char *const envp[]);

int execlp(const char *filename, const char *arg0, ... /* (char *) 0 */);

int execvp(const char *filename, char *const argv[]);

These functions return to the caller only if an error occurs. Otherwise control passes to the start of the new program, normally the main function.

The relationship among these six functions is shown in the following figure. Normally only execve is a system call within the kernel, and the other five are library functions that call execve.

1. The three functions in the top row specify each argument string as a separate argument to the exec function, with a null pointer terminating the variable number of arguments. The three functions in the second row have an argv array containing the pointers to the argument strings. This argv array must contain a null pointer to specify its end, since a count is not specified.

2. The two functions in the left column specify a filename argument. This is converted into a pathname using the current PATH environment variable. If the filename argument to execlp or execvp contains a slash (/) anywhere in the string, the PATH variable is not used. The four functions in the right two columns specify a fully qualified pathname argument.

3. The four functions in the left two columns do not specify an explicit environment pointer. Instead, the current value of the external variable environ is used for building an environment list that is passed to the new program. The two functions in the right column specify an explicit environment list. The envp array of pointers must be terminated by a null pointer.
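The fork-then-exec pattern used by shells can be sketched as follows, here with execvp so the program is located through PATH. run_program is a hypothetical helper name; the exit code 127 for a failed exec and the use of waitpid are conventions assumed for this sketch.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program named by argv[0], searching PATH, and wait for it.
   argv must end with a null pointer. Returns the program's exit status,
   127 if it could not be executed, or -1 on error.
   run_program is a hypothetical helper used only in this sketch. */
int run_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execvp(argv[0], argv);   /* returns only if an error occurs */
        _exit(127);              /* reached only when exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The process ID seen by the parent stays the same across the exec, because exec replaces the program running in the child rather than creating another process.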
