<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="content-type">
<title>GDP Library Implementation</title>
<style type="text/css">
.metanotes {
font-style: italic;
font-weight: bold;
background-color: #ffff66;
color: red;
}

</style></head>
<body>
<h1>Global Data Plane Library Implementation</h1>
Eric Allman<br>
2015-02-26<br>
<br>
<p class="metanotes">This document is not yet complete.</p>
<p> </p>
<p>This document describes the internals of the Global Data Plane (GDP)
run-time library at a conceptual level. This library is linked into
any client that wishes to participate in the Global Data Plane. The
base library is implemented in C, which is what this document will assume,
but bindings for the external interfaces are available for other
languages. See the document Global Data Plane Programmatic API for
details of that interface.<br>
</p>
<h2>Overview</h2>
<p class="metanotes">To be completed.</p>
<p> Main modules: </p>
<ul>
<li>API.</li>
<li>Request management.</li>
<li>Datum management.<br>
</li>
<li>I/O buffering.</li>
<li>GOB associative cache.</li>
<li>Protocol Data Units (PDUs).</li>
<li>GDP protocol.</li>
<li>Event loop.</li>
</ul>
<p>Generally speaking, the GDP library is structured as an event-driven
program with a synchronous API. One thread services events (e.g.,
responses from the GDP daemon) while the main thread executes the user
application. When the application needs to contact the daemon, it
sends the message and then waits on a condition variable until
signaled. In the meantime, the event loop waits for a response,
associates it with the appropriate GOB handle by way of the GOB
associative cache, and then signals the application to collect the results.<br>
</p>
<p>Data is exchanged through a data structure of type <code>gdp_datum_t</code>,
which contains a record number, a timestamp, and a data buffer. When
sending data to the GDP daemon, the application creates a datum, fills it
in with data, and sends it to the daemon. When receiving data from
the GDP daemon, the application passes a datum into the GDP library that
will be filled in. Generally, when sending data to the daemon, the
record number and timestamp in the datum are ignored; they are replaced
with the real record number and timestamp after the write completes.<br>
</p>
<h2>Important Data Types and Structures</h2>
<h2>Module Details</h2>
<h3>API</h3>
<p>The API module (<code>gdp/gdp_api.c</code>) defines all the externally
visible routines. Since these are already documented in the GDP
Programmatic API document, suffice it to say that it does the
"translation" between the internal protocol and the external API.
Approximately speaking, it packages up parameters into a request, invokes
the request, and translates the updated request into any return codes.<br>
</p>
<h3>Request Management</h3>
<p>Implemented in <code>gdp/gdp_req.c</code>.<br>
</p>
<p>Internally the data flow is managed through a series of requests.
In many cases there will be only one request active on a given GOB at a
time, but this is not necessarily true, especially in the GDP daemon when
handling subscriptions (each subscription is a separate request).
Requests (potentially) have a pointer to a GOB handle, a pointer to a
protocol data unit (in internal form; essentially a packet), the status
code from the operation embodied in the request, and special information
for use when processing subscriptions.<br>
</p>
<p>The internal routines are:<br>
</p>
<table border="1" cellpadding="2" cellspacing="2" width="100%">
<tbody>
<tr>
<td valign="top"><code>_gdp_req_new</code><br>
</td>
<td valign="top">Create a new request and fill it in with a GDP
protocol command, GOB handle, I/O channel, and flags (passed in as
parameters) as well as space for a packet (in internal form) and
request ID.<br>
</td>
</tr>
<tr>
<td valign="top"><code>_gdp_req_free</code><br>
</td>
<td valign="top">Free the request and all associated resources such as
the space for packet information. It also decrements the
reference count on the GOB handle indicated in the request.<br>
</td>
</tr>
<tr>
<td valign="top"><code>_gdp_req_freeall</code><br>
</td>
<td valign="top">Free all requests associated with a particular GOB
handle. GOBs that have pending subscriptions will have one
request per subscription, which are linked off the GOB handle.<br>
</td>
</tr>
<tr>
<td valign="top"><code>_gdp_req_find</code><br>
</td>
<td valign="top">Given a GOB handle and a request ID, find the
associated request on the list associated with that GOB
handle. Note that request IDs need only be unique within a
particular GOB handle.<br>
</td>
</tr>
</tbody>
</table>
<p><br>
</p>
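<p>A minimal sketch of the per-GOB request list that the routines above
operate on. Every name here (<code>toy_gob_t</code>, <code>toy_req_new</code>,
and so on) is a hypothetical stand-in rather than the library's actual
definition in <code>gdp/gdp_req.c</code>; the PDU, I/O channel, and flags
carried by a real request are omitted, and for brevity the request is linked
onto the GOB when it is created.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct toy_req toy_req_t;

/* Hypothetical GOB handle: a reference count plus its list of requests. */
typedef struct toy_gob {
    int refcnt;
    toy_req_t *reqs;            /* head of the request list */
} toy_gob_t;

/* Hypothetical request: command, status, owning GOB, list linkage. */
struct toy_req {
    uint32_t rid;               /* request ID, unique only within one GOB */
    int cmd;                    /* protocol command */
    int stat;                   /* status of the operation */
    toy_gob_t *gob;
    toy_req_t *next;
};

/* Create a request and link it onto the GOB (which gains a reference). */
toy_req_t *toy_req_new(int cmd, toy_gob_t *gob, uint32_t rid)
{
    toy_req_t *req = calloc(1, sizeof *req);

    if (req == NULL)
        return NULL;
    req->cmd = cmd;
    req->rid = rid;
    req->gob = gob;
    if (gob != NULL) {
        gob->refcnt++;
        req->next = gob->reqs;
        gob->reqs = req;
    }
    return req;
}

/* Find the request with the given ID on this GOB's list, or NULL. */
toy_req_t *toy_req_find(toy_gob_t *gob, uint32_t rid)
{
    toy_req_t *req;

    assert(gob != NULL);
    for (req = gob->reqs; req != NULL; req = req->next)
        if (req->rid == rid)
            return req;
    return NULL;
}

/* Unlink and free one request, dropping its reference on the GOB. */
void toy_req_free(toy_req_t *req)
{
    if (req == NULL)
        return;
    if (req->gob != NULL) {
        toy_req_t **pp = &req->gob->reqs;

        while (*pp != NULL && *pp != req)
            pp = &(*pp)->next;
        if (*pp == req)
            *pp = req->next;
        req->gob->refcnt--;
    }
    free(req);
}

/* Free every request linked off the GOB (e.g., pending subscriptions). */
void toy_req_freeall(toy_gob_t *gob)
{
    assert(gob != NULL);
    while (gob->reqs != NULL)
        toy_req_free(gob->reqs);
}
```

<p>Because the lookup is scoped to one GOB's list, two different GOBs can
reuse the same request ID without ambiguity, which is the property the
<code>_gdp_req_find</code> description relies on.</p>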
<h3>Datums</h3>
<p>Implemented in <code>gdp/gdp_datum.c</code>.<br>
</p>
<p>As described above, a datum is the internal version of a GOB
record. The routines, which are externally visible, are:<br>
</p>
<table border="1" cellpadding="2" cellspacing="2" width="100%">
<tbody>
<tr>
<td valign="top"><code>gdp_datum_new</code><br>
</td>
<td valign="top">Create a new empty datum, including its associated
(empty) buffer.<br>
</td>
</tr>
<tr>
<td valign="top"><code>gdp_datum_free</code><br>
</td>
<td valign="top">Free the datum, including its associated data
buffer.<br>
</td>
</tr>
<tr>
<td valign="top"><code>gdp_datum_getrecno</code><br>
</td>
<td valign="top">Get the record number.<br>
</td>
</tr>
<tr>
<td valign="top"><code>gdp_datum_getts</code><br>
</td>
<td valign="top">Get the timestamp.<br>
</td>
</tr>
<tr>
<td valign="top"><code>gdp_datum_getdlen</code><br>
</td>
<td valign="top">Get the length of the data buffer.<br>
</td>
</tr>
<tr>
<td valign="top"><code>gdp_datum_getbuf</code><br>
</td>
<td valign="top">Get the data buffer itself.<br>
</td>
</tr>
<tr>
<td valign="top"><code>gdp_datum_print</code><br>
</td>
<td valign="top">Print a datum in a format suitable for debugging use.<br>
</td>
</tr>
</tbody>
</table>
<br>
When sending data to the GDP, the application has to create a datum, get the
buffer from that datum, and add the data to the buffer. When receiving
data from the GDP, the application creates a datum, passes it to the
appropriate read API, and on return can access the data buffer holding the
returned data, the record number, and the timestamp.<br>
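<p>The write-side flow can be sketched as follows. This is an illustration
under stated assumptions, not the library's code: <code>toy_datum_t</code>
and the <code>toy_</code> functions are hypothetical stand-ins for
<code>gdp_datum_t</code> and its accessors, the field layout is invented, and
<code>toy_commit</code> merely plays the part of the daemon assigning the
real record number and timestamp.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for gdp_datum_t; the real layout may differ. */
typedef struct {
    int64_t recno;              /* record number, assigned on commit */
    int64_t ts_sec;             /* commit timestamp, seconds */
    int32_t ts_nsec;            /* commit timestamp, nanoseconds */
    unsigned char *buf;         /* data buffer */
    size_t dlen;                /* bytes currently in the buffer */
} toy_datum_t;

/* Create an empty datum with an empty buffer. */
toy_datum_t *toy_datum_new(void)
{
    toy_datum_t *d = calloc(1, sizeof *d);

    if (d != NULL)
        d->recno = -1;          /* "not yet assigned" marker (an assumption) */
    return d;
}

/* Append len bytes to the datum's buffer; returns 0 on success. */
int toy_datum_append(toy_datum_t *d, const void *data, size_t len)
{
    unsigned char *nb;

    assert(d != NULL);
    if (len == 0)
        return 0;
    nb = realloc(d->buf, d->dlen + len);
    if (nb == NULL)
        return -1;
    memcpy(nb + d->dlen, data, len);
    d->buf = nb;
    d->dlen += len;
    return 0;
}

/* Stand-in for a completed write: whatever the caller put in recno and
 * the timestamp is overwritten with the values the log assigned. */
void toy_commit(toy_datum_t *d, int64_t recno, int64_t sec, int32_t nsec)
{
    assert(d != NULL);
    d->recno = recno;
    d->ts_sec = sec;
    d->ts_nsec = nsec;
}

/* Free the datum, including its data buffer. */
void toy_datum_free(toy_datum_t *d)
{
    if (d == NULL)
        return;
    free(d->buf);
    free(d);
}
```

<p>A caller would create the datum, append its payload, hand it to the write
call, and only afterwards read the record number and timestamp back out.</p>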
<h3>GOB Associative Cache</h3>
<p>Implemented in <code>gdp/gdp_gob_cache.c</code>.<br>
</p>
<p>The primary purpose of the GOB Associative Cache is to allow quick
association between a GOB name and the associated handle. When a
packet is received that contains a GOB name, this delivers the handle that
contains the necessary state information.<br>
</p>
<table border="1" cellpadding="2" cellspacing="2" width="100%">
<tbody>
<tr>
<td valign="top"><code>_gdp_gob_cache_init</code><br>
</td>
<td valign="top">Initializes the GOB cache. Called only once on
startup.<br>
</td>
</tr>
<tr>
<td valign="top"><code>_gdp_gob_cache_get</code><br>
</td>
<td valign="top">Extracts the GOB handle from the cache based on name
and I/O mode. If it is found the reference count on the GOB
handle is incremented; if not, it returns NULL.<br>
</td>
</tr>
<tr>
<td valign="top"><code>_gdp_gob_cache_add</code><br>
</td>
<td valign="top">Adds the GOB handle to the cache.<br>
</td>
</tr>
<tr>
<td valign="top"><code>_gdp_gob_cache_drop</code><br>
</td>
<td valign="top">Removes the GOB name → handle association from
the cache.<br>
</td>
</tr>
<tr>
<td valign="top"><code>_gdp_gob_incref</code><br>
</td>
<td valign="top">Increments the reference count on the GOB handle.<br>
</td>
</tr>
<tr>
<td valign="top"><code>_gdp_gob_decref</code><br>
</td>
<td valign="top">Decrements the reference count on the GOB
handle. If the reference count reaches zero the handle becomes
a candidate for cleanup, but this is deferred because, in the common
case, another request for this GOB will appear shortly.<br>
</td>
</tr>
<tr>
<td valign="top"><code>_gdp_gob_newhandle</code><br>
</td>
<td valign="top">Creates a new GOB handle. Note that this is
just the library data; it sends no protocol to the GDP
daemon.<br>
</td>
</tr>
<tr>
<td valign="top"><code>_gdp_gob_freehandle</code><br>
</td>
<td valign="top">Does the actual deallocation of the handle.
Removes the GOB from the cache (by calling <code>_gdp_gob_cache_drop</code>).
If the GOB includes a free function, that function is called (this
is used by the GDP daemon). It then frees the memory allocated
to the handle itself.<br>
</td>
</tr>
</tbody>
</table>
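<p>To make the reference-counting behavior concrete, here is a small sketch
of a name → handle cache. It is an assumption-laden toy: the names are
hypothetical, the I/O mode is ignored, GOB names are assumed to be 32 octets
wide, and a short linear array stands in for whatever associative structure
the library actually uses.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_NAME_LEN  32        /* assumed width of a GOB name, in octets */
#define TOY_CACHE_MAX 16        /* arbitrary capacity for the sketch */

/* Hypothetical GOB handle: just a name and a reference count. */
typedef struct {
    uint8_t name[TOY_NAME_LEN];
    int refcnt;
} toy_handle_t;

static toy_handle_t *toy_cache[TOY_CACHE_MAX];

/* Add a handle to the cache; returns 0 on success, -1 if the cache is full. */
int toy_cache_add(toy_handle_t *h)
{
    size_t i;

    for (i = 0; i < TOY_CACHE_MAX; i++) {
        if (toy_cache[i] == NULL) {
            toy_cache[i] = h;
            return 0;
        }
    }
    return -1;
}

/* Look up a handle by name; a hit takes a new reference, a miss is NULL. */
toy_handle_t *toy_cache_get(const uint8_t *name)
{
    size_t i;

    for (i = 0; i < TOY_CACHE_MAX; i++) {
        toy_handle_t *h = toy_cache[i];

        if (h != NULL && memcmp(h->name, name, TOY_NAME_LEN) == 0) {
            h->refcnt++;
            return h;
        }
    }
    return NULL;
}

/* Remove the name -> handle association; the handle itself is untouched. */
void toy_cache_drop(toy_handle_t *h)
{
    size_t i;

    for (i = 0; i < TOY_CACHE_MAX; i++)
        if (toy_cache[i] == h)
            toy_cache[i] = NULL;
}

/* Drop one reference. At zero the handle stays cached: it is only a
 * candidate for later cleanup, since another request may arrive soon. */
void toy_decref(toy_handle_t *h)
{
    assert(h != NULL && h->refcnt > 0);
    h->refcnt--;
}
```

<p>Note that reaching a zero count does not remove the entry, so a later
lookup of the same name still succeeds until the handle is explicitly
dropped.</p>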
<br>
<h3>Protocol Data Units</h3>
Implemented in <code>gdp/gdp_pdu.c</code>.<br>
<br>
Packets are marshalled and unmarshalled in this module.
Each packet has the following fields (with the number of octets for the
field):<br>
<ul>
<li>Protocol version (1).</li>
<li>Time to live (in hops) (1).</li>
<li>Reserved for future use (must be zero when sending, ignored on
receive) (1).</li>
<li>Command or Ack/Nak (1).</li>
<li>Destination address (32).</li>
<li>Source address (32).</li>
<li>Request id (4).</li>
<li>Signature algorithm (1).</li>
<li>Signature length (in 32-bit words) (1).</li>
<li>Length of optional header fields (in 32-bit words) (1). This
indicates the number of 32-bit words between the data length and the
data (exclusive).</li>
<li>Flags (1). These indicate the presence of the optional fields.</li>
<li>Length of data portion (4).</li>
<li>Record number (8, optional).</li>
<li>Sequence number (8, optional).</li>
<li>Commit timestamp (16, optional).</li>
<li>Possible future extensions (variable length).</li>
<li>Data (as indicated by the length field).</li>
<li>Signature (as indicated by the signature length field).</li>
</ul>
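<p>The fixed-size fields in the list above add up to 80 octets
(4 + 32 + 32 + 4 + 4 + 4). A rough sketch of marshalling just that fixed
portion is shown here. The struct, the function names, and the choice of
big-endian (network) byte order for multi-octet integers are assumptions
made for illustration; consult <code>gdp/gdp_pdu.c</code> for the actual
encoding.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ADDR_LEN  32
#define TOY_FIXED_HDR 80        /* 4 + 32 + 32 + 4 + 4 + 4 octets */

/* Hypothetical internal form of the fixed header fields. */
typedef struct {
    uint8_t  ver, ttl, cmd;
    uint8_t  dst[TOY_ADDR_LEN], src[TOY_ADDR_LEN];
    uint32_t rid;
    uint8_t  sigalg, siglen, olen, flags;
    uint32_t dlen;
} toy_pdu_hdr_t;

/* Write a 32-bit value in big-endian order (an assumed byte order). */
static uint8_t *put_u32(uint8_t *p, uint32_t v)
{
    *p++ = (uint8_t)(v >> 24);
    *p++ = (uint8_t)(v >> 16);
    *p++ = (uint8_t)(v >> 8);
    *p++ = (uint8_t)v;
    return p;
}

/* Marshal the fixed header into out (at least TOY_FIXED_HDR octets);
 * returns the number of octets written. */
size_t toy_hdr_out(const toy_pdu_hdr_t *h, uint8_t *out)
{
    uint8_t *p = out;

    assert(h != NULL && out != NULL);
    *p++ = h->ver;
    *p++ = h->ttl;
    *p++ = 0;                   /* reserved: must be zero when sending */
    *p++ = h->cmd;
    memcpy(p, h->dst, TOY_ADDR_LEN); p += TOY_ADDR_LEN;
    memcpy(p, h->src, TOY_ADDR_LEN); p += TOY_ADDR_LEN;
    p = put_u32(p, h->rid);
    *p++ = h->sigalg;
    *p++ = h->siglen;
    *p++ = h->olen;
    *p++ = h->flags;
    p = put_u32(p, h->dlen);
    return (size_t)(p - out);
}

/* Total packet size: fixed header, optional header words, data, signature. */
size_t toy_pdu_total_len(const toy_pdu_hdr_t *h)
{
    assert(h != NULL);
    return TOY_FIXED_HDR + 4u * h->olen + h->dlen + 4u * h->siglen;
}
```

<p>Both the optional-header length and the signature length are counted in
32-bit words, so each is multiplied by four when computing the size of the
whole packet.</p>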
<p>One field is used to indicate both commands and acknowledgements/negative
acknowledgements. See the comments in <code>gdp/gdp_pdu.h</code>
for the details of those values. This is worth emphasizing: the
command field in the protocol encodes both imperative commands (e.g.,
"write this data") and responses ("that data was written" or "could not
write that data"). Both forms are described in the code as
"commands", and even share a single dispatch table.<br>
</p>
<p>The routines for handling packets are:</p>
<table border="1" cellpadding="2" cellspacing="2" width="100%">
<tbody>
<tr>
<td valign="top"><code>_gdp_pdu_new</code><br>
</td>
<td valign="top">Allocates a new (empty) packet.<br>
</td>
</tr>
<tr>
<td valign="top"><code>_gdp_pdu_free</code><br>
</td>
<td valign="top">Frees a packet.<br>
</td>
</tr>
<tr>
<td valign="top"><code>_gdp_pdu_out</code><br>
</td>
<td valign="top">Given a packet structure and an output buffer,
converts that packet to external format and writes it to the
buffer. Under normal circumstances this buffer is associated
with an I/O channel, and hence is written to the communication
socket automatically.<br>
</td>
</tr>
<tr>
<td valign="top"><code>_gdp_pdu_in</code><br>
</td>
<td valign="top">Reads a packet from an I/O buffer and converts it to
internal format. This routine can return before the entire packet has
been read, in which case it returns the special status code <code>GDP_STAT_KEEP_READING</code>.
In most cases it should be called in a loop until a successful
status is returned. As with <code>_gdp_pdu_out</code>, the
I/O buffer is normally associated with an I/O channel. See the
discussion of the event loop for more details.<br>
</td>
</tr>
<tr>
<td valign="top"><code>_gdp_pdu_dump</code><br>
</td>
<td valign="top">Prints a packet in a form suitable only for
debugging.<br>
</td>
</tr>
</tbody>
</table>
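<p>The "keep reading" behavior of <code>_gdp_pdu_in</code> can be
illustrated with a simplified completeness check. This is only a sketch: the
names are hypothetical, the byte offsets are derived from the field order
listed earlier, and big-endian byte order for the data length is an
assumption.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FIXED_HDR 80        /* sum of the fixed-size header fields */

/* Hypothetical status values standing in for the library's status codes. */
typedef enum { TOY_READ_OK, TOY_READ_KEEP_READING } toy_read_stat_t;

/* Decode a 32-bit big-endian integer (byte order is an assumption). */
static uint32_t get_u32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decide whether avail octets at buf hold one complete packet.
 * On TOY_READ_OK, *pdu_len is set to the size of that packet. */
toy_read_stat_t toy_pdu_in(const uint8_t *buf, size_t avail, size_t *pdu_len)
{
    size_t need;

    assert(buf != NULL && pdu_len != NULL);
    if (avail < TOY_FIXED_HDR)
        return TOY_READ_KEEP_READING;   /* fixed header not complete yet */

    need = TOY_FIXED_HDR
         + 4u * buf[74]                 /* optional header, 32-bit words */
         + get_u32(buf + 76)            /* data length, octets */
         + 4u * buf[73];                /* signature, 32-bit words */
    if (avail < need)
        return TOY_READ_KEEP_READING;   /* body not complete yet */

    *pdu_len = need;
    return TOY_READ_OK;
}
```

<p>A caller would retry each time more bytes arrive on the channel and would
consume <code>*pdu_len</code> octets from its buffer only once the routine
reports success.</p>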
<br>
<h3>GDP Protocol</h3>
<p>Implemented in <code>gdp/gdp_proto.c</code>.<br>
</p>
<p>The basic model is that users (e.g., the API layer) create a request with
<code>_gdp_req_new</code>. The request contains a command (what operation
needs to be done), an optional PDU buffer, an optional pointer to a GOB
handle on which to perform the operation, the connection on which to
operate, and some flag bits. The PDU buffer in turn contains all the
information to be passed to or from the service, including data, timestamps,
record numbers, etc. For example, on read the PDU will contain the record
number to be read, and the response will fill in the rest of the
information. The client then calls <code>_gdp_invoke</code>,
passing it the request. That routine in turn sends the request using
<code>_gdp_req_send</code>, waits on the condition variable contained in
the request to get the final return status, and returns that.<br>
</p>
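<p>The send-then-wait pattern can be sketched with POSIX threads. This is an
illustration only: the names are hypothetical, the "send" step is replaced
by starting a thread that plays the part of the daemon plus the event
thread, and the status value 205 is arbitrary.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request carrying its own mutex and condition variable. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool done;                  /* set once the response has arrived */
    int stat;                   /* final status of the operation */
} toy_sync_req_t;

void toy_sync_req_init(toy_sync_req_t *r)
{
    pthread_mutex_init(&r->mutex, NULL);
    pthread_cond_init(&r->cond, NULL);
    r->done = false;
    r->stat = -1;
}

/* Called from the event thread when the matching response arrives. */
void toy_sync_req_complete(toy_sync_req_t *r, int stat)
{
    pthread_mutex_lock(&r->mutex);
    r->stat = stat;
    r->done = true;
    pthread_cond_signal(&r->cond);
    pthread_mutex_unlock(&r->mutex);
}

/* Stand-in for the daemon and event thread: "replies" with status 205. */
static void *toy_event_thread(void *arg)
{
    toy_sync_req_complete(arg, 205);
    return NULL;
}

/* Send the request, then block until the event thread signals completion. */
int toy_invoke(toy_sync_req_t *r)
{
    pthread_t tid;
    int stat;

    assert(r != NULL);
    if (pthread_create(&tid, NULL, toy_event_thread, r) != 0)
        return -1;

    pthread_mutex_lock(&r->mutex);
    while (!r->done)            /* loop guards against spurious wakeups */
        pthread_cond_wait(&r->cond, &r->mutex);
    stat = r->stat;
    pthread_mutex_unlock(&r->mutex);

    pthread_join(tid, NULL);
    return stat;
}
```

<p>Keeping the condition variable inside the request means each outstanding
request can be woken independently of any other.</p>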
<p>The sending part, implemented by <code>_gdp_req_send</code>, links the
request to the requesting GOB (so it can be found when the reply
eventually comes in), makes sure that the GOB name is in the associative
name → handle cache, and sends the packet.<br>
</p>
<p>When the reply message eventually comes in, it triggers an event in the
main I/O loop; that is handed to (another) thread for processing.
This is done through the bufferevent interface, part of libevent2, which
invokes callbacks when events happen on sockets (that is, it plays a role
similar to <code>select</code>, or
more accurately <code>kqueue</code> or <code>/dev/poll</code>).
The primary callback used is <code>gdp_read_cb</code>. That routine
allocates a new packet and reads a packet into that area. If the
packet is incomplete (i.e., it hasn't all been read in) the packet is
freed and the callback returns (it will be called again later when more of
the packet is read). If the entire packet is available, it calls <code>_gdp_pdu_process</code>
(indirectly, via the <code>process</code> field in the channel) to
interpret it.<br>
</p>
<p>[The code currently has a non-functional #ifdef for <code>GDP_PDU_QUEUE</code>.
This is for a future extension allowing the packet to be dropped into
another queue for interpretation from a process in the thread pool so that
the read thread can focus entirely on reading and handing off
packets. This technique is already used in gdplogd, and will only be
used in the client library if necessary for performance.]<br>
</p>
<p>PDU processing in <code>_gdp_pdu_process</code>
involves finding the associated GOB, if available, and from that finding
the associated request. If no request is found a new request is
created; this is the case with spontaneous commands. There is some
processing of datums to handle the case where a request had an existing
datum that needs to be replaced with a new one (for example, a read
request that passed in a datum with the record number and returns a datum
with the associated timestamp and data; since the read API passes in the
datum, there is some shuffling necessary with the underlying data buffers
so that the caller can actually access the returned data). The
request (now a response) is then passed to <code>_gdp_req_dispatch</code>
for processing.<br>
</p>
<p>Processing in <code>_gdp_req_dispatch</code> is done through a simple
dispatch table indexed by command. Requests can be either commands
or responses (ack/nak); in most cases, client programs should only receive
responses, and will get a "not implemented" if any commands are
received. Responses fall into 2½ classes: successes (acks),
client naks, and server naks. There are several of each of these,
which roughly correspond to HTTP response codes (2xx, 4xx, and 5xx
respectively) or CoAP codes (2.xx, 4.xx, and 5.xx). These piggyback
on three routines (<code>ack_success</code>, <code>nak_client</code>, and
<code>nak_server</code>), the latter two of which are essentially
identical, doing nothing but passing the error on up the stack. The
<code>ack_success</code> routine checks for some nonsensical situations
and passes the (interpreted) status back up to <code>_gdp_pdu_process</code>.<br>
</p>
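<p>A dispatch table of this kind might look roughly like the sketch below.
The handler names, status values, and table layout are hypothetical; only
the idea of indexing by the one-octet command, with commands 128 to 254
reserved for responses, is taken from this document.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status values; the library uses EP_STAT for this. */
enum {
    TOY_STAT_OK       = 0,
    TOY_STAT_NOT_IMPL = -1,
    TOY_STAT_NAK_C    = -2,
    TOY_STAT_NAK_S    = -3
};

typedef int (*toy_handler_t)(int cmd);

/* Toy handlers: the two nak handlers just pass the error up the stack. */
static int toy_ack_success(int cmd) { (void)cmd; return TOY_STAT_OK; }
static int toy_nak_client(int cmd)  { (void)cmd; return TOY_STAT_NAK_C; }
static int toy_nak_server(int cmd)  { (void)cmd; return TOY_STAT_NAK_S; }

/* One slot per possible value of the one-octet command field. */
static toy_handler_t toy_dispatch[256];

/* Fill in the response ranges; every other slot stays NULL. */
void toy_dispatch_init(void)
{
    int cmd;

    for (cmd = 128; cmd <= 191; cmd++)
        toy_dispatch[cmd] = toy_ack_success;
    for (cmd = 192; cmd <= 223; cmd++)
        toy_dispatch[cmd] = toy_nak_client;
    for (cmd = 224; cmd <= 254; cmd++)
        toy_dispatch[cmd] = toy_nak_server;
}

/* Look up the handler for a command; unknown commands are "not implemented". */
int toy_req_dispatch(int cmd)
{
    toy_handler_t fn;

    assert(cmd >= 0 && cmd < 256);
    fn = toy_dispatch[cmd];
    return fn != NULL ? fn(cmd) : TOY_STAT_NOT_IMPL;
}
```

<p>Because a client leaves the slots for imperative commands empty, any
command it receives falls through to the "not implemented" result.</p>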
<p>The response from the command is then interpreted by <code>_gdp_pdu_process</code>.
There are two cases. The simpler one is when the command/response
was a simple ack/nak, in which case the status is stored in the request
and the thread waiting on that request is poked to wake up. The
other case is when the request is a subscription, in which case this
request (which in this case must be a response) must be turned into an
event. In that case a new event is created and passed off to the event
subsystem for delivery (described elsewhere).<br>
</p>
<h4>Response Confusion</h4>
<p>One major confusion results from the large variety of response codes from
various existing subsystems. One common status encoding is HTTP
status codes and CoAP status codes. These overlap (mostly), so we
can (mostly) treat them as the same thing. They are the primary mode
of passing protocol status in the GDP protocol, but since those responses
are only eight bits the codes are offset: HTTP/CoAP codes 200–263
become commands 128–191, codes 400–431 become commands
192–223, and codes 500–530 become commands 224–254.<br>
</p>
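<p>Assuming each range is a plain linear offset (which is what the numbers
above suggest, but is not confirmed here), the translation between the
one-octet command value and the HTTP/CoAP-style code could be sketched as
follows. The function names are hypothetical.</p>

```c
#include <assert.h>

/* Map a response command (128..254) to its HTTP/CoAP-style code.
 * Returns -1 for values outside the response ranges. */
int toy_cmd_to_code(int cmd)
{
    if (cmd >= 128 && cmd <= 191)
        return 200 + (cmd - 128);       /* acks: 2xx */
    if (cmd >= 192 && cmd <= 223)
        return 400 + (cmd - 192);       /* client naks: 4xx */
    if (cmd >= 224 && cmd <= 254)
        return 500 + (cmd - 224);       /* server naks: 5xx */
    return -1;
}

/* Inverse mapping; returns -1 for codes with no command value. */
int toy_code_to_cmd(int code)
{
    if (code >= 200 && code < 200 + 64)
        return 128 + (code - 200);
    if (code >= 400 && code < 400 + 32)
        return 192 + (code - 400);
    if (code >= 500 && code < 500 + 31)
        return 224 + (code - 500);
    return -1;
}

/* Self-check: every response command survives a round trip. */
void toy_check_roundtrip(void)
{
    int cmd;

    for (cmd = 128; cmd <= 254; cmd++)
        assert(toy_code_to_cmd(toy_cmd_to_code(cmd)) == cmd);
}
```

<p>Under this assumption a "404"-style client error travels on the wire as
command 196 and is restored to 404 once it is turned back into a status
inside the library.</p>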
<p>Internally the GDP library uses the <code>EP_STAT</code> abstraction as
a status lingua franca. <code>EP_STAT</code>s allow encoding of
response codes in a single integer (so they can be passed easily back from
functions). Those integers encode a severity (most importantly,
success or failure), a registry and a module identifier (which for this
purpose can be treated as one piece), and detail information. The
module is used to create broad categories: for example, one module
corresponds to Unix errnos, allowing them to be passed back
directly. Another module is specific to the GDP. There are
several generic status codes defined in <code>gdp/gdp_stat.h</code> such
as <code>GDP_STAT_KEEP_READING</code> (a warning that a partial packet
has been read but the remainder remains to be read) or <code>GDP_STAT_CORRUPT_GOB</code>
(a severe error saying that the disk representation of a GOB is corrupt
and cannot be read). The HTTP/CoAP codes are encoded in the same
module, but back in their original positions, i.e., in the 200–599
range.<br>
</p>
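<p>The idea of packing severity, registry, module, and detail into one
integer can be illustrated as below. The bit widths, severity values, and
all names are assumptions chosen for the example; they are not taken from
the <code>EP_STAT</code> headers.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, high bits to low: 3 severity, 11 registry,
 * 8 module, 10 detail. The real EP_STAT layout may differ. */
typedef uint32_t toy_epstat_t;

enum {
    TOY_SEV_OK     = 0,
    TOY_SEV_WARN   = 4,
    TOY_SEV_ERROR  = 5,
    TOY_SEV_SEVERE = 6
};

/* Pack the four parts into a single integer. */
toy_epstat_t toy_stat_new(unsigned sev, unsigned reg, unsigned mod,
                          unsigned detail)
{
    assert(sev < 8 && reg < 2048 && mod < 256 && detail < 1024);
    return ((uint32_t)sev << 29) | ((uint32_t)reg << 18) |
           ((uint32_t)mod << 10) | (uint32_t)detail;
}

unsigned toy_stat_sev(toy_epstat_t s)      { return s >> 29; }
unsigned toy_stat_registry(toy_epstat_t s) { return (s >> 18) & 0x7FFu; }
unsigned toy_stat_module(toy_epstat_t s)   { return (s >> 10) & 0xFFu; }
unsigned toy_stat_detail(toy_epstat_t s)   { return s & 0x3FFu; }

/* Success test: anything below "error" severity counts as OK here. */
bool toy_stat_isok(toy_epstat_t s)
{
    return toy_stat_sev(s) < TOY_SEV_ERROR;
}
```

<p>With ten bits of detail in this sketch, a code anywhere in the 200–599
range fits directly in the detail field without being offset.</p>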
<h3>Event Processing</h3>
<p>Events (see <code>gdp/gdp_event.c</code>) are a way of delivering
information to a client without using an RPC-style blocking
response. Specifically, a client can issue several commands and then
wait for the responses to come in an arbitrary order. The
implementation is simple: as messages are read that cannot be immediately
processed they are turned into events. Those events are linked onto
an active list. The client can collect events using <code>gdp_event_next</code>,
which takes them off the active queue. The client is responsible for
freeing them using <code>gdp_event_free</code>.<br>
</p>
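<p>The active list can be pictured as a simple FIFO, as in the sketch below.
The names and the event fields are hypothetical, the sketch is
single-threaded, and unlike a real client call it returns NULL immediately
when nothing is queued rather than waiting.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical event: a type tag, a record number, and list linkage. */
typedef struct toy_event {
    int type;
    long recno;
    struct toy_event *next;
} toy_event_t;

static toy_event_t *active_head, *active_tail;

/* Library side: turn an incoming message into an event and queue it. */
int toy_event_post(int type, long recno)
{
    toy_event_t *ev = calloc(1, sizeof *ev);

    if (ev == NULL)
        return -1;
    ev->type = type;
    ev->recno = recno;
    if (active_tail != NULL)
        active_tail->next = ev;
    else
        active_head = ev;
    active_tail = ev;
    return 0;
}

/* Client side: take the oldest event off the active list (NULL if empty). */
toy_event_t *toy_event_next(void)
{
    toy_event_t *ev = active_head;

    if (ev != NULL) {
        active_head = ev->next;
        if (active_head == NULL)
            active_tail = NULL;
        ev->next = NULL;
    }
    return ev;
}

/* Client side: release an event obtained from toy_event_next. */
void toy_event_free(toy_event_t *ev)
{
    assert(ev == NULL || ev->next == NULL);  /* must already be off the list */
    free(ev);
}
```

<p>Ownership passes to the client when the event leaves the list, which is
why the client, not the library, is the one that frees it.</p>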
<h3>Event Loop</h3>
<p class="metanotes">To be written.</p>
</body>
</html>