<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="content-type">
    <title>GDP Library Implementation</title>
    <style type="text/css">
.metanotes {
  font-style: italic;
  font-weight: bold;
  background-color: #ffff66;
  color: red;
}

</style></head>
  <body>
    <h1>Global Data Plane Library Implementation</h1>
    Eric Allman<br>
    2015-02-26<br>
    <br>
    <p class="metanotes">This document is not yet complete.</p>
    <p> </p>
    <p>This document describes the internals of the Global Data Plane (GDP)
      run-time library at a conceptual level.&nbsp; This library is linked into
      any client that wishes to participate in the Global Data Plane.&nbsp; The
      base library is implemented in C, which is what this document will assume,
      but bindings for the external interfaces are available for other
      languages.&nbsp; See the document Global Data Plane Programmatic API for
      details of that interface.<br>
    </p>
    <h2>Overview</h2>
    <p class="metanotes">To be completed.</p>
    <p> Main modules: </p>
    <ul>
      <li>API.</li>
      <li>Request management.</li>
      <li>Datum management.<br>
      </li>
      <li>I/O buffering.</li>
      <li>GOB associative cache.</li>
      <li>Protocol Data Units (PDUs).</li>
      <li>GDP protocol.</li>
      <li>Event loop.</li>
    </ul>
    <p>Generally speaking, the GDP library is structured as an event-driven
      program with a synchronous API.&nbsp; One thread services events (e.g.,
      responses from the GDP daemon) while the main thread executes the user
      application.&nbsp; When the application needs to contact the daemon, it
      sends the message and then waits on a condition variable until
      signaled.&nbsp; In the meantime, the event loop waits for a response,
      associates it with the appropriate GOB handle using the GOB associative
      cache, and then signals the application to collect the results.<br>
    </p>
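    <p>The wait-and-signal handoff described above can be sketched with POSIX
      threads.&nbsp; This is a minimal illustration, not the library's code: the
      type <code>wait_req_t</code> and the functions <code>wait_req_init</code>,
      <code>wait_req_wait</code>, and <code>wait_req_complete</code> are
      hypothetical names invented for this example, and the real request
      structure carries more state.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical request record; the real library's layout may differ. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            done;    /* set by the event thread when a reply arrives */
    int             status;  /* result delivered with the reply */
} wait_req_t;

static void wait_req_init(wait_req_t *req)
{
    pthread_mutex_init(&req->mutex, NULL);
    pthread_cond_init(&req->cond, NULL);
    req->done = false;
    req->status = 0;
}

/* Application thread: block until the event thread marks the request done. */
static int wait_req_wait(wait_req_t *req)
{
    pthread_mutex_lock(&req->mutex);
    while (!req->done)               /* loop guards against spurious wakeups */
        pthread_cond_wait(&req->cond, &req->mutex);
    int status = req->status;
    pthread_mutex_unlock(&req->mutex);
    return status;
}

/* Event thread: record the status and wake the waiting application thread. */
static void wait_req_complete(wait_req_t *req, int status)
{
    pthread_mutex_lock(&req->mutex);
    req->status = status;
    req->done = true;
    pthread_cond_signal(&req->cond);
    pthread_mutex_unlock(&req->mutex);
}
```

    <p>The <code>done</code> flag, rather than the signal itself, is what the
      waiter tests, so a reply that arrives before the application starts
      waiting is not lost.</p>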
    <p>Data is exchanged through a data structure of type <code>gdp_datum_t</code>,
      which contains a record number, a timestamp, and a data buffer.&nbsp; When
      sending data to the GDP daemon, the application creates a datum, fills it
      in with data, and sends it to the daemon.&nbsp; When receiving data from
      the GDP daemon, the application passes a datum into the GDP library that
      will be filled in.&nbsp; When sending data to the daemon, the record
      number and timestamp in the datum are generally ignored; they are replaced
      with the real record number and timestamp after the write completes.<br>
    </p>
    <h2>Important Data Types and Structures</h2>
    <h2>Module Details</h2>
    <h3>API</h3>
    <p>The API module (<code>gdp/gdp_api.c</code>) defines all the externally
      visible routines.&nbsp; Since these are already documented in the GDP
      Programmatic API document, suffice it to say that this module does the
      "translation" between the internal protocol and the external API.&nbsp;
      Roughly speaking, it packages up parameters into a request, invokes
      the request, and translates the updated request into any return codes.<br>
    </p>
    <h3>Request Management</h3>
    <p>Implemented in <code>gdp/gdp_req.c</code>.<br>
    </p>
    <p>Internally the data flow is managed through a series of requests.&nbsp;
      In many cases there will be only one request active on a given GOB at a
      time, but this is not necessarily true, especially in the GDP daemon when
      handling subscriptions (each subscription is a separate request).&nbsp;
      Requests (potentially) have a pointer to a GOB handle, a pointer to a
      protocol data unit (in internal form; essentially a packet), the status
      code from the operation embodied in the request, and special information
      for use when processing subscriptions.<br>
    </p>
    <p>The internal routines are:<br>
    </p>
    <table border="1" cellpadding="2" cellspacing="2" width="100%">
      <tbody>
        <tr>
          <td valign="top"><code>_gdp_req_new</code><br>
          </td>
          <td valign="top">Create a new request and fill it in with a GDP
            protocol command, GOB handle, I/O channel, and flags (passed in as
            parameters) as well as space for a packet (in internal form) and
            request ID.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>_gdp_req_free</code><br>
          </td>
          <td valign="top">Free the request and all associated resources such as
            the space for packet information.&nbsp; It also decrements the
            reference count on the GOB handle indicated in the request.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>_gdp_req_freeall</code><br>
          </td>
          <td valign="top">Free all requests associated with a particular GOB
            handle.&nbsp; GOBs that have pending subscriptions will have one
            request per subscription, which are linked off the GOB handle.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>_gdp_req_find</code><br>
          </td>
          <td valign="top">Given a GOB handle and a request ID, find the
            associated request on the list associated with that GOB
            handle.&nbsp; Note that request IDs need only be unique within a
            particular GOB handle.<br>
          </td>
        </tr>
      </tbody>
    </table>
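    <p>The lookup performed by <code>_gdp_req_find</code> can be pictured as a
      walk over a per-handle list.&nbsp; The sketch below assumes a singly linked
      list; the types <code>list_req_t</code> and <code>sketch_gob_t</code> and
      both functions are hypothetical and exist only to illustrate why request
      IDs need be unique only within one handle.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request node; only the fields needed for lookup are shown. */
typedef struct list_req {
    uint32_t         rid;    /* request ID, unique only within one handle */
    struct list_req *next;
} list_req_t;

/* Hypothetical GOB handle holding the head of its own request list. */
typedef struct {
    list_req_t *reqs;
} sketch_gob_t;

/* Link a request onto the front of the handle's list. */
static void sketch_req_link(sketch_gob_t *gob, list_req_t *req)
{
    req->next = gob->reqs;
    gob->reqs = req;
}

/* Search only this handle's list, so equal IDs on other handles never clash. */
static list_req_t *sketch_req_find(const sketch_gob_t *gob, uint32_t rid)
{
    for (list_req_t *r = gob->reqs; r != NULL; r = r->next)
        if (r->rid == rid)
            return r;
    return NULL;
}
```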
    <p><br>
    </p>
    <h3>Datums</h3>
    <p>Implemented in <code>gdp/gdp_datum.c</code>.<br>
    </p>
    <p>As described above, a datum is the internal version of a GOB
      record.&nbsp; The routines, which are externally visible, are:<br>
    </p>
    <table border="1" cellpadding="2" cellspacing="2" width="100%">
      <tbody>
        <tr>
          <td valign="top"><code>gdp_datum_new</code><br>
          </td>
          <td valign="top">Create a new empty datum, including its associated
            (empty) buffer.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>gdp_datum_free</code><br>
          </td>
          <td valign="top">Free the datum, including its associated data
            buffer.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>gdp_datum_getrecno</code><br>
          </td>
          <td valign="top">Get the record number.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>gdp_datum_getts</code><br>
          </td>
          <td valign="top">Get the timestamp.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>gdp_datum_getdlen</code><br>
          </td>
          <td valign="top">Get the length of the data buffer.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>gdp_datum_getbuf</code><br>
          </td>
          <td valign="top">Get the data buffer itself.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>gdp_datum_print</code><br>
          </td>
          <td valign="top">Print a datum in a format suitable for debugging use.<br>
          </td>
        </tr>
      </tbody>
    </table>
    <br>
    When sending data to the GDP, the application has to create a datum, get the
    buffer from that datum, and add the data to the buffer.&nbsp; When receiving
    data from the GDP, the application creates a datum, passes it to the
    appropriate read API, and on return can access the data buffer containing
    the returned data, the record number, and the timestamp.<br>
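    <p>To make the create, fill, and free life cycle concrete, here is a
      stand-alone sketch of a datum-like structure.&nbsp; It does not use the
      real <code>gdp_datum_*</code> routines or the library's buffer type:
      <code>sketch_datum_t</code> and its functions are hypothetical, and a plain
      growable byte array stands in for the data buffer.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for a datum: record number, timestamp, data buffer. */
typedef struct {
    int64_t         recno;   /* placeholder until the daemon assigns one */
    struct timespec ts;      /* placeholder until the write completes */
    unsigned char  *data;
    size_t          dlen;
} sketch_datum_t;

/* Create an empty datum with an empty buffer. */
static sketch_datum_t *sketch_datum_new(void)
{
    sketch_datum_t *d = calloc(1, sizeof *d);
    if (d != NULL)
        d->recno = -1;       /* assumed "not yet assigned" marker */
    return d;
}

/* Append bytes to the datum's buffer; returns 0 on success, -1 on failure. */
static int sketch_datum_append(sketch_datum_t *d, const void *src, size_t len)
{
    if (len == 0)
        return 0;
    unsigned char *grown = realloc(d->data, d->dlen + len);
    if (grown == NULL)
        return -1;
    memcpy(grown + d->dlen, src, len);
    d->data = grown;
    d->dlen += len;
    return 0;
}

/* Free the datum together with its buffer. */
static void sketch_datum_free(sketch_datum_t *d)
{
    if (d == NULL)
        return;
    free(d->data);
    free(d);
}
```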
    <h3>GOB Associative Cache</h3>
    <p>Implemented in <code>gdp/gdp_gob_cache.c</code>.<br>
    </p>
    <p>The primary purpose of the GOB Associative Cache is to allow quick
      association between a GOB name and the associated handle.&nbsp; When a
      packet is received that contains a GOB name, the cache delivers the handle
      that contains the necessary state information.<br>
    </p>
    <table border="1" cellpadding="2" cellspacing="2" width="100%">
      <tbody>
        <tr>
          <td valign="top"><code>_gdp_gob_cache_init</code><br>
          </td>
          <td valign="top">Initializes the GOB cache.&nbsp; Called only once on
            startup.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>_gdp_gob_cache_get</code><br>
          </td>
          <td valign="top">Extracts the GOB handle from the cache based on name
            and I/O mode.&nbsp; If it is found the reference count on the GOB
            handle is incremented; if not, it returns NULL.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>_gdp_gob_cache_add</code><br>
          </td>
          <td valign="top">Adds the GOB handle to the cache.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>_gdp_gob_cache_drop</code><br>
          </td>
          <td valign="top">Removes the GOB name &rarr; handle association from
            the cache.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>_gdp_gob_incref</code><br>
          </td>
          <td valign="top">Increments the reference count on the GOB handle.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>_gdp_gob_decref</code><br>
          </td>
          <td valign="top">Decrements the reference count on the GOB
            handle.&nbsp; If the reference count reaches zero the handle becomes
            a candidate for cleanup, but this is deferred because, in the common
            case, another request for this GOB will appear shortly.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>_gdp_gob_newhandle</code><br>
          </td>
          <td valign="top">Creates a new GOB handle.&nbsp; Note that this
            creates only the library's data structure; no protocol message is
            sent to the GDP daemon.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>_gdp_gob_freehandle</code><br>
          </td>
          <td valign="top">Does the actual deallocation of the handle.&nbsp;
            Removes the GOB from the cache (by calling <code>_gdp_gob_cache_drop</code>).&nbsp;
            If the GOB includes a free function, that function is called (this
            is used by the GDP daemon).&nbsp; It then frees the memory allocated
            to the handle itself.<br>
          </td>
        </tr>
      </tbody>
    </table>
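    <p>The deferred cleanup described for <code>_gdp_gob_decref</code> can be
      illustrated as follows.&nbsp; This is a single-threaded sketch under
      assumed names (<code>sketch_handle_t</code>, <code>sketch_incref</code>,
      <code>sketch_decref</code>, <code>sketch_sweep</code>); the source does
      not say how or when the real library reclaims idle handles, and real code
      would also need locking.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handle: a reference count plus an "idle" marker. */
typedef struct {
    int  refcnt;
    bool reclaimable;   /* true while refcnt is zero and cleanup is pending */
} sketch_handle_t;

/* Taking a reference rescues a handle that was waiting for cleanup. */
static void sketch_incref(sketch_handle_t *h)
{
    h->refcnt++;
    h->reclaimable = false;
}

/* Dropping the last reference only marks the handle; nothing is freed here. */
static void sketch_decref(sketch_handle_t *h)
{
    assert(h->refcnt > 0);
    if (--h->refcnt == 0)
        h->reclaimable = true;
}

/* Later sweep: drop handles that are still idle; returns how many were dropped. */
static size_t sketch_sweep(sketch_handle_t *handles[], size_t n)
{
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++) {
        if (handles[i] != NULL && handles[i]->reclaimable) {
            handles[i] = NULL;   /* a real sweep would also free the handle */
            dropped++;
        }
    }
    return dropped;
}
```

    <p>Because the handle survives a zero count until the sweep runs, a request
      that arrives shortly afterwards can reuse it without rebuilding its
      state.</p>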
    <br>
    <h3>Protocol Data Units</h3>
    Implemented in <code>gdp/gdp_pdu.c</code>.<br>
    <br>
    Packets are marshalled and demarshalled in <code>gdp/gdp_pdu.c</code>.&nbsp;
    Each packet has the following fields (with the number of octets for the
    field):<br>
    <ul>
      <li>Protocol version (1).</li>
      <li>Time to live (in hops) (1).</li>
      <li>Reserved for future use (must be zero when sending, ignored on
        receive) (1).</li>
      <li>Command or Ack/Nak (1).</li>
      <li>Destination address (32).</li>
      <li>Source address (32).</li>
      <li>Request ID (4).</li>
      <li>Signature algorithm (1).</li>
      <li>Signature length (in 32-bit words) (1).</li>
      <li>Length of optional header fields (in 32-bit words) (1).&nbsp; This
        indicates the number of 32-bit words between the data length and the
        data (exclusive).</li>
      <li>Flags (1).&nbsp; These indicate the presence of the optional fields.</li>
      <li>Length of data portion (4).</li>
      <li>Record number (8, optional).</li>
      <li>Sequence number (8, optional).</li>
      <li>Commit timestamp (16, optional).</li>
      <li>Possible future extensions (variable length).</li>
      <li>Data (as indicated by the length field).</li>
      <li>Signature (as indicated by the signature length field).</li>
    </ul>
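    <p>Adding up the mandatory fields in the list above gives a fixed header of
      80 octets (4 + 32 + 32 + 4 + 4 + 4).&nbsp; The sketch below packs such a
      header into a byte array.&nbsp; The structure <code>sketch_hdr_t</code>
      and the function names are hypothetical, and big-endian (network) byte
      order for the multi-octet fields is an assumption that the field list does
      not state.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SKETCH_ADDR_LEN = 32, SKETCH_FIXED_HDR_LEN = 80 };

/* Hypothetical in-memory form of the mandatory header fields. */
typedef struct {
    uint8_t  ver, ttl, cmd;
    uint8_t  dst[SKETCH_ADDR_LEN];
    uint8_t  src[SKETCH_ADDR_LEN];
    uint32_t rid;
    uint8_t  sigalg, siglen, olen, flags;
    uint32_t dlen;
} sketch_hdr_t;

/* Write a 32-bit value in big-endian order (assumed wire order). */
static uint8_t *sketch_put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return p + 4;
}

/* Pack the fixed header in the listed field order; returns octets written. */
static size_t sketch_hdr_pack(const sketch_hdr_t *h, uint8_t *out)
{
    uint8_t *p = out;
    *p++ = h->ver;
    *p++ = h->ttl;
    *p++ = 0;                       /* reserved: must be zero when sending */
    *p++ = h->cmd;
    memcpy(p, h->dst, SKETCH_ADDR_LEN); p += SKETCH_ADDR_LEN;
    memcpy(p, h->src, SKETCH_ADDR_LEN); p += SKETCH_ADDR_LEN;
    p = sketch_put_u32(p, h->rid);
    *p++ = h->sigalg;
    *p++ = h->siglen;               /* in 32-bit words */
    *p++ = h->olen;                 /* in 32-bit words */
    *p++ = h->flags;
    p = sketch_put_u32(p, h->dlen);
    return (size_t)(p - out);
}
```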
    <p>One field is used to indicate both commands and acknowledgements/negative
      acknowledgements.&nbsp; See the comments in <code>gdp/gdp_pdu.h</code>
      for the details of those values.&nbsp; This is worth emphasizing: the
      command field in the protocol encodes both imperative commands (e.g.,
      "write this data") and responses ("that data was written" or "could not
      write that data").&nbsp; Both forms are described in the code as
      "commands", and even share a single dispatch table.<br>
    </p>
    <p>The routines for handling packets are:</p>
    <table border="1" cellpadding="2" cellspacing="2" width="100%">
      <tbody>
        <tr>
          <td valign="top"><code>_gdp_pdu_new</code><br>
          </td>
          <td valign="top">Allocates a new (empty) packet.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>_gdp_pdu_free</code><br>
          </td>
          <td valign="top">Frees a packet.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>_gdp_pdu_out</code><br>
          </td>
          <td valign="top">Given a packet structure and an output buffer,
            converts that packet to external format and writes it to the
            buffer.&nbsp; Under normal circumstances this buffer is associated
            with an I/O channel, and hence is written to the communication
            socket automatically.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>_gdp_pdu_in</code><br>
          </td>
          <td valign="top">Reads a packet from an I/O buffer and converts it to
            internal format.&nbsp; This routine may return before the entire
            packet has been read, in which case it returns the special status
            code <code>GDP_STAT_KEEP_READING</code>.&nbsp;
            In most cases it should be called in a loop until a successful
            status is returned.&nbsp; As with <code>_gdp_pdu_out</code>, the
            I/O buffer is normally associated with an I/O channel.&nbsp; See the
            discussion of the event loop for more details.<br>
          </td>
        </tr>
        <tr>
          <td valign="top"><code>_gdp_pdu_dump</code><br>
          </td>
          <td valign="top">Prints a packet in a form suitable only for
            debugging.<br>
          </td>
        </tr>
      </tbody>
    </table>
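    <p>The "keep reading" behaviour of <code>_gdp_pdu_in</code> can be sketched
      as a completeness check on the bytes buffered so far.&nbsp; Everything
      here is an assumption made for illustration: the names
      <code>sketch_pdu_in</code>, <code>SK_OK</code>, and
      <code>SK_KEEP_READING</code>, the 80-octet fixed header with the offsets
      implied by the field list, and big-endian byte order for the data
      length.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SK_FIXED_HDR_LEN = 80 };

/* Hypothetical status values standing in for the library's status codes. */
typedef enum { SK_OK, SK_KEEP_READING } sk_stat_t;

/*
 * Decide whether `avail` buffered octets hold one complete packet.
 * Assumed offsets: signature length at 73 and optional-header length at 74
 * (both in 32-bit words), data length at 76..79 (big-endian).
 */
static sk_stat_t sketch_pdu_in(const uint8_t *buf, size_t avail, size_t *pdu_len)
{
    if (avail < SK_FIXED_HDR_LEN)
        return SK_KEEP_READING;      /* header itself is still incomplete */

    size_t siglen = 4u * buf[73];
    size_t olen   = 4u * buf[74];
    size_t dlen   = ((size_t)buf[76] << 24) | ((size_t)buf[77] << 16)
                  | ((size_t)buf[78] << 8)  |  (size_t)buf[79];
    size_t total  = SK_FIXED_HDR_LEN + olen + dlen + siglen;

    if (avail < total)
        return SK_KEEP_READING;      /* caller retries when more data arrives */

    *pdu_len = total;
    return SK_OK;
}
```

    <p>A caller would invoke this again each time more data arrives, which
      matches the advice to call the real routine in a loop until it
      succeeds.</p>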
    <br>
    <h3>GDP Protocol</h3>
    <p>Implemented in <code>gdp/gdp_proto.c</code>.<br>
    </p>
    <p>The basic model is that users (e.g., the API layer) create a request with
      <code>_gdp_req_new</code> which contains a command (what operation needs
      to be done), an optional PDU buffer, an optional pointer to a GOB handle
      on which to perform the operation, the connection on which to operate, and
      some flag bits.&nbsp; The PDU buffer in turn contains all the information
      to be passed to or from the service, including data, timestamps, record
      numbers, etc.&nbsp; For example, on read the PDU will contain the record
      number to be read, and the response will fill in the rest of the
      information.&nbsp; The client then calls <code>_gdp_invoke</code>,
      passing it the request.&nbsp; That routine in turn sends the request using
      <code>_gdp_req_send</code>, waits on the condition variable contained in
      the request to get the final return status, and returns that.<br>
    </p>
    <p>The sending part, implemented by <code>_gdp_req_send</code>, links the
      request to the requesting GOB (so it can be found when the reply
      eventually comes in), makes sure that the GOB name is in the associative
      name &rarr; handle cache, and sends the packet.<br>
    </p>
    <p>When the reply message eventually comes in, it triggers an event in the
      main I/O loop; that is handed to (another) thread for processing.&nbsp;
      This is done through the bufferevent interface, part of libevent2, which
      invokes callbacks when events happen on sockets (that is, it can be
      considered as having a similar functionality to <code>select</code>, or
      more accurately <code>kqueue</code> or <code>/dev/poll</code>).&nbsp;
      The primary callback used is <code>gdp_read_cb</code>.&nbsp; That routine
      allocates a new packet and reads the incoming data into it.&nbsp; If the
      packet is incomplete (i.e., it hasn't all been read in) the packet is
      freed and the callback returns (it will be called again later when more of
      the packet is read).&nbsp; If the entire packet is available, it calls <code>_gdp_pdu_process</code>
      (indirectly, via the <code>process</code> field in the channel) to
      interpret it.<br>
    </p>
    <p>[The code currently has a non-functional #ifdef for <code>GDP_PDU_QUEUE</code>.&nbsp;
      This is for a future extension allowing the packet to be dropped into
      another queue for interpretation by a thread in the thread pool so that
      the read thread can focus entirely on reading and handing off
      packets.&nbsp; This technique is already used in gdplogd, and will only be
      used in the client library if necessary for performance.]<br>
    </p>
    <p>PDU processing in <code>_gdp_pdu_process</code>
      involves finding the associated GOB, if available, and from that finding
      the associated request.&nbsp; If no request is found a new request is
      created; this is the case with spontaneous commands.&nbsp; There is some
      processing of datums to handle the case where a request had an existing
      datum that needs to be replaced with a new one (for example, a read
      request that passed in a datum with the record number and returns a datum
      with the associated timestamp and data; since the read API passes in the
      datum, there is some shuffling necessary with the underlying data buffers
      so that the caller can actually access the returned data).&nbsp; The
      request (now a response) is then passed to <code>_gdp_req_dispatch</code>
      for processing.<br>
    </p>
    <p>Processing in <code>_gdp_req_dispatch</code> is done through a simple
      dispatch table indexed by command.&nbsp; Requests can be either commands
      or responses (ack/nak); in most cases, client programs should only receive
      responses, and will get a "not implemented" if any commands are
      received.&nbsp; Responses fall into 2&frac12; classes: successes (acks),
      client naks, and server naks.&nbsp; There are several of each of these,
      which roughly correspond to HTTP response codes (2xx, 4xx, and 5xx
      respectively) or CoAP codes (2.xx, 4.xx, and 5.xx).&nbsp; These piggyback
      on three routines (<code>ack_success</code>, <code>nak_client</code>, and
      <code>nak_server</code>), the latter two of which are essentially
      identical, doing nothing but passing the error on up the stack.&nbsp; The
      <code>ack_success</code> routine checks for some nonsensical situations
      and passes the (interpreted) status back up to <code>_gdp_pdu_process</code>.<br>
    </p>
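    <p>A dispatch table indexed by the one-octet command might look like the
      sketch below.&nbsp; The handler and status names are hypothetical, the
      handlers are reduced to returning a class code, and the range boundaries
      (128&ndash;191, 192&ndash;223, 224&ndash;254) are taken from the
      command ranges quoted in the Response Confusion section.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result classes returned by the handlers. */
enum { SK_ST_OK, SK_ST_CLIENT_ERR, SK_ST_SERVER_ERR, SK_ST_NOT_IMPL };

typedef int (*sk_handler_t)(uint8_t cmd);

static int sk_ack_success(uint8_t cmd) { (void)cmd; return SK_ST_OK; }
static int sk_nak_client(uint8_t cmd)  { (void)cmd; return SK_ST_CLIENT_ERR; }
static int sk_nak_server(uint8_t cmd)  { (void)cmd; return SK_ST_SERVER_ERR; }

/* One slot per possible command octet; empty slots mean "not implemented". */
static sk_handler_t sk_dispatch_table[256];

/* Fill in only the response ranges, as a client would. */
static void sk_dispatch_init(void)
{
    for (int c = 128; c <= 191; c++)
        sk_dispatch_table[c] = sk_ack_success;
    for (int c = 192; c <= 223; c++)
        sk_dispatch_table[c] = sk_nak_client;
    for (int c = 224; c <= 254; c++)
        sk_dispatch_table[c] = sk_nak_server;
}

/* Look up the handler by command and call it. */
static int sk_dispatch(uint8_t cmd)
{
    sk_handler_t handler = sk_dispatch_table[cmd];
    return handler != NULL ? handler(cmd) : SK_ST_NOT_IMPL;
}
```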
    <p>The response from the command is then interpreted by <code>_gdp_pdu_process</code>.&nbsp;
      There are two cases.&nbsp; The simpler one is when the command/response
      was a simple ack/nak, in which case the status is stored in the request
      and the thread waiting on that request is poked to wake up.&nbsp; The
      other case is when the request is a subscription, in which case this
      request (which in this case must be a response) must be turned into an
      event.&nbsp; In that case, a new event is created and passed off to the
      event subsystem for delivery (described elsewhere).<br>
    </p>
    <h4>Response Confusion</h4>
    <p>One major source of confusion is the large variety of response codes from
      various existing subsystems.&nbsp; Two common status encodings are HTTP
      status codes and CoAP status codes.&nbsp; These overlap (mostly), so we
      can (mostly) treat them as the same thing.&nbsp; They are the primary
      means of passing protocol status in the GDP protocol, but since the
      command field that carries them is only eight bits wide, the codes are
      offset: HTTP/CoAP codes 200&ndash;263 become commands 128&ndash;191,
      codes 400&ndash;431 become commands 192&ndash;223, and codes
      500&ndash;530 become commands 224&ndash;254.<br>
    </p>
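    <p>The offsetting can be written as a pair of small conversion functions.&nbsp;
      This sketch assumes each range is shifted by a constant (200 maps to 128,
      400 to 192, 500 to 224) and takes its upper bounds from the sizes of the
      command ranges; the function names are hypothetical.</p>

```c
#include <assert.h>

/* Map an HTTP/CoAP-style code to an 8-bit command value, or -1 if unmappable. */
static int sk_code_to_cmd(int code)
{
    if (code >= 200 && code <= 263)
        return code - 200 + 128;     /* acks:        128..191 */
    if (code >= 400 && code <= 431)
        return code - 400 + 192;     /* client naks: 192..223 */
    if (code >= 500 && code <= 530)
        return code - 500 + 224;     /* server naks: 224..254 */
    return -1;
}

/* Inverse mapping: recover the original code from a command value. */
static int sk_cmd_to_code(int cmd)
{
    if (cmd >= 128 && cmd <= 191)
        return cmd - 128 + 200;
    if (cmd >= 192 && cmd <= 223)
        return cmd - 192 + 400;
    if (cmd >= 224 && cmd <= 254)
        return cmd - 224 + 500;
    return -1;
}
```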
    <p>Internally the GDP library uses the <code>EP_STAT</code> abstraction as
      a status lingua franca.&nbsp; <code>EP_STAT</code>s allow encoding of
      response codes in a single integer (so they can be passed easily back from
      functions).&nbsp; Those integers encode a severity (most importantly,
      success or failure), a registry and a module identifier (which for this
      purpose can be treated as one piece), and detail information.&nbsp; The
      module is used to create broad categories: for example, one module
      corresponds to Unix errnos, allowing them to be passed back
      directly.&nbsp; Another module is specific to the GDP.&nbsp; There are
      several generic status codes defined in <code>gdp/gdp_stat.h</code> such
      as <code>GDP_STAT_KEEP_READING</code> (a warning that a partial packet
      has been read but the remainder remains to be read) or <code>GDP_STAT_CORRUPT_GOB</code>
      (a severe error saying that the disk representation of a GOB is corrupt
      and cannot be read).&nbsp; The HTTP/CoAP codes are encoded in the same
      module, but back in their original positions, i.e., in the 200&ndash;599
      range.<br>
    </p>
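    <p>To show how a severity, a registry, a module, and a detail value can
      share one integer, here is a bit-packing sketch.&nbsp; It is not the
      <code>EP_STAT</code> implementation: the field widths (3, 11, 8, and 10
      bits), the severity values, and all names below are assumptions chosen
      only so that the four fields fill 32 bits and the detail field can hold
      codes up to 599.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packed status word: [sev:3][registry:11][module:8][detail:10]. */
typedef uint32_t sk_status_t;

/* Hypothetical severity levels; values at or above SK_SEV_ERROR are failures. */
enum { SK_SEV_OK = 0, SK_SEV_WARN = 1, SK_SEV_ERROR = 2, SK_SEV_SEVERE = 3 };

/* Pack the four fields into one integer, masking each to its width. */
static sk_status_t sk_status_new(uint32_t sev, uint32_t registry,
                                 uint32_t module, uint32_t detail)
{
    return ((sev      & 0x7u)   << 29)
         | ((registry & 0x7FFu) << 18)
         | ((module   & 0xFFu)  << 10)
         |  (detail   & 0x3FFu);
}

static uint32_t sk_status_severity(sk_status_t s) { return s >> 29; }
static uint32_t sk_status_module(sk_status_t s)   { return (s >> 10) & 0xFFu; }
static uint32_t sk_status_detail(sk_status_t s)   { return s & 0x3FFu; }

/* Success test based on severity alone, ignoring module and detail. */
static bool sk_status_isok(sk_status_t s)
{
    return sk_status_severity(s) < SK_SEV_ERROR;
}
```

    <p>Because the success test looks only at the severity bits, callers can
      check for failure without knowing which module produced the status.</p>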
    <h3>Event Processing</h3>
    <p>Events (see <code>gdp/gdp_event.c</code>) are a way of delivering
      information to a client without using an RPC-style blocking
      response.&nbsp; Specifically, a client can issue several commands and then
      wait for the responses to come in an arbitrary order.&nbsp; The
      implementation is simple: as messages are read that cannot be immediately
      processed they are turned into events.&nbsp; Those events are linked onto
      an active list.&nbsp; The client can collect events using <code>gdp_event_next</code>,
      which takes them off the active list.&nbsp; The client is responsible for
      freeing them using <code>gdp_event_free</code>.<br>
    </p>
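    <p>The active list can be pictured as a simple first-in, first-out queue.&nbsp;
      The sketch below is single-threaded and uses hypothetical names
      (<code>sk_event_t</code>, <code>sk_event_post</code>,
      <code>sk_event_next</code>, <code>sk_event_free</code>); it returns
      <code>NULL</code> when the list is empty, which is an assumption and not
      a statement about how <code>gdp_event_next</code> behaves.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical event node carrying only a type tag. */
typedef struct sk_event {
    int              type;
    struct sk_event *next;
} sk_event_t;

/* Hypothetical active list with head and tail pointers for FIFO order. */
typedef struct {
    sk_event_t *head;
    sk_event_t *tail;
} sk_event_list_t;

/* Reader side: create an event and link it at the tail of the active list. */
static sk_event_t *sk_event_post(sk_event_list_t *q, int type)
{
    sk_event_t *ev = calloc(1, sizeof *ev);
    if (ev == NULL)
        return NULL;
    ev->type = type;
    if (q->tail != NULL)
        q->tail->next = ev;
    else
        q->head = ev;
    q->tail = ev;
    return ev;
}

/* Client side: unlink and return the oldest event, or NULL if none is queued. */
static sk_event_t *sk_event_next(sk_event_list_t *q)
{
    sk_event_t *ev = q->head;
    if (ev != NULL) {
        q->head = ev->next;
        if (q->head == NULL)
            q->tail = NULL;
        ev->next = NULL;
    }
    return ev;
}

/* Client side: the caller owns a collected event and must release it. */
static void sk_event_free(sk_event_t *ev)
{
    free(ev);
}
```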
    <h3>Event Loop</h3>
    <p class="metanotes">To be written.</p>
  </body>
</html>