source: trunk/SRC/Matrix/cmset_op.pro @ 163

Last change on this file since 163 was 163, checked in by navarro, 18 years ago

header improvements : type of parameters and keywords, default values, spell checking + idldoc assistant (IDL online_help)

  • Property svn:keywords set to Id
File size: 16.0 KB
Line 
1;+
2; @hidden
3;
4; @file_comments
5; Simplified version of CMSET_OP_UNIQ which sorts, and takes the
6; "first" value, whatever that may mean.
7;
8; @todo seb
9;-
10;
11function cmset_op_uniq, a
12;
13  compile_opt idl2, strictarrsubs
14;
15  if n_elements(a) LE 1 then return, 0L
16
17  ii = sort(a) & b = a[ii]
18  wh = where(b NE shift(b, +1L), ct)
19  if ct GT 0 then return, ii[wh]
20
21  return, 0L
22;
23end
24;+
25;
26; @file_comments
27; Performs an AND, OR, or XOR operation between two sets
28;
29; Description: SET_OP performs three common operations between two sets. The
30; three supported functions of OP are:
31;
32;        OP          Meaning
33;      'AND' - to find the intersection of A and B;
34;      'OR'  - to find the union of A and B;
35;      'XOR' - to find the those elements who are members of A or B
36;              but not both;
37;
38;   Sets as defined here are one dimensional arrays composed of
39;   numeric or string types. Comparisons of equality between elements
40;   are done using the IDL EQ operator.
41;
42;   The complements of either set can be taken as well, by using the
43;   NOT1 and NOT2 keywords. For example, it may be desirable to find
44;   the elements in A but not B, or B but not A (they are different!).
45;   The following IDL expressions achieve each of those effects:
46;
47;      SET = CMSET_OP(A, 'AND', /NOT2, B)   ; A but not B
48;      SET = CMSET_OP(/NOT1, A, 'AND', B)   ; B but not A
49;
50;   Note the distinction between NOT1 and NOT2.  NOT1 refers to the
51;   first set (A) and NOT2 refers to the second (B).  Their ordered
52;   placement in the calling sequence is entirely optional, but the
53;   above ordering makes the logical meaning explicit.
54;
55;   NOT1 and NOT2 can only be set for the 'AND' operator, and never
56;   simultaneously. This is because the results of an operation with
57;   'OR' or 'XOR' and any combination of NOTs -- or with 'AND' and
58;   both NOTs -- formally cannot produce a defined result.
59;
60;   The implementation depends on the type of operands. For integer
61;   types, a fast technique using HISTOGRAM is used. However, this
62;   algorithm becomes inefficient when the dynamic range in the data
63;   is large. For those cases, and for other data types, a technique
64;   based on SORT() is used. Thus the compute time should scale
65;   roughly as (A+B)*ALOG(A+B) or better, rather than (A*B) for the
66;   brute force approach. For large arrays this is a significant
67;   benefit.
68;
69; @categories
70; Array
71;
72; @param A {in}{required}
73; The two sets to be operated on. A one dimensional array of
74; either numeric or string type. A and B must be of the same
75; type. Empty sets are permitted, and are either represented
76; as an undefined variable, or by setting EMPTY1 or EMPTY2.
77;
78; @param B {in}{required}
79; See A
80;
81; @param OP0 {in}{required}{type=string}
82; a string, the operation to be performed. Must be one of
83; 'AND', 'OR' or 'XOR' (lower or mixed case is permitted).
84; Other operations will cause an error message to be produced.
85;
86; @keyword NOT1
87; If set and OP is 'AND', then the complement of A (for
88; NOT1) or B (for NOT2) will be used in the operation.
89; NOT1 and NOT2 cannot be set simultaneously.
90;
91; @keyword NOT2
92; See NOT1
93;
94; @keyword EMPTY1
95; If set, then A (for EMPTY1) or B (for EMPTY2) are
96; assumed to be the empty set. The actual values
97; passed as A or B are then ignored.
98;
99; @keyword EMPTY2
100; See EMPTY1
101;
102; @keyword INDEX
103; if set, then return a list of indexes instead of the array
104; values themselves. The "slower" set operations are always
105; performed in this case.
106;
107; The indexes refer to the *combined* array [A,B]. To
108; clarify, in the following call: I = CMSET_OP(..., /INDEX);
109; returned values from 0 to NA-1 refer to A[I], and values
110; from NA to NA+NB-1 refer to B[I-NA].
111;
112; @keyword COUNT
113; upon return, the number of elements in the result set.
114; This is only important when the result set is the empty
115; set, in which case COUNT is set to zero.
116;
117; @returns
118; The resulting set as a one-dimensional array. The set may be
119; represented by either an array of data values (default), or an
120; array of indexes (if INDEX is set). Duplicate elements, if any,
121; are removed, and element order may not be preserved.
122;
123; The empty set is represented as a return value of -1L, and COUNT
124; is set to zero. Note that the only way to recognize the empty set
125; is to examine COUNT.
126;
127; SEE ALSO:
128;
129;  SET_UTILS.PRO by RSI
130;
131; @history Written, CM, 23 Feb 2000
132;   Added empty set capability, CM, 25 Feb 2000
133;   Documentation clarification, CM 02 Mar 2000
134;   Incompatible but more consistent reworking of EMPTY keywords, CM,
135;     04 Mar 2000
136;   Minor documentation clarifications, CM, 26 Mar 2000
137;   Corrected bug in empty_arg special case, CM 06 Apr 2000
138;   Add INDEX keyword, CM 31 Jul 2000
139;   Clarify INDEX keyword documentation, CM 06 Sep 2000
140;   Made INDEX keyword always force SLOW_SET_OP, CM 06 Sep 2000
141;   Added CMSET_OP_UNIQ, and ability to select FIRST_UNIQUE or
142;     LAST_UNIQUE values, CM, 18 Sep 2000
143;   Removed FIRST_UNIQUE and LAST_UNIQUE, and streamlined
144;     CMSET_OP_UNIQ until problems with SORT can be understood, CM, 20
145;     Sep 2000 (thanks to Ben Tupper)
146;   Still trying to get documentation of INDEX and NOT right, CM, 28
147;     Sep 2000 (no code changes)
148;   Correct bug for AND case, when input sets A and B each only have
149;     one unique value, and the values are equal.  CM, 04 Mar 2004
150;     (thanks to James B. jbattat at cfa dot harvard dot edu)
151;   Add support for the cases where the input data types are mixed,
152;      but still compatible; also, attempt to return the same data
153;      type that was passed in; CM, 05 Feb 2005
154;   Fix bug in type checking (thanks to "marit"), CM, 10 Dec 2005
155;   Work around a stupidity in the built-in IDL HISTOGRAM routine,
156;      which tries to "help" you by restricting the MIN/MAX to the
157;      range of the input variable (thanks to Will Maddox), CM, 16 Jan 2006
158;
159;   Author: Craig B. Markwardt, NASA/GSFC Code 662, Greenbelt, MD 20770
160;   craigm\@lheamail.gsfc.nasa.gov
161;
162; @version $Id$
163;
164; @examples
165; Utility function, similar to UNIQ, but allowing choice of taking
166; first or last unique element, or non-unique elements.
167; Unfortunately this doesn't work because of implementation dependent
168; versions of the SORT() function.
169;
170; function cmset_op_uniq, a, first=first, non=non, count=ct, sort=sortit
171;   if n_elements(a) LE 1 then return, 0L
172;   sh = (2L*keyword_set(first)-1L)*(-2L*keyword_set(non)+1)
173;
174;   if keyword_set(sortit) then begin
175;       ;; Sort it manually
176;       ii = sort(a) & b = a[ii]
177;       if keyword_set(non) then wh = where(b EQ shift(b, sh), ct) $
178;       else                     wh = where(b NE shift(b, sh), ct)
179;       if ct GT 0 then return, ii[wh]
180;   endif else begin
181;       ;; Use the user's values directly
182;       if keyword_set(non) then wh = where(a EQ shift(a, sh), ct) $
183;       else                     wh = where(a NE shift(a, sh), ct)
184;       if ct GT 0 then return, wh
185;   endelse
186;
187;   if keyword_set(first) then return, 0L else return, n_elements(a)-1
188; end
189
190; Simplified version of CMSET_OP_UNIQ which sorts, and takes the
191; "first" value, whatever that may mean.
192;
193;-
194
195
196function cmset_op, a, op0, b, not1=not1, not2=not2, count=count, $
197              empty1=empty1, empty2=empty2, maxarray=ma, index=index
198;
199  compile_opt idl2, strictarrsubs
200;
201
202  on_error, 2 ;; return on error
203  count = 0L
204  index0 = -1L
205  ;; Histogram technique is used for array sizes < 32,000 elements
206  if n_elements(ma) EQ 0 then ma = 32L*1024L
207
208  ;; Check the number of arguments
209  if n_params() LT 3 then begin
210      ARG_ERR:
211      message, 'USAGE: SET = CMSET_OP(A, OP, B [, COUNT=ct])', /info
212      message, '  KEYWORDS: /NOT1, /NOT2, /EMPTY1, /EMPTY2, INDEX', /info
213      return, -1L
214  endif
215  if n_elements(op0) EQ 0 then goto, ARG_ERR
216  kind = keyword_set(index)
217  fst = 1L
218  if keyword_set(last) then fst = 0L
219  if keyword_set(first) then fst = 1L
220
221  ;; Check the operation
222  sz = size(op0)
223  if sz[sz[0]+1] NE 7 then begin
224      OP_ERR:
225      message, "ERROR: OP must be 'AND', 'OR' or 'XOR'"
226      return, -1L
227  endif
228  op = strupcase(op0)
229  if op NE 'AND' AND op NE 'OR' AND op NE 'XOR' then goto, OP_ERR
230
231  ;; Check NOT1 and NOT2
232  if keyword_set(not1) AND keyword_set(not2) then begin
233      message, "ERROR: NOT1 and NOT2 cannot be set simultaneously"
234      return, -1L
235  endif
236  if (keyword_set(not1) OR keyword_set(not2)) AND $
237    (op EQ 'OR' OR op EQ 'XOR') then begin
238      message, "ERROR: NOT1 and NOT2 cannot be set with 'OR' or 'XOR'"
239      return, -1L
240  endif
241
242  ;; Special cases for empty set
243  n1 = n_elements(a) & n2 = n_elements(b)
244  if keyword_set(empty1) then n1 = 0L
245  if keyword_set(empty2) then n2 = 0L
246  if n1 EQ 0 OR n2 EQ 0 then begin
247      ;; Eliminate duplicates
248      if n1 GT 0 then a1 = cmset_op_uniq(a)
249      if n2 GT 0 then b1 = cmset_op_uniq(b)
250      n1 = n_elements(a1) < n1 & n2 = n_elements(b1) < n2
251      case op of
252          'OR': if n1 EQ 0 then goto, RET_A1 else goto, RET_B1
253         'XOR': if n1 EQ 0 then goto, RET_B1 else goto, RET_A1
254         'AND': begin
255             if keyword_set(not1) AND n1 EQ 0 then goto, RET_B1
256             if keyword_set(not2) AND n2 EQ 0 then goto, RET_A1
257             return, -1L
258         end
259     endcase
260     return, -1L
261     RET_A1:
262     count = n1
263     if kind then begin
264         if count GT 0 then return, a1 else return, -1L
265     endif
266     if count GT 0 then return, a[a1] else return, -1L
267     RET_B1:
268     count = n2
269     if kind then begin
270         if count GT 0 then return, b1+n1 else return, -1L
271     endif
272     if count GT 0 then return, b[b1] else return, -1L
273 endif
274
275  ;; Allow data to have different types, but they must be at least of
276  ;; the same "base" type.  That is, you can't combine a number with a
277  ;; string, etc.
278  ;; basetype 0:undefined 1:real number 6:complex number 7:string
279  ;;     8:structure 10:pointer 11:object
280
281  ;;          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
282  basetype = [0, 1, 1, 1, 1, 1, 6, 7, 8, 6,10,11, 1, 1, 1, 1]
283
284  ;; Check types of operands
285  sz1 = size(a) & tp1 = sz1[sz1[0]+1]
286  sz2 = size(b) & tp2 = sz2[sz2[0]+1]
287  if tp1 LT 0 OR tp1 GE 16 OR tp2 LT 0 OR tp2 GE 16 then begin
288      message, 'ERROR: unrecognized data types for operands'
289      return, -1
290  endif
291  if basetype[tp1] NE basetype[tp2] then begin
292      TYPE1_ERR:
293      message, 'ERROR: both A and B must be of the same type'
294      return, -1L
295  endif
296  if tp1 EQ 8 OR tp1 EQ 10 OR tp1 EQ 11 then begin
297      TYPE2_ERR:
298      message, 'ERROR: operands must be a numeric or string type'
299      return, -1L
300  endif
301
302  ;; Now use two different kinds of algorithms: a slower but more
303  ;; general algorithm for generic types, and the histogram technique
304  ;; for integer types.  Even for integer types, if there is too much
305  ;; dynamic range, then the slow method is used.
306
307  if tp1 GE 4 AND tp1 LE 9 then begin
308      ;; String and real types, or large int arrays
309      SLOW_SET_OP:
310      case op of
311          'OR': begin
312              uu = [a,b]    ;; OR is simple; just take unique values
313              index0 = cmset_op_uniq(uu)
314              count = n_elements(index0)
315              if kind then return, index0
316              return, uu[index0]
317          end
318
319          'XOR': begin
320              ;; Make ordered list of set union
321              ai = cmset_op_uniq(a) & na = n_elements(ai)
322              bi = cmset_op_uniq(b) & nb = n_elements(bi)
323              ui = [ai, bi+n1]
324              uu = [a,b]    & uu = uu[ui] ;; Raw union...
325              us = sort(uu) & uu = uu[us] ;; ...and sort
326              if kind then ui = ui[temporary(us)] else ui = 0
327
328              ;; Values in one set only will not have duplicates
329              wh1 = where(uu NE shift(uu, -1), count1)
330              if count1 EQ 0 then return, -1L
331              wh = where(wh1[1:*]-wh1 EQ 1, count)
332              if wh1[0] EQ 0 then begin
333                  if count GT 0 then wh = [-1L, wh] else wh = [-1L]
334                  count = n_elements(wh)
335              endif
336              if count EQ 0 then return, -1
337              if kind then return, ui[wh1[wh+1]]
338              return, uu[wh1[wh+1]]
339          end
340
341          'AND': begin
342              ;; Make ordered list of set union
343              ai = cmset_op_uniq(a) & na = n_elements(ai)
344              bi = cmset_op_uniq(b) & nb = n_elements(bi)
345              ui = [ai, bi+n1]
346              uu = [a,b]    & uu = uu[ui]  ;; Raw union...
347              us = sort(uu) & uu = uu[us]  ;; ...and sort
348              if kind then ui = ui[us] else ui = 0
349
350              if NOT keyword_set(not1) AND NOT keyword_set(not2) then begin
351
352                  ;; Special case: if there are one in each set, and
353                  ;; they are equal, then the SHIFT() technique below
354                  ;; fails.  Do this one by hand.
355                  if na EQ 1 AND nb EQ 1 AND uu[0] EQ uu[1] then begin
356                      count = 1L
357                      if kind then return, 0L
358                      return, [uu[0]]
359                  endif
360
361                  ;; If neither "NOT" is set, then find duplicates
362                  us = 0L  ;; Save memory
363                  wh = where(uu EQ shift(uu, -1L), count) ;; Find non-unique
364                  if count EQ 0 then return, -1L
365                  ;; This should always select the element from A
366                  ;; rather than B (the smaller of the two)
367                  if kind then return, (ui[wh] < ui[wh+1])
368                  return, uu[wh]
369              endif
370
371              ;; For "NOT" cases, we need to identify by set
372              ii = make_array(na+nb, value=1b)
373              if keyword_set(not1) then ii[0:na-1] = 0
374              if keyword_set(not2) then ii[na:*]   = 0
375              ii = ii[temporary(us)]
376
377              ;; Remove any duplicates
378              wh1 = where(uu EQ shift(uu, -1L), count1) ;; Find non-unique
379              if count1 GT 0 then ii[wh1, wh1+1] = 0
380              ;; Remainder is the desired set
381              wh = where(ii, count)
382              if count EQ 0 then return, -1L
383              if kind then return, ui[wh]
384              return, uu[wh]
385          end
386
387      endcase
388      return, -1L  ;; DEFAULT CASE
389  endif else begin
390
391      ;; INDEX keyword forces the "slow" operation
392      if kind then goto, SLOW_SET_OP
393
394      ;; Integer types - use histogram technique if the data range
395      ;; is small enough, otherwise use the "slow" technique above
396      min1 = min(a, max=max1) & min2 = min(b, max=max2)
397      minn = min1 < min2 & maxx = max1 > max2
398      nbins = maxx-minn+1
399      if (maxx-minn) GT floor(ma[0]) then goto, SLOW_SET_OP
400
401      ;; Work around a stupidity in the built-in IDL HISTOGRAM routine
402      if (tp1 EQ 2 OR tp2 EQ 2) AND (minn LT -32768 OR maxx GT 32767) then $
403        goto, SLOW_SET_OP
404
405      ;; Following operations create a histogram of the integer values.
406      ha = histogram(a, min=minn, max=maxx) < 1
407      hb = histogram(b, min=minn, max=maxx) < 1
408
409      ;; Compute NOT cases
410      if keyword_set(not1) then ha = 1b - ha
411      if keyword_set(not2) then hb = 1b - hb
412      case op of
413          ;; Boolean operations
414          'AND': mask = temporary(ha) AND temporary(hb)
415           'OR': mask = temporary(ha)  OR temporary(hb)
416          'XOR': mask = temporary(ha) XOR temporary(hb)
417      endcase
418
419      wh = where(temporary(mask), count)
420      if count EQ 0 then return, -1L
421
422      result = temporary(wh+minn)
423      if tp1 NE tp2 then return, result
424      szr = size(result) & tpr = szr[szr[0]+1]
425
426      ;; Cast to the original type if necessary
427      if tpr NE tp1 then begin
428          fresult = make_array(n_elements(result), type=tp1)
429          fresult[0] = temporary(result)
430          result = temporary(fresult)
431      endif
432
433      return, result
434
435  endelse
436
437  return, -1L  ;; DEFAULT CASE
438end
439
440;     Here is how I did the INDEX stuff with fast histogramming.  It
441;     works, but is complicated, so I forced it to go to SLOW_SET_OP.
442;     ha = histogram(a, min=minn, max=maxx, reverse=ra) < 1
443;     rr = ra[0:nbins] & mask = rr NE rr[1:*] & ra = ra[rr]*mask-1L+mask
444;     hb = histogram(b, min=minn, max=maxx, reverse=rb) < 1
445;     rr = rb[0:nbins] & mask = rr NE rr[1:*] & rb = rb[rr]*mask-1L+mask
446;     ...  AND/OR/XOR NOT masking here ...
447;     ra = ra[wh] & rb = rb[wh]
448;     return, ra*(ra GE 0) + (rb+n1)*(ra LT 0) ;; is last 'ra' right?
449
Note: See TracBrowser for help on using the repository browser.