source: trunk/SRC/Matrix/cmset_op.pro @ 136

Last change on this file since 136 was 136, checked in by pinsard, 18 years ago

some improvements and corrections in some .pro file according to
aspell and idldoc log file

  • Property svn:keywords set to Id
File size: 15.9 KB
Line 
1;+
2; @hidden
3;
4; @file_comments
5; Simplified version of CMSET_OP_UNIQ which sorts, and takes the
6; "first" value, whatever that may mean.
7;
8; @todo seb
9;-
10;
11function cmset_op_uniq, a
12;
13  compile_opt idl2, strictarrsubs
14;
15  if n_elements(a) LE 1 then return, 0L
16
17  ii = sort(a) & b = a[ii]
18  wh = where(b NE shift(b, +1L), ct)
19  if ct GT 0 then return, ii[wh]
20
21  return, 0L
22;
23end
24;+
25;
26; @file_comments
27; Performs an AND, OR, or XOR operation between two sets
28;
29; Description: SET_OP performs three common operations between two sets. The
30; three supported functions of OP are:
31;
32;        OP          Meaning
33;      'AND' - to find the intersection of A and B;
34;      'OR'  - to find the union of A and B;
35;      'XOR' - to find the those elements who are members of A or B
36;              but not both;
37;
38;   Sets as defined here are one dimensional arrays composed of
39;   numeric or string types. Comparisons of equality between elements
40;   are done using the IDL EQ operator.
41;
42;   The complements of either set can be taken as well, by using the
43;   NOT1 and NOT2 keywords. For example, it may be desireable to find
44;   the elements in A but not B, or B but not A (they are different!).
45;   The following IDL expressions achieve each of those effects:
46;
47;      SET = CMSET_OP(A, 'AND', /NOT2, B)   ; A but not B
48;      SET = CMSET_OP(/NOT1, A, 'AND', B)   ; B but not A
49;
50;   Note the distinction between NOT1 and NOT2.  NOT1 refers to the
51;   first set (A) and NOT2 refers to the second (B).  Their ordered
52;   placement in the calling sequence is entirely optional, but the
53;   above ordering makes the logical meaning explicit.
54;
55;   NOT1 and NOT2 can only be set for the 'AND' operator, and never
56;   simultaneously. This is because the results of an operation with
57;   'OR' or 'XOR' and any combination of NOTs -- or with 'AND' and
58;   both NOTs -- formally cannot produce a defined result.
59;
60;   The implementation depends on the type of operands. For integer
61;   types, a fast technique using HISTOGRAM is used. However, this
62;   algorithm becomes inefficient when the dynamic range in the data
63;   is large. For those cases, and for other data types, a technique
64;   based on SORT() is used. Thus the compute time should scale
65;   roughly as (A+B)*ALOG(A+B) or better, rather than (A*B) for the
66;   brute force approach. For large arrays this is a significant
67;   benefit.
68;
69; @categories array
70;
71; @param A {in}{required}
72; The two sets to be operated on. A one dimensional array of
73; either numeric or string type. A and B must be of the same
74; type. Empty sets are permitted, and are either represented
75; as an undefined variable, or by setting EMPTY1 or EMPTY2.
76;
77; @param B {in}{required}
78; See A
79;
80; @param OP0 {in}{required}
81; a string, the operation to be performed. Must be one of
82; 'AND', 'OR' or 'XOR' (lower or mixed case is permitted).
83; Other operations will cause an error message to be produced.
84;
85; @keyword NOT1
86; If set and OP is 'AND', then the complement of A (for
87; NOT1) or B (for NOT2) will be used in the operation.
88; NOT1 and NOT2 cannot be set simultaneously.
89;
90; @keyword NOT2
91; See NOT1
92;
93; @keyword EMPTY1
94; If set, then A (for EMPTY1) or B (for EMPTY2) are
95; assumed to be the empty set. The actual values
96; passed as A or B are then ignored.
97;
98; @keyword EMPTY2
99; See EMPTY1
100;
101; @keyword INDEX
102; if set, then return a list of indices instead of the array
103; values themselves. The "slower" set operations are always
104; performed in this case.
105;
106; The indices refer to the *combined* array [A,B]. To
107; clarify, in the following call: I = CMSET_OP(..., /INDEX);
108; returned values from 0 to NA-1 refer to A[I], and values
109; from NA to NA+NB-1 refer to B[I-NA].
110;
111; @keyword COUNT
112; upon return, the number of elements in the result set.
113; This is only important when the result set is the empty
114; set, in which case COUNT is set to zero.
115;
116; @returns
117; The resulting set as a one-dimensional array. The set may be
118; represented by either an array of data values (default), or an
119; array of indices (if INDEX is set). Duplicate elements, if any,
120; are removed, and element order may not be preserved.
121;
122; The empty set is represented as a return value of -1L, and COUNT
123; is set to zero. Note that the only way to recognize the empty set
124; is to examine COUNT.
125;
126; SEE ALSO:
127;
128;  SET_UTILS.PRO by RSI
129;
130; @history Written, CM, 23 Feb 2000
131;   Added empty set capability, CM, 25 Feb 2000
132;   Documentation clarification, CM 02 Mar 2000
133;   Incompatible but more consistent reworking of EMPTY keywords, CM,
134;     04 Mar 2000
135;   Minor documentation clarifications, CM, 26 Mar 2000
136;   Corrected bug in empty_arg special case, CM 06 Apr 2000
137;   Add INDEX keyword, CM 31 Jul 2000
138;   Clarify INDEX keyword documentation, CM 06 Sep 2000
139;   Made INDEX keyword always force SLOW_SET_OP, CM 06 Sep 2000
140;   Added CMSET_OP_UNIQ, and ability to select FIRST_UNIQUE or
141;     LAST_UNIQUE values, CM, 18 Sep 2000
142;   Removed FIRST_UNIQUE and LAST_UNIQUE, and streamlined
143;     CMSET_OP_UNIQ until problems with SORT can be understood, CM, 20
144;     Sep 2000 (thanks to Ben Tupper)
145;   Still trying to get documentation of INDEX and NOT right, CM, 28
146;     Sep 2000 (no code changes)
147;   Correct bug for AND case, when input sets A and B each only have
148;     one unique value, and the values are equal.  CM, 04 Mar 2004
149;     (thanks to James B. jbattat at cfa dot harvard dot edu)
150;   Add support for the cases where the input data types are mixed,
151;      but still compatible; also, attempt to return the same data
152;      type that was passed in; CM, 05 Feb 2005
153;   Fix bug in type checking (thanks to "marit"), CM, 10 Dec 2005
154;   Work around a stupidity in the built-in IDL HISTOGRAM routine,
155;      which tries to "help" you by restricting the MIN/MAX to the
156;      range of the input variable (thanks to Will Maddox), CM, 16 Jan 2006
157;
158;   Author: Craig B. Markwardt, NASA/GSFC Code 662, Greenbelt, MD 20770
159;   craigm\@lheamail.gsfc.nasa.gov
160;
161; @version $Id$
162;
163; @examples
164; Utility function, similar to UNIQ, but allowing choice of taking
165; first or last unique element, or non-unique elements.
166; Unfortunately this doesn't work because of implementation dependent
167; versions of the SORT() function.
168;
169; function cmset_op_uniq, a, first=first, non=non, count=ct, sort=sortit
170;   if n_elements(a) LE 1 then return, 0L
171;   sh = (2L*keyword_set(first)-1L)*(-2L*keyword_set(non)+1)
172;
173;   if keyword_set(sortit) then begin
174;       ;; Sort it manually
175;       ii = sort(a) & b = a[ii]
176;       if keyword_set(non) then wh = where(b EQ shift(b, sh), ct) $
177;       else                     wh = where(b NE shift(b, sh), ct)
178;       if ct GT 0 then return, ii[wh]
179;   endif else begin
180;       ;; Use the user's values directly
181;       if keyword_set(non) then wh = where(a EQ shift(a, sh), ct) $
182;       else                     wh = where(a NE shift(a, sh), ct)
183;       if ct GT 0 then return, wh
184;   endelse
185;
186;   if keyword_set(first) then return, 0L else return, n_elements(a)-1
187; end
188
189; Simplified version of CMSET_OP_UNIQ which sorts, and takes the
190; "first" value, whatever that may mean.
191;
192;-
193
194
195function cmset_op, a, op0, b, not1=not1, not2=not2, count=count, $
196              empty1=empty1, empty2=empty2, maxarray=ma, index=index
197;
198  compile_opt idl2, strictarrsubs
199;
200
201  on_error, 2 ;; return on error
202  count = 0L
203  index0 = -1L
204  ;; Histogram technique is used for array sizes < 32,000 elements
205  if n_elements(ma) EQ 0 then ma = 32L*1024L
206
207  ;; Check the number of arguments
208  if n_params() LT 3 then begin
209      ARG_ERR:
210      message, 'USAGE: SET = CMSET_OP(A, OP, B [, COUNT=ct])', /info
211      message, '  KEYWORDS: /NOT1, /NOT2, /EMPTY1, /EMPTY2, INDEX', /info
212      return, -1L
213  endif
214  if n_elements(op0) EQ 0 then goto, ARG_ERR
215  kind = keyword_set(index)
216  fst = 1L
217  if keyword_set(last) then fst = 0L
218  if keyword_set(first) then fst = 1L
219
220  ;; Check the operation
221  sz = size(op0)
222  if sz[sz[0]+1] NE 7 then begin
223      OP_ERR:
224      message, "ERROR: OP must be 'AND', 'OR' or 'XOR'"
225      return, -1L
226  endif
227  op = strupcase(op0)
228  if op NE 'AND' AND op NE 'OR' AND op NE 'XOR' then goto, OP_ERR
229
230  ;; Check NOT1 and NOT2
231  if keyword_set(not1) AND keyword_set(not2) then begin
232      message, "ERROR: NOT1 and NOT2 cannot be set simultaneously"
233      return, -1L
234  endif
235  if (keyword_set(not1) OR keyword_set(not2)) AND $
236    (op EQ 'OR' OR op EQ 'XOR') then begin
237      message, "ERROR: NOT1 and NOT2 cannot be set with 'OR' or 'XOR'"
238      return, -1L
239  endif
240
241  ;; Special cases for empty set
242  n1 = n_elements(a) & n2 = n_elements(b)
243  if keyword_set(empty1) then n1 = 0L
244  if keyword_set(empty2) then n2 = 0L
245  if n1 EQ 0 OR n2 EQ 0 then begin
246      ;; Eliminate duplicates
247      if n1 GT 0 then a1 = cmset_op_uniq(a)
248      if n2 GT 0 then b1 = cmset_op_uniq(b)
249      n1 = n_elements(a1) < n1 & n2 = n_elements(b1) < n2
250      case op of
251          'OR': if n1 EQ 0 then goto, RET_A1 else goto, RET_B1
252         'XOR': if n1 EQ 0 then goto, RET_B1 else goto, RET_A1
253         'AND': begin
254             if keyword_set(not1) AND n1 EQ 0 then goto, RET_B1
255             if keyword_set(not2) AND n2 EQ 0 then goto, RET_A1
256             return, -1L
257         end
258     endcase
259     return, -1L
260     RET_A1:
261     count = n1
262     if kind then begin
263         if count GT 0 then return, a1 else return, -1L
264     endif
265     if count GT 0 then return, a[a1] else return, -1L
266     RET_B1:
267     count = n2
268     if kind then begin
269         if count GT 0 then return, b1+n1 else return, -1L
270     endif
271     if count GT 0 then return, b[b1] else return, -1L
272 endif
273
274  ;; Allow data to have different types, but they must be at least of
275  ;; the same "base" type.  That is, you can't combine a number with a
276  ;; string, etc.
277  ;; basetype 0:undefined 1:real number 6:complex number 7:string
278  ;;     8:structure 10:pointer 11:object
279
280  ;;          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
281  basetype = [0, 1, 1, 1, 1, 1, 6, 7, 8, 6,10,11, 1, 1, 1, 1]
282
283  ;; Check types of operands
284  sz1 = size(a) & tp1 = sz1[sz1[0]+1]
285  sz2 = size(b) & tp2 = sz2[sz2[0]+1]
286  if tp1 LT 0 OR tp1 GE 16 OR tp2 LT 0 OR tp2 GE 16 then begin
287      message, 'ERROR: unrecognized data types for operands'
288      return, -1
289  endif
290  if basetype[tp1] NE basetype[tp2] then begin
291      TYPE1_ERR:
292      message, 'ERROR: both A and B must be of the same type'
293      return, -1L
294  endif
295  if tp1 EQ 8 OR tp1 EQ 10 OR tp1 EQ 11 then begin
296      TYPE2_ERR:
297      message, 'ERROR: operands must be a numeric or string type'
298      return, -1L
299  endif
300
301  ;; Now use two different kinds of algorithms: a slower but more
302  ;; general algorithm for generic types, and the histogram technique
303  ;; for integer types.  Even for integer types, if there is too much
304  ;; dynamic range, then the slow method is used.
305
306  if tp1 GE 4 AND tp1 LE 9 then begin
307      ;; String and real types, or large int arrays
308      SLOW_SET_OP:
309      case op of
310          'OR': begin
311              uu = [a,b]    ;; OR is simple; just take unique values
312              index0 = cmset_op_uniq(uu)
313              count = n_elements(index0)
314              if kind then return, index0
315              return, uu[index0]
316          end
317
318          'XOR': begin
319              ;; Make ordered list of set union
320              ai = cmset_op_uniq(a) & na = n_elements(ai)
321              bi = cmset_op_uniq(b) & nb = n_elements(bi)
322              ui = [ai, bi+n1]
323              uu = [a,b]    & uu = uu[ui] ;; Raw union...
324              us = sort(uu) & uu = uu[us] ;; ...and sort
325              if kind then ui = ui[temporary(us)] else ui = 0
326
327              ;; Values in one set only will not have duplicates
328              wh1 = where(uu NE shift(uu, -1), count1)
329              if count1 EQ 0 then return, -1L
330              wh = where(wh1[1:*]-wh1 EQ 1, count)
331              if wh1[0] EQ 0 then begin
332                  if count GT 0 then wh = [-1L, wh] else wh = [-1L]
333                  count = n_elements(wh)
334              endif
335              if count EQ 0 then return, -1
336              if kind then return, ui[wh1[wh+1]]
337              return, uu[wh1[wh+1]]
338          end
339
340          'AND': begin
341              ;; Make ordered list of set union
342              ai = cmset_op_uniq(a) & na = n_elements(ai)
343              bi = cmset_op_uniq(b) & nb = n_elements(bi)
344              ui = [ai, bi+n1]
345              uu = [a,b]    & uu = uu[ui]  ;; Raw union...
346              us = sort(uu) & uu = uu[us]  ;; ...and sort
347              if kind then ui = ui[us] else ui = 0
348
349              if NOT keyword_set(not1) AND NOT keyword_set(not2) then begin
350
351                  ;; Special case: if there are one in each set, and
352                  ;; they are equal, then the SHIFT() technique below
353                  ;; fails.  Do this one by hand.
354                  if na EQ 1 AND nb EQ 1 AND uu[0] EQ uu[1] then begin
355                      count = 1L
356                      if kind then return, 0L
357                      return, [uu[0]]
358                  endif
359
360                  ;; If neither "NOT" is set, then find duplicates
361                  us = 0L  ;; Save memory
362                  wh = where(uu EQ shift(uu, -1L), count) ;; Find non-unique
363                  if count EQ 0 then return, -1L
364                  ;; This should always select the element from A
365                  ;; rather than B (the smaller of the two)
366                  if kind then return, (ui[wh] < ui[wh+1])
367                  return, uu[wh]
368              endif
369
370              ;; For "NOT" cases, we need to identify by set
371              ii = make_array(na+nb, value=1b)
372              if keyword_set(not1) then ii[0:na-1] = 0
373              if keyword_set(not2) then ii[na:*]   = 0
374              ii = ii[temporary(us)]
375
376              ;; Remove any duplicates
377              wh1 = where(uu EQ shift(uu, -1L), count1) ;; Find non-unique
378              if count1 GT 0 then ii[wh1, wh1+1] = 0
379              ;; Remainder is the desired set
380              wh = where(ii, count)
381              if count EQ 0 then return, -1L
382              if kind then return, ui[wh]
383              return, uu[wh]
384          end
385
386      endcase
387      return, -1L  ;; DEFAULT CASE
388  endif else begin
389
390      ;; INDEX keyword forces the "slow" operation
391      if kind then goto, SLOW_SET_OP
392
393      ;; Integer types - use histogram technique if the data range
394      ;; is small enough, otherwise use the "slow" technique above
395      min1 = min(a, max=max1) & min2 = min(b, max=max2)
396      minn = min1 < min2 & maxx = max1 > max2
397      nbins = maxx-minn+1
398      if (maxx-minn) GT floor(ma[0]) then goto, SLOW_SET_OP
399
400      ;; Work around a stupidity in the built-in IDL HISTOGRAM routine
401      if (tp1 EQ 2 OR tp2 EQ 2) AND (minn LT -32768 OR maxx GT 32767) then $
402        goto, SLOW_SET_OP
403
404      ;; Following operations create a histogram of the integer values.
405      ha = histogram(a, min=minn, max=maxx) < 1
406      hb = histogram(b, min=minn, max=maxx) < 1
407
408      ;; Compute NOT cases
409      if keyword_set(not1) then ha = 1b - ha
410      if keyword_set(not2) then hb = 1b - hb
411      case op of
412          ;; Boolean operations
413          'AND': mask = temporary(ha) AND temporary(hb)
414           'OR': mask = temporary(ha)  OR temporary(hb)
415          'XOR': mask = temporary(ha) XOR temporary(hb)
416      endcase
417
418      wh = where(temporary(mask), count)
419      if count EQ 0 then return, -1L
420
421      result = temporary(wh+minn)
422      if tp1 NE tp2 then return, result
423      szr = size(result) & tpr = szr[szr[0]+1]
424
425      ;; Cast to the original type if necessary
426      if tpr NE tp1 then begin
427          fresult = make_array(n_elements(result), type=tp1)
428          fresult[0] = temporary(result)
429          result = temporary(fresult)
430      endif
431
432      return, result
433
434  endelse
435
436  return, -1L  ;; DEFAULT CASE
437end
438
439;     Here is how I did the INDEX stuff with fast histogramming.  It
440;     works, but is complicated, so I forced it to go to SLOW_SET_OP.
441;     ha = histogram(a, min=minn, max=maxx, reverse=ra) < 1
442;     rr = ra[0:nbins] & mask = rr NE rr[1:*] & ra = ra[rr]*mask-1L+mask
443;     hb = histogram(b, min=minn, max=maxx, reverse=rb) < 1
444;     rr = rb[0:nbins] & mask = rr NE rr[1:*] & rb = rb[rr]*mask-1L+mask
445;     ...  AND/OR/XOR NOT masking here ...
446;     ra = ra[wh] & rb = rb[wh]
447;     return, ra*(ra GE 0) + (rb+n1)*(ra LT 0) ;; is last 'ra' right?
448
Note: See TracBrowser for help on using the repository browser.