source: trunk/SRC/Matrix/cmset_op.pro @ 289

Last change on this file since 289 was 262, checked in by pinsard, 17 years ago

corrections of some headers and parameters and keywords case. change of pro2href to replace proidl

  • Property svn:keywords set to Id
File size: 16.0 KB
Line 
1;+
2;
3; @hidden
4;
5; @file_comments
6; Simplified version of CMSET_OP_UNIQ which sorts, and takes the
7; "first" value, whatever that may mean.
8;
9; @todo seb
10;
11;-
12;
13FUNCTION cmset_op_uniq, a
14;
15  compile_opt idl2, strictarrsubs
16;
17  if n_elements(a) LE 1 then return, 0L
18
19  ii = sort(a) & b = a[ii]
20  wh = where(b NE shift(b, +1L), ct)
21  if ct GT 0 then return, ii[wh]
22
23  return, 0L
24;
25end
26;+
27;
28; @file_comments
29; Performs an AND, OR, or XOR operation between two sets
30;
31; Description: SET_OP performs three common operations between two sets. The
32; three supported functions of OP are:
33;
34;        OP          Meaning
35;      'AND' - to find the intersection of A and B;
36;      'OR'  - to find the union of A and B;
37;      'XOR' - to find the those elements who are members of A or B
38;              but not both;
39;
40;   Sets as defined here are one dimensional arrays composed of
41;   numeric or string types. Comparisons of equality between elements
42;   are done using the IDL EQ operator.
43;
44;   The complements of either set can be taken as well, by using the
45;   NOT1 and NOT2 keywords. For example, it may be desirable to find
46;   the elements in A but not B, or B but not A (they are different!).
47;   The following IDL expressions achieve each of those effects:
48;
49;      SET = CMSET_OP(A, 'AND', /NOT2, B)   ; A but not B
50;      SET = CMSET_OP(/NOT1, A, 'AND', B)   ; B but not A
51;
52;   Note the distinction between NOT1 and NOT2.  NOT1 refers to the
53;   first set (A) and NOT2 refers to the second (B).  Their ordered
54;   placement in the calling sequence is entirely optional, but the
55;   above ordering makes the logical meaning explicit.
56;
57;   NOT1 and NOT2 can only be set for the 'AND' operator, and never
58;   simultaneously. This is because the results of an operation with
59;   'OR' or 'XOR' and any combination of NOTs -- or with 'AND' and
60;   both NOTs -- formally cannot produce a defined result.
61;
62;   The implementation depends on the type of operands. For integer
63;   types, a fast technique using HISTOGRAM is used. However, this
64;   algorithm becomes inefficient when the dynamic range in the data
65;   is large. For those cases, and for other data types, a technique
66;   based on SORT() is used. Thus the compute time should scale
67;   roughly as (A+B)*ALOG(A+B) or better, rather than (A*B) for the
68;   brute force approach. For large arrays this is a significant
69;   benefit.
70;
71; @categories
72; Array
73;
74; @param A {in}{required}
75; The two sets to be operated on. A one dimensional array of
76; either numeric or string type. A and B must be of the same
77; type. Empty sets are permitted, and are either represented
78; as an undefined variable, or by setting EMPTY1 or EMPTY2.
79;
80; @param B {in}{required}
81; See A
82;
83; @param OP0 {in}{required}{type=string}
84; a string, the operation to be performed. Must be one of
85; 'AND', 'OR' or 'XOR' (lower or mixed case is permitted).
86; Other operations will cause an error message to be produced.
87;
88; @keyword NOT1
89; If set and OP is 'AND', then the complement of A (for
90; NOT1) or B (for NOT2) will be used in the operation.
91; NOT1 and NOT2 cannot be set simultaneously.
92;
93; @keyword NOT2
94; See NOT1
95;
96; @keyword EMPTY1
97; If set, then A (for EMPTY1) or B (for EMPTY2) are
98; assumed to be the empty set. The actual values
99; passed as A or B are then ignored.
100;
101; @keyword EMPTY2
102; See EMPTY1
103;
104; @keyword INDEX
105; if set, then return a list of indexes instead of the array
106; values themselves. The "slower" set operations are always
107; performed in this case.
108;
109; The indexes refer to the *combined* array [A,B]. To
110; clarify, in the following call: I = CMSET_OP(..., /INDEX);
111; returned values from 0 to NA-1 refer to A[I], and values
112; from NA to NA+NB-1 refer to B[I-NA].
113;
114; @keyword COUNT
115; upon return, the number of elements in the result set.
116; This is only important when the result set is the empty
117; set, in which case COUNT is set to zero.
118;
119; @returns
120; The resulting set as a one-dimensional array. The set may be
121; represented by either an array of data values (default), or an
122; array of indexes (if INDEX is set). Duplicate elements, if any,
123; are removed, and element order may not be preserved.
124;
125; The empty set is represented as a return value of -1L, and COUNT
126; is set to zero. Note that the only way to recognize the empty set
127; is to examine COUNT.
128;
129; SEE ALSO:
130;
131;  SET_UTILS.PRO by RSI
132;
133; @history
134; Written, CM, 23 Feb 2000
135;   Added empty set capability, CM, 25 Feb 2000
136;   Documentation clarification, CM 02 Mar 2000
137;   Incompatible but more consistent reworking of EMPTY keywords, CM,
138;     04 Mar 2000
139;   Minor documentation clarifications, CM, 26 Mar 2000
140;   Corrected bug in empty_arg special case, CM 06 Apr 2000
141;   Add INDEX keyword, CM 31 Jul 2000
142;   Clarify INDEX keyword documentation, CM 06 Sep 2000
143;   Made INDEX keyword always force SLOW_SET_OP, CM 06 Sep 2000
144;   Added CMSET_OP_UNIQ, and ability to select FIRST_UNIQUE or
145;     LAST_UNIQUE values, CM, 18 Sep 2000
146;   Removed FIRST_UNIQUE and LAST_UNIQUE, and streamlined
147;     CMSET_OP_UNIQ until problems with SORT can be understood, CM, 20
148;     Sep 2000 (thanks to Ben Tupper)
149;   Still trying to get documentation of INDEX and NOT right, CM, 28
150;     Sep 2000 (no code changes)
151;   Correct bug for AND case, when input sets A and B each only have
152;     one unique value, and the values are equal.  CM, 04 Mar 2004
153;     (thanks to James B. jbattat at cfa dot harvard dot edu)
154;   Add support for the cases where the input data types are mixed,
155;      but still compatible; also, attempt to return the same data
156;      type that was passed in; CM, 05 Feb 2005
157;   Fix bug in type checking (thanks to "marit"), CM, 10 Dec 2005
158;   Work around a stupidity in the built-in IDL HISTOGRAM routine,
159;      which tries to "help" you by restricting the MIN/MAX to the
160;      range of the input variable (thanks to Will Maddox), CM, 16 Jan 2006
161;
162;   Author: Craig B. Markwardt, NASA/GSFC Code 662, Greenbelt, MD 20770
163;   craigm\@lheamail.gsfc.nasa.gov
164;
165; @version
166; $Id$
167;
168; @examples
169; Utility function, similar to UNIQ, but allowing choice of taking
170; first or last unique element, or non-unique elements.
171; Unfortunately this doesn't work because of implementation dependent
172; versions of the SORT() function.
173;
174; function cmset_op_uniq, a, first=first, non=non, count=ct, sort=sortit
175;   if n_elements(a) LE 1 then return, 0L
176;   sh = (2L*keyword_set(first)-1L)*(-2L*keyword_set(non)+1)
177;
178;   if keyword_set(sortit) then begin
179;       ;; Sort it manually
180;       ii = sort(a) & b = a[ii]
181;       if keyword_set(non) then wh = where(b EQ shift(b, sh), ct) $
182;       else                     wh = where(b NE shift(b, sh), ct)
183;       if ct GT 0 then return, ii[wh]
184;   endif else begin
185;       ;; Use the user's values directly
186;       if keyword_set(non) then wh = where(a EQ shift(a, sh), ct) $
187;       else                     wh = where(a NE shift(a, sh), ct)
188;       if ct GT 0 then return, wh
189;   endelse
190;
191;   if keyword_set(first) then return, 0L else return, n_elements(a)-1
192; end
193
194; Simplified version of CMSET_OP_UNIQ which sorts, and takes the
195; "first" value, whatever that may mean.
196;
197;-
198;
199FUNCTION cmset_op, a, op0, b, NOT1=not1, NOT2=not2, COUNT=count, $
200              EMPTY1=empty1, EMPTY2=empty2, MAXARRAY=ma, INDEX=index
201;
202  compile_opt idl2, strictarrsubs
203;
204
205  on_error, 2 ;; return on error
206  count = 0L
207  index0 = -1L
208  ;; Histogram technique is used for array sizes < 32,000 elements
209  if n_elements(ma) EQ 0 then ma = 32L*1024L
210
211  ;; Check the number of arguments
212  if n_params() LT 3 then begin
213      ARG_ERR:
214      message, 'USAGE: SET = CMSET_OP(A, OP, B [, COUNT=ct])', /info
215      message, '  KEYWORDS: /NOT1, /NOT2, /EMPTY1, /EMPTY2, INDEX', /info
216      return, -1L
217  endif
218  if n_elements(op0) EQ 0 then goto, ARG_ERR
219  kind = keyword_set(index)
220  fst = 1L
221  if keyword_set(last) then fst = 0L
222  if keyword_set(first) then fst = 1L
223
224  ;; Check the operation
225  sz = size(op0)
226  if sz[sz[0]+1] NE 7 then begin
227      OP_ERR:
228      message, "ERROR: OP must be 'AND', 'OR' or 'XOR'"
229      return, -1L
230  endif
231  op = strupcase(op0)
232  if op NE 'AND' AND op NE 'OR' AND op NE 'XOR' then goto, OP_ERR
233
234  ;; Check NOT1 and NOT2
235  if keyword_set(not1) AND keyword_set(not2) then begin
236      message, "ERROR: NOT1 and NOT2 cannot be set simultaneously"
237      return, -1L
238  endif
239  if (keyword_set(not1) OR keyword_set(not2)) AND $
240    (op EQ 'OR' OR op EQ 'XOR') then begin
241      message, "ERROR: NOT1 and NOT2 cannot be set with 'OR' or 'XOR'"
242      return, -1L
243  endif
244
245  ;; Special cases for empty set
246  n1 = n_elements(a) & n2 = n_elements(b)
247  if keyword_set(empty1) then n1 = 0L
248  if keyword_set(empty2) then n2 = 0L
249  if n1 EQ 0 OR n2 EQ 0 then begin
250      ;; Eliminate duplicates
251      if n1 GT 0 then a1 = cmset_op_uniq(a)
252      if n2 GT 0 then b1 = cmset_op_uniq(b)
253      n1 = n_elements(a1) < n1 & n2 = n_elements(b1) < n2
254      case op of
255          'OR': if n1 EQ 0 then goto, RET_A1 else goto, RET_B1
256         'XOR': if n1 EQ 0 then goto, RET_B1 else goto, RET_A1
257         'AND': begin
258             if keyword_set(not1) AND n1 EQ 0 then goto, RET_B1
259             if keyword_set(not2) AND n2 EQ 0 then goto, RET_A1
260             return, -1L
261         end
262     endcase
263     return, -1L
264     RET_A1:
265     count = n1
266     if kind then begin
267         if count GT 0 then return, a1 else return, -1L
268     endif
269     if count GT 0 then return, a[a1] else return, -1L
270     RET_B1:
271     count = n2
272     if kind then begin
273         if count GT 0 then return, b1+n1 else return, -1L
274     endif
275     if count GT 0 then return, b[b1] else return, -1L
276 endif
277
278  ;; Allow data to have different types, but they must be at least of
279  ;; the same "base" type.  That is, you can't combine a number with a
280  ;; string, etc.
281  ;; basetype 0:undefined 1:real number 6:complex number 7:string
282  ;;     8:structure 10:pointer 11:object
283
284  ;;          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
285  basetype = [0, 1, 1, 1, 1, 1, 6, 7, 8, 6,10,11, 1, 1, 1, 1]
286
287  ;; Check types of operands
288  sz1 = size(a) & tp1 = sz1[sz1[0]+1]
289  sz2 = size(b) & tp2 = sz2[sz2[0]+1]
290  if tp1 LT 0 OR tp1 GE 16 OR tp2 LT 0 OR tp2 GE 16 then begin
291      message, 'ERROR: unrecognized data types for operands'
292      return, -1
293  endif
294  if basetype[tp1] NE basetype[tp2] then begin
295      TYPE1_ERR:
296      message, 'ERROR: both A and B must be of the same type'
297      return, -1L
298  endif
299  if tp1 EQ 8 OR tp1 EQ 10 OR tp1 EQ 11 then begin
300      TYPE2_ERR:
301      message, 'ERROR: operands must be a numeric or string type'
302      return, -1L
303  endif
304
305  ;; Now use two different kinds of algorithms: a slower but more
306  ;; general algorithm for generic types, and the histogram technique
307  ;; for integer types.  Even for integer types, if there is too much
308  ;; dynamic range, then the slow method is used.
309
310  if tp1 GE 4 AND tp1 LE 9 then begin
311      ;; String and real types, or large int arrays
312      SLOW_SET_OP:
313      case op of
314          'OR': begin
315              uu = [a,b]    ;; OR is simple; just take unique values
316              index0 = cmset_op_uniq(uu)
317              count = n_elements(index0)
318              if kind then return, index0
319              return, uu[index0]
320          end
321
322          'XOR': begin
323              ;; Make ordered list of set union
324              ai = cmset_op_uniq(a) & na = n_elements(ai)
325              bi = cmset_op_uniq(b) & nb = n_elements(bi)
326              ui = [ai, bi+n1]
327              uu = [a,b]    & uu = uu[ui] ;; Raw union...
328              us = sort(uu) & uu = uu[us] ;; ...and sort
329              if kind then ui = ui[temporary(us)] else ui = 0
330
331              ;; Values in one set only will not have duplicates
332              wh1 = where(uu NE shift(uu, -1), count1)
333              if count1 EQ 0 then return, -1L
334              wh = where(wh1[1:*]-wh1 EQ 1, count)
335              if wh1[0] EQ 0 then begin
336                  if count GT 0 then wh = [-1L, wh] else wh = [-1L]
337                  count = n_elements(wh)
338              endif
339              if count EQ 0 then return, -1
340              if kind then return, ui[wh1[wh+1]]
341              return, uu[wh1[wh+1]]
342          end
343
344          'AND': begin
345              ;; Make ordered list of set union
346              ai = cmset_op_uniq(a) & na = n_elements(ai)
347              bi = cmset_op_uniq(b) & nb = n_elements(bi)
348              ui = [ai, bi+n1]
349              uu = [a,b]    & uu = uu[ui]  ;; Raw union...
350              us = sort(uu) & uu = uu[us]  ;; ...and sort
351              if kind then ui = ui[us] else ui = 0
352
353              if NOT keyword_set(not1) AND NOT keyword_set(not2) then begin
354
355                  ;; Special case: if there are one in each set, and
356                  ;; they are equal, then the SHIFT() technique below
357                  ;; fails.  Do this one by hand.
358                  if na EQ 1 AND nb EQ 1 AND uu[0] EQ uu[1] then begin
359                      count = 1L
360                      if kind then return, 0L
361                      return, [uu[0]]
362                  endif
363
364                  ;; If neither "NOT" is set, then find duplicates
365                  us = 0L  ;; Save memory
366                  wh = where(uu EQ shift(uu, -1L), count) ;; Find non-unique
367                  if count EQ 0 then return, -1L
368                  ;; This should always select the element from A
369                  ;; rather than B (the smaller of the two)
370                  if kind then return, (ui[wh] < ui[wh+1])
371                  return, uu[wh]
372              endif
373
374              ;; For "NOT" cases, we need to identify by set
375              ii = make_array(na+nb, value=1b)
376              if keyword_set(not1) then ii[0:na-1] = 0
377              if keyword_set(not2) then ii[na:*]   = 0
378              ii = ii[temporary(us)]
379
380              ;; Remove any duplicates
381              wh1 = where(uu EQ shift(uu, -1L), count1) ;; Find non-unique
382              if count1 GT 0 then ii[wh1, wh1+1] = 0
383              ;; Remainder is the desired set
384              wh = where(ii, count)
385              if count EQ 0 then return, -1L
386              if kind then return, ui[wh]
387              return, uu[wh]
388          end
389
390      endcase
391      return, -1L  ;; DEFAULT CASE
392  endif else begin
393
394      ;; INDEX keyword forces the "slow" operation
395      if kind then goto, SLOW_SET_OP
396
397      ;; Integer types - use histogram technique if the data range
398      ;; is small enough, otherwise use the "slow" technique above
399      min1 = min(a, max=max1) & min2 = min(b, max=max2)
400      minn = min1 < min2 & maxx = max1 > max2
401      nbins = maxx-minn+1
402      if (maxx-minn) GT floor(ma[0]) then goto, SLOW_SET_OP
403
404      ;; Work around a stupidity in the built-in IDL HISTOGRAM routine
405      if (tp1 EQ 2 OR tp2 EQ 2) AND (minn LT -32768 OR maxx GT 32767) then $
406        goto, SLOW_SET_OP
407
408      ;; Following operations create a histogram of the integer values.
409      ha = histogram(a, min=minn, max=maxx) < 1
410      hb = histogram(b, min=minn, max=maxx) < 1
411
412      ;; Compute NOT cases
413      if keyword_set(not1) then ha = 1b - ha
414      if keyword_set(not2) then hb = 1b - hb
415      case op of
416          ;; Boolean operations
417          'AND': mask = temporary(ha) AND temporary(hb)
418           'OR': mask = temporary(ha)  OR temporary(hb)
419          'XOR': mask = temporary(ha) XOR temporary(hb)
420      endcase
421
422      wh = where(temporary(mask), count)
423      if count EQ 0 then return, -1L
424
425      result = temporary(wh+minn)
426      if tp1 NE tp2 then return, result
427      szr = size(result) & tpr = szr[szr[0]+1]
428
429      ;; Cast to the original type if necessary
430      if tpr NE tp1 then begin
431          fresult = make_array(n_elements(result), type=tp1)
432          fresult[0] = temporary(result)
433          result = temporary(fresult)
434      endif
435
436      return, result
437
438  endelse
439
440  return, -1L  ;; DEFAULT CASE
441end
442
443;     Here is how I did the INDEX stuff with fast histogramming.  It
444;     works, but is complicated, so I forced it to go to SLOW_SET_OP.
445;     ha = histogram(a, min=minn, max=maxx, reverse=ra) < 1
446;     rr = ra[0:nbins] & mask = rr NE rr[1:*] & ra = ra[rr]*mask-1L+mask
447;     hb = histogram(b, min=minn, max=maxx, reverse=rb) < 1
448;     rr = rb[0:nbins] & mask = rr NE rr[1:*] & rb = rb[rr]*mask-1L+mask
449;     ...  AND/OR/XOR NOT masking here ...
450;     ra = ra[wh] & rb = rb[wh]
451;     return, ra*(ra GE 0) + (rb+n1)*(ra LT 0) ;; is last 'ra' right?
452
Note: See TracBrowser for help on using the repository browser.