Context Navigation

cmset_op.pro @ 136

Last change on this file since 136 was 136, checked in by pinsard, 18 years ago
some improvements and corrections in some .pro file according to aspell and idldoc log file
Property svn:keywords set to `Id`
File size: 15.9 KB

Line
1	;+
2	; @hidden
3	;
4	; @file_comments
5	; Simplified version of CMSET_OP_UNIQ which sorts, and takes the
6	; "first" value, whatever that may mean.
7	;
8	; @todo seb
9	;-
10	;
11	function cmset_op_uniq, a
12	;
13	compile_opt idl2, strictarrsubs
14	;
15	if n_elements(a) LE 1 then return, 0L
16
17	ii = sort(a) & b = a[ii]
18	wh = where(b NE shift(b, +1L), ct)
19	if ct GT 0 then return, ii[wh]
20
21	return, 0L
22	;
23	end
24	;+
25	;
26	; @file_comments
27	; Performs an AND, OR, or XOR operation between two sets
28	;
29	; Description: SET_OP performs three common operations between two sets. The
30	; three supported functions of OP are:
31	;
32	; OP Meaning
33	; 'AND' - to find the intersection of A and B;
34	; 'OR' - to find the union of A and B;
35	; 'XOR' - to find the those elements who are members of A or B
36	; but not both;
37	;
38	; Sets as defined here are one dimensional arrays composed of
39	; numeric or string types. Comparisons of equality between elements
40	; are done using the IDL EQ operator.
41	;
42	; The complements of either set can be taken as well, by using the
43	; NOT1 and NOT2 keywords. For example, it may be desireable to find
44	; the elements in A but not B, or B but not A (they are different!).
45	; The following IDL expressions achieve each of those effects:
46	;
47	; SET = CMSET_OP(A, 'AND', /NOT2, B) ; A but not B
48	; SET = CMSET_OP(/NOT1, A, 'AND', B) ; B but not A
49	;
50	; Note the distinction between NOT1 and NOT2. NOT1 refers to the
51	; first set (A) and NOT2 refers to the second (B). Their ordered
52	; placement in the calling sequence is entirely optional, but the
53	; above ordering makes the logical meaning explicit.
54	;
55	; NOT1 and NOT2 can only be set for the 'AND' operator, and never
56	; simultaneously. This is because the results of an operation with
57	; 'OR' or 'XOR' and any combination of NOTs -- or with 'AND' and
58	; both NOTs -- formally cannot produce a defined result.
59	;
60	; The implementation depends on the type of operands. For integer
61	; types, a fast technique using HISTOGRAM is used. However, this
62	; algorithm becomes inefficient when the dynamic range in the data
63	; is large. For those cases, and for other data types, a technique
64	; based on SORT() is used. Thus the compute time should scale
65	; roughly as (A+B)ALOG(A+B) or better, rather than (AB) for the
66	; brute force approach. For large arrays this is a significant
67	; benefit.
68	;
69	; @categories array
70	;
71	; @param A {in}{required}
72	; The two sets to be operated on. A one dimensional array of
73	; either numeric or string type. A and B must be of the same
74	; type. Empty sets are permitted, and are either represented
75	; as an undefined variable, or by setting EMPTY1 or EMPTY2.
76	;
77	; @param B {in}{required}
78	; See A
79	;
80	; @param OP0 {in}{required}
81	; a string, the operation to be performed. Must be one of
82	; 'AND', 'OR' or 'XOR' (lower or mixed case is permitted).
83	; Other operations will cause an error message to be produced.
84	;
85	; @keyword NOT1
86	; If set and OP is 'AND', then the complement of A (for
87	; NOT1) or B (for NOT2) will be used in the operation.
88	; NOT1 and NOT2 cannot be set simultaneously.
89	;
90	; @keyword NOT2
91	; See NOT1
92	;
93	; @keyword EMPTY1
94	; If set, then A (for EMPTY1) or B (for EMPTY2) are
95	; assumed to be the empty set. The actual values
96	; passed as A or B are then ignored.
97	;
98	; @keyword EMPTY2
99	; See EMPTY1
100	;
101	; @keyword INDEX
102	; if set, then return a list of indices instead of the array
103	; values themselves. The "slower" set operations are always
104	; performed in this case.
105	;
106	; The indices refer to the combined array [A,B]. To
107	; clarify, in the following call: I = CMSET_OP(..., /INDEX);
108	; returned values from 0 to NA-1 refer to A[I], and values
109	; from NA to NA+NB-1 refer to B[I-NA].
110	;
111	; @keyword COUNT
112	; upon return, the number of elements in the result set.
113	; This is only important when the result set is the empty
114	; set, in which case COUNT is set to zero.
115	;
116	; @returns
117	; The resulting set as a one-dimensional array. The set may be
118	; represented by either an array of data values (default), or an
119	; array of indices (if INDEX is set). Duplicate elements, if any,
120	; are removed, and element order may not be preserved.
121	;
122	; The empty set is represented as a return value of -1L, and COUNT
123	; is set to zero. Note that the only way to recognize the empty set
124	; is to examine COUNT.
125	;
126	; SEE ALSO:
127	;
128	; SET_UTILS.PRO by RSI
129	;
130	; @history Written, CM, 23 Feb 2000
131	; Added empty set capability, CM, 25 Feb 2000
132	; Documentation clarification, CM 02 Mar 2000
133	; Incompatible but more consistent reworking of EMPTY keywords, CM,
134	; 04 Mar 2000
135	; Minor documentation clarifications, CM, 26 Mar 2000
136	; Corrected bug in empty_arg special case, CM 06 Apr 2000
137	; Add INDEX keyword, CM 31 Jul 2000
138	; Clarify INDEX keyword documentation, CM 06 Sep 2000
139	; Made INDEX keyword always force SLOW_SET_OP, CM 06 Sep 2000
140	; Added CMSET_OP_UNIQ, and ability to select FIRST_UNIQUE or
141	; LAST_UNIQUE values, CM, 18 Sep 2000
142	; Removed FIRST_UNIQUE and LAST_UNIQUE, and streamlined
143	; CMSET_OP_UNIQ until problems with SORT can be understood, CM, 20
144	; Sep 2000 (thanks to Ben Tupper)
145	; Still trying to get documentation of INDEX and NOT right, CM, 28
146	; Sep 2000 (no code changes)
147	; Correct bug for AND case, when input sets A and B each only have
148	; one unique value, and the values are equal. CM, 04 Mar 2004
149	; (thanks to James B. jbattat at cfa dot harvard dot edu)
150	; Add support for the cases where the input data types are mixed,
151	; but still compatible; also, attempt to return the same data
152	; type that was passed in; CM, 05 Feb 2005
153	; Fix bug in type checking (thanks to "marit"), CM, 10 Dec 2005
154	; Work around a stupidity in the built-in IDL HISTOGRAM routine,
155	; which tries to "help" you by restricting the MIN/MAX to the
156	; range of the input variable (thanks to Will Maddox), CM, 16 Jan 2006
157	;
158	; Author: Craig B. Markwardt, NASA/GSFC Code 662, Greenbelt, MD 20770
159	; craigm\@lheamail.gsfc.nasa.gov
160	;
161	; @version $Id$
162	;
163	; @examples
164	; Utility function, similar to UNIQ, but allowing choice of taking
165	; first or last unique element, or non-unique elements.
166	; Unfortunately this doesn't work because of implementation dependent
167	; versions of the SORT() function.
168	;
169	; function cmset_op_uniq, a, first=first, non=non, count=ct, sort=sortit
170	; if n_elements(a) LE 1 then return, 0L
171	; sh = (2Lkeyword_set(first)-1L)(-2L*keyword_set(non)+1)
172	;
173	; if keyword_set(sortit) then begin
174	; ;; Sort it manually
175	; ii = sort(a) & b = a[ii]
176	; if keyword_set(non) then wh = where(b EQ shift(b, sh), ct) $
177	; else wh = where(b NE shift(b, sh), ct)
178	; if ct GT 0 then return, ii[wh]
179	; endif else begin
180	; ;; Use the user's values directly
181	; if keyword_set(non) then wh = where(a EQ shift(a, sh), ct) $
182	; else wh = where(a NE shift(a, sh), ct)
183	; if ct GT 0 then return, wh
184	; endelse
185	;
186	; if keyword_set(first) then return, 0L else return, n_elements(a)-1
187	; end
188
189	; Simplified version of CMSET_OP_UNIQ which sorts, and takes the
190	; "first" value, whatever that may mean.
191	;
192	;-
193
194
195	function cmset_op, a, op0, b, not1=not1, not2=not2, count=count, $
196	empty1=empty1, empty2=empty2, maxarray=ma, index=index
197	;
198	compile_opt idl2, strictarrsubs
199	;
200
201	on_error, 2 ;; return on error
202	count = 0L
203	index0 = -1L
204	;; Histogram technique is used for array sizes < 32,000 elements
205	if n_elements(ma) EQ 0 then ma = 32L*1024L
206
207	;; Check the number of arguments
208	if n_params() LT 3 then begin
209	ARG_ERR:
210	message, 'USAGE: SET = CMSET_OP(A, OP, B [, COUNT=ct])', /info
211	message, ' KEYWORDS: /NOT1, /NOT2, /EMPTY1, /EMPTY2, INDEX', /info
212	return, -1L
213	endif
214	if n_elements(op0) EQ 0 then goto, ARG_ERR
215	kind = keyword_set(index)
216	fst = 1L
217	if keyword_set(last) then fst = 0L
218	if keyword_set(first) then fst = 1L
219
220	;; Check the operation
221	sz = size(op0)
222	if sz[sz[0]+1] NE 7 then begin
223	OP_ERR:
224	message, "ERROR: OP must be 'AND', 'OR' or 'XOR'"
225	return, -1L
226	endif
227	op = strupcase(op0)
228	if op NE 'AND' AND op NE 'OR' AND op NE 'XOR' then goto, OP_ERR
229
230	;; Check NOT1 and NOT2
231	if keyword_set(not1) AND keyword_set(not2) then begin
232	message, "ERROR: NOT1 and NOT2 cannot be set simultaneously"
233	return, -1L
234	endif
235	if (keyword_set(not1) OR keyword_set(not2)) AND $
236	(op EQ 'OR' OR op EQ 'XOR') then begin
237	message, "ERROR: NOT1 and NOT2 cannot be set with 'OR' or 'XOR'"
238	return, -1L
239	endif
240
241	;; Special cases for empty set
242	n1 = n_elements(a) & n2 = n_elements(b)
243	if keyword_set(empty1) then n1 = 0L
244	if keyword_set(empty2) then n2 = 0L
245	if n1 EQ 0 OR n2 EQ 0 then begin
246	;; Eliminate duplicates
247	if n1 GT 0 then a1 = cmset_op_uniq(a)
248	if n2 GT 0 then b1 = cmset_op_uniq(b)
249	n1 = n_elements(a1) < n1 & n2 = n_elements(b1) < n2
250	case op of
251	'OR': if n1 EQ 0 then goto, RET_A1 else goto, RET_B1
252	'XOR': if n1 EQ 0 then goto, RET_B1 else goto, RET_A1
253	'AND': begin
254	if keyword_set(not1) AND n1 EQ 0 then goto, RET_B1
255	if keyword_set(not2) AND n2 EQ 0 then goto, RET_A1
256	return, -1L
257	end
258	endcase
259	return, -1L
260	RET_A1:
261	count = n1
262	if kind then begin
263	if count GT 0 then return, a1 else return, -1L
264	endif
265	if count GT 0 then return, a[a1] else return, -1L
266	RET_B1:
267	count = n2
268	if kind then begin
269	if count GT 0 then return, b1+n1 else return, -1L
270	endif
271	if count GT 0 then return, b[b1] else return, -1L
272	endif
273
274	;; Allow data to have different types, but they must be at least of
275	;; the same "base" type. That is, you can't combine a number with a
276	;; string, etc.
277	;; basetype 0:undefined 1:real number 6:complex number 7:string
278	;; 8:structure 10:pointer 11:object
279
280	;; 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
281	basetype = [0, 1, 1, 1, 1, 1, 6, 7, 8, 6,10,11, 1, 1, 1, 1]
282
283	;; Check types of operands
284	sz1 = size(a) & tp1 = sz1[sz1[0]+1]
285	sz2 = size(b) & tp2 = sz2[sz2[0]+1]
286	if tp1 LT 0 OR tp1 GE 16 OR tp2 LT 0 OR tp2 GE 16 then begin
287	message, 'ERROR: unrecognized data types for operands'
288	return, -1
289	endif
290	if basetype[tp1] NE basetype[tp2] then begin
291	TYPE1_ERR:
292	message, 'ERROR: both A and B must be of the same type'
293	return, -1L
294	endif
295	if tp1 EQ 8 OR tp1 EQ 10 OR tp1 EQ 11 then begin
296	TYPE2_ERR:
297	message, 'ERROR: operands must be a numeric or string type'
298	return, -1L
299	endif
300
301	;; Now use two different kinds of algorithms: a slower but more
302	;; general algorithm for generic types, and the histogram technique
303	;; for integer types. Even for integer types, if there is too much
304	;; dynamic range, then the slow method is used.
305
306	if tp1 GE 4 AND tp1 LE 9 then begin
307	;; String and real types, or large int arrays
308	SLOW_SET_OP:
309	case op of
310	'OR': begin
311	uu = [a,b] ;; OR is simple; just take unique values
312	index0 = cmset_op_uniq(uu)
313	count = n_elements(index0)
314	if kind then return, index0
315	return, uu[index0]
316	end
317
318	'XOR': begin
319	;; Make ordered list of set union
320	ai = cmset_op_uniq(a) & na = n_elements(ai)
321	bi = cmset_op_uniq(b) & nb = n_elements(bi)
322	ui = [ai, bi+n1]
323	uu = [a,b] & uu = uu[ui] ;; Raw union...
324	us = sort(uu) & uu = uu[us] ;; ...and sort
325	if kind then ui = ui[temporary(us)] else ui = 0
326
327	;; Values in one set only will not have duplicates
328	wh1 = where(uu NE shift(uu, -1), count1)
329	if count1 EQ 0 then return, -1L
330	wh = where(wh1[1:*]-wh1 EQ 1, count)
331	if wh1[0] EQ 0 then begin
332	if count GT 0 then wh = [-1L, wh] else wh = [-1L]
333	count = n_elements(wh)
334	endif
335	if count EQ 0 then return, -1
336	if kind then return, ui[wh1[wh+1]]
337	return, uu[wh1[wh+1]]
338	end
339
340	'AND': begin
341	;; Make ordered list of set union
342	ai = cmset_op_uniq(a) & na = n_elements(ai)
343	bi = cmset_op_uniq(b) & nb = n_elements(bi)
344	ui = [ai, bi+n1]
345	uu = [a,b] & uu = uu[ui] ;; Raw union...
346	us = sort(uu) & uu = uu[us] ;; ...and sort
347	if kind then ui = ui[us] else ui = 0
348
349	if NOT keyword_set(not1) AND NOT keyword_set(not2) then begin
350
351	;; Special case: if there are one in each set, and
352	;; they are equal, then the SHIFT() technique below
353	;; fails. Do this one by hand.
354	if na EQ 1 AND nb EQ 1 AND uu[0] EQ uu[1] then begin
355	count = 1L
356	if kind then return, 0L
357	return, [uu[0]]
358	endif
359
360	;; If neither "NOT" is set, then find duplicates
361	us = 0L ;; Save memory
362	wh = where(uu EQ shift(uu, -1L), count) ;; Find non-unique
363	if count EQ 0 then return, -1L
364	;; This should always select the element from A
365	;; rather than B (the smaller of the two)
366	if kind then return, (ui[wh] < ui[wh+1])
367	return, uu[wh]
368	endif
369
370	;; For "NOT" cases, we need to identify by set
371	ii = make_array(na+nb, value=1b)
372	if keyword_set(not1) then ii[0:na-1] = 0
373	if keyword_set(not2) then ii[na:*] = 0
374	ii = ii[temporary(us)]
375
376	;; Remove any duplicates
377	wh1 = where(uu EQ shift(uu, -1L), count1) ;; Find non-unique
378	if count1 GT 0 then ii[wh1, wh1+1] = 0
379	;; Remainder is the desired set
380	wh = where(ii, count)
381	if count EQ 0 then return, -1L
382	if kind then return, ui[wh]
383	return, uu[wh]
384	end
385
386	endcase
387	return, -1L ;; DEFAULT CASE
388	endif else begin
389
390	;; INDEX keyword forces the "slow" operation
391	if kind then goto, SLOW_SET_OP
392
393	;; Integer types - use histogram technique if the data range
394	;; is small enough, otherwise use the "slow" technique above
395	min1 = min(a, max=max1) & min2 = min(b, max=max2)
396	minn = min1 < min2 & maxx = max1 > max2
397	nbins = maxx-minn+1
398	if (maxx-minn) GT floor(ma[0]) then goto, SLOW_SET_OP
399
400	;; Work around a stupidity in the built-in IDL HISTOGRAM routine
401	if (tp1 EQ 2 OR tp2 EQ 2) AND (minn LT -32768 OR maxx GT 32767) then $
402	goto, SLOW_SET_OP
403
404	;; Following operations create a histogram of the integer values.
405	ha = histogram(a, min=minn, max=maxx) < 1
406	hb = histogram(b, min=minn, max=maxx) < 1
407
408	;; Compute NOT cases
409	if keyword_set(not1) then ha = 1b - ha
410	if keyword_set(not2) then hb = 1b - hb
411	case op of
412	;; Boolean operations
413	'AND': mask = temporary(ha) AND temporary(hb)
414	'OR': mask = temporary(ha) OR temporary(hb)
415	'XOR': mask = temporary(ha) XOR temporary(hb)
416	endcase
417
418	wh = where(temporary(mask), count)
419	if count EQ 0 then return, -1L
420
421	result = temporary(wh+minn)
422	if tp1 NE tp2 then return, result
423	szr = size(result) & tpr = szr[szr[0]+1]
424
425	;; Cast to the original type if necessary
426	if tpr NE tp1 then begin
427	fresult = make_array(n_elements(result), type=tp1)
428	fresult[0] = temporary(result)
429	result = temporary(fresult)
430	endif
431
432	return, result
433
434	endelse
435
436	return, -1L ;; DEFAULT CASE
437	end
438
439	; Here is how I did the INDEX stuff with fast histogramming. It
440	; works, but is complicated, so I forced it to go to SLOW_SET_OP.
441	; ha = histogram(a, min=minn, max=maxx, reverse=ra) < 1
442	; rr = ra[0:nbins] & mask = rr NE rr[1:] & ra = ra[rr]mask-1L+mask
443	; hb = histogram(b, min=minn, max=maxx, reverse=rb) < 1
444	; rr = rb[0:nbins] & mask = rr NE rr[1:] & rb = rb[rr]mask-1L+mask
445	; ... AND/OR/XOR NOT masking here ...
446	; ra = ra[wh] & rb = rb[wh]
447	; return, ra(ra GE 0) + (rb+n1)(ra LT 0) ;; is last 'ra' right?
448

Note: See TracBrowser for help on using the repository browser.

Download in other formats:

Original Format