Context Navigation

cmset_op.pro @ 289

Last change on this file since 289 was 262, checked in by pinsard, 17 years ago
corrections of some headers and parameters and keywords case. change of pro2href to replace proidl
Property svn:keywords set to `Id`
File size: 16.0 KB

Line
1	;+
2	;
3	; @hidden
4	;
5	; @file_comments
6	; Simplified version of CMSET_OP_UNIQ which sorts, and takes the
7	; "first" value, whatever that may mean.
8	;
9	; @todo seb
10	;
11	;-
12	;
13	FUNCTION cmset_op_uniq, a
14	;
15	compile_opt idl2, strictarrsubs
16	;
17	if n_elements(a) LE 1 then return, 0L
18
19	ii = sort(a) & b = a[ii]
20	wh = where(b NE shift(b, +1L), ct)
21	if ct GT 0 then return, ii[wh]
22
23	return, 0L
24	;
25	end
26	;+
27	;
28	; @file_comments
29	; Performs an AND, OR, or XOR operation between two sets
30	;
31	; Description: SET_OP performs three common operations between two sets. The
32	; three supported functions of OP are:
33	;
34	; OP Meaning
35	; 'AND' - to find the intersection of A and B;
36	; 'OR' - to find the union of A and B;
37	; 'XOR' - to find the those elements who are members of A or B
38	; but not both;
39	;
40	; Sets as defined here are one dimensional arrays composed of
41	; numeric or string types. Comparisons of equality between elements
42	; are done using the IDL EQ operator.
43	;
44	; The complements of either set can be taken as well, by using the
45	; NOT1 and NOT2 keywords. For example, it may be desirable to find
46	; the elements in A but not B, or B but not A (they are different!).
47	; The following IDL expressions achieve each of those effects:
48	;
49	; SET = CMSET_OP(A, 'AND', /NOT2, B) ; A but not B
50	; SET = CMSET_OP(/NOT1, A, 'AND', B) ; B but not A
51	;
52	; Note the distinction between NOT1 and NOT2. NOT1 refers to the
53	; first set (A) and NOT2 refers to the second (B). Their ordered
54	; placement in the calling sequence is entirely optional, but the
55	; above ordering makes the logical meaning explicit.
56	;
57	; NOT1 and NOT2 can only be set for the 'AND' operator, and never
58	; simultaneously. This is because the results of an operation with
59	; 'OR' or 'XOR' and any combination of NOTs -- or with 'AND' and
60	; both NOTs -- formally cannot produce a defined result.
61	;
62	; The implementation depends on the type of operands. For integer
63	; types, a fast technique using HISTOGRAM is used. However, this
64	; algorithm becomes inefficient when the dynamic range in the data
65	; is large. For those cases, and for other data types, a technique
66	; based on SORT() is used. Thus the compute time should scale
67	; roughly as (A+B)ALOG(A+B) or better, rather than (AB) for the
68	; brute force approach. For large arrays this is a significant
69	; benefit.
70	;
71	; @categories
72	; Array
73	;
74	; @param A {in}{required}
75	; The two sets to be operated on. A one dimensional array of
76	; either numeric or string type. A and B must be of the same
77	; type. Empty sets are permitted, and are either represented
78	; as an undefined variable, or by setting EMPTY1 or EMPTY2.
79	;
80	; @param B {in}{required}
81	; See A
82	;
83	; @param OP0 {in}{required}{type=string}
84	; a string, the operation to be performed. Must be one of
85	; 'AND', 'OR' or 'XOR' (lower or mixed case is permitted).
86	; Other operations will cause an error message to be produced.
87	;
88	; @keyword NOT1
89	; If set and OP is 'AND', then the complement of A (for
90	; NOT1) or B (for NOT2) will be used in the operation.
91	; NOT1 and NOT2 cannot be set simultaneously.
92	;
93	; @keyword NOT2
94	; See NOT1
95	;
96	; @keyword EMPTY1
97	; If set, then A (for EMPTY1) or B (for EMPTY2) are
98	; assumed to be the empty set. The actual values
99	; passed as A or B are then ignored.
100	;
101	; @keyword EMPTY2
102	; See EMPTY1
103	;
104	; @keyword INDEX
105	; if set, then return a list of indexes instead of the array
106	; values themselves. The "slower" set operations are always
107	; performed in this case.
108	;
109	; The indexes refer to the combined array [A,B]. To
110	; clarify, in the following call: I = CMSET_OP(..., /INDEX);
111	; returned values from 0 to NA-1 refer to A[I], and values
112	; from NA to NA+NB-1 refer to B[I-NA].
113	;
114	; @keyword COUNT
115	; upon return, the number of elements in the result set.
116	; This is only important when the result set is the empty
117	; set, in which case COUNT is set to zero.
118	;
119	; @returns
120	; The resulting set as a one-dimensional array. The set may be
121	; represented by either an array of data values (default), or an
122	; array of indexes (if INDEX is set). Duplicate elements, if any,
123	; are removed, and element order may not be preserved.
124	;
125	; The empty set is represented as a return value of -1L, and COUNT
126	; is set to zero. Note that the only way to recognize the empty set
127	; is to examine COUNT.
128	;
129	; SEE ALSO:
130	;
131	; SET_UTILS.PRO by RSI
132	;
133	; @history
134	; Written, CM, 23 Feb 2000
135	; Added empty set capability, CM, 25 Feb 2000
136	; Documentation clarification, CM 02 Mar 2000
137	; Incompatible but more consistent reworking of EMPTY keywords, CM,
138	; 04 Mar 2000
139	; Minor documentation clarifications, CM, 26 Mar 2000
140	; Corrected bug in empty_arg special case, CM 06 Apr 2000
141	; Add INDEX keyword, CM 31 Jul 2000
142	; Clarify INDEX keyword documentation, CM 06 Sep 2000
143	; Made INDEX keyword always force SLOW_SET_OP, CM 06 Sep 2000
144	; Added CMSET_OP_UNIQ, and ability to select FIRST_UNIQUE or
145	; LAST_UNIQUE values, CM, 18 Sep 2000
146	; Removed FIRST_UNIQUE and LAST_UNIQUE, and streamlined
147	; CMSET_OP_UNIQ until problems with SORT can be understood, CM, 20
148	; Sep 2000 (thanks to Ben Tupper)
149	; Still trying to get documentation of INDEX and NOT right, CM, 28
150	; Sep 2000 (no code changes)
151	; Correct bug for AND case, when input sets A and B each only have
152	; one unique value, and the values are equal. CM, 04 Mar 2004
153	; (thanks to James B. jbattat at cfa dot harvard dot edu)
154	; Add support for the cases where the input data types are mixed,
155	; but still compatible; also, attempt to return the same data
156	; type that was passed in; CM, 05 Feb 2005
157	; Fix bug in type checking (thanks to "marit"), CM, 10 Dec 2005
158	; Work around a stupidity in the built-in IDL HISTOGRAM routine,
159	; which tries to "help" you by restricting the MIN/MAX to the
160	; range of the input variable (thanks to Will Maddox), CM, 16 Jan 2006
161	;
162	; Author: Craig B. Markwardt, NASA/GSFC Code 662, Greenbelt, MD 20770
163	; craigm\@lheamail.gsfc.nasa.gov
164	;
165	; @version
166	; $Id$
167	;
168	; @examples
169	; Utility function, similar to UNIQ, but allowing choice of taking
170	; first or last unique element, or non-unique elements.
171	; Unfortunately this doesn't work because of implementation dependent
172	; versions of the SORT() function.
173	;
174	; function cmset_op_uniq, a, first=first, non=non, count=ct, sort=sortit
175	; if n_elements(a) LE 1 then return, 0L
176	; sh = (2Lkeyword_set(first)-1L)(-2L*keyword_set(non)+1)
177	;
178	; if keyword_set(sortit) then begin
179	; ;; Sort it manually
180	; ii = sort(a) & b = a[ii]
181	; if keyword_set(non) then wh = where(b EQ shift(b, sh), ct) $
182	; else wh = where(b NE shift(b, sh), ct)
183	; if ct GT 0 then return, ii[wh]
184	; endif else begin
185	; ;; Use the user's values directly
186	; if keyword_set(non) then wh = where(a EQ shift(a, sh), ct) $
187	; else wh = where(a NE shift(a, sh), ct)
188	; if ct GT 0 then return, wh
189	; endelse
190	;
191	; if keyword_set(first) then return, 0L else return, n_elements(a)-1
192	; end
193
194	; Simplified version of CMSET_OP_UNIQ which sorts, and takes the
195	; "first" value, whatever that may mean.
196	;
197	;-
198	;
199	FUNCTION cmset_op, a, op0, b, NOT1=not1, NOT2=not2, COUNT=count, $
200	EMPTY1=empty1, EMPTY2=empty2, MAXARRAY=ma, INDEX=index
201	;
202	compile_opt idl2, strictarrsubs
203	;
204
205	on_error, 2 ;; return on error
206	count = 0L
207	index0 = -1L
208	;; Histogram technique is used for array sizes < 32,000 elements
209	if n_elements(ma) EQ 0 then ma = 32L*1024L
210
211	;; Check the number of arguments
212	if n_params() LT 3 then begin
213	ARG_ERR:
214	message, 'USAGE: SET = CMSET_OP(A, OP, B [, COUNT=ct])', /info
215	message, ' KEYWORDS: /NOT1, /NOT2, /EMPTY1, /EMPTY2, INDEX', /info
216	return, -1L
217	endif
218	if n_elements(op0) EQ 0 then goto, ARG_ERR
219	kind = keyword_set(index)
220	fst = 1L
221	if keyword_set(last) then fst = 0L
222	if keyword_set(first) then fst = 1L
223
224	;; Check the operation
225	sz = size(op0)
226	if sz[sz[0]+1] NE 7 then begin
227	OP_ERR:
228	message, "ERROR: OP must be 'AND', 'OR' or 'XOR'"
229	return, -1L
230	endif
231	op = strupcase(op0)
232	if op NE 'AND' AND op NE 'OR' AND op NE 'XOR' then goto, OP_ERR
233
234	;; Check NOT1 and NOT2
235	if keyword_set(not1) AND keyword_set(not2) then begin
236	message, "ERROR: NOT1 and NOT2 cannot be set simultaneously"
237	return, -1L
238	endif
239	if (keyword_set(not1) OR keyword_set(not2)) AND $
240	(op EQ 'OR' OR op EQ 'XOR') then begin
241	message, "ERROR: NOT1 and NOT2 cannot be set with 'OR' or 'XOR'"
242	return, -1L
243	endif
244
245	;; Special cases for empty set
246	n1 = n_elements(a) & n2 = n_elements(b)
247	if keyword_set(empty1) then n1 = 0L
248	if keyword_set(empty2) then n2 = 0L
249	if n1 EQ 0 OR n2 EQ 0 then begin
250	;; Eliminate duplicates
251	if n1 GT 0 then a1 = cmset_op_uniq(a)
252	if n2 GT 0 then b1 = cmset_op_uniq(b)
253	n1 = n_elements(a1) < n1 & n2 = n_elements(b1) < n2
254	case op of
255	'OR': if n1 EQ 0 then goto, RET_A1 else goto, RET_B1
256	'XOR': if n1 EQ 0 then goto, RET_B1 else goto, RET_A1
257	'AND': begin
258	if keyword_set(not1) AND n1 EQ 0 then goto, RET_B1
259	if keyword_set(not2) AND n2 EQ 0 then goto, RET_A1
260	return, -1L
261	end
262	endcase
263	return, -1L
264	RET_A1:
265	count = n1
266	if kind then begin
267	if count GT 0 then return, a1 else return, -1L
268	endif
269	if count GT 0 then return, a[a1] else return, -1L
270	RET_B1:
271	count = n2
272	if kind then begin
273	if count GT 0 then return, b1+n1 else return, -1L
274	endif
275	if count GT 0 then return, b[b1] else return, -1L
276	endif
277
278	;; Allow data to have different types, but they must be at least of
279	;; the same "base" type. That is, you can't combine a number with a
280	;; string, etc.
281	;; basetype 0:undefined 1:real number 6:complex number 7:string
282	;; 8:structure 10:pointer 11:object
283
284	;; 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
285	basetype = [0, 1, 1, 1, 1, 1, 6, 7, 8, 6,10,11, 1, 1, 1, 1]
286
287	;; Check types of operands
288	sz1 = size(a) & tp1 = sz1[sz1[0]+1]
289	sz2 = size(b) & tp2 = sz2[sz2[0]+1]
290	if tp1 LT 0 OR tp1 GE 16 OR tp2 LT 0 OR tp2 GE 16 then begin
291	message, 'ERROR: unrecognized data types for operands'
292	return, -1
293	endif
294	if basetype[tp1] NE basetype[tp2] then begin
295	TYPE1_ERR:
296	message, 'ERROR: both A and B must be of the same type'
297	return, -1L
298	endif
299	if tp1 EQ 8 OR tp1 EQ 10 OR tp1 EQ 11 then begin
300	TYPE2_ERR:
301	message, 'ERROR: operands must be a numeric or string type'
302	return, -1L
303	endif
304
305	;; Now use two different kinds of algorithms: a slower but more
306	;; general algorithm for generic types, and the histogram technique
307	;; for integer types. Even for integer types, if there is too much
308	;; dynamic range, then the slow method is used.
309
310	if tp1 GE 4 AND tp1 LE 9 then begin
311	;; String and real types, or large int arrays
312	SLOW_SET_OP:
313	case op of
314	'OR': begin
315	uu = [a,b] ;; OR is simple; just take unique values
316	index0 = cmset_op_uniq(uu)
317	count = n_elements(index0)
318	if kind then return, index0
319	return, uu[index0]
320	end
321
322	'XOR': begin
323	;; Make ordered list of set union
324	ai = cmset_op_uniq(a) & na = n_elements(ai)
325	bi = cmset_op_uniq(b) & nb = n_elements(bi)
326	ui = [ai, bi+n1]
327	uu = [a,b] & uu = uu[ui] ;; Raw union...
328	us = sort(uu) & uu = uu[us] ;; ...and sort
329	if kind then ui = ui[temporary(us)] else ui = 0
330
331	;; Values in one set only will not have duplicates
332	wh1 = where(uu NE shift(uu, -1), count1)
333	if count1 EQ 0 then return, -1L
334	wh = where(wh1[1:*]-wh1 EQ 1, count)
335	if wh1[0] EQ 0 then begin
336	if count GT 0 then wh = [-1L, wh] else wh = [-1L]
337	count = n_elements(wh)
338	endif
339	if count EQ 0 then return, -1
340	if kind then return, ui[wh1[wh+1]]
341	return, uu[wh1[wh+1]]
342	end
343
344	'AND': begin
345	;; Make ordered list of set union
346	ai = cmset_op_uniq(a) & na = n_elements(ai)
347	bi = cmset_op_uniq(b) & nb = n_elements(bi)
348	ui = [ai, bi+n1]
349	uu = [a,b] & uu = uu[ui] ;; Raw union...
350	us = sort(uu) & uu = uu[us] ;; ...and sort
351	if kind then ui = ui[us] else ui = 0
352
353	if NOT keyword_set(not1) AND NOT keyword_set(not2) then begin
354
355	;; Special case: if there are one in each set, and
356	;; they are equal, then the SHIFT() technique below
357	;; fails. Do this one by hand.
358	if na EQ 1 AND nb EQ 1 AND uu[0] EQ uu[1] then begin
359	count = 1L
360	if kind then return, 0L
361	return, [uu[0]]
362	endif
363
364	;; If neither "NOT" is set, then find duplicates
365	us = 0L ;; Save memory
366	wh = where(uu EQ shift(uu, -1L), count) ;; Find non-unique
367	if count EQ 0 then return, -1L
368	;; This should always select the element from A
369	;; rather than B (the smaller of the two)
370	if kind then return, (ui[wh] < ui[wh+1])
371	return, uu[wh]
372	endif
373
374	;; For "NOT" cases, we need to identify by set
375	ii = make_array(na+nb, value=1b)
376	if keyword_set(not1) then ii[0:na-1] = 0
377	if keyword_set(not2) then ii[na:*] = 0
378	ii = ii[temporary(us)]
379
380	;; Remove any duplicates
381	wh1 = where(uu EQ shift(uu, -1L), count1) ;; Find non-unique
382	if count1 GT 0 then ii[wh1, wh1+1] = 0
383	;; Remainder is the desired set
384	wh = where(ii, count)
385	if count EQ 0 then return, -1L
386	if kind then return, ui[wh]
387	return, uu[wh]
388	end
389
390	endcase
391	return, -1L ;; DEFAULT CASE
392	endif else begin
393
394	;; INDEX keyword forces the "slow" operation
395	if kind then goto, SLOW_SET_OP
396
397	;; Integer types - use histogram technique if the data range
398	;; is small enough, otherwise use the "slow" technique above
399	min1 = min(a, max=max1) & min2 = min(b, max=max2)
400	minn = min1 < min2 & maxx = max1 > max2
401	nbins = maxx-minn+1
402	if (maxx-minn) GT floor(ma[0]) then goto, SLOW_SET_OP
403
404	;; Work around a stupidity in the built-in IDL HISTOGRAM routine
405	if (tp1 EQ 2 OR tp2 EQ 2) AND (minn LT -32768 OR maxx GT 32767) then $
406	goto, SLOW_SET_OP
407
408	;; Following operations create a histogram of the integer values.
409	ha = histogram(a, min=minn, max=maxx) < 1
410	hb = histogram(b, min=minn, max=maxx) < 1
411
412	;; Compute NOT cases
413	if keyword_set(not1) then ha = 1b - ha
414	if keyword_set(not2) then hb = 1b - hb
415	case op of
416	;; Boolean operations
417	'AND': mask = temporary(ha) AND temporary(hb)
418	'OR': mask = temporary(ha) OR temporary(hb)
419	'XOR': mask = temporary(ha) XOR temporary(hb)
420	endcase
421
422	wh = where(temporary(mask), count)
423	if count EQ 0 then return, -1L
424
425	result = temporary(wh+minn)
426	if tp1 NE tp2 then return, result
427	szr = size(result) & tpr = szr[szr[0]+1]
428
429	;; Cast to the original type if necessary
430	if tpr NE tp1 then begin
431	fresult = make_array(n_elements(result), type=tp1)
432	fresult[0] = temporary(result)
433	result = temporary(fresult)
434	endif
435
436	return, result
437
438	endelse
439
440	return, -1L ;; DEFAULT CASE
441	end
442
443	; Here is how I did the INDEX stuff with fast histogramming. It
444	; works, but is complicated, so I forced it to go to SLOW_SET_OP.
445	; ha = histogram(a, min=minn, max=maxx, reverse=ra) < 1
446	; rr = ra[0:nbins] & mask = rr NE rr[1:] & ra = ra[rr]mask-1L+mask
447	; hb = histogram(b, min=minn, max=maxx, reverse=rb) < 1
448	; rr = rb[0:nbins] & mask = rr NE rr[1:] & rb = rb[rr]mask-1L+mask
449	; ... AND/OR/XOR NOT masking here ...
450	; ra = ra[wh] & rb = rb[wh]
451	; return, ra(ra GE 0) + (rb+n1)(ra LT 0) ;; is last 'ra' right?
452

Note: See TracBrowser for help on using the repository browser.

Download in other formats:

Original Format