Arrays
======

Arrays are simple lists of values.  Arrays are zero-indexed, but internally
there is an offset that is affected by `shift` and `unshift` calls.

Array keys are non-negative integers.
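
The key rule can be checked with the same `case` pattern that the libeshtrans
functions use.  The sketch below is only an illustration: `is_array_key` is a
hypothetical helper, not part of libeshtrans, and it additionally rejects the
empty string.

```sh
# is_array_key is a hypothetical helper, not part of libeshtrans.
is_array_key()
{
	case ${1} in
		''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
	esac
	return 0
}

for key in 0 12 -1 1a ''; do
	if is_array_key "${key}"; then
		printf 'ok:  "%s"\n' "${key}"
	else
		printf 'bad: "%s"\n' "${key}"
	fi
done
```

Only `0` and `12` pass; a sign, a letter, or an empty key is rejected.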

An array's length is limited only by the largest integer that the shell's
built-in arithmetic can represent.  An array grows through calls to `push` and
`unshift` as well as through assignments to out-of-bounds keys.  An array
shrinks through calls to `pop` and `shift` as well as when its 0th or last
value is unset.
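
The growth rule can be sketched in plain POSIX shell, assuming the flat layout
used by the libeshtrans functions: values in `<ref>__<key>` and the length in
`<ref>_l`.  The reference name `demo` is hypothetical, and offsets are ignored
here.

```sh
# Hypothetical array reference; libeshtrans would obtain one from its allocator.
a=demo

# Build a two-element array by hand: two values plus a length counter.
eval "${a}__0=zero ${a}__1=one ${a}_l=2"

# Assigning to key 4 (out of bounds) grows the array to length 5;
# keys 2 and 3 stay unset.
k=4
eval "${a}__${k}=four"
eval "l=\${${a}_l}"
if [ "${k}" -ge "${l}" ]; then
	eval "${a}_l=\$((k + 1))"
fi

eval "printf 'length: %s\n' \"\${${a}_l}\""
```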

Array Expansion
---------------

Array expansion initializes an array.  The format for array expansion is as
follows:

	@(values)

where *values* is zero or more *word*s.  A reference to the new array is
substituted.

eshtrans will translate array expansions into calls to `__a_new` (part of
libeshtrans) with array references returned in registers allocated at compile
time.

For example:

	cardinals=@("zero" "one" "two")

compiles to:

	__a_new __r0 "zero" "one" "two"
	cardinals=${__r0}

while:

	cmd @("zero" "one" "two")

compiles to:

	__a_new __r0 "zero" "one" "two"
	cmd ${__r0}

and:

	cardinals=@(@("zero" "cero") @("one" "uno") @("two" "dos"))

compiles to:

	__a_new __r0 "zero" "cero"
	__a_new __r1 "one" "uno"
	__a_new __r2 "two" "dos"
	__a_new __r3 ${__r0} ${__r1} ${__r2}
	cardinals=${__r3}

Quoting the `@(` operator removes its special meaning.
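
To make the nested case concrete, here is a runnable sketch of the compiled
output.  It assumes a trivial allocator: the counter `__heap_n` and the `__aN`
reference names are hypothetical stand-ins for `__malloc`, which this document
does not define.

```sh
__heap_n=0    # hypothetical allocation counter standing in for __malloc

__a_new()
{
	__r=${1}
	shift 1

	# Hand out a fresh reference name: __a0, __a1, ...
	__a="__a${__heap_n}"
	__heap_n=$((__heap_n + 1))

	# Store each value under <ref>__<key>, then record the length.
	__k=0
	for __v in "${@}"; do
		eval "${__a}__${__k}=\${__v}"
		__k=$((__k + 1))
	done
	eval "${__a}_l=\${__k}"

	# Return the reference in the requested register.
	eval "${__r}=\${__a}"
}

__a_new __r0 "zero" "cero"
__a_new __r1 "one" "uno"
__a_new __r2 "two" "dos"
__a_new __r3 ${__r0} ${__r1} ${__r2}
cardinals=${__r3}

# Follow the references: element 1 of cardinals is itself an array reference.
eval "inner=\${${cardinals}__1}"
eval "uno=\${${inner}__1}"
printf '%s\n' "${uno}"
```

The outer array holds only reference names, so nesting costs one extra `eval`
per level of lookup.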

Array Value Assignment
----------------------

The format for array value assignment is as follows:

	name[key]=word

eshtrans will translate array value assignments into calls to `__a_set` (part of
libeshtrans).

For example:

	cardinals[0]="zero"

compiles to:

	__a_set cardinals 0 "zero"

Array Value Expansion
---------------------

The format for array value expansion is as follows:

	${name[key]}

eshtrans will translate array value expansions into calls to `__a_get` (part of
libeshtrans).

For example:

	zero=${cardinals[0]}

compiles to:

	__a_get __r0 cardinals 0
	zero=${__r0}
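
The two compiled forms can be exercised together.  The functions below are
simplified stand-ins, not the real libeshtrans code: they skip key validation,
default a missing length or offset to 0, and use the array name directly as the
storage prefix, as the compiled calls above are written.

```sh
# Simplified stand-ins for the libeshtrans functions (assumptions only).
__a_set()
{
	# Grow the length if the key is out of bounds.
	eval "__l=\${${1}_l:-0}"
	if [ "${2}" -ge "${__l}" ]; then
		eval "${1}_l=\$((${2} + 1))"
	fi
	eval "${1}__${2}=\${3}"
}

__a_get()
{
	# Apply the offset (0 if none has been recorded), then read the slot.
	eval "__k=\$((${3} + \${${2}_o:-0}))"
	eval "${1}=\${${2}__${__k}}"
}

# cardinals[0]="zero"
__a_set cardinals 0 "zero"

# zero=${cardinals[0]}
__a_get __r0 cardinals 0
zero=${__r0}

printf '%s\n' "${zero}"
```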

`unset`
-------

`unset` is a reserved word and a function that unsets variables, functions,
array values, and arrays.

The format for unsetting an array value is as follows:

	unset name[key]

The format for unsetting an array is as follows:

	unset name

eshtrans will check the types of all of the operands of `unset` commands and
translate them into calls to `unset` (the shell's built-in utility),
`__a_unset`, or `__a_delete` as appropriate.

For example:

	unset cardinals[1] cardinals[2]

compiles to:

	__a_unset cardinals 1
	__a_unset cardinals 2

and:

	unset cardinals

compiles to:

	__a_delete cardinals
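
The bodies of `__a_unset` and `__a_delete` are not given in this document.  As
a rough sketch only, the following shows one way they could behave under the
flat-variable layout.  `demo_unset` and `demo_delete` are hypothetical names,
offsets are ignored, and only the "unset the last value" shrinking case is
handled.

```sh
# Hypothetical stand-ins; the real __a_unset and __a_delete are not shown in
# this document.  Offsets are ignored to keep the sketch short.
demo_unset()
{
	eval "__l=\${${1}_l}"
	unset "${1}__${2}"
	# Unsetting the last value shrinks the array by one.
	if [ "${2}" -eq "$((__l - 1))" ]; then
		eval "${1}_l=\$((__l - 1))"
	fi
}

demo_delete()
{
	# Remove every value slot, then the length counter.
	eval "__l=\${${1}_l}"
	__k=0
	while [ "${__k}" -lt "${__l}" ]; do
		unset "${1}__${__k}"
		__k=$((__k + 1))
	done
	unset "${1}_l"
}

cardinals__0=zero cardinals__1=one cardinals__2=two cardinals_l=3

demo_unset cardinals 1
demo_unset cardinals 2
printf 'length after unsets: %s\n' "${cardinals_l}"

demo_delete cardinals
printf 'length after delete: %s\n' "${cardinals_l-unset}"
```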

libeshtrans Functions
---------------------

	# __a_new(__r [__v ...])
	# Parameters:
	#   * __r: Register in which to store array reference
	#   * __v: Zero or more initial values
	__a_new()
	{
		__r=${1}
		shift 1
	
		# Allocate a slot on the array heap.
		__malloc __a
		__a=${__mr}
	
		# Set values.
		__k=0
		for __v in "${@}"; do
			eval "${__a}__${__k}=\${__v}"
			__k=$((${__k} + 1))
		done
	
		# Set length and initial offset.
		eval "${__a}_l=\${__k}"
		eval "${__a}_o=0"
	
		# Store the array reference in the specified return register.
		eval "${__r}=\${__a}"
	}
	
	# __a_set(__a __k __v)
	# Parameters:
	#   * __a: Array
	#   * __k: Key
	#   * __v: Value
	__a_set()
	{
		__a=${1}
		__k=${2}
		__v=${3}
	
		# Validate key.
		case ${__k} in *[!0-9]*)
			printf 'Error: Illegal array key: %s\n' "${__k}" 1>&2
			exit 1
		;; esac
	
		# Enlarge array if key is out-of-bounds.
		eval "__l=\${${__a}_l}"
		if [ "${__k}" -ge "${__l}" ]; then
			eval "${__a}_l=\$((\${__k} + 1))"
		fi
	
		# Offset key, as `__a_get` and `push` do.
		eval "__k=\$((\${__k} + \${${__a}_o}))"
	
		# Set value.
		eval "${__a}__${__k}=\${__v}"
	}
	
	# __a_get(__r __a __k)
	# Parameters:
	#   * __r: Register in which to store value
	#   * __a: Array
	#   * __k: Key
	__a_get()
	{
		__r=${1}
		__a=${2}
		__k=${3}
	
		# Validate key.
		case ${__k} in *[!0-9]*)
			printf 'Error: Illegal array key: %s\n' "${__k}" 1>&2
			exit 1
		;; esac
	
		# Offset key.
		eval "__k=\$((\${__k} + \${${__a}_o}))"
	
		# Store the value in the specified return register.
		# NB: If `set -u` is on, the shell will throw an error on undefined values
		# or out-of-bound keys, as one would expect.
		eval "${__r}=\${${__a}__${__k}}"
	}
	
	# __a_unset(__a __k)
	# Parameters:
	#   * __a: Array
	#   * __k: Key
	__a_unset()
	{
		__a=${1}
		__k=${2}
	
		# ...
	}
	
	# __a_delete(__a)
	# Parameters:
	#   * __a: Array
	__a_delete()
	{
		__a=${1}
	
		# ...
	}

libesh Functions
----------------

	# length(a)
	# Parameters:
	#   * a: Array
	length(a)
	{
		l=
	
		eval "${&l}=\${${a}_l}"
	
		return "${l}"
	}
	
	# push(a v)
	# Parameters:
	#   * a: Array
	#   * v: Value
	push(a v)
	{
		l=
		k=
	
		# Get the current length.
		eval "${&l}=\${${a}_l}"
		# Offset the length to get the new key.
		eval "${&k}=\$((${l} + \${${a}_o}))"
		# Set the new value.
		eval "${a}__${k}=\${${&v}}"
		# Increment the length.
		eval "${a}_l=\$((${l} + 1))"
	}
	
	# pop(a)
	# Parameters:
	#   * a: Array
	pop(a)
	{
		# ...
	}
	
	# __shift(a)
	# Parameters:
	#   * a: Array
	__shift(a)
	{
		# ...
	}
	
	# unshift(a v)
	# Parameters:
	#   * a: Array
	#   * v: Value
	unshift(a v)
	{
		# ...
	}
	
	# split(fs str)
	# Parameters:
	#   * fs: Field separator
	#   * str: String to split
	split(fs str)
	{
		# See split.sh
	}
	
	# join(str a)
	# Parameters:
	#   * str: Separator
	#   * a: Array to join
	# str will be duplicated around unset values in the middle of the array.
	join(str a)
	{
		l=
		joined=
	
		# Get the current length.
		eval "${&l}=\${${a}_l}"
	
		# Join all the array values.
		joined="$(eval "printf '%s' \"$(printf "\\\${${a}__%d}\${str}" \
			$(awk -v end=${l} \
			'BEGIN { for (i = 0; i < end; ++i) print(i); }'))\"")"
	
		return "${joined%${str}}"
	}
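
The bodies of `pop` and `__shift` are elided above.  The plain-shell sketch
below shows one way the offset could make `shift` cheap: rather than
renumbering every value, it bumps the offset so that logical key 0 maps to the
next physical slot.  `demo_shift`, `demo_pop`, and the `__ret` register are
hypothetical, and `unshift` is left out.

```sh
# demo_shift and demo_pop are hypothetical plain-sh stand-ins for the elided
# `__shift` and `pop` bodies; each leaves the removed value in ${__ret}.
demo_shift()
{
	eval "__o=\${${1}_o} __l=\${${1}_l}"
	[ "${__l}" -gt 0 ] || return 1
	eval "__ret=\${${1}__${__o}}"
	unset "${1}__${__o}"
	# The first logical key now maps to the next physical slot.
	eval "${1}_o=\$((__o + 1)) ${1}_l=\$((__l - 1))"
}

demo_pop()
{
	eval "__o=\${${1}_o} __l=\${${1}_l}"
	[ "${__l}" -gt 0 ] || return 1
	# Physical key of the last value.
	__k=$((__o + __l - 1))
	eval "__ret=\${${1}__${__k}}"
	unset "${1}__${__k}"
	eval "${1}_l=\$((__l - 1))"
}

# Hand-built array ("zero" "one" "two") with offset 0.
a__0=zero a__1=one a__2=two a_o=0 a_l=3

demo_shift a; first=${__ret}
demo_pop a;   last=${__ret}

# Logical key 0 now resolves to physical key 0 + offset.
eval "head=\${a__$((0 + a_o))}"
printf '%s %s %s (length %s)\n' "${first}" "${last}" "${head}" "${a_l}"
```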

Hashes
======

Hashes are sets of key/value pairs.

Hash keys may only consist of Latin letters, digits, and underscores.

Each hash has a list of keys, which is returned by the `keys` function.

Hash Expansion
--------------

Hash expansion initializes a hash.  The format for hash expansion is as follows:

	%(values)

where *values* is zero or more *word*s of the following format:

	key=value

A reference to the new hash is substituted.

eshtrans will translate hash expansions into calls to `__h_new` (part of
libeshtrans) with hash references returned in registers allocated at compile
time.

For example:

	numerals=%(zero=0 one=1 two=2)

compiles to:

	__h_new __r0 zero 0 one 1 two 2
	numerals=${__r0}

while:

	cmd %(zero=0 one=1 two=2)

compiles to:

	__h_new __r0 zero 0 one 1 two 2
	cmd ${__r0}

and:

	numerals=%(\
		zero=%(arabic=0 roman=N) \
		one=%(arabic=1 roman=i) \
		two=%(arabic=2 roman=ii) \
	)

compiles to:

	__h_new __r0 arabic 0 roman N
	__h_new __r1 arabic 1 roman i
	__h_new __r2 arabic 2 roman ii
	__h_new __r3 zero ${__r0} one ${__r1} two ${__r2}
	numerals=${__r3}

Quoting the `%(` operator removes its special meaning.
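
`__h_new` itself is not listed in this document.  The sketch below is one
possible shape for it, assuming the same flat layout as arrays plus a
space-separated key list in `<ref>_k`.  The counter `__heap_n`, the `__hN`
reference names, and the `_k` suffix are all hypothetical.

```sh
__heap_n=0    # hypothetical allocation counter

# Sketch of __h_new: the register name is followed by key/value pairs.
__h_new()
{
	__r=${1}
	shift 1
	__h="__h${__heap_n}"
	__heap_n=$((__heap_n + 1))

	__keys=
	while [ "${#}" -ge 2 ]; do
		eval "${__h}__${1}=\${2}"
		__keys="${__keys}${__keys:+ }${1}"
		shift 2
	done

	# Keys cannot contain spaces, so a space-separated list is safe here.
	eval "${__h}_k=\${__keys}"
	eval "${__r}=\${__h}"
}

__h_new __r0 arabic 0 roman N
__h_new __r1 arabic 1 roman i
__h_new __r2 arabic 2 roman ii
__h_new __r3 zero ${__r0} one ${__r1} two ${__r2}
numerals=${__r3}

# Look up numerals{two}{roman} and the outer hash's key list.
eval "two=\${${numerals}__two}"
eval "roman=\${${two}__roman}"
eval "keys=\${${numerals}_k}"
printf '%s (%s)\n' "${roman}" "${keys}"
```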

Hash Value Assignment
---------------------

The format for hash value assignment is as follows:

	name{key}=word

eshtrans will translate hash value assignments into calls to `__h_set` (part of
libeshtrans).

For example:

	numerals{zero}=0

compiles to:

	__h_set numerals zero 0

Hash Value Expansion
--------------------

The format for hash value expansion is as follows:

	${name{key}}

eshtrans will translate hash value expansions into calls to `__h_get` (part of
libeshtrans).

For example:

	zero=${numerals{zero}}

compiles to:

	__h_get __r0 numerals zero
	zero=${__r0}
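
As with arrays, the compiled calls can be tried with simplified stand-ins.
These are assumptions, not the real libeshtrans functions: `__h_check_key` is a
hypothetical helper that enforces the letters, digits, and underscores rule,
and it returns an error where the array functions would exit.

```sh
# Simplified stand-ins for __h_set and __h_get (assumptions only).
__h_check_key()
{
	case ${1} in ''|*[!A-Za-z0-9_]*)
		printf 'Error: Illegal hash key: %s\n' "${1}" 1>&2
		return 1
	;; esac
}

__h_set()
{
	__h_check_key "${2}" || return 1
	eval "${1}__${2}=\${3}"
}

__h_get()
{
	__h_check_key "${3}" || return 1
	eval "${1}=\${${2}__${3}}"
}

# numerals{zero}=0
__h_set numerals zero 0

# zero=${numerals{zero}}
__h_get __r0 numerals zero
zero=${__r0}
printf '%s\n' "${zero}"

# A key containing a hyphen is rejected.
__h_set numerals minus-one -1 2>/dev/null || printf 'rejected\n'
```

Restricting keys to identifier characters is what lets a key be spliced
straight into a shell variable name.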

`ENV` Hash
----------

Environment variables can be accessed through the `ENV` hash.  This circumvents
identifier mangling.

Environment variables can be expanded as in the following example:

	IFS=:
	for dir in ${ENV{PATH}}; do
		: ${dir:=.}
		if [ -x "${dir}/${cmd}" ]; then
			printf '%s\n' "${dir}/${cmd}"
			break
		fi
	done

Environment variables can be set as in the following example:

	ENV{LC_ALL}=C
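
For comparison, here is the lookup loop in plain POSIX shell.  It assumes that
`${ENV{PATH}}` simply stands for the unmangled `${PATH}`; how eshtrans actually
translates `ENV` accesses is not specified here.  `find_cmd` is a hypothetical
wrapper that runs in a subshell so the `IFS` change does not leak.

```sh
# Hypothetical helper: print the first executable named ${1} found on PATH.
# The body is a subshell, so IFS and `set -f` do not affect the caller.
find_cmd()
(
	cmd=${1}
	IFS=:
	set -f                      # do not glob-expand PATH entries
	for dir in ${PATH}; do
		: "${dir:=.}"           # an empty entry means the current directory
		if [ -x "${dir}/${cmd}" ] && [ ! -d "${dir}/${cmd}" ]; then
			printf '%s\n' "${dir}/${cmd}"
			exit 0
		fi
	done
	exit 1
)

find_cmd sh
```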

TODO
====

  * Array function `each` (maintains an iterator)
  * Hash value functions: `length`, `keys`, `values`, `each`
    - `keys` `join`s the keys array with `${US}`, then returns the result?
      + Maybe sets `IFS="${US}"`, uses the shell's field splitting to remove
        `${US}`, and returns a string (bad: space-delimited string of keys that
        may contain spaces)
      + Or just return the array?
    - `each` iterates over the keys array (skipping unset keys) and returns a
      key and value, delimited somehow (or, a new array?)
  * Unsetting hash values and hashes
    - Will need to unset keys array value
    - Maintain a key ID (or key key) for each hash key (i.e. the key in the keys
      array)
    - Keys array could become large and sparse with lots of unsetting and
      setting; should be heap instead?
  * As above, consider building the hash structure on a heap built on an array
  * Reference type checking in functions