Let's try computing distances with Spark. Assume the data is a list of tuples of the form (key, [v1, v2, ..., vn]).
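To make the setup concrete, here is a minimal local sketch of the pairwise distance computation on data of that shape. The sample records, the function names `euclidean` and `pairwise_distances`, and the choice of Euclidean distance are all assumptions for illustration; the note does not say which metric was used or how `J` was built. The commented lines show roughly how the same idea might be expressed with an RDD.

```python
import math
from itertools import combinations

# Hypothetical sample data in the (key, [v1, ..., vn]) shape described above.
data = [
    ("a", [0.0, 0.0]),
    ("b", [3.0, 4.0]),
    ("c", [6.0, 8.0]),
]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pairwise_distances(records):
    """Return [((key1, key2), distance)] for every unordered pair of records."""
    return [((k1, k2), euclidean(v1, v2))
            for (k1, v1), (k2, v2) in combinations(records, 2)]

distances = pairwise_distances(data)

# With a SparkContext `sc`, roughly the same computation could be written as:
#   rdd = sc.parallelize(data)
#   J = (rdd.cartesian(rdd)
#           .filter(lambda p: p[0][0] < p[1][0])   # keep each pair once
#           .map(lambda p: euclidean(p[0][1], p[1][1])))
#   Q = J.histogram(50)
```

Without the `filter` step, `cartesian` would yield every pair in both orders plus each record paired with itself, so whether to keep it depends on what the histogram is meant to count.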
Q=J.histogram(50)

In [17]: ?Q
Type: tuple
String form: ([0.056793451016426098, 6.482786133784419, 12.908778816552413, 19.334771499320404, 25.76076418208 <...> 406, 27434, 17670, 11478, 7324, 4280, 2550, 2038, 1316, 346, 328, 210, 34, 10, 8, 6, 8, 2, 0, 2])
Length: 2
Docstring:
tuple() -> empty tuple
tuple(iterable) -> tuple initialized from iterable's items

If the argument is a tuple, the return value is the same object.

In [18]: print Q[0]
[0.056793451016426098, 6.482786133784419, 12.908778816552413, 19.334771499320404, 25.760764182088398, 32.186756864856392, 38.612749547624382, 45.03874223039238, 51.46473491316037, 57.890727595928361, 64.316720278696366, 70.742712961464363, 77.168705644232347, 83.594698327000344, 90.020691009768342, 96.446683692536325, 102.87267637530432, 109.29866905807232, 115.7246617408403, 122.1506544236083, 128.5766471063763, 135.0026397891443, 141.42863247191229, 147.85462515468026, 154.28061783744826, 160.70661052021626, 167.13260320298426, 173.55859588575225, 179.98458856852025, 186.41058125128822, 192.83657393405622, 199.26256661682422, 205.68855929959221, 212.11455198236021, 218.54054466512821, 224.96653734789621, 231.39253003066418, 237.81852271343217, 244.24451539620017, 250.67050807896817, 257.09650076173614, 263.52249344450411, 269.94848612727213, 276.3744788100401, 282.80047149280813, 289.2264641755761, 295.65245685834407, 302.07844954111209, 308.50444222388006, 314.93043490664809, 321.35642758941606]

In [19]: print Q[1]
[3246, 4956, 6868, 11482, 24108, 53192, 114492, 228950, 413436, 661572, 948358, 1224238, 1446794, 1585532, 1642804, 1615678, 1519618, 1365182, 1186784, 1000046, 829116, 666752, 533750, 419492, 321666, 247242, 183192, 135116, 95748, 66502, 44406, 27434, 17670, 11478, 7324, 4280, 2550, 2038, 1316, 346, 328, 210, 34, 10, 8, 6, 8, 2, 0, 2]
I can't think of where to go from here, though.
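One possible direction, offered only as a sketch: the tuple `Q` holds the bucket edges in `Q[0]` and the per-bucket counts in `Q[1]`, so it can be summarized without touching the full data again. The helper name `summarize_histogram` and the toy numbers below are hypothetical, and the mean is only an approximation based on bucket midpoints.

```python
def summarize_histogram(edges, counts):
    """Summarize an (edges, counts) pair shaped like the tuple Q above."""
    if len(edges) != len(counts) + 1:
        raise ValueError("expected one more edge than counts")
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    # Midpoint of each bucket, used as a stand-in for the values inside it.
    centers = [(lo + hi) / 2.0 for lo, hi in zip(edges, edges[1:])]
    # Index of the bucket with the largest count.
    peak = max(range(len(counts)), key=lambda i: counts[i])
    approx_mean = sum(c * n for c, n in zip(centers, counts)) / float(total)
    return {
        "total": total,
        "peak_bin": (edges[peak], edges[peak + 1]),
        "approx_mean": approx_mean,
    }

# Hypothetical toy histogram, not the data from the session above.
edges = [0.0, 1.0, 2.0, 3.0]
counts = [1, 3, 2]
summary = summarize_histogram(edges, counts)
```

Called as `summarize_histogram(Q[0], Q[1])`, the peak bucket would be the one with count 1642804, which spans roughly 90.0 to 96.4 in the output above.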