python - Why is each frame not of equal length?


I am sampling and framing audio files so that they can be provided as input to a neural network. I am using librosa to sample the audio and frame it. The framing is important because the frames are fed as input to a neural network, which means their length has to be consistent, and that seems to be the problem with my current frames.

I am sampling and framing like this:

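import librosa

def load_sound_files(file_paths, data_input):
    raw_sounds = []
    data_output = []
    for fp in file_paths:
        # Load the audio samples (y) and the sample rate (sr)
        y, sr = librosa.load(fp)
        # Slice the signal into frames
        x = librosa.util.frame(y)
        raw_sounds.append(x)
    return raw_sounds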

Each audio file is appended to a list, and for each entry in the list there is an array containing each frame. The information in raw_sounds is stored like this:

[array([[frame],[frame],...,[frame]],dtype=float32), ...] 

I seem to have a problem with differently sized frames. Each audio file has a different length, but since I frame them all with the same settings each frame should be the same length, which is not the case according to these debug prints:

print(len(raw_sounds))
print(len(raw_sounds[0]))
print(len(raw_sounds[0][0]))
print(len(raw_sounds[0][1]))
print('\n')
print(len(raw_sounds[1]))
print(len(raw_sounds[1][0]))
print(len(raw_sounds[1][1]))

Which outputs:

270
2048
121
121

96
96

Am I setting something incorrectly, or am I doing something wrong here?

Raw sample:

[array([[ -1.58969939e-04,   2.85098387e-04,   2.57675620e-05,           5.58408792e-04,   2.09050399e-04,   3.10504751e-04,           7.08066545e-06,   6.51864902e-05,   4.64069366e-04,          -1.03915379e-04,  -2.09252365e-04,   9.58807232e-06,          -3.70743481e-04,  -2.73781188e-04,   1.47478888e-03,          -1.24523379e-02,  -1.38171474e-02,   1.42919633e-03,           2.60417676e-03,  -9.49124712e-03,   1.84055939e-02,           5.30609104e-04,  -2.02661729e-03,  -1.09214883e-03,          -2.67810683e-04,  -9.33001807e-04,   1.57146193e-02,           3.06987576e-02,  -2.89204344e-02,   8.31141882e-03,          -5.22559392e-04,   9.57424170e-04,  -1.39959985e-02,          -2.45519826e-04,   7.94889964e-03,  -2.45057382e-02,           2.76992898e-02,   2.75033060e-03,   1.91110268e-03,           2.65958859e-03,   4.22360376e-04,   2.87338579e-03,           3.60440137e-03,  -6.81304885e-03,   1.19333845e-02,           5.27647883e-03,  -8.81725773e-02,  -1.10511519e-02,           1.67427063e-02,   4.18979749e-02,  -1.76561251e-02,           1.40228057e-02,  -6.56250417e-02,   8.04386102e-04,           6.77016005e-03,   8.95334259e-02,  -3.07568144e-02,          -5.68932574e-03,   2.80798669e-03,  -1.94037147e-03,          -6.80876488e-04,  -7.51503045e-04,   1.61860569e-03,          -8.96663638e-04,   1.05839630e-03,   4.16457013e-04,          -1.14849303e-03,   2.51941121e-04,   1.09347668e-04,          -9.77083837e-05,  -9.70639754e-04,   1.23860082e-03,          -5.82281128e-03,  -7.96582922e-03,   1.05014764e-01,           8.55111331e-03,   1.02730282e-02,  -1.64158875e-03,          -9.96976532e-03,  -1.54927105e-03,  -1.33159547e-03,           2.07886100e-03,  -9.63974337e-04,   1.92957837e-03,          -9.57471970e-03,   8.37739408e-02,  -2.46925298e-02,           1.15760174e-02,   1.53850103e-02,   1.39159057e-02,           7.28045590e-04,   1.28218243e-02,   2.47708824e-03,           3.64710722e-05,   2.31177593e-03,  -3.88215925e-03,           
2.85943900e-03,   3.40921571e-03,   8.19356064e-04,           1.31994265e-03,  -4.02768754e-04,  -3.73146904e-04,          -2.45199517e-05,  -1.40402978e-03,  -4.53661755e-03,          -8.06837995e-03,  -3.07087135e-03,   5.65649476e-04,           8.99529332e-05,   9.43572959e-04,   1.52094246e-04,          -9.59860045e-04,   2.72397720e-03,   1.27405506e-02,          -9.37244575e-03,  -1.79420076e-02,   1.07235732e-02,           2.84450967e-03,   4.49513178e-03,   2.41923026e-05,          -3.13379533e-05], 

From librosa's documentation, util.frame() returns:

Returns:
    y_frames : np.ndarray [shape=(frame_length, n_frames)]
        An array of frames sampled from y: y_frames[i, j] == y[j * hop_length + i]

So it is a 2-dimensional array. raw_sounds[0] is the first sound file you loaded, and it can be addressed in 2 dimensions. Instead of len(), you should use shape to get its size:

print(raw_sounds[0].shape)

To get one frame, use the notation raw_sounds[0][:, nf], where nf is the number of the frame.
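As a rough illustration of the layout described in the documentation excerpt, the sketch below builds the same (frame_length, n_frames) array with plain NumPy rather than librosa. The helper name frame_signal, the toy signal, and the 2048/512 frame and hop sizes are assumptions chosen for the example, not values confirmed by the question.

```python
import numpy as np

def frame_signal(y, frame_length=2048, hop_length=512):
    """Hypothetical stand-in for framing: returns shape (frame_length, n_frames)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    # Row index i runs inside a frame, column index j selects the frame,
    # so that frames[i, j] == y[j * hop_length + i].
    i = np.arange(frame_length)[:, None]
    j = np.arange(n_frames)[None, :]
    return y[j * hop_length + i]

y = np.arange(10000, dtype=np.float32)  # toy "audio" signal
frames = frame_signal(y)

print(frames.shape)     # (frame_length, n_frames)
print(len(frames))      # first axis: frame_length
print(len(frames[0]))   # second axis: n_frames, varies with signal length

nf = 3
one_frame = frames[:, nf]  # a single frame: always frame_length samples
print(one_frame.shape)
```

Indexing with frames[0] selects a row (sample 0 of every frame), which is why its length tracks the number of frames, while frames[:, nf] selects a column and always has frame_length entries.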

The number you are getting when you ask for len(raw_sounds[0][0]) is the number of frames, and it depends on the size of the sound sample. So it seems to work correctly.
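If the goal is a consistent input length for the network, one possible approach (a sketch, not something the answer above prescribes) is to treat each column as one example: transpose every file's array so that frames lie along the first axis, then concatenate across files. The array sizes below reuse the 121 and 96 frame counts from the question; the random data and the name network_input are placeholders.

```python
import numpy as np

frame_length = 2048
# Stand-ins for two framed files, each shaped (frame_length, n_frames).
raw_sounds = [
    np.random.rand(frame_length, 121).astype(np.float32),
    np.random.rand(frame_length, 96).astype(np.float32),
]

# Transpose so each row is one frame, then stack the frames of all files.
network_input = np.concatenate([x.T for x in raw_sounds], axis=0)

print(network_input.shape)  # (total number of frames, frame_length)
```

Every row of network_input then has the same length regardless of how long each audio file was.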

