python function - sequence preprocessing with pad_sequences

Article Directory

- 0. Preface
- 1. Syntax
  - 1.1 Parameter description
  - 1.2 Return value
- 2. Example


0. Preface

For simplicity of implementation, Keras only accepts sequence inputs that all have the same length, because a batch of sequences has to be stored as one rectangular array. If your sequences have different lengths, use pad_sequences(): it pads (and, if necessary, truncates) every sequence so that all of them end up with the same length.
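To make "the same length" concrete, here is a minimal pure-Python sketch (not the Keras implementation) that front-pads a ragged list with 0 up to the length of its longest sequence, which is what pad_sequences() does by default when maxlen is None.

```python
# Ragged token-id sequences with lengths 3, 5 and 1.
sequences = [[2, 3, 4], [1, 2, 3, 4, 5], [7]]

# Pad every sequence at the front ("pre") with 0 up to the longest length.
longest = max(len(seq) for seq in sequences)
padded = [[0] * (longest - len(seq)) + seq for seq in sequences]

for row in padded:
    print(row)
# [0, 0, 2, 3, 4]
# [1, 2, 3, 4, 5]
# [0, 0, 0, 0, 7]
```

After padding, every row has five elements, so the data forms a rectangle that can be turned into a single 2-D array.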

1. Syntax

The official syntax is as follows¹:
Code.1.1 pad_sequences syntax

keras.preprocessing.sequence.pad_sequences(sequences, 
	maxlen=None,
	dtype='int32',
	padding='pre',
	truncating='pre', 
	value=0.)

1.1 Parameter description

sequences: A list of sequences (a two-level nested list), where each sequence is a list of integers or floats.
maxlen: None or an integer, the maximum length of each sequence. Sequences longer than maxlen are truncated and shorter ones are padded (at the front by default, see padding). If None, all sequences are padded to the length of the longest sequence.
dtype: The data type of the returned NumPy array (default 'int32').
padding: 'pre' or 'post' (default 'pre'), determines whether the padding value is added at the beginning or at the end of each sequence.
truncating: 'pre' or 'post' (default 'pre'), determines whether values are removed from the beginning or from the end of sequences longer than maxlen.
value: Float, the value used for padding (default 0).
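The interaction of maxlen, padding, truncating and value can be sketched in plain Python. The helper name pad_sequences_sketch is hypothetical and only approximates the documented behaviour: it returns a list of lists rather than a NumPy array and ignores dtype.

```python
def pad_sequences_sketch(sequences, maxlen=None, padding="pre",
                         truncating="pre", value=0):
    """Hypothetical pure-Python approximation of the parameters above.

    Returns a list of lists instead of a NumPy array and ignores dtype.
    """
    if padding not in ("pre", "post") or truncating not in ("pre", "post"):
        raise ValueError("padding and truncating must be 'pre' or 'post'")
    if maxlen is None:
        # maxlen=None: pad everything to the longest sequence.
        maxlen = max((len(s) for s in sequences), default=0)
    result = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # truncating='pre' keeps the last maxlen items, 'post' the first.
            seq = seq[len(seq) - maxlen:] if truncating == "pre" else seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        # padding='pre' puts the fill values in front, 'post' after.
        result.append(fill + seq if padding == "pre" else seq + fill)
    return result


seqs = [[1, 2, 3, 4, 5], [6, 7]]
print(pad_sequences_sketch(seqs, maxlen=3))
# [[3, 4, 5], [0, 6, 7]]
print(pad_sequences_sketch(seqs, maxlen=3, padding="post",
                           truncating="post", value=-1))
# [[1, 2, 3], [6, 7, -1]]
print(pad_sequences_sketch(seqs))
# [[1, 2, 3, 4, 5], [0, 0, 0, 6, 7]]
```

Note that padding and truncating are independent: one controls where fill values go, the other which end of an overlong sequence is dropped.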

1.2 Return value

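Returns a 2-dimensional NumPy array of shape (len(sequences), maxlen). If maxlen is None, the second dimension is the length of the longest sequence.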
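One way to see where that shape comes from is to build the array by hand with NumPy: allocate a (number of sequences, maxlen) array filled with the padding value, then copy each sequence into the end of its row. This is a sketch of the default 'pre' behaviour under that assumption, not necessarily how Keras builds the array internally.

```python
import numpy as np

sequences = [[2, 3, 4], [1, 2, 3, 4, 5]]
maxlen = 10

# One row per sequence, every cell initialised to the padding value 0.
out = np.full((len(sequences), maxlen), 0, dtype="int32")
for i, seq in enumerate(sequences):
    tail = seq[-maxlen:]                 # 'pre' truncation keeps the last maxlen items
    out[i, maxlen - len(tail):] = tail   # 'pre' padding: values sit at the end of the row

print(out.shape)  # (2, 10)
print(out.dtype)  # int32
```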

2. Example

Code.2.1 Simple Example

>>> import keras
>>> list_1 = [[2,3,4]]
>>> keras.preprocessing.sequence.pad_sequences(list_1, maxlen=10)
array([[0, 0, 0, 0, 0, 0, 0, 2, 3, 4]], dtype=int32)

>>> list_2 = [[1,2,3,4,5]]
>>> keras.preprocessing.sequence.pad_sequences(list_2, maxlen=10)
array([[0, 0, 0, 0, 0, 1, 2, 3, 4, 5]], dtype=int32)
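Both calls above only pad. When a sequence is longer than maxlen, the truncating parameter decides which end is cut off. The plain slicing below mirrors the two options; by the documented semantics, pad_sequences([[1, 2, 3, 4, 5]], maxlen=3) should keep [3, 4, 5] with the default truncating='pre' and [1, 2, 3] with truncating='post'.

```python
seq = [1, 2, 3, 4, 5]
maxlen = 3

# truncating='pre' (default): drop values from the beginning, keep the tail.
pre = seq[len(seq) - maxlen:]
# truncating='post': drop values from the end, keep the head.
post = seq[:maxlen]

print(pre)   # [3, 4, 5]
print(post)  # [1, 2, 3]
```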

In natural language processing, pad_sequences is usually used together with word tokenization, and its effect is also shown in the tokenizer notes. See the original article: python function - Keras word tokenizer Tokenizer

Code.2.2 Common Examples

>>> # tokenizer is a Keras Tokenizer fitted beforehand (see the Tokenizer article)
>>> tokenizer.texts_to_sequences(["It rains, I work overtime"])
[[4, 5, 6, 7]]

>>> keras.preprocessing.sequence.pad_sequences(tokenizer.texts_to_sequences(["It rains, I work overtime"]), maxlen=20)
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5, 6, 7]], dtype=int32)
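Because tokenizer is not defined in the snippet above, here is a self-contained sketch of the same pipeline. The vocabulary word_index and the helpers text_to_ids and pad_pre are hypothetical: the tokenization (lowercase, strip punctuation, split on whitespace, skip unknown words) only roughly imitates a default Keras Tokenizer, and the ids do not correspond to the original example.

```python
import string

# Hypothetical vocabulary for illustration only; a fitted Keras Tokenizer
# builds its own word_index from the training texts.
word_index = {"it": 1, "rains": 2, "i": 3, "work": 4}


def text_to_ids(text):
    """Lowercase, strip punctuation, split on whitespace, look up ids.
    Words missing from the vocabulary are skipped."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word_index[w] for w in cleaned.split() if w in word_index]


def pad_pre(seq, maxlen, value=0):
    """Front-truncate and front-pad one sequence to exactly maxlen items."""
    seq = seq[max(len(seq) - maxlen, 0):]
    return [value] * (maxlen - len(seq)) + seq


ids = text_to_ids("It rains, I work overtime")
print(ids)               # [1, 2, 3, 4]  ("overtime" is not in the vocabulary)
print(pad_pre(ids, 20))  # sixteen 0s followed by 1, 2, 3, 4
```

As in the Keras output above, the token ids end up right-aligned at the end of a length-20 row.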

¹ /en/latest/preprocessing/sequence/