
python function - sequence preprocessing: pad_sequences (sequence padding)

Contents

    • 0. Preface
    • 1. Syntax
      • 1.1 Parameter description
      • 1.2 Return value
    • 2. Example




0. Preface

For simplicity of implementation, Keras only accepts batches of sequences that all have the same length, because a batch has to fit into one rectangular array. If your sequences have different lengths, use pad_sequences(): it pads the shorter sequences and truncates the longer ones so that every sequence in the result has the same length.
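To see what "the same length" means in practice, here is a minimal pure-Python sketch (no Keras involved) that pads a small ragged batch with leading zeros up to the length of its longest sequence. This mirrors what pad_sequences does with its default settings when maxlen is not given; the variable names are only illustrative.

```python
# Two sequences of different lengths (a "ragged" batch).
batch = [[2, 3, 4], [1, 2, 3, 4, 5]]

# Target length: the longest sequence in the batch.
maxlen = max(len(seq) for seq in batch)

# Prepend zeros so that every sequence reaches maxlen.
padded = [[0] * (maxlen - len(seq)) + seq for seq in batch]

print(padded)  # [[0, 0, 2, 3, 4], [1, 2, 3, 4, 5]]
```

After padding, every row has length 5, so the batch can be stored as a single 2 x 5 array.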

1. Syntax

The official syntax is as follows [1]:
Code.1.1 pad_sequences syntax

keras.preprocessing.sequence.pad_sequences(sequences, 
	maxlen=None,
	dtype='int32',
	padding='pre',
	truncating='pre', 
	value=0.)

1.1 Parameter description

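  • sequences: a list of sequences, where each sequence is a list of integers or floats (i.e. a two-level nested list)
  • maxlen: None or integer, the length of every sequence in the result. If None, all sequences are padded to the length of the longest one. Sequences longer than maxlen are truncated (see truncating), and shorter ones are padded with value (see padding; by default the padding goes at the beginning).
  • dtype: the data type of the returned NumPy array (default 'int32'); use a float type such as 'float32' if your sequences contain floats
  • padding: 'pre' or 'post', determines whether the padding values are added at the beginning ('pre', the default) or at the end ('post') of each sequence
  • truncating: 'pre' or 'post', determines whether values are removed from the beginning ('pre', the default) or from the end ('post') of sequences longer than maxlen
  • value: float, the value used for padding (default 0.)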
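To make the four parameters concrete, here is a minimal pure-Python sketch of the behaviour described above for flat lists of numbers. The function name pad_sequences_sketch is hypothetical: it is not the Keras implementation, it returns nested lists instead of a NumPy array, and it ignores dtype.

```python
def pad_sequences_sketch(sequences, maxlen=None, padding="pre",
                         truncating="pre", value=0):
    """Hypothetical re-creation of pad_sequences' semantics (not Keras code)."""
    if padding not in ("pre", "post"):
        raise ValueError(f"invalid padding: {padding!r}")
    if truncating not in ("pre", "post"):
        raise ValueError(f"invalid truncating: {truncating!r}")
    if maxlen is None:
        # Default: pad everything to the length of the longest sequence.
        maxlen = max((len(s) for s in sequences), default=0)

    result = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # 'pre' drops values from the start, 'post' drops them from the end.
            seq = seq[len(seq) - maxlen:] if truncating == "pre" else seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        # 'pre' puts the fill before the data, 'post' puts it after.
        result.append(fill + seq if padding == "pre" else seq + fill)
    return result


print(pad_sequences_sketch([[2, 3, 4]], maxlen=10))
# [[0, 0, 0, 0, 0, 0, 0, 2, 3, 4]]
print(pad_sequences_sketch([[1, 2, 3, 4, 5]], maxlen=3, truncating="post"))
# [[1, 2, 3]]
print(pad_sequences_sketch([[1, 2], [3, 4, 5]], padding="post", value=-1))
# [[1, 2, -1], [3, 4, 5]]
```

Truncation is done before padding, so a sequence that is exactly maxlen long passes through unchanged.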

1.2 Return value

Returns a 2-D NumPy array of shape (len(sequences), maxlen); if maxlen is None, the second dimension is the length of the longest sequence.
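A NumPy sketch of how a result with this shape can be assembled, assuming the default 'pre' settings: pre-fill an array of shape (len(sequences), maxlen) with the padding value, then copy each sequence into the right-hand end of its row. This is an illustration, not the Keras source. The last lines show why dtype matters: writing floats into an int32 array silently truncates them, so float sequences should be padded with dtype='float32'.

```python
import numpy as np

sequences = [[1, 2], [3, 4, 5], [6]]
maxlen = 4
value = 0

# One row per sequence, maxlen columns, pre-filled with the padding value.
out = np.full((len(sequences), maxlen), value, dtype="int32")
for i, seq in enumerate(sequences):
    # 'pre' truncation (a no-op here, since every sequence is short enough).
    seq = seq[len(seq) - maxlen:] if len(seq) > maxlen else seq
    # 'pre' padding: the data goes at the end of the row.
    out[i, maxlen - len(seq):] = seq

print(out.shape)     # (3, 4)
print(out.tolist())  # [[0, 0, 1, 2], [0, 3, 4, 5], [0, 0, 0, 6]]

# dtype matters: floats written into an int32 array are truncated.
as_int = np.zeros(3, dtype="int32")
as_int[:] = [0.5, 1.7, 2.2]
print(as_int.tolist())  # [0, 1, 2]
```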

2. Example

Code.2.1 Simple Example

>>> import keras
>>> list_1 = [[2, 3, 4]]
>>> keras.preprocessing.sequence.pad_sequences(list_1, maxlen=10)
array([[0, 0, 0, 0, 0, 0, 0, 2, 3, 4]], dtype=int32)

>>> list_2 = [[1, 2, 3, 4, 5]]
>>> keras.preprocessing.sequence.pad_sequences(list_2, maxlen=10)
array([[0, 0, 0, 0, 0, 1, 2, 3, 4, 5]], dtype=int32)
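Both calls above only pad. When a sequence is longer than maxlen, the truncating argument decides which part survives. The plain-Python slicing below illustrates what the two settings keep for the values of list_2 with an assumed maxlen=3; by the parameter description, Keras should return [[3, 4, 5]] with the default truncating='pre' and [[1, 2, 3]] with truncating='post'.

```python
seq = [1, 2, 3, 4, 5]  # the single sequence inside list_2
maxlen = 3             # assumed value, shorter than the sequence

# truncating='pre' (default): drop items from the start, keep the last maxlen.
kept_pre = seq[len(seq) - maxlen:]
# truncating='post': drop items from the end, keep the first maxlen.
kept_post = seq[:maxlen]

print(kept_pre)   # [3, 4, 5]
print(kept_post)  # [1, 2, 3]
```

In NLP the default 'pre' keeps the end of a long text, which is why it is worth checking which part of your inputs you actually want to keep.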

In natural language processing, pad_sequences is usually applied right after tokenization (word segmentation), and its effect was also shown in my tokenizer notes; see the earlier post python function - Keras word segmentation Tokenizer.

Code.2.2 Common Examples

Here tokenizer is a Keras Tokenizer that has already been fitted on a corpus, as in the Tokenizer post:

>>> tokenizer.texts_to_sequences(["It rains, I work overtime"])
[[4, 5, 6, 7]]

>>> keras.preprocessing.sequence.pad_sequences(tokenizer.texts_to_sequences(["It rains, I work overtime"]), maxlen=20)
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5, 6, 7]], dtype=int32)
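Padding with 0 is safe in this example because the Keras Tokenizer numbers words starting from 1, so index 0 never stands for a real word. Downstream code can therefore tell padding apart from data; for example, a Keras Embedding layer created with mask_zero=True treats 0 as padding. The pure-Python sketch below builds such a mask by hand from the padded row shown above; it only illustrates the idea and is not a Keras API.

```python
# The padded row from the pad_sequences output above.
padded_row = [0] * 16 + [4, 5, 6, 7]

# True where there is a real token, False where there is padding.
mask = [token != 0 for token in padded_row]

# Recover the original token sequence by dropping the padding positions.
tokens = [t for t, keep in zip(padded_row, mask) if keep]

print(sum(mask))  # 4
print(tokens)     # [4, 5, 6, 7]
```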

  1. /en/latest/preprocessing/sequence/