
Received a label value of -2147483648 which is outside the valid range of [0, 5). #15


Open
parkourcx opened this issue Jul 28, 2018 · 10 comments


parkourcx commented Jul 28, 2018

In the bi-directional LSTM Chinese word segmentation example, I get this error: tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of -2147483648 which is outside the valid range of [0, 5). Label values: -2147483648 -2147483648 2 3 -2147483648 0 0 0 0 0 0 0 0 0 0 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 2 3 -2147483648 0 0 0 0 0 0 0 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 2 3 -2147483648 0 0 0 0 0 0 0 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 2 3 -2147483648 0 0 0 0 0 -2147483648 -2147483648 -2147483648 -2147483648 ... and so on]. I am using my own dataset, processed into the same format as the sample dataset (今/B 天/M是/M个/M好/E3天/E2气/E), and this error is the result. Could it be that the sentences in my dataset are too long? How can I fix this?

parkourcx changed the title from "ValueError: setting an array element with a sequence." to "Received a label value of -2147483648 which is outside the valid range of [0, 5)." on Jul 29, 2018
@yongyehuang
Owner

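@parkourcx Hello, and thanks for the question. This is probably not a sentence-length problem; more likely the label assigned to each character during data processing is wrong. As I recall, the annotation uses only four tags, s b m e: s for a single-character word, b for the first character of a word, m for a character in the middle of a word, and e for the last character of a word. Padding positions all get the tag x. Judging from your error, the -2147483648 values among your labels look wrong. I also don't quite understand why the data is tagged like this: (今/B 天/M是/M个/M好/E3天/E2气/E).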
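One way to catch this kind of labeling problem before it reaches TensorFlow is to map tags to ids with an explicit check, so an unexpected tag fails loudly instead of turning into an invalid label. The value -2147483648 is the minimum 32-bit integer, which is often what a missing value becomes when cast to int32, though the thread does not confirm that this is what happened here. The sketch below is not the repository's preprocessing code: the helper names, the whitespace-separated `char/tag` format, the lowercasing of tags, and the tag order (and therefore each id) are all assumptions for illustration.

```python
# Hypothetical helpers; the tag order (and so each id) is an assumption.
TAGS = ["x", "s", "b", "m", "e"]  # 'x' is reserved for padding
TAG2ID = {tag: i for i, tag in enumerate(TAGS)}


def parse_tagged_sentence(line):
    """Split whitespace-separated 'char/tag' tokens into (chars, tags)."""
    chars, tags = [], []
    for token in line.split():
        # Exactly one '/' per token; 'X/M是/M' means a separator is missing.
        if token.count("/") != 1:
            raise ValueError(f"malformed token: {token!r}")
        char, tag = token.split("/")
        if not char or not tag:
            raise ValueError(f"malformed token: {token!r}")
        chars.append(char)
        tags.append(tag.lower())  # assumption: tags are case-insensitive
    return chars, tags


def tags_to_ids(tags):
    """Map tags to integer ids, raising on any tag outside TAGS."""
    unknown = sorted({t for t in tags if t not in TAG2ID})
    if unknown:
        raise ValueError(f"unknown tags: {unknown}")
    return [TAG2ID[t] for t in tags]


if __name__ == "__main__":
    chars, tags = parse_tagged_sentence("今/b 天/e 是/s 个/s 好/b 天/m 气/e")
    print(tags_to_ids(tags))
```

With this check, a token such as `天/M是/M` or a tag such as `e3` raises a `ValueError` at preprocessing time rather than surfacing later as an out-of-range label.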

@parkourcx
Author

parkourcx commented Aug 4, 2018 via email

@yongyehuang
Owner

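@parkourcx In that case there should be no problem. You could compare this tagging scheme against the plain four-tag s b m e scheme and see which works better. As for the model, this one is fairly simple; you could also try an lstm+crf model (I haven't run one myself...), which is used quite a lot in sequence labeling.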

@parkourcx
Author

parkourcx commented Aug 4, 2018 via email

@yongyehuang
Owner

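@parkourcx Padding makes every sample the same length: a sequence that is too short is filled out with a special symbol, and that symbol is always labeled with a new label of its own. So you should still use tags=['s','b','m','e','x'].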
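The padding step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: `pad_to_length` and `check_label_range` are hypothetical helpers, the padding id of 0 is assumed, and truncating over-long sequences is my own choice that the thread does not confirm.

```python
PAD_TAG_ID = 0  # assumed id of the padding tag 'x'
NUM_TAGS = 5    # s, b, m, e plus the padding tag x


def pad_to_length(ids, max_len, pad_id):
    """Truncate or right-pad a list of ids to exactly max_len entries."""
    clipped = ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))


def check_label_range(batch, num_tags):
    """Raise if any label falls outside the valid range [0, num_tags)."""
    for row_index, row in enumerate(batch):
        for label in row:
            if not 0 <= label < num_tags:
                raise ValueError(
                    f"row {row_index}: label {label} outside [0, {num_tags})"
                )


if __name__ == "__main__":
    # Two label sequences of different lengths, padded to length 6.
    batch = [pad_to_length(seq, 6, PAD_TAG_ID) for seq in ([2, 4, 1], [2, 3, 4, 1])]
    check_label_range(batch, NUM_TAGS)
    print(batch)
```

Running a range check like this on every batch before training reports the offending row directly, which is easier to trace than the error raised inside the loss op.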

@parkourcx
Author

parkourcx commented Aug 4, 2018 via email

@yongyehuang
Owner

@parkourcx 'x' is a tag added by the code during processing; it is not a tag in the annotated data.

@parkourcx
Author

parkourcx commented Aug 4, 2018 via email

@parkourcx
Author

parkourcx commented Aug 10, 2018 via email

@parkourcx
Author

parkourcx commented Aug 10, 2018 via email
