c++ - Validation for a suspected libstdc++ regex error
While developing a personal library, I stumbled upon what I think is an error inside libstdc++6.
Because I'm quite sure this library has been reviewed by a lot of higher-skilled people, I came here to validate my finding and to get assistance on further steps.
Consider the following code:

    #include <iostream>
    #include <regex>
    #include <string>

    int main() {
        std::string uri = "http://example.com/test.html";
        std::regex reg(...); // the regex is given below
        std::smatch match;
        std::regex_match(uri, match, reg);
        for (auto& e : match) {
            std::cout << e.str() << std::endl;
        }
    }
I have written a regex to parse a URL into:
- protocol
- user/pass (optional)
- host
- port (optional)
- path (optional)
- query (optional)
- location (optional)
I used the following regex (in C++):

    std::regex reg("^(.+):\\/\\/(.+@)?([a-zA-Z\\.\\-0-9]+)(:\\d{1,5})?([^?\\n\\#]*)(\\?[^#\\n]*)?(\\#.*)?$");
The regex worked fine in online testers and with MSVC++ 2015 Update 3, but fails on the build host, where the host group matches both the host and the path.
Build host:
g++ (ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
libstdc++6:amd64 5.4.0-6ubuntu1~16.04.2
I consider the failure a library error, because if I change the regex to this:

    std::regex reg("^(.+):\\/\\/(.+@)?([a-zA-Z\\.0-9\\-]+)(:\\d{1,5})?([^?\\n\\#]*)(\\?[^#\\n]*)?(\\#.*)?$");

it works fine, although both versions should behave the same.
Failing regex: https://ideone.com/7n2jdk
Working regex: https://ideone.com/6nmpuw
Am I missing something important here, or is this an error within libstdc++6?
The difference is in the character class:

    [a-zA-Z\\.\\-0-9]   // not working
    [a-zA-Z\\.0-9\\-]   // working
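This is a bug, because "[.\\-0]" should be parsed as a character class matching a single character that is either '.', '-' (the hyphen is escaped with a backslash, so it is a literal) or '0'. For an unknown reason, the hyphen is parsed as a range operator, and the [a-zA-Z\\.\\-0-9]+ subexpression becomes equal to [a-zA-Z.-0-9]+. Because '/' (0x2F) lies between '.' (0x2E) and '0' (0x30) in ASCII, the host group can then also consume the path. See this regex demo.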
The second expression works because a '-' at the end of a character class (or at its start) is parsed as a literal hyphen.
Another example of the same bug:

    std::string uri = "%";
    std::regex reg(R"([$\-&])");
    std::smatch match;
    std::regex_match(uri, match, reg);
    for (auto& e : match) {
        std::cout << e.str() << std::endl;
    }
The [$\-&] regex should not match '%'; it should match only '$', '-' or '&'. However, for whatever reason, '%' (which lies between '$' and '&' in the ASCII table) is still matched.