I am currently working on a web app which is using a specific string to call a function. Here is a sample string:
$string = "translate from-to word for translate"
First I need to validate the string: it should have the same format as the $string above. How should I validate it?
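Then I need to extract 3 substrings from $string:
the word after the hyphen (to be named: $target)
the word before the hyphen (to be named: $source)
everything after the from-to pair, to the end of the string (to be named: $translate)
This is my coding attempt to get the from and to: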
$found = false;
$source = "";
$target = "";
$next = 3;
$prev = 1;
for($i=0;$i<strlen($string);$i++){
    if($found == false){
        if($string[$i] == "-"){
            $found = true;
            while($string[$i+$prev] != " "){
                $target .= $string[$i+$prev];
                $prev +=1;
            }
            /*$next -=1;
            while($string[$i-$next] != " " && $next > 0){
                $source .= $string[$i-$next];
                $next -=1;
            }*/
        }
    }
}
From that code, I can only get $target, which contains the to after the -. I don't know how to get $source.
Please show me the fastest way to get the from as $source and the to as $target.
Then I need to get word for translate (all of the string after from-to).
So the result should be:
$target = "to";
$source = "from";
$translate = "word for translate";
Finally, if the $string has two or more hyphens, like translate from-to from-to test-test word for translate, it should return false.
Note that to and from are arbitrary strings.
Consider the following possible input strings:
translate from-to word for translate (1 hyphen, no accents or non-English characters)
translate dari-ke dari-ke word for translate (2 hyphens)
translate clé-solution word for translate (1 hyphen, accented character used)
translate goodbye-さようなら word for translate (1 hyphen, Japanese characters used)
A case-insensitive pattern like: /^[a-z]+? ([a-z]+)-([a-z]+?) ([a-z ]+)$/i will perform as requested on the first two sample strings with high efficiency, but not the last two.
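To check that claim, the pattern can be run over the four samples. This is only a minimal sketch; the $samples, $pattern and $results names are introduced here for illustration.

```php
<?php
// Sample strings taken from the list above.
$samples = [
    "translate from-to word for translate",
    "translate dari-ke dari-ke word for translate",
    "translate clé-solution word for translate",
    "translate goodbye-さようなら word for translate",
];

$pattern = '/^[a-z]+? ([a-z]+)-([a-z]+?) ([a-z ]+)$/i';

$results = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match and 0 when the pattern does not match.
    $results[] = preg_match($pattern, $sample);
}

var_export($results); // 1 for the first sample, 0 for the other three
```

The second sample is rejected on purpose (two hyphens), while the last two are rejected only because é and the Japanese characters fall outside [a-z].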
Using the "word character" (\w) to match the substrings (instead of case-insensitive [a-z]) will perform as intended with the first two samples, but also allows 0-9 and _ as valid characters. This means a slight drop in pattern accuracy (this may be of no noticeable consequence to your project).
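A small sketch of that trade-off follows. The exact \w pattern is an assumption (a straight swap of [a-z] for \w), and the second input string is a made-up sample.

```php
<?php
// Assumed \w variant of the earlier pattern; the /i flag is no longer needed.
$pattern = '/^\w+? (\w+)-(\w+?) ([\w ]+)$/';

// A plain sample still matches...
$plain = preg_match($pattern, "translate from-to word for translate", $outPlain);

// ...but so does a made-up sample containing a digit and an underscore.
$loose = preg_match($pattern, "translate fr0m-t_o word for translate", $outLoose);

var_export([$plain, $loose]);
```

Both calls return 1, and $outLoose[1] and $outLoose[2] hold fr0m and t_o respectively.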
If you are translating strings that may go beyond English characters, it can be simpler and more forgiving to use a "negated character class" for matching. If you want to allow letters beyond a-z, like accented and other multibyte characters, then [^-] offers a broad allowance of characters (at the expense of also allowing many unwanted characters, such as digits, punctuation and spaces). Here is a demo of this kind of pattern.
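If the broad allowance of [^-] turns out to be too loose, one stricter alternative (not the pattern used in this answer) is to match Unicode letters with \p{L} under the u modifier. The sketch below assumes the input is valid UTF-8 and that letters plus combining marks (\p{M}) are the only characters wanted in each word; the $word variable and the digit sample are made up for illustration.

```php
<?php
// Assumption: letters plus combining marks form the desired "word" alphabet.
$word = '[\p{L}\p{M}]+';
$pattern = '/^' . $word . ' (' . $word . ')-(' . $word . ') ([\p{L}\p{M} ]+)$/u';

$accented = preg_match($pattern, "translate clé-solution word for translate", $m1);
$japanese = preg_match($pattern, "translate goodbye-さようなら word for translate", $m2);

// Made-up sample with digits, which [^-] would accept.
$digits = preg_match($pattern, "translate fr0m-t0 word for translate");

var_export([$accented, $japanese, $digits]);
```

The first two calls match and the digit sample does not, so this variant trades some of the forgiveness of [^-] for tighter validation.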
It is important to only write "capture groups" for substrings that you want to subsequently use. For this reason, I do not capture the leading substring translate.
list() is a handy "language construct" to assign variable names to array values. Notice that the first element (the fullstring match) is not assigned to a variable. This is why list()'s parameter list starts with a comma. If you don't wish to leverage the convenience of list(), then you can manually assign the three variables over three lines like this:
$source=$out[1];
$target=$out[2];
$translate=$out[3];
Code: (Demo)
$strings=[
    "translate from-to word for translate",
    "translate dari-ke dari-ke word for translate",
    "translate clé-solution word for translate",
    "translate goodbye-さようなら word for translate"
];
foreach($strings as $string){
    if(preg_match('/^[a-z]+? ([^-]+)-([^-]+?) ([a-z ]+)$/i',$string,$out)){
        list(,$source,$target,$translate)=$out;
        echo "source=$source; target=$target; translate=$translate";
    }else{
        var_export(false); // $found=false;
    }
    echo "<br>";
}
Output:
source=from; target=to; translate=word for translate
false
source=clé; target=solution; translate=word for translate
source=goodbye; target=さようなら; translate=word for translate
While regex provides a much more concise method with fewer function calls, this is a non-regex method:
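if(substr_count($string,'-')!=1){
    var_export(false); // $found=false;
}else{
    // remove the leading "translate " prefix by length, assuming the string starts with it
    // (ltrim() treats its second argument as a character mask, not a prefix, so it could
    // also strip leading letters of the source word, e.g. in "translate test-test ...")
    $trimmed=substr($string,strlen('translate '));
    // split into the "from-to" pair and the remaining text to translate
    $array=explode(' ',$trimmed,2);
    list($source,$target)=explode('-',$array[0]);
    $translate=$array[1];
    echo "source=$source; target=$target; translate=$translate";
}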
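Since the question asks for false on invalid input, the regex approach can also be wrapped in a function. This is a minimal sketch; parseTranslateString is a hypothetical name, and returning the three substrings as an array is a design choice made here for illustration.

```php
<?php
/**
 * Hypothetical helper: returns [source, target, translate] on success,
 * or false when the string does not have the expected format.
 */
function parseTranslateString($string)
{
    if (!preg_match('/^[a-z]+? ([^-]+)-([^-]+?) ([a-z ]+)$/i', $string, $out)) {
        return false;
    }
    // Drop the full-string match at index 0 and keep the three capture groups.
    return array_slice($out, 1);
}

$parsed = parseTranslateString("translate from-to word for translate");
if ($parsed !== false) {
    list($source, $target, $translate) = $parsed;
    echo "source=$source; target=$target; translate=$translate";
}
```

Comparing the result strictly against false (!==) keeps the validity check separate from the content of the captured substrings.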