LIKE clause in Sybase/SAP ASE trimmed at the end? - sybase-ase

The emp table below has no ENAME ending in three spaces. However, the following SQL statement behaves like the clause is trimmed at the end (like a '%' pattern), because it returns all records:
select ENAME from dbo.emp where ENAME like '%   '
I tried many other database platforms (including SQL Server, SQL Anywhere, Oracle, PostgreSQL, MySQL etc.) and have seen this happen only in Sybase/SAP ASE (version 16). Is this a bug or is it "by design"? I found nothing specific about this in the online documentation.
I'm looking for a generic fix: some simple transformation applied to the pattern that returns what any other platform would return, without knowing in advance what data type the field is or what kind of data it holds.

This is caused by the VARCHAR semantics in ASE, which always strips trailing spaces from a value before operating on it. This is applied to the string '%   ' before it is used, since that is a VARCHAR value by definition. This behavior is specific to ASE.
Now, you could try working around this by using the [ ] wildcard to match a space, but there are some things to be aware of. First, the column being matched (ENAME) must be CHAR, not VARCHAR; otherwise any trailing spaces will have been stripped before they were stored. Assuming the column is CHAR, using the pattern '%[ ][ ][ ]' unfortunately still does not appear to work. I think some trailing-space stripping may still be happening here.
The best way to work around this is to use an artificial end-of-field delimiter which will not occur in the data, e.g.
ENAME||'~EOF~' like '%   ~EOF~'
This works. But note that the column ENAME must still be CHAR rather than VARCHAR.
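Since the question asks for a generic transformation, here is a minimal Python sketch of how a client could build that predicate for any column and pattern. The helper name sentinel_like and its parameters are hypothetical, and the sketch assumes the sentinel never occurs in the data and contains no LIKE wildcards. It only builds the SQL text; it has not been run against ASE.

```python
def sentinel_like(column: str, pattern: str, sentinel: str = "~EOF~") -> str:
    """Build a LIKE predicate that appends the same sentinel to column and pattern.

    Hypothetical helper: the column name is inserted as-is, so it must come
    from trusted code, not from user input.
    """
    # Double any single quotes so both pieces are valid SQL string literals.
    quoted_sentinel = sentinel.replace("'", "''")
    quoted_pattern = (pattern + sentinel).replace("'", "''")
    return f"{column}||'{quoted_sentinel}' like '{quoted_pattern}'"


# The pattern from the question: anything ending in three spaces.
print(sentinel_like("ENAME", "%   "))
```

Because the pattern no longer ends in spaces, there is nothing left at the end for the server to strip.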

The LIKE behavior is documented, to some extent, here.
For VARCHAR columns this will never work, because ASE removes the trailing spaces.
For CHAR it depends on how you insert the data. In a char(10) column, if you insert 2 characters, ASE adds 8 blank spaces after them to make 10. So when you query, this 2-character entry will be part of the result set, because it has more than 3 trailing spaces.
If this is not a problem for you, then instead of like you can use charindex(), which counts the trailing spaces and does not truncate them the way like does, so you could write something like:
select ENAME from dbo.emp where charindex('   ',ENAME) >0
Or you can calculate the trailing spaces, then check whether your 3 spaces come after that or not, like:
select a from A
where charindex('   ',a) > (len(a) - len(convert (varchar(10) , a)))
Now again, this will return more rows than expected if the data was inserted with varying lengths, but it will work perfectly if you know exactly what to search for.
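The CHAR padding described above is the reason short values match an "ends in three spaces" test. A small Python model of it, assuming only that a CHAR(n) column pads short values with blanks (the function name as_char is made up for illustration):

```python
def as_char(value: str, width: int) -> str:
    """Model CHAR(width) storage by padding short values with trailing blanks."""
    return value.ljust(width)


stored = as_char("AB", 10)
print(repr(stored))            # two characters followed by eight blanks
print(len(stored))             # the padded length, not the inserted length
print(stored.endswith("   "))  # the padding alone satisfies "ends in three spaces"
```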

SELECT ename from dbo.emp where RIGHT(ENAME, 3) = '   '

Related

Regular expression for query to SQL Server

I have a SQL Server connection to an external table in my application and I need to make a query where one of the columns has wrong formatting, let's say, the format is alphanumeric without symbols but the column has data with dashes, apostrophes, dots, you name it. Is it possible to just query one of the columns with that filtered out? It'd really help me. I'm using Laravel and I know I can make an accessor to clean that out but the query is heavy.
This is an example:
Data sought: 322211564
Data found: 322'211'564
Also 322-211-564
EDIT: Just to clarify, I don't want to EXCLUDE data, but to "reformat" it without symbols.
EDIT: By the way, if you're curious using Laravel 5.7 apparently you can query the accessor directly if you have the collection already. I'm surprised but it does the trick.
A wild card guess, but perhaps this works:
WITH VTE AS(
SELECT *
FROM (VALUES('322''211''564'),
('322-211-564')) V(S))
SELECT S,
(SELECT '' + token
FROM dbo.NGrams8k(V.S,1) N
WHERE token LIKE '[A-z0-9]'
ORDER BY position
FOR XML PATH('')) AS S2
FROM VTE V;
This makes use of the NGrams8k function. If you need other acceptable characters you can simply add them to the pattern string ('[A-z0-9]').
If, for some reason, you don't want to use NGrams8k, you could create an inline tally table, which will perform a similar function:
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1 --10
CROSS JOIN N N2 --100
CROSS JOIN N N3 --1000
CROSS JOIN N N4 --10000 --Do we need any more than that? You may need less
),
VTE AS(
SELECT *
FROM (VALUES('322''211''564'),
('322-211-564')) V(S))
SELECT V.S,
(SELECT '' + SS.C
FROM Tally T
CROSS APPLY (VALUES(SUBSTRING(V.S,T.I,1))) SS(C)
WHERE SS.C LIKE '[A-z0-9]'
ORDER BY T.I
FOR XML PATH(''),TYPE).value('.','varchar(8000)') AS S2
FROM VTE V;
Also, just in case, I've used the TYPE directive and the value function. If you later change your mind about not wanting any special characters and need to accept a character like &, it won't be changed to &amp;.
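As an analogy only (this is Python's standard library, not SQL Server), the following sketch shows the kind of entity escaping that XML serialization applies to &, and how decoding the value restores it. The sample string is made up.

```python
from xml.sax.saxutils import escape, unescape

raw = "R&D"

# Serializing text as XML replaces the ampersand with an entity.
as_xml_text = escape(raw)
print(as_xml_text)

# Reading the value back out of the XML restores the original character.
print(unescape(as_xml_text))
```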
Note that for pattern-based string replacements you can use a library like SQL Server Regex. Call RegexReplace on the string you want to transform:
select RegexReplace(col, '[^A-Za-z0-9]', '') from tbl
That call will remove any non-alphanumeric character.
To find all the rows where the column contains only alphanumeric characters:
select col from tbl where col not like '%[^A-Za-z0-9]%'
The like pattern consists of:
% - Matches 0 or more characters.
[^A-Za-z0-9] - Matches any character not in A-Z, a-z, and 0-9. The ^ symbol at the beginning of the character class means characters that do not match.
By using not like your query will reject strings that contain a non-alphanumeric character anywhere in the string.
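If the cleanup ends up on the application side instead, the same two operations (remove the symbols, or test whether any are present) can be sketched in Python with the same character class. The function names are hypothetical; the sample values are the ones from the question.

```python
import re

# Same character class as the SQL examples: anything that is not A-Z, a-z or 0-9.
NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def strip_symbols(value: str) -> str:
    """Counterpart of the RegexReplace call: drop every non-alphanumeric character."""
    return NON_ALNUM.sub("", value)


def is_clean(value: str) -> bool:
    """Counterpart of the NOT LIKE test: true when no symbol occurs anywhere."""
    return NON_ALNUM.search(value) is None


for raw in ("322'211'564", "322-211-564", "322211564"):
    print(raw, strip_symbols(raw), is_clean(raw))
```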

Delete duplicate records in MS Access that aren't exact matches

I am working with Excel spreadsheets that I'm importing into MS Access. They include a client name, date of birth, some other personal information, and order information. The same clients often have multiple, unique orders. I am creating a table that is just unique clients (which I'll link to the order table later) and so when I import data from Excel I would like to delete duplicate client records, preserving one. I would like to match them on Name and Date of Birth. The issue I'm running into is that some client names are strings that don't match exactly.
For example:
Name DOB
---- ---
DOE,JOHN 1/1/1960
DOE,JOHN L 1/1/1960
JOHNSON,PAT 12/1/1945
SMITH,BETTY 2/1/1935
In the above set I'd like to limit it to just three records and remove an excess John Doe record.
I basically would like to only look at the client name before the space.
I wouldn't be opposed to losing the middle initial totally, so if there's a way to just chop it off, that'd work too. How can I achieve this?
It sounds like your easiest option is in fact to cut off any middle initials.
You'll want to proceed as follows.
Use SELECT DISTINCT once everything else is in place.
With the InStr function you can search for the space after the first name.
You can then select only what is to the left of that space with the Left function (position minus 1, so as not to include the space). You'll get an error if no space is found, so add an IIf expression that simply outputs the whole name in that case.
After reviewing the data, you'll need to remove column 1 (in the example below) and insert the Expr1 code directly into the IIf expression, so that in the end you have only two columns: DOB and Expr2 (or rename it AS Name).
Here's an example:
SELECT DISTINCT
Table1.Name,
Table1.DOB,
InStr(1,[Table1].[Name]," ",1) AS Expr1,
IIf([expr1]>0,Left([Table1].[Name],[Expr1]-1),[Table1].[Name]) AS Expr2
FROM Table1;
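To check the logic outside Access, here is a rough Python equivalent run on the sample rows from the question. It follows the same idea (cut the name at the first space, then keep distinct name and DOB pairs); the function name base_name is made up for illustration.

```python
def base_name(name: str) -> str:
    """Cut the name at the first space, or keep it whole when there is none."""
    pos = name.find(" ")  # -1 when no space is found (InStr returns 0 instead)
    return name[:pos] if pos >= 0 else name


rows = [
    ("DOE,JOHN", "1/1/1960"),
    ("DOE,JOHN L", "1/1/1960"),
    ("JOHNSON,PAT", "12/1/1945"),
    ("SMITH,BETTY", "2/1/1935"),
]

# A set of (name, dob) pairs plays the role of SELECT DISTINCT.
unique_clients = sorted({(base_name(name), dob) for name, dob in rows})
for client in unique_clients:
    print(client)
```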

remove trailing whitespace in MySQL

Why does my SELECT not ignore whitespace, and why does TRIM seem to have no effect?
Here are the partial results of something like SELECT DISTINCT crop_year from raw_data:
Rows crop_year
105755 '2010'
12326 '2010 '
256363 '2011'
319321 '2011 '
...
I typed in the single quotes to illustrate the fact that there is trailing whitespace.
I have 2 problems here on my production server... which I cannot replicate locally:
A. I expect SELECT to ignore whitespace, as explained in numerous other questions, as well as the MySQL Docs, but it clearly does not work this way on my prod server.
B. I expect UPDATE raw_data SET crop_year = TRIM(crop_year) to fix the problem, but running this query results in 0 affected rows.
Other background:
This column type is VARCHAR(11).
If it's relevant, the table contains mixed storage engines and collations: this table is MyISAM, and this column is currently latin1_swedish_ci.
PS: In this specific case, year is always a 4 digit value, so I eventually changed the column from a VARCHAR(11) to CHAR(4)... which effectively trimmed the whitespace, but I am still posting the question because I find it likely that I will encounter a similar problem on other columns which are not of a fixed length.
Figured it out, with help from this question. TRIM does not remove all whitespace, only spaces. In my case, I needed:
select distinct trim(BOTH '\r' from crop_year) as crop_year FROM raw_field_data
Other variations could include
select distinct trim(BOTH '\n' from crop_year) as crop_year FROM raw_field_data or
select distinct trim(BOTH '\n\r' from crop_year) as crop_year FROM raw_field_data
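The same effect is easy to reproduce in Python, which can help when inspecting exported values. This is only an analogy: Python's strip(chars) removes any of the listed characters, whereas MySQL's TRIM(BOTH remstr ...) removes repeated occurrences of the whole remstr string. The sample value is an assumption about what the column holds.

```python
value = "2010\r"  # assumed stored value: the year followed by a carriage return

# repr() makes the hidden character visible.
print(repr(value))

# Stripping spaces only leaves the carriage return in place.
print(repr(value.strip(" ")))

# Naming the carriage return and line feed explicitly removes them.
print(repr(value.strip(" \r\n")))
```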
Maybe an extra space is being entered in the form where the crop year value is typed in. Check your input forms.

Trimming Blank Spaces in Char Column in DB2

I'm trying to remove blank spaces that appear in a CHAR column within DB2. I received some help here with the function TRANSLATE to determine if Left contained records that began with three letters:
select pat.f1, hos.hpid, hos.hpcd
from patall3 pat
join hospidl1 hos on pat.f1=hos.hpacct
where TRANSLATE(
LEFT( hos.hpid, 3 ),
'AAAAAAAAAAAAAAAAAAAAAAAAA',
'BCDEFGHIJKLMNOPQRSTUVWXYZ'
) <> 'AAA'
order by pat.f1;
But as you can see in my screenshot, there are records that remain, presumably because they begin with a blank space. I tried cast (hos.hpid as varchar) but that doesn't work. Is it possible to trim these blank spaces?
Thanks,
Use LTRIM() or TRIM() to trim blanks before the LEFT()
select pat.f1, hos.hpid, hos.hpcd
from patall3 pat
join hospidl1 hos on pat.f1=hos.hpacct
where TRANSLATE(
LEFT( LTRIM(hos.hpid), 3 ),
'AAAAAAAAAAAAAAAAAAAAAAAAA',
'BCDEFGHIJKLMNOPQRSTUVWXYZ'
) <> 'AAA'
order by pat.f1;
Note that the use of such functions in the WHERE clause means that performance is going to take a hit. At minimum, the query engine will have to do a full index scan; it may do a full table scan.
If this is a one time thing or a small table, it's not a big deal. But if you need to do this often on a big table look to see if your platform and version of DB2 supports expressions in indexes...
create index myindex on hospidl1
( TRANSLATE(
LEFT( TRIM(hpid), 3 ),
'AAAAAAAAAAAAAAAAAAAAAAAAA',
'BCDEFGHIJKLMNOPQRSTUVWXYZ'
) );
In recent versions of DB2, you can also just use TRIM() to remove blanks from both sides.
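To see why the leading blank defeats the TRANSLATE test and why trimming first fixes it, here is a small Python model of the same check. It mirrors the SQL above (map B through Z to A, then compare the first three characters with 'AAA'); the function name and the sample values are made up.

```python
FROM_CHARS = "BCDEFGHIJKLMNOPQRSTUVWXYZ"
# Map every letter B..Z to 'A', as the TRANSLATE call does.
TO_A = str.maketrans(FROM_CHARS, "A" * len(FROM_CHARS))


def starts_with_three_letters(hpid: str, trim: bool = True) -> bool:
    """True when the first three characters are all uppercase letters."""
    candidate = hpid.lstrip(" ") if trim else hpid
    return candidate[:3].translate(TO_A) == "AAA"


print(starts_with_three_letters(" ABC123", trim=False))  # leading blank breaks the test
print(starts_with_three_letters(" ABC123"))              # trimming first fixes it
print(starts_with_three_letters("12ABC"))                # digits are still rejected
```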

MYSQL, can't compare(equalize) two values

My problem is, I think, very strange. I'm trying to use the following SQL code:
SELECT `name` FROM `table` WHERE `id` = 'roman' GROUP by `name`
and it returns a null value. I'm sure the code is correct because it works with other tables. I've noticed that if I go to the table, remove the id value and then type it in manually again, it works just fine (even though the rest of the values look the same). My table was imported from a CSV file, but I'm sure that I don't have any spaces, letters or wrong characters at the end of the lines (I even checked that with HexEdit). For example, if I use:
SELECT `name` FROM `table` WHERE `id` LIKE 'roman%' GROUP by `name`
then it works, so it appears that there's something wrong with the id values somehow. I've tried to change charset too.
The most likely explanation is that there is a non-printable character stored in the column, likely a carriage return, tab, or line feed.
To find out, try getting a hex representation of the value stored, and a byte length.
SELECT HEX('roman')
, LENGTH('roman')
, HEX(t.id)
, LENGTH(t.id)
FROM mytable t
WHERE t.id <> 'roman'
AND t.id LIKE 'roman%'
(It's possible that there's a characterset conversion going on. The same query above will reveal some details.)
There's a carriage return character, decimal value 13, on the end of the value stored. Compare to:
SELECT HEX('roman\r')
HEX('roman\r')
----------------
726F6D616E0D
^^
To remove ALL occurrences of the carriage return characters from the id column (not just a trailing one), you can use the REPLACE function.
SELECT HEX(REPLACE('ro\rman\r','\r',''))
HEX(REPLACE('ro\rman\r','\r',''))
-----------------------------------
726F6D616E
in an UPDATE statement, e.g.
UPDATE mytable SET id = REPLACE(id,'\r','')
If id is the last field in the .csv file, it's likely that the file had lines ended DOS style, with carriage return and line feed; and the process that read the input file stripped off the line feed only, and left the carriage return as part of the data.
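That import scenario can be reproduced in a few lines of Python. The two-column file content below is made up for illustration; the point is that splitting DOS-style lines on the line feed alone leaves the carriage return attached to the last field.

```python
# Hypothetical CSV content with DOS line endings; id is the last field.
data = "name,id\r\nRome,roman\r\n"

# A reader that splits on line feed only leaves '\r' at the end of each line.
second_line = data.split("\n")[1]
imported_id = second_line.split(",")[-1]

print(repr(imported_id))
print(imported_id.encode("ascii").hex().upper())  # same idea as HEX(t.id)
print(imported_id == "roman")                     # the equality test fails
print(imported_id.startswith("roman"))            # the LIKE 'roman%' test succeeds
print(imported_id.replace("\r", "") == "roman")   # same idea as REPLACE(id,'\r','')
```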
It looks like this was caused by the CSV import: all values in the last column have an extra line-break character. I've tried the following code:
SELECT name FROM table WHERE id LIKE 'roman_' GROUP by name
so it will "skip" the last character and still not select my other values, such as roman_string, roman_string2 etc. Maybe it's not the best solution, but it's better than spending a lot of time rebuilding the CSV import script :) Thanks
