Switch partition into existing table - sybase-ase

Oracle and SQL Server have a feature that allows switching a table's partition into an existing table (it's discussed, for example, here). I am unable to find a similar feature for Sybase ASE.
Question Part A: Is there a way to switch partitions in Sybase ASE?
If the answer is 'no', I am unsure how to proceed. My situation is that I have a very large table which is indexed by dates, and I now need to add data for a new date T_n+1.
    large table          new
--------------------   ------
|T1|T2|T3| .... |Tn| +  |Tn+1|
--------------------   ------
The insert is fast enough if I drop the index on the table first, but the recreation of the index takes a lot of time. There has to be a better way!
Question Part B: What is the fastest way to add this data for Tn+1 into the large table?

Answer Part A:
While Sybase ASE supports move partition and merge partition, these commands work within a single table; i.e., Sybase ASE does not support moving partitions between (different) tables.
Answer Part B:
Assuming dropping and recreating indices is too expensive (in terms of time, and in terms of users needing the indices to access other partitions), you're not left with a lot of options to speed up the inserts other than some basics (a brief sketch follows the list):
bulk insert
minimize the number of transactions (i.e., reduce the number of times you have to write to the log)
disable triggers for the session inserting the data [obviously you would need to decide how to handle any must-have logic that resides in the trigger]
bump up the lock escalation threshold (for the table) to ensure you don't escalate to a table-level exclusive lock; only of interest if you can't afford to block other users with a table-level exclusive lock; you may need to bump up the 'number of locks' configuration setting; less of an issue with ASE 16.0+ since the insert should only escalate to a partition-level lock
if you are replicating out of this table you may want to consider the overhead of replicating the insert vs directly inserting the data into the replicate table(s) [would obviously require disabling replication of the inserts to the primary table]
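As a rough illustration of the lock-escalation and trigger points above (the threshold values, database and table names are assumptions, and the bcp command line is only indicative since it runs from the OS shell, not from isql):

-- Raise the row-lock promotion thresholds on the target table so the large
-- insert does not escalate to a table-level lock (values are illustrative).
exec sp_setrowlockpromote "table", "mydb..big_table", 10000, 10000, 100

-- In the session doing the load, suppress trigger execution
-- (requires appropriate permissions; re-enable afterwards).
set triggers off

-- The bulk load itself would typically run via bcp from the shell, e.g.:
--   bcp mydb..big_table in Tn_plus_1.dat -c -U login -S SERVER -b 10000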

Related

locking a single field for exclusive access in mariadb

I am running a web application in PHP, using mariadb as the database. Various independent scripts need to allocate a unique sequential sequence number (an invoice number), on completion of tasks performed for customers. Even though the invoices are different in nature and format, and in when and how they are generated, the managers want a common numbering system to show up in the accounting. I have a system parameters table and the next available invoice number is a field therein.
I need for each independent script to be able to allocate the next available invoice number while locking out other scripts trying to do the same thing. While still locked, the number is then incremented for the next process in line. The primary goal is to avoid duplicate numbers without creating a bottleneck. The secondary goal is to avoid wasting numbers by skipping some.
It would be ideal to be able to lock the field, though just locking the table would be sufficient most of the time. The number of allocations over time is not huge, though it could pick up in the future.
Plan A: Simply use AUTO_INCREMENT. Since you want a single stream of ids, you must have a single table where that id is generated. That is not that much of an impact to worry about. Caveat: in a multi-master environment (incl. Galera / Group Replication), AUTO_INCREMENT values are not consecutive. Caveat: try to avoid INSERT IGNORE, REPLACE, IODKU and other commands that may burn ids.
Plan B: Have a table dedicated just to this sequence number task. Write a Stored Function to get the 'next' number. Call it in its own transaction. This will minimize the impact on other activities.
Both of the above avoid dup ids. However, you cannot prevent skipped numbers. Still, there are some things to do to avoid skipping numbers. Minimize the need for ROLLBACK. If you do ROLLBACK, you should probably apply the lost id to a "voided" invoice. This might keep the Accountants and Auditors happier.
Another issue: If you are using Replication, you cannot guarantee that the numbers will show up on the Slaves monotonically. This is because grabbing an id is early in processing, but replication occurs in COMMIT order.
I vote for Plan B (which may include an AUTO_INCREMENT under the covers). It gives you a centralized place to make the next change that the managers will ask for.
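A minimal sketch of Plan B, assuming a single-row counter table and a function name invented here (invoice_counter, next_invoice_num); note that with binary logging enabled you may also need log_bin_trust_function_creators or an appropriate function characteristic:

CREATE TABLE invoice_counter (
    next_num INT UNSIGNED NOT NULL
) ENGINE=InnoDB;
INSERT INTO invoice_counter VALUES (1);

DELIMITER //
CREATE FUNCTION next_invoice_num() RETURNS INT UNSIGNED
MODIFIES SQL DATA
BEGIN
    DECLARE n INT UNSIGNED;
    -- FOR UPDATE takes a row lock, so concurrent callers queue here
    SELECT next_num INTO n FROM invoice_counter FOR UPDATE;
    UPDATE invoice_counter SET next_num = next_num + 1;
    RETURN n;
END//
DELIMITER ;

Call it in its own short transaction so the row lock on the counter is held only for an instant.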

How to do monthly refresh of large DB tables without interrupting user access to them

I have four DB tables in an Oracle database that need to be rewritten/refreshed every week or every month. I am writing this script in PHP using the standard OCI functions; it will read new data in from XML and refresh these four tables. The four tables have the following properties:
TABLE A - up to 2mil rows, one primary key (One row might take max 2K data)
TABLE B - up to 10mil rows, one foreign key pointing to TABLE A (One row might take max 1100 bytes of data)
TABLE C - up to 10mil rows, one foreign key pointing to TABLE A (One row might take max 1100 bytes of data)
TABLE D - up to 10mil rows, one foreign key pointing to TABLE A (One row might take max 120 bytes of data)
So I need to repopulate these tables without damaging the user experience. I obviously can't delete the tables and just repopulate them as it is a somewhat lengthy process.
I've considered just a big transaction where I DELETE FROM all of the tables and just regenerate them. I get a little concerned about the length of the transaction (don't know yet but it could take an hour or so).
I wanted to create temp table replicas of all of the tables and populate those instead. Then I could DROP the main tables and rename the temp tables. However, you can't do the DROP and ALTER TABLE statements within a transaction, as they always do an auto-commit. This should be quick (four DROP and four ALTER TABLE statements), but I can't guarantee that a user won't get an error within that short period of time.
Now, as a combination of the two ideas, I'm considering doing the temp tables, then doing a DELETE FROM on all four original tables and then an INSERT INTO from the temp tables to repopulate the main tables. Since there are no DDL statements here, this would all work within a transaction. However, I'm wondering if the memory it takes to process some 60 million records within a transaction is going to get me in trouble (this would be a concern for the first idea as well).
I would think this would be a common scenario. Is there a standard or recommended way of doing this? Any tips would be appreciated. Thanks.
Am I the only one (except Vincent) who would first test the simplest possible solution, i.e. DELETE/INSERT, before trying to build something more advanced?
However, I'm wondering if the memory it takes to process some 60 million records within a transaction is going to get me in trouble (this would be a concern for the first idea as well).
Oracle manages memory quite well; it hasn't been written by a bunch of Java novices (oops, it just came out of my mouth!). So the real question is: do you have to worry about the performance penalties of thrashing the REDO and UNDO log files? In other words, build a performance test case, run it on your server and see how long it takes. During the DELETE/INSERT the system will not be as responsive as usual, but other sessions can still perform SELECTs without any fear of deadlocks, memory leaks or system crashes. Hint: DB servers are usually disk-bound, so getting a proper RAID array is usually a very good investment.
On the other hand, if the performance is critical, you can select one of the alternative approaches described in this thread:
partitioning if you have the license
table renaming if you don't, but be mindful that DDLs on the fly can cause some side effects such as object invalidation, ORA-06508...
You could have a synonym for each of your big tables. Create new incarnations of your tables, populate them, drop and recreate the synonyms, and finally drop your old tables. This has the advantage of (1) only one actual set of DML (the inserts) avoiding redo generation for your deletes and (2) the synonym drop/recreate is very fast, minimizing the potential for a "bad user experience".
Reminds me of a minor peeve of mine about Oracle's synonyms: why isn't there an ALTER SYNONYM command?
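A sketch of the synonym approach, with invented object names; the application always references the synonym big_table, never the physical tables:

-- Build and load the replacement table while users keep reading big_table_v1.
CREATE TABLE big_table_v2 AS SELECT * FROM big_table_v1 WHERE 1 = 0;
-- ... bulk-load big_table_v2 and create its indexes ...

-- Near-instant switch: repoint the synonym the application queries.
CREATE OR REPLACE SYNONYM big_table FOR big_table_v2;

-- Drop the previous incarnation whenever convenient.
DROP TABLE big_table_v1 PURGE;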
I'm assuming your users don't actually modify the data in these tables, since it is replaced from another source every week, so it doesn't really matter if you lock the tables for a full hour. The users can still query the data; you just have to size your rollback segment appropriately. A simple DELETE+INSERT therefore should work fine.
Now if you want to speed things up AND the new data differs little from the previous data, you could load the new data into temporary tables and update the main tables with the delta using a combination of MERGE+DELETE, like this:
Setup:
CREATE TABLE a (ID NUMBER PRIMARY KEY, a_data CHAR(200));
CREATE GLOBAL TEMPORARY TABLE temp_a (
ID NUMBER PRIMARY KEY, a_data CHAR(200)
) ON COMMIT PRESERVE ROWS;
-- Load A
INSERT INTO a
(SELECT ROWNUM, to_char(ROWNUM) FROM dual CONNECT BY LEVEL <= 10000);
-- Load TEMP_A with extra rows
INSERT INTO temp_a
(SELECT ROWNUM + 100, to_char(ROWNUM + 100)
FROM dual
CONNECT BY LEVEL <= 10000);
UPDATE temp_a SET a_data = 'x' WHERE mod(ID, 1000) = 0;
This MERGE statement will insert the new rows and update the old rows only if they are different:
MERGE INTO a
USING (SELECT temp_a.id, temp_a.a_data
         FROM temp_a
         LEFT JOIN a ON (temp_a.id = a.id)
        WHERE decode(a.a_data, temp_a.a_data, 1) IS NULL) temp_a
   ON (a.id = temp_a.id)
 WHEN MATCHED THEN
      UPDATE SET a.a_data = temp_a.a_data
 WHEN NOT MATCHED THEN
      INSERT (id, a_data) VALUES (temp_a.id, temp_a.a_data);
Done
You will then need to delete the rows that aren't in the new set of data:
DELETE FROM a WHERE a.id NOT IN (SELECT temp_a.id FROM temp_a);
100 rows deleted
You would insert into A first, then into the child tables, and delete in the reverse order.
In Oracle you can partition your tables and indexes on a date or time column; that way, to remove a lot of data you can simply drop the partition instead of performing a DELETE.
We used to use this to manage monthly archives of 100 Million+ records and not have downtime.
http://www.oracle.com/technology/oramag/oracle/06-sep/o56partition.html is a super handy page for learning about partitioning.
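As a rough sketch (table, column and partition names invented), a range-partitioned table lets the periodic purge be a quick DDL instead of a huge DELETE:

CREATE TABLE monthly_data (
    id      NUMBER,
    load_dt DATE,
    payload VARCHAR2(100)
)
PARTITION BY RANGE (load_dt) (
    PARTITION p_2024_01 VALUES LESS THAN (DATE '2024-02-01'),
    PARTITION p_2024_02 VALUES LESS THAN (DATE '2024-03-01')
);

-- Removing a month of data is a metadata operation, not millions of row deletes.
ALTER TABLE monthly_data DROP PARTITION p_2024_01 UPDATE GLOBAL INDEXES;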
I assume that this refreshing activity is the only way data changes in these tables, so that you don't need to worry about inconsistencies due to other writing processes during the load.
All that deleting and inserting will be costly in terms of undo usage; you also would exclude the option of using faster data loading techniques. For example, your inserts will go much, much faster if you insert into the tables with no indexes, then apply the indexes after the load is done. There are other strategies as well, but both of them preclude the "do it all in one transaction" technique.
Your second choice would be my choice - build the new tables, then rename the old ones to a dummy name, rename the temps to the new name, then drop the old tables. Since the renames are fast, you'd have a less than one second window when the tables were unavailable, and you'd then be free to drop the old tables at your leisure.
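A sketch of that rename swap for one of the tables (names invented); each RENAME is a fast dictionary operation, which is what keeps the unavailable window under a second:

-- TABLE_A_NEW has already been built, loaded and indexed.
ALTER TABLE table_a RENAME TO table_a_old;
ALTER TABLE table_a_new RENAME TO table_a;
-- Repeat for B, C and D, then clean up at leisure:
DROP TABLE table_a_old PURGE;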
If that one second window is unacceptable, one method I've used in situations like this is to use an additional locking object - specifically, a table with a single row that users would be required to select from before they access the real tables, and that your load process could lock in exclusive mode before it does the rename operation.
Your PHP script would use two connections to the db - one where you do the lock, the other where you do the loading, renaming and dropping. This way the implicit commits in the work connection won't release the lock held by the other connection.
So, in the script, you'd do something like:
Connection 1:
Create temp tables, load them, create new indexes
Connection 2:
LOCK TABLE Load_Locker IN SHARE ROW EXCLUSIVE MODE;
Connection 1:
Perform renaming swap of old & new tables
Connection 2:
Rollback;
Connection 1:
Drop old tables.
Meanwhile, your clients would issue the following command immediately after starting a transaction (or a series of selects):
LOCK TABLE Load_Locker IN SHARE MODE;
You can have as many clients locking the table this way - your process above will block behind them until they have all released the lock, at which point subsequent clients will block until you perform your operations. Since the only thing you're doing inside the context of the SHARE ROW EXCLUSIVE lock is renaming tables, your clients would only ever block for an instant. Additionally, this level of granularity allows you to control how long the clients have a read-consistent view of the old tables; without it, if you had a client that did a series of reads that took some time, you might end up changing the tables mid-stream and wind up with weird results if the early queries pulled old data and the later queries pulled new data. Using SET TRANSACTION READ ONLY would be another way of addressing this issue if you weren't using my approach.
The only real downside to this approach is that if your client read transactions take some time, you run the risk of other clients being blocked for longer than an instant, since any locks in SHARE MODE that occur after your load process issues its SHARE ROW EXCLUSIVE lock will block until the load process finishes its task. For example:
10:00 user 1 issues SHARE lock
10:01 user 2 issues SHARE lock
10:03 load process issues SHARE ROW EXCLUSIVE lock (and is blocked)
10:04 user 3 issues SHARE lock (and is blocked by load's lock)
10:10 user 1 releases SHARE
10:11 user 2 releases SHARE (and unblocks loader)
10:11 loader renames tables & releases SHARE ROW EXCLUSIVE (and releases user 3)
10:11 user 3 commences queries, after being blocked for 7 minutes
However, this is really pretty kludgy. Kinlan's solution of partitioning is most likely the way to go. Add an extra column to your source tables that contains a version number, partition your data based on that version, then create views that look like your current tables but only show rows for the current version (determined by the value of a row in a "CurrentVersion" table). Then just do your load into the table, update your CurrentVersion table, and drop the partition for the old data.
Why not add a version column? That way you can add the new rows with a different version number. Create a view against the table that specifies the current version. After the new rows are added recompile the view with the new version number. When that's done, go back and delete the old rows.
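A sketch of the version-column idea (column names and the version value are invented):

ALTER TABLE table_a ADD (data_version NUMBER);

-- Readers query the view; after each load, recreate it for the new version.
CREATE OR REPLACE VIEW table_a_current AS
SELECT * FROM table_a WHERE data_version = 42;   -- 42 = the newly loaded version

-- Once the view points at the new version, remove the old rows at leisure.
DELETE FROM table_a WHERE data_version < 42;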
What we do in some cases is have two versions of the tables, say SalesTargets1 and SalesTargets2 (an active and an inactive one). Truncate the records from the inactive one and populate it. Since no one but you uses the inactive one, there should be no locking issues or impact on the users while it is populating. Then have a view that selects all the information from the active table (it should be named what the current table is now, say SalesTargets in my example). Then, to switch to the refreshed data, all you have to do is run an alter view statement.
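A sketch of that swap; in Oracle the "alter view" step amounts to a CREATE OR REPLACE VIEW, and the staging source name here is invented:

-- Refresh the inactive copy; nobody reads it, so there is no locking impact.
TRUNCATE TABLE SalesTargets2;
INSERT INTO SalesTargets2 SELECT * FROM staging_sales_targets;

-- Switch readers over by repointing the view they query.
CREATE OR REPLACE VIEW SalesTargets AS SELECT * FROM SalesTargets2;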
Have you evaluated the size of the delta (of changes)?
If the number of rows that get updated (as opposed to inserted) every time you load a new rowset is not too high, then I think you should consider importing the new set of data into a set of staging tables, doing an update-where-exists and insert-where-not-exists (UPSERT) solution, and just refreshing your indexes (OK, OK: indices).
Treat it like ETL.
I'm going with an upsert method here.
I added an additional "delete" column to each of the tables.
When I begin processing the feed, I set the delete field for every record to '1'.
Then I go through a series of updates if the record exists, or inserts if it does not. For each of those inserts/updates, the delete field is then set to zero.
At the end of the process I delete all records that still have a delete value of '1'.
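The same pass sketched in set-based SQL (staging table, key and flag column names are invented; the row-by-row update/insert loop described above is collapsed into a single MERGE):

-- 1. Assume every existing row is going away.
UPDATE table_a SET del_flag = 1;

-- 2. Upsert from the freshly loaded staging data, clearing the flag as we go.
MERGE INTO table_a t
USING staging_a s
   ON (t.id = s.id)
 WHEN MATCHED THEN
      UPDATE SET t.a_data = s.a_data, t.del_flag = 0
 WHEN NOT MATCHED THEN
      INSERT (id, a_data, del_flag) VALUES (s.id, s.a_data, 0);

-- 3. Whatever is still flagged was not in the feed; remove it.
DELETE FROM table_a WHERE del_flag = 1;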
Thanks everybody for your answers. I found it very interesting/educational.

Is there a more efficient method than transactions?

insert into table1 ...;
update table2 set count=count+1;
The above inserts something into table1, and if it succeeds, updates the count field of table2.
Of course this kind of thing can be handled by transactions, but transactions need to lock the table, which will not be efficient in a highly concurrent system. And it can be even worse if you need to update multiple tables in that transaction.
What's your solution?
I'm using PHP, and I implement transactions this way:
mysql_query('begin');
mysql_query($statement1);
mysql_query($statement2);
...
mysql_query('commit');
So it looks like all the tables referred to in those $statements will be locked?
A transaction (which in context of MySQL assumes InnoDB) will not need to lock the whole table.
The INSERT will lock the individual row without gap locks.
The UPDATE will not take gap locks either, provided you use an equality or IN condition on an indexed field in the WHERE clause.
This means that with a properly indexed table, INSERTs will not block each other, while the UPDATEs will only block each other if they affect the same row.
The UPDATE will of course lock the individual row it affects, but since it is the last operation in your transaction, the lock will be lifted immediately after the operation is committed.
The locking itself is in fact required so that two concurrent updates will increment the counts sequentially.
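A sketch of the pattern (column names and the id value are assumptions; the important part is that the UPDATE's WHERE clause hits an indexed column, here the primary key):

START TRANSACTION;
INSERT INTO table1 (some_col) VALUES ('something');   -- locks only the inserted row
UPDATE table2 SET count = count + 1 WHERE id = 42;    -- row lock on id = 42 only
COMMIT;                                               -- both locks released here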
Use the InnoDB storage engine. It uses row-level locking, unlike MyISAM, which uses table-level locking.
Transactions will not necessarily request a lock for the whole table.
In addition, InnoDB supports various transaction isolation levels using different locking strategies. You may want to check:
MySQL: SET TRANSACTION Syntax
MySQL Transactions, Part II - Transaction Isolation Levels
This looks more like a job for a trigger to me: on insert, do something.
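If you go the trigger route, a minimal sketch (table and column names taken from the question; the body is a single statement, so no DELIMITER change is needed) could look like this:

CREATE TRIGGER table1_after_insert
AFTER INSERT ON table1
FOR EACH ROW
    UPDATE table2 SET count = count + 1;

Note that the trigger body still runs inside the INSERT's own transaction, so the locking behaviour on table2 is the same as with an explicit transaction.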
Transactions are great to ensure a "all or nothing" behavior -- even if there is a high load or high concurrency in your system, you shouldn't stop using transactions, at least if you need your data to stay coherent !
(And transactions will not necessarily lock the whole table(s))
If speed were of the absolute essence, I might be tempted to cache the update count in memory (using thread-safe code) and only write it back to the database periodically. I would certainly use a transaction when I did so. The downside to this is that you would need to be willing to treat the count as only approximate. If you needed it to be absolutely correct for an operation, then it would need to be recalculated, again within a transaction.
As others have pointed out, using a database that supports row-level locking and simply using an explicit (or implicit via a trigger) transaction is probably easier. It's definitely more accurate and safer.

How can I speed up an SQL table that requires fast insert and select?

I'm doing some web crawling and inserting the results into a database. It takes about 2 seconds to scrape but a lot longer to insert. There are two tables: table one is a list of URLs and ids, table two is a set of tagIds and siteIds.
When I add indexes to the siteIds (which are MD5 hashes of the URL; I did this because it speeds up the insertion, as the script doesn't have to query the database for each URL's id to add the site-tag pairings), the insert speed falls off a cliff after 300,000 or so pages.
Example
Table 1
hash                       | url              | title    | description
sjkjsajwoi20doi2jdo2xq2klm | www.somesite.com | somesite | a site with info

Table 2
site                       | tag
sjkjsajwoi20doi2jdo2xq2klm | xn\zmcbmmndkd2
When I took off the indexes it went much faster and I was able to add about 25 million records in 12 hours, but searching unindexed tags is just impossible.
I'm using PHP and mysqli for this, and I'm open to suggestions for a better way to organise this data.
Hmm, this is a bit tricky as the slow-down is due to the overhead of the database needing to update the index data structure when each record is inserted.
How are you accessing this? Using PDO for PHP? Raw SQL? Prepared statements?
I would also check whether you need transactions or not, as the db could be implicitly using a transaction, and that could slow down the inserts. For atomic records (records not deleted but collected, or ones WITHOUT normalized foreign-key-dependent records) you don't need this.
You could also consider testing whether a STORED PROCEDURE is more efficient (the db can possibly optimize if it has a stored procedure), then just call this stored procedure via PDO. It is also possible that the server/install of the db has a hardware limitation: either storage (not on an SSD), or the db operations cannot access the full power of the CPU (low priority in the OS, other large processes making the db wait for CPU cycles, etc.).
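One concrete thing worth testing along those lines is batching: many rows per statement and per transaction, so the index and the log are maintained once per batch rather than once per row. The table and column names below mirror the example above but are assumptions:

START TRANSACTION;
INSERT INTO sites (hash, url, title, description) VALUES
    ('hash_0001', 'www.somesite.com',  'somesite',  'a site with info'),
    ('hash_0002', 'www.othersite.com', 'othersite', 'another site');
INSERT INTO site_tags (site, tag) VALUES
    ('hash_0001', 'tag_a'),
    ('hash_0002', 'tag_b');
COMMIT;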

CURSOR in proc that only locks current row being UPDATEd (while UPDATing) and nothing else for duration?

Please bear with me because I'm hopeless on vocabulary, and my own searches aren't going anywhere.
I've learned here that if I update in the way I want to here (with the GROUP_CONCATs also stored in user-defined vars), I'll lock the entire table.
I'm fairly certain the best way to do this would be to UPDATE row by row in a CURSOR, starting with the highest PK id then descending because I'm sure I will have a lower chance of a deadlock, or conflict, or whatever it's called (told you I was bad with the vocab) if I start from the most recent record.
The column being UPDATEd has no index.
I just found out here that a TRIGGER is a TRANSACTION by default, so I will make a proc instead and call from php.
The transaction isolation level is REPEATABLE-READ.
I'm more concerned with these overlapping UPDATEs causing deadlocks with nothing happening than them taking time to complete.
All tables InnoDB.
All SELECTs and UPDATEs are WHEREd on PK id except for the CURSOR SELECT which SELECTs all ids in the table that's being UPDATEd. No joins. No muss. No fuss.
That said, here finally are the questions:
Will the DECLARE's SELECT only SELECT once, or does that also loop (I'd prefer it to SELECT only once)?
Will the DECLARE's SELECT's lock remain for the duration of the proc (I'd prefer it to release asap)?
Will the row lock for each UPDATE release as soon as the query is finished, or will it remain for the duration of the proc (I'd prefer the row lock to release after the individual UPDATE query has finished)?
Will the SELECTs for the user variables also release after they've been set (you guessed it: I prefer those also to release asap)?
Will it be possible to still SELECT the row being UPDATEd with the row lock (again, I'd prefer if I could)?
Many thanks in advance!
Why UPDATE all on INSERT
In my website (in my profile), I allow users to access all submitted links ever. That's fine on a sequential basis; I simply reference the id.
However, I also rank them on a combined percentile from a custom algorithm over the 3 vote types, equally weighting the 3.
The problem is that in my algorithm, each vote affects all other votes. There's no way around this because of the nature of my algorithm.
On Demand
I tried that route with PHP. No go. There's too much calculating. I'm not so concerned with users having accurate data instantly, because the page will automatically update the ranks with the user being none the wiser, but I can't have the user waiting forever, because I allow rapid dynamic pagination.
Calculating in a view was an even greater disaster.
The DECLARE's SELECT will select only once, but I'm afraid that won't help you ...
The SELECT cursor might keep something locked from OPEN up to CLOSE - what it is depends on your storage engine: It can be table, page, row or nothing
The UPDATE will lock for as long as the update takes. Again, depending on your storage engine, the lock can be table, page or row (but obviously not nothing)
But you might consider a completely different approach: IIUC, you basically want some ranking or percentile match, and you try to do this by reacting to every INSERT. This has a big problem in my POV: you calculate lots and lots of unneeded values. What would happen if you calculated only on a reading operation?
If there is always exactly one INSERT between two reading operations, you always have to recalculate - the same as calculating on every INSERT, as you do now.
If there are no INSERTs between two reading operations, the MySQL query cache will have kept the result of the prior calculation and will not rerun the query, so you calculate less.
If there is more than one INSERT between two reading operations, you need to calculate only once for those N inserts.
So, calculating the rank/percentile on demand will never be more expensive than calculating on INSERT, but it has the potential to be significantly cheaper.
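For illustration only, an on-demand combined percentile could look something like this on MySQL 8.0+ / MariaDB 10.2+ (window functions; the links table and vote columns are invented stand-ins for your schema):

SELECT id,
       (PERCENT_RANK() OVER (ORDER BY votes_a)
      + PERCENT_RANK() OVER (ORDER BY votes_b)
      + PERCENT_RANK() OVER (ORDER BY votes_c)) / 3 AS combined_pct
  FROM links
 ORDER BY combined_pct DESC;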
