Sunday, December 18, 2016

Managed Recovery Stopped Applying Archives on Standby Database After Enabling Apply Delay on The Primary Archive Process

I had a requirement to set the archive apply delay feature between the primary and the standby DB, where the archives will be sent immediately to the standby site but will get applied in a delay of one hour. Such feature enables you to eliminate any erroneous update get executed on the primary from being applied immediately on the standby DB,which gives the DBA some time to think whether to fix the logical corruption or failing over to the standby DB.

Back to my story, On the primary DB, I've set the delay on the archival process that shift the archives to the standby DB to 60 minutes:

SQL> alter system set log_archive_dest_6='service="testprds"','LGWR ASYNC NOAFFIRM delay=60 optional compression=enable max_failure=0 max_connections=1 reopen=300 db_unique_name="testprds" net_timeout=90','valid_for=(all_logfiles,primary_role)';

Restart the archival process: [To let the changes take effect]
SQL> alter system set log_archive_dest_state_6=DEFER;
SQL> alter system set log_archive_dest_state_6=ENABLE;

Restart the recover on the standby which was already started in nodelay mode to start it in delayed mode:
SQL> recover managed standby database cancel;
SQL> recover managed standby database parallel 16 DISCONNECT;

Now I can see in the standby alertlog archives are getting received immediately on the standby and acknowledged with 60 minutes delay for apply:

Archived Log entry 378319 added for thread 1 sequence 681301 ID 0x6a29c0d6 dest 1:
ARCs: Archive log thread 1 sequence 681301 available in 60 minute(s)

Up to here, so far so good, it works as expected, but the horror didn't start yet :-) 
After 1 hour. There are no archives applied on the standby DB.
After 2 hours. Same situation, non of archives applied since I modified the archive process with delay option ! Something went wrong.

Looks easy! Not a rocket science, stop/start the recovery process with NODELAY option and the recovery will start overriding the delay:

SQL> RECOVER MANAGED STANDBY DATABASE cancel;
Media recovery complete.

SQL> RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE NODELAY DISCONNECT;
ORA-01153: an incompatible media recovery is active

WHAT?! It was just successfully cancelled !

Hopelessly I tried the following troubleshooting techniques, but non of them work:
- Repeating same above commands didn't help. 
- No MRP process was found on OS side to kill.
- Altered the delay parameter back to 0 and restart (defer/enable) the archive process on the primary DB didn't help.
- Killing ora_arc* processes on the standby didn't help.
- Deleting the archives that were marked with 60 minutes delay on standby, and recopy those archives again from primary site did help as well.

Now, it's time to think logically. This error means that although you've cancelled the media recovery there is still another media recovery processes is running didn't get cancelled when you executed "recover managed standby database cancel" command. Is this can happen? Yes it can happen !

So, we need to look into all recovery processes and their statuses:

SQL> select PROCESS,PID,STATUS from v$managed_standby;

PROCESS PID         STATUS
-------------   ----------  ------------
RFS            1292        IDLE
RFS            1095        IDLE
MR(fg)          19509       WAIT_FOR_LOG

Yes !, it's the one showing WAIT_FOR_LOG status.

Killed it from Linux shell using its PID:

# kill -9 19509

Now, starting media recovery on standby DB with NODELAY option:

SQL> RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE NODELAY DISCONNECT;
Media recovery complete.

Phew, finally the managed recovery is running!

Someone may ask why you didn't just restart the standby DB instance and it will force clear any stale recovery process?
The answer, imagine you are using your standby DB in Active Data Guard mode to support read only/reporting applications, restarting the standby database will impact the availability of those applications.

Thursday, December 1, 2016

Configuration Baseline Collector Script For Linux OS & Oracle Databases

Have you ever performed a change on your OS or DB and later you figured out that you need to revert back to the previous setting but you struggled a lot to do so?
It happened to me before! but the good lesson I've learned was to keep a configuration baseline for all important settings and keep it somewhere, so when the time come to revert back any change, the configuration baseline will be the reference.

Creating a configuration baseline for each and every OS & database in your environment is definitely a daunting job, unless you do it through a script. In this post I'm sharing a Linux shell script to make such task a piece of cake one for you.

Download the configuration baseline script from this link:
https://www.dropbox.com/s/vf527mb46l2iivp/configuration_baseline.sh?dl=0

This shell script should run by the Oracle software owner (e.g. oracle), it will write one log for the OS configuration baseline and one log for each up and running oracle database. It can also send you the output logs via E-mail if you set MAIL_LIST="youremail@yourcompany.com" parameter to your E-mail address.

The script will do the following:
DATABASE Configuration Baseline: [For each database]
- Gather Instances & Database general info.
- Gather NON-DEFAULT Initialization Parameters.
- Gather DATABASE ENABLED FEATURES.
- Gather DATABASE FEATURES USAGE HISTORY.
- Gather DATABASE SETTINGS [Default tablespaces, Characterset, ...]
- Gather SERVICES details.
- Gather CLUSTERWARE INTERCONNECT details.
- Gather PATCHING history.
- Gather DATABASE LINKS.
- Gather DIRECTORIES info.
- Gather ACLs.
- Gather AUDIT settings.
- Gather USERS AND PROFILES details.
- Gather NUMBER OF OBJECTS in each schema.
- Gather the SIZE of each schema.
- Gather PRIVILEGED USERS details.
- Gather DATABASE PHYSICAL STRUCTURE information:
- CONTROLFILES.
- REDOLOG FILES AND GROUPS.
- TABLESPACES AND DATAFILES [+Utilization].
- ASM DISK GROUPS AND ASM FILES [+Utilization].
- FLASH RECOVERY AREA DETAILS [+Utilization].
- Gather RMAN NON-DEFAULT CONFIGURATIONS.
- Gather ACTIVE INCIDENTS information.
- Gather OUTSTANDING BUILT-IN ALERTS.
- Gather SCHEDULED JOBS details.
- Gather AUTO-TASK MAINTENANCE WINDOW details.
- Gather ADVISORS STATUS.
- Gather HARDWARE STATISTICS details.
- Gather RECYCLEBIN information.
- Gather FLASHBACK RESTORE POINTS details.
- Gather FOREIGN KEY COLUMNS HAVING NO INDEXES information.
- Gather DISABLED CONSTRAINTS details.
- Gather MONITORED INDEXES details.
- Gather COMPRESSED TABLES details.
- Gather PARTITIONED TABLES details.

OPERATING SYSTEM Configuration Baseline:
      - Gather RUNNING DATABASES & LISTENERS names.
      - Gather CLUSTERWARE CONFIGURATIONS.
      - Gather LISTENERS STATUS details.
      - Gather SERVER NAME AND OS/KERNEL VERSION information.
      - Gather BOOT CONFIGURATION.
      - Gather ORACLE FILES details:
- oratab
- listener.ora
- tnsnames.ora
- sqlnet.ora
      - Gather INSTALLED OPATCH PATCHES details.
      - Gather FILESYSTEM details.
      - Gather FILSYSTEM configurations.
- LOCAL FILESYSTEM.
- NFS SHARES.
- RAW DEVICES.
- MULTIPATH CONFIGURATIONS.
- ORACLE ASM CONFIGURATIONS.
      - Gather USERS AND GROUPS details.
      - Gather ACCOUNTS SETTINGS details.
      - Gather USERS RESOURCES LIMITS details.
      - Gather ORACLE USER CRONTAB JOBS details.
      - Gather ORACLE USER PROFILE.
      - Gather GENERIC/bashrc PROFILE.
      - Gather SECURITY CONFIGURATIONS:
- FIREWALL RULES. [hashed]
- PAM configurations.
- LOGINS default configurations.
- SELINUX configurations.
- INTRO MESSAGE.
      - Gather SERVICES CONFIGURATIONS.
      - Gather KERNEL PARAMETERS SETTINGS.
      - Gather NETWORK CONFIGURATIONS:
- GENERAL NETWORK SETTINGS.
- DNS SETTINGS.
- NICS CONFIGURATIONS.
- NICS BONDING ALIASES.
- LOCAL/ALLOWED/DENIED HOSTS SETTINGS.
      - Gather TIME AND DATE CONFIGURATIONS:
- LOCAL TIME CONFIGURATIONS.
- NTP STATUS & SETTINGS.
      - Gather LOGGING SETTINGS:
- SYSLOG SETTINGS.
- KEEP LOG SETTINGS.
- LOG ROTATE SETTINGS.
      - Gather HARDWARE INFORMATION:
- ALL ATTACHED HARDWARE.
- ATTACHED PCI DEVICES.
- CPU details.
- MEMORY details.
      - Gather INSTALLED PACKAGES information.

Please note that this shell script is tested on Linux Redhat & Oracle distributions, I didn't test it on other Linux distributions, nevertheless, you need to test it first on a test environment and use it on your own risk.
Even Oracle Support is saying the same in their scripts, I'm not that arrogant to claim that I'm smarter than Oracle:-))

This script is part of DBA BUNDLE, you can download the whole bundle from here:
http://dba-tips.blogspot.ae/2014/02/oracle-database-administration-scripts.html

Finally, your suggestions are always welcome.